Tuesday, November 01, 2016

Data-Driven Everything

1940 US Census. Source: Census.gov

My boyfriend wears a Fitbit so regularly that once, scrolling through my Facebook newsfeed, I mistook a friend for him—all I had seen, with half the picture cut off, was an arm and the grey, Flex-model Fitbit on the wrist.

I view the thing as half object of intrigue, half handcuff: while the data it collects (on everything from steps walked to sleep patterns) is interesting, it seems like such a lot of work to scroll through it all.

I admit that I’m a hypocrite in saying this, though. My phone’s built-in Samsung Health app counts the steps I’ve walked and can measure my heart rate. With various other tracker apps, you can note menstrual cycles, food consumption, the number of liters of water you drink in a day...it goes on. It would seem that if it exists, it can be measured.

On a larger scale, this love affair with data—what Berkeley geography graduate students Camilla Hawthorne and Brittany Meché termed “fetishized numeration” in their Society & Space article—is visible in corporate, academic, and policy circles. At UC Berkeley, Chancellor Dirks wrote in March that “Across all of higher education, faculty and administrators are increasingly recognizing the need to treat data literacy as a core competency for liberal education.” An older article about campus Electrical Engineering and Computer Science professor David Culler was more blunt: “There has been massive growth in job opportunities in data-science-related areas…and a shortage of people prepared to fill them, according to Culler.” Dirks’ language of data as a “core competency for liberal education” disguises the perhaps more pressing motive that Culler’s statement illuminates: market demand for data exists, and the university needs to fill it.

William Deresiewicz has written a lovely article entitled “The Neoliberal Arts” about how “college sold its soul to the market,” but that’s actually not my argument here (Deresiewicz does it better).

My concern is, instead, what we lose when we treat quantitative data as our preeminent means of knowing things about the world.

I worry about this because people seem to gush a lot about things with the words “data-driven” placed in front of them, whether decision-making or teaching or journalism or policy. We talk about “data” as though it possesses magical qualities of complete rationality and objectivity. After all, how could numbers be wrong?

The New York Times profiled Kate Crawford, a visiting MIT professor and researcher at Microsoft Research, who criticized “Big Data fundamentalism—the idea that with larger data sets, we get closer to objective truth.” In one example she provided, even something like analyzing the millions of tweets following Hurricane Sandy could provide biased data (since Twitter users tend to be younger and more affluent than the general population affected). Further, she added that “Big Data is neither color blind nor gender blind…Facebook timelines, stripped of data like names, can still be used to determine a person’s ethnicity with 95 percent accuracy.” (Indeed, ProPublica recently published a piece about Facebook using their “Ethnic Affinity” data to give advertisers the option to restrict who viewed their ads—a potential violation of the Fair Housing Act.)

“Ethnic Affinity” is only a recent inheritor of a long history of politically charged data. The late scholar of South Asia Bernard Cohn did extensive work on the first Census conducted by the British in India, pointing out that their Census had a mercantile, extractive goal—after all, counting the subjects of a state is a prerequisite for taxing them. If the British Census in India, Kate Crawford’s example of analyzing tweets, or advertisers’ use of Facebook’s Ethnic Affinity all serve as any indicator, data is rarely objective: not in its motives, its collection, or its analysis.

But what if we lived in a happy utopia—one of both objective data and objective data analysts? There’s still a problem with privileging one form of knowledge production because of its perceived objectivity and rationality: it denigrates other academic fields. And the fields that my CS major friends might describe as “hand-wavey” are, incidentally, also fields that are heavily populated by women. The ranks of your average Anthropology or English class are very different from those of your average CS or Mathematics class. In 2014, when Berkeley offered its inaugural online data science master’s program, 78% of the course’s students were male (Daily Cal). Certainly people like my data science class’s professor and others at Berkeley are making admirable efforts resulting in tangible change (a little over half of my intro to data science class is female).

But even if the arbiters of data are increasingly members of underrepresented groups, the issue of discrimination against certain forms of knowledge remains. There’s a clear bifurcation of disciplines into those we think of, implicitly or explicitly, as “feminine” or “masculine.” It’s something that you can witness every time you turn on the evening news, with its lineup of “hard news”—the talk of war and death, money and politics. But take a look at women’s magazines and websites, and it’s often a different set of stories. XOJane has an entire section called “It Happened To Me”: personal narratives. New York Magazine’s The Cut: a weekly feature called Sex Diaries, in which people (men and women alike) submit anonymized documentation of a week’s worth of sexual exploits. Increasingly, however, there’s crossover—magazines like Cosmopolitan, once better known for aspirational sex positions, are covering “hard news” (documented in this Vox article, “Don’t Underestimate Cosmo: Women’s Magazines Are Taking On Trump”). And papers like the New York Times, with its stiff “All the News That’s Fit to Print,” now feature columns like Modern Love and Campus Lives that put personal narratives, not coldly outlined facts and figures, front and center.

I see this as progress. Plenty of feminist writers like Jessica Valenti use their personal experiences to illuminate global problems, but the language of memory and the personal as a language of knowledge production is not reserved for women alone. Personal narratives should no more be dismissed as the stuff of “women’s issues” than the 2016 election should be dismissed as not a “women’s issue.” Consider PostSecret (which collects postcards from around the world with secrets depicted on them, posting them weekly), StoryCorps (a non-profit project aiming to record stories from Americans of all backgrounds), and The Moth Radio Hour (a weekly series featuring true stories told live on-stage). One of my favorite Medium articles, a haunting piece with the title “You’re 16. You’re a Pedophile. You Don’t Want to Hurt Anyone. What Do You Do Now?” came from an amazing series called Matter. Matter articles talk about big issues, like “The Racism Beat: What it’s like to write about hate over and over and over,” or “Living and Dying on Airbnb: My dad died in an Airbnb rental, and he’s not the only one. What can the company do to improve safety?”

Notice anything?

These articles all provide knowledge that is grounded in the personal—in story, in life, in memory. 

Not numbers, tables, and scatterplots.

I’m not saying that we don’t need quantitative data. We do. But putting it on a pedestal, ignoring and belittling personal narratives or ethnographies or literary analyses, ignores everything that can’t be quantified.

Take Colorado State University anthropologist Jeffrey Snodgrass’s article “A Tale of Goddesses, Money, and Other Terribly Wonderful Things: Spirit Possession, Commodity Fetishism, and the Narrative of Capitalism in Rajasthan, India” as an example. It tells the story of a young mother named Bedami and her husband Ramu. The story starts with Bedami’s possession by a goddess (this being her community’s understanding of her condition) and unfolds in parallel with an exploration of Ramu’s rejection of the community’s traditional livelihood and norms: he had chosen to take a salaried job, open a bank account, and order Bedami to undergo sterilization out of worry over the expense of too many children. But this parsimonious behavior meant that peers viewed him as insufferably stingy and a traitor to his community, and the community concluded that Bedami’s possession had occurred at least in part because of his miserliness and rejection of tradition. Most of this narrative would go unseen if not for the qualitative information of Snodgrass’s laborious ethnography. The job, bank account, and sterilization might become faceless numbers swimming about in some massive pool, but the real impact of modernity’s rocky incursion into Bedami’s community, and into individuals’ lived experiences, would be rendered invisible.

If you’re a policy-maker trying to get more Indians to sign up for bank accounts (a real priority of the current government, which in August 2014 launched the Pradhan Mantri Jan Dhan Yojana scheme to increase bank account penetration) you need to see people like Bedami and Ramu, not just the numbers of the latest World Bank report, to make effective policy. 

The information that comes from documents like personal narratives and ethnographies is often our only window into worlds that are too fraught to speak of in terms of big data. Where do you get statistics on things like pedophilia or goddess possession? Who answers the polls, or tweets, or picks up the phone, to talk about those topics?

Not everything can be quantified, and that’s a good thing.

I said at the beginning that this was an article about quantitative data, but really, this is a plea for humility. The idea that any one discipline has a monopoly on the truth is highly dangerous. Worshipping at the feet of gods we build out of numbers and code is no better than worshipping at the feet of the gods we imagine. (I’m reminded of that classic Dumbledore line: “Of course it is happening inside your head, Harry, but why on earth should that mean that it is not real?”)

Scientists and engineers who believe that their fields lend them omniscience make bad things happen: just Google “scientific racism,” “Guatemala syphilis experiment,” or take a look at the current news about Standing Rock, where engineers seem willfully ignorant of Native American history in their readiness to dismiss the protestors’ cause. These all should serve as reminders that our world is much, much better off when scientists and engineers learn from, and believe in the value of, fields like the humanities and social sciences.

And right now, that equality has to start with revising how we look at numbers.


  1. Some of your CS-major friends might come round once they start the modules on Human-Computer Interaction (HCI). While some of it is sort-of quantifiable (e.g. Fitts's law applied to button sizes on control panels), a lot of it involves more qualitative reasoning, and paying attention to the experiences of individual users. Look up "cognitive dimensions of notations" and (especially) "Design for All in ICT" to see what I mean. Even some very data-science oriented researchers will listen to personal stories as well, when making high-level decisions about what to do with that data. I once shared an office with a face-recognition researcher called Rana El-Kaliouby, and told her of the frustrations that some blind and partially-blind people can have speaking to an audience without seeing their facial expressions; I said I needed a boredom detector. Rana extended that idea into an emotional "hearing aid" for autistic children. She went on to MIT, started a company and got into the New York Times top inventions list. She did a lot of data science matching up visual expressions with emotions so as to program a computer to try to recognise them (and she found nothing can possibly get it right 100% of the time, not even a human, but nevertheless she managed to build something that gets it right often enough to be useful), but let's not forget how she listened to individual stories when figuring out what to DO with that data science. (Somewhere here is a Pixar "Inside Out" script waiting to be written, but I digress.) Just thought you might find it encouraging that some do know how to keep data in its proper place.

  2. I love your writing because it comes in bursts and is so grounded in the ideas of others (well understood and explained at that) but then you head off on a new tangent. Life is made of tangents, you know.

    But where we diverge, you and I, is in what seems to be your idea that data has and will retain context and its usefulness comes from that while I see all around me data for data's sake -- data that leads to Watsonian conclusions supported only by statistically-derived algorithms we can't even write, much less explain.

    The Google Brain can predict criminal recidivism with 90 percent accuracy based solely on brain MRIs, yet the Google Brain-keepers have no idea what it is that their software is picking up on. So who do we trust?

    Just something to think about while you stir your coffee.

  3. Really, this is a plea for humility!