Big Data for Health: Promises & Perils

As described by Watched: A Wall Street Journal Privacy Report [1], big data is becoming ubiquitous, providing scientists, politicians and commercial entities alike with data about us that we never thought they’d have. From the phones in our pockets to the trails we leave as we browse the web, our data provides information about our locations, our habits, our friends, our hobbies, our media preferences, and even our body movements (don’t forget the accelerometer and GPS capabilities built into your smartphone!). In the words of Danah Boyd and Kate Crawford [2], this explosion of data may “usher in a new wave of privacy incursions.” But it may also usher in a new era of scientific research and new approaches to complex biological and societal problems.

“[On one hand,] Big Data is seen as a powerful tool to address various societal ills, offering the potential of new insights into areas as diverse as cancer research, terrorism, and climate change. On the other, Big Data is seen as a troubling manifestation of Big Brother, enabling invasions of privacy, decreased civil freedoms, and increased state and corporate control” (Boyd & Crawford, 2012).

A 2009 white paper by researchers at the MIT Media Lab [3] describes “reality mining,” the practice of pulling together the digital traces that people leave as they go about their daily lives with their smartphones, and applies it to the study of public health. Reality mining “can provide new opportunities with respect to diagnosis, patient and treatment monitoring, health services planning, surveillance of disease and risk factors, and public health investigation and disease control” (2009, p. 1). Most of this reality mining is based on sensors embedded in our mobile phones. These sensors, including accelerometers, GPS units and microphones, can theoretically detect and measure our physical activity, our body movements, even our conversational cadences. MIT researcher Max Little, for example, is using voice-analysis algorithms to detect Parkinson’s disease from a phone call [4]. In other applications, the pace and tone of our voices may help detect depression, as depressed persons tend to speak more slowly.

Do these valuable public health applications trump the privacy issues associated with collecting medical information through our smartphones? As the white paper’s authors point out, despite the promise of reality mining for public health, behavior-logging through smartphone technologies must not be forced on individuals, and such technology must be used within an appropriate legal and ethical framework. For example, what if behavior-logging through mobile phone sensors for health applications also collects other, privacy-infringing data? How would researchers treat and secure this data? These questions must be addressed in the near future.

“Reality mining of behavior data is just beginning. In the near future it may be common for smart phones to continuously monitor a person’s motor activity, social interactions, sleep patterns, and other health indicators. The system’s software can use these data to build a personalized profile of an individual’s physical performance and nervous system activation throughout the entire day. If these rich data streams were combined with self-reports and personal health records, including medical tests taken and the medicines prescribed, there is the possibility of dramatic improvements in health care” (Pentland, Lazer, Brewer & Heibeck, 2009, pp. 11-12).

To address privacy concerns, the authors suggest putting all of this data in the hands of the mobile phone users themselves. But the mere fact that this data might be collected continuously throughout the day, even when users are not fully aware that their activities are being logged, is potentially worrisome. Users might not fully understand the implications of releasing this data to outside entities, just as some people have developed concerns about genetic profiling only after allowing their genomes to be sequenced.

Perhaps a safer approach is for scientists to combine “reality mining” data into large anonymous datasets for broad public good, rather than analyzing individual-level data, which raises greater privacy concerns. However, as the authors point out, current legal statutes surrounding reality mining for public health applications “are lagging far behind our data collection capabilities, making it particularly important to begin discussing how this technology will and should be used” (2009, p. 12). Strict ethical research regulations should also be imposed on reality mining research for public health applications, just as regulations have been imposed on social science research in the past.
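To make the idea of pooling data into anonymous datasets a little more concrete, here is a minimal Python sketch. It is an illustration only, not a method described in the white paper: the record layout (user ID, region, daily step count), the function name aggregate_daily_steps and the minimum group size of 5 are all assumptions chosen for the example. The sketch drops user IDs, reports only group-level averages, and suppresses any group with too few people to hide in.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold: groups with fewer distinct users are not reported.
K_MIN = 5


def aggregate_daily_steps(records, k_min=K_MIN):
    """Summarize (user_id, region, steps) records by region.

    User IDs are used only to count distinct people per region and never
    appear in the output. Regions with fewer than k_min users are suppressed.
    """
    users_by_region = defaultdict(set)
    steps_by_region = defaultdict(list)
    for user_id, region, steps in records:
        users_by_region[region].add(user_id)
        steps_by_region[region].append(steps)

    summary = {}
    for region, users in users_by_region.items():
        if len(users) >= k_min:
            summary[region] = {
                "n_users": len(users),
                "mean_steps": round(mean(steps_by_region[region])),
            }
    return summary


# Made-up sample data: six users in "north", only two in "south".
records = [(f"u{i}", "north", 4000 + 500 * i) for i in range(6)]
records += [("u100", "south", 9000), ("u101", "south", 3000)]

summary = aggregate_daily_steps(records)
print(summary)
```

In this made-up sample, the "south" region has only two users and is left out of the summary entirely. Aggregation with a size threshold lowers the risk of singling out an individual but does not remove it, so it would complement the legal and ethical safeguards discussed above, not replace them.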

Do you have suggestions for how "reality mining" data could be harnessed for public health research while preserving individual privacy? Please comment below.


[2] Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.

[3] Pentland, A., Lazer, D., Brewer, D., & Heibeck, T. (2009). Using reality mining to improve public health and medicine. Studies in Health Technology and Informatics, 149, 93-102.

[4] Max Little: A test for Parkinson’s with a phone call, http://www.ted.com/talks/max_little_a_test_for_parkinson_s_with_a_phone_call.html