6 - The Value of Big Data for Urban Science

Published online by Cambridge University Press:  05 July 2014

Steven E. Koonin, New York University
Michael J. Holland, New York University
Julia Lane, American Institutes for Research, Washington DC
Victoria Stodden, Columbia University, New York
Stefan Bender, Institute for Employment Research of the German Federal Employment Agency
Helen Nissenbaum, New York University

Summary

Introduction

The past two decades have seen rapid advances in sensors, database technologies, search engines, data mining, machine learning, statistics, distributed computing, visualization, and modeling and simulation. These technologies, which collectively underpin ‘big data’, are allowing organizations to acquire, transmit, store, and analyze all manner of data in greater volume, with greater velocity, and of greater variety. Cisco, the multinational manufacturer of networking equipment, estimates that by 2017 there will be three networked devices for every person on the globe. The ‘instrumenting of society’ that is taking place as these technologies are widely deployed is producing data streams of unprecedented granularity, coverage, and timeliness.

The tsunami of data is increasingly affecting the commercial and academic spheres. A decade ago, it was news that Walmart was using predictive analytics to anticipate inventory needs in the face of upcoming severe weather events. Today, retail (inventory management), advertising (online recommendation engines), insurance (improved stratification of risk), finance (investment strategy, fraud detection), real estate, entertainment, and political campaigns routinely acquire, integrate, and analyze large amounts of societal data to improve their performance. Scientific research is also seeing the rise of big data technologies. Large federated databases are now an important asset in physics, astronomy, the earth sciences, and biology. The social sciences are beginning to grapple with the implications of this transformation. The traditional data paradigm of social science relies upon surveys and experiments, both qualitative and quantitative, as well as the exploitation of administrative records created for non-research purposes. Well-designed surveys generate representative data from comparatively small samples, and the best administrative datasets provide high-quality data covering a total population of interest. The opportunity now presents itself to understand how these traditional tools can be complemented by large volumes of ‘organic’ data that are being generated as a natural part of a modern, technologically advanced society. Depending upon how sampling errors, coverage errors, and biases are accounted for, we believe the combination can yield new insights into human behavior and social norms.

Type: Chapter
Information: Privacy, Big Data, and the Public Good: Frameworks for Engagement, pp. 137–152
Publisher: Cambridge University Press
Print publication year: 2014
