Historically, data sources for urban planning have remained relatively stable. Planners relied on a collection of well-known government-produced datasets to do their work, including statistics and geographic layers from federal, state and local sources. Because these sources are produced by regulatory processes or occasional surveys, their strengths and limitations are well known to planners and many citizens. However, all this is beginning to change. Not only has the U.S. Census Bureau's American Community Survey introduced a bewildering variety of data products, all with margins of error, but three interrelated categories of new data are also growing rapidly: crowdsourced, private, and "big" data.
First, crowdsourcing projects like OpenStreetMap and tools like Foursquare have created new maps and place data. Tapping these sources currently requires technical skills beyond most users' capabilities, but it seems likely access will become easier. Although these data are not created through systematic surveys, they can often fill in blanks with higher levels of detail on topics previously difficult to study, such as crises.
Second, private consultants are peddling data to planners, often from proprietary sources and methods. ESRI's business and demographic data are notable examples. The latter are projections from U.S. Census data, and they cannot be independently calculated. At the American Planning Association (APA) conference in Boston, the marketing firm Buxton advertised its services in the expo. The representative I spoke to admitted their products were similar to those of other marketing firms, which stitch together public and private sources using proprietary techniques for market analysis.
Finally, the much-discussed "big data" may at last be coming to planning. Big data refers to large datasets that usually describe people and places and are generated from administrative systems, cell phone networks, or other sources. Often created or compiled by the private sector, these datasets require specialized software to analyze and visualize. Some scholars are confident these data will shed new light on existing problems. However, the data are also useful if your aspirations are more prosaic -- illustrating a trend or creating a thematic map.
As consumers of these new data, planners and other public-sector clients are in a position to set expectations for data providers, as well as to shed new light on old problems with new sources of information. Three guidelines follow, and I welcome additional discussion in the comments.
Demand transparent, replicable sources and methodologies. If the data will inform a public decision, even indirectly, the public interest demands it come from known sources and methods. Too many private data sources are rife with variables derived from "proprietary methodologies," which public clients can and should demand to have explained. All too often, what is hidden inside the black box is no better than a qualitative guess.
Use crowdsourced data thoughtfully. Although people may trust private data too much, the reverse is true for crowdsourced data, which can often be more accurate and detailed than the "official" sources. Crowdsourced data should therefore be used, but with a special effort to compare them to "known" sources and explain the variations and biases discovered. If no advantage is discovered, their use can be abandoned, but often these sources can be amazingly detailed and useful for planners.
Link "big data" sources with planning issues. Too often data wonks are enthralled with the new sources of data, spending hours making pretty maps or visualizations that are interesting but ultimately irrelevant to substantive policy or planning questions. Although there is a place for visual exploratory data analysis for complex data and good design to communicate clearly with stakeholders, planners can play a key role in these discussions. Instead of dismissing these efforts, urban planners should engage with data analysts and designers, guiding them towards relevant questions and policy issues. The Columbia Spatial Information Design Lab's projects epitomize this approach, mining public data to tell policy-relevant stories about issues like prison spending or industrial zoning in Manhattan.
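For readers who want to try the comparison suggested in the crowdsourced-data guideline, here is a minimal sketch in Python. It assumes you already have counts of some feature (say, grocery stores) by area from both an official source and a crowdsourced one. The area names, the numbers, and the function names `compare_counts` and `summarize` are all made up for illustration and do not come from any real dataset or library.

```python
from statistics import mean


def compare_counts(official, crowdsourced):
    """Compare two {area_id: count} dictionaries on the areas they share."""
    rows = []
    for area in sorted(set(official) & set(crowdsourced)):
        off = official[area]
        crowd = crowdsourced[area]
        diff = crowd - off
        # Percent difference is undefined when the official count is zero.
        pct = diff / off * 100 if off else None
        rows.append({
            "area": area,
            "official": off,
            "crowdsourced": crowd,
            "diff": diff,
            "pct_diff": pct,
        })
    return rows


def summarize(rows):
    """Summarize the per-area differences to hint at systematic bias."""
    diffs = [r["diff"] for r in rows]
    return {
        "areas_compared": len(rows),
        "mean_diff": mean(diffs) if diffs else None,
        "largest_gap": max(rows, key=lambda r: abs(r["diff"]))["area"] if rows else None,
    }


# Hypothetical, made-up counts purely for illustration.
official = {"tract_A": 12, "tract_B": 30, "tract_C": 8}
crowdsourced = {"tract_A": 15, "tract_B": 22, "tract_C": 8, "tract_D": 4}

rows = compare_counts(official, crowdsourced)
summary = summarize(rows)

# Areas that appear only in the crowdsourced data deserve a closer look:
# they may be errors, or detail the official source is missing.
only_crowd = sorted(set(crowdsourced) - set(official))

for r in rows:
    print(r)
print(summary)
print("Only in crowdsourced data:", only_crowd)
```

A consistently positive or negative mean difference would suggest a systematic bias worth explaining, while a single large gap may point to a local quirk in one of the sources. Either way, the numbers are only a starting point for the explanation the guideline calls for.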
What are your favorite new sources? What additional guidelines should planners consider?