Scraping Social Media to Analyze Public Engagement

Planners are increasingly using social media for public engagement. How can you analyze what people are saying? NodeXL is a free, easy-to-use tool for scraping and analyzing social media. I use this tool to report from the APA National Conference.

April 4, 2016, 9:00 AM PDT

By Jennifer Evans-Cowley @EvansCowley


Every year I report on Twitter activity at the American Planning Association National Conference. With thousands of planners descending on Phoenix, Arizona this weekend, the #APA16 hashtag is already starting to heat up. Throughout the conference, @p18holland and I (@EvansCowley) will be scraping everything that participants tweet from the conference. In this blog post, Patrick Holland and I share how easy it is to scrape and analyze social media.

What is scraping? Scraping is a technique for extracting information from websites. In this case, I am scraping information from Twitter, looking specifically for anyone using the #APA16 hashtag.
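For readers who think in code, the idea of "looking for anyone using the #APA16 hashtag" can be sketched in a few lines of Python. This is only an illustration of the matching step, not how NodeXL or Twitter implement it; the function name `has_hashtag` is made up for this example. The match ignores case, since people tweet both #APA16 and #apa16.

```python
import re

def has_hashtag(text, tag):
    """Return True if `text` contains the hashtag `tag`, ignoring case.

    `has_hashtag` is a hypothetical helper for illustration only.
    """
    # Match "#tag" as a whole hashtag, so "#APA16" does not match "#APA160".
    pattern = r"(?<!\w)#" + re.escape(tag.lstrip("#")) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(has_hashtag("See you in Phoenix! #apa16", "#APA16"))
```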

How can I scrape? I use an Excel plugin called NodeXL. NodeXL has a free and a paid version. The features shown here are part of the free version. NodeXL allows you to scrape social media and also to visualize the data.

After you have downloaded NodeXL, you can get started.

Screenshot of NodeXL

Select the NodeXL tab in the Excel spreadsheet. Choose Import, then the social media network you would like to analyze. We are using Twitter for this conference, but options for Facebook, YouTube, and Flickr are also available. Once "From Twitter Search Network" has been selected, you have the option to "Search for tweets that match this query." Attendees at this conference will be tweeting using the hashtag #APA16, so this is the query we want to match. You can also search for tweets associated with a particular user or keywords. The free version will limit you to 2,000 tweets at a time. In our case, we are scraping with NodeXL each day to ensure we don't miss any tweets.
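Scraping every day means the same tweet can show up in more than one day's import. One way to combine the daily pulls, sketched here in Python under some assumptions, is to keep only the first copy of each tweet. The field name `tweet_id` and the sample records are placeholders invented for this example, not NodeXL's actual column names or real conference data.

```python
def merge_daily_pulls(pulls):
    """Combine daily lists of tweet records, keeping one copy of each tweet.

    Assumes each record is a dict with a unique "tweet_id" key
    (a hypothetical field name chosen for this sketch).
    """
    seen = {}
    for pull in pulls:
        for tweet in pull:
            # setdefault keeps the first record seen for a given ID.
            seen.setdefault(tweet["tweet_id"], tweet)
    return list(seen.values())

# Placeholder records for illustration only.
day1 = [{"tweet_id": 1, "text": "first"}, {"tweet_id": 2, "text": "second"}]
day2 = [{"tweet_id": 2, "text": "second"}, {"tweet_id": 3, "text": "third"}]

merged = merge_daily_pulls([day1, day2])
print(len(merged))  # 3
```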

NodeXL screenshot

NodeXL imports the social media data onto five separate sheets in Excel. One feature worth highlighting: if you select an individual tweet (known as a vertex) on the graph, you can see the interactions people had with that tweet. Pre-conference, people showed particular interest in retweeting and favoriting tweets about technology. The most popular tweet as of April 1st was from @medialab: "On 4/3, @MITCities presents its CityScope urban planning/simulation tools at #apa16 mitsha.re/ZTl6H".

Once you have imported your scraped data, you will find that working in NodeXL is much like working in a regular Excel spreadsheet. Using NodeXL, you will see who sent each tweet, the time of the tweet, whether the tweet was mentioned by other people, any links included in the tweets, and how connected users are to each other. You will be able to create graphs, filter data, and analyze to your heart's content. For example, @medialab's tweet on simulation tools was retweeted 31 times and favorited 59 times. We can see that it was retweeted by @MeagBooth and favorited by @mysidewalkHQ.
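The "who interacted with whom" view can also be thought of as a simple list of connections. The Python sketch below is an assumption about how one might tally such a list by hand, not a description of NodeXL's internals; the function name `interaction_counts` is hypothetical. The two connections in the example are the ones mentioned above.

```python
from collections import Counter

def interaction_counts(edges):
    """Count how many interactions each account received.

    Each edge is a (source, target, kind) tuple; this layout is an
    assumption made for this sketch.
    """
    return Counter(target for _source, target, _kind in edges)

# The two interactions with @medialab's tweet described in the text.
edges = [
    ("@MeagBooth", "@medialab", "retweet"),
    ("@mysidewalkHQ", "@medialab", "favorite"),
]

counts = interaction_counts(edges)
print(counts.most_common(1))
```

With a full export, the accounts at the top of this tally are the ones whose tweets drew the most engagement.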

Tutorials on YouTube help anyone who would like to learn how to use the program. 

Those at the conference: be sure to use the #APA16 hashtag, and stay tuned for the results of this analysis at the end of the conference.


Jennifer Evans-Cowley

Jennifer Evans-Cowley, PhD, FAICP, is the Provost and Vice President for Academic Affairs at the University of North Texas. Dr. Evans-Cowley regularly teaches courses to prepare candidates to take the AICP exam. In 2011, Planetizen named Cowley as one of the leading thinkers in planning and technology. Her research regularly appears in planning journals. She is the author of four Planning Advisory Service Reports for the American Planning Association and regularly blogs for Planetizen.
