Category Archives: Pre-Pitches

Describe and link to your Week 1 data.

Gomez: pre-pitches 2

1. See whether there is a relationship between median household income in New York City and the presence of laundromats in certain areas. Does the availability of laundromats coincide with a particular range of income? Where are they most predominant and why?

I created a spreadsheet of the median household income in each Community District using data from I got the locations of laundromats from

2. See whether there is a relationship between median household income in New York City and the presence of Green Markets in certain areas. Does the availability of these Green Markets coincide with a particular range of income? Where are they most predominant and why?

I created a spreadsheet of the median household income in each Community District using data from I got the locations of laundromats from



Keith Pre-Pitches 2

1. The Center for Responsive Politics, a non-profit political research group, maintains an immense dataset of political finance information. I propose creating a data visualization of the rise and fall of contributions and lobbying around the biggest issues of 2013, e.g. who was contributing and donating the most money before the government shutdown, the post-Newtown push for gun legislation, the healthcare roll-out and the stall-out of immigration reform. This would be a tremendous project, so alternatively I suggest biting off just one of these issues, preferably one that reemerges in the mainstream in the next month. Lastly, I propose visualizing financial disclosure data from US senators. The Huffington Post did a piece about this data in January, but they created a print article and all this data is begging to be free.

The Environmental Protection Agency maintains datasets on air quality, sites that generate pollution and water quality, all likely candidates for visualization. However, the EPA also tracks more than 650 different toxic chemicals “that are being used, manufactured, treated, transported, or released into the environment.” I propose mapping this data in the spirit of the wind map we viewed at the beginning of the semester. While this data does not have the amazing potential for animation that the wind map did, I still want to try something.

The Empire Center for New York State Policy maintains government payroll data across the state. I propose visualizing New York City’s municipal employee salaries. This is a particularly immense project, but it could be interesting to watch the money flow.

This last one is just a hypothetical because I don’t know if the data exists, but I would like to map out presidential approval ratings throughout history, including their popularity today, to see how they hold up over time. So far I have only been able to find Gallup polls, which only go back to the beginning of the century. But I feel that someone had to be keeping track of public opinion back in the day, e.g. old newspaper archives, surveys, etc.



Three Department of Buildings data sets: Camilo Gomez

The following three data sets come from NYC Open Data.

The first one is about Department of Buildings license fees:

In it, we can see a table with a list of 25 construction jobs. For each, we find the cost of obtaining a license, the duration of the license and the cost of renewing it. All licenses expire after either three years or one year except for that of a journeyman plumber or journeyman fire suppression piping installer, whose license never expires.

With this information it is possible to calculate how much each worker has to pay in license fees over a twelve-year period from the moment that he pays his original fee. Then, it would be possible to create a line graph, each line representing a profession, with dollar amounts on the y-axis and the number of years on the x-axis. This would show how the cost of fees compares across professions.
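The twelve-year calculation could be sketched roughly as follows. This is only an illustration: the function name `cumulative_fees`, the fee amounts and the rule that a renewal falls due at every multiple of the license term (including year 12) are assumptions, not values or rules taken from the data set.

```python
from typing import List, Optional


def cumulative_fees(
    initial_fee: float,
    renewal_fee: float,
    term_years: Optional[int],
    horizon: int = 12,
) -> List[float]:
    """Return the running total paid at each year from 0 through `horizon`.

    Assumption: a renewal is due every `term_years` years after the
    original payment, including one that falls exactly on the last year.
    `term_years=None` models a license that never expires.
    """
    totals = []
    paid = initial_fee  # the original fee is paid at year 0
    for year in range(horizon + 1):
        if year > 0 and term_years is not None and year % term_years == 0:
            paid += renewal_fee
        totals.append(paid)
    return totals


# Hypothetical fee schedules, NOT taken from the data set.
example_licenses = {
    "three-year license": cumulative_fees(200.0, 100.0, 3),
    "one-year license": cumulative_fees(50.0, 50.0, 1),
    "never expires": cumulative_fees(50.0, 0.0, None),
}

for name, totals in example_licenses.items():
    print(f"{name}: ${totals[-1]:.2f} paid by year 12")
```

Each list holds one value per year, so it can be plotted directly as one line of the graph, with the list index as the x-axis value.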

The second one is about Department of Buildings complaints received:

It shows the date of each complaint, the type of complaint (the key for each complaint category can be found here:) and the street and house number of the complaint. With this information, a map could be created to show which community districts in New York’s five boroughs have had the most complaints and what types of complaints have been the most common during a particular time period (the table’s oldest complaint is from 1989 and the latest from 2013), and to compare how the number and type of complaints have varied over time.
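Before any mapping, the counts behind such a map could be tallied along these lines. This is a minimal sketch with made-up records: the field names `date`, `category` and `district`, the ISO date format and the sample rows are all assumptions, and it presumes each street address has already been matched to a community district in a separate step.

```python
from collections import Counter
from typing import Dict, List, Tuple

Complaint = Dict[str, str]


def complaints_per_district(complaints: List[Complaint]) -> List[Tuple[str, int]]:
    """Count complaints per community district, most complaints first."""
    counts = Counter(c["district"] for c in complaints)
    return counts.most_common()


def complaints_per_year_and_type(
    complaints: List[Complaint],
) -> Dict[Tuple[str, str], int]:
    """Count complaints for each (year, category) pair.

    Assumes dates are strings that start with a four-digit year.
    """
    return dict(Counter((c["date"][:4], c["category"]) for c in complaints))


# Hypothetical sample rows, NOT taken from the data set.
sample = [
    {"date": "1989-05-01", "category": "A", "district": "Brooklyn 01"},
    {"date": "2013-02-10", "category": "A", "district": "Brooklyn 01"},
    {"date": "2013-03-15", "category": "B", "district": "Queens 07"},
    {"date": "2013-04-20", "category": "A", "district": "Brooklyn 01"},
]

print(complaints_per_district(sample))
print(complaints_per_year_and_type(sample))
```

The per-district totals would feed the map shading, while the year-and-category counts would support the comparison over time.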

The third data set is about Department of Buildings permits:

These are building permits that were issued on 06/07/2013, but the set includes buildings that were begun as early as the 1990s. One could concentrate only on the new constructions (those that were started after the issuance of this permit), plot the sites on a map, and see how many of those correspond with the map of complaints from the previous data set.


Matt MacVey Data Sets

  1. Vacant publicly owned land.
    Vacant private land and tax assessment
    Bill de Blasio has stated that he wants to increase taxes on vacant lots to encourage building on these sites. Where are these spaces, and what effect are they having on their communities? These vacant spaces are often turned into community gardens. Vacant lots provide a significant portion of the wild habitat that supports biodiversity in New York City. I could talk with people who use community gardens made from similar lots. What species might lose habitat?
  2. Census Computer and Internet Access in the United States 2012
    Smartphone and internet access data. There is potential for stories about differing access between income levels or states. I think that it could be interesting to find some regions that are being held back by a lack of internet access. I could look at the difference between rural and urban New York state or look at the differences within New York City.
    This new census data is the first to show information about access to smartphones. Some people are looking to smartphones to help with inequality in access to the internet, the digital divide. I want to look at what this new data shows about smartphones and the digital divide and whether having smartphone access has the same economic impact as other types of broadband access.
    Another angle: 25% of households don’t have internet access. What do those households look like?
  3. Starbucks locations.
    I read a piece about a new Starbucks uptown and the economic changes this signifies for commercial renting in the area.
    De Blasio’s mayorship has put a spotlight on inequality, gentrification, and the economic identity of the outer boroughs. So, what does it mean when a Starbucks arrives in a new part of town? Readers will find out which neighborhoods Starbucks locations are clustered in, those neighborhoods’ median household income and median commercial rent, how those numbers have changed in the last five or 10 years, and whether those neighborhoods have seen big demographic shifts.
    I can pull out a few stories to focus on, maybe someplace where a Starbucks closed or a place where Starbucks challenged a local coffee shop.

Willens Data Sets

The musiXmatch dataset

Created in partnership with the Million Song Dataset, a dataset created by the Echo Nest for developers who are looking to create music-related digital tools and apps, this app compiles information about songs spread out across many genres and eras. It is maintained by Columbia University, while the Million Song Dataset is maintained by the Echo Nest and MIT.

Rather than full lists of lyrics for every song, the most important and common words from songs are included and grouped together, allowing researchers to identify broad trends in music.

Food Scrap Drop-Off sites

A list of sites (primarily greenmarkets) that accept food scrap drop-offs. The city does not maintain all of these sites, but it does house the map at its data portal.

The distribution of these sites ought to be able to tell us a lot about how ideas about composting can travel in communities across New York.

Piracy Data

Data about the year’s Oscar-nominated films and how long they took to leak onto piracy networks. The data’s been compiled and hosted by a guy named Andy Baio, a developer and programmer who’s worked on a variety of projects, including as part of the initial team that built Kickstarter.

I think this data’s interesting because it deals with a topic that’s of ongoing interest in the media industry, it’s got a good peg (the upcoming Academy Awards), and it’s a manageable set that people will understand quickly.

Smiley Data Sets






This data was compiled by ERNS, the Emergency Response Notification System. It provides information on toxic chemical spills and other accidents for 2012, including substance, number of incidents, deaths, hospitalizations, injuries, evacuations, and property damage. It’s interesting because these incidents have been in the news recently: the chemical spill in West Virginia, the toxic ash spill into a North Carolina river, etc. Incidents like this affect everyone because many times they affect drinking water. I think graphing this data could help give more insight into these incidents and possibly lead to a deeper story.








This data was compiled by the Department of Health and includes birth summaries in New York State for 2011 broken down by race and ethnicity. A recent government study found that in 28 states (including NYC), first-time C-sections declined to 21.5% in 2012, from 22.1% in 2009. Since this data includes the method of delivery, it would be interesting to map this out and find out if there is any correlation between method of delivery and race/ethnicity in New York State.








I found this data on NYC Open Data. It’s based on 311 Service Requests from 2010 until the present, so it’s changing every day. It includes the exact date and time of each complaint, the complaint type (water quality or water system, drinking water) and sometimes even a description of what is wrong with the water (tastes bitter/metallic, looks cloudy, etc.). I think this data would be interesting to look into, mostly because it might show patterns (certain boroughs, neighborhoods or streets having more problems than others, etc.). Analyzing this data could also help when it comes to looking into other data about water in NYC. For example, if complaints from a particular area in Queens keep resurfacing over time, it may be worth looking at data about that area’s water system and quality.




NYC DoB Complaints, Healthcare Surveys & Costs, and Transportation Fatalities by Mode

1. The New York City Department of Buildings maintains a dataset that records all complaints made to the department. This particular set covers 2013. The complaints range from malfunctioning elevators, boilers and electrical wiring to unsafe working conditions at construction sites. I think that the most interesting visualization would be a map of complaints regarding vital building hardware. It would work as a service piece for those who live in these buildings as well as prospective buyers and renters. Highlighting buildings with repeated complaints that have been open for an extended period of time could expose negligent landlords.

2. This dataset is maintained by the federal Centers for Medicare and Medicaid Services. It contains the data from the Hospital Consumer Assessment of Healthcare Providers and Systems, a national, standardized survey of hospital patients about their experiences during inpatient hospital stays in 2013. This is a large dataset that would be difficult to visualize. The best approach would likely be a large map with all of the data and the option to narrow the data down to a specific area or zip code. I would also like to connect this in some way to the voluminous amounts of data present on social media sites like Google Plus and Yelp. It would be a fairly broad service piece that would be interesting to most readers, but especially the elderly and chronically ill.

3. This dataset is also maintained by the Centers for Medicare and Medicaid Services. It contains data on hospital costs, organized by hospital and specific operation, throughout 2013. Costs are averaged out by charges paid by the patient and costs covered by insurance, with a separate cell that provides information on the number of patients. As with the hospital survey data, this would probably be best visualized as a large map with all of the data, as well as several graphs and charts to highlight the major differences. This would also be a great service piece for those in constant contact with hospitals as well as the average reader. However, unlike the survey data, this data is going to be much harder to work with. How do you accurately break down averaged costs?

4. These datasets are maintained by the federal National Highway Traffic Safety Administration, the Federal Railroad Administration and the National Transportation Safety Board, respectively. This data would again be best demonstrated over a large map (they actually have location codes for all the accidents) with the option to filter by mode of transportation and see each mode individually as well as all at once. This data would also be complemented nicely by some comparative graphs and charts showing the differences in fatalities between modes of transportation.
train data
plane data

Hartman Data Sets – All of the Olympics

Data Set 1: Olympic Countries

Beginning in 1998, a group of ten members of the International Society of Olympic Historians from around the world came together to create a database of Olympic-related information. Each member of the team is responsible for a certain data set and maintains it individually before sending it to the editors for final approval. The website is part of the Fox Sports Network. This data is interesting because it breaks down a multitude of Olympic information in one easy table. The data shows when each country began competing in the summer and winter games, their medal counts and the number of athletes ever to compete. I think this data set could be used to highlight how some countries, for example a small country like Andorra, have sent more athletes to the Winter games than other countries with far larger populations.


Data Set 2: LGBT Representation on TV

Each year GLAAD (Gay & Lesbian Alliance Against Defamation) evaluates LGBT representation on both network and cable television. GLAAD’s entertainment team, led by Associate Director of Entertainment Media Matt Kane, researches and monitors all network television as well as a number of cable channels throughout the year to track the progress of LGBT inclusion in both television and major motion pictures. The data can be found on their website for the past several years, and the included link, the 2013 report, contains references to past years and how inclusion has risen or fallen, broken down by channel. This would be interesting data to display in a more visual manner because the long report makes it difficult to compare network to network, cable channel to cable channel, etc. The text-heavy document is full of information that would be far more digestible if presented in a series of graphics.


Data Set 3: Olympic Injuries

The British Journal of Sports Medicine published an article following the 2010 Winter Olympics detailing the injuries and illnesses incurred by athletes during the games. The data comes from the head physicians of the 82 National Olympic Committees, who were asked to report daily occurrences, as well as from the Vancouver and Whistler medical clinics. I found this data interesting because of its timeliness, with the 2014 Winter Olympics beginning this week. Leading up to and during the Olympics, the winter version in particular, the media calls attention to the athletes who cannot return to the games due to injury as well as those who are injured during qualifying rounds, etc. If I delve further into this data set, I would compare it to the article published by this journal in 2012, following the Summer Olympics, to see whether the latest summer or winter games produced more injuries, and specifically which sports produce the most injuries and to what body parts.