All posts by Ross Keith

Festival of Data – Keith

USA Today – The Ones That Get Away

If you scroll down a little bit you can find an interactive map of arrest warrant data. I really couldn’t figure out how I felt about this visualization, but I wanted to talk about it.

There appears to be some excellent data here, and the drilldown is pretty slick. The information complements the piece excellently and there is tons of data to play with. However, the starting point is blank, and that really bugged me.

It’s just this empty slate until you input some information into it and I think this visualization loses a great deal of impact because of this. Could just be me.

Keith Pre-Pitches 2

1. The Center for Responsive Politics, a non-profit political research group, maintains an immense dataset of political finance information. I propose creating a data visualization of the rise and fall of contributions and lobbying around the biggest issues of 2013, e.g. who was donating the most money before the government shutdown, the post-Newtown push for gun legislation, the healthcare roll-out and the stall-out of immigration reform. This would be a tremendous project, so alternatively I suggest biting off just one of these issues, preferably one that reemerges in the mainstream in the next month. Lastly, I propose visualizing financial disclosure data from US senators. The Huffington Post did a piece about this data in January, but they created a print article and all this data is begging to be free.

The Environmental Protection Agency maintains datasets on air quality, water quality and sites that generate pollution, all likely candidates for visualization. However, the EPA also tracks more than 650 different toxic chemicals “that are being used, manufactured, treated, transported, or released into the environment.” I propose mapping this data in the spirit of the wind map we viewed at the beginning of the semester. While this data does not have the amazing potential for animation that the wind map did, I still want to try something.

The Empire Center for New York State Policy maintains government payroll data across the state. I propose visualizing New York City’s municipal employee salaries. This is a particularly immense project, but it could be interesting to watch the money flow.

This last one is just a hypothetical because I don’t know if the data exists, but I would like to map out presidential approval ratings throughout history, including their popularity today, to see how they hold up over time. So far I have only been able to find Gallup polls, which only go back to the beginning of the century. But I feel that someone had to be keeping track of public opinion back in the day, e.g. old newspaper archives, surveys, etc.



Willens/Keith Pitch – Pirate Bay Filled with Oscar Gold

Pirate Bay Filled with Oscar Gold

Arrr you killing the film industry

Ross Keith & Max Willens

Over the past decade, the Academy of Motion Picture Arts and Sciences – the body that presides over the Oscars – has been fighting against online piracy. However, in the past eleven years, 62% of films sent to Academy members for consideration have appeared on sites like The Pirate Bay weeks and even months before their official release dates.

The Motion Picture Association of America has stated that the film industry employs over two million people and provides $104 billion in wages in the United States. The organization estimates that online piracy costs the film industry more than $20 billion per year. While that number has been questioned by some experts, the MPAA spent at least $2.2 million on lobbying efforts in 2013 alone.

The economic structure of the film industry is centered on the success of blockbusters, overwhelmingly profitable films that account for a majority of industry revenue. However, since most of these films end up nominated for Oscars, they are usually pirated before their official release dates.

We propose creating a visualization that displays how quickly Oscar-nominated films have been pirated over the past eleven years. We have not determined how this will best be visualized, but our early discussion has leaned towards an interactive bubble chart, filterable along data points.

During our initial examination of the data, it appears that the average number of days until the screener leaked has gone through two distinct slides. The averages in 2008 and 2014 were about three weeks until leak, compared to two months in 2005 and 2011.
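A minimal sketch of how yearly averages like these could be computed, assuming each record carries a ceremony year and a days-until-leak count. The function name, the row layout and the sample rows are placeholders invented for illustration; they are not values from the spreadsheet.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows, NOT real figures from the spreadsheet.
# Each row is (ceremony year, days until the screener leaked).
sample_rows = [
    (2005, 70), (2005, 50),
    (2008, 20), (2008, 22),
]

def average_days_to_leak(rows):
    """Return {year: mean days until leak}, skipping rows with no leak recorded."""
    by_year = defaultdict(list)
    for year, days in rows:
        if days is None:  # film never leaked, or the date is unknown
            continue
        by_year[year].append(days)
    return {year: mean(days) for year, days in sorted(by_year.items())}

averages = average_days_to_leak(sample_rows)
for year, avg in averages.items():
    print(year, round(avg, 1))
```

Skipping films with no recorded leak keeps them from dragging the average toward zero, but it also means the average only describes films that did leak, which is worth stating alongside any chart built on it.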

We have also obtained data on how frequently the most pirated films have been available on legal streaming sites like Netflix, Hulu and Amazon Video. The vast majority of frequently pirated films are not available legally, leading to the conclusion that piracy may stem from availability rather than cost or malice. While we are not certain if we will include this dataset in our final visualization, it does seem like an appropriate conclusion to our data narrative.

Our deadline is set for after the Oscar award ceremony and consequently the end of awards season. This will allow us to gather a complete dataset on film leaks for 2014 as well as piggyback on Oscar-related buzz.

The above links to a spreadsheet showing how quickly Academy Award-nominated films appeared on torrent sites dating back to 2002. It’s maintained by Andy Baio, a web developer and programmer who’s worked on a variety of platforms and projects over the years.

The second links to data on the availability of frequently pirated movies on legal streaming services. The site is maintained by Jerry Brito, Eli Dourado and Matt Sherman, and the data is collected from the sites TorrentFreak and Can I Stream It.


Nobody’s gotten back to us yet, but we plan on contacting the Academy of Motion Picture Arts and Sciences and the Motion Picture Association of America, as well as the editors of TorrentFreak, a website that covers news related to torrenting across the globe.

The Academy of Motion Picture Arts and Sciences

Press Team

Natalie Kojen

Emily Benedict

Gail Silverman


Vice President, Corporate Communications


Media Contact, MPAA Washington D.C.

Andy Baio

NYC DoB Complaints, Healthcare Surveys & Costs, and Transportation Fatalities by Mode

1. The New York City Department of Buildings maintains a dataset that records all complaints made to the department. This particular set covers 2013. The complaints range from malfunctioning elevators, boilers and electrical wiring to unsafe working conditions at construction sites. I think that the most interesting visualization would be a map of complaints regarding vital building hardware. It would work as a service piece for those who live in these buildings as well as prospective buyers and renters. Highlighting buildings with repeated complaints that have been open for an extended period of time could expose negligent landlords.

2. This dataset is maintained by the federal Centers for Medicare and Medicaid Services. It contains the data from the Hospital Consumer Assessment of Healthcare Providers and Systems, a national, standardized survey of hospital patients about their experiences during inpatient hospital stays in 2013. This is a large dataset that would be difficult to visualize. The best approach would likely be a large map with all of the data and the option to narrow the data down to a specific area or zip code. I would also like to connect this in some way to the voluminous data present on social media sites like Google Plus and Yelp. It would be a fairly broad service piece that would be interesting to most, but especially the elderly and chronically ill.

3. This dataset is also maintained by the Centers for Medicare and Medicaid Services. It contains data on hospital costs organized by hospital and specific operation throughout 2013. Costs are averaged out by charges paid by the patient and costs covered by insurance, with a separate cell that provides information on the number of patients. Similarly to the hospital survey data, this would probably be best visualized as a large map with all of the data, as well as several graphs and charts to highlight the major differences. This would also be a great service piece for those in constant contact with hospitals as well as the average reader. However, unlike the survey data, this data is going to be much harder to work with. How do you accurately break down averaged costs?

4. These datasets are maintained by the federal National Highway Traffic Safety Administration, the Federal Railroad Administration and the National Transportation Safety Board, respectively. This data would again be best demonstrated over a large map (they actually have location codes for all the accidents) with the option to filter by method of transportation and see each individually as well as all at once. This data would also be complemented nicely by some comparative graphs and charts showing the differences in fatalities between modes of transportation.
train data
plane data
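
On the question raised in pitch 3 about averaged costs: an average cannot be broken back down into individual bills, but averages can at least be combined across hospitals without distortion by weighting each one by its patient count. Below is a minimal Python sketch, assuming each row provides an average cost and a number of patients. The function name, the row layout and the numbers are invented for illustration and do not come from the CMS dataset.

```python
def weighted_average_cost(rows):
    """Combine per-hospital average costs into one average, weighted by patient count.

    rows: iterable of (average_cost, patient_count) pairs.
    """
    total_cost = 0.0
    total_patients = 0
    for average_cost, patient_count in rows:
        total_cost += average_cost * patient_count
        total_patients += patient_count
    if total_patients == 0:
        raise ValueError("no patients in the selected rows")
    return total_cost / total_patients

# Invented example values, not real CMS figures.
hospital_rows = [
    (10000.0, 10),  # hospital with few patients and a lower average
    (20000.0, 30),  # hospital with more patients and a higher average
]

combined = weighted_average_cost(hospital_rows)
# Simple mean of the two averages, ignoring patient counts, for comparison.
naive = sum(cost for cost, _ in hospital_rows) / len(hospital_rows)
print(combined, naive)
```

In this made-up case the simple mean of the two averages comes out lower than the weighted figure, because it ignores that most patients were treated at the more expensive hospital. The same gap could appear when rolling hospitals up to a zip code or state on a map.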