All posts by Max Willens

Keith-Willens Oscar Storyboard



We plan on creating a column chart with drill-down. The first layer will be organized by year along the X axis, with the average number of days it takes for an Oscar-nominated film to leak along the Y axis. Clicking on a column will zoom in to the specific information for that year. The X axis will change to the names of the nominated films. Depending on how difficult the code is to work with, we could create separate columns by leak method. The Y axis would change to the number of days until leak for each film, so it would essentially remain a day counter. This chart could also potentially work as a filterable scatter/bubble chart. A pie chart would show how often each leak method is first. Also, depending on fair use, maybe a little Oscar guy on the side.
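As a rough sketch of the data shaping a two-level chart like this would need, the Python below groups film records by year for the top layer and keeps per-film values for the drill-down layer. The field names (year, title, days_to_leak) and the sample rows are made up for illustration; they are not taken from the actual dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; real values would come from the leak dataset.
SAMPLE_FILMS = [
    {"year": 2005, "title": "Film A", "days_to_leak": 70},
    {"year": 2005, "title": "Film B", "days_to_leak": 50},
    {"year": 2008, "title": "Film C", "days_to_leak": 20},
    {"year": 2008, "title": "Film D", "days_to_leak": None},  # no leak recorded
]


def build_drilldown(films):
    """Return (top_layer, drilldown) for a two-level column chart."""
    by_year = defaultdict(list)
    for film in films:
        # Skip films with no recorded leak so they don't distort the average.
        if film["days_to_leak"] is not None:
            by_year[film["year"]].append(film)

    top_layer = []   # one column per year: average days until leak
    drilldown = {}   # year -> one (title, days) column per film
    for year in sorted(by_year):
        rows = by_year[year]
        avg_days = mean(r["days_to_leak"] for r in rows)
        top_layer.append({"year": year, "avg_days": avg_days})
        drilldown[year] = [(r["title"], r["days_to_leak"]) for r in rows]
    return top_layer, drilldown


top_layer, drilldown = build_drilldown(SAMPLE_FILMS)
```

Clicking the 2005 column would then swap the chart's data to drilldown[2005]. Leaving out films with no recorded leak is one possible choice; they could also be shown as empty columns in the drill-down.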

Story Update: We found that more than 60% of the time, the Oscar screener is the first leak for nominated films. Research has shown that the average number of days until leak has fluctuated so widely due to the varying levels of enforcement from the MPAA. For several years, Oscar screeners were sent with special DVD players, could only be watched in certain places or a certain number of times, or were not sent at all. However, the process seems to change randomly.
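A figure like the screener share could be computed along these lines. This is only a sketch: the leak method names and the day counts below are hypothetical placeholders, not values from the dataset.

```python
from collections import Counter

# Hypothetical data: days until each kind of copy leaked; None means that
# kind of copy never appeared.
SAMPLE_LEAKS = {
    "Film A": {"cam": 12, "screener": 5, "retail_dvd": 90},
    "Film B": {"cam": None, "screener": 30, "retail_dvd": 80},
    "Film C": {"cam": 3, "screener": 40, "retail_dvd": None},
    "Film D": {"cam": None, "screener": 10, "retail_dvd": None},
    "Film E": {"cam": None, "screener": None, "retail_dvd": None},
}


def first_leak_method(leaks):
    """Return the method with the fewest days to leak, or None if no leak."""
    known = {method: days for method, days in leaks.items() if days is not None}
    if not known:
        return None
    # On a tie, min() keeps whichever method is listed first in the dict.
    return min(known, key=known.get)


def first_leak_shares(films):
    """Return {method: share of leaked films where that method came first}."""
    counts = Counter(first_leak_method(leaks) for leaks in films.values())
    counts.pop(None, None)  # films that never leaked are left out of the total
    total = sum(counts.values())
    if total == 0:
        return {}
    return {method: count / total for method, count in counts.items()}


shares = first_leak_shares(SAMPLE_LEAKS)
```

The resulting shares are the slice sizes a pie chart of first leak methods would use.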

We contacted Andy Baio, the owner of the dataset, who explained that he began the database after reading several press releases by the MPAA discussing the leak of Oscar screeners. Baio was surprised by the attention these leaks were receiving because they happen all the time. So he gathered the data to demonstrate how often leaks happen.

Over the past decade, the Academy of Motion Picture Arts and Sciences – the organization that presides over the Oscars – has been waging war against online piracy.

It is a war that they have been losing.

Since 2003, over 60 percent of the films sent to Academy members for consideration have appeared on sites like The Pirate Bay weeks, sometimes months, before they become available for sale or legal streaming.

The Motion Picture Association of America has stated that the film industry employs over two million people and provides $104 billion in wages in the United States. It also estimates that online piracy costs the film industry more than $20 billion per year.

That number has been called into question by some experts, but there is no denying that the movie industry’s biggest hits have a way of winding up on file-sharing networks.

Hollywood depends on the overwhelming success of its blockbusters to buoy the rest of the industry. But since most of these films end up nominated for Oscars, they are usually pirated before their official release dates.

We propose creating a visualization that displays how quickly Oscar-nominated films have been pirated over the past eleven years.

During our initial examination of the data, it appears that the average number of days until the screener leaked has gone through two distinct slides. The averages in 2008 and 2014 were about three weeks until leak, compared to two months in 2005 and 2011.

We have also obtained data on how frequently the most pirated films have been available on legal streaming sites like Netflix, Hulu and Amazon Video. The vast majority of frequently pirated films are not available legally, leading to the conclusion that piracy may stem from availability rather than cost or malice. While we are not certain if we will include this dataset in our final visualization, it does seem like an appropriate conclusion to our data narrative.

Our deadline is set for after the Oscar award ceremony and, consequently, the end of awards season. This will allow us to gather a complete dataset on film leaks for 2014, as well as allow us to piggyback on Oscar-related buzz.

Willens Data Sets

The musiXmatch dataset

Created in partnership with the Million Song Dataset, a dataset created by the Echo Nest for developers who are looking to create music-related digital tools and apps, this dataset compiles information about songs spread out across many genres and eras. It is maintained by Columbia University, while the Million Song Dataset is maintained by the Echo Nest and MIT.

Rather than full lists of lyrics for every song, the most important and common words from songs are included and grouped together, allowing researchers to identify broad trends in music.

Food Scrap Drop-Off sites

A list of sites (primarily greenmarkets) that accept food scrap drop-offs. The city does not maintain all of these sites, but it does house the map at its data portal.

The distribution of these sites ought to be able to tell us a lot about how ideas about composting can travel in communities across New York.

Piracy Data

Data about the year’s Oscar-nominated films and how long they took to leak onto piracy networks. The data’s been compiled and hosted by a guy named Andy Baio, a developer and programmer who’s worked on a variety of projects, including as part of the initial team that built Kickstarter.

I think this data’s interesting because it deals with a topic that’s of ongoing interest in the media industry, it’s got a good peg (the upcoming Academy Awards), and it’s a manageable set that people will understand quickly.