Willens Data Sets

The musiXmatch dataset

Created in partnership with the Million Song Dataset, a collection the Echo Nest built for developers of music-related digital tools and apps, this dataset compiles information about songs across many genres and eras. It is maintained by Columbia University, while the Million Song Dataset is maintained by the Echo Nest and MIT.

Rather than the full lyrics of every song, it includes only the most important and common words from each song, grouped together, so researchers can identify broad trends in music.
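To make the "grouped words" idea concrete, here is a minimal sketch of how per-song word counts could be combined to look for trends. The track ids, words, and counts below are invented for illustration, and the dictionary layout is an assumption, not the dataset's actual file format.

```python
from collections import Counter

# Hypothetical records: each song id maps words to how often they appear.
# These values are made up and do not come from the real dataset.
songs = {
    "track_a": {"love": 4, "night": 2, "heart": 1},
    "track_b": {"love": 1, "road": 3, "night": 1},
    "track_c": {"heart": 2, "road": 1},
}


def total_word_counts(bags):
    """Sum the per-song counts into one tally for the whole collection."""
    totals = Counter()
    for bag in bags.values():
        totals.update(bag)  # adds each word's count to the running total
    return totals


def songs_containing(bags, word):
    """Return the sorted ids of songs whose word list includes `word`."""
    return sorted(track for track, bag in bags.items() if word in bag)


totals = total_word_counts(songs)
print(totals.most_common(2))
print(songs_containing(songs, "night"))
```

With real data, the same two questions (which words dominate overall, and which songs use a given word) could be asked per genre or per decade.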

Food Scrap Drop-Off Sites

A list of sites (primarily greenmarkets) that accept food scrap drop-offs. The city does not maintain all of these sites, but it does host the map on its data portal, nyc.gov/data.

The distribution of these sites should tell us a lot about how ideas about composting spread through communities across New York.

Piracy Data

Data about the year’s Oscar-nominated films and how long they took to leak onto piracy networks. The data has been compiled and hosted by Andy Baio, a developer who has worked on a variety of projects and was part of the initial team that built Kickstarter.
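As a rough sketch of the "how long they took to leak" calculation, the snippet below measures the gap in days between a release date and a first leak date. The film names, dates, and column layout are placeholders chosen for illustration, not values or structure taken from Baio's data.

```python
from datetime import date
from statistics import median

# Hypothetical rows: (film title, release date, first leak date).
# None means no leak was recorded for that film.
films = [
    ("Film A", date(2013, 11, 1), date(2013, 11, 20)),
    ("Film B", date(2013, 12, 25), date(2014, 1, 4)),
    ("Film C", date(2013, 10, 4), None),
]


def days_to_leak(released, leaked):
    """Return the number of days between release and leak, or None."""
    if leaked is None:
        return None
    return (leaked - released).days


# Keep only films that actually leaked, then summarize the gaps.
gaps = {
    title: days_to_leak(released, leaked)
    for title, released, leaked in films
    if leaked is not None
}
median_gap = median(gaps.values())
print(gaps)
print(median_gap)
```

A median is used here instead of a mean so that one film with an unusually slow leak does not skew the summary; that is a design choice, not something the data requires.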

I think this data’s interesting because it deals with a topic that’s of ongoing interest in the media industry, it’s got a good peg (the upcoming Academy Awards), and it’s a manageable set that people will understand quickly.