Open Datasets

Many datasets are now available online, and more are on the way, driven by a platform-minded tech industry and major government initiatives. It's a massive opportunity for, well, everyone: advertisers, health care providers and researchers, big energy, investors, civic hackers, educators, governments, startups, etc. McKinsey estimates the potential annual value of open data at $3-5T. For perspective, that's 18-30% of US GDP in 2013. That's... unbelievable.
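As a quick sanity check on that percentage, here is a minimal sketch of the arithmetic. It assumes US nominal GDP in 2013 was roughly $16.8T, a figure not given above, and the function name is purely illustrative.

```python
# Assumption: US nominal GDP in 2013 was roughly $16.8 trillion.
# This figure is not stated in the text above; swap in your own source.
US_GDP_2013_TRILLIONS = 16.8


def share_of_gdp(value_trillions, gdp_trillions=US_GDP_2013_TRILLIONS):
    """Return a dollar value (in trillions) as a percentage of GDP."""
    return 100.0 * value_trillions / gdp_trillions


# McKinsey's estimated range for the annual value of open data, in trillions.
low = share_of_gdp(3.0)
high = share_of_gdp(5.0)

print(f"{low:.0f}% to {high:.0f}% of 2013 US GDP")
```

Under that assumed GDP figure, the range rounds to the 18-30% quoted above.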

Here's a necessarily incomplete collection of datasets and sources to get you started.

Products

  • Quandl - a great source of complete and summarized datasets. Like the Google of open data
  • Kaggle competitions and data - a good place to pick up problems, join a team and learn about winning machine learning methods
  • KDnuggets - a portal for data mining, analytics and big data resources
  • http://www.reddit.com/r/datasets/ - subreddit devoted to datasets
  • Enigma.io - feels like the enterprise version of Quandl--better UI and support, though fairly costly after the initial 30-day trial
  • Wolfram|Alpha API

Additional resources

  • KDD Cup - the annual Data Mining and Knowledge Discovery competition. Sort of the World Series of data science
  • Wikipedia - naturally, Wikipedia has a page dedicated to open data with links to data sources and relevant definitions, context, and related controversies
  James Beveridge