Open Datasets
Many datasets are now available online, and more are on their way. The shift is being driven by a platform-minded tech industry and major government initiatives. It's a massive opportunity for, well, everyone: advertisers, health care providers and researchers, big energy, investors, civic hackers, educators, governments, startups, etc. McKinsey estimates the potential annual value of open data at $3-5 trillion. For perspective, that's roughly 18-30% of US GDP in 2013. That's... unbelievable.
Here's a necessarily incomplete collection of datasets and sources to get you started.
Products
- Quandl - a great source of complete and summarized datasets, like the Google of open data
- Kaggle competitions and data - a good place to pick up problems, join a team, and learn about winning machine learning methods
- KDnuggets - a portal for data mining, analytics, and big data resources
- http://www.reddit.com/r/datasets/ - subreddit devoted to datasets
- Enigma.io - feels like the enterprise version of Quandl--better UI and support, though fairly costly after the initial 30-day trial
- Wolfram|Alpha API
Additional resources