Online Data Sets for Machine Learning

Here is a list of publicly available data sets that you can use to practice machine learning on your own.

 

US Government Open Data http://www.data.gov
US Census Bureau http://www.census.gov/
UC Irvine Machine Learning Repository http://archive.ics.uci.edu/ml/, http://archive.ics.uci.edu/ml/datasets.html, http://www.sgi.com/tech/mlc/db/ (archive)
Johns Hopkins University http://www.biostat.jhsph.edu/courses/bio624/datasets/datasets.htm
Microsoft Azure Marketplace http://datamarket.azure.com/browse/Data
Amazon Public Data Sets http://aws.amazon.com/public-data-sets/
DataMarket http://datamarket.com/
World Bank http://data.worldbank.org/
Kaggle.com http://kaggle.com/, particularly http://inclass.kaggle.com/
National UFO Reporting Center (just for fun) http://nuforc.org/, UFO report database at http://nuforc.org/webreports.html

 

If you are aware of any other interesting data sets, please suggest them in the comments and I will add them to the list.
