Free data sets for machine learning practice

Data sets are freely available in many places. I have collected the following links over time. Many of them are meant for learning, and a few are suited to serious problem solving.

UCI hosts the well-known Machine Learning Repository at http://archive.ics.uci.edu/ml/

Very good weather data is available from KNMI, the Dutch meteorological institute, at http://www.knmi.nl/climatology/daily_data/download.html

Data sets for different ML problems, ordered by complexity: http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html

If you have a Wolfram|Alpha Pro account, you can download research data in CSV format. For example, hover your mouse over the plots on these pages to get the download link: http://www.wolframalpha.com/input/?i=home+prices+folsom+california http://www.wolframalpha.com/input/?i=historical+gold+prices

As previously mentioned, past and current Kaggle competitions are a great source of data sets for experimenting with algorithms and validating their accuracy: https://www.kaggle.com/competitions

The FIMI repository has many data sets, including retail market and traffic data, at http://fimi.ua.ac.be/data/

UCI KDD Archive at http://kdd.ics.uci.edu/

StatLib data archive: http://lib.stat.cmu.edu/datasets/

Economic time series data sets from the University of Maryland: http://inforumweb.umd.edu/econdata/econdata.html

The Time Series Data Library, compiled at Monash University, Australia, contains many time series covering various domains (including interesting data sets on loans, sheep populations, etc.): http://datamarket.com/data/list/?q=provider:tsdl

Data is considered a commodity, and a few sites sell data to researchers, for example http://datamarket.com

Various time series data sets shared with the learning community can be found here: http://www.cs.ucr.edu/~eamonn/time_series_data/

A list of spatial data collections is at http://geodacenter.asu.edu/datalist/

Some interesting data sets for learning http://calcnet.mth.cmich.edu/org/spss/Prjs_DataSets.htm

British Election Study data sets: http://www.britishelectionstudy.com/data/#.U9M1CvmSzoE

German General Social Survey (ALLBUS) data: http://www.gesis.org/en/allbus/allbus-home/

Research data sets from Harvard's Center for International Development: http://www.cid.harvard.edu/ciddata/ciddata.html

Various data sets on international economics. Like many of the data sets listed here, they may require some massaging before use: http://graduateinstitute.ch/home/study/academicdepartments/international-economics/md4stata/datasets.html

The Stack Overflow community data dump, a very large file, is available for any kind of data research: http://blog.stackoverflow.com/category/cc-wiki-dump/

Free economic, demographic and financial data (requires login): https://www.economy.com/freelunch/

The US government's dedicated open data site: https://www.data.gov/

Amazon hosts huge public data sets on AWS: https://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1

Other compiled collections of data sets: http://www.kdnuggets.com/2011/02/free-public-datasets.html http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free http://www.infochimps.com/datasets http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

XML data sources: http://usgovxml.com/
