Free Data sets for machine learning practice

Many data sets are available for free. I have collected the following links over time; most of them are meant for learning, and a few are suited to serious problem solving.

UCI has a famous data repository at http://archive.ics.uci.edu/ml/

Very good weather data at http://www.knmi.nl/climatology/daily_data/download.html

Data sets for different ML problems, ordered by complexity: http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html

If you have a Wolfram Alpha Pro account, you can download research data in CSV format (hover your mouse over the plots to get the download link). For example: http://www.wolframalpha.com/input/?i=home+prices+folsom+california http://www.wolframalpha.com/input/?i=historical+gold+prices
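Once a CSV file has been downloaded, a first look at it needs only the standard library. The following is a minimal sketch in Python; the column names (`date`, `price`) and the sample rows are hypothetical stand-ins, not the actual layout of a Wolfram Alpha export.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file;
# the real column names and values will differ.
SAMPLE_CSV = """date,price
2014-01-01,1200.5
2014-02-01,1250.0
2014-03-01,1300.25
"""

def load_column(csv_file, column):
    """Read one numeric column from an open CSV file, skipping blank cells."""
    reader = csv.DictReader(csv_file)
    values = []
    for row in reader:
        cell = (row.get(column) or "").strip()
        if cell:
            values.append(float(cell))
    return values

def mean(values):
    """Arithmetic mean; returns None for an empty list."""
    return sum(values) / len(values) if values else None

prices = load_column(io.StringIO(SAMPLE_CSV), "price")
print(len(prices), mean(prices))
```

For a file on disk, pass an open file handle (for example `open("your_file.csv", newline="")`, where the file name is a placeholder) instead of the `io.StringIO` object.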

As previously mentioned, past and current competitions are a great source of data sets for experimenting with algorithms and validating their accuracy. https://www.kaggle.com/competitions

There are many data sets, including retail market and traffic data, at http://fimi.ua.ac.be/data/

UCI Data Archive at http://kdd.ics.uci.edu/

StatLib data archive http://lib.stat.cmu.edu/datasets/

Economic time series data sets from the University of Maryland http://inforumweb.umd.edu/econdata/econdata.html

A data library of many time series covering various subjects, compiled for Monash University, Australia, is available here (it includes interesting data sets about loans, sheep population, etc.): http://datamarket.com/data/list/?q=provider:tsdl

Data is considered a commodity, and a few sites sell data to researchers. http://datamarket.com

Various data sets shared for the learning community can be found here: http://www.cs.ucr.edu/~eamonn/time_series_data/

A list of spatial data sets at http://geodacenter.asu.edu/datalist/

Some interesting data sets for learning http://calcnet.mth.cmich.edu/org/spss/Prjs_DataSets.htm

British Election Study data set http://www.britishelectionstudy.com/data/#.U9M1CvmSzoE

German social survey data http://www.gesis.org/en/allbus/allbus-home/

Research datasets from Harvard http://www.cid.harvard.edu/ciddata/ciddata.html

Various datasets on international economics, which, like many of the data sets listed here, may require some data massaging: http://graduateinstitute.ch/home/study/academicdepartments/international-economics/md4stata/datasets.html

The Stack Overflow community data dump, a huge file usable for any data research, can be found here: http://blog.stackoverflow.com/category/cc-wiki-dump/
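With a file this large, it helps to process it as a stream rather than load it whole. Here is a minimal sketch in Python, assuming the dump is an XML file with one `row` element per record; the sample document and the `Id` and `Score` attribute names are hypothetical and only illustrate the approach.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature dump; element and attribute names are assumptions.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" Score="10" />
  <row Id="2" Score="-1" />
  <row Id="3" Score="4" />
</posts>
"""

def iter_rows(source, tag="row"):
    """Yield the attribute dict of each matching element, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)
            elem.clear()  # drop the element's contents once it has been read

def count_positive_scores(source):
    """Count rows whose (hypothetical) Score attribute is above zero."""
    return sum(1 for row in iter_rows(source) if int(row.get("Score", 0)) > 0)

print(count_positive_scores(io.BytesIO(SAMPLE_XML)))
```

For a real file, pass its path or an open binary file handle to `iter_rows` in place of the `io.BytesIO` object.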

Free economic, demographic and financial data (requires login) https://www.economy.com/freelunch/

The US government's dedicated data site https://www.data.gov/

Amazon hosts huge public data sets on AWS https://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1

Collections of many datasets are compiled at http://www.kdnuggets.com/2011/02/free-public-datasets.html http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free http://www.infochimps.com/datasets http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

XML data source http://usgovxml.com/

 