So if you want to teach an algorithm what a narwhal looks like, this would be a good place to start. GET-Evidence has put up public genomes for download. It might be pretty good for some kind of textual analysis project or for training a machine learning algorithm. Maybe a spellchecker?

Google has made all their Google Books n-gram data freely available. Good luck to you. Or the corn lobbies? If one song goes viral with a unique style, do a bunch of copycats follow?

Namely, if someone searches for something, what do they click on? Most of the data can be segmented both by time and by geography.

Datasets are customizable, allowing you to select variables of interest such as age, gender, and race.
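As a rough illustration of what selecting variables of interest can look like once an extract has been downloaded, here is a minimal sketch using only Python's standard library. The sample data, the column names, and the `select_variables` helper are all hypothetical and not tied to any particular data provider.

```python
import csv
import io

# Hypothetical extract: the columns and values are made up for illustration.
SAMPLE_CSV = """id,age,gender,race,income,state
1,34,F,White,52000,GA
2,51,M,Black,61000,CA
3,27,F,Asian,48000,NY
"""


def select_variables(csv_text, variables):
    """Return one dict per row, keeping only the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if a requested variable is not in the header row.
    missing = [v for v in variables if v not in reader.fieldnames]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{v: row[v] for v in variables} for row in reader]


rows = select_variables(SAMPLE_CSV, ["age", "gender", "race"])
```

Values come back as strings, so numeric variables such as age would still need converting before any analysis.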

Do left-leaning blogs more often link to other left-leaning blogs than right-leaning ones? It was last updated August 21. Wikipedia informs me that Attack of the Show rated it the number 1 viral video of all time.

19 Free Public Data Sets for Your First Data Science Project

In that case, try the ImageNet database, which is structured around the WordNet hierarchy. Subject categories include criminal justice, education, energy, food and agriculture, government, health, labor and employment, natural resources and environment, and more.

But where to find the data for such a thing? Or you could, you know, try to build the next Google. This list would be a good first step in researching what sort of data comparisons people actually care about.

If you need economic census data on any industry, check out census.

If you need a database of comprehensive book data, perhaps to build a competitor to Goodreads or an online digital library, the Open Library allows people to freely download their entire database. This is good for building classification algorithms that decide whether a new image is an ad, which might be useful for, say, automatic ad blocking or spam detection.

Check out our list of free data mining tools. Ever seen a TV show where a government determines that someone is a terrorist based on their social ties? The earliest recorded chess match dates back to the 10th century, played between a historian from Baghdad and a student.

Statistical Methods & Data Sources.

Understandable Statistics Data Sets. The publisher of this textbook provides some data sets organized by data type/uses, such as: Raw data from Pew surveys is posted here six months after the survey results are published.

Includes archived data back to Data from Statistics for Experimenters, by Box, Hunter, & Hunter. Results from an industrial experiment. Used to illustrate several approaches to analyzing data, in chapters 2 and 3 of that book.

Downloadable data sets are available online. UC Berkeley's principal archive of digitized social science data and statistics. It operates as part of UC Berkeley's new Social Science Data Lab (D-Lab). indicators and provides users with the ability to map, rank, and download the data for custom analyses.

An option for raw data. The Integrated Postsecondary Education Data System (IPEDS) includes information from every college, university, and technical and vocational institution that participates in the federal student financial aid programs. Datasets include year-over-year enrollments, program completions, graduation rates, faculty and staff, finances, and institutional prices.

One-stop-shopping for statistics and raw data in social science disciplines. Resources for finding raw data, collecting new data, analyzing data, and citing data.

GSU users can access the download data sets, many of. The data for courts include information on the organization of the court, geographic location, type of court, level of government administering the court, number, types, and full- or part-time status of judicial and other personnel, method of appealing cases, location of court records, and types of statistics.

