In regular conversation, the words “data” and “statistics” are often used interchangeably. In the world of libraries, academia, and research, however, there is an important distinction between them. Data is the raw information from which statistics are created. Put the other way around, statistics provide an interpretation and summary of data.
If you’re looking for a quick number, you want a statistic. A statistic will answer “how much” or “how many”. A statistic repeats a pre-defined observation about reality.
Statistics are the results of data analysis. They usually come in the form of a table or chart. This is what a statistical table looks like:
Source: Statistical Abstract of the United States
If you want to understand a phenomenon, you want data. Data can be analyzed and interpreted using statistical procedures to answer “why” or “how.” Data is used to create new information and knowledge.
Raw data is the direct result of research that was conducted as part of a study or survey. It is a primary source. It usually comes in the form of a digital data set that can be analyzed using software such as Excel, SPSS, SAS, and so on. This is what a data set looks like:
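As a rough text illustration of the distinction described above, the sketch below starts from a small raw data set (one record per respondent) and derives statistics from it (a count and a mean per state). The records, field names, and the `summarize_by_state` helper are all made up for this example; they do not come from any real survey.

```python
from statistics import mean

# Hypothetical raw data: one record per survey respondent (invented for illustration).
respondents = [
    {"id": 1, "state": "MI", "age": 34, "household_size": 3},
    {"id": 2, "state": "MI", "age": 51, "household_size": 2},
    {"id": 3, "state": "OH", "age": 27, "household_size": 1},
    {"id": 4, "state": "OH", "age": 45, "household_size": 4},
    {"id": 5, "state": "MI", "age": 62, "household_size": 2},
]


def summarize_by_state(rows):
    """Turn raw records into statistics: respondent count and mean age per state."""
    ages_by_state = {}
    for row in rows:
        ages_by_state.setdefault(row["state"], []).append(row["age"])
    return {
        state: {"respondents": len(ages), "mean_age": mean(ages)}
        for state, ages in ages_by_state.items()
    }


# The summary is the "statistic"; the list of records above is the "data".
summary = summarize_by_state(respondents)
for state, stats in sorted(summary.items()):
    print(state, stats["respondents"], stats["mean_age"])
```

The summary table answers “how many” and “how much”; the underlying records are what you would need in order to ask new questions, such as how household size relates to age.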
Look within a data archive that collects data in the general subject area you are researching. Some data archives are fee-based, but there are many open access ones, such as:
Ask yourself: Who might collect and publish this type of data?
Then visit the organization’s website and see if you're right! Or, search for them as an author in the library catalog.
The government collects data to aid in policy decisions and is the largest producer of data overall. For example, the U.S. Census Bureau, Federal Election Commission, Federal Highway Administration, and many other agencies collect and publish data. To better understand the structure of government agencies, read the U.S. Government Manual and browse FedStats. Government data is free and publicly available, but may require access through library resources or special requests.
Many independent non-commercial and nonprofit organizations collect and publish data that supports their social platform. For example, the International Monetary Fund, United Nations, World Health Organization, and many others collect and publish data. For more information about NGOs, visit Duke Libraries NGO Research Guide. Data from NGOs may be free or fee-based.
Academic research projects funded by public and private foundations create a wealth of data. For example, the Michigan State of the State Survey, Panel Study of Income Dynamics, American National Election Studies, and many other research projects collect and publish data. Much of this type of data is free and publicly available, but may require access through library resources. Access to smaller original research projects may be dependent upon contacting individual researchers.
Commercial firms collect and publish data as a paid service to clients or to sell broadly. Examples include marketing firms, pollsters, trade organizations, and business information providers. This information is almost always fee-based and may not always be available for public release.
Search the Library for books and articles dealing with data. Some researchers include their data sets with their publications. If you need help doing this, please ask a librarian!
Be specific about your topic so that you can narrow your search, but be flexible enough to tailor your needs to existing sources.
This is what you should be able to define:
Social Unit: This is the population that you want to study.
It can be...
Time: This is the period of time you want to study.
Things to think about...
Space: Geography or place.
There are two main types of geographic classifications...
Remember to define your topic with enough flexibility to adapt to available data!
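One way to picture the three dimensions above (social unit, time, space) is as a small search specification that is checked against a list of available data sets. The sketch below is only an illustration: the `TopicDefinition` class, the `find_matches` helper, the `year_slack` parameter, and the catalog entries are all hypothetical and do not describe any real archive or search tool.

```python
from dataclasses import dataclass


@dataclass
class TopicDefinition:
    social_unit: str   # the population you want to study
    start_year: int    # first year of the period of interest
    end_year: int      # last year of the period of interest
    geography: str     # geographic level, e.g. "state" or "county"


# Hypothetical catalog entries (invented for illustration).
catalog = [
    {"title": "Household survey A", "social_unit": "households",
     "years": (2000, 2010), "geography": "state"},
    {"title": "Household survey B", "social_unit": "households",
     "years": (2005, 2015), "geography": "county"},
    {"title": "Firm census C", "social_unit": "firms",
     "years": (1990, 2020), "geography": "state"},
]


def find_matches(topic, datasets, year_slack=0):
    """Return titles of data sets that match the topic.

    year_slack relaxes the time requirement by that many years at each end,
    which is one way of staying flexible about what is actually available.
    """
    matches = []
    for ds in datasets:
        first, last = ds["years"]
        covers_period = (first <= topic.start_year + year_slack
                         and last >= topic.end_year - year_slack)
        if (ds["social_unit"] == topic.social_unit
                and ds["geography"] == topic.geography
                and covers_period):
            matches.append(ds["title"])
    return matches


topic = TopicDefinition("households", 1998, 2010, "state")
strict = find_matches(topic, catalog)                  # exact period required
flexible = find_matches(topic, catalog, year_slack=2)  # allow a two-year shortfall
print(strict, flexible)
```

With the strict definition nothing in this made-up catalog fits, because no state-level household data set reaches back to 1998; loosening the period by two years turns up a usable source.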
Data is not available for every conceivable topic. Some data is hidden (behind a paywall, for example), uncollected, or otherwise unavailable. Be prepared to try alternative data.
Funding bodies increasingly require grant-holders to develop and implement Data Management and Sharing Plans (DMPs).
Plans typically state what data will be created and how, and outline the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied.
Can't afford or don't want to use SPSS? Try one of these open source alternatives.
Thanks to Hailey Mooney of Michigan State University Library for permission to reuse her content.