Values collected about the world, stored digitally. Data can be numbers, text, images, or other formats.
Data that has been organized, interpreted, or structured so that it is useful and meaningful.
Data about data. Information that describes the properties of a dataset, such as when it was collected, who collected it, and what format it is in.
The process of fixing or removing incorrect, incomplete, duplicated, or improperly formatted data from a dataset.
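The cleaning steps named in the definition above can be sketched in a few lines. This is a minimal illustration, not a standard routine: the `clean_records` helper, the sample rows, and the 0 to 120 age range are all assumptions made up for the example.

```python
def clean_records(records):
    """Drop incomplete or invalid rows, fix formatting, and remove duplicates."""
    cleaned = []
    seen = set()
    for name, age in records:
        # Incomplete: skip rows with a missing name or age.
        if name is None or age is None:
            continue
        # Improperly formatted: trim whitespace and standardize capitalization.
        name = name.strip().title()
        # Incorrect: the age must be a whole number in a plausible range
        # (0 to 120 is an assumed rule for this example).
        try:
            age = int(age)
        except ValueError:
            continue
        if not 0 <= age <= 120:
            continue
        # Duplicated: some repeats only show up after formatting is fixed.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Made-up sample data containing each kind of problem.
raw = [
    ("  alice ", "34"),  # badly formatted
    ("Alice", 34),       # duplicate of the row above once cleaned
    ("Bob", None),       # incomplete
    ("carol", "abc"),    # incorrect: age is not a number
    ("Dave", 250),       # incorrect: age out of range
    ("Eve", "29"),
]
cleaned = clean_records(raw)
print(cleaned)
```

Only the rows for Alice and Eve survive; every other row is removed for one of the reasons listed in the definition.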
The process of selecting a subset of data based on specific criteria, such as only showing rows where a column matches a value.
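As a rough sketch of the filtering described above, the example below keeps only the rows where one column matches a value. The `filter_rows` helper and the rainfall figures are hypothetical and exist only for illustration.

```python
# Made-up sample rows; each dictionary is one row of a table.
rows = [
    {"city": "Lima", "year": 2021, "rainfall_mm": 16},
    {"city": "Oslo", "year": 2021, "rainfall_mm": 763},
    {"city": "Lima", "year": 2022, "rainfall_mm": 9},
    {"city": "Oslo", "year": 2022, "rainfall_mm": 802},
]


def filter_rows(rows, column, value):
    """Return a new list with only the rows where `column` equals `value`."""
    return [row for row in rows if row[column] == value]


lima_rows = filter_rows(rows, "city", "Lima")
for row in lima_rows:
    print(row)
```

The original list is left unchanged; filtering produces a smaller view of the same data rather than deleting anything.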
A relationship between two variables where changes in one are associated with changes in the other. Correlation does not imply causation.
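One common way to put a number on such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is one possible implementation; the `pearson_r` name and the study-hours data are assumptions for illustration, and other correlation measures exist.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / math.sqrt(variation_x * variation_y)


# Made-up data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(round(r, 3))
```

A value near 1 means the two variables tend to rise together, a value near -1 means one tends to fall as the other rises, and a value near 0 means little linear relationship. A high value here still would not show that studying caused the higher scores.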
Obtaining data, input, or services from a large number of people, typically via the Internet. Examples include Wikipedia and citizen science.
Scientific research conducted wholly or in part by non-professional scientists, often through crowdsourcing data collection.
When data systematically favors certain outcomes over others due to how it was collected, selected, or used. Biased data can lead to unfair or inaccurate conclusions.
Datasets that are too large or complex to be processed with traditional data-processing tools. Big data is often characterized by high volume, velocity, and variety.
Data that is freely available for anyone to use, share, and build upon, with few or no restrictions. Some open licenses require attribution.
A visualization that uses rectangular bars to compare values across categories. Bar height or length represents the value.
A visualization that shows the distribution of numerical data by grouping values into ranges (bins) and displaying the frequency of each range.
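The grouping step behind this kind of chart can be sketched as follows. The `histogram_counts` helper, the bin settings, and the scores are made up for the example; a plotting library would normally handle this step.

```python
def histogram_counts(values, bin_start, bin_width, num_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers the range from bin_start + i * bin_width (included)
    up to bin_start + (i + 1) * bin_width (not included).
    Values outside all bins are ignored.
    """
    counts = [0] * num_bins
    for value in values:
        index = int((value - bin_start) // bin_width)
        if 0 <= index < num_bins:
            counts[index] += 1
    return counts


# Made-up test scores, grouped into bins of width 10 starting at 50.
scores = [52, 58, 61, 67, 69, 73, 75, 78, 84, 91]
counts = histogram_counts(scores, bin_start=50, bin_width=10, num_bins=5)

# Draw a simple text version of the chart, one row per bin.
for i, count in enumerate(counts):
    low = 50 + i * 10
    print(f"{low}-{low + 9}: {'#' * count}")
```

Changing the bin width changes the shape of the chart, which is why the choice of bins matters when reading one.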
A visualization that uses dots to show the relationship between two numerical variables, with one on each axis.
A table or chart that shows the relationship between two categorical variables by displaying counts or percentages for each combination.
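As an illustration, the counts for each combination of two categorical variables can be built with a `Counter`. The survey responses and the variable names below are hypothetical.

```python
from collections import Counter

# Made-up survey data: (role, answer) for each person surveyed.
responses = [
    ("student", "yes"), ("student", "no"), ("student", "yes"),
    ("teacher", "no"), ("teacher", "no"), ("teacher", "yes"),
]

# Count how often each (role, answer) combination occurs.
pair_counts = Counter(responses)

# Collect the distinct categories of each variable, in sorted order.
roles = sorted({role for role, _ in responses})
answers = sorted({answer for _, answer in responses})

# One row per role, one column per answer.
table = {
    role: {answer: pair_counts[(role, answer)] for answer in answers}
    for role in roles
}

for role in roles:
    print(role, table[role])
```

Dividing each count by its row total would turn the same table into percentages.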
A general direction or pattern observed in data over time or across categories.
A type of artificial intelligence where computers learn patterns from data and improve at tasks without being explicitly programmed for each case.
When an algorithm produces unfair or discriminatory results because it was trained on biased data or designed with biased assumptions.
A regularity or trend found in data that can be used to make predictions or draw conclusions.
A table that organizes data by grouping and aggregating values (counting, averaging, summing) to reveal patterns.
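The grouping and aggregating described above can be sketched like this. The `pivot_by_region` helper and the sales figures are assumptions made up for the example; spreadsheet software typically offers the same operation through a menu.

```python
from collections import defaultdict

# Made-up sales records: (region, product, amount).
sales = [
    ("North", "apples", 120),
    ("North", "pears", 80),
    ("South", "apples", 200),
    ("South", "pears", 50),
    ("North", "apples", 30),
]


def pivot_by_region(rows, aggregate):
    """Group amounts by region, then reduce each group with `aggregate`."""
    groups = defaultdict(list)
    for region, _product, amount in rows:
        groups[region].append(amount)
    return {region: aggregate(amounts) for region, amounts in groups.items()}


totals = pivot_by_region(sales, sum)                               # summing
counts = pivot_by_region(sales, len)                               # counting
averages = pivot_by_region(sales, lambda a: sum(a) / len(a))       # averaging

print(totals)
print(counts)
print(averages)
```

The same grouped data answers different questions depending on the aggregation chosen, which is what makes this kind of table useful for spotting patterns.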