Data Vocab

All Unit 5 Terms

Data (Collection)

Values collected about the world, stored digitally. Data can be numbers, text, images, or other formats.

Information (Analysis)

Data that has been organized, interpreted, or structured so that it is useful and meaningful.

Metadata (Collection)

Data about data. Information that describes the properties of a dataset, such as when it was collected, who collected it, and what format it is in.

Data Cleaning (Analysis)

The process of fixing or removing incorrect, incomplete, duplicated, or improperly formatted data from a dataset.
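
As a rough illustration of cleaning, the sketch below trims stray whitespace, drops rows with a missing or non-numeric age, and removes duplicates. The rows, field names, and the `clean` helper are all hypothetical, chosen only to show one problem of each kind.

```python
# Hypothetical survey rows; the names and values are made up for illustration.
raw_rows = [
    {"name": " Ada ", "age": "36"},
    {"name": "Ada", "age": "36"},      # duplicate once whitespace is trimmed
    {"name": "Grace", "age": ""},      # incomplete: age is missing
    {"name": "Alan", "age": "forty"},  # improperly formatted: age is not a number
    {"name": "Linus", "age": "54"},
]


def clean(rows):
    """Trim whitespace, drop rows with unusable ages, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        age_text = row["age"].strip()
        if not name or not age_text.isdigit():
            continue  # skip incomplete or improperly formatted rows
        record = (name, int(age_text))
        if record in seen:
            continue  # skip rows we have already kept
        seen.add(record)
        cleaned.append({"name": name, "age": record[1]})
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Real datasets usually need decisions about whether to fix a bad value or drop the row; this sketch simply drops anything it cannot interpret.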

Data Filtering (Analysis)

The process of selecting a subset of data based on specific criteria, such as only showing rows where a column matches a value.
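
A minimal sketch of filtering, assuming the dataset is a list of rows stored as dictionaries. The cities and rainfall figures are invented for the example.

```python
# Hypothetical rows; the column names and numbers are made up for illustration.
rows = [
    {"city": "Lima", "year": 2021, "rainfall_mm": 16},
    {"city": "Oslo", "year": 2021, "rainfall_mm": 763},
    {"city": "Lima", "year": 2022, "rainfall_mm": 9},
]

# Keep only the rows where the "city" column matches the value "Lima".
lima_rows = [row for row in rows if row["city"] == "Lima"]

for row in lima_rows:
    print(row)
```

The original `rows` list is left unchanged; filtering produces a smaller view of the same data.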

Correlation (Analysis)

A relationship between two variables where changes in one are associated with changes in the other. Correlation does not imply causation.
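
One common way to put a number on correlation is the Pearson correlation coefficient, which ranges from -1 to 1. The definition above does not name a specific measure, so treat the choice of Pearson, the `pearson` helper, and the study-hours data as assumptions made for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how much each varies on its own.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical data: hours studied and test scores for five students.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 58, 65, 70, 79]

r = pearson(hours_studied, test_scores)
print(f"r = {r:.2f}")
```

A value close to 1 here says only that the two lists rise together. It does not show that studying caused the higher scores.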

Crowdsourcing (Collection)

Obtaining data, input, or services from a large number of people, typically via the Internet. Examples include Wikipedia and citizen science.

Citizen Science (Collection)

Scientific research conducted wholly or in part by members of the public who are not professional scientists, often by crowdsourcing the data collection.

Data Bias (Society & Ethics)

When data systematically favors certain outcomes over others due to how it was collected, selected, or used. Biased data can lead to unfair or inaccurate conclusions.

Big Data (Collection)

Datasets that are too large or complex to be processed with traditional data-processing tools. Characterized by high volume, velocity, and variety.

Open Data (Society & Ethics)

Data that is freely available for anyone to use, share, and build upon, with few or no restrictions.

Bar Chart (Visualization)

A visualization that uses rectangular bars to compare values across categories. Bar height or length represents the value.

Histogram (Visualization)

A visualization that shows the distribution of numerical data by grouping values into ranges (bins) and displaying the frequency of each range.
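
The binning step behind a histogram can be sketched in a few lines. The scores and the bin width of 10 are assumptions picked for the example; the output is a simple text histogram rather than a real chart.

```python
from collections import Counter

# Hypothetical test scores, made up for illustration.
scores = [55, 61, 67, 72, 74, 78, 81, 85, 88, 93]
bin_width = 10  # each bin covers ten points, e.g. 70-79

# Map every score to the lower edge of its bin, then count scores per bin.
counts = Counter((score // bin_width) * bin_width for score in scores)

for start in sorted(counts):
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label}: {'#' * counts[start]}")
```

Changing `bin_width` changes the shape of the histogram, which is why the choice of bins matters when reading one.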

Scatter Plot (Visualization)

A visualization that uses dots to show the relationship between two numerical variables, with one on each axis.

Cross-Tabulation (Visualization)

A table or chart that shows the relationship between two categorical variables by displaying counts or percentages for each combination.
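
A cross-tabulation can be built by counting each combination of the two categories. In this sketch the survey responses, grade levels, and answers are all hypothetical.

```python
from collections import Counter

# Hypothetical survey: (grade level, answer to a yes/no question).
responses = [
    ("9th", "yes"), ("9th", "no"), ("9th", "yes"),
    ("10th", "no"), ("10th", "no"), ("10th", "yes"),
]

# Count how many times each (grade, answer) combination appears.
table = Counter(responses)

grades = ["9th", "10th"]
answers = ["yes", "no"]

# Print one row per grade and one column per answer.
print(f"{'grade':<6}" + "".join(f"{answer:>5}" for answer in answers))
for grade in grades:
    print(f"{grade:<6}" + "".join(f"{table[(grade, answer)]:>5}" for answer in answers))
```

Dividing each count by its row total would turn the counts into percentages.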

Trend (Analysis)

A general direction or pattern observed in data over time or across categories.

Machine Learning (Society & Ethics)

A type of artificial intelligence where computers learn patterns from data and improve at tasks without being explicitly programmed for each case.

Algorithmic Bias (Society & Ethics)

When an algorithm produces unfair or discriminatory results because it was trained on biased data or designed with biased assumptions.

Pattern (Analysis)

A regularity or trend found in data that can be used to make predictions or draw conclusions.

Summary Table (Visualization)

A table that organizes data by grouping and aggregating values (counting, averaging, summing) to reveal patterns.
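
Grouping and aggregating might look like the sketch below, which groups rows by category and then computes a count, sum, and average for each group. The sales records are invented for the example.

```python
from collections import defaultdict

# Hypothetical sales records: (category, amount).
sales = [
    ("fruit", 4.0), ("fruit", 6.0),
    ("dairy", 3.0), ("dairy", 5.0), ("dairy", 1.0),
]

# Step 1: group the amounts by category.
groups = defaultdict(list)
for category, amount in sales:
    groups[category].append(amount)

# Step 2: aggregate each group by counting, summing, and averaging.
summary = {
    category: {
        "count": len(amounts),
        "sum": sum(amounts),
        "mean": sum(amounts) / len(amounts),
    }
    for category, amounts in groups.items()
}

for category, stats in summary.items():
    print(category, stats)
```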