What is Overfitting?

Overfitting is a data analysis error in which patterns are interpreted as meaningful when they are most likely random noise. It occurs when so many theories are tested against the data that some pattern is bound to be found, whether or not it is meaningful.
As an example, consider a study that records everything that 1000 people eat for 10 years. Over the course of 10 years, 35 people die and the study finds that all 35 were coffee drinkers. The study thus concludes that coffee is unhealthy. The findings of this study can be considered overfitting as the participants consumed thousands of foods over the 10 years, ensuring that a random pattern would emerge.
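The coffee example can be illustrated with a small simulation. The following is a minimal sketch, not a reconstruction of any real study: it assumes 2000 foods, each with a randomly chosen popularity, and deaths that are completely unrelated to diet. The function name and all parameters are hypothetical.

```python
import random


def count_spurious_foods(n_people=1000, n_deaths=35, n_foods=2000, seed=0):
    """Count foods eaten by every deceased participant purely by chance.

    Assumption: deaths are independent of diet, so any food that
    "all the deceased consumed" is a random pattern, not a cause.
    """
    rng = random.Random(seed)
    # Pick who dies at random, with no link to what they eat.
    deceased = rng.sample(range(n_people), n_deaths)

    spurious = 0
    for _ in range(n_foods):
        # Each food has its own popularity: the share of people who eat it.
        popularity = rng.random()
        eats = [rng.random() < popularity for _ in range(n_people)]
        # A "finding": every single deceased participant ate this food.
        if all(eats[person] for person in deceased):
            spurious += 1
    return spurious


if __name__ == "__main__":
    print("Foods eaten by all deceased participants:", count_spurious_foods())
```

Under these assumptions, popular foods regularly turn out to have been eaten by everyone who died, even though no food in the simulation has any effect on health.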

Variables vs Samples

Overfitting is avoided by testing a small number of variables against a large sample.
If a medical researcher has reason to believe that coffee is unhealthy, then a study of 1000 people for 10 years may be statistically relevant. In other words, making a small number of predictions up front reduces the chance of overfitting.
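The effect of limiting the number of predictions can be sketched with simple arithmetic. The example below assumes that each hypothesis is tested independently and has a 5% chance of appearing significant by luck alone; both the independence and the 5% threshold are illustrative assumptions, not figures from this article.

```python
def chance_of_false_pattern(n_hypotheses, alpha=0.05):
    """Probability that at least one of n independent tests passes by luck.

    alpha is the assumed chance that a single meaningless hypothesis
    looks significant; 0.05 is a conventional but arbitrary choice.
    """
    return 1 - (1 - alpha) ** n_hypotheses


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} hypotheses: {chance_of_false_pattern(n):.3f}")
```

With one prediction made up front, the chance of a false pattern stays at the assumed 5%. With hundreds of hypotheses, finding at least one false pattern becomes close to certain.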

The Age of Overfitting

Overfitting is becoming a common problem because new tools allow anyone to look for patterns in data without following a proper scientific method. For example, it is common for the media to report patterns that a reporter, blogger or business finds in data using brute force methods. As a hypothetical example, an investing article might report "the last time that the price of gold went down 13 days in a row, it triggered a 34% spike in silver prices." Such patterns are almost always meaningless noise with no cause-effect relationship but may be believed because of the common trust that people place in data.

Machine Biases

Artificial intelligence typically fits a large number of parameters to data and is thus prone to overfitting. As a result, it can develop biases based on random patterns that it discovers through the brute-force nature of machine learning.
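As a rough illustration of how a flexible model can learn noise, the sketch below trains a 1-nearest-neighbour classifier on data whose labels are coin flips. The choice of 1-nearest-neighbour is an assumption made here as a stand-in for any model with enough capacity to memorise its training data; all function names are hypothetical.

```python
import random


def make_noise(rng, n_rows, n_features=5):
    """Random features paired with coin-flip labels: there is nothing to learn."""
    return [
        ([rng.random() for _ in range(n_features)], rng.choice([0, 1]))
        for _ in range(n_rows)
    ]


def nearest_label(point, examples):
    """Predict the label of the closest memorised example (1-nearest-neighbour)."""
    closest = min(
        examples,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(point, ex[0])),
    )
    return closest[1]


def accuracy(examples, data):
    """Share of rows in data whose label the memorised examples predict correctly."""
    hits = sum(nearest_label(x, examples) == y for x, y in data)
    return hits / len(data)


def memorizer_accuracy(seed=0, n_rows=200):
    """Return (training accuracy, accuracy on fresh data) for pure noise."""
    rng = random.Random(seed)
    train = make_noise(rng, n_rows)
    fresh = make_noise(rng, n_rows)
    return accuracy(train, train), accuracy(train, fresh)


if __name__ == "__main__":
    train_acc, fresh_acc = memorizer_accuracy()
    print(f"training accuracy: {train_acc:.2f}, fresh data accuracy: {fresh_acc:.2f}")
```

The model scores perfectly on the data it memorised, yet on fresh samples it does about as well as guessing, which is the failure to reproduce that characterises overfitting.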

Overview: Overfitting

Type: Data Analysis
Definition (1): Aggressively searching for patterns in data such that you're sure to discover random patterns.
Definition (2): Testing a large number of parameters relative to sample size such that results are unlikely to be reproduced in another set of samples.