Variables vs Samples
Overfitting is avoided by testing a small number of variables against a large sample. If a medical researcher has reason to believe that coffee is unhealthy, a study that tracks 1000 people for 10 years to test this single hypothesis may be statistically relevant. In other words, committing to a small number of predictions up front reduces the chance of overfitting.
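The idea above can be sketched with a small simulation. This is an illustrative example, not a rigorous statistical test: it generates an outcome and 100 unrelated random variables, then counts how many of them look "significant" purely by chance. The variable names and the 0.062 correlation threshold (roughly the two-sided p < 0.05 cutoff for n = 1000) are assumptions chosen for the sketch.

```python
import random
import statistics

random.seed(42)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n_samples = 1000      # a large sample, as in the study above
n_variables = 100     # a brute-force scan over many unrelated variables
outcome = [random.gauss(0, 1) for _ in range(n_samples)]

# Correlate the outcome with 100 random variables that have no real
# relationship to it.
corrs = []
for _ in range(n_variables):
    var = [random.gauss(0, 1) for _ in range(n_samples)]
    corrs.append(pearson(var, outcome))

# For n = 1000, |r| > 0.062 is roughly the p < 0.05 threshold, so about
# 5 of 100 unrelated variables are expected to cross it by chance alone.
threshold = 0.062
spurious = sum(1 for r in corrs if abs(r) > threshold)
print(f"Brute-force scan: {spurious} of {n_variables} variables "
      f"look 'significant' by chance")
```

A single pre-specified hypothesis faces only one 5% false-positive risk; scanning 100 variables multiplies that risk a hundredfold, which is why the number of variables tested matters as much as sample size.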
The Age of Overfitting
Overfitting is becoming a common problem because new tools allow anyone to look for patterns in data without following a proper scientific method. For example, it is common for the media to report patterns that a reporter, blogger or business finds in data using brute-force methods. As a hypothetical example, an investing article might report "the last time that the price of gold went down 13 days in a row, it triggered a 34% spike in silver prices." Such patterns are almost always meaningless noise with no cause-and-effect relationship, but they may be believed because of the trust that people commonly place in data.
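The hypothetical gold-and-silver pattern can be reproduced on purely random data. The sketch below, under assumed parameters of my own choosing (a 5000-day coin-flip "price" series and streak lengths up to 12), scans for the losing streak whose next-day move looks most dramatic; the longest streaks occur so rarely that their averages are extreme by chance.

```python
import random

random.seed(7)

# A purely random "price" series: each daily move is an up/down coin flip.
returns = [random.choice([-1, 1]) for _ in range(5000)]

def avg_move_after_losing_streak(k):
    """Average next-day move following a k-day losing streak."""
    hits = [returns[i] for i in range(k, len(returns))
            if all(r == -1 for r in returns[i - k:i])]
    return (sum(hits) / len(hits), len(hits)) if hits else (0.0, 0)

# Brute-force scan over streak lengths: pick the most dramatic "pattern".
best = max(range(1, 13), key=lambda k: abs(avg_move_after_losing_streak(k)[0]))
avg, count = avg_move_after_losing_streak(best)
print(f"'Discovered' pattern: after {best} down days in a row "
      f"({count} occurrences), the average next move is {avg:+.2f}")
```

The scan always "finds" something, because the rarest streaks have only a handful of occurrences and their averages swing wildly; reporting the most extreme result of such a search is exactly the overfitting the article describes.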
Machine Biases
Artificial intelligence typically tests a large number of parameters against data and is thus prone to overfitting. As a result, artificial intelligence can develop biases based on random patterns discovered through the brute-force nature of machine learning.
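A minimal sketch of a model memorizing noise, using a nearest-neighbor classifier as a stand-in for a highly flexible learner (the dataset, noise rate, and choice of k are all assumptions for illustration). With k=1 the model memorizes every noisy training label and scores perfectly on its own training data, but that memorized noise hurts it on fresh data; averaging over k=15 neighbors constrains the model and generalizes better.

```python
import random

random.seed(0)

def make_data(n, noise=0.2):
    """Points on a line; the true class is the sign of x, with noisy labels."""
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        label = 1 if x > 0 else -1
        if random.random() < noise:
            label = -label  # label noise the model should NOT memorize
        data.append((x, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return 1 if sum(lbl for _, lbl in neighbors) > 0 else -1

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == lbl for x, lbl in test) / len(test)

train = make_data(200)
test = make_data(1000)

# k=1 memorizes every noisy training label (overfits); k=15 averages it out.
print(f"k=1   train={accuracy(train, train, 1):.2f}  "
      f"test={accuracy(train, test, 1):.2f}")
print(f"k=15  train={accuracy(train, train, 15):.2f}  "
      f"test={accuracy(train, test, 15):.2f}")
```

The gap between perfect training accuracy and weaker test accuracy is the signature of a model that has fit random patterns rather than the underlying rule.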
Overfitting can be summarized with two closely related definitions:
- Aggressively searching for patterns in data such that you are sure to discover random patterns.
- Testing a large number of parameters relative to sample size such that results are unlikely to be reproduced in another set of samples.