Overfitting is an error of data analysis that interprets patterns as meaningful when they are most likely random noise. It occurs when a large number of theories are tested against data, ensuring that patterns will be found whether they are meaningful or not.
As an example, consider a study that records everything that 1000 people eat for 10 years. Over the course of the study, 35 people die, and the study finds that all 35 were coffee drinkers. The study thus concludes that coffee is unhealthy. This finding can be considered overfitting because the participants consumed thousands of foods over the 10 years, all but ensuring that some random pattern would emerge.
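
The coffee example can be checked with a small simulation. The sketch below is a minimal illustration under assumptions that are not part of the study described above: 3000 foods, each eaten by a randomly chosen fraction of participants, and deaths that are unrelated to diet. The function name and its parameters are hypothetical.

```python
import random

def count_coincidental_foods(n_foods=3000, n_deaths=35, seed=0):
    """Count foods that every deceased participant happened to eat.

    Diet and death are generated independently, so any food that
    "explains" all of the deaths does so purely by coincidence.
    """
    rng = random.Random(seed)
    coincidences = 0
    for _ in range(n_foods):
        # Assumed: each food has its own popularity, i.e. the fraction
        # of participants who eat it, drawn uniformly from [0, 1).
        popularity = rng.random()
        # Because diet is independent of death, each deceased participant
        # ate this food with probability `popularity`.
        if all(rng.random() < popularity for _ in range(n_deaths)):
            coincidences += 1
    return coincidences

if __name__ == "__main__":
    print("Foods eaten by all of the deceased:", count_coincidental_foods())
```

Under these assumptions the expected count is n_foods / (n_deaths + 1), or roughly 83 foods that look as "deadly" as coffee even though nothing in the simulation causes anyone to die.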
Variables vs Samples
Overfitting is avoided by testing a small number of variables against a large sample. If a medical researcher has reason to believe that coffee is unhealthy, then a study of 1000 people for 10 years may be statistically relevant. In other words, making a small number of predictions up front reduces the chance of overfitting.
The Age of Overfitting
Overfitting is becoming a common problem because new tools allow anyone to look for patterns in data without following a proper scientific method. For example, it is common for the media to report patterns that a reporter, blogger or business finds in data using brute force methods. As a hypothetical example, an investing article might report "the last time that the price of gold went down 13 days in a row, it triggered a 34% spike in silver prices." Such patterns are almost always meaningless noise with no cause-effect relationship, but they may be believed because of the common trust that people place in data.
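
Both points above, that a few predictions made up front are safer and that brute force searches nearly always find something, follow from simple arithmetic. The sketch below assumes that each tested pattern is an independent check of pure noise and that a pattern counts as a "finding" with probability alpha (0.05 is a conventional threshold, not a figure from this article). The function name is hypothetical.

```python
def chance_of_spurious_pattern(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests of
    pure noise appears meaningful at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

if __name__ == "__main__":
    for n in (1, 20, 100, 1000):
        print(f"{n:>5} patterns tested: "
              f"{chance_of_spurious_pattern(n):.3f} chance of a false finding")
```

With one prediction the chance of being fooled stays at alpha. With 20 patterns it is already about 64%, and with hundreds it approaches certainty, which is why a striking pattern found by searching is weak evidence on its own.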
Machine Biases
Artificial intelligence typically tests a large number of parameters against data and is thus prone to overfitting. As a result, artificial intelligence can develop biases from coincidental patterns in the data it has experienced.
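
As a rough illustration of how many parameters lead to overfitting, the sketch below trains a deliberately extreme "model": a lookup table with one entry per training example. The labels are coin flips, so there is nothing real to learn. This is a toy constructed for this page, not a description of how any particular artificial intelligence system works, and all names in it are hypothetical.

```python
import random

def memorizer_accuracy(n_train=500, n_test=500, seed=0):
    """Return (training accuracy, test accuracy) for a model that
    memorizes random labels."""
    rng = random.Random(seed)
    # Inputs are unique IDs; labels are random, i.e. pure noise.
    train = [(i, rng.randint(0, 1)) for i in range(n_train)]
    test = [(n_train + i, rng.randint(0, 1)) for i in range(n_test)]

    # One "parameter" per training example: the table stores every label.
    table = dict(train)

    def predict(x):
        # Inputs never seen in training fall back to a fixed guess of 0.
        return table.get(x, 0)

    train_acc = sum(predict(x) == y for x, y in train) / n_train
    test_acc = sum(predict(x) == y for x, y in test) / n_test
    return train_acc, test_acc

if __name__ == "__main__":
    train_acc, test_acc = memorizer_accuracy()
    print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The model scores perfectly on the data it has seen and close to chance on new data. Any "rule" it appears to have learned is a bias inherited from noise.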
© 2010-2023 Simplicable. All Rights Reserved. Reproduction of materials found on this site, in any form, without explicit permission is prohibited.