What is a Data Massage?

Data massage is a term for cleaning up data that is poorly formatted or is missing values required for a particular purpose. The term implies manual processing, or highly specific queries, aimed at the data that is breaking an automated process or analysis.


The term data massage is also used informally to indicate that outliers in data were dropped because they interfered with a visual presentation or with the confirmation of a particular theory. As such, data massage has potential ethical, compliance and risk implications.
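
Because dropped outliers carry these implications, it can help to record what was excluded instead of deleting values silently. The following is a minimal sketch in Python, not a prescribed method: the function name `flag_outliers`, the sample readings and the 1.5 × interquartile range threshold are all assumptions chosen for illustration.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Split values into (kept, flagged) using the k * IQR rule.

    The k=1.5 threshold is a common convention, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged


# Hypothetical measurements; 95 sits far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
kept, flagged = flag_outliers(readings)
print("kept:", kept)
print("flagged for review:", flagged)
```

Returning the flagged values alongside the kept ones leaves a record that can be reviewed or disclosed with the analysis.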

Data Quality

Another common type of data massage is to drop low-quality records or to enhance their quality.


An analysis of 1.2 million customers by location is failing because 4 records have no ZIP code. The data analyst manually drops them from the dataset and runs the report again.
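
The scenario above might be sketched in Python as follows. This is an illustration under stated assumptions, not the analyst's actual process: the record layout (a dictionary with `id` and `zip` keys), the function name `split_by_zip` and the sample data are all hypothetical.

```python
def split_by_zip(records):
    """Separate records with a usable ZIP code from those without one."""
    kept, dropped = [], []
    for record in records:
        # Treat None, empty and whitespace-only values as missing.
        zip_code = (record.get("zip") or "").strip()
        if zip_code:
            kept.append(record)
        else:
            dropped.append(record)
    return kept, dropped


# Hypothetical sample data standing in for the customer dataset.
customers = [
    {"id": 1, "zip": "94105"},
    {"id": 2, "zip": ""},
    {"id": 3, "zip": None},
    {"id": 4, "zip": "10001"},
]

kept, dropped = split_by_zip(customers)
print(f"kept {len(kept)} records, dropped {len(dropped)}")
for record in dropped:
    print("dropped customer id:", record["id"])
```

Keeping the dropped records in their own list, instead of discarding them, makes it possible to report how many were removed and to repair them later if the missing ZIP codes can be found.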

Overview: Data Massage

Manually processing data to remove or fix entries that are breaking an automated process or analysis.
Cleaning up small subsets of poor-quality data that have questionable value.
