Corrupt Data
Data that is corrupted due to data rot is corrected using a historical backup.
Inconsistent Data
A report is constructed to identify differences in customer data between several systems. The report identifies close to 500 significant inconsistencies that are manually corrected before migrating the data into a master data management tool.
Inaccurate Data
A product catalog database was launched with accurate data but wasn't maintained over time. A team updates the information using documents such as product specifications before sharing the data with an ecommerce partner.
Irrelevant Data
A data quality initiative merges four sources of customer information into a common customer data record. A common data model is designed and irrelevant data that was captured by one of the sources is dropped.
Dirty Data
An ecommerce site imports product data from hundreds of partners on a regular basis. Because the data arrives in a variety of formats, it has often been imported incorrectly over the years, resulting in dirty data that is costly to fix. The site decides to launch a new extranet where partners can import and manage their product data. It requires all partners to review their data and make fixes or risk being delisted from the site.
Typographical Errors
A hotel booking site scans for reviews that have a large number of typographical errors and drops them from its database.
Standardization
A telecom billing database has over 1,200 services that get listed on customer bills. The company currently provides only 92 unique services. A quality assurance team gets complaints from customers who don't understand their bills. The team develops a mapping from the 1,200 service descriptions to the 92 current services and corrects the data.
Referential Integrity
A data migration project aims to migrate historical sales orders to a new sales analytics platform. The project finds that database constraints such as foreign key constraints weren't properly implemented. As a result, the structure of the data is broken. The migration project runs multiple scripts to identify broken references and fix them.
Completeness
A new feature that requires a customer postal code is launched on a customer portal. The project finds that postal codes are missing for 800 customers. The team creates a script to look up the postal codes by telephone number using a third-party data provider.
Characters
An integration project finds that scripts are failing due to special characters that a sensitive parser can't handle. A script is run to convert the characters in the database to a standard character.
Overview: Data Cleansing
Definition: The process of detecting and correcting data quality issues.
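Several of the examples above (Standardization, Referential Integrity and Completeness) come down to small checks that are easy to script. The following is a minimal sketch in Python, not a description of how any of the organizations above did it. All names, fields and sample records (`SERVICE_MAP`, `customer_id`, `postal_code` and so on) are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data and names; none of these come from the examples above.

# Standardization: map legacy service descriptions to current service names.
SERVICE_MAP = {
    "intl calling plan v2": "International Calling",
    "int'l calling": "International Calling",
    "voicemail plus": "Voicemail",
}


def standardize_service(description):
    """Return the current service name, or None if no mapping exists yet."""
    key = " ".join(description.lower().split())  # normalize case and whitespace
    return SERVICE_MAP.get(key)


# Referential integrity: find orders whose customer_id matches no customer.
def find_broken_references(orders, customers):
    known_ids = {c["id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]


# Completeness: find customers with a missing or blank postal code.
def find_missing_postal_codes(customers):
    return [c["id"] for c in customers if not (c.get("postal_code") or "").strip()]


customers = [
    {"id": 1, "postal_code": "90210"},
    {"id": 2, "postal_code": ""},
    {"id": 3, "postal_code": None},
]
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 7},  # refers to a customer that does not exist
]

unmapped = standardize_service("Premium TV Bundle")  # not in the mapping
broken = find_broken_references(orders, customers)
missing = find_missing_postal_codes(customers)
```

In this sketch, descriptions without a mapping return `None` rather than a guess, so they can be sent for manual review. The broken references and missing postal codes are only reported. Fixing them, for example by querying a third-party provider, is a separate step.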