Data cleansing is the process of detecting and correcting data quality issues. It typically includes both automatic steps such as queries designed to detect broken data and manual steps such as data wrangling. The following are common examples.
Corrupt Data
Data that is corrupted due to data rot is corrected using a historical backup.
Inconsistent Data
A report is constructed to identify differences in customer data between several systems. The report identifies close to 500 significant inconsistencies that are manually corrected before migrating the data into a master data management tool.
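A report of this kind can be sketched as a field-by-field comparison of records that share a customer ID. The following is a minimal illustration only: the system names (`crm`, `billing`), the field names, and the sample records are all hypothetical, and a real report would also need to handle matching customers who do not share a common key.

```python
def find_inconsistencies(systems, fields):
    """Return (customer_id, field, {system: value}) for each field whose values disagree."""
    report = []
    all_ids = set()
    for records in systems.values():
        all_ids.update(records)
    for customer_id in sorted(all_ids):
        for field in fields:
            # Collect this field's value from every system that knows the customer.
            values = {
                name: records[customer_id].get(field)
                for name, records in systems.items()
                if customer_id in records
            }
            if len(set(values.values())) > 1:
                report.append((customer_id, field, values))
    return report


# Hypothetical sample data keyed by customer ID.
crm = {
    1: {"email": "a@example.com", "city": "Austin"},
    2: {"email": "b@example.com", "city": "Boston"},
}
billing = {
    1: {"email": "a@example.com", "city": "Dallas"},
    2: {"email": "b@example.com", "city": "Boston"},
}

report = find_inconsistencies({"crm": crm, "billing": billing}, ["email", "city"])
for row in report:
    print(row)
```

The output lists only the disagreements, which is what a person doing the manual correction would need to review.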
Inaccurate Data
A product catalog database was launched with accurate data but wasn't maintained over time. A team updates the information using documents such as product specifications before sharing the data with an ecommerce partner.
Irrelevant Data
A data quality initiative merges four sources of customer information into a common customer data record. A common data model is designed and irrelevant data that was captured by one of the sources is dropped.
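As a rough sketch of dropping irrelevant data during a merge, each source record can be projected onto the fields of the common model. The field names and the sample record below are assumptions made for illustration and are not taken from any particular system.

```python
# Hypothetical common data model: only these fields survive the merge.
COMMON_FIELDS = ("customer_id", "name", "email")


def to_common_record(source_record, common_fields=COMMON_FIELDS):
    """Keep only the common-model fields; missing fields become None."""
    return {field: source_record.get(field) for field in common_fields}


def dropped_fields(source_record, common_fields=COMMON_FIELDS):
    """List the source fields that the common model does not carry."""
    return sorted(set(source_record) - set(common_fields))


# Hypothetical record from one source that captured an extra field.
loyalty_record = {
    "customer_id": 7,
    "name": "Dana",
    "email": "dana@example.com",
    "kiosk_id": "K-12",
}

common = to_common_record(loyalty_record)
dropped = dropped_fields(loyalty_record)
print(common)
print(dropped)
```

Reporting the dropped fields alongside the merged record makes it easier to confirm that nothing relevant was discarded by mistake.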
Dirty Data
An ecommerce site imports product data from hundreds of partners on a regular basis. As the data comes in a variety of formats, it has often been imported incorrectly over the years, resulting in dirty data that is costly to fix. The site decides to launch a new extranet for partners to import and manage product data. It requires all partners to review their data and make fixes or risk being delisted from the site.
Typographical Errors
A hotel booking site scans for reviews that have a large number of typographical errors and drops them from their database.
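One simple way to approximate such a scan is to measure the share of words in a review that do not appear in a known vocabulary. The sketch below is illustrative only: the tiny vocabulary, the sample reviews, and the 0.3 threshold are assumptions, and a production system would use a full dictionary or a spell-checking library.

```python
import re

# Hypothetical, deliberately tiny vocabulary for the example.
KNOWN_WORDS = {"the", "room", "was", "clean", "and", "staff", "friendly", "great", "hotel"}


def typo_ratio(text, vocabulary):
    """Return the fraction of words in text that are not in vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in vocabulary)
    return unknown / len(words)


reviews = [
    "The room was clean and the staff friendly",
    "Teh rom wsa claen adn grate",
]

MAX_TYPO_RATIO = 0.3  # assumed cutoff; tune against real data
kept = [r for r in reviews if typo_ratio(r, KNOWN_WORDS) <= MAX_TYPO_RATIO]
print(kept)
```

A ratio-based cutoff tolerates the occasional unknown word, such as a place name, while still flagging reviews that are mostly misspelled.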
Standardization
A telecom billing database has over 1,200 services that get listed on customer bills. The company currently provides only 92 unique services. A quality assurance team gets complaints from customers who don't understand their bills. They develop a mapping from the 1,200 service descriptions to the 92 current services and correct the data.
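A mapping like this can be represented as a lookup table from legacy descriptions to current service names. The sketch below assumes that light normalization (lowercasing and collapsing whitespace) is enough to match descriptions; the service names are invented for the example.

```python
# Hypothetical mapping from legacy bill descriptions to current service names.
SERVICE_MAP = {
    "intl calling plan v2": "International Calling",
    "international call pkg": "International Calling",
    "voicemail plus": "Voicemail",
}


def standardize(description, mapping):
    """Return the current service name, or None if the description is unmapped."""
    key = " ".join(description.lower().split())  # lowercase, collapse whitespace
    return mapping.get(key)


billed = ["Intl  Calling Plan V2", "VOICEMAIL PLUS", "Legacy Fax Line"]
standardized = [(d, standardize(d, SERVICE_MAP)) for d in billed]

# Descriptions with no mapping are collected for manual review rather than guessed.
unmapped = [d for d, service in standardized if service is None]
print(standardized)
print(unmapped)
```

Returning `None` for unknown descriptions keeps the correction auditable: nothing is rewritten unless the team has explicitly mapped it.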
A data migration project aims to migrate historical sales orders to a new sales analytics platform. The project finds that database constraints such as foreign key constraints weren't properly implemented. As a result, the structure of data is broken. The migration project runs multiple scripts to identify broken references and fix them.
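Broken references of this kind are commonly found with a left join that looks for child rows whose parent is missing. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the `customers` and `orders` tables and their columns are hypothetical stand-ins for the sales order data described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No foreign key constraint is declared, mirroring the scenario above.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 3, 25.0), (12, 2, 40.0);
""")

# Orders whose customer_id has no matching row in customers.
orphans = conn.execute("""
    SELECT o.id, o.customer_id
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    WHERE c.id IS NULL
    ORDER BY o.id
""").fetchall()

print(orphans)
```

How each orphan is fixed (re-linking it, restoring the missing parent, or archiving the row) depends on the data, so the sketch stops at identification.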
Completeness
A new feature is launched to a customer portal that requires a customer postal code. The project finds that postal codes are missing for 800 customers. They create a script to look up the postal codes by telephone number using a third-party data provider.
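A script of this shape might look like the sketch below. The `lookup` callable stands in for the third-party data provider, whose real interface is not described here; `fake_lookup`, the phone numbers, and the record layout are all hypothetical.

```python
def fill_missing_postal_codes(customers, lookup):
    """Fill empty postal codes in place; return IDs that could not be resolved."""
    unresolved = []
    for customer in customers:
        if customer.get("postal_code"):
            continue  # already complete
        code = lookup(customer.get("phone"))
        if code:
            customer["postal_code"] = code
        else:
            unresolved.append(customer["id"])
    return unresolved


# Stand-in for the provider: a hypothetical phone-to-postal-code directory.
FAKE_DIRECTORY = {"555-0100": "94105"}


def fake_lookup(phone):
    return FAKE_DIRECTORY.get(phone)


customers = [
    {"id": 1, "phone": "555-0100", "postal_code": ""},
    {"id": 2, "phone": "555-0101", "postal_code": None},
    {"id": 3, "phone": "555-0102", "postal_code": "10001"},
]

unresolved = fill_missing_postal_codes(customers, fake_lookup)
print(customers)
print(unresolved)
```

Passing the lookup in as a function keeps the provider swappable and lets the script be tested without network access.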
Characters
An integration project finds that scripts are failing due to special characters that a sensitive parser can't handle. A script is run to convert these characters in the database to standard equivalents.
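One possible form of such a conversion script, assuming the parser accepts only ASCII, is sketched below. It replaces common typographic punctuation explicitly and then uses Unicode NFKD normalization to strip accents. The replacement table is an assumption and would need to be extended for real data; any character with no ASCII equivalent is silently dropped, which may not be acceptable in every case.

```python
import unicodedata

# Assumed replacements for punctuation that normalization alone does not map to ASCII.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u00a0": " ",   # no-break space
}


def to_ascii(text):
    """Convert text to plain ASCII, dropping characters with no equivalent."""
    text = text.translate(str.maketrans(REPLACEMENTS))
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" discards the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


cleaned = to_ascii("Caf\u00e9 \u201cDeluxe\u201d")
print(cleaned)
```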
© 2010-2023 Simplicable. All Rights Reserved. Reproduction of materials found on this site, in any form, without explicit permission is prohibited.