A root cause is the fundamental reason that an event occurs. The term implies a depth of analysis that looks beyond the obvious causes of a problem to uncover greater underlying issues. The following are illustrative examples.
Engineering
The read-and-write head of a new hard drive model fails in accelerated life testing. The root cause is determined to be a flaw in the device's caching algorithm, which wasn't caching data properly to reduce load on hardware under extreme test scenarios.
Software
A bank experiences a problem whereby about 5% of customers get error messages when paying a bill online. Developers investigate and determine the cause to be data on the customer accounts that isn't formatted as the code expects. The quality assurance team analyzes the issue and determines that the data is acceptable to the business and that the software should handle such data variations. The root cause is labeled a software bug.
A grocery store accidentally orders 1,000 bags of apples when it only requires 100. The order was entered incorrectly and the supplier won't take them back. The store needs to aggressively discount and advertise to sell the apples at a loss. The issue is initially considered human error. A root cause analysis process discovers latent human error in ordering systems. For example, there is no validation or warning for unusually large orders. Also, fonts on the system are abnormally small and difficult for some employees to read clearly.
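The missing validation described above can be illustrated with a short sketch. This is only an assumption about how such a check might look: the function name `check_order_quantity`, the `typical_quantity` input and the `warn_ratio` threshold of 3 are hypothetical and not taken from any real ordering system.

```python
from dataclasses import dataclass


@dataclass
class OrderCheck:
    ok: bool       # True when the order can proceed without confirmation
    message: str   # Text that would be shown to the employee


def check_order_quantity(quantity: int, typical_quantity: int,
                         warn_ratio: float = 3.0) -> OrderCheck:
    """Flag quantities far above what the store typically orders.

    warn_ratio is a hypothetical threshold: anything above
    typical_quantity * warn_ratio requires explicit confirmation.
    """
    if quantity <= 0:
        return OrderCheck(False, "Quantity must be a positive number.")
    if quantity > typical_quantity * warn_ratio:
        return OrderCheck(
            False,
            f"Quantity {quantity} is more than {warn_ratio:g}x the typical "
            f"order of {typical_quantity}. Please confirm before submitting.",
        )
    return OrderCheck(True, "Quantity is within the usual range.")


# The apple order from the example: 1,000 bags entered, 100 needed.
result = check_order_quantity(quantity=1000, typical_quantity=100)
print(result.ok, result.message)
```

A check of this kind does not remove the human error itself. It adds a point where the system can catch the error before the order reaches the supplier.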
Availability
A media company's website has availability of 97% where its peers commonly achieve 99.99%. Each time the website goes down, the outage is attributed to a cause such as a failed change, human error, data issues or service crashes. The company performs a gap analysis to discover the root causes of these failures. The report finds that the website's code, platform, infrastructure and development processes all have issues that are creating an environment of instability. For example, the company has outsourced development to a firm that has a high turnover rate. The outsourcing firm regularly moves employees around such that each developer only works on the code for 2 weeks on average. Developers are unfamiliar with the platform, resulting in bugs and increasingly smelly code.
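To put the gap between 97% and 99.99% in concrete terms, the availability figures can be converted to expected downtime per year. The following is a minimal sketch that assumes a 365-day year and continuous operation; the function name `annual_downtime_hours` is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year of continuous operation


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the expected hours of downtime per year for an availability level."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


for label, pct in [("media company", 97.0), ("peers", 99.99)]:
    hours = annual_downtime_hours(pct)
    print(f"{label} ({pct}%): {hours:.2f} hours of downtime per year")
```

Under these assumptions, 97% corresponds to roughly 263 hours (about 11 days) of downtime per year, while 99.99% corresponds to under one hour.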
A government department experiences an information security incident after an employee clicks on a link in an email. The direct cause is reported as human error because the employee had been trained not to click on links in external emails. The root causes include that the email wasn't stopped by spam filters and that the employee's machine wasn't updated with recent patches, which allowed a vulnerability in its operating system to be exploited.
Safety
An aircraft lands short of a runway in shallow water. The passengers are rescued. Initially the cause appears to be human error. A safety investigation confirms pilot error as the root cause.
© 2010-2023 Simplicable. All Rights Reserved. Reproduction of materials found on this site, in any form, without explicit permission is prohibited.