BY: Stacey Pisani
In recent years, companies have been generating vast and ever-increasing amounts of data associated with business operations. This trend has led to renewed interest in predictive analytics, a field that focuses on analyzing large data sets to identify patterns and predict outcomes to help guide decision-making. While many leading companies use predictive analytics to identify marketing and sales opportunities, similar data analysis strategies are less common in occupational and process safety, even though the potential benefits of analyzing safety data are considerable.
Just as companies are currently using customer data to predict customer behavior, safety and incident data can be used to predict when and where incidents are likely to occur. Appropriate data analysis strategies can also identify the key factors that contribute to incident risk, thereby allowing companies to proactively address those factors to avoid future incidents.
Predictive Analytics: In Theory
Let’s take a step back and look at what predictive analytics is and what it does. Predictive analytics is a broad field encompassing aspects of various disciplines, including machine learning, artificial intelligence, statistics, and data mining. Predictive analytics uncovers patterns and trends in large data sets for the purpose of predicting outcomes before they occur. One branch of predictive analytics, classification algorithms, could be particularly beneficial to industry, especially when it comes to avoiding incidents.
Classification algorithms can be categorized as supervised machine learning. With supervised learning, the user has a set of data that includes predictive variable measurements that can be tied to known outcomes. The algorithms identify the relationships between various factors and those outcomes to create predictive rules (i.e., a model). Once created, the model can be given a dataset with predictive variable measurements and unknown outcomes, and will then predict the outcome based on the model rules.
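To make the train-then-predict idea concrete, here is a minimal sketch in plain Python. It fits an ordinary logistic regression by gradient ascent on a small made-up data set, then scores two new units whose outcomes are unknown. The two predictor columns (overdue inspections and near misses), the data values, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Predicted probability of an incident; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, outcomes, lr=0.1, epochs=5000):
    """Learn the model rules (weights) by gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            err = y - predict_proba(weights, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical training data: [overdue_inspections, near_misses] per unit,
# paired with a known outcome (1 = incident occurred, 0 = no incident).
train_x = [[0, 1], [1, 0], [1, 2], [2, 1], [3, 3],
           [4, 2], [3, 4], [5, 5], [2, 3], [4, 4]]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

model = fit_logistic(train_x, train_y)

# New measurements with unknown outcomes: the model returns a probability.
low_risk = predict_proba(model, [0, 1])
high_risk = predict_proba(model, [5, 4])
print(f"low-risk unit: {low_risk:.3f}, high-risk unit: {high_risk:.3f}")
```

In practice a statistics package would normally replace the hand-written fitting loop. The point here is only the two-phase workflow: learn from known outcomes, then predict where outcomes are unknown.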
Predictive Analytics: In Practice
In one case study, a railroad, like many in the transportation industry, had experienced a number of derailments caused by broken rails. Broken rail derailments can have particularly severe consequences, since they typically occur on mainline tracks, at full speed, and with no warning of the impending broken rail. Kestrel was asked to create a predictive model of track-caused derailments on a mile-by-mile basis to identify areas of high broken rail risk so the railroad could target those areas for maintenance, increased inspections, and capital improvement projects.
Penalized Likelihood Logistic Regression
As described above, classification models learn predictive rules in an original data set that includes known outcomes, then apply the learned rules to a new data set to predict outcomes and probabilities. In this case study, Kestrel used a logistic regression modified by Firth’s penalized likelihood method to:
- Fit the model
- Identify eleven significant predictive variables (based largely on past incidents)
- Calculate broken rail probabilities for each mile of mainline track based on track characteristics
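For readers curious about what the Firth modification does, the sketch below implements it for the simplest possible case: an intercept and a single predictor. It is not Kestrel's model, which used eleven variables. The data are made up, and the function name `fit_firth` is hypothetical. Firth's method adds a leverage-based correction to each residual in the score equations, which keeps the estimates finite even when the outcome is rare or the data are perfectly separated, as in the toy data here, where ordinary maximum likelihood would not converge to finite coefficients.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_firth(xs, ys, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression with an intercept and one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        ws = [p * (1.0 - p) for p in ps]
        # Fisher information I = X^T W X for the design columns (1, x).
        a = sum(ws)
        b = sum(w * x for w, x in zip(ws, xs))
        d = sum(w * x * x for w, x in zip(ws, xs))
        det = a * d - b * b
        inv00, inv01, inv11 = d / det, -b / det, a / det
        # Leverages h_i: diagonal of the weighted hat matrix.
        hs = [w * (inv00 + 2.0 * inv01 * x + inv11 * x * x)
              for w, x in zip(ws, xs)]
        # Firth's modified score adds h_i * (0.5 - p_i) to each residual.
        rs = [y - p + h * (0.5 - p) for y, p, h in zip(ys, ps, hs)]
        u0 = sum(rs)
        u1 = sum(r * x for r, x in zip(rs, xs))
        # Update step: inverse information times the modified score.
        step0 = inv00 * u0 + inv01 * u1
        step1 = inv01 * u0 + inv11 * u1
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Hypothetical, perfectly separated data: every unit with x >= 4 had an incident.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_firth(xs, ys)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A production analysis would more likely rely on an established implementation in a statistics package than on hand-rolled code like this.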
The final model calculates a predicted probability of a broken rail occurring on each mile of track over a two-year period. The results suggest that the final model effectively predicted broken rail risk, with 33% of broken rails occurring on the riskiest 5% of track miles and 70% occurring in the riskiest 20%. Further, the model shows that the greatest risk reduction for the investment may be obtained by focusing on the 2.5% of track miles with the highest probability of a broken rail. This ability to predict where broken rails are likely to occur will allow the company to more effectively manage broken rail derailment risk through targeted track inspections, maintenance, and capital improvement programs.
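Figures such as "33% of broken rails on the riskiest 5% of track miles" come from ranking units by predicted probability and counting how many observed incidents fall in the top slice. The following sketch shows that calculation on made-up numbers. The function name `share_captured` and the data are hypothetical and are not taken from the case study.

```python
def share_captured(probabilities, incidents, top_fraction):
    """Share of all incidents that fall in the top_fraction of units,
    ranked from highest to lowest predicted probability."""
    ranked = sorted(zip(probabilities, incidents), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    captured = sum(count for _, count in ranked[:n_top])
    total = sum(incidents)
    return captured / total

# Hypothetical predicted probabilities and observed incident counts for ten track miles.
probabilities = [0.30, 0.02, 0.05, 0.25, 0.01, 0.10, 0.03, 0.04, 0.15, 0.02]
incidents = [2, 0, 0, 1, 0, 1, 0, 0, 1, 0]

top_10 = share_captured(probabilities, incidents, 0.10)
top_20 = share_captured(probabilities, incidents, 0.20)
print(f"riskiest 10% of miles: {top_10:.0%} of incidents")
print(f"riskiest 20% of miles: {top_20:.0%} of incidents")
```

The more sharply the captured share rises at small fractions, the more value there is in concentrating inspections and maintenance on the highest-ranked units.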
Implications for Other Industries
The same general approach described in the above case study can also be applied to other industries, with KPIs serving as the predictive variables and incidents as the outcome. The process is as follows:
- Measurements for defined variables would be taken regularly at each facility or unit. Precision increases as the measurements become more frequent and the observed area (facility/unit) becomes smaller.
- Once a sufficient number of measurements has been taken, they would then be combined with incident data to provide both the predictive variable measurements and the outcome data needed for training a model. This dataset would be fed into a logistic regression or other classification algorithm to create a model.
- Once the model has been created, it can be applied to new measurements to predict the probability of an incident occurring at that location during the applicable timeframe.
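The data-assembly step in this process can be sketched as follows. Each KPI measurement is keyed by facility and time period, then labeled according to whether an incident was recorded for that same facility and period. The field names, KPI names, and values are hypothetical. A real data set would need far more rows than this.

```python
def build_training_set(measurements, incident_keys, kpi_names):
    """Pair each measurement row with a 0/1 outcome label."""
    features, labels = [], []
    for m in measurements:
        features.append([m[name] for name in kpi_names])
        # Label is 1 if an incident was recorded at this facility in this period.
        labels.append(1 if (m["facility"], m["period"]) in incident_keys else 0)
    return features, labels

# Hypothetical KPI measurements, one row per facility per quarter.
measurements = [
    {"facility": "A", "period": "2023-Q1", "overdue_inspections": 4, "near_misses": 3},
    {"facility": "A", "period": "2023-Q2", "overdue_inspections": 1, "near_misses": 0},
    {"facility": "B", "period": "2023-Q1", "overdue_inspections": 0, "near_misses": 1},
    {"facility": "B", "period": "2023-Q2", "overdue_inspections": 5, "near_misses": 2},
]

# Hypothetical incident log reduced to (facility, period) keys.
incident_keys = {("A", "2023-Q1"), ("B", "2023-Q2")}

features, labels = build_training_set(
    measurements, incident_keys, ["overdue_inspections", "near_misses"]
)
print(features)
print(labels)
```

The resulting feature rows and labels are the inputs a classification algorithm needs for training. New measurements without labels would later be passed to the fitted model to obtain predicted probabilities.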
Once predicted incident probabilities have been found, management would be able to focus improvement resources on those locations that have the highest probabilities of experiencing an incident. The classification algorithms also identify which factors have predictive validity, so management will know how improving those factors will affect the predicted probability of incidents occurring. In other words, they will know which factors have the strongest relationship with incidents and can focus on improving those first.
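As a rough illustration of how a fitted logistic regression can inform that prioritization, the sketch below converts coefficients into odds ratios and shows how the predicted probability changes when one factor improves. The coefficients, intercept, and factor names are invented for this example, and the comparison of coefficient sizes assumes the factors were standardized to a common scale before fitting.

```python
import math

def odds_ratios(coefficients):
    """exp(coefficient) is the multiplicative change in odds per one-unit increase."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}

def probability(intercept, coefficients, values):
    z = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model (factors assumed standardized).
intercept = -3.0
coefficients = {
    "overdue_inspections": 0.8,
    "open_corrective_actions": 0.3,
    "training_hours": -0.5,
}

ratios = odds_ratios(coefficients)
# Factor with the largest absolute coefficient, i.e. the strongest relationship.
strongest = max(coefficients, key=lambda name: abs(coefficients[name]))

# Hypothetical facility, before and after reducing overdue inspections by one unit.
current = {"overdue_inspections": 2.0, "open_corrective_actions": 1.0, "training_hours": 0.0}
improved = dict(current, overdue_inspections=1.0)

before = probability(intercept, coefficients, current)
after = probability(intercept, coefficients, improved)
print(f"strongest factor: {strongest} (odds ratio {ratios[strongest]:.2f})")
print(f"predicted probability: {before:.3f} -> {after:.3f}")
```

A relationship of this kind is an association in the data, so it indicates where to look first rather than guaranteeing the effect of any single intervention.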
Industrial companies are generating and recording unprecedented amounts of data associated with operations. Those that strive to be best-in-class need to use that data intelligently to guide future business decision-making.
Predictive analytics, including the method described in this case study, is versatile and can help companies analyze a wide variety of problems. In this way, companies can:
- Explore and investigate past performance
- Gain the insights needed to turn vast amounts of data into relevant and actionable information
- Create statistically valid models to facilitate data-driven decisions