Condition Monitoring Program Shortcomings

There is nothing wrong with the strategy of condition-based maintenance. However, shortcomings often exist, and there is almost always opportunity for improvement.

Condition monitoring (CM) has traditionally been seen as (and to some degree will always be regarded as) something of a dark art. Management often does not understand the nuances involved in successful CM. It is often viewed as a compulsory cost centre somehow tied to maintenance, a cost to be minimised rather than an untapped value pool. Condition monitoring outcomes are not linked to feedback in a structured way, so true CM performance is often less than transparent, and the truth is often disguised by jargon, technology, fads, heroics and defensiveness.

Without a deep understanding of actual failure mechanisms, optimising a maintenance strategy is often ad hoc and not supported by qualitative, peer-reviewed evidence. Reliability models are often based on lagging indicators of actual machine failures that only consider “those that got away”, and give little insight into what is going to happen next, and to which asset. Reliability cost optimisation models often look convincing, but are frequently built on invalid or highly questionable underlying models.

Condition assessments are exactly that: assessments by someone who makes sense of the numbers, signatures, waveforms, reports and readings. They are supported by technical evidence, and generally recommend action. Whilst diagnosis can at times be straightforward, prognosis, and importantly the scope for defect elimination, is not always well managed. It is tempting to cut costs and oversimplify condition monitoring by relying on a black box alarm, or to fall into the “wallchart analyst” syndrome. That, however, only reinforces CM as a detection technology, not a strategic activity.

We have had all sorts of condition assessment technologies for decades now, and computing power has increased by many orders of magnitude. However, failures continue to occur, machines continue to be over-maintained, and the performance of CM efforts does not seem to have improved in proportion to computing power.

Why? Perhaps:
• Didn’t think any better could be done
• The cost of failure is less than prevention
• It was put down to bad luck (it won’t happen again, fingers crossed)
• Engineering don’t know the failure mode or consequence
• Management don’t actually know the risk
• Other Management priorities
• Strategy isn’t working
• Waiting to deploy some IT initiative to make the problem go away in the future

Most failures give some sort of warning. So when failures occur despite a CM program being in place, either:
• The specific failure mode was not monitored (deliberately or negligently)
• It was not monitored at the correct frequency
• The wrong detection technology was applied
• It was incorrectly diagnosed, or
• The diagnosis was not acted on.

Condition monitoring often operates in a silo, decoupled from the reliability and business improvement departments. As a result, the defect detection effort often does not feed into defect elimination.

So common outcomes are:
• Poor communication and collaboration of CM with maintenance planning and operations
• Reports are written in jargon that is not straightforward to translate into specific maintenance action or strategy change
• Data overload, but variable assessment interpretation skills
• Increasing use of contractors and a mobile workforce further erodes knowledge capture. The “knowledge” gained by the plant owner is often just a bunch of reports in a filing cabinet, on a shared drive, etc.
• Poor visibility into how findings and recommendations are arrived at
• Success is mostly dependent on the talent of the analyst of the day. No policy for knowledge succession
• Poor feedback on recommendations further impedes learning and improvement of condition assessment performance
• Reporting standards are inconsistent, varying between service providers and between assessment technologies
• Evidence often not corroborated from other sources when making decisions around machinery health
• Maintenance decisions often fail to consider and consolidate everything that is known about the machinery in question
• Visibility of all information and history is poor, so learning from the past is ad hoc at best
• General lack of discipline around feedback. CM recommendations are often provided as a one-way process, with no closing of the loop by comparing actual findings with the prognosis
• Strategy for knowledge retention pushed out to contractors and service providers who might do an excellent job, but are not necessarily there to look after your interests once they no longer have the contract
• “Fear of starting again” factor means putting up with the devil we know
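
The feedback discipline described in the list above (closing the loop by comparing actual findings with the prognosis) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Recommendation` record, its field names, and the sample assets are hypothetical and do not describe how Felix or any particular CM system stores its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    """One CM recommendation (hypothetical record layout)."""
    asset: str
    predicted_fault: str
    # Stays None until maintenance reports what was actually found.
    actual_finding: Optional[str] = None


def feedback_summary(recs):
    """Summarise how many recommendations received feedback and were confirmed."""
    closed = [r for r in recs if r.actual_finding is not None]
    confirmed = [r for r in closed if r.actual_finding == r.predicted_fault]
    return {
        "issued": len(recs),
        "closed": len(closed),
        # Share of recommendations where the loop was closed at all.
        "closure_rate": len(closed) / len(recs) if recs else 0.0,
        # Share of closed recommendations where the finding matched the prediction.
        "confirmation_rate": len(confirmed) / len(closed) if closed else 0.0,
    }


# Illustrative data only, not taken from any real plant.
recs = [
    Recommendation("Pump 1", "bearing wear", "bearing wear"),
    Recommendation("Fan 2", "imbalance", "no fault found"),
    Recommendation("Gearbox 3", "gear tooth damage"),  # no feedback received yet
    Recommendation("Motor 4", "misalignment", "misalignment"),
]
summary = feedback_summary(recs)
print(summary)
```

A low closure rate points to the one-way process described above, while a low confirmation rate among closed items suggests the diagnoses themselves need review. Real findings rarely match a prediction word for word, so a practical version would need an agreed fault taxonomy rather than the simple string comparison used here.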

The era of outsourced activities surrounding condition monitoring makes attention to collaboration and knowledge accumulation more important than ever.

Felix was designed to help overcome the common shortfalls in condition monitoring programs.

Surely I no longer have to worry about improving condition assessment? “Big data”, AI, ML and the Internet of Things will do that for me soon, won’t they?

In reality, automated condition assessment technologies raise alarms and bring attention to problems, but prognosis, prioritisation, the action to be taken and feedback are still often human activities, and probably should be for a while to come.
Increasingly, the CM landscape will be a hybrid of automatic assessments, manual diagnostics, and continuous re-evaluation of the “skill” of real-time diagnostic devices, AI algorithms and machine health predictors.

Felix helps manage the human side of what will for a long time be a hybrid approach to machine health monitoring: some items will be automatically monitored, some won’t, and some may rely on automated approaches for rough alarm detection, followed up by expert human diagnostics to determine prognosis. The less manual routine analysis is conducted, the more important it is that “knowledge” is embedded in the organisation to make sense of the automated alarms generated by algorithmic “AI”/“ML” approaches. Machines are capable of monitoring and processing large amounts of data, but there is a danger that the first principles of determining machine health could be lost.

Time saved on routine asset condition screening should translate to extra time spent refining prognoses, and working on the elimination of the actual failure mode, rather than simply routine early detection.

The other common syndrome to be wary of is that automated fault detection, prognosis and maintenance scheduling generated by “AI” or other machine-driven systems goes unchallenged, so over-maintenance happens silently, with an increasingly deskilled workforce left to improve emerging machine-assisted alarming.

There is a danger that sound engineering is replaced by IT, which only gets challenged where excessive failures occur, but remains unchallenged where over-maintenance due to false positives occurs. The two must work together.
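
One way to keep automated alarming open to challenge is to score it against what inspections actually found. The sketch below is illustrative only: it assumes each record pairs whether an automated alarm was raised with whether a defect was later confirmed, and the function name `alarm_skill` and the sample history are hypothetical.

```python
def alarm_skill(records):
    """Score automated alarms against confirmed inspection findings.

    records: iterable of (alarm_raised, defect_confirmed) boolean pairs.
    """
    records = list(records)
    tp = sum(1 for alarm, defect in records if alarm and defect)
    fp = sum(1 for alarm, defect in records if alarm and not defect)
    fn = sum(1 for alarm, defect in records if not alarm and defect)
    return {
        "true_positives": tp,
        # Alarms with no confirmed defect: a driver of silent over-maintenance.
        "false_positives": fp,
        # Defects that raised no alarm: the failures that "got away".
        "missed_defects": fn,
        # None means the ratio is undefined for this data set.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Illustrative history only, not real plant data.
history = [
    (True, True),
    (True, False),
    (True, False),
    (False, True),
    (True, True),
    (False, False),
]
skill = alarm_skill(history)
print(skill)
```

Tracking precision alongside recall makes false positives as visible as missed failures, so over-maintenance gets questioned as readily as breakdowns do. The scoring is only as good as the feedback behind it: it relies on inspection findings being reported back for each alarm.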
