Reliability-centred Maintenance (RCM), a way of life

On December 29, 1978, Nowlan, Heap and Matteson published a report titled Reliability-Centered Maintenance (RCM). Forty years later, RCM is still “alive and kicking”.

RCM is a methodology that defines the failure behaviour of technical systems and develops a maintenance policy to prevent or minimize the consequences of failure.

To define the failure behaviour of assets and systems, FMEA / FMECA techniques are used. FMEA stands for Failure Mode and Effects Analysis. The most common FMEA types are the Design FMEA, the Object FMEA and the Process FMEA. When criticality is added, it is called an FMECA (Failure Mode, Effects and Criticality Analysis). RCM uses the Process FMEA technique.

RCM is used to develop maintenance concepts for highly critical systems. A maintenance concept is a list of failure modes whose failure behaviour is influenced by executing maintenance tasks at defined intervals.

It is called a concept because it is not finished yet. The maintenance concepts have to be nested into maintenance plans, and these plans have to be connected to an agenda for executing the tasks. This is done in the CMMS (Computerized Maintenance Management System).

For highly critical systems, shortcuts in the development of the maintenance concept are not allowed. This could be one of the most important reasons why FMECA techniques are NOT used in RCM. When a process, system or asset is highly critical, all failure modes matter and all have to be evaluated. Using criticality within the analysis itself has no added value in an RCM analysis. For low and medium critical processes and systems, shortcuts are more acceptable and FMECAs can be used.

When RCM is a way of life, the FMEA describes the CURRENT failure behaviour. Doing this results in a maintenance concept that remains up to date. Processes, systems and assets that need maintenance are influenced by people, procedures, raw material, batches, modifications, budgets, weather, market demands, … All these factors change continuously over time, which means failure behaviour is constantly changing too. This is why FMEAs need to describe the current failure behaviour, so that the maintenance concepts / plans / programs cover the right failure modes.

A way of life means RCM is embedded in daily life. This is possible when RCM is executed in a dynamic way.

When the RCM report was published, parts of RCM were offered to industry and many told their customers they were doing RCM. This led to quality problems and the Department of Defense decided a standard was needed to control the RCM quality. In August 1999, the SAE JA1011 standard was published. This JA1011 document was accepted worldwide as the RCM standard. Some years later the RCM guide SAE JA1012 followed. Both describe how RCM must be executed.

The RCM standard JA1011 is a 12-page document. One of its most important parts is the set of 37 RCM definitions, which describe the terminology that should be used in an RCM project. Without understanding these definitions as part of the methodology, RCM can become confusing and time-consuming.

The other important part of the standard describes the 7 RCM steps:

  1. Determine the Operational Context and the Functions and associated desired standards of performance of the asset (Operational Context and Functions).
  2. Determine how an asset can fail to fulfill its functions (Functional Failures).
  3. Determine the causes of each functional failure (Failure Modes).
  4. Determine what happens when each failure occurs (Failure Effects).
  5. Classify the consequences of failure (Failure Consequences).
  6. Determine what should be performed to predict or prevent each failure (Tasks and task intervals).
  7. Determine if other failure management strategies may be more effective (one-time-changes).
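
The seven steps above can be read as a nested record structure: functions contain functional failures, which contain failure modes with their effects, consequences and tasks. The following is a minimal sketch of that idea, not a structure prescribed by SAE JA1011. All class names, field names and the pump example are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureMode:
    cause: str                             # step 3: cause of the functional failure
    effect: str                            # step 4: what happens when it occurs
    consequence: str                       # step 5: e.g. "safety", "operational"
    task: Optional[str] = None             # step 6: predictive or preventive task
    interval_days: Optional[int] = None    # step 6: task interval
    one_time_change: Optional[str] = None  # step 7: e.g. a redesign


@dataclass
class FunctionalFailure:
    description: str                       # step 2
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    description: str                       # step 1, including the performance standard
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def maintenance_concept(functions):
    """List (cause, task, interval) for every failure mode that has a task."""
    return [
        (fm.cause, fm.task, fm.interval_days)
        for fn in functions
        for ff in fn.functional_failures
        for fm in ff.failure_modes
        if fm.task is not None
    ]


# Hypothetical example data, not taken from any real FMEA.
pump = Function(
    "Transfer water at no less than 800 l/min",
    [FunctionalFailure(
        "Transfers less than 800 l/min",
        [
            FailureMode("Impeller worn", "Flow drops, tank level falls",
                        "operational", "Measure flow rate", 30),
            FailureMode("Suction strainer blocked", "Flow drops, pump cavitates",
                        "operational"),
        ],
    )],
)

concept = maintenance_concept([pump])
print(concept)  # [('Impeller worn', 'Measure flow rate', 30)]
```

In this sketch the second failure mode has no task yet, so it does not appear in the maintenance concept. That is exactly the kind of gap a reliability engineer would want to see.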

§ 5.9 of the RCM standard SAE JA1011 describes the importance of a Living Program. RCM recognizes that during the development of the initial FMEA, not all data is available and that information can be inherently imprecise. Making RCM a living program means that the continuously improving knowledge of processes must be used to keep the FMEA up to date, in order to continuously improve maintenance concepts, plans and programs.

Continuous Improvement options

During the development of RCM, Nowlan, Heap and Matteson already understood the importance of continuous improvement. Now, 40 years later, industry is even better equipped to improve.

The following PDCA (Plan – Do – Check – Act) cycles can be used to maintain dynamic maintenance programs:

  1. Periodic review of the Operational Context and FMEA. If the FMEA describes the current state, the connected maintenance tasks will probably be right too. If the available techniques change, this might influence tasks and intervals.
  2. Traditionally, failures are reported by telephone or described in free-text fields. As a result, different people describe the same failure mode in different ways, and the reports never add up to useful data per failure mode. Grouping them afterwards produces nice management reports, but is not really effective. It is better to use the FMEA failure modes to report failures. This enables the organisation to focus on the failure modes – functional failures – functions – processes that cause the most pain.
  3. A reported failure is only really solved when the solver has found the root cause. Just resetting the equipment is not the same, of course. This root cause knowledge is very important expert knowledge, which should be standardized and used to optimize the failure effect information in the FMEA.
  4. Every time someone reports a failure, we could ask: “What have YOU done to prevent this failure?” This emphasizes the importance of developing information about potential failures. Potential failures are all symptoms that show a failure mode is developing. This can be an increase or decrease in vibration, temperature, pressure, wall thickness, quality measurements, … If it were possible to know about all potential failures, we would be able to prevent many more failure modes.
  5. The FMEA should describe the current failure behaviour, so the operator of production line 12 should understand the failure behaviour of line 12. This means he should be able to train himself and needs access to the FMEA of line 12. This way he will get a better understanding and operate his line better. The FMEA can be embedded in an e-learning program, so that management is able to measure the knowledge of all people influencing the efficiency.
  6. When people use the FMEA to report failures and solvers report failure effect data, both can be combined in order to find out if the right failures were reported in the beginning. In some cases extra training is needed to help people to learn about the processes they operate and maintain.
  7. Each professional maintenance program covers the current failure behaviour as described in the FMEA. A maintenance program is therefore most effective when its maintenance tasks cover all failure modes. An important KPI shows how effective all maintenance programs are. This KPI, called “MPE” (Maintenance Program Effectiveness), was created to keep the to-do list of reliability engineers up to date and to focus on those parts of the processes that give fast ROI.
  8. Connecting PLC signals, Internet 4 data, sensor data, failure reporting data, SCADA signals and all other data that can be linked to parts of the FMEA will boost the improvement of the FMEA, so that the Maintenance Program in the CMMS can be optimized regularly.
  9. OEE (Overall Equipment Effectiveness) is a tool that can be connected to FMEA data. We have seen quite nice developments, with downtime reduced from 51,000 to 1,900 minutes per year over a 7-year period.
  10. Using data checking tools can help optimize the P-F interval and MTBF (Mean Time Between Failures) figures of failure modes.
  11. Data mining tools will reveal early warnings that we never thought of before. Lots of data is lost because we do not use it effectively. Great results have been achieved in the last few years.
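
Items 2 and 7 of the list above lend themselves to a small illustration. When failures are reported against FMEA failure modes instead of free text, downtime can be totalled per failure mode, and the share of failure modes covered by a task can be computed. The sketch below assumes hypothetical failure mode IDs and report data. The source does not give a formula for MPE, so the coverage ratio used here is only an assumed stand-in, not the actual KPI definition.

```python
from collections import Counter

# Hypothetical FMEA extract: failure mode ID -> covering task (None = no task yet).
fmea_tasks = {
    "FM-001": "Measure flow rate",
    "FM-002": None,
    "FM-003": "Inspect coupling",
    "FM-004": None,
}

# Hypothetical failure reports: (failure mode ID, downtime in minutes).
reports = [
    ("FM-002", 120),
    ("FM-001", 30),
    ("FM-002", 90),
    ("FM-004", 15),
    ("FM-002", 60),
]


def downtime_per_failure_mode(reports):
    """Total downtime per failure mode, largest first."""
    totals = Counter()
    for mode_id, minutes in reports:
        totals[mode_id] += minutes
    return totals.most_common()


def task_coverage(fmea_tasks):
    """Share of failure modes that have a task (assumed stand-in for MPE)."""
    covered = sum(1 for task in fmea_tasks.values() if task is not None)
    return covered / len(fmea_tasks)


ranking = downtime_per_failure_mode(reports)
coverage = task_coverage(fmea_tasks)

print(ranking)   # [('FM-002', 270), ('FM-001', 30), ('FM-004', 15)]
print(coverage)  # 0.5
```

In this made-up data set the failure mode causing the most downtime, FM-002, is also one without a task. This is the kind of signal that helps a reliability engineer decide where to look first.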

Many more options are available to keep the FMEA up to date.
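
As an illustration of checking MTBF figures (item 10 in the list above), the following sketch compares the MTBF assumed in the FMEA with the mean gap between reported failures of one failure mode. It is a deliberate simplification: it ignores repair time and periods without failures. The dates, the assumed MTBF of 180 days and the 25% tolerance are hypothetical values chosen for the example.

```python
from datetime import date


def mean_days_between_failures(failure_dates):
    """Mean gap in days between consecutive failures; None with fewer than two."""
    ordered = sorted(failure_dates)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical failure reports for a single failure mode.
dates = [date(2023, 1, 10), date(2023, 4, 10), date(2023, 6, 9)]

assumed_mtbf_days = 180   # hypothetical figure recorded in the FMEA
tolerance = 0.25          # hypothetical threshold for flagging a review

observed = mean_days_between_failures(dates)
needs_review = abs(observed - assumed_mtbf_days) / assumed_mtbf_days > tolerance

print(observed)      # 75.0
print(needs_review)  # True
```

A flagged failure mode does not prove that the FMEA figure is wrong. It only indicates that the figure and the reported failures disagree enough to justify a closer look.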

Traditionally, maintenance management was focused on assets. Reliability Management focuses on and improves the failure behaviour that influences the Cost + Availability + Reliability + Effectiveness (C.A.R.E.) of assets, systems and processes.

Reliability Management makes RCM a living program. It Takes Care of C.A.R.E. and Becomes Better Each Day.