
Friday, August 2, 2024


 By Aleksandar Pudar

Technical Superintendent and Planned Maintenance Supervisor Reederei Nord BV

Co-founder of "Out of Box Maritime Thinker Blog" and Founder of Narenta Consilium Group

In marine engineering, a reliability-centred maintenance (RCM) approach systematically identifies the functions, functional failures, and likely causes of failure for the assets on a vessel. It also assesses the effects of potential failure modes and determines the significance of these effects. With this information, the RCM process selects the most suitable asset management policy to optimise system performance, safety, and reliability.

RCM considers all possible asset management options in the marine engineering context, such as on-condition tasks, scheduled restoration tasks, scheduled discard tasks, failure-finding tasks, and one-time changes. One-time changes encompass modifications to various aspects of the marine engineering asset, including hardware design, operating procedures, personnel training, and other factors beyond maintenance. This comprehensive consideration of asset management options sets RCM apart from other maintenance development processes.


RCM is a process of systematically analysing an engineered system to understand the following:

·         Its functions

·         The failure modes of its equipment that support these functions

·         How then to choose an optimal course of maintenance to prevent the failure modes from occurring or to detect the failure mode before a failure occurs

·         How to determine spare holding requirements

·         How to periodically refine and modify existing maintenance over time

The objective of RCM is to achieve reliability for all of the operating modes of a system.

An RCM analysis, when properly conducted, should answer the following seven questions:

         i.            What are the system functions and associated performance standards?

       ii.            How can the system fail to fulfil these functions?

      iii.            What can cause a functional failure?

      iv.            What happens when a failure occurs?

        v.            What might the consequence be when the failure occurs?

      vi.            What can be done to detect and prevent failure?

     vii.            What should be done if a suitable proactive maintenance task cannot be found?

Typically, the following tools and expertise are employed to perform RCM analyses:

·         Failure modes, effects and criticality analysis (FMECA). This analytical tool helps answer Questions 1 through 5.

·         RCM decision flow diagram. This diagram helps answer Questions 6 and 7.

·         Design, engineering and operational knowledge of the system

·         Condition-monitoring techniques

·         Risk-based decision-making (e.g., the frequency and the consequence of a failure in terms of its impact on safety, the environment and commercial operations)
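
As a rough illustration of the risk-based decision-making item above, the sketch below ranks failure modes by combining a frequency score with the worst consequence score across safety, the environment and commercial operations. The 1-to-5 scales, the multiplication rule, the class name `FailureModeRisk` and the example failure modes are all illustrative assumptions, not values taken from any standard or class rule.

```python
from dataclasses import dataclass


@dataclass
class FailureModeRisk:
    """Hypothetical record for one failure mode (all scales are assumed)."""
    name: str
    frequency: int    # assumed scale: 1 (remote) to 5 (frequent)
    safety: int       # assumed scale: 1 (negligible) to 5 (catastrophic)
    environment: int  # same assumed 1-5 scale
    commercial: int   # same assumed 1-5 scale

    def risk_score(self) -> int:
        # Assumption: risk = frequency x worst consequence category.
        worst = max(self.safety, self.environment, self.commercial)
        return self.frequency * worst


def rank_by_risk(modes):
    """Return the failure modes sorted from highest to lowest risk score."""
    return sorted(modes, key=lambda m: m.risk_score(), reverse=True)


# Made-up example data, for illustration only.
modes = [
    FailureModeRisk("Fuel pump seal leak", 3, safety=2, environment=4, commercial=2),
    FailureModeRisk("Local pressure gauge drift", 4, safety=1, environment=1, commercial=1),
    FailureModeRisk("Steering gear hydraulic failure", 1, safety=5, environment=4, commercial=5),
]

for mode in rank_by_risk(modes):
    print(f"{mode.name}: risk score {mode.risk_score()}")
```

In practice the scales and the acceptance thresholds would be set by the owner or operator; the point of the sketch is only that frequency and consequence are assessed together rather than separately.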

Documenting and implementing the following formalise this process:

·         The analyses and the decisions taken

·         Progressive improvements based on operational and maintenance experience

·         Clear audit trails of maintenance actions taken and improvements made

Once these are documented and implemented, this process will effectively ensure an engineered system's reliable and safe operation.

2.5.1.i FUNCTIONS - What are the system functions and associated performance standards?

Determining the necessary maintenance strategies for an asset within its current operating context requires identifying the functions and their associated desired standards of performance. To effectively identify these functions and standards, the following criteria must be satisfied:

·         Define the operating context of the asset: Establish the conditions under which the asset is used, including environmental factors, workload, and any other relevant parameters.

·         Identify all functions of the asset or system: Ensure that all primary and secondary functions are recognised, including the roles of all protective devices.

·         Create function statements with a verb, object, and performance standard: Each function statement should contain a straightforward action, target, and quantifiable performance standard whenever possible.

·         Establish desired performance standards based on the owner or user's expectations: The function statements should incorporate performance standards that reflect the desired level of performance by the asset or system's owner or user in its operating context.
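
One way to picture a function statement built from a verb, an object and a performance standard is as a small record. The sketch below is only an illustration: the class name `FunctionStatement`, its fields, and the sea-water pump figures are hypothetical and are not taken from any particular vessel or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStatement:
    """Hypothetical container for one RCM function statement."""
    verb: str                  # the action, e.g. "supply"
    obj: str                   # what the action applies to
    performance_standard: str  # quantified where possible
    operating_context: str     # conditions under which the asset operates

    def text(self) -> str:
        # Assemble the statement as "To <verb> <object> <standard> (<context>)".
        return (f"To {self.verb} {self.obj} {self.performance_standard} "
                f"({self.operating_context})")


# Made-up example: a sea-water cooling pump with one standby unit.
primary = FunctionStatement(
    verb="supply",
    obj="cooling sea water to the central coolers",
    performance_standard="at not less than 300 m3/h",
    operating_context="one pump running, one on standby",
)

print(primary.text())
```

Keeping the operating context next to the performance standard makes it harder to copy a statement from one installation to another without checking whether the context still applies.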

The operating context refers to the circumstances under which a marine engineering asset operates. Identical hardware may not require the same failure management policy across all installations or applications. For instance, a solitary pump within a system may require a different failure management policy than a pump that is one of several redundant units. Similarly, a pump handling corrosive fluids typically needs a distinct policy from one that transports benign fluids. Protective devices, which are often overlooked, must also be considered in the RCM process by identifying their functions.

Ultimately, it is the responsibility of the vessel owner or operator to determine the desired performance level that the maintenance program should maintain. By understanding and accounting for the unique operating context of each marine engineering asset, a tailored and efficient maintenance strategy can be developed, ensuring optimal performance and reliability.

2.5.1.ii FUNCTIONAL FAILURES - How can the system fail to fulfil these functions?

The criterion for this question is singular and straightforward: to identify all the failed states related to each function. If functions are well-defined, listing functional failures is relatively easy. Identifying these failures is crucial in understanding potential risks and implementing preventive measures to ensure system reliability and safety. Here are some examples of functional failures in the context of marine engineering:

Propulsion system:

a.       Engine failures: Mechanical or electrical issues with the engine lead to loss of propulsion or reduced power output.

b.       Fuel system problems: Contamination, leakage, or blockage in the fuel system impacts the engine's performance or causes shutdowns.

c.        Cooling system malfunctions: Failures in the cooling system cause overheating, which can damage components and affect the engine performance.

d.       Transmission and shaft issues: Problems with gearboxes, shafts, or couplings impact the power transfer from the engine to the propeller.

Electrical system:

a.       Generator failures: Inability to generate sufficient electrical power due to equipment malfunction or fuel shortage.

b.       Distribution failures: Power distribution issues to onboard systems, such as switchboard malfunctions, damaged cables, or circuit breaker issues.

c.        Power quality issues: Voltage fluctuations or frequency deviations that can damage electrical equipment or disrupt operations.

d.       Battery issues: Inadequate charging, capacity loss, or failure of onboard batteries, affecting the performance of critical systems.

Navigation and communication system:

a.       Equipment failure: Malfunctions in navigation or communication devices, such as GPS, radar, or radios, hinder safe and efficient operations.

b.       Software failure: Bugs or errors in system software cause incorrect data display or unexpected behaviour.

c.        Signal interference: Electromagnetic or atmospheric conditions disrupt the signal reception or transmission.

Hull and structural system:

a.       Corrosion: Deterioration of the hull or structural components due to exposure to seawater, leading to reduced structural integrity.

b.       Fatigue: Material failure due to repetitive loading or stress, causing cracks or fractures in structural components.

c.        Leakage or flooding: Damage to hull plating, seals, or watertight compartments leads to water ingress, affecting buoyancy and stability.

Auxiliary systems (e.g., HVAC, bilge, and ballast):

a.       Equipment failures: Mechanical or electrical issues with pumps, compressors, or valves result in disruptions to the operation of auxiliary systems.

b.       Piping failures: Leakage or blockage in the piping system impacts the proper functioning of auxiliary systems.

c.        Control system malfunctions: Failures in control systems lead to incorrect operation or reduced efficiency of auxiliary systems.

2.5.1.iii FAILURE MODES -  What can cause a functional failure?

Understanding what causes each functional failure (the failure modes) is crucial for developing effective maintenance strategies in marine engineering. FMECA uses the term "failure mode" in much the same way that RCM uses "functional failure", whereas the RCM community defines a failure mode as the event that causes a functional failure. The standard criteria for a process that identifies failure modes include the following:

·         Identifying all reasonably probable failure modes that can cause each functional failure.

·         Employing a method to determine what constitutes a reasonably probable failure mode, which must be acceptable to the owner or user of the marine asset.

·         Identifying failure modes at a level of causation that enables the selection of an appropriate failure management policy.

·         Including in the list of failure modes those that have occurred before, those currently being prevented by existing maintenance programs, and those that have not yet happened but are considered reasonably likely (credible) within the operating context.

·         Incorporating in the list of failure modes any event or process likely to cause a functional failure, such as deterioration, human error by operators or maintainers, and design defects.

As the most comprehensive analytical process for developing maintenance programs and managing physical assets, RCM is well-suited to identify every reasonably likely failure mode in marine engineering applications. By thoroughly examining these failure modes, vessel operators can optimise maintenance strategies, improve system reliability, and enhance overall performance.

2.5.1.iv FAILURE EFFECTS - What happens when a failure occurs?

What happens when failures occur (failure effects)? Understanding the consequences of failures, known as failure effects, is crucial. The criteria for identifying failure effects include the following:

         Describing failure effects as what would happen if no specific task were carried out to anticipate, prevent, or detect the failure. Failure effects should encompass all necessary information to evaluate the consequences of the failure, such as:

o    The indicatory evidence (if any) that the failure has occurred (for hidden functions, consider the consequences of multiple failures occurring).

o    The potential impact on human safety, such as causing injury or death, or on the environment.

o    The adverse effects (if any) on vessel performance or operations.

o    The physical damage (if any) resulting from the failure.

o    The actions (if any) required to restore the system's function after the failure.

Failure Modes, Effects, and Criticality Analysis (FMECA) typically describes failure effects in terms of their impact at the local, subsystem, and system levels. It also addresses the actions needed to restore the system's functionality following a failure.

2.5.1.v FAILURE CONSEQUENCES - What might the consequence be when the failure occurs?

Understanding the significance of each failure (failure consequences) is crucial for effective maintenance planning. The standard's criteria for a process that identifies failure consequences are:

·         Assessing failure consequences as if no specific task is currently being performed to anticipate, prevent, or detect the failure.

·         Formally categorising the consequences of every failure mode:

o    Separating hidden failure modes from evident failure modes in the categorisation process.

o    Clearly distinguishing events (failure modes and multiple failures) with safety and environmental consequences from those that only have economic consequences, such as operational and non-operational consequences.

RCM evaluates failure consequences, assuming that no preventive measures are in place. As a result, some may be tempted to argue that a failure does not matter because a specific action "always" protects against it. On the contrary, RCM evaluates the assumed protective action's effectiveness and meticulously justifies the effort required. Furthermore, it systematically categorises failure consequences by assigning each failure mode to one of four groups: hidden, evident safety/environmental, evident operational, and evident non-operational.
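
The four-way categorisation described above can be sketched as a short decision function. This is a minimal illustration only: the enum, the function name `categorise` and its three yes/no inputs are assumptions made for the example, and a real RCM decision diagram asks more detailed questions at each step.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    EVIDENT_SAFETY_ENVIRONMENTAL = "evident safety/environmental"
    EVIDENT_OPERATIONAL = "evident operational"
    EVIDENT_NON_OPERATIONAL = "evident non-operational"


def categorise(evident: bool,
               safety_or_environmental: bool,
               affects_operations: bool) -> Consequence:
    """Assign a failure mode to one of the four consequence groups.

    The answers are assumed to be given as if no task were currently
    being performed to anticipate, prevent, or detect the failure.
    """
    if not evident:
        # Hidden failure modes are separated out first.
        return Consequence.HIDDEN
    if safety_or_environmental:
        return Consequence.EVIDENT_SAFETY_ENVIRONMENTAL
    if affects_operations:
        return Consequence.EVIDENT_OPERATIONAL
    # Only the direct cost of repair remains.
    return Consequence.EVIDENT_NON_OPERATIONAL


# Example: a failure the crew would notice that stops cargo operations.
print(categorise(evident=True, safety_or_environmental=False,
                 affects_operations=True).value)
```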

2.5.1.vi EQUIPMENT FAILURE - What can be done to detect and prevent failure?

A loss of system function in marine engineering systems can result from equipment failures and/or human errors. Equipment failure can typically be attributed to the following factors:

·         Design error

·         Faulty material

·         Improper fabrication and construction

·         Improper operation

·         Inadequate maintenance

·         Maintenance errors

EQUIPMENT FAILURE RATE AND PATTERNS

To effectively improve equipment reliability through maintenance, design changes, or operational improvements, it is essential to understand the potential failure mechanisms, their causes, and the associated impacts on the marine system. Equipment failure should be defined as a state or condition where a component no longer fulfils its design intent (i.e., a functional failure occurs due to equipment failure). RCM focuses on managing equipment failures that result in functional failures.

Developing an effective failure management strategy requires understanding the failure mechanism. Equipment may exhibit various failure modes (i.e., ways in which the equipment fails). Furthermore, the failure mechanism behind each failure mode might vary throughout the equipment's lifespan.

Depending on the dominant system failure mechanisms, system operation, operating environment, and maintenance, specific equipment failure modes exhibit diverse failure rates and patterns. Failure rate statistics are expressed in terms of operating time, or another pertinent parameter, before equipment failure. Failure density distributions are often used to predict the likelihood that an item fails after a given operating time.

A typical failure distribution used to model equipment failures is the Weibull distribution, employed when equipment exhibits a constant failure rate for part of its life, followed by an increasing failure rate due to wear-out. Weibull analysis is also used when there is limited failure data. For example, a Weibull plot can help determine if the failure is due to infant mortality, random failure, early wear-out, or wear-out, which helps determine an appropriate maintenance strategy.
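
To make the link between a Weibull fit and the failure pattern concrete, the sketch below evaluates the two-parameter Weibull hazard (failure) rate, h(t) = (β/η)(t/η)^(β-1), and reads the shape parameter β in the usual way: below 1 the rate falls with age (infant mortality), at about 1 it is constant (random failure), and above 1 it rises (wear-out). The function names, the tolerance around β = 1 and the example parameters are assumptions for illustration; the sketch does not try to separate early wear-out from wear-out, and it does not fit parameters from data.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Hazard rate of a two-parameter Weibull distribution.

    t    : operating time (same unit as eta), must be > 0
    beta : shape parameter, must be > 0
    eta  : scale parameter (characteristic life), must be > 0
    """
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def failure_pattern(beta: float, tol: float = 0.05) -> str:
    """Classify the failure pattern from the shape parameter.

    The tolerance band around beta = 1 is an arbitrary assumption.
    """
    if beta < 1 - tol:
        return "infant mortality"
    if beta <= 1 + tol:
        return "random"
    return "wear-out"


# Made-up example: shape 3.0, characteristic life 8000 running hours.
for hours in (2000, 4000, 8000):
    print(hours, weibull_hazard(hours, beta=3.0, eta=8000))
print(failure_pattern(3.0))
```

With a shape parameter above 1 the hazard rate grows with running hours, which is the situation in which a scheduled restoration or discard task is worth considering.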

Mean Time to Failure (MTTF) is another standard statistical measure. MTTF represents the average life to failure for a specific equipment failure mode, helping to determine when to perform specific maintenance tasks. For example, MTTF data can help establish the rebuilding task interval if an equipment item requires rebuilding.
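
As a simple illustration, MTTF for one failure mode can be estimated as the average of the recorded running hours to failure. The sketch below does this and then derives a candidate rebuild interval as a fraction of the MTTF. The 0.8 fraction is purely a placeholder assumption, as are the function names and the running-hour figures; in practice the interval would depend on the spread of the failure data and on the acceptable risk.

```python
def mean_time_to_failure(hours_to_failure):
    """Average running hours to failure for one failure mode."""
    if not hours_to_failure:
        raise ValueError("at least one failure record is required")
    return sum(hours_to_failure) / len(hours_to_failure)


def candidate_rebuild_interval(mttf_hours, fraction=0.8):
    """Candidate rebuild interval as a fraction of MTTF.

    The default fraction is an illustrative assumption, not a rule.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return mttf_hours * fraction


# Made-up running hours at which the same failure mode occurred.
records = [7800, 8400, 9100, 8700]
mttf = mean_time_to_failure(records)
print(f"MTTF: {mttf:.0f} h")
print(f"Candidate rebuild interval: {candidate_rebuild_interval(mttf):.0f} h")
```

Because roughly half of the items in a population can be expected to fail before the mean, an interval equal to the MTTF itself would usually be too long for a wear-out failure mode, which is why the sketch applies a fraction.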

Understanding that equipment failure modes can exhibit different failure patterns has important implications for determining appropriate maintenance strategies. For most equipment failure modes, the specific failure pattern may be unknown, but it is not needed to make maintenance decisions. Instead, certain failure characteristic information is needed:

·         Wear-in failure – characterised by "weak" members related to manufacturing defects and installation/maintenance/startup errors, also known as "burn-in" or "infant mortality" failures.

·         Random failure – dominated by chance failures caused by sudden stresses, extreme conditions, random human errors, etc. (unpredictable by time).

·         Wear-out failure – dominated by end-of-useful-life issues for equipment.

Identifying which of the three equipment failure characteristics represents the equipment failure mode helps determine the proper maintenance strategy. For example, rebuilding or replacing the equipment item may be appropriate if an equipment failure mode exhibits a wear-out pattern. However, replacing or rebuilding the equipment item may not be advisable if an equipment failure mode is characterised by wear-in failure.

Lastly, a basic understanding of the failure rate helps determine whether maintenance or equipment redesign is necessary. For example, equipment failure modes with high failure rates (i.e., frequent failures) are often best addressed by redesign rather than more frequent maintenance.

FAILURE MANAGEMENT STRATEGY

Understanding failure rates and characteristics is crucial for determining an appropriate strategy to manage failure modes (RCM refers to this as the failure management strategy). Developing and utilising this understanding is fundamental to RCM and vital for enhancing equipment reliability. For instance, it is no longer considered accurate that the more an item is overhauled, the less likely it is to fail. Unless a dominant age-related failure mode exists, age limits do little to improve the reliability of complex items. Sometimes, scheduled overhauls can increase failure rates by introducing infant mortality and/or human errors into otherwise stable systems. In RCM, the failure management strategy can comprise the following:

·         Appropriate proactive maintenance tasks,

·         Equipment redesigns or modifications, or

·         Other operational improvements.

The proactive maintenance tasks in the failure management strategy aim to (1) prevent failures before they occur or (2) detect the onset of failures in sufficient time so that the failure can be managed before it occurs. Equipment redesigns, modifications, and operational improvements (RCM refers to these as one-time changes) attempt to enhance equipment with high failure rates or for which proactive maintenance is ineffective/inefficient.

The key issues in determining whether a specific failure management strategy is effective are:

·         Is the failure management strategy technically feasible?

·         Is an acceptable level of risk achieved when the failure management strategy is implemented?

·         Is the failure management strategy cost-effective?

In addition to proactive maintenance tasks and one-time changes, servicing tasks and routine inspections may be essential to the failure management strategy. These activities help ensure that the equipment failure rate and failure characteristics are as expected. For example, the failure rate and failure pattern for a bearing drastically change if it is not adequately lubricated.

PROACTIVE MAINTENANCE TASKS

Proactive maintenance tasks can be divided into four categories:


Planned maintenance tasks (sometimes called preventative maintenance) are performed at specified intervals, regardless of the equipment's condition. The purpose of these tasks is to prevent functional failure before it occurs. They are often applied when no condition-monitoring task is identified or justified, and a wear-out region characterises the failure mode. RCM further divides planned maintenance into two subcategories:

·         Restoration task: A scheduled task performed at or before a predetermined interval (age limit) to restore an item's capability, providing an acceptable probability of functioning until the end of another specified interval. For instance, rebuilding fuel injectors in a diesel engine can be a restoration task.

·         Discard task: A scheduled task carried out at or before a specified age limit that requires disposing of an item, regardless of its condition. It is important to note that "restoration" and "discard" can apply to the same task. For example, when replacing a diesel engine's cylinder liners with new ones at fixed intervals, the task can be described as a scheduled discard of the cylinder liner or a scheduled restoration of the diesel engine.


A condition-monitoring task is a scheduled task used to detect the onset of a failure so that action can be taken to prevent the functional failure. A potential failure is an identifiable condition indicating that a functional failure is imminent or in progress. Condition-monitoring tasks should only be chosen when a detectable potential failure condition will exist before failure. When choosing maintenance tasks, condition-monitoring tasks should be considered first unless an observable potential failure condition cannot be identified. Condition-monitoring tasks are also referred to as "predictive maintenance." Section 4 provides additional details.


When neither condition-monitoring nor planned maintenance tasks alone seem capable of reducing the risks of the functional failure of the equipment, it may be necessary to select a combination of both maintenance tasks. This approach is usually used when the condition-monitoring or planned maintenance task is insufficient to achieve an acceptable risk.


A failure-finding task is scheduled to detect hidden failures when no condition-monitoring or planned maintenance task is applicable. It is a scheduled function check to determine whether an item will perform its required function if called upon. Most of these items are standby or protective equipment. An example would be checking the safety valve on a boiler.

Failure-finding tasks are typically applied to protective devices that may fail without warning. Although they share the scheduling aspect of proactive tasks, they are not proactive in the strict sense: they neither predict nor prevent failures. Instead, they detect failures that have already occurred, to minimise the possibility of a multiple failure, that is, the failure of a protected function while its protective device is already in a failed state. In this sense, they bridge the sixth question (proactive tasks) and the seventh (default actions taken when no proactive task is available).

RUN-TO-FAILURE

Run-to-failure is a failure management strategy that allows an equipment item to operate until failure occurs, at which point a repair is made. This maintenance strategy is acceptable only if the risk of failure is acceptable without any proactive maintenance tasks. An example would be allowing a local pressure gauge to fail on a cooling water line equipped with a remote-reading pressure gauge.

Before accepting a run-to-failure decision for an asset, the following criteria should be met:

·         For hidden failures with no appropriate scheduled task, the associated multiple failures must not have safety or environmental consequences.

·         For evident failures with no suitable scheduled task, the related failure mode must not pose a safety or environmental risk. In other words, the process should not allow users to opt for a "run to failure" strategy if the failure mode or (in the case of a hidden failure) the corresponding multiple failures have safety or environmental implications.
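
The two criteria above amount to a simple gate that can be sketched in a few lines. The function name `run_to_failure_permitted` and its boolean inputs are hypothetical, and the check covers only the safety and environmental criteria; whether run-to-failure is also economically sensible is a separate question.

```python
def run_to_failure_permitted(hidden: bool,
                             failure_mode_safety_env: bool,
                             multiple_failure_safety_env: bool) -> bool:
    """Return True if run-to-failure may be considered at all.

    hidden                      : True if the failure is hidden in normal operation
    failure_mode_safety_env     : the failure mode itself has safety or
                                  environmental consequences
    multiple_failure_safety_env : the associated multiple failure has safety
                                  or environmental consequences (this is the
                                  deciding input for hidden failures)
    """
    if hidden:
        # Hidden failure: judge by the associated multiple failure.
        return not multiple_failure_safety_env
    # Evident failure: judge by the failure mode itself.
    return not failure_mode_safety_env


# Local pressure gauge on a line that also has a remote-reading gauge:
# an evident failure with no safety or environmental consequence.
print(run_to_failure_permitted(hidden=False,
                               failure_mode_safety_env=False,
                               multiple_failure_safety_env=False))
```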

These continuous improvement programs encompass valuable modifications that can enhance plant performance. In numerous cases, these adjustments are shared across all methods. However, some alterations are exclusive to a single approach. The primary shortcomings of these strategies include the following:

·         Insufficient emphasis on or incorporation of effective culture change, such as change management processes

·         Absence of a comprehensive approach, with each method concentrating on a single function or activity within the plant

·         The necessity for a permanent organisational structure to oversee the efforts


2.5.1.vii DEFAULT ACTIONS - What should be done if a suitable proactive maintenance task cannot be found?

What actions should be taken if no appropriate proactive task can be identified (default actions)? This question relates to unscheduled failure management policies, which involve deciding whether to allow an asset to run until failure or to alter some aspect of the asset's operating context, such as its design or operation method.

One-time changes are employed to reduce the failure rate or manage failures when appropriate proactive maintenance tasks are not identified or cannot effectively and efficiently manage the risk. The primary purpose of a one-time change is to modify the failure rate or failure pattern through:

·         Equipment redesigns or modifications, and/or

·         Operational improvements.

One-time changes most effectively address equipment failure modes resulting from the following:

·         Faulty design and/or material

·         Improper fabrication and/or construction


·         Maintenance errors

These failure mechanisms often lead to a wear-in failure characteristic, thus requiring a one-time change.

When no maintenance strategy can be found that is both applicable and effective in detecting or preventing failure, a one-time change should be considered. A one-time change is mandatory for failure modes with the highest risk. The following briefly describes each type of one-time change:

·         Equipment redesign or modifications: Redesign or modifications involve physical changes to the equipment or system. An example would be adding drain valves to appropriate lengths of piping on a tanker's deck cargo piping to prevent freezing and damage to the piping during vessel transits in freezing temperatures.

·         Operational improvements: Operational improvements may include modifications to the operation of the equipment and/or modifications to how maintenance is performed on the equipment. Operational improvements typically involve changing the operating context and procedures, providing additional training to the operator or maintainer, or any combination thereof. For example, in the case of a main propulsion engine with a non-continuous rating nameplate, the engine could be operated at a lower output closer to its continuous rating to reduce downtime for maintenance. (However, this action may cause the vessel to be unable to meet its schedules.)

SERVICING AND ROUTINE INSPECTION

These tasks are designed to (1) ensure that the failure rate and failure pattern remain as predicted by performing routine servicing (e.g., lubrication) and (2) identify accidental damage and/or issues resulting from ignorance or negligence. In addition, they provide an opportunity to confirm that the overall maintenance standards are satisfactory. These tasks are not based on any explicit potential failure condition. Servicing and routine inspection may also be applied to items with minor failure consequences that should not be overlooked (such as minor leaks, drips, etc.).



