By Aleksandar Pudar
Technical Superintendent and Planned Maintenance Supervisor Reederei Nord BV
Co-founder of "Out of Box Maritime Thinker Blog" and Founder of Narenta Consilium Group
In marine engineering, a reliability-centred
maintenance (RCM) approach systematically identifies the functions, functional
failures, and likely causes of failures for various assets on a vessel. It also
assesses the effects of potential failure modes and determines the significance
of these effects. With this information, RCM selects the most suitable
asset management policy to optimise system performance, safety, and
reliability.
RCM considers all possible asset management
options in the marine engineering context, such as on-condition tasks,
scheduled restoration tasks, scheduled discard tasks, failure-finding tasks,
and one-time changes. One-time changes encompass modifications to various aspects
of the marine engineering asset, including hardware design, operating
procedures, personnel training, and other factors beyond maintenance. This
comprehensive consideration of asset management options sets RCM apart from
other maintenance development processes.
2.5.1 DEFINING
RELIABILITY-CENTERED MAINTENANCE (SEVEN QUESTIONS ADDRESSED BY RCM)
RCM is a
process of systematically analysing an engineered system to understand the
following:
·
Its functions
·
The failure modes of its equipment that support these functions
·
How then to choose an optimal course of maintenance to prevent the
failure modes from occurring or to detect the failure mode before a failure
occurs
·
How to determine spare holding requirements
·
How to periodically refine and modify existing maintenance over time
The objective of RCM is to achieve reliability
for all of the operating modes of a system.
An RCM analysis, when properly conducted, should
answer the following seven questions:
i.
What are
the system functions and associated performance standards?
ii.
How can the
system fail to fulfil these functions?
iii.
What can
cause a functional failure?
iv.
What
happens when a failure occurs?
v.
What might
the consequence be when the failure occurs?
vi.
What can be
done to detect and prevent failure?
vii.
What should
be done if a suitable proactive task cannot be found?
Typically, the following tools and expertise are
employed to perform RCM analyses:
·
Failure modes, effects and criticality analysis (FMECA). This
analytical tool helps answer Questions i through v.
·
RCM decision flow diagram. This diagram helps answer Questions vi and
vii.
·
Design, engineering and operational knowledge of the system
·
Condition-monitoring techniques
·
Risk-based decision-making
(e.g., the frequency and the consequence of a failure in terms of its impact on
safety, the environment and commercial operations)
Documenting and implementing the following
formalises this process:
·
The analyses and the decisions taken
·
Progressive improvements based on operational and maintenance
experience
·
Clear audit trails of maintenance actions taken and improvements made
Once these are documented and implemented, this process will
effectively ensure an engineered system's reliable and safe operation.
2.5.1.i FUNCTIONS -
What are the system functions and associated
performance standards?
Determining the necessary maintenance strategies
for an asset within its current operating context requires identifying the
functions and their associated desired standards of performance. To effectively
identify these functions and standards, the following criteria must be
satisfied:
•
Define the operating context of the asset: Establish the conditions
under which the asset is used, including environmental factors, workload, and
any other relevant parameters.
•
Identify all functions of the asset or system: Ensure that all primary
and secondary functions are recognised, including the roles of all protective
devices.
•
Create function statements with a verb, object, and performance
standard: Each function statement should contain a straightforward action,
target, and quantifiable performance standard whenever possible.
•
Establish desired performance standards based on the owner or user's
expectations: The function statements should incorporate performance standards
that reflect the desired level of performance by the asset or system's owner or
user in its operating context.
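The verb, object, and performance standard structure described above can be captured in a small record. The following is only a minimal sketch in Python: the class name, field names, and the example pump figures are hypothetical and are not taken from any vessel or standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionStatement:
    """One RCM function statement: verb + object + performance standard."""
    verb: str
    obj: str
    standard: Optional[str] = None  # quantified wherever possible
    primary: bool = True            # False for secondary or protective functions

    def text(self) -> str:
        base = f"To {self.verb} {self.obj}"
        return f"{base} {self.standard}" if self.standard else base

    def is_quantified(self) -> bool:
        # Crude check (an assumption): a quantified standard contains a digit.
        return bool(self.standard) and any(ch.isdigit() for ch in self.standard)


# Hypothetical example values for illustration only.
cooling = FunctionStatement(
    verb="supply",
    obj="cooling seawater to the main engine cooler",
    standard="at not less than 200 m3/h",
)
relief = FunctionStatement(
    verb="relieve",
    obj="excess pressure from the cooling line",
    primary=False,
)

print(cooling.text())
print(relief.text(), "| quantified:", relief.is_quantified())
```

Flagging unquantified statements, as `is_quantified` does here, is one simple way to prompt the analysis team to agree a measurable standard with the owner or user.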
The operating context refers to the circumstances
under which a marine engineering asset operates. For example, identical
hardware may not require the same failure management policy across all
installations or applications. For instance, a solitary pump within a system
may require a different failure management policy than a pump that is part of
multiple redundant units. Similarly, a pump handling corrosive fluids typically
needs a distinct policy from one that transports benign fluids. Protective
devices, which are often overlooked, must also be considered in the RCM
process by identifying their functions.
Ultimately, it is the responsibility of the
vessel owner or operator to determine the desired performance level that the
maintenance program should maintain. By understanding and accounting for the
unique operating context of each marine engineering asset, a tailored and
efficient maintenance strategy can be developed, ensuring optimal performance
and reliability.
2.5.1.ii FUNCTIONAL
FAILURES - How can the system fail to
fulfil these functions?
The criterion for this question is singular and
straightforward: to identify all the failed states related to each function. If
functions are well-defined, listing functional failures is relatively easy. Identifying
these failures is crucial in understanding potential risks and implementing
preventive measures to ensure system reliability and safety. Here are some
examples of functional failures in the context of marine engineering:
Propulsion system:
a.
Engine failures: Mechanical or electrical issues with the engine lead
to loss of propulsion or reduced power output.
b.
Fuel system problems: Contamination, leakage, or blockage in the fuel
system impacts the engine's performance or causes shutdowns.
c.
Cooling system malfunctions: Failures in the cooling system cause
overheating, which can damage components and affect the engine performance.
d.
Transmission and shaft issues: Problems with gearboxes, shafts, or
couplings impact the power transfer from the engine to the propeller.
Electrical system:
a.
Generator failures: Inability to generate sufficient electrical power
due to equipment malfunction or fuel shortage.
b.
Distribution failures: Power distribution issues to onboard systems,
such as switchboard malfunctions, damaged cables, or circuit breaker issues.
c.
Power quality issues: Voltage fluctuations or frequency deviations
that can damage electrical equipment or disrupt operations.
d.
Battery issues: Inadequate charging, capacity loss, or failure of
onboard batteries, affecting the performance of critical systems.
Navigation and
communication system:
a.
Equipment failure: Malfunctions in navigation or communication
devices, such as GPS, radar, or radios, hinder safe and efficient operations.
b.
Software failure: Bugs or errors in system software cause incorrect
data display or unexpected behaviour.
c.
Signal interference: Electromagnetic or atmospheric conditions disrupt
the signal reception or transmission.
Hull and structural
system:
a.
Corrosion: Deterioration of the hull or structural components due to
exposure to seawater, leading to reduced structural integrity.
b.
Fatigue: Material failure due to repetitive loading or stress, causing
cracks or fractures in structural components.
c.
Leakage or flooding: Damage to hull plating, seals, or watertight
compartments leads to water ingress, affecting buoyancy and stability.
Auxiliary systems
(e.g., HVAC, bilge, and ballast):
a.
Equipment failures: Mechanical or electrical issues with pumps,
compressors, or valves result in disruptions to the operation of auxiliary
systems.
b.
Piping failures: Leakage or blockage in the piping system impacts the
proper functioning of auxiliary systems.
c.
Control system malfunctions: Failures in control systems lead to
incorrect operation or reduced efficiency of auxiliary systems.
2.5.1.iii FAILURE
MODES - What can cause a functional failure?
Understanding what causes each functional failure (its failure
modes) is crucial for developing effective maintenance strategies in
marine engineering. FMECA uses the term "failure mode" in much the same
way that RCM uses "functional failure," whereas the RCM community
defines a failure mode as the event that causes a functional failure.
The standard criteria for a process
that identifies failure modes include the following:
·
Identifying all reasonably probable failure modes that can cause each
functional failure.
·
Employing a method to determine what constitutes a reasonably probable
failure mode, which must be acceptable to the owner or user of the marine
asset.
·
Identifying failure modes at a level of causation that enables the
selection of an appropriate failure management policy.
·
Including in the list of failure modes those that have occurred
before, those currently being prevented by existing maintenance programs, and
those that have not yet happened but are considered reasonably likely
(credible) within the operating context.
·
Incorporating in the list of failure modes any event or process likely
to cause a functional failure, such as deterioration, human error by operators
or maintainers, and design defects.
As the most comprehensive analytical process for
developing maintenance programs and managing physical assets, RCM is
well-suited to identify every reasonably likely failure mode in marine
engineering applications. By thoroughly examining these failure modes, vessel
operators can optimise maintenance strategies, improve system reliability, and
enhance overall performance.
2.5.1.iv FAILURE
EFFECTS - What happens when a failure
occurs?
Understanding what happens when failures occur, known as failure
effects, is crucial. The
criteria for identifying failure effects include the following:
•
Describing failure effects as what would happen if no specific task
were carried out to anticipate, prevent, or detect the failure. Failure effects
should encompass all necessary information to evaluate the consequences of the
failure, such as:
o
The evidence (if any) that the failure has occurred (for
hidden functions, consider the consequences of multiple failures occurring).
o
The ways (if any) in which the failure could affect human safety, such as
causing injury or death, or adversely affect the environment.
o
The adverse effects (if any) on vessel performance or operations.
o
The physical damage (if any) resulting from the failure.
o
The actions (if any) required to restore the system's function after
the failure.
FMECA typically characterises failure effects by examining their impacts
at the local level, subsystem level, and system level. Furthermore, it
addresses the necessary actions to restore the system's functionality
following a failure.
2.5.1.v FAILURE
CONSEQUENCES - What might the
consequence be when the failure occurs?
Understanding the significance of each failure (failure consequences) is crucial for effective maintenance planning. The standard's criteria for a process that identifies failure consequences are:
·
Assessing failure consequences as if no specific task is currently
being performed to anticipate, prevent, or detect the failure.
·
Formally categorising the consequences of every failure mode:
o
Separating hidden failure modes from evident failure modes in the
categorisation process.
o
Clearly distinguishing events (failure modes and multiple failures)
with safety and environmental consequences from those that only have economic
consequences, such as operational and non-operational consequences.
RCM evaluates failure consequences on the assumption that
no preventive measures are in place. Some may be tempted to argue
that a failure does not matter because a specific action "always"
protects against it. RCM instead evaluates the effectiveness of that assumed
protective action and requires the effort it takes to be justified. Furthermore,
it systematically categorises failure consequences by assigning each failure
mode to one of four groups: hidden, evident safety/environmental, evident
operational, and evident non-operational.
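The four-way categorisation can be expressed as a short decision function. This is an illustrative sketch only: the function name and the three yes/no inputs are assumptions made for the example, and a real analysis would record the reasoning behind each answer.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    SAFETY_ENVIRONMENTAL = "evident safety/environmental"
    OPERATIONAL = "evident operational"
    NON_OPERATIONAL = "evident non-operational"


def categorise(evident: bool,
               safety_or_environment: bool,
               affects_operations: bool) -> Consequence:
    """Assign a failure mode to one of the four consequence groups.

    Consequences are judged as if no task were in place to anticipate,
    prevent, or detect the failure.
    """
    if not evident:
        return Consequence.HIDDEN
    if safety_or_environment:
        return Consequence.SAFETY_ENVIRONMENTAL
    if affects_operations:
        return Consequence.OPERATIONAL
    return Consequence.NON_OPERATIONAL


# Hypothetical failure modes for illustration.
print(categorise(evident=False, safety_or_environment=True, affects_operations=False).value)
print(categorise(evident=True, safety_or_environment=False, affects_operations=True).value)
```

Hidden failures are separated first, so the safety and operational questions are only asked of evident failure modes, mirroring the order in the text.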
2.5.1.vi EQUIPMENT
FAILURE - What can be done to detect
and prevent failure?
A loss of system function in marine engineering
systems can result from equipment failures and/or human errors. Equipment
failure can typically be attributed to the following factors:
·
Design error
·
Faulty material
·
Improper fabrication and construction
·
Improper operation
·
Inadequate maintenance
·
Maintenance errors
2.5.1.1 EQUIPMENT
FAILURE RATE AND PATTERNS
Developing
an effective failure management strategy requires understanding the failure
mechanism. Equipment may exhibit various failure modes (i.e., ways in which the
equipment fails). Furthermore, each failure mode's failure mechanism might vary
throughout the equipment's lifespan.
Depending
on the dominant system failure mechanisms, system operation, operating
environment, and maintenance, specific equipment failure modes exhibit diverse
failure rates and patterns. Failure rate statistics are expressed in terms of
operating time, or another pertinent parameter, accumulated before equipment
failure. Failure density distributions are often used to predict the probability
that an item fails by a given operating time.
A typical
failure distribution used to model equipment failures is the Weibull
distribution, whose shape parameter indicates whether the failure rate is
decreasing, constant, or increasing with age. It is often employed when
equipment exhibits a constant failure rate for part of its life, followed by
an increasing failure rate due to wear-out. Weibull
analysis is also used when failure data are limited. For example, a
Weibull plot can help determine whether a failure is due to infant mortality,
random failure, early wear-out, or wear-out, which in turn helps determine an
appropriate maintenance strategy.
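As a rough illustration of how a fitted Weibull distribution points to a failure pattern, the sketch below evaluates the two-parameter Weibull failure rate h(t) = (β/η)(t/η)^(β-1) and reads the pattern from the shape parameter β. The function names, the tolerance band around β = 1, and the example values (η = 8000 hours) are assumptions chosen for the example, not figures from any data set.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1).

    beta is the shape parameter and eta the scale parameter (characteristic
    life), in the same units as t.
    """
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def failure_characteristic(beta: float, tolerance: float = 0.05) -> str:
    """Read the failure pattern from the shape parameter.

    The tolerance band around beta = 1 is an illustrative assumption.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if beta < 1.0 - tolerance:
        return "wear-in"    # decreasing failure rate (infant mortality)
    if beta <= 1.0 + tolerance:
        return "random"     # roughly constant failure rate
    return "wear-out"       # increasing failure rate


# Hypothetical shape parameters with a characteristic life of 8000 hours.
for beta in (0.6, 1.0, 2.5):
    early = weibull_hazard(500.0, beta, eta=8000.0)
    late = weibull_hazard(6000.0, beta, eta=8000.0)
    print(f"beta={beta}: {failure_characteristic(beta)}, "
          f"h(500 h)={early:.2e}/h, h(6000 h)={late:.2e}/h")
```

Comparing the early and late failure rates for each β shows the falling, flat, and rising patterns directly.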
Mean
Time to Failure (MTTF) is another standard statistical measure. MTTF represents
the average life to failure for a specific equipment failure mode, helping to
determine when to perform specific maintenance tasks. For example, MTTF data
can help establish the rebuilding task interval if an equipment item requires
rebuilding.
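MTTF can be estimated in more than one way. The sketch below shows two, under stated assumptions: a plain average of observed times to failure (which assumes every item was run to failure, with no censored data) and the closed-form mean of a two-parameter Weibull distribution, η·Γ(1 + 1/β). The function names and the sample hours are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def mttf_from_sample(hours_to_failure: Sequence[float]) -> float:
    """Average life to failure for one failure mode.

    Assumes complete data: every item in the sample actually failed.
    """
    if not hours_to_failure:
        raise ValueError("at least one time to failure is required")
    return mean(hours_to_failure)


def mttf_weibull(beta: float, eta: float) -> float:
    """Mean of a two-parameter Weibull distribution: eta * Gamma(1 + 1/beta)."""
    if beta <= 0 or eta <= 0:
        raise ValueError("beta and eta must be positive")
    return eta * math.gamma(1.0 + 1.0 / beta)


# Hypothetical times to failure, in running hours, for one failure mode.
observed = [7200.0, 8100.0, 8800.0, 9400.0]
print(f"Sample MTTF: {mttf_from_sample(observed):.0f} h")
print(f"Weibull MTTF (beta=2.0, eta=9000 h): {mttf_weibull(2.0, 9000.0):.0f} h")
```

Note that MTTF is an average: for a wear-out failure mode, a rebuild interval set equal to the MTTF would still allow a large share of items to fail before the task, so the interval is normally set shorter.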
Understanding
that equipment failure modes can exhibit different failure patterns has important
implications for determining appropriate maintenance strategies. For most
equipment failure modes, the specific failure pattern is unknown, but it is
not needed to make maintenance decisions. Instead, certain failure
characteristic information is needed to make maintenance decisions:
•
Wear-in failure – characterised by "weak" members related to
manufacturing defects and installation/maintenance/startup errors, also known
as "burn-in" or "infant mortality" failures.
•
Random failure – dominated by chance failures caused by sudden
stresses, extreme conditions, random human errors, etc. (unpredictable by
time).
•
Wear-out failure – dominated by end-of-useful life issues for
equipment.
Identifying
which of the three equipment failure characteristics represents the equipment
failure mode helps determine the proper maintenance strategy. For example, rebuilding
or replacing the equipment item may be appropriate if an equipment failure mode
exhibits a wear-out pattern. However, replacing or rebuilding the equipment
item may not be advisable if an equipment failure mode is characterised by
wear-in failure.
Lastly,
a basic understanding of the failure rate helps determine whether maintenance
or equipment redesign is necessary. For example, equipment failure modes with
high failure rates (e.g., frequent failures) are often best addressed by
redesign rather than more frequent maintenance.
2.5.1.2
FAILURE MANAGEMENT STRATEGY.
Understanding failure rates and characteristics is
crucial for determining an appropriate strategy to manage failure modes (RCM
refers to this as the failure management strategy). Developing and utilising
this understanding is fundamental to RCM and vital for enhancing equipment
reliability. For instance, it is no longer considered accurate that the more an
item is overhauled, the less likely it is to fail. Unless a dominant
age-related failure mode exists, age limits do little to improve the reliability
of complex items. Sometimes, scheduled overhauls can increase failure rates by
introducing infant mortality and/or human errors into otherwise stable systems.
In RCM, the failure management strategy can comprise the following:
•
Appropriate proactive maintenance tasks,
•
Equipment redesigns or modifications, or
•
Other operational improvements.
The proactive maintenance tasks in the failure
management strategy aim to (1) prevent failures before they occur or (2)
detect the onset of failures in sufficient time so that the failure can
be managed before it occurs. Equipment redesigns, modifications, and
operational improvements (RCM refers to these as one-time changes) attempt to
enhance equipment with high failure rates or for which proactive maintenance is
ineffective/inefficient.
The key issues in determining whether a specific
failure management strategy is effective are:
•
Is the failure management strategy technically feasible?
•
Is an acceptable level of risk achieved when the failure management
strategy is implemented?
•
Is the failure management strategy cost-effective?
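These three tests can be applied together as a simple screen. The sketch below is one possible reading, not a prescribed method: the class and field names are hypothetical, risk is expressed as a single number on whatever scale the operator uses, and "cost-effective" is assumed to mean that the strategy costs no more per year than the failures it is expected to avoid.

```python
from dataclasses import dataclass


@dataclass
class StrategyAssessment:
    technically_feasible: bool
    residual_risk: float            # risk remaining with the strategy in place
    acceptable_risk: float          # limit set by the owner or operator
    annual_cost: float              # cost of carrying out the strategy
    annual_failure_cost_avoided: float

    def is_effective(self) -> bool:
        """True only if all three key questions are answered 'yes'."""
        risk_ok = self.residual_risk <= self.acceptable_risk
        cost_ok = self.annual_cost <= self.annual_failure_cost_avoided
        return self.technically_feasible and risk_ok and cost_ok


# Hypothetical figures for a condition-monitoring task on a pump bearing.
assessment = StrategyAssessment(
    technically_feasible=True,
    residual_risk=2.0,
    acceptable_risk=4.0,
    annual_cost=1500.0,
    annual_failure_cost_avoided=12000.0,
)
print("Strategy effective:", assessment.is_effective())
```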
In addition to proactive maintenance tasks and
one-time changes, servicing tasks and routine inspections may be essential to
the failure management strategy. These activities help ensure that the
equipment failure rate and failure characteristics are as expected. For
example, the failure rate and failure pattern for a bearing drastically change
if it is not adequately lubricated.
2.5.1.3 PROACTIVE
MAINTENANCE TASKS
Proactive maintenance tasks can be divided into
four categories:
I.
PLANNED MAINTENANCE TASKS
Planned maintenance tasks (sometimes called
preventative maintenance) are performed at specified intervals, regardless of
the equipment's condition. The purpose of these tasks is to prevent functional
failure before it occurs. They are often applied when no condition-monitoring
task is identified or justified, and a wear-out region characterises the
failure mode. RCM further divides planned maintenance into two subcategories:
•
Restoration task: A scheduled task performed at or before a predetermined interval (age
limit) to restore an item's capability, providing an acceptable probability of
functioning until the end of another specified interval. For instance,
rebuilding fuel injectors in a diesel engine can be a restoration task.
•
Discard task: A scheduled task carried out at or before a specified age limit that
requires disposing of an item, regardless of its condition. It is important to
note that "restoration" and "discard" can apply to the same
task. For example, when replacing a diesel engine's cylinder liners with new
ones at fixed intervals, the task can be described as a scheduled discard of
the cylinder liner or a scheduled restoration of the diesel engine.
II.
CONDITION-MONITORING TASKS
A condition-monitoring task is a scheduled task
used to detect the onset of a failure so that action can be taken to prevent
the functional failure. A potential failure is an identifiable condition
indicating that a functional failure is imminent or in progress.
Condition-monitoring tasks should only be chosen when a detectable potential
failure condition will exist before failure. When choosing maintenance tasks,
condition-monitoring tasks should be considered first unless an observable
potential failure condition cannot be identified. Condition-monitoring tasks
are also referred to as "predictive maintenance." Section 4 provides
additional details.
III. COMBINATION OF
TASKS
When neither condition-monitoring nor planned
maintenance tasks alone seem capable of reducing the risks of the functional
failure of the equipment, it may be necessary to select a combination of both
maintenance tasks. This approach is usually used when the condition-monitoring
or planned maintenance task is insufficient to achieve an acceptable risk.
IV. FAILURE-FINDING
TASKS
A failure-finding task is scheduled to detect
hidden failures when no condition-monitoring or planned maintenance task is
applicable. It is a scheduled function check to determine whether an item will
perform its required function if called upon. Most of these items are standby
or protective equipment. An example would be checking the safety valve on a boiler.
A failure-finding task is designed to identify
whether a specific hidden failure has taken place, and is typically applied
to protective devices that may fail without warning. While failure-finding
tasks share the scheduling aspect with proactive tasks, they neither predict
nor prevent failures. Instead, they aim to detect failures that have already
occurred, to minimise the possibility of multiple failures or the failure of
a protected function when a protective device is already in a failed state.
These tasks therefore bridge the gap between the sixth question (proactive
tasks) and the seventh (default actions, or measures implemented when
proactive tasks are absent).
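How often a failure-finding task should be done is not covered above. One commonly quoted rule of thumb, shown here purely as an illustration, sets the interval to twice the tolerable unavailability of the protective device multiplied by its mean time between failures. This linear approximation assumes random failures of the device and a small target unavailability (a few percent at most); the function names and figures below are hypothetical.

```python
def failure_finding_interval(target_unavailability: float, mtbf_hours: float) -> float:
    """Approximate check interval: 2 x tolerable unavailability x device MTBF.

    A linear approximation, reasonable only for small unavailability and
    randomly occurring failures of the protective device.
    """
    if not 0.0 < target_unavailability < 1.0:
        raise ValueError("target_unavailability must be between 0 and 1")
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return 2.0 * target_unavailability * mtbf_hours


def average_unavailability(interval_hours: float, mtbf_hours: float) -> float:
    """Inverse of the rule above: unavailability implied by a check interval."""
    if interval_hours <= 0 or mtbf_hours <= 0:
        raise ValueError("interval_hours and mtbf_hours must be positive")
    return interval_hours / (2.0 * mtbf_hours)


# Hypothetical boiler safety valve: MTBF of 100,000 h, 1% tolerable unavailability.
interval = failure_finding_interval(0.01, 100_000.0)
print(f"Check interval: {interval:.0f} h")
print(f"Implied unavailability: {average_unavailability(interval, 100_000.0):.3f}")
```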
2.5.1.4 RUN-TO-FAILURE
Run-to-failure is a failure management strategy
that allows an equipment item to operate until failure occurs, at which point a
repair is made. This maintenance strategy is acceptable only if the risk of
failure is acceptable without any proactive maintenance tasks. An example would
be allowing a local pressure gauge to fail on a cooling water line equipped
with a remote-reading pressure gauge.
Before accepting a run-to-failure decision for an
asset, the following criteria should be
met:
•
For hidden failures with no appropriate scheduled task, the associated
multiple failures must not have safety or environmental consequences.
•
For evident failures with no suitable scheduled task, the related failure
mode must not pose a safety or environmental risk. In other words, the
process should not allow users to opt for a "run to failure" strategy
if the failure mode or (in the case of a hidden failure) the corresponding
multiple failures have safety or environmental implications.
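The two criteria can be written as a single guard that a run-to-failure proposal must pass. This is a minimal sketch: the function and parameter names are hypothetical, and the `risk_acceptable` flag stands in for whatever risk assessment the operator applies.

```python
def run_to_failure_permitted(hidden: bool,
                             failure_has_safety_env_consequence: bool,
                             multiple_failure_has_safety_env_consequence: bool = False,
                             risk_acceptable: bool = True) -> bool:
    """Return True if a run-to-failure decision passes the criteria above.

    For hidden failures the test is applied to the associated multiple
    failure; for evident failures it is applied to the failure mode itself.
    """
    if not risk_acceptable:
        return False
    if hidden:
        return not multiple_failure_has_safety_env_consequence
    return not failure_has_safety_env_consequence


# Local pressure gauge backed up by a remote-reading gauge: evident, no safety impact.
print(run_to_failure_permitted(hidden=False, failure_has_safety_env_consequence=False))

# Hypothetical protective device whose multiple failure could cause injury.
print(run_to_failure_permitted(hidden=True,
                               failure_has_safety_env_consequence=False,
                               multiple_failure_has_safety_env_consequence=True))
```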
These continuous improvement programs encompass valuable modifications
that can enhance plant performance. In numerous cases, these adjustments are
shared across all methods. However, some alterations are exclusive to a single
approach. The primary shortcomings of these strategies include the following:
•
Insufficient emphasis on or incorporation of effective culture change,
such as change management processes
•
Absence of a comprehensive approach, with each method concentrating on
a single function or activity within the plant
•
The necessity for a permanent organisational structure to oversee the
efforts
2.5.1.vii DEFAULT ACTIONS - ONE-TIME CHANGES
What should be done if a suitable proactive task
cannot be found?
What actions should be taken if no appropriate
proactive task can be identified (default actions)? This question relates to
unscheduled failure management policies, which involve deciding whether to
allow an asset to run until failure or to alter some aspect of the asset's
operating context, such as its design or operation method.
One-time changes are employed to reduce the
failure rate or manage failures when appropriate proactive maintenance tasks
are not identified or cannot effectively and efficiently manage the risk. The
primary purpose of a one-time change is to modify the failure rate or failure
pattern through:
•
Equipment redesigns or modifications and/or
•
Operational improvements.
One-time changes most effectively address
equipment failure modes resulting from the following:
•
Faulty design and/or material
•
Improper fabrication and/or construction
•
Misoperation
•
Maintenance errors
These failure mechanisms often lead to a wear-in
failure characteristic, thus requiring a one-time change.
When no maintenance strategy can be found that is
both applicable and effective in detecting or preventing failure, a one-time
change should be considered. A one-time change is mandatory for failure modes
with the highest risk. The following briefly describes each type of one-time
change:
•
Equipment redesign or modifications: Redesign or modifications involve
physical changes to the equipment or system. An example would be adding drain
valves to appropriate lengths of piping on a tanker's deck cargo piping to
prevent freezing and damage to the piping during vessel transits in freezing
temperatures.
•
Operational improvements: Operational improvements may include
modifications to the operation of the equipment and/or modifications to how
maintenance is performed on the equipment. Operational improvements typically
involve changing the operating context and procedures, providing additional
training to the operator or maintainer, or any combination thereof. For
example, in the case of a main propulsion engine with a non-continuous rating
nameplate, the engine could be operated at a lower output closer to its
continuous rating to reduce downtime for maintenance. (However, this action may
cause the vessel to be unable to meet its schedules.)
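Taken together, the task types and default actions described in this section suggest an order of preference. The sketch below is a simplified reading of that order, not a reproduction of any published RCM decision diagram: the task labels, function name, and inputs are assumptions, and `effective_tasks` is taken to hold only those tasks already judged applicable and able to achieve an acceptable risk.

```python
from typing import Iterable

# Order of preference as read from the text: condition monitoring is
# considered first, failure-finding only applies to hidden failures.
TASK_ORDER = (
    "condition-monitoring",
    "planned maintenance",
    "combination of tasks",
    "failure-finding",
)


def select_policy(effective_tasks: Iterable[str],
                  hidden: bool,
                  run_to_failure_acceptable: bool,
                  highest_risk: bool = False) -> str:
    """Pick a failure management policy for one failure mode."""
    candidates = set(effective_tasks)
    unknown = candidates - set(TASK_ORDER)
    if unknown:
        raise ValueError(f"unknown task types: {sorted(unknown)}")
    for task in TASK_ORDER:
        if task == "failure-finding" and not hidden:
            continue  # failure-finding is only meaningful for hidden failures
        if task in candidates:
            return task
    # No suitable task: fall back to the default actions.
    if run_to_failure_acceptable and not highest_risk:
        return "run-to-failure"
    return "one-time change"


# Hypothetical failure modes for illustration.
print(select_policy({"planned maintenance", "condition-monitoring"}, False, False))
print(select_policy([], hidden=False, run_to_failure_acceptable=False))
```

Treating the highest-risk failure modes as always requiring a one-time change when no task is found reflects the statement above that such a change is mandatory in that case.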
2.5.1.5 SERVICING
AND ROUTINE INSPECTION
These tasks are designed to (1) ensure that the
failure rate and failure pattern remain as predicted by performing routine
servicing (e.g., lubrication) and (2) identify accidental damage and/or issues
resulting from ignorance or negligence. In addition, they provide an
opportunity to confirm that the overall maintenance standards are satisfactory.
These tasks are not based on any explicit potential failure condition.
Servicing and routine inspection may also be applied to items with minor
failure consequences that should not be overlooked (such as minor leaks, drips,
etc.).