LinkedIn

Showing posts with label SHIPPING. Show all posts
Showing posts with label SHIPPING. Show all posts

Monday, January 6, 2025

Maintenance-Repairs & Spare Parts (Stores) Management

 Effective Maintenance, Repair, and Operations (MRO) management is essential for optimizing vessel performance. Proper MRO management ensures inventory accuracy, quick access to quality spare parts, and seamless integration between the technical department, onboard engineers, purchasers, and supervisors. By addressing challenges, adopting best practices, and leveraging technology, vessel operators can reduce costs, improve efficiency, and minimize downtime.


Objectives of Inventory Management

The primary goals of inventory management include:

  • Reducing operational cycle time by optimizing lead times, repair, transportation, and delivery processes.
  • Lowering inventory costs through efficient stock control, reducing carrying costs, and minimizing expedited freight expenses.
  • Improving inventory accuracy ensures spare parts are readily available when and where needed.

Types of Inventory

Vessel inventory falls into three main categories:

  1. MRO Supplies: Spare parts and consumables supporting maintenance operations.
  2. Hardware: Tools, fasteners, vendor-managed stock, and consumables.
  3. Facility Supplies: Office equipment and janitorial essentials.

Efficient MRO management helps minimize costs, enhance operational reliability, and reduce downtime.


Best Practices for Inventory Management

  1. Preventive Maintenance for Stored Spare Parts: Regularly maintain critical components like gaskets, belts, and O-rings to ensure readiness.
  2. Vendor-Managed Inventory (VMI): Partner with suppliers to manage stock levels effectively.
  3. Cycle Counting: Use methods like ABC classification to maintain inventory accuracy.
  4. Obsolete Inventory Removal: Conduct monthly reviews to eliminate non-essential items.
  5. Salvage Operations: Reuse or recycle obsolete or scrapped materials.
  6. Streamlined Storeroom Layouts: Optimize layouts for efficient access and storage.
  7. FIFO Stocking: Use First-In, First-Out methods to avoid expired inventory.
  8. Defined Receiving Processes: Inspect and document incoming spare parts for quality control.
  9. Inventory Dashboards: Track key performance indicators (KPIs) for real-time insights.
  10. Secure Storage: Maintain physical security to protect inventory.

 Barriers to Effective Inventory Management

Key challenges include:

  • Inconsistent inventory classification methods.
  • Lack of standardized workflows and KPIs.
  • Poor coordination between departments.
  • Unaddressed obsolete materials in Computerized Maintenance Management Systems (CMMS).
  • Inefficient purchasing practices.

Addressing these barriers requires robust workflows, better inter-departmental communication, and modern inventory management tools.


Inventory Control Techniques

Effective control minimizes downtime, stockouts, and overstocking by:

  • Maintaining optimal inventory levels based on usage and historical data.
  • Securing inventory locations to prevent theft and damage.
  • Leveraging forecasting tools for better stock management.

Replacement Asset Value (RAV)

The Replacement Asset Value (RAV) of a vessel influences inventory decisions. Best practices suggest maintaining inventory levels at 0.50% to 0.75% of the RAV to balance operational readiness and cost-effectiveness.


Risk Management

Stocking critical parts requires:

  • Conducting Failure Modes and Effects Analysis (FMEA).
  • Evaluating supplier reliability and part availability.
  • Implementing preventive maintenance programs for high-risk components.

Loss Prevention in Inventory Management

To prevent losses due to theft, damage, or mismanagement:

  • Implement quality checks during receiving.
  • Enforce strict security measures.
  • Regularly audit inventory accuracy.
  • Maintain appropriate storage conditions for sensitive items.

Criticality Analysis

Using the equipment bill of materials (BOM), prioritize spare parts by their impact on operations. Components like bearings, electronic boards, and hazardous materials require special attention to meet safety and operational standards.


Efficient Stocking Levels

Define service levels for critical spares (100% availability), insurance spares (98%), and standard components (90–95%). Supplier accountability and minimum/maximum level reviews are essential for maintaining efficiency.


Carrying Costs

Carrying costs encompass interest, insurance, taxes, and storage expenses. Reducing slow-moving or obsolete inventory significantly lowers these costs while improving operational efficiency.


Role of a Storeroom Supervisor

Supervisors play a pivotal role by:

  • Ensuring storeroom cleanliness and organization.
  • Coordinating with maintenance for parts planning.
  • Monitoring inventory levels and maintaining KPIs.

Leveraging Technology for MRO Management

  1. Barcoding Systems: Enable real-time inventory tracking, reduce errors, and streamline processes.
  2. Machine Learning: Improve accuracy in demand forecasting and inventory classification.
  3. Computerized Maintenance Management Systems (CMMS): Enhance EAM integration for predictive maintenance and inventory control.

Storeroom Optimization Techniques

  1. Adopting 5S Practices: Sort, systemize, standardize, shine, and sustain for streamlined operations.
  2. Satellite Stores: Use strategically placed satellite storerooms to reduce downtime.
  3. Advanced Storage Solutions: To better utilize space, employ demand flow racks, sliding shelving, and carousel systems.

Continuous Improvement and KPIs

Measure KPIs such as inventory accuracy, stock-out rates, and carrying costs regularly to identify areas for improvement. Then, use these metrics to implement changes that enhance efficiency and reliability.


Conclusion

Adequate MRO inventory and spare parts management are foundational to vessel performance and cost control. By leveraging best practices, adopting modern technologies, and fostering cross-departmental collaboration, vessel operators can ensure optimal performance, reduced costs, and enhanced operational reliability.

Frequently Asked Questions (FAQs)

What is MRO inventory management in vessel operations?
MRO (Maintenance, Repair, and Operations) inventory management involves maintaining and controlling spare parts, tools, and supplies required for vessel maintenance and operations. Effective MRO management ensures vessel reliability, minimizes downtime, and optimizes costs.

How does preventive maintenance help in inventory management?
Preventive maintenance ensures that stored spare parts, such as gaskets, belts, and O-rings, remain in optimal condition. It reduces the risk of part failure during usage and helps maintain readiness for planned or emergency repairs.

What are the key benefits of using barcoding for inventory management?
Barcoding improves accuracy in tracking inventory, streamlines the receiving and issuing process, and enhances employee productivity. It reduces errors associated with manual data entry and provides real-time updates to inventory systems.

Why is the Replacement Asset Value (RAV) important in inventory planning?
RAV provides a baseline for determining the optimal value of MRO inventory to stock. Best practices recommend maintaining inventory levels at 0.50% to 0.75% of the RAV to balance availability and cost.

What role does a storeroom supervisor play in vessel management?
The storeroom supervisor ensures that inventory is organized, accessible, and maintained to support maintenance activities. They also coordinate with purchasing, planning, and technical teams to optimize inventory levels and improve operational efficiency.

How can technology improve MRO inventory management?
Technologies like barcoding, machine learning, and Computerized Maintenance Management Systems (CMMS) enhance accuracy, enable predictive maintenance, and streamline workflows. These tools help in real-time inventory tracking, demand forecasting, and efficient resource allocation.

Disclaimer:

Out of Box Maritime Thinker © by Narenta Gestio Consilium Group 2022 and Aleksandar Pudar assumes no responsibility or liability for any errors or omissions in the content of this paper. The information in this paper is provided on an "as is" basis with no guarantees of completeness, accuracy, usefulness, or timeliness or of the results obtained from using this information. The ideas and strategies should never be used without first assessing your company's situation or system or consulting a consultancy professional. The content of this paper is intended to be used and must be used for informational purposes only.

Wednesday, December 18, 2024

2.8 TOTAL PRODUCTIVE MAINTENANCE ON VESSELS

 

TPM assigns preventive and routine maintenance responsibility to those who operate the individual equipment. In the maritime context, autonomous maintenance by operators is essential due to the decreasing number of crew on board; this creates a sense of ownership of the equipment in the operator, and the best way to implement it would be to have dual competency marine officers (MAROF) on fixed rotation, removing the concept of separate navigation and engineer officers. MAROFs could prevent equipment deterioration through correct operations and daily checks, bring the equipment to its ideal state through restoration and proper management, and establish the conditions needed to keep the equipment well-maintained. Furthermore, riding teams of repair fitters and technicians could undertake major maintenance as and when required under the supervision of MAROFs; this would optimise the use of skilled officers on board and improve their job satisfaction.

Training is another aspect that needs to be focused on to ensure adequate maintenance management in the marine industry. While maintenance is an essential shipboard activity, there is no training imparted in maintenance management either as a subject or as part of one mandatory course. Most countries do not examine candidates for this topic in their competency exams either. However, IMO has developed a model course on maintenance planning and execution, which includes maintenance planning and control. While RCM is not mentioned, it has a similar theme to the extent that the different failure profiles noted in the aviation industry are also shown; this is another indication of the relevance of RCM in the maritime context. Private maritime training institutes could deliver this training, but they are commercial ventures and must generate profit to sustain themselves. Maintenance management could be a topic for competency exams or a mandatory course to create awareness.

 

In the most basic sense, the three goals of TPM are zero unplanned failures (no small stops or slow running), zero product defects, and zero accidents. To achieve these goals, TPM must be applied in the proper amounts and situations and be integrated with the manufacturing system and other improvement initiatives. Therefore, adequate implementation of TPM will not only improve the capability of the equipment but also make the maintenance practice more efficient, making the business more profitable and competitive.

TPM aims to cultivate a sense of ownership in the equipment operator, which is essential for the success of any maintenance program. A comparison can be made between how RCM and TPM deal with the issue of variation in failure intervals. RCM advocates for the use of condition-based maintenance whenever possible and feasible to address this issue, while TPM attempts to stabilise failure intervals by:

1)       establishing basic conditions through cleaning, lubricating, and tightening;

2)       identifying abnormalities and restoring deterioration;

3)       clarifying operating conditions and complying with conditions of use;

4)       eliminating or controlling major contamination sources;

5)       establishing daily checking and lubricating standards; and

6)       Introducing extensive visual control.

These recommendations are highly relevant in the shipping industry, and TPM can facilitate RCM implementation. To a considerable extent, ships' formal safety assessment (FSA) has a similar approach to RCM. The difference could be that FSA considers all hazards, while RCM mainly concerns functional failures. Nevertheless, FSA is a methodology successfully approved for rule-making purposes by the IMO and provides insights on how RCM should be applied in ship operations.


2.8.1 TOTAL PRODUCTIVE MAINTENANCE CONCEPTS

The concepts of TPM can be grouped into three categories: autonomous maintenance, planned maintenance, and maintenance reduction.

2.8.1.1 AUTONOMOUS MAINTENANCE

Autonomous maintenance in the marine industry involves using the crew to perform routine maintenance tasks on the vessel's equipment, including daily cleaning, inspecting, tightening, and lubricating. The crew is familiar with the equipment and can quickly detect any anomalies. Effective implementation of autonomous maintenance requires education and training of manufacturing and maintenance staff, crew, and management on the concepts of TPM, the benefits of autonomous maintenance, and special safety awareness training for equipment operators.

Visual controls can simplify inspection tasks and minimise training. Equipment is labelled to make identifying normal versus abnormal conditions more manageable, and inspections are documented on check sheets, including a map of the area and the appropriate inspection route. The equipment operators also collect daily information on equipment health, including downtime, product quality, and any maintenance performed. This information helps identify signs that equipment is beginning to degrade and may need significant maintenance.

While implementing autonomous maintenance, addressing concerns such as asking equipment operators to assume additional responsibilities and maintenance staff to give up part of their responsibilities is essential. Management must communicate their support for the new maintenance approach and provide opportunities for the maintenance staff to assume new responsibilities. Ideally, the maintenance staff will be free from their daily firefighting activities and focus on planned maintenance, equipment analysis, and design activities. These changes in roles and responsibilities need to be developed.

 


2.8.1.2 PLANNED MAINTENANCE

By removing some of the routine maintenance tasks through autonomous maintenance, the maintenance staff can start working on proactive equipment maintenance. Planned maintenance activities (preventive maintenance) are scheduled to repair equipment and replace components before they break down; this requires the production schedule to accommodate planned downtime to perform equipment repairs and allow these repairs to be treated as a priority on

par with running the equipment to produce parts. The prevailing theory is that as the planned maintenance goes up, the unplanned maintenance (breakdowns) goes down, and the total maintenance costs go down.

After implementing TPM, the equipment is likely to receive better care. The vessel owner, operator, and crew should determine the appropriate amount of planned maintenance as a team based on the equipment's health and the type of vessel and equipment on board. Excessive maintenance can be as expensive as insufficient maintenance, and therefore, a balance point needs to be determined by a thorough analysis of the equipment.


A detailed understanding of the equipment, including the equipment component level, is required to perform planned maintenance properly. The process should start with the vessel and critical features and extend to the equipment, equipment processes, and process parameters. Once the technical department on shore and the engine department on board have identified the critical process parameters, they should validate them and determine the proper parameter settings, which can be accomplished through the design of experiments (DOE). These DOE can identify the process parameters that provide the most significant leverage for improving equipment performance linked to critical vessel features, and this process can be aided by the Original Equipment Manufacturer (OEM).

Planned maintenance uses data from process capability and machine capability studies to establish acceptable performance levels. Process capability studies evaluate the equipment's ability to operate without breakdowns, while machine capability studies analyse the equipment's ability to perform specific operations and compare the results to industry standards. Regularly conducting these studies can provide indicators that the equipment's performance is deteriorating and the vessel's performance is likely to decline or have a breakdown soon. This data can also be stored in a maintenance database to analyse similar equipment or equipment components for chronic issues, reducing the problems associated with making decisions with insufficient data.



2.8.1.3 MAINTENANCE REDUCTION

The final TPM concept involves equipment design and predictive maintenance, which aim to decrease the total amount of required maintenance. Collaborating with original equipment makers  (OEM) enables the integration of knowledge gained from equipment maintenance into the design of the next generation of equipment. This design for maintenance approach results in easier maintenance equipment, such as easy-to-reach lubrication points and access covers to inspection points. It can be immediately supported with autonomous maintenance. The supplier can also include the visual control markings and labels the customer uses. This collaboration can be used to establish equipment performance criteria, which can serve as an acceptance test for the equipment. Additionally, the supplier can provide data on their components to determine the required frequency of inspections and planned maintenance.

Another way to reduce the required maintenance is through special equipment analysis, which collects data to predict equipment failures. Examples of such analysis include thermography, ultrasound, and vibration analysis. Thermography detects equipment "hot spots" related to bearing wear, poor lubrication, or plugged coolant lines. Ultrasound analysis detects minute cracks in the equipment, and vibration analysis detects unusual equipment vibration. These equipment analyses can be performed periodically, and the frequency can be adjusted as historical data shows trends. They are also helpful in finding the causes of chronic problems that cannot be eliminated with the data collected by the operator's inspections and regular planned maintenance.

2.8.1.4 EQUIPMENT EFFECTIVENESS

The term equipment effectiveness is commonly used to refer only to the availability or up-time of the equipment, which is the percentage of time it is up and operating. However, the actual effectiveness of equipment depends not only on its availability but also on its performance and rate of quality. A key objective of TPM is to maximise equipment effectiveness by reducing waste in the manufacturing process. The three factors that determine equipment effectiveness are equipment availability, performance efficiency, and quality rate, and these factors are used to calculate the overall equipment effectiveness (OEE) measure, which is discussed further in the TPM Metrics section.

 

2.8.1.5 EQUIPMENT AVAILABILITY

Equipment availability is crucial on a well-functioning vessel, but it may mean something other than the equipment must always be available. In a synchronised system, for example, having an auxiliary engine running when power is not needed results in fuel oil overconsumption. However, when there is a need to increase power, the equipment must be capable of satisfying the demand. Therefore, the equipment management program should balance the costs of maximising equipment utilisation with the costs of storing extra fuel and spares needed for running extra auxiliary engines.

Equipment availability is affected by both scheduled and unscheduled downtime. In a well-functioning system, unplanned downtime is minimised, while planned downtime is optimised based on need levels and the equipment's ability to change as per user needs. The appropriate need evel depends on how often the equipment is down for scheduled and unscheduled repairs.

Unexpected breakdowns are the most common cause of lost equipment availability, affecting crew maintenance times. Keeping backup systems available can minimise the effect of lost availability, but this is only sometimes cost-effective. Another drain on equipment availability is the time required to change over equipment to run. Performance efficiency is another factor that contributes to overall equipment effectiveness.

2.8.1.6 PERFORMANCE EFFICIENCY

Equipment efficiency is a commonly used metric when evaluating a propulsion system ( main engine) or any other system on board. It is typically maximised by running the equipment at its highest speed for as long as possible to increase performance. Conversely, efficiency is reduced by idling time (waiting for orders), minor stops, and lower throughput from running the equipment at a reduced speed. These efficiency losses can be caused by low operator skill, worn equipment, or poorly designed systems.

However, relying solely on equipment efficiency can lead to poor decision-making since the system may not benefit from the traditional goal of 100 per cent efficiency. The essential criteria are how many parts the equipment should be online, not how many can run at a breakneck pace. The target efficiency needs to consider what speed/output the equipment is designed to produce and what is needed as per requirements. When the equipment is up and running, it should be capable of running at its designed speed. However, when running is unnecessary, shut the equipment down and use this time to perform other tasks rather than slowing the equipment down to reduce throughput (e.g. the Auaxiliry engine). This occasional downtime can be valuable for performing autonomous maintenance, planned maintenance, and equipment analysis.

2.7.1.7 QUALITY RATE

An equipment/system on board's primary purpose is to provide propulsion to produce power, not just to keep equipment running or people busy. Therefore, the system only fulfills its purpose if the equipment is available and running at its designed speed but not performing as designed. In such cases, it is best to shut down the equipment to conserve energy and repair it if necessary. Quality losses can also result from long warm-up periods or waiting for other process parameters to stabilise, resulting in lost time, effort, and overconsumption.

Improving the quality rate should be linked to critical requirements. Having vessel performance on a perfect level or close to perfect in every aspect except the critical one that matters most to the customer is of little value. The concept of key characteristics helps align the essential features with the responsible equipment parameters. These are the parameters that need to be improved to maximise the overall system's benefit, and they should be measured when determining the quality rate of the equipment.

2.8.2 TPM METRICS

The additional data collection requirements in TPM are crucial for measuring progress and identifying opportunities for improvement. These metrics include equipment availability, performance efficiency, and quality rate, which calculate the equipment's overall effectiveness (OEE).

In addition to OEE, other metrics commonly used in TPM include the mean time between failures (MTBF), mean time to repair (MTTR), and the number of breakdowns. These metrics help identify chronic problems, evaluate the effectiveness of maintenance activities, and determine the root causes of equipment failures.

It is also essential to track the costs associated with maintenance activities, including planned maintenance, corrective maintenance, and autonomous maintenance. By analysing these costs, organisations can identify opportunities to reduce maintenance expenses and improve the overall efficiency of the maintenance program.

Finally, data on the frequency and severity of safety incidents should also be collected and analysed as part of the TPM program. This data can be used to identify safety risks and develop strategies to mitigate them, reducing the risk of injuries and accidents in the workplace.

2.8.2.1 OVERALL EQUIPMENT EFFECTIVENESS

The OEE is a widely used concept in TPM literature, and it is determined by multiplying the equipment availability, performance efficiency, and quality rate, as previously explained. The equipment operators collect the data required to calculate these values daily, including scheduled downtime, unscheduled downtime, and good or bad performance results. Implementing control charts on these values provides aggregate data that helps track changes in equipment performance. However, predefined thresholds should be established to determine when more detailed data collection is necessary to prevent catastrophic failure. A history of both OEE and more detailed data should be collected to identify undesirable events and their causes to set these thresholds. Although OEE provides a valuable benchmark for tracking TPM program progress, it does not provide enough detail to determine the root cause of equipment performance issues. Therefore, supplemental data is required to identify the cause of observed events.

2.8.2.1.1 SUPPLEMENTAL DATA COLLECTION

These methods are intended to provide more detailed information for problem-solving and decision-making than the aggregate measure of OEE.

One such method is statistical process control (SPC) data, which can be collected on critical product features to provide feedback to equipment operators on the repeatability of specific equipment operations. For example, if the process goes out of control, the SPC data should immediately alert the operator.

Another approach is to collect SPC data on critical process parameters such as feed, speed, temperature, and time. This requires first identifying the vital parameters and determining their optimal settings using design-of-experiments (DOE) techniques. Once established, the operators can collect data or continuously monitor the parameters' performance.

Due to limited data, special "short run" methods may be required for SPC equipment and process parameters analysis.

A process is to be considered

out of control when any of the following are detected in the control charts:

         One point is more than three standard deviations from the process mean

         Two out of three points are at least two standard deviations from the process mean

         Four out of five points are at least one standard deviation from the process mean

         Eight points in a row lie on the same side of the process, which means it is essential to balance providing too little or too much data for analysis.

The data should be easily monitored and analysed to provide benefits without becoming burdensome.

2.8.3 TPM IMPLEMENTATION

The implementation of TPM involves critical issues that must be considered. Based on the TPM concepts discussed earlier, the plan typically takes 3 to 5 years to complete. Below is a brief description of each TPM implementation activity.

2.8.3.1 MASTER PLAN

At this stage, the office technical team, management, and onboard engine team collaborate to determine the scope and focus of the TPM program. The equipment to be included in the program and their implementation sequence are selected. Baseline performance data is collected, and the program's objectives are established.

2.8.3.2 AUTONOMOUS MAINTENANCE

The office and onboard technical/engine team undergo training on TPM's methods, tools, and visual controls. The vessel engine crew cleans and inspects their equipment and performs maintenance work. The engine staff trains the operators ( on deck, bridge) on routine maintenance, and all parties collaborate in developing safety procedures. The vessel crew begins to collect data to assess equipment performance.


2.8.3.3 PLANNED MAINTENANCE

The engine crew, followed by the technical department on shore, analyses the data collected by equipment operators to determine maintenance requirements based on usage and need. If a system for tracking equipment performance metrics and maintenance activities is yet to be available, they create one. The maintenance schedules are integrated into the vessel schedule to avoid conflicts.

2.8.3.4 MAINTENANCE REDUCTION

The knowledge gained from collecting data and implementing TPM is shared with OEM, who incorporate this "design for maintenance" knowledge into the next generation of equipment designs. Additionally, the office and engine develop plans and schedules for periodic equipment analyses such as thermography, vibration and oil analysis. The results of this analysis are recorded in the maintenance database to create accurate estimates of equipment performance and repair requirements. These estimates are then used to develop spare part inventory policies and proactive replacement schedules to ensure that the necessary parts are available when needed.

2.8.3.5 HOLDING THE GAINS        

The organisation's standard operating procedures should include the new TPM practices. Integrating these practices and data collection activities with other production system elements is important to avoid duplication or conflicting requirements. The equipment management methods should be continuously improved to simplify tasks and reduce the effort required to maintain the TPM program.

2.8.3.5 SUMMARY

Total Productive Maintenance (TPM) is an approach to equipment management that emphasises collaboration and teamwork to enhance equipment effectiveness. Successful implementation requires shared responsibilities, full employee involvement, and natural work groups. The analogy to the average car owner can be used to explain the TPM approach. The owner (equipment operator) performs minor maintenance activities like checking the oil and tires and sometimes even gives the car a tune-up. However, if something major goes wrong, an expert auto mechanic (maintenance technician) is called to perform the complex tasks. However, most traditional organisations treat their equipment as a rental car, unlike the car owner who values and takes care of their vehicle.

While TPM is often implemented as a standalone improvement activity, it should be integrated with other vessel elements as an overall system for more significant benefits. For example, the synergy of inventory reduction, hardware variability control, and cycle time reduction can provide more significant benefits than the sum of their parts.

2.8.4 COMMON BARRIERS ENCOUNTERED

Several challenges and setbacks were encountered while implementing the TPM programs mentioned earlier. Although some were unique to specific vessel types, many barriers encountered were common to most TPM programs. The following briefly describes some of these challenges that affected the four sites mentioned earlier and other TPM programs.

2.8.4.1 STRATEGIC DIRECTION

One of the significant challenges to the success of TPM is the need for more strategic direction, which can be the most significant obstacle. Each organisation that decides to implement a TPM program creates a unique implementation plan. Consequently, each program is typically "owned" by one or two individuals within the organisation. If these individuals leave the group, the TPM program may experience a gradual decline in direction and support. Moreover, these programs hinder information, tools, and data sharing among organisations. If a comprehensive TPM strategy and approach were developed, some organisations might be required to backtrack to achieve commonality.

2.8.4.2 PRIORITY GIVEN TO TPM

TPM is often viewed as maintenance rather than a vessel operations issue in the marine industry; this is supported by the fact that most TPM personnel are either maintenance or training personnel. The separation between vessel operations and maintenance is often significant, with these groups rarely reporting to the same organisation until above the executive level. As a result, vessel operations personnel are rarely measured on vessel equipment performance, and the maintenance group is rarely measured on vessel operations, safety, or environmental performance. This separation eliminates any incentive for managers of both organisations to pool their resources to achieve a successful TPM program. The lack of TPM goals and metrics in business plans and performance plans for vessel operations and maintenance provides evidence of this barrier.

 

2.8.4.3 CONFLICTING PROCESSES

The vessel operators are often faced with implementing multiple projects and initiatives simultaneously, including new equipment installations and process improvement initiatives; this can often result in overwhelming workloads and a lack of resources to implement all projects effectively. Implementing additional initiatives, such as a TPM program, without proper documentation and prioritisation may lead to neglecting other important projects.

Vessel operators must prioritise their crew's workload and evaluate the potential for integrating multiple processes into one cohesive initiative. This can help avoid resource competition and ensure that all initiatives are effectively implemented over a longer timeframe. Therefore, it is crucial to evaluate the possible process integration that can be achieved to avoid overwhelming workloads and neglect of important projects.

2.8.4.4 DATA AVAILABILITY

The lack of reliable data for planning purposes has hindered several efforts to implement TPM in the marine industry. The existing data collection methods do not emphasise the benefits of accurately monitoring vessel performance. Without this data, it is challenging to determine the relationship between vessel performance, safety, and maintenance costs. In addition, the data being collected is often not used for any decision-making, so the quality of this data is never verified. The result is that some data is available, but it may be of poor quality and is not collected in a manner that allows easy analysis. Without reliable data, the organisation can not develop accurate prioritisation plans and quantify any of the benefits received from their TPM programs, such as increased safety and reduced maintenance costs.

2.8.5 SUCCESS FACTORS AND ENABLERS

It is not surprising that some of the factors contributing to successful TPM programs involve avoiding the barriers discussed earlier. However, additional activities are also common among the more successful programs. For this analysis, a successful TPM program is defined as one that has developed and followed through with implementation plans and has achieved the expected benefits. It is important to note that success is not solely measured in significant financial savings, as some programs may have less ambitious goals yet still be considered successful if they achieve their stated objectives.

2.8.5.1 MANAGEMENT SUPPORT

The successful implementation of TPM requires a highly supportive management team committed to the program; this means that management should do more than allow TPM to be implemented; they should be an integral part of the driving force behind it. Management activities include rewarding teams for proactive maintenance, revising business plans to include TPM goals, allowing vessel crews to attend training sessions, and communicating the TPM goals to the entire organisation. With management's full support, the TPM program can succeed even if the TPM coordinator or critical personnel leave the organisation.

2.8.5.2 FOCUSED APPROACH

In the marine industry, it is vital to prioritise the implementation of TPM practices to achieve the most significant benefits. While there may not be a universal method for prioritisation, developing a clear master plan is crucial, as it allows organisations to allocate resources efficiently and address the most pressing equipment issues. Successful TPM programs in the marine industry have had the support of a committed management team, who rewarded proactive maintenance and communicated TPM goals to the entire organisation.

2.8.5.3 OPERATOR ( vessel management team ) OWNERSHIP

It is also essential to involve the vessel's crew in implementing TPM. Crew members responsible for operating and maintaining the equipment on board should be trained in the methods and tools of TPM. They should be encouraged to take responsibility for cleaning, inspecting their equipment, and performing maintenance tasks. The office technical and vessel engine team should train all the crew members on performing maintenance, and all should be involved in developing safety procedures.

It is also important for the crew members to collect data to determine equipment performance. The crew members can provide valuable insights into the condition of the equipment, which can be used to determine maintenance requirements. The office technical and vessel engine team can use this data to develop accurate estimates of equipment performance and repair requirements.

For example, in manufacturing, the marine industry must prioritise the implementation of TPM based on the equipment and systems most critical to vessel operations; this requires a clear master plan for how the vessel will transition to TPM practices and a focused plan of attack.

Management also needs to assume a leadership role in TPM implementation while allowing the crew members to take a prominent role in its development and implementation. The crew members' involvement in implementing TPM enables them to invest some of their blood, sweat, and tears in the TPM plans, resulting in proactive equipment inspection and maintenance.

2.8.5.4 JUST-IN-TIME TRAINING

For the vessel management team, ensuring that training is delivered at the appropriate time is vital to be effective. Training that is provided too early or too late can be equally ineffective. Therefore, it is essential to time the training to deliver it just before the individual is required to apply the knowledge in their job. This approach, known as just-in-time (JIT) training, allows individuals to reinforce their learning by immediately applying it hands-on. JIT training also minimises the impact of training on the vessel's operations by delivering training in small doses over an extended time.

2.8.5.5 INTEGRATED PROCESSES AND SCHEDULES

Failing to integrate the various vessel operation and maintenance processes can create conflicts and inefficiencies. It is essential to have a cohesive plan that integrates the vessel's maintenance schedule with the schedule to avoid conflicts and prioritise resources; this includes incorporating scheduling activities and coordinating access to the vessel among various departments and teams involved in vessel operation, maintenance, testing, and research and development. Just-in-time training should also be considered to ensure crew members are trained using their new knowledge, reinforcing classroom learning with direct and immediate feedback.

2.8.6 BENEFITS OF TPM

When considering whether to implement a TPM program for ocean-going vessels, it is important to ask what benefits can be gained from such an initiative. The answer is simple: TPM can help reduce the operating costs of vessels; this is particularly true for vessels that rely on complex equipment and automated systems. However, the actual savings will depend on the current state of the vessel's system and the specific type of equipment used. For example, if the equipment is already performing well, focusing on other ways to improve the system may be more beneficial. Each vessel must be evaluated for its unique situation to determine whether TPM brings significant advantages.

It is important to note that TPM implementation does come with a cost, including training and changes to procedures and roles. However, the benefits of improved equipment reliability and utilisation, reduced equipment wear and tear, and decreased maintenance issues have been well-documented in many industries. For example, in the marine industry, increased equipment availability may also enable vessels to delay purchasing additional equipment to meet employment demands.

In the marine industry, increasing scheduled maintenance activities can also drastically reduce unscheduled maintenance and breakdown repairs, leading to cost savings. This is particularly important for vessels that operate in remote areas, where breakdowns can result in significant delays and additional costs. Preventive maintenance can also help extend the vessel's and equipment's lives, reducing the need for costly replacements. Additionally, improving equipment reliability and utilisation can increase the efficiency of operations, resulting in increased productivity and profitability.

Implementing TPM in the marine industry can also improve safety, which is vital in this field. Ensuring equipment is properly maintained and inspected reduces the risk of accidents and injuries. Furthermore, improving equipment performance and reliability can help reduce environmental risks and comply with regulatory requirements.

In summary, implementing TPM in the marine industry can lead to cost savings, improved efficiency and productivity, increased safety, and environmental compliance. However, like any other industry, the optimal amount of preventive maintenance must be determined through a cost-benefit analysis to balance excessive maintenance costs and breakdowns.

2.8.6.1 REDUCED VARIATION

In addition to the benefits mentioned above, TPM can help reduce energy costs and environmental impact. Properly maintained equipment running at optimal conditions typically uses less energy and produces less waste; this can lead to significant cost savings for the organisation and help reduce its carbon footprint. By implementing TPM practices such as energy audits and identifying opportunities for energy savings, organisations can reduce their energy consumption and improve their sustainability. Additionally, TPM can help improve workplace safety by identifying and eliminating potential hazards in the production process; this can lead to a safer work environment and reduce workers' compensation costs for the organisation. Overall, the benefits of TPM are numerous and can positively impact an organisation in many ways.

2.8.6.2 INCREASED VESSEL AVAILABILITY

Implementing TPM can also lead to increased vessel availability. By reducing unscheduled downtime and increasing equipment reliability, vessels can spend more time in operation and less time in repair or maintenance; this is particularly important for vessels that operate on tight schedules or in harsh environments where maintenance and repairs are more challenging to perform. An effective TPM program can also reduce setup and changeover times, allowing vessels to transition between tasks more quickly and efficiently. The result is increased productivity and value-added time for the vessel and crew. Companies that have successfully implemented TPM in the marine industry have seen increased vessel productivity, maintenance cost reductions, and customer satisfaction improvements due to increased vessel availability.

2.8.6.3 REDUCED MAINTENANCE COSTS

By shifting maintenance focus from reactive repairs to proactive improvement, organisations can reduce their overall maintenance costs. The traditional approach of fighting fires and handling unpredictable workloads requires extra staffing. However, by implementing scheduled maintenance events and autonomous maintenance under TPM, organisations can level-load the work across all staff members and free up maintenance staff to focus on proactive equipment improvements and analysis. This transition requires a management team that values improved maintenance's potential gains over reducing staff headcount. Moreover, TPM also contributes to reduced energy consumption since equipment operates more efficiently, spending less time idling. Though gains from energy conservation may not be staggering, they still contribute to reducing overall manufacturing costs. Successful companies have achieved significant benefits from TPM, including a reduction in maintenance spending.

It is important to note that the actual benefits of TPM will vary based on each organisation's specific circumstances and challenges. TPM is not a one-size-fits-all solution, requiring careful planning, implementation, and monitoring to ensure its success. It is also important to recognise that TPM is not a quick fix or a short-term solution. It requires a long-term commitment to continuous improvement and a willingness to adapt to changing circumstances and emerging challenges. With the right approach and mindset, however, TPM can be a powerful tool for improving equipment reliability, reducing costs, increasing productivity, and enhancing overall competitiveness

2.8.6.4 REDUCED INVENTORY

Any marine organisation that uses unreliable equipment on their vessels must maintain an unnecessarily large stock of spare parts to fulfil the operational demands. In contrast, the equipment is nonoperational—the more unreliable the equipment, the larger the necessary stock of spare parts. If a vessel system is composed of unreliable equipment, the spare parts inventory must be kept higher than desirable to accommodate equipment performance uncertainty. This extra inventory can create many problems: vessel operational requirements change takes too long to incorporate; the new lead time must allow for using up the spare parts inventory. Further, any defective parts produced can sit in the inventory, waiting to be discovered at the next step in the maintenance process. The inventory is effectively hiding these maintenance problems. Implementing a TPM program removes much of the uncertainty in the vessel system's equipment reliability and maintenance cycle time.

The spare parts for the vessel equipment are another source of unnecessary inventory holding costs. The spare parts are used to repair the equipment, which could occur at any time on unreliable equipment. Once again, the uncertainty in the equipment performance requires extra inventory. Through reliability engineering, data collection and analysis, the maintenance staff can develop an accurate estimate of the necessary spare parts and the frequency of their usage. Implementing TPM will allow the maintenance technicians to perform the analysis required to optimise their spare parts inventory policy.

Additional effort should be applied to reducing inventory via improved scheduling systems and synchronised maintenance processes.

2.8.6.5 IMPROVED SAFETY

The initial steps in implementing the autonomous maintenance activities of TPM in the marine industry create an environment that could quickly reduce safety and increase accidents; this is the result of crew members taking on additional and unfamiliar maintenance tasks for which they may not have been effectively trained. Since these tasks are new to the crew and often involve potentially hazardous activities (such as inspecting machinery in confined spaces or working at heights), they pose a new threat to the crew's safety. Therefore, ensuring the crew's safety must be a primary function of the TPM implementation plan; this requires extensive training, developing "fool-proof" maintenance tasks, and implementing improved procedures. Also, the crew better understands their equipment by performing routine maintenance tasks frequently. This new knowledge helps the crew make more intelligent decisions to reduce the potential hazards that the equipment presents. Therefore, the safety of all individuals involved with the vessel must be a top priority of any good TPM program.

The improved safety within TPM has allowed some marine companies to reduce their accidents to zero. Another side benefit of the TPM program is that pollution is often reduced due to more efficient equipment, which extends the safety improvements to include the marine environment.

2.8.6.6 IMPROVED MORALE

The final benefit of implementing TPM is improved crew morale. Of course, any change in the workplace can cause disruption. Still, TPM uses onboard teams to develop and deploy the implementation plans, giving the crew increased control and ownership over the vessel equipment. This ownership allows the crew to take pride in their work and make informed decisions on equipment use. However, this requires management support and the willingness to give decision-making authority to the operators. If managers are unwilling to do so, morale may suffer rather than improve.

The vessel's engineers also benefit from TPM as they have more time to perform equipment analysis, work with equipment designers, and engage in other technically challenging tasks. The engine crew will not necessarily see a drop in their workload as they shift to more proactive maintenance activities, such as developing preventive maintenance requirements for the equipment. This change requires management support to allow the engine crew to develop their skills and take on these new tasks.

Disclaimer:

Out of Box Maritime Thinker © by Narenta Consilium Group 2022 and Aleksandar Pudar assumes no responsibility or liability for any errors or omissions in the content of this paper. The information in this paper is provided on an "as is" basis with no guarantees of completeness, accuracy, usefulness, or timeliness or of the results obtained from using this information. The ideas and strategies should never be used without assessing your company's situation or system or consulting a consultancy professional. The content of this paper is intended to be used and must be used for informational purposes only.

Wednesday, October 23, 2024

2.6. RELIABILITY-CENTERED MAINTENANCE (RCM) – PART II

 2.6.1 FAILURE FINDING MAINTENANCE

Maintenance tasks focused on identifying failures are used to uncover equipment malfunctions that go unnoticed during regular crew operations, such as concealed defects. Since these issues are hidden, proper maintenance is crucial. Otherwise, a subsequent failure and its consequences must transpire before the equipment problem becomes evident. For instance, the inability of a backup electrical generator to start upon power loss might only become apparent when the primary generator fails, causing a power outage. Unfortunately, traditional condition monitoring or scheduled maintenance tasks are often inadequate for managing hidden failures. Instead, failure-detection maintenance tasks, which usually involve functional equipment testing, are employed to ensure the equipment's readiness to perform its designated function(s) when needed.


2.6.1.1 HIDDEN FAILURES – STATISTICS


A failure-finding task aims to minimise the risk of multiple failures to an acceptable level by controlling the occurrence rate of such failures. For example, when assuming that multiple failures can only arise from the simultaneous combination of a specific initiating event and the unavailability of a safety or backup system, the occurrence frequency of multiple failures can be described using the following formula:

 

Maintaining equipment unavailability below the level required to keep the multiple failure occurrence frequency sufficiently low is crucial, ensuring an acceptable failure risk to achieve an acceptable failure occurrence frequency. For instance, if a specific event's acceptable multiple failure occurrence frequency is 0.01 per year, the initiating event's failure frequency (e.g., ) is 0.1 per year. The acceptable unavailability for the hidden failure is 0.1.

Failure-finding tasks are efficient in addressing hidden failures because they either (1) verify the equipment's functionality or (2) enable the detection of equipment failure, necessitating repairs. After conducting the task, the unavailability of the safety or backup system is effectively "reset" to zero (or close to zero). As time passes, the unavailability increases until the equipment fails or undergoes retesting. The failure rate remains constant, implying that the failure probability increases linearly (or nearly linearly over most reasonable periods) with a slope equal to the failure rate (i.e., the failure probability is the product of the failure rate and elapsed time). The impact of failure-finding tasks can be seen in the below figure.



2.6.1.2 APPLICABILITY AND EFFECTIVENESS OF FAILURE-FINDING TASK


For a failure-finding work to be deemed effective, the following criteria must be met:

·         There should be no available or cost-efficient condition-monitoring or planned maintenance task capable of detecting or preventing the failure.

·         The task must be technically viable to execute. Performing at the necessary interval should be feasible without disrupting a stable system.

·         The task must reduce the failure probability (and thereby the risk) to an acceptable level. The tasks must be conducted at intervals that ensure the probability of multiple failures maintains an acceptable risk level. Risk acceptance criteria should be established and documented.

·         The task must not elevate the risk of multiple failures (e.g., when testing a relief valve, overpressure should not be generated without the relief valve is operational).

·         The task must ensure that entire protective systems are tested, not individual components.

·         The task must be cost-effective. Over time, the expense of performing the task should be lower than the total cost associated with the consequences of failure.


2.6.1.3 TASK INTERVAL FOR DETERMINING FAILURE-FINDING MAINTENANCE


The interval for failure-finding tasks can be established through the following:

·         Mathematical approaches employing reliability equations, or

·         Utilising general guidelines designed to maintain an acceptable risk level.

Regardless of the method employed, the primary objective is to ensure that the unavailability of a safety or backup system remains low enough so that the frequency of multiple failure occurrences is adequately reduced to achieve an acceptable risk level. An acceptable occurrence frequency must be defined for a specific consequence of multiple failures. For instance, a 0.01 per year occurrence frequency may be acceptable for a €1 million operational loss, while a 0.1 per year frequency could be suitable for a $100,000 operational loss. In both scenarios, the risk is equivalent (€10,000 per year).

 

2.6.1.3.1 FAILURE-FINDING TASK INTERVAL -  MATHEMATICAL DETERMINATION


The intervals for failure-finding tasks addressing the highest-risk hidden failures often necessitate mathematical determination; this is typically achieved by assuming the hidden failure is random and, thus, best modelled using the exponential distribution. This assumption is generally valid for the following reasons:

·         If the failure exhibits a wear-in failure characteristic, a one-time change or a condition-monitoring task is usually employed to manage the failure.

·         If the failure demonstrates a wear-out failure characteristic, a condition-monitoring or planned maintenance task should be applied to manage the failure.

To establish a failure-finding task interval, the equations for the frequency of multiple failures and the unavailability of the hidden failure are combined as follows:

The subsequent additional assumptions are frequently accurate and will result in simplification:

·         The failure distribution is exponential.

·         The product of the conditional failure rate and the test interval time (λ × test interval) is less than 0.1.

·         The time required to perform a failure-finding task is short compared to the system's availability duration.

·         The time needed to conduct a system repair is short compared to the system's availability duration.

·         Multiple failures can only arise from the simultaneous combination of the specified initiating event and the unavailability of the backup or safety system.


2.6.1.3.2 USING GUIDELINES TO DETERMINE THE FAILURE-FINDING TASK INTERVAL


Guidelines are established and documented to determine the failure-finding task interval. This process typically involves the following steps:

·         Setting up criteria for determining the required unavailability of the hidden failure, considering the risk associated with the hidden failure

·         Estimating the Mean Time To Failure (MTTF) of the hidden failure

·         Identifying the test interval using a table derived from the equation ( T )

Risk of Hidden Failure

Unavailability Required

Very High

< 0.0001

High

> 0.0001 to 0.001

Moderate

> 0.001 to 0.01

Low

> 0.01 to 0.05

Failure-finding - Interval Rules – Example

Unavailability Required

Failure-finding Interval (as % of MTTF)

0.0001

0.02

0.001

0.2

0.01

2

0.05

10

Failure-finding - Interval Based on MTTF – Example

When using the guideline-based approach, users should be cognizant of the assumptions made while developing the rules and task intervals. Confirming the validity of these assumptions in the context of the specific system or equipment being assessed is essential. By doing so, users can ensure that the determined task intervals accurately reflect real-world conditions and contribute to an effective failure management strategy.


2.6.2 RISKS CONSIDERATION


Risk can be divided into two components: the frequency of loss events and the severity of their consequences.

Frequency, often expressed in loss events per year, can be determined using historical data (if many events have occurred) or calculated through risk analysis tools (if limited data records exist).

Consequences can be represented as a combination of a loss event's impact on various aspects, such as:

·         Capital investment: Damage to equipment and cost of repair

·         Community: Effects on the public

·         Directional control: Total loss or diminished manoeuvrability

·         Explosion or fire: Damage to equipment and/or the vessel

·         Loss of containment: Quantity of harmful substances released into the environment (including cleanup costs)

·         Operations: Loss of hire, outage time for functions like drilling, position mooring (station keeping), hydrocarbon production and processing, loading or unloading operations

·         Propulsion: Complete loss or reduced propulsive capability

·         Safety: Number of individuals affected (injuries or fatalities)

After identifying the risk associated with a loss event, ship designers, operators, insurers, and regulators should implement preventive and/or mitigative measures to reduce the risk to an acceptable level.

The operating modes of a vessel or marine structure, such as ocean transit, cargo discharge, and others, may be associated with different types of loss events. As a result, identifying the operating mode is a crucial initial step in addressing risks for marine vessels and structures. By understanding the specific operating mode, appropriate risk mitigation measures can be tailored to the unique challenges presented in each situation.

Operating modes represent the various contexts and environments in which the vessel functions. Therefore, hazards can be identified based on these operating modes. Initiating events are specific equipment failures, human errors, or external occurrences (e.g., lightning strikes) that could lead to undesired events. Preventative measures consist of engineered safeguards (e.g., alarms) or management systems (e.g., personnel training) designed to prevent an initiating event from escalating into an undesired event. Undesired events are the immediate consequences of an initiating event and hazard affecting the vessel, such as collisions or allisions. By recognising these factors and their interplay, appropriate risk mitigation strategies can be developed for each operating mode.


FIGURE 1 The General Risk Model


2.6.2.1 VESSELS AND INNATE RISKS


Loss events associated with each operating mode can be comprehensively postulated. Many loss events are already well-known and implicitly or explicitly identified in classification society rules and International Maritime Organization (IMO) regulations. These typically include events such as structural failure, loss of stability, loss of propulsion, and fire. Take "loss of propulsion" as an example. Preventative measures must be implemented to ensure propulsion machinery reliability, thus minimising the risk of propulsion loss. Additionally, mitigative measures should be in place to address potential consequences like collision, grounding, or pollution in the event of propulsion loss. The extent of preventative and mitigative measures depends on the risk level and acceptable risk threshold.

Until now, classification society rules and IMO regulations have primarily focused on risk reduction for vessels through hardware design, facility provision, and in-service inspection. However, some areas traditionally have not been explicitly covered, such as:

·         Specificity regarding the vessel's operating modes

·         Quantification of risk levels and definition of acceptable risk criteria

·         Inclusion of operational measures (e.g., safety management and crew training) as tools to reduce risks, especially in measures to mitigate consequences

·         Emphasis on maintenance as a means of preventing loss events

Risk acceptance criteria are discussed in general terms in the following subsection.


2.6.2.2 RISK CHARACTERISATION


A risk matrix is an effective method for characterising the risk of loss events. It consists of a grid with cells corresponding to defined consequence (severity) and frequency categories, where loss events can be placed. The consequence and frequency categories are designed to be broad enough to quickly determine the appropriate risk cell for a loss event yet sufficiently narrow to offer varying degrees of resolution for decision-making. In addition, frequency and consequence levels on a risk matrix are often graduated by order of magnitude, providing a clear visual representation of the risks associated with different events; this allows stakeholders to prioritise their risk management efforts based on each event's potential impact and likelihood.

Severity levels can be defined for various types of loss consequences based on the list of examples. Table 1 showcases five (5) consequences (directional control, propulsion, loss of containment, fire/explosion, and safety) and outlines four severity levels for each consequence. Before the risk analysis, an appropriate severity level term for the consequence should be chosen and defined. For each severity level, several example descriptors are listed. Some descriptors are repeated between adjacent severity levels. Some studies use numerals (e.g., 1, 2, 3, 4) to represent the severity levels.

The descriptor chosen to describe a particular severity level may vary between analyses. For instance, one analysis might choose the descriptors "hazardous" and "critical" to describe the two highest severity levels, while another analysis might opt for "critical" and "catastrophic." This flexibility allows organisations to adapt the severity levels to their needs and better understand the potential consequences of different loss events.

Frequency categories are typically expressed in units of events per year. Sometimes, it is challenging to comprehend the smaller frequency categories (e.g., Remote: 0.001 events per year to 0.01 events per year, or from 1 event every 1,000 years to 1 event every 100 years). However, these small frequencies can be better understood when considering multiple vessels over extended periods. For example, a loss event that has occurred twice across a fleet of 100 vessels over the last 20 years corresponds to a frequency of 0.001 events per year.

The risk-based decision-making aspect of a risk matrix is observed along the lines of constant risk. Each cell in the risk matrix corresponds to a defined risk level. Cells with similar risk levels are grouped to form a continuous risk line. All cells in a risk matrix should be categorised into relevant lines of constant risk. Risk-based action levels to address the loss event are based on the risk level represented by the line of constant risk. Generally, the actions necessary to address loss events in each risk level are predefined.

Using a risk matrix enables decision-makers to prioritise actions and allocate resources based on the risk levels associated with different loss events. By grouping events with similar risk levels and defining actions to address these events, organisations can manage risks more effectively and make informed decisions on how to best mitigate potential consequences.

Once the risk of the loss event has been identified using the risk matrix, a risk reduction action consistent with the required action level should be chosen. Typically, maintenance tasks such as condition monitoring and planned maintenance will primarily reduce the frequency of occurrence of a loss event. At the same time, equipment redesign actions and one-time changes may decrease both the frequency and consequence. Central to the risk-based task selection process is evaluating the task's impact on the loss event. The ultimate objective should be to select an efficient, feasible task to decrease the risk level of the loss event to an acceptable level of risk (e.g., low risk or medium risk).

By carefully assessing the potential risk reduction actions and their impact on the loss event, decision-makers can prioritise and implement the most suitable measures. This ensures that resources are allocated effectively and efforts are focused on tasks that will provide the most significant risk reduction. As a result, organisations can better manage risks associated with their operations and minimise the likelihood and consequences of loss events, thereby achieving a safer and more reliable operational environment.



FIGURE 2 Example Risk Model



FIGURE 3 Risk Matrix and Example Consequence (Severity) Categories


2.6.3 RCM ANALYSIS – EXECUTION AND DOCUMENTATION


The following procedures guide RCM (Reliability Centered Maintenance) analyses. RCM analyses should be performed systematically, following a step-by-step process. The essential elements of an RCM analysis process include the following:

·         Identify operating modes and corresponding operating contexts: Determine the different operating modes and their specific contexts, such as regular operation, standby, or emergency modes, and the related environmental and operational conditions.

·         Define vessel systems: Break down the vessel into various systems and subsystems, such as propulsion, power generation, and safety systems, for more detailed analysis.

·         Develop system block diagrams and identify functions: Create block diagrams for each system, illustrating the relationships between components and subsystems. Identify each system's primary and secondary functions, considering their intended purposes and goals.

·         Identify functional failures: Determine potential failures that can prevent a system from performing its intended functions, such as equipment malfunctions, human errors, or external factors.

·         Conduct a Failure Modes, Effects, and Criticality Analysis (FMECA): Analyse each potential failure mode and its effects on the system, considering factors such as frequency, severity, and detectability. In addition, the criticality of each failure mode should be assessed to prioritise risk management efforts.

·         Select a failure management strategy: Choose an appropriate strategy for managing each identified failure mode, including preventive maintenance, condition monitoring, or redesigning components. The selected strategy should minimise the overall risk and maximise the system's reliability.

·         Determine spare parts holdings: Identify the necessary parts and their required quantities to support the chosen maintenance strategies, considering lead times, costs, and storage constraints.

·         Document the analysis: Compile and document the RCM analysis process, including the identified failure modes, chosen maintenance strategies, and spare parts holdings, to support effective implementation and continuous improvement.


2.6.3.1 DEFINING SYSTEMS


Each system must be accurately defined to conduct an RCM analysis effectively and comprehensively. This process involves:

1.                    Defining the operating characteristics for the ship as a whole and then for each system: Begin by outlining the overall operational characteristics of the vessel, considering factors such as its intended use, operating environment, and mission requirements. Next, define each system's operating characteristics, considering aspects like performance criteria, operating modes, and specific environmental conditions.

2.                    Partitioning the vessel into functional groups, systems, and equipment items: Divide the vessel into distinct functional groups based on their overall purpose and interdependence. For example, functional groups may include propulsion, navigation, safety, and communication systems. Further partition these functional groups into specific systems, such as engines, generators, or fire suppression systems, and then break them down into individual equipment items or components. This hierarchical structure helps clearly define each system's boundaries and operational intent subject to RCM analysis.

By following these steps, you can ensure a detailed and organised approach to defining each system for RCM analysis. This accurate definition is crucial for identifying potential failure modes, understanding their effects, and determining appropriate maintenance strategies to minimise risks and maximise system reliability.


2.6.3.2 DEFINING SHIP OPERATING CHARACTERISTICS


Defining the operating characteristics of a vessel is crucial for making informed decisions regarding the RCM failure management strategy. Incomplete or inaccurately defined operating characteristics can lead to an inappropriate failure management strategy. To properly define the operating characteristics, follow these steps:

·         Identify the various operating modes for the vessel: List the different operating modes the vessel goes through during its service life. These may include transit, loading and unloading, anchoring, maintenance, and emergency response. Each operating mode has specific requirements and challenges that must be considered.

·         Define the operating context for each functional group based on the operating modes: After identifying the operating modes, use them to define the operating context for each functional group. The operating context consists of the conditions and requirements the functional group must meet while operating within a specific mode. For example, the propulsion system's operating context in transit mode might include speed, fuel efficiency, and noise reduction requirements. In contrast, its context in maintenance mode could involve accessibility, ease of maintenance, and safety considerations.


FIGURE 1 RCM Analysis Diagram


2.6.3.2.1 OPERATING MODE


An operating mode for a vessel or marine structure represents its current operational state. Each operating mode affects how the ship's systems and machinery should be operated, guiding the formulation of operating contexts for distinct functional groups. The following are some standard operating modes for ships:

         Normal seagoing conditions at full speed

         Maximum allowed operating speed in crowded waters

         Manoeuvring next to a dock or another ship

         Cargo handling operations

The following example operating modes are typical for mobile offshore drilling units and offshore oil and gas production facilities:

         Drilling operations

         Position mooring or station keeping

         Relocation/Towing

         Hydrocarbon production and processing

         Import and export functions


2.6.3.2.2 OPERATING CONTEXT


The operating context of a functional group refers to the conditions under which the system is expected to operate. It should provide a comprehensive description of the following:

         The physical environment where the functional group operates

         A detailed explanation of how the functional group is used

         The specified performance capabilities of the functional group, as well as the required performance of any additional functional groups within which the functional group is integrated

Some essential factors to consider when developing the operating context for a functional group include:

         Serial redundancy applies to configurations with an identical standby system/equipment to support an operating functional group. If the operating system fails, the standby system is activated. The operating contexts for the running and standby systems/equipment are distinct. For instance, a functional failure in the operating system/equipment is likely to be apparent, while one in the standby system/equipment is likely to be hidden.

         Parallel redundancy pertains to systems/equipment operating simultaneously. Each system can meet the total demand. If a functional failure occurs in one system/equipment, the remaining systems/equipment continue operating at a higher capacity. In some cases, standby systems/equipment may also be available.

         Performance and quality standards: Systems/equipment may need to perform at a specific level or provide a service meeting particular quality criteria (e.g., compressed air supplied at a specified quantity, pressure, temperature range, and humidity limit).

         Environmental standards: As required by international, national, and local laws and regulations (e.g., for an engine emission standard, the operating context should consider a functional group's impact or potential impact on the environment).

         Safety standards: The operating context should specify any hazards that might be present and the safeguards required to protect the crew.

         Shift arrangements: It is assumed that the propulsion machinery operates continuously for ocean-going vessels, except when the vessel is docked. However, the ship's service electrical power system operates continuously. Therefore, system configurations and maintenance strategies must be carefully developed to ensure system availability.

2.6.3.2.3 VESSELS OPERATING CONTEXTS - DEVELOPMENT

Operating contexts should be developed with varying degrees of detail at each level. An operating context statement should be written for each level of functional breakdown, expanding upon the context provided at the preceding level. As the focus shifts to the systems and equipment that comprise the functional group at the lower levels of the functional breakdown, more detail is included in the operating context statement.

Specific performance parameters are crucial for defining functions for the functional group and determining what constitutes a failure and the effects that such failures will have on individual equipment performance, overall system operation, and, ultimately, the vessel's roles.

         Vessel level:  Operating contexts should first be developed for the vessel as a whole, typically with a focus on the vessel type; this includes a physical description of the vessel, the vessel type, the cargoes to be carried, performance standards (speed, manoeuvrability, fuel capacity and consumption, etc.), and cargo handling capabilities. Statements should address the primary roles (e.g., transporting cargo from point A to point B within a specific time frame, cargo preservation), secondary roles (e.g., crew habitability), and safety and environmental roles of the vessel.

         Functional group level: Using the vessel-level operating context, develop an operating context for each functional group level (e.g., machinery and utilities, propulsion functional group). The operating contexts at a given functional group level must encompass all operational characteristics to define the context for the next highest level. For example, the operating contexts for propulsion, manoeuvring, electrical, vessel service, and navigation and communication functional groups should include all operating characteristics in the machinery and utilities functional groups. Additionally, an operating context must be developed for each vessel's operating mode.


TABLE(s) 1 & 2 Example Operating Modes and Operating Context


2.6.3.3 DIVIDING SYSTEMS


Due to a vessel's complexity, which comprises numerous intricate systems and subsystems, it is beneficial to categorise it into functional groups. These functional groups can then be divided into specific systems, subsystems, equipment items, and individual components. This approach allows for a more structured and manageable understanding of the vessel's overall design and operation.


2.6.3.3.1 DIVIDING A VESSEL INTO FUNCTIONAL GROUPS


Dividing a vessel into functional groups is done using a top-down approach. For most vessels, the top level consists of the following primary functional groups:

         Hull

         Machinery and utilities

         Cargo handling

In most instances, it is necessary to further divide these high-level functional groups to identify major systems for analysis. For example, machinery and utilities can be subdivided into the following functional groups:

         Propulsion functional group

         Manoeuvring functional group

         Electrical functional group

         Ship service functional group (e.g., bilge, ballast, firefighting, steam)

         Navigation and communication functional group

Each functional group should be divided using a top-down approach until a level is reached where functions are associated with distinct physical units, such as individual systems or equipment items; this is sometimes called the level of indenture. The indenture level is crucial as it significantly influences the time and effort needed to conduct a thorough analysis. An analysis performed at too high a level may be overly superficial, while one conducted at too low a level may become excessively complex.

The level of indenture will vary depending on a system's complexity. Highly complex systems with numerous failure modes will typically be analysed at lower levels. The level of indenture should allow for the identification of the following elements within the functional group:

         Physical boundaries

         Functions and functional failures

         Discrete equipment items

2.6.3.3.2 DIVIDING A FUNCTIONAL GROUP INTO EQUIPMENT ITEMS

Upon achieving an appropriate level of partitioning for functional groups, each functional group should be further divided into specific equipment items. One or two levels of indenture might be necessary to adequately separate a functional group into equipment items. The chosen level of indenture for equipment items should meet the following criteria:

         The equipment can be identified for its contribution to the overall functions of the functional group.

         The equipment can be identified for its failure modes.

         The equipment represents the most convenient physical unit for specifying maintenance tasks.




2.6.3.3.3 SELECTION OF FUNCTIONAL GROUPS FOR ANALYSIS


Establishing a priority order to analyse functional groups might be necessary, allowing resources to be utilised more effectively. Generally, one of the following methods is employed to choose groups for analysis:

         Engineering judgment: This method depends on the undocumented experience of subject matter experts in selecting the group. Typically, when choosing a group, a team will subjectively consider factors such as the number of failures that have occurred, the amount of maintenance resources, the potential for performance improvement, and the possibility of reducing costly downtime maintenance (e.g., dry-docking maintenance). After determining the selection and priorities, the team should document the reasoning behind their decision.

         Simple analytical approaches: A more analytical method for selecting functional groups involves using simple tools like Pareto analysis and relative ranking. These tools offer the selection team a structured methodology for ranking the various factors considered during the selection process. When using Pareto analysis, the team collects data on each considered factor. For instance, if the number of failures is essential, the team would gather failure data for each group and then rank each group based on the number of failures. The team develops a scoring system for each factor when using relative ranking. The scores are then tabulated and evaluated to rank the groups.

         Risk assessment: The most comprehensive approach is to perform a risk assessment or use an available risk assessment to select and rank functional groups. Whether the risk assessment is a detailed quantitative analysis or a high-level profiling analysis (used for enterprise risk management), the risk assessment data can identify groups with unacceptable risk and those with the highest risk. The unacceptable risk data can determine if further detailed analysis, such as RCM analysis, is warranted. The group risk ranking can then be used to prioritise groups for analysis (e.g., groups with the highest risk are analysed first). Additionally, the risk assessment should be reviewed to determine if equipment failures can be affected by improved maintenance and if these failures are significant contributors to the risk. For example, when reviewing a group's risk assessment, it might be discovered that the major risk contributor is operational errors. In this case, an RCM analysis might not be the best method to reduce the risk. However, the highest-risk groups where equipment failures are major risk contributors are suitable candidates for RCM analysis.

Regardless of the approach used to select groups, the following considerations should be taken into account:

         The expected cost savings over the projected remaining life of the equipment should be balanced against the cost of the analysis.

         The human resources required to conduct each analysis must be identified, and their availability must be ascertained.


2.6.3.4 DEFINING FUNCTIONS AND FUNCTIONAL FAILURES


After establishing the operating mode for the vessel and the operating context for a functional group, the RCM analysis team uses this information to determine the necessary functions for the functional group to operate successfully, ensuring all relevant vessel functions are maintained. Considering the applicable operating modes when defining functions is crucial, as functions can vary with different operating modes. Furthermore, identifying all functions is essential, as failing can lead to overlooking critical failures (e.g., those affecting system and vessel performance).

Once the functions are defined, functional failures (e.g., various loss functions that can occur due to failures) are identified. Functional failures can represent a total loss of function (e.g., providing no compressed air) or a partial loss of function (e.g., providing compressed air at reduced pressure and flow). The following paragraphs explain more about identifying functions and functional failures.

1.                    Identifying functions: Begin by listing the required functions for each functional group, considering the operating context and mode. These functions may include primary functions (directly related to the main purpose of the functional group), secondary functions (supporting the primary functions), and safety or environmental functions (relating to safety and environmental requirements).

2.                    Identifying functional failures: For each identified function, determine the potential functional failures that could occur. These failures can be total (the complete loss of a function) or partial (a reduction in the function's effectiveness or efficiency). It is important to consider different operating modes and contexts and the potential impact of these failures on the overall system and vessel performance.


2.6.3.4.1 IDENTIFYING FUNCTIONS FOR A FUNCTIONAL GROUP


After completing the operating characteristics and partitioning, the subsequent task involves identifying the functions associated with the chosen functional group and its associated equipment. These functions are determined by considering the operating context of the functional group and the equipment included within it. Therefore, it is essential to state what the functional group must provide or do to ensure proper vessel operation in the given operating mode rather than how individual equipment items operate to ensure comprehensive identification and definition of all functions. For instance, a propulsion group function could be "providing X horsepower at Y RPM to the propeller" despite the engine's potential to produce more horsepower and operate at a higher RPM.

Developing a functional block diagram is one approach to identifying functions. This diagram represents the system operation in a graphical format, comprising three main components. Firstly, the inputs (e.g., raw materials, energy sources) that enter the system boundary. Secondly, the functional blocks represent the functions within the system boundary. Thirdly, the outputs (e.g., materials, energy, signals) that leave the boundary.

Additionally, arrows illustrate the flow of materials, energy, signals, etc., between functional blocks and into and out of the system. Each block within the boundary corresponds to a primary or secondary function that must be provided to transform the inputs into outputs. Therefore, every function block and its associated outputs signify a function that must be furnished for the system to operate correctly.

Each function must be documented as a function statement consisting of a verb, an object, and a performance standard  To ensure clear identification and definition. The performance standard should specify the minimum acceptable requirement rather than the design capability and must be clearly defined or quantified. It defines failure as the basis for the maintenance decision-making process. Functions can be categorised as primary or secondary. Primary functions represent the fundamental reasons why the system/equipment exists. For instance, the primary functions of a diesel engine are to provide power to drive the propeller from 0 to 91 RPM with output from 0 to 16,860 kW. The minimum acceptable output is 9,000 kW to maintain a minimum vessel speed of 7 knots. Secondary functions are generally less evident but may have worse consequences if they fail. The following functional categories can be used to determine secondary functions:

         Environment integrity

         Safety, structural integrity

         Control, containment, comfort

         Appearance

         Protection

         Economy, efficiency

         Supplementary functions

For example, some of the secondary functions of a diesel engine could include having acceptable engine emissions by some standards and having a vibration level that will not affect structural integrity.



Protection and protective devices are the most critical of the various secondary functions. These devices function in one of the following five ways:

         Alerting the operator to abnormal conditions

         Shutting down the equipment when a failure occurs

         Eliminating or relieving abnormal conditions following a failure that could otherwise cause more severe damage

         Taking over from a failed function

         Preventing a hazardous situation from occurring in the first place

When documenting the functions of any system/equipment, it is essential to list the functions of all associated protective devices. These devices must receive special attention as they safeguard the system/equipment and personnel.


2.6.3.4.2 IDENTIFYING FUNCTIONAL FAILURES FOR A FUNCTIONAL GROUP


Identifying a series of functional failures is crucial for every function within the functional group. Typically, each function will have at least two functional failures, which can be a complete or partial loss of function. Deviations in the performance standard usually represent a partial loss of function. For instance, the function "to provide 16,860 kW at 91 RPM to the propeller" can have the following functional failures:

         Total loss of function

No power to the propeller

         Partial loss of function

Provides less than 16,860 horsepower to the propeller

Provides more than 16,860 horsepower to the propeller

Provides less than 35 RPM to the propeller

Provides more than 91 RPM to the propeller

Functional failures can be identified from functions by using the following guides:

         No or none of the function

         Less of each standard performance parameter

         More of each performance parameter

         Premature operation of the function

         Failure to cease operation of the function (e.g., the function operates too long)

         Intermittent operation of the function

         Other functional failures appropriate for the functional group

Each functional failure must be documented in a statement containing a verb, an object, and the functional deviation.


Function and Functional Failure List - Example



2.6.3.5 FMECA – CONDUCTION


After identifying potential functional failures, the next step in the RCM analysis is to perform a Failure Mode, Effects, and Criticality Analysis (FMECA). This step aims to establish the cause-and-effect relationship between potential equipment failures, functional failures, and the end effect of functional failures. It also evaluates the criticality of the postulated failure mode. This information is essential in determining the following:

         When a failure management strategy is required

         What type of failure management strategy is best suited to manage the failure mode (e.g., one-time change, planned maintenance, or run-to-failure)

         The significance of the failure management strategy

By evaluating the criticality of the failure modes and their potential impact on the system/equipment, it is possible to determine the most appropriate course of action to manage the failure. The failure management strategy can include various approaches, such as redesign, replacement, or modification of equipment, scheduled maintenance, or run-to-failure. The importance of the failure management strategy will depend on the consequences of the failure mode and the criticality of the system/equipment.


2.6.3.5.1 IDENTIFYING FAILURE MODES AND EFFECTS WITH AN FMECA


There are two primary approaches for performing an FMECA: bottom-up and top-down. Both methods can be effectively used in an RCM analysis, and each has advantages and disadvantages. However, the main attribute of both approaches is that they are inductive analysis techniques that help guide the RCM analysis team in establishing the cause-and-effect relationships necessary to identify maintenance requirements and other potential improvements.

The bottom-up approach involves analysing individual equipment components and their potential failure modes and effects, then evaluating their criticality in the overall system/equipment context. This approach can be more time-consuming but provides a more comprehensive understanding of the system/equipment's failure modes.

On the other hand, the top-down approach focuses on analysing the overall system/equipment and its functions to identify potential failure modes and their effects. This approach can be less time-consuming but may overlook some equipment-specific failure modes.

Both approaches can effectively identify potential failure modes and their impacts on the system/equipment. They ultimately help develop a maintenance strategy that maximises system/equipment reliability while minimising maintenance costs.


2.6.3.5.1.1 BOTTOM-UP FMECA APPROACH


The bottom-up approach is a systematic method of explicitly analysing each equipment item. This approach focuses on understanding the effects of different equipment failure modes on the system's operation. The bottom-up approach involves determining whether an equipment failure mode results in a local effect that leads to a functional failure, which ultimately causes an end effect of interest. The following are the steps involved in performing a bottom-up FMECA:

         Select an equipment item for analysis

         Identify potential failure modes for the equipment item

         Choose a failure mode for evaluation

         Determine the failure characteristic (e.g., wear-in, random, wear-out) for the failure mode

         Determine the local, next higher-level, and end effects for the postulated failure mode

         If the end effect leads to a consequence of interest, determine the causes of the failure mode

         Use the risk decision tool to determine the criticality of the failure mode

         Repeat the steps as necessary until all equipment items and associated failure modes have been evaluated

When conducting a bottom-up FMECA, the failure causes are the fundamental equipment failures that result in the failure mode, and the following higher-level effect usually identifies the resulting functional failure.

The bottom-up approach is advantageous because it helps ensure that all equipment items are analysed, and all potential equipment failure modes are considered. Furthermore, a standard list of failure modes can be created for common equipment items, making the analysis more straightforward and promoting consistency between RCM teams. This consistency allows for the efficient transfer of knowledge and experience from one team to another, making it easier for organisations to implement RCM strategies across multiple sites or locations. Additionally, the bottom-up approach helps identify failure modes specific to particular equipment items, which might be missed in a more general top-down analysis. This detailed analysis can lead to the development of more effective maintenance strategies and increase the reliability and safety of the system/equipment.


2.6.3.5.1.2 TOP-DOWN FMECA APPROACH


The top-down approach is a method of analysing each function and its associated functional failures. This approach focuses on understanding the effects of different functional failures on the system's operation and then identifying which equipment failures (e.g., failure mode) can cause the functional failure. The top-down approach aims to determine whether a functional failure results in an end effect of interest and then determine which equipment failures can cause the functional failure. The following are the steps involved in performing a top-down FMECA:

         Select a function for analysis

         Choose a functional failure for evaluation

         Determine the local and end effects of the postulated functional failure

         If the end effect leads to a consequence of interest, identify the equipment failures that can result in the functional failure

         Determine the failure characteristic (e.g., wear-in, random, wear-out) for the failure mode

         Use the risk decision tool to determine the criticality of the failure mode

         Repeat the steps until all functions and functional failures are evaluated

The top-down approach allows for a broader understanding of the system/equipment by analysing the functions and associated functional failures, leading to a better understanding of the system's overall reliability and safety. It is advantageous for complex systems/equipment where many functions and multiple pieces of equipment could fail. However, it is possible to overlook equipment-specific failure modes that might only affect a single component or subsystem, which could be identified more easily through a bottom-up approach.

 

 


Bottom-up FMECA Worksheet - Example




Top-down FMECA Worksheet - Example


2.6.3.5.2 CONSIDERATIONS IN IDENTIFYING FAILURE MODES AND FAILURE EFFECTS WITH AN FMECA


Regardless of whether a top-down or bottom-up FMECA approach is used, the FMECA must identify equipment failures that can lead to functional failures, which ultimately result in end effects of interest. To ensure that the FMECA serves its intended purpose, the following issues must be considered:


2.6.3.5.2.1 IDENTIFYING FAILURE MODES


To develop a comprehensive list of failure modes causing each functional failure, the following should be considered:

         Failures that have previously occurred on similar equipment

         Other failure modes that are considered probable, including those being suppressed by the current preventative maintenance program

         Failure modes that are possible but considered unlikely are included to demonstrate that they have been considered

When conducting a bottom-up FMECA, the following guide phrases may help develop a list of failure modes to consider:

         Premature operation

         Failure to operate at a designated time

         Intermittent operation

         Failure to stop operation at a designated time

         Loss of output or failure during operation

         Degraded output or operational capability

         Other unique failure conditions

The causes of failure, such as normal wear and tear, corrosion, abrasion, erosion, fatigue, etc., should be recorded in detail to identify an appropriate failure management strategy. In addition, if there is clear evidence of human error leading to failures or if operator error can cause significant consequences, those failures should also be included. It is essential to ensure that the causes of failure are adequately identified so that the subsequent maintenance recommendations address the root cause rather than just treating its symptoms.


2.6.3.5.2.2 IDENTIFYING FAILURE EFFECTS


When identifying failure effects, it is crucial to consider the following three levels:

         Local effects: These are effects that are localised to the system/equipment being analysed and should include the following:

o    Methods of detecting failures (e.g., alarms, test indicators)

o    Reduced level of performance

o    Whether a standby system/equipment can provide the same function

         Next higher effects: These are effects on the larger system to which the system/equipment belongs and should include the following:

o    Potential physical damage to the system/equipment

o    Potential secondary damage to other equipment in the system or unrelated equipment in the vicinity

         End effects: These are effects on the vessel and should include the following:

o    Threats to safety and the environment

o    The operational effectiveness of the vessel

o    Downtime required to repair the damage


2.6.3.5.3 END EFFECT CONSIDERATIONS


Upon identifying the end effects of failure, the following information should be included in the FMECA:

         Mitigation strategies to reduce the consequences of failure before implementing maintenance (e.g., activating a standby system, reconfiguring the system), and the estimated time needed for such action

         Repair action for the defective item (e.g., repairing primary and secondary damages, identifying personnel needed, determining if dry-docking or shore support is necessary, estimating the time required for the repair)

         Identification of spare parts needed for the repair


2.6.3.5.3.1 LEVEL OF INDENTURE CONSIDERATIONS


When conducting FMECA for a functional group, it is usually performed at the level of equipment where maintenance is performed. This level is known as the convenient indenture level. However, a lower indenture level should be chosen if more than 20 to 30 failure modes can be identified at this level. The lower level should be selected to ensure comprehensive analysis and to identify all potential failure modes that may affect the system's functionality. This approach helps ensure that all potential failure modes are identified and maintenance strategies can be developed to mitigate the risks associated with those failure modes.


2.6.3.5.3.2 MAINTENANCE CONSIDERATIONS


When conducting FMECA, it is important to assume zero-based maintenance, which means that no proactive maintenance tasks are being performed. This approach is necessary to identify the potential failure modes that could occur without any maintenance and to determine the need for a failure management strategy. If proactive maintenance tasks were included in the analysis, the risk associated with equipment failures could be underestimated.

However, the probability of failure for existing maintenance programs should be based on the current maintenance program; this allows for a more accurate assessment of the equipment's actual risk of failure, considering the maintenance tasks already being performed. This information can be used to optimise the maintenance program and identify any additional maintenance tasks necessary to mitigate the risks associated with equipment failures.


2.6.3.5.4 ASSESSING THE CRITICALITY OF FAILURE MODES AND EFFECTS IN AN FMECA


Criticality analysis is an important step in the RCM process as it involves ranking each potential failure mode identified during the FMECA based on a combination of severity classification and probability of occurrence. This analysis aims to identify the risks associated with each failure mode and prioritise them based on their criticality level. By doing so, maintenance resources can be allocated more effectively to those failure modes that pose the most significant risk to the system's functionality, safety, and environmental impact.

The criticality analysis involves assigning a severity classification to each potential failure mode based on the consequences of the failure and a probability of occurrence based on the likelihood of the failure mode occurring. Then, the severity and probability rankings are combined to calculate a risk priority number (RPN) for each failure mode. The RPN is used to prioritise the failure modes and identify those that require immediate attention.

The criticality analysis highlights the risks associated with each failure mode and helps ensure that maintenance resources are allocated effectively to mitigate those risks. This approach can help prevent costly equipment failures and downtime, improve safety, and reduce environmental impact.

Two approaches may be taken to determine the frequency category (probability of occurrence) during the criticality analysis.

         The first approach is the quantitative approach, which should be used if reliability data is available. This approach involves analysing the reliability data and calculating the probability of occurrence of each potential failure mode. The source of the data and the operating context should be noted to ensure the analysis is accurate and reliable.

         The second approach is the qualitative approach, which is used when quantitative data is unavailable to determine the probability of occurrence. This approach involves applying engineering judgment based on previous experience; this may include expert opinions, historical data, and other sources of information to estimate the probability of occurrence of each potential failure mode. This approach may be less precise than the quantitative approach, but it is still helpful in identifying and ranking potential failure modes.

Regardless of the approach used, it is important to ensure that the probability of occurrence is accurately and consistently determined to ensure that the criticality analysis is reliable and effective in identifying and prioritising potential failure modes.

To determine the qualitative risk associated with a failure mode, the following procedure should be followed:

         Severity classification: Identify the consequence of the end effect resulting from each failure mode and the severity category allocated in applying the example shown here. Suppose the failure mode does not directly result in an end effect. In that case, the criticality analysis will assume that the protected function experiences failure with the protective device in the failed state.

         Probability of occurrence: Derive the probability of occurrence of each failure mode identified in the FMECA. If reliability data are available, use the quantitative approach to determine the probability of occurrence. Otherwise, use engineering judgment based on previous experience.

         Risk matrix: Plot the severity classification and probability of occurrence on the risk matrix shown here to obtain the risk level.

The criticality ranking or risk level for each failure mode/end effect pair is then used in an RCM decision flow chart to determine the appropriate failure management strategy. Examples of criticality ranking can be found here.


2.6.3.6 SELECTING A FAILURE MANAGEMENT STRATEGY







After conducting the FMECA, failure modes assessed to have high, medium or low risks are evaluated using the RCM Task Selection Flow Diagrams at the level of each system/equipment item. The flow diagram helps determine the appropriate failure management strategy for the identified failure modes. An example of suggested failure management tasks for the identified failure modes is provided. It should be noted that the answers to the questions posed in the flow diagram only apply to the operating mode under consideration, and "Normal Operating Conditions" (NOC) should be interpreted as the normal operating condition concerning the relevant operating mode.

 

 

2.6.3.6.1 RCM TASK SELECTION FLOW DIAGRAM


The RCM Task Selection Flow Diagram helps determine each failure mode's most effective and efficient failure management strategy based on its assessed risk. The flow diagram considers the failure's consequences, the likelihood of occurrence, and the effectiveness of the failure management strategy to select the appropriate tasks to manage the identified failure modes.


2.6.3.6.1.1 FIRST SELECTION DECISION


To determine whether a failure mode has the highest or lowest risk, the RCM analysis team must consider the consequences and probability of occurrence associated with each failure mode. If a failure mode has the highest risk and cannot be managed through maintenance alone, a fundamental change in how the equipment is designed or operated may be required. In this case, a one-time change is necessary to reduce the risk. Once the one-time change is identified, the FMECA should be updated, and any applicable failure modes should be reevaluated using the RCM Task Selection Flow Diagram.

 

On the other hand, if a failure mode has the lowest risk, it may not require any failure management strategy. Therefore, the RCM analysis team should have a high confidence level in the risk characterisation, indicating that the team is relatively confident that the risk is characterised correctly and can be used in the RCM flow diagram without further discussion.

However, if the confidence level in the risk characterisation is low, indicating uncertainty, the failure mode is assumed to have a medium/moderate risk characterisation. The entire RCM Task Selection Flow Diagram evaluates the failure management strategy. Additional data about the probability or consequence of the failure may be needed before the risk can be used in the decision-making process.



2.6.3.6.1.2 SECOND SELECTION DECISION

Condition-monitoring tasks are considered first because they are usually the most technically feasible and cost-effective option. To determine if a failure mode can be managed through a condition-monitoring task, the team should select a specific task and determine an appropriate task interval. The following criteria should be considered when selecting the task:

         The task must be practicable to implement

         The task must have a high degree of success in detecting the failure mode

         The task must be cost-effective, meaning that the cost of performing the task should be less than the total cost of the consequences of failure over time.

The team should then evaluate the potential risk reduction resulting from implementing the condition-monitoring task by comparing the reduced risk to the risk acceptance criteria. Suppose the risk reduction achieved by the task is unacceptable. Further analysis is needed to determine if other maintenance tasks or a one-time change is needed to manage the failure.

Proactive maintenance task intervals should ideally be based on actual failure data to determine the task interval, but this may not be feasible for most organisations. Therefore, the task interval can be determined from the following sources in ascending order of priority:

         Generic P-F interval data

         Manufacturer's recommendations

         Current task intervals

         Team experience

For condition-monitoring tasks, the task interval must be set at less than half the anticipated P-F interval to ensure enough warning time to take action and avoid the consequences of failure.


2.6.3.6.1.3 THIRD SELECTION DECISION


For wear-out failure modes, the team must select a planned maintenance task that will restore the equipment item to an acceptable level of performance before it fails. The task and task interval should be determined based on the following criteria:

         The task should effectively reduce the probability of the failure mode occurring.

         The task should be practical to implement and cost-effective.

         The task interval should be based on the anticipated P-F interval and the consequences of failure.

         The task interval should be less than half the anticipated P-F interval.

Suppose the selected planned maintenance task does not achieve an acceptable level of risk reduction. In that case, the team must evaluate the need for additional or alternative maintenance tasks or a one-time change.


a) MAINTENANCE TASK SELECTION CRITERIA

If the risk reduction achieved through the planned maintenance task is unacceptable, the team must evaluate the need for design or operational changes to manage the failure mode. For example, if the failure mode exhibits a wear-out failure characteristic, a one-time change or redesign of the equipment item may be necessary to manage the failure. If the failure mode exhibits a wear-in failure characteristic, a one-time change or redesign may also be necessary, but the team should evaluate whether any other proactive maintenance tasks can be implemented in addition to the one-time change or redesign.


b) MAINTENANCE TASK INTERVAL DETERMINATION.

Ideally, proactive maintenance task intervals are determined using actual failure data, but this is not realistic for most organisations. Therefore, the task frequency must be determined from the following sources listed in ascending order of priority and documented:

         Generic failure data

         Manufacturers' recommendations or failure data

         Current task intervals

         Team experience

For planned maintenance to be effective, the item must have a straightforward life. Most items must survive this life, after which the conditional probability of failure increases significantly. The life can be determined based on information from the equipment manufacturer, expert opinion, published reliability data, actuarial analysis, etc.

Suppose the risk reduction does not achieve an acceptable level of risk. In that case, the failure mode is further analysed to determine if a combination of planned maintenance and condition-monitoring tasks can achieve an acceptable risk. If a combination provides an appropriate failure management strategy, the failure mode is further analysed per the maintenance task selection criteria above.

2.6.3.6.1.4 FOURTH SELECTION DECISION

The failure-finding task is a proactive maintenance task designed to detect hidden failures. It is used when condition monitoring and planned maintenance tasks are insufficient to manage the risk associated with a hidden failure mode. The following criteria should be considered when selecting a failure-finding task:

         The failure-finding task must be practicable to implement.

         The task must have a high degree of success in detecting the hidden failure mode.

         The cost of undertaking the task over some time should be less than the total cost of the consequences of failure.

         The task interval must give enough warning of the failure to ensure action can be taken in time to avoid the consequences.

If a failure-finding task is selected, the task and interval should be recorded in the maintenance program, and the failure mode should be reassessed using the RCM Task Selection Flow Diagram to ensure that the selected task provides an acceptable level of risk.


a) MAINTENANCE TASK SELECTION CRITERIA


If the failure-finding task is an appropriate failure management strategy, the team must select the task and task interval. The task interval for a failure-finding task must give enough time for the hidden failure mode to develop between inspections while providing enough warning time to take appropriate corrective action. The maintenance task interval must be set at less than half the anticipated time between the failure of the protective device and the secondary failure. Suppose the risk reduction achieved by the failure-finding task does not achieve an acceptable level of risk. In that case, the failure mode is further analysed to determine if a one-time change or redesign of the equipment item is needed to manage the failure.

 

(b)MAINTENANCE TASK INTERVAL DETERMINATION


In an ideal scenario, actual failure data should be used to determine the intervals for proactive maintenance tasks. However, for most organisations, this is not feasible. Hence, the frequency for these tasks must be established based on the following sources, in order of priority, and documented:

         Generic failure data

         Manufacturers' recommendations or failure data

         Current task intervals

         Team experience

Availability and reliability information should be utilised wherever possible to set the failure-finding intervals T. One equation that can be used to determine T is as follows:

 


The team should consider regulatory requirements and guidelines when setting the unavailability limits for different functions. As a general guide, the unavailability for safety and environmental functions should be kept as low as reasonably practicable, typically not exceeding 0.05%. The unavailability limit should be kept below 2.0% for operational functions, while for non-operational functions, the limit may be higher, up to 10%.


2.6.3.6.1.5 ONE-TIME CHANGES


If the team determines that a one-time change is the best failure management strategy, the team should update the FMECA and RCM Task Selection Flow Diagram to include the change. The team should also consider any impact the change may have on other systems, equipment, or processes and ensure that any necessary modifications are made.

Suppose the team determines that none of the maintenance tasks or one-time changes can manage the failure to an acceptable level of risk. In that case, the team must record the failure mode and its associated risks in the system/equipment documentation and notify the appropriate personnel. The risk may need to be reevaluated periodically or when new information becomes available.

It should be noted that the RCM process is ongoing and requires continuous monitoring and feedback. As new data becomes available or the operating context changes, the FMECA and RCM Task Selection Flow Diagram should be updated accordingly.


2.6.3.6.1.6 ROUNDS AND ROUTINE SERVICING


The team should also review rounds and routine inspection tasks to prevent altering the failure rate curve for the failure mode. These tasks are essential to maintaining the basis for proactive maintenance tasks and risk characterisation, preventing issues such as premature wear-out of a bearing due to insufficient lubrication.

2.6.3.6.2 MAINTENANCE TASK ALLOCATION AND PLANNING


2.6.3.6.2.1 TASK CATEGORIES

The RCM-derived maintenance tasks should be categorised as follows:

         Category A: Can be performed by the vessel's crew while at sea.

         Category B: Must be carried out alongside with the assistance of equipment vendors or using dockside facilities.

         Category C: Must be performed in a dry dock facility.


2.6.3.6.2.2 TASK INTERVAL ADJUSTMENT


The task intervals resulting from the RCM analysis may not align with the current calendar-based maintenance schedule, so the team must integrate them into a common maintenance schedule. However, the RCM task intervals may need to be adjusted to a shorter or longer interval depending on the following criteria:

 

         Tasks with safety or environmental consequences should only be adjusted to a shorter task interval to ensure that safety and containment are not compromised.

         Tasks with operational consequences may be adjusted to a longer or shorter task interval. However, if the interval is adjusted to a longer one, the responsible person in the shipping company should approve the change.


2.6.3.6.2.3 OVERALL MAINTENANCE SCHEDULE


Category B and C tasks should be organised to derive an overall maintenance schedule by adjusting the RCM task intervals, but only for Category B and C tasks. This adjustment should be made using the criteria specified in Category B so that these tasks can be timed to coincide with the vessel's port calling and dry-docking schedules.


2.6.3.6.3 SPARES HOLDING


The availability of spares to support the identified maintenance tasks is crucial for the proposed maintenance schedules to be viable. The spare holding requirement should be developed based on the following factors:

         The list of parts needed to perform tasks to correct each failure mode identified in the RCM analysis and the parts required as a result of remedial work to correct condition-monitoring, planned maintenance, failure-finding, any applicable and effective, and run-to-failure tasks.

         An assessment of the impact on the operational availability of the functional group or system in an out-of-stock situation.

         An evaluation of the parts that can be preplanned for use. For parts that cannot be preplanned, determine the quantity required to achieve the desired operational availability.


2.6.3.6.3.1 STOCK-OUT EFFECT ON END EFFECTS


Determine whether the stock-out and further failure will result in End Effects, such as degradation or loss of propulsion, fire, etc. When determining the effect, consider the direct and indirect effects of the stock-out under normal circumstances. The following defines direct and indirect effects and normal circumstances.

         Direct effect. Suppose the spare is unavailable, and the associated maintenance tasks cannot be carried out. In that case, the corresponding failure mode will eventually lead to an End Effect(s) if a failure occurs.

         Indirect effect. Suppose the spare is unavailable, and the associated maintenance tasks cannot be carried out. In that case, the corresponding failure mode will not lead to an End Effect(s) unless a different failure occurs.

         Normal circumstances. The item operates within the context without a failure occurring.

         If the stock-out has no effect, then no spare holding is required.

2.6.3.6.3.2 SPARES HOLDING DECISIONS

The spares selection process involves the following decision-making process when a stock-out or a stock-out and further failure may result in End Effects:

         If parts requirements can be anticipated before failure occurs or there is sufficient warning time for parts to be ordered, order parts before demand occurs, provided that ordering parts in advance is acceptable.

         If ordering parts in advance is not acceptable, consider holding parts onboard or in storage depots, provided that:

o    The risk of a stock-out is reduced to an acceptable level.

o    The cost and storage basis for holding the parts are feasible.

         If neither of the above strategies is feasible, consider the following:

o    If the stock-out will result in End Effect(s) (direct or indirect), reviewing the RCM analysis to revise the maintenance task is mandatory.

o    If the stock-out will only have a non-operational effect, reviewing the RCM analysis to revise the maintenance task is desirable.




 


 


 

FIGURE 6 Spares Holding Decision Flow Diagram (1)

An example of an operating context and analysis is the fuel oil piping system with two fuel oil supply pumps arranged in parallel redundancy. The system is designed to supply heavy fuel oil to the main propulsion engine and two diesel generator engines operating at their maximum continuous rating. The pumps are operated alternately, with one pump operating for a week while the other is on standby. The anticipated annual service hours for both pumps are the same. This operating context and analysis provide the necessary information to develop a maintenance strategy to ensure the system's reliability and availability.

 

 


 

References & Bibliography:

 

https://www.researchgate.net/publication/354347868_Determination_of_Maintenance_Task_on_Tanker_Vessel's_Marine_Boiler_Using_Reliability_Centered_Maintenance_RCM_II_Method

 

https://www.researchgate.net/publication/350303270_The_Combination_of_Reliability_and_Predictive_Tools_to_Determine_Ship_Engine_Performance_based_on_Condition_Monitoring

 

https://www.researchgate.net/publication/350148670_Application_of_Reliability-Centered_Maintenance_for_Tugboat_Kresna_315_Cooling_Systems

 

http://www.atpm.co.kr/5.mem.service/6.data.room/data/treatise/5.reliability/5.reliability_01.pdf

 

https://www.academia.edu/962903/Increasing_ship_operational_reliability_through_the_implementation_of_a_holistic_maintenance_management_strategy

 

 

https://ww2.eagle.org/content/dam/eagle/rules-and-guides/current/design_and_analysis/132_reliabilitycenteredmaintenance/rcm-gn-aug18.pdf

 

 

Disclaimer:

Out of Box Maritime Thinker © by Naro Consilium Group 2022 and Aleksandar Pudar assumes no responsibility or liability for any errors or omissions in the content of this paper. The information in this paper is provided on an "as is" basis with no guarantees of completeness, accuracy, usefulness, or timeliness or of the results obtained from using this information. The ideas and strategies should never be used without first assessing your company's situation or system or consulting a consultancy professional. The content of this paper is intended to be used and must be used for informational purposes only.

2.13. PRESCRIPTIVE MAINTENANCE

2.13.1 INTRODUCTION 2.13.1.1 DEFINITION Prescriptive maintenance in marine engineering is an evolution of maintenance strategies, meldin...