A Risk Metric for Failure Cause in FMEA under Time-Dependent Failure Occurrence and Detection

Article information

J Korean Soc Qual Manag. 2019;47(3):571-582
Publication date (electronic): 2019 September 27
doi: https://doi.org/10.7469/JKSQM.2019.47.3.571
Kwon H. M.*, Hong S. H.**, Lee M. K.***
*Department of Systems Management and Engineering, Pukyong National University
**Department of Industrial and Information Systems Engineering, Chonbuk National University
***Department of Information and Statistics, Chungnam National University
Corresponding Author (iehmkwon@pknu.ac.kr)

This research was supported by the Pukyong National University Autonomous Creative Academic Research Fund (2019).

Received 2019 June 25; Revised 2019 August 12; Accepted 2019 August 13.

Abstract

Purpose

To develop a risk metric for failure causes that helps determine the action priority of each failure cause in FMEA, considering the time sequence of cause occurrence, failure, and detection.

Methods

Assuming a quadratic loss function of the unfulfilled mission period, a risk metric is obtained from the mean and variance of the failure time distribution.

Results

The proposed risk metric has reasonable properties for evaluating the risk associated with a failure cause.

Conclusion

The metric may be applied to determining action priorities among all the failure causes in an FMEA sheet; further study is needed for more general failure processes.

1. INTRODUCTION

FMEA (failure mode and effect analysis) is a powerful tool for the safety and reliability analysis of products and processes. It is used extensively in a wide range of industries from manufacturing to service; see, for example, Sun et al. (2017), Apriliana et al. (2018), and Fithri et al. (2018). In conventional FMEA, the risk of a failure or its cause is evaluated with the RPN (risk priority number), which is the product of its occurrence, severity, and detection scores. Many authors have discussed the drawbacks of RPN and suggested alternative approaches to risk evaluation. Liu et al. (2013) provided a literature review of risk evaluation approaches in FMEA up to 2013. Efforts to improve RPN have continued until recently; see, for example, Srivastava et al. (2018).

AIAG (Automotive Industry Action Group) and VDA (German Association of the Automotive Industry) have been discussing their differences and aligning their FMEA handbooks for the 5th edition (VDA QMC, 2018). Some important changes are: i) FMEA-MSR (Monitoring and System Response) is added to maintain a safe state or a state of regulatory compliance during the client’s operation, ii) RPN is replaced by AP (Action Priority), iii) six steps of FMEA are specified, iv) the score tables are updated, and v) two types of recommended actions are to be provided, i.e., preventive action and detection action.

To determine priorities for preventive and detection actions, the FC (failure cause) should be weighted more heavily than the FM (failure mode). Considering the failure occurrence process, it should be noted that i) a failure occurs only after one of its causes occurs, ii) an FC detected before the actual failure does not induce the failure, and iii) each FC has a different frequency of occurrence and a different time to induce the failure. However, few FMEA studies in the literature consider the role of time. Kwon et al. (2011), Kwon et al. (2013), Kwon et al. (2018), and the two papers of Jang et al. (2016) are among the few works that take time into account for risk evaluation in FMEA.

In this paper, we suggest a risk metric that may help determine the AP for each FC. Assuming probabilistic models for the occurrence of the FC and the failure and for detection, the risk metric is defined for an FC. Section 2 describes the failure occurrence process and defines the risk metric. Section 3 derives a formula for the numerical value of the risk metric under specific probability distributions. Section 4 provides a numerical example with some analyses and an application to FMEA. Finally, some discussion and conclusions follow.

2. THE RISK METRIC OF FC

Suppose the mission period (0, A] is given for the system. The risk of a failure cause is closely related to the severity and the occurrence process of the failure. If a failure actually occurs during the mission period (0, A], we suffer some amount of loss. Moreover, we do not know exactly when the failure will occur, and this uncertainty may cause additional expenses. Thus, the risk may be supposed to have two components: i) the estimated loss due to the unfulfilled mission period and ii) the additional expense due to the uncertainty about the failure time. Denote the failure time by a random variable $T$, and let $\mu_T$ and $\sigma_T$ be the mean and standard deviation of $T$. Assuming a quadratic loss function, the risk due to the unfulfilled mission period may be evaluated as

(1a) $L_\mu = a\,(A-\mu_T)^2\, I(A>\mu_T),$

where $I(\cdot)$ is an indicator function and $a$ is a constant specific to each FM. Note that each FM or FC has a different failure context, and the time to failure differs for each FC and FM. For a fair comparison of the size of risk among different FCs, the risk due to the uncertainty may be evaluated on a relative scale by

(1b) $L_\sigma = b\left(\frac{\sigma_T}{\mu_T}\right)^2,$

which is $b$ times the squared coefficient of variation of $T$, where $b$ is a constant. Thus, we define the RFC (risk metric of FC) as

(2) $\mathrm{RFC} = L_\mu + L_\sigma.$
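As a minimal sketch of how definitions (1a), (1b) and (2) combine, the following Python function evaluates RFC from a given mean and standard deviation of the failure time. The function name `rfc` and the numbers in the usage lines are illustrative assumptions, not values from the paper.

```python
def rfc(mu_t, sigma_t, A, a, b):
    """Risk metric of a failure cause, following (1a), (1b) and (2).

    mu_t, sigma_t : mean and standard deviation of the failure time T
    A             : length of the mission period (0, A]
    a, b          : constants weighting the two loss components
    """
    # (1a): quadratic loss on the unfulfilled mission period,
    # counted only when the mean failure time falls inside the mission.
    loss_mean = a * (A - mu_t) ** 2 if A > mu_t else 0.0
    # (1b): b times the squared coefficient of variation of T.
    loss_spread = b * (sigma_t / mu_t) ** 2
    # (2): the metric is the sum of the two components.
    return loss_mean + loss_spread


# Hypothetical inputs: mean 8, standard deviation 4, mission period 10.
print(rfc(8.0, 4.0, A=10.0, a=1.0, b=0.5))   # 4 + 0.125
# When the mean failure time exceeds A, only the uncertainty term remains.
print(rfc(20.0, 10.0, A=10.0, a=1.0, b=0.5))
```

Keeping the two components as separate intermediate values makes it easy to see which one dominates for a given failure cause.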

For ease of exposition, we consider an FM with only one FC and assume that the correction time of the FC is negligible. To get $\mu_T$ and $\sigma_T$, we first examine the failure occurrence process. Let $X_k$ be the time until the $k$th occurrence of the FC (measured from the start of the mission for $k = 1$ and from the detection of the previous occurrence otherwise), $Y_k$ be the time from the $k$th occurrence of the FC to the failure it induces, and $U_k$ be the time from the $k$th occurrence of the FC to its detection. If the number $M$ of occurrences of the FC until the actual failure occurs is $m$, the failure occurrence process can be depicted as in Figure 1.

Figure 1. Failure occurrence process

Note that the actual failure does not occur if the FC occurrence is detected and corrected before the failure occurs. Given $M = m$, the conditional failure time $T_m$ can be expressed as

(3) $T_m = \sum_{k=1}^{m} X_k + \sum_{k=1}^{m-1} U_k + Y_m.$

If we denote the probability mass function of $M$ by $p(m)$ and the probability density function of $T_m$ by $f_{T_m}(t)$, the probability density function of the actual failure time is obtained as

(4) $f_T(t) = \sum_{m=1}^{\infty} p(m)\, f_{T_m}(t).$

It may be impossible to get a closed functional form of $f_T(t)$. If the probability distributions of $X_k$, $Y_k$ and $U_k$ are given, however, we can obtain $\mu_T$ and $\sigma_T$ by taking the expectation of the conditional moments, e.g., $\mu_T = E[E(T \mid M)]$, and hence the size of the risk can be evaluated.

3. NUMERICAL EVALUATION OF RFC

The distribution of $T$ is not easy to derive even when simple distributions are assumed for $X_k$, $Y_k$ and $U_k$. In this section, we derive the specific formula of the risk metric (2) assuming exponential distributions for $X_k$, $Y_k$ and $U_k$. We further assume that $X_1, X_2, \ldots, X_m$ are independently and identically distributed with the probability density function

(5a) $f_X(x) = \lambda e^{-\lambda x}, \quad 0 < x,$

$Y_1, Y_2, \ldots, Y_m$ are independently and identically distributed with the probability density function

(5b) $f_Y(y) = \mu e^{-\mu y}, \quad 0 < y,$

and $U_1, U_2, \ldots, U_m$ are independently and identically distributed with the probability density function

(5c) $f_U(u) = \tau e^{-\tau u}, \quad 0 < u.$

3.1 The Mean and Variance of M

Before getting $\mu_T$ and $\sigma_T$, we first derive the mean and variance of $M$. Since $M$ is the number of occurrences of the FC until the actual failure occurs, it follows the geometric distribution with success probability

(6) $P[Y<U] = \frac{\mu}{\mu+\tau}.$

Thus, the probability mass function of $M$ is given by

(7) $p(m) = \left(\frac{\tau}{\mu+\tau}\right)^{m-1}\left(\frac{\mu}{\mu+\tau}\right), \quad m = 1, 2, \ldots$

The mean and variance of $M$ are

(8a) $E(M) = 1+\frac{\tau}{\mu},$
(8b) $V(M) = \frac{\tau}{\mu}\left(1+\frac{\tau}{\mu}\right),$

respectively.
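The closed forms (8a) and (8b) can be checked against a direct summation of the probability mass function (7). The sketch below does this in Python; the function names and the parameter values μ = 1, τ = 1.5 are illustrative assumptions.

```python
def occurrence_count_moments(mu, tau):
    """Closed-form mean and variance of M from (8a) and (8b)."""
    ratio = tau / mu
    return 1.0 + ratio, ratio * (1.0 + ratio)


def pmf(m, mu, tau):
    """Probability mass function (7) of M, a geometric distribution."""
    p_fail = mu / (mu + tau)          # success probability (6), P[Y < U]
    return (1.0 - p_fail) ** (m - 1) * p_fail


def moments_by_summation(mu, tau, terms=2000):
    """Mean and variance of M from a truncated sum over the pmf."""
    mean = sum(m * pmf(m, mu, tau) for m in range(1, terms + 1))
    second = sum(m * m * pmf(m, mu, tau) for m in range(1, terms + 1))
    return mean, second - mean ** 2


# Hypothetical rates: failure rate mu = 1, detection rate tau = 1.5.
print(occurrence_count_moments(1.0, 1.5))
print(moments_by_summation(1.0, 1.5))
```

The truncation at 2000 terms is an arbitrary choice; the geometric tail is negligible at that point for moderate values of τ/μ.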

3.2 The Mean and Variance of T

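The distribution of $T$ cannot be obtained in closed form. So we first obtain the conditional mean and variance of $T$ given $M$, and then get the mean and variance of $T$ by taking expectations of the conditional mean and variance. The conditional mean and variance of $T$ given $M = m$ are

(9a) $E[T \mid M=m] = \left(\frac{1}{\lambda}+\frac{1}{\tau}\right)m + \left(\frac{1}{\mu}-\frac{1}{\tau}\right),$
(9b) $V[T \mid M=m] = \left(\frac{1}{\lambda^2}+\frac{1}{\tau^2}\right)m + \left(\frac{1}{\mu^2}-\frac{1}{\tau^2}\right),$

respectively. And thus, the mean and variance of $T$ are

(10a) $\mu_T = E[E(T \mid M)] = \frac{1}{\lambda}\left(1+\frac{\tau}{\mu}\right)+\frac{2}{\mu},$
(10b) $\sigma_T^2 = E[V(T \mid M)] + V[E(T \mid M)] = \left(1+\frac{\tau}{\mu}\right)\left\{\frac{1}{\lambda^2}\left(1+\frac{\tau}{\mu}\right)+\frac{2}{\mu}\left(\frac{1}{\mu}+\frac{1}{\tau}\right)\right\},$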

respectively.
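For readers who want to evaluate these moments numerically, the following Python sketch transcribes (10a) and (10b) as printed. The function name `failure_time_moments` is an assumption made for illustration, and the usage line takes the rates λ = 0.1, μ = 1, τ = 1 purely as an example.

```python
def failure_time_moments(lam, mu, tau):
    """Mean and variance of the failure time T, as printed in (10a), (10b).

    lam : occurrence rate of the failure cause
    mu  : rate at which an occurred cause leads to failure
    tau : detection rate of an occurred cause
    """
    ratio = tau / mu
    # (10a): mean failure time.
    mean_t = (1.0 / lam) * (1.0 + ratio) + 2.0 / mu
    # (10b): variance of the failure time.
    var_t = (1.0 + ratio) * (
        (1.0 + ratio) / lam ** 2 + (2.0 / mu) * (1.0 / mu + 1.0 / tau)
    )
    return mean_t, var_t


mean_t, var_t = failure_time_moments(lam=0.1, mu=1.0, tau=1.0)
print(round(mean_t, 6), round(var_t, 6))   # 22.0 408.0
```

Returning the variance rather than the standard deviation keeps the function aligned with (10b); the square root is taken only where (1b) needs it.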

3.3 Validity of RFC

Using formulas (1a), (1b), (2), (10a) and (10b), the risk metric can be evaluated quantitatively given numerical values of λ, μ and τ. Both a and b are closely linked to the severity of the FM, while λ, μ and τ are more closely related to the FC. It is worth examining how RFC behaves against λ, μ and τ to confirm its validity. Figure 2 shows the graphs of RFC versus τ/μ for λ = 1/100, 1/120, 1/140 and 1/160.

Figure 2. The graph of RFC versus τ/μ for λ = 1/100, 1/120, 1/140 and 1/160

Based on the process in Figure 1, intuition suggests two general axioms: i) the risk becomes smaller as λ takes smaller values and ii) the risk decreases as τ/μ increases. Axiom i) says that if the FC occurs rarely, the failure risk is small. Axiom ii) says that if the FC tends to be detected before the actual failure occurs, the failure risk becomes smaller. Both axioms can be reasonably accepted.

Looking at Figure 2, the shape of RFC agrees with these axioms in most cases. When λ = 1/160, RFC shows a slightly increasing trend as τ/μ increases. This is contrary to axiom ii), but the increase is negligible and hardly identifiable. When the FC itself occurs very rarely, i.e., λ takes a very small value, the failure will not occur most of the time and the value of τ/μ does not make any meaningful difference. On the other hand, if τ/μ takes a very large value, i.e., the FC is detected almost immediately upon its occurrence, the failure will not occur even if the FC occurs frequently. Figure 2 seems to reflect these logical inferences well. Thus, RFC may be generally accepted as a good risk metric for FC.

4. APPLICATION TO FMEA

In this section, we take a numerical example to illustrate the application to FMEA. We do not show the whole FMEA spreadsheet; instead, we provide only the relevant columns, slightly modified to fit our purpose. For simplicity, we consider only one FM with several FCs. A system or subsystem generally has many FMs, each with many FCs, but the general situation can be handled similarly.

4.1 An Example

The main functions of the front door of an automobile are i) ingress to and egress from the vehicle, ii) occupant protection from weather, noise, and side impact, iii) support anchorage for door hardware including mirror, hinges, latch and window regulator, and iv) providing a proper surface for appearance items such as paint and soft trim. Let us consider the potential FM “Corroded interior lower door panels.” The potential effects of this FM are i) deteriorated life of the door, ii) unsatisfactory appearance, and iii) impaired function of interior door hardware. There are six possible FCs with the values of λ, μ and τ shown in Table 1.

Table 1. FCs of the FM “Corroded interior lower door panels”

Considering the severity of the failure effects, suppose a = 1.0 and b = 0.5 are appropriate for this FM, and the mission period is 10 years. The numbers in the table are not real; they are fictional and for illustration only.

Using formulas (1a), (1b), (2), (10a) and (10b), we obtain the RFC values shown in Table 2. It is not surprising that the 6th FC, “Insufficient room between panels for spray head access,” has the largest RFC value and hence the first priority for action. It has the highest occurrence rate in Table 1 (λ = 0.4) and cannot be detected effectively: its probability of detection before the actual failure occurs is τ/(μ+τ) = 1/(0.8+1) ≈ 0.56. The 3rd FC, “Inappropriate wax formulation specified,” occurs rarely with λ = 0.01, but once it occurs it can hardly be detected before the actual failure. Thus, its RFC value is close to those of the 1st, 4th and 5th FCs.
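Once the RFC values are available, assigning action priorities amounts to ranking the FCs in decreasing order of RFC. The sketch below does this for the RFC values reported in Table 2; the labels `FC1` to `FC6` (row order of Table 1) and the function name `action_priorities` are hypothetical names introduced for illustration.

```python
# RFC values as reported in Table 2, keyed by row order in Table 1.
rfc_values = {
    "FC1": 0.43,  # Upper edge of protective wax application too low
    "FC2": 0.22,  # Insufficient wax thickness specified
    "FC3": 0.49,  # Inappropriate wax formulation specified
    "FC4": 0.42,  # Entrapped air prevents wax from entering corner/edge
    "FC5": 0.46,  # Wax application plugs door drain holes
    "FC6": 3.77,  # Insufficient room between panels for spray head access
}


def action_priorities(rfc_by_cause):
    """Rank failure causes so that the largest RFC gets priority 1."""
    ordered = sorted(rfc_by_cause, key=rfc_by_cause.get, reverse=True)
    return {cause: rank for rank, cause in enumerate(ordered, start=1)}


for cause, rank in sorted(action_priorities(rfc_values).items()):
    print(cause, rank)
```

Ties are not handled specially here; with equal RFC values the ranking would fall back on input order, which a practical tool would probably want to resolve by engineering judgment.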

Table 2. The numerical values of RFC and action priorities

4.2 Sensitivity of $\mu_T$, $\sigma_T^2$ and RFC

Note that $\mu_T$ and $\sigma_T^2$ directly affect RFC and are closely related to each other under the assumed distributions. Sensitivity analyses against the distribution parameters help to gain some insight into their behavior. Assuming the same situation as in the example, $\mu_T$, $\sigma_T^2$ and RFC are calculated for λ = 0.25, 0.50, 0.75, 1.00 and τ/μ = 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, given μ = 4. We set the value of μ much larger than that of λ because the failure will occur quickly once an FC has occurred. If τ/μ is less than 1, detection of the FC is slower than the failure occurrence, and an FC occurrence is likely to result in the actual failure. If τ/μ is much larger than 1, the FC is very likely to be detected before the actual failure occurs.

Figure 3 shows the pattern of change in $\mu_T$ against τ/μ. The curves show a positive linear relationship, with a steeper slope when λ is small. It is natural that $\mu_T$ increases as τ/μ increases and as λ decreases. This implies that quicker detection and less frequent occurrence of the FC prevent or delay the actual failure on average.
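The linear pattern can be read directly from (10a): for fixed λ and μ, the mean failure time is a linear function of τ/μ with slope 1/λ. The following Python sketch tabulates (10a) over the grid of λ and τ/μ values stated above with μ = 4; the function and variable names are illustrative assumptions.

```python
def mean_failure_time(lam, mu, ratio):
    """Mean failure time (10a), written in terms of ratio = tau / mu."""
    return (1.0 / lam) * (1.0 + ratio) + 2.0 / mu


MU = 4.0
LAMBDAS = [0.25, 0.50, 0.75, 1.00]
RATIOS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# One row of mean failure times per value of lambda.
grid = {lam: [mean_failure_time(lam, MU, r) for r in RATIOS] for lam in LAMBDAS}

for lam, row in grid.items():
    # The increment between adjacent ratios is 0.5 / lam for every step.
    step = row[1] - row[0]
    print(f"lambda={lam:.2f}", [round(v, 3) for v in row], f"step={step:.3f}")
```

Smaller λ gives a larger step between adjacent grid points, which corresponds to the steeper lines described for Figure 3.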

Figure 3. The graphs of $\mu_T$ versus τ/μ

Figure 4 shows the pattern of change in $\sigma_T^2$ against τ/μ. When λ is large, the curves show an almost linear and slightly positive relationship, and $\sigma_T^2$ is not much affected by τ/μ. When λ is small, however, $\sigma_T^2$ is strongly affected by τ/μ in a curvilinear pattern, and it may increase dramatically as τ/μ increases if λ is very small. It is also natural that $\sigma_T^2$ increases as τ/μ increases, but the pattern is quite different from that of $\mu_T$.

Figure 4. The graphs of $\sigma_T^2$ versus τ/μ

Figure 5 shows the RFC curves against τ/μ. The actual failure is less likely to occur if the FC occurs rarely and is easily detected upon its occurrence. Accordingly, RFC takes smaller values when λ is small and τ/μ is large. With a larger value of λ, RFC tends to decrease slowly as τ/μ increases. With a smaller value of λ, RFC decreases more steeply at the early stage of the increase in τ/μ.

Figure 5. The graphs of RFC versus τ/μ

5. DISCUSSIONS

In this section, we discuss some issues of the suggested risk metric RFC that may need to be improved in future studies. There are certainly many weak points to be refined, but we discuss only two here: i) the definition of the risk metric and ii) the assumptions on the distributions of the failure occurrence and detection times.

5.1 Definition of the Risk Metric

To evaluate the risk related to failure in FMEA, Kwon et al. [7] employed three types of loss function: constant, linear and quadratic. They proposed to use the expected loss for evaluation of the risk. For example, if we apply the quadratic loss function to our situation, the risk can be measured by

(11) $\mathrm{RFC} = c\int_0^A (A-t)^2 f_T(t)\,dt.$

This is a simple, reasonable, and easily understood metric. For practical use in the field, however, the functional form of $f_T(t)$ cannot be derived even with the simplest probability models for X, Y and U in (3). Thus, RFC of (11) cannot be evaluated in closed form, and the only way to get a numerical value is simulation. Even after tens of thousands of replications, we obtain only an approximate value of the RFC, not an exact one.
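To illustrate what such a simulation might look like, the sketch below estimates (11) by Monte Carlo under the exponential models (5a) to (5c). It draws M from the geometric distribution (7) and then builds T from independent exponential draws exactly as written in (3); this sampling scheme, the function names, the seed and the replication count are all assumptions made for illustration, not the authors' procedure.

```python
import random


def sample_failure_time(lam, mu, tau, rng):
    """Draw one failure time T by following equation (3) term by term."""
    p_fail = mu / (mu + tau)              # P[Y < U], equation (6)
    m = 1
    while rng.random() >= p_fail:         # geometric number of FC occurrences
        m += 1
    t = sum(rng.expovariate(lam) for _ in range(m))        # X_1, ..., X_m
    t += sum(rng.expovariate(tau) for _ in range(m - 1))   # U_1, ..., U_{m-1}
    t += rng.expovariate(mu)                               # Y_m
    return t


def simulated_quadratic_loss(lam, mu, tau, A, c=1.0, n=20000, seed=1):
    """Monte Carlo estimate of c * E[(A - T)^2 ; T < A], i.e. equation (11)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = sample_failure_time(lam, mu, tau, rng)
        if t < A:                         # loss only if failure is within (0, A]
            total += (A - t) ** 2
    return c * total / n


# Hypothetical rates and a mission period of 10 time units.
print(simulated_quadratic_loss(lam=0.1, mu=1.0, tau=1.0, A=10.0))
```

The estimate varies with the seed and the number of replications, which is the approximation issue noted above.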

This paper suggests the risk metric (2) as an alternative to (11). Compared with (11), it may not be perfectly logical, but it is much simpler and may have a closed form solution, depending on the situation. Moreover, it has some similar and reasonably acceptable characteristics.

5.2 Assumptions of the Probability Model

For simplicity, we assumed in this paper that the probability models for X, Y and U in (3) are all exponential. But this is not a practical assumption. The exponential distribution may be appropriate for X, but it is usually not appropriate for Y and U. Once an FC has occurred, the failure becomes more likely to occur as time elapses, and detection may have a similar property. Thus, μ and τ are no longer constants but increasing functions of time.

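Assuming the Weibull probability model for $Y$ and $U$, the hazard rates $\mu(y)$ and $\tau(u)$ can be expressed as

(12a) $\mu(y) = \frac{\beta_1}{\alpha_1}\left(\frac{y}{\alpha_1}\right)^{\beta_1-1},$
(12b) $\tau(u) = \frac{\beta_2}{\alpha_2}\left(\frac{u}{\alpha_2}\right)^{\beta_2-1},$

respectively. Under the Weibull distributions with (12a) and (12b) for $Y$ and $U$, their means and variances are

(13a) $E(Y) = \alpha_1\,\Gamma\!\left(1+\frac{1}{\beta_1}\right),$
(13b) $V(Y) = \alpha_1^2\left\{\Gamma\!\left(1+\frac{2}{\beta_1}\right)-\Gamma^2\!\left(1+\frac{1}{\beta_1}\right)\right\},$
(13c) $E(U) = \alpha_2\,\Gamma\!\left(1+\frac{1}{\beta_2}\right),$
(13d) $V(U) = \alpha_2^2\left\{\Gamma\!\left(1+\frac{2}{\beta_2}\right)-\Gamma^2\!\left(1+\frac{1}{\beta_2}\right)\right\},$

respectively, where $\Gamma(\cdot)$ is the gamma function. And $\mu_T$ and $\sigma_T^2$ can be obtained without difficulty. Thus, our risk metric RFC is obtained straightforwardly from (1a), (1b) and (2).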
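The Weibull moments in (13a) to (13d) are easy to evaluate with a standard gamma function. A minimal Python sketch follows; the function name `weibull_mean_var` and the parameter values in the usage lines are illustrative assumptions.

```python
import math


def weibull_mean_var(alpha, beta):
    """Mean and variance of a Weibull variable, as in (13a)-(13d).

    alpha : scale parameter
    beta  : shape parameter (beta > 1 gives an increasing hazard rate)
    """
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return alpha * g1, alpha ** 2 * (g2 - g1 ** 2)


# With beta = 1 the Weibull reduces to the exponential case:
# mean alpha and variance alpha squared.
print(weibull_mean_var(alpha=2.0, beta=1.0))
# A hypothetical increasing-hazard case for the time to failure Y.
print(weibull_mean_var(alpha=1.0, beta=2.0))
```

The same function serves for both Y and U, since (13a), (13b) and (13c), (13d) differ only in their parameters.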

6. CONCLUSIONS

We proposed a risk metric for the failure cause in FMEA, which may be used as an alternative to RPN. The conventional metric RPN (risk priority number) has many drawbacks, as discussed in many studies in the literature. The 5th edition of the FMEA handbook also provides an improved metric.

We assumed that there are time gaps between the occurrence of the failure cause and the failure itself, and that detection of a failure cause also requires a length of time. Based on the assumed process of failure occurrence and detection, we constructed a risk metric for a given failure cause. The metric can be calculated for any failure cause and thus can be used for determining the AP (action priority) among all the failure causes in FMEA.

To use the proposed metric in FMEA, information on the failure and detection time distributions is necessary. In practical situations, however, it is very hard to get sufficient information. Past experience or knowledge of the failure mechanism may be helpful in such situations. Sampling and life test data may also be necessary.

In future studies, practical cases where the time-based model is applicable are expected to be examined over a wide range of industries. Based on such real situations, the suggested model is open to modification, improvement or refinement.

References

Apriliana A. F., Sarno R., Effendi Y. A. 2018. Risk Analysis of IT Applications Using FMEA and AHP SAW Method with COBIT 5. In: International Conference on Information and Communications Technology (ICOIACT). p. 373–378.
Fithri P., Riva N. A., Susanti L., Yuliandra B. 2018. Safety Analysis at Weaving Department of PT. X Bogor Using Failure Mode and Effect Analysis (FMEA) and Fault Tree Analysis (FTA). In: 5th International Conference on Industrial Engineering and Applications (ICIEA). p. 382–385.
Jang H. A., Lee M. K., Hong S. H., Kwon H. M. 2016. Risk Evaluation Based on the Hierarchical Time Delay Model in FMEA. Journal of the Korean Society of Quality Management 44(2):121–138.
Jang H. A., Yun W. Y., Kwon H. M. 2016. Risk Evaluation in FMEA when the Failure Severity Depends on the Detection Time. Journal of the Korean Society of Safety 31(4):136–142.
Kwon H. M., Hong S. H., Lee M. K. 2013. An Expected Loss Model for FMEA under Periodic Monitoring of Failure Causes. Journal of the Korean Institute of Industrial Engineers 39(2):143–148.
Kwon H. M., Hong S. H., Lee M. K., Sutrisno A. 2011. Risk Evaluation Based on the Time Dependence Expected Loss Model in FMEA. Journal of the Korean Society of Safety 26(6):104–110.
Kwon H. M., Lee M. K., Hong S. H. 2018. Risk Evaluation of FMEA under a Weibull Time Delay Model. Journal of the Korean Society of Safety 33(32):83–91.
Liu H., Liu L., Liu N. 2013. Risk Evaluation Approaches in Failure Mode and Effects Analysis: A Literature Review. Expert Systems with Applications 40:828–838.
Srivastava P., Khanduja D., Agrawal V. P. 2018. Mitigation of Risk Using Rule Based Fuzzy FMEA Approach. In: 8th International Conference on Cloud Computing, Data Science & Engineering. p. 26–30.
Sun L., Peng L., Deng G., Chien K. 2017. A Novel FMEA Tool Application in Semiconductor Manufacture. In: Semiconductor Technology International Conference (CSTIC), China. p. 1–4.
VDA QMC. 2018. Alignment of VDA and AIAG FMEA Handbooks. http://vda-qmc.de/en/publications/fmea-alignment.


Table 1. FCs of the FM “Corroded interior lower door panels”

| FC (potential failure cause) | λ | μ | τ |
|---|---|---|---|
| Upper edge of protective wax application specified for inner panels is too low | 0.2 | 2 | 3 |
| Insufficient wax thickness specified | 0.4 | 0.4 | 1 |
| Inappropriate wax formulation specified | 0.01 | 1 | 0.5 |
| Entrapped air prevents wax from entering corner/edge access | 0.1 | 1 | 1 |
| Wax application plugs door drain holes | 0.05 | 0.5 | 2 |
| Insufficient room between panels for spray head access | 0.4 | 0.8 | 1 |

Table 2. The numerical values of RFC and action priorities

| FC (potential failure cause) | RFC | τ/μ | Action priority |
|---|---|---|---|
| Upper edge of protective wax application specified for inner panels is too low | 0.43 | 1.5 | 4 |
| Insufficient wax thickness specified | 0.22 | 2.5 | 6 |
| Inappropriate wax formulation specified | 0.49 | 0.5 | 2 |
| Entrapped air prevents wax from entering corner/edge access | 0.42 | 1 | 5 |
| Wax application plugs door drain holes | 0.46 | 4 | 3 |
| Insufficient room between panels for spray head access | 3.77 | 1.25 | 1 |