NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA
THESIS
APPLICATION OF NEURAL NETWORKS TO PREDICT UH-60L ELECTRICAL GENERATOR CONDITION USING
(IMD-HUMS) DATA
by
Evangelos Tourvalis
December 2006
Thesis Advisor: Lyn R. Whitaker
Second Reader: Samuel E. Buttrey
Approved for public release; distribution is unlimited
Approved for public release; distribution is unlimited
APPLICATION OF NEURAL NETWORKS TO PREDICT UH-60L ELECTRICAL GENERATOR CONDITION USING (IMD-HUMS) DATA
Evangelos Tourvalis
Major, Hellenic Air Force
Submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE IN OPERATIONS RESEARCH
from the
NAVAL POSTGRADUATE SCHOOL
December 2006
Author: Evangelos Tourvalis
Approved by: Lyn R. Whitaker
Thesis Advisor
Samuel E. Buttrey Second Reader
James N. Eagle
Chairman, Department of Operations Research
ABSTRACT
In 2003, the US Army began using the Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), an integrated airborne and ground-based system developed by Goodrich Corporation, to support maintenance of the UH-60L. IMD-HUMS is responsible for collecting, processing, analyzing, and storing an enormous amount of vibratory and flight regime data obtained from sensors located throughout the aircraft.
The purpose of this research is to predict failures of the UH-60L's electrical generators by applying Artificial Neural Networks (ANNs) to the IMD-HUMS-produced data. ANNs are data-based vice rule-based, thereby possessing the potential to operate where analytical solutions are inadequate. They are reputed to be robust and highly tolerant of noisy data. Software tools such as Clementine 10.0, S-Plus 7.0, and Excel are used to establish these predictions.
This research has verified that ANNs have a position in machinery condition monitoring and diagnostics. However, the limited nature of these results indicates that ANNs will not solve all machinery condition monitoring and diagnostics problems by themselves. They certainly will not completely replace conventional rule-based expert systems. Ultimately, it is anticipated that a symbiotic combination of these two technologies will provide the optimal solution to the machinery condition monitoring and diagnostics problem.
TABLE OF CONTENTS
I. INTRODUCTION .............................................................1
   A. CONDITION BASED MAINTENANCE ..........................1
   B. IMD-HUMS ..............................................................3
      1. On-Board System (OBS) .......................................3
      2. Ground Station System (GSS) ...............................4
   C. PREVIOUS WORK ....................................................6
   D. AREA OF RESEARCH AND APPROACH ......................7
   E. STATISTICAL TOOLS ................................................8
   F. ORGANIZATION OF STUDY .......................................8
II. ARTIFICIAL NEURAL NETWORKS OVERVIEW ................9
   A. HISTORY .................................................................9
   B. BIOLOGICAL NEURON ............................................10
   C. ARTIFICIAL NEURON ..............................................11
   D. ARCHITECTURE OF NEURAL NETWORKS ................12
      1. Single Layer Networks (SLN) ..............................12
      2. Multi Layer Networks (MLN) ...............................12
      3. Feed-Forward Networks (FFN) ............................13
      4. Radial Basis Function Networks (RBFN) ..............13
   E. LEARNING PROCESS ..............................................14
      1. Supervised Learning ...........................................14
         a. Hebbian Learning ..........................................14
         b. Delta Rule Learning .......................................15
         c. Competitive Learning .....................................15
      2. Unsupervised Learning .......................................15
      3. Activation Functions ..........................................16
      4. Gradient Descent ...............................................18
      5. Back Propagation Algorithm ...............................19
         a. First Case ....................................................22
         b. Second Case ................................................22
      6. Efficient Algorithms ...........................................24
      7. Batch vs. Incremental Learning ..........................28
         a. Advantages of Incremental Learning (IL) .........28
         b. Advantages of Batch Learning (BL) .................28
III. DATA DESCRIPTION AND METHODOLOGY .................31
   A. SOURCES OF VIBRATION .......................................31
      1. Gear Vibration ..................................................31
      2. Bearings ...........................................................32
      3. Shafts ..............................................................32
   B. DATA COLLECTION ................................................32
   C. SELECTING VARIABLES ..........................................33
      1. Input Vector .....................................................33
         a. Torque .........................................................33
         b. SO_1 (Shaft Order 1) ....................................33
         c. SO_2 (Shaft Order 2) ....................................33
         d. SO_3 ...........................................................34
         e. Signal Average RMS ......................................34
         f. Residual Kurtosis ..........................................34
         g. Residual RMS ...............................................34
         h. Side Band Modulation_1 ................................34
         i. Gear Distributed Fault ....................................34
         j. G2_1 ............................................................34
         k. Residual Peak to Peak ...................................34
         l. Gear Misalignment_1 .....................................35
         m. Ball Energy ..................................................35
         n. Cage Energy .................................................35
         o. Inner Race Energy ........................................35
         p. Outer Race Energy ........................................35
         q. Envelope RMS ...............................................35
      2. Output Vector ...................................................36
   D. DATA PREPROCESSING ..........................................37
   E. DATA SETS ............................................................40
      1. Training Set ......................................................40
      2. Test Sets ..........................................................40
      3. Validation Set ...................................................41
   F. NETWORK ARCHITECTURE AND EVALUATION CRITERIA ...41
IV. RESULTS AND DISCUSSION .........................................43
   A. MODEL WITH ALL PREDICTORS ..............................43
   B. ARTIFICIAL TRAINING SETS ...................................49
   C. STEPWISE PREDICTORS USAGE ..............................52
V. CONCLUSIONS AND RECOMMENDATIONS ....................55
APPENDIX A ....................................................................59
APPENDIX B ....................................................................61
APPENDIX C ....................................................................69
APPENDIX D ....................................................................71
LIST OF REFERENCES ......................................................75
INITIAL DISTRIBUTION LIST .............................................77
LIST OF FIGURES
Figure 1. Overview of Maintenance Terminology ..........2
Figure 2. OBS & GSS (From: IMD-HUMS User Manual, 2005) ..........5
Figure 3. A Biological Neuron (From: Lawrence, J., 1993) ..........10
Figure 4. An Artificial Neuron ..........11
Figure 5. Single Layer Network ..........12
Figure 6. Multi Layer Network ..........13
Figure 7. Feed Forward and RBF Network Representation ..........14
Figure 8. Identity Function ..........17
Figure 9. Sigmoid Function ..........17
Figure 10. Hyperbolic Tangent Function ..........18
Figure 11. Gradient Descent ..........18
Figure 12. Sigmoid Function ..........19
Figure 13. Gradient Descent Using One Weight ..........20
Figure 14. Learning Rate Effect on Gradient Descent (From: Fausett, L., 1994) ..........25
Figure 15. Ill Conditioning (From: Bishop, C., 1995) ..........27
Figure 16. Clementine Preprocessing Data ..........39
Figure 17. Model Architecture GUI ..........40
Figure 18. Clementine Prediction Table ..........42
Figure 19. Ggobi Screen for "Bad" Generators ..........44
Figure 20. p̂ from "Bad" Generator 9 ..........54
Figure 21. p̂ from "Good" Generator 66 ..........54
Figure 22. Model 1 (Predict Bad 9 and Good 4, 26, 42, 66) ..........61
Figure 23. Model 2 (Predict Bad 22 and Good 4, 26, 42, 66) ..........61
Figure 24. Model 3 (Predict Bad 31 and Good 4, 26, 42, 66) ..........62
Figure 25. Model 4 (Predict Bad 33 and Good 4, 26, 42, 66) ..........62
Figure 26. Model 5 (Predict Bad 53 and Good 4, 26, 42, 66) ..........63
Figure 27. Model 6 (Predict Bad 56 and Good 4, 26, 42, 66) ..........63
Figure 28. Model 7 (Predict Bad 53, 9 and Good 4, 26, 42, 66) ..........
Figure 29. Model 8 (Predict Bad 53, 22 and Good 4, 26, 42, 66) ..........
Figure 30. Model 9 (Predict Bad 53, 31 and Good 4, 26, 42, 66) ..........65
Figure 31. Model 10 (Predict Bad 53, 33 and Good 4, 26, 42, 66) ..........65
Figure 32. Model 11 (Predict Bad 53, 56 and Good 4, 26, 42, 66) ..........66
Figure 33. Model 12 (Predict Bad 31, 9 and Good 4, 26, 42, 66) ..........66
Figure 34. Model 13 (Predict Bad 31, 22 and Good 4, 26, 42, 66) ..........67
Figure 35. Model 14 (Predict Bad 31, 33 and Good 4, 26, 42, 66) ..........67
Figure 36. Model 15 (Predict Bad 31, 53 and Good 4, 26, 42, 66) ..........68
Figure 37. Model 16 (Predict Bad 31, 56 and Good 4, 26, 42, 66) ..........68
Figure 38. Model 17 (Predict Bad 9 Using Artificial Sets) ..........69
Figure 39. Model 18 (Predict Bad 33 Using Artificial Sets) ..........70
Figure 40. Stepwise Model Using 5 Predictors (Predict Bad 9) ..........71
Figure 41. Stepwise Model Using 5 Predictors (Predict Bad 22) ..........71
Figure 42. Stepwise Model Using 5 Predictors (Predict Bad 31) ..........72
Figure 43. Stepwise Model Using 5 Predictors (Predict Bad 33) ..........72
Figure 44. Stepwise Model Using 5 Predictors (Predict Bad 53) ..........73
Figure 45. Stepwise Model Using 5 Predictors (Predict Bad 56) ..........73
LIST OF TABLES
Table 1. Potential Model Predictors ..........35
Table 2. Bad Generators—Reasons for Replacement ..........37
Table 3. Training Set Using Only Original Data ..........38
Table 4. Data Multiplication of Bad Observations ..........39
Table 5. Models Predicting Single Generator ..........46
Table 6. Models Predicting Pair of Generators Including 53 ..........47
Table 7. Models Predicting Pair of Generators Including 31 ..........48
Table 8. Summary Statistics for Each Predictor Variable, Including the Minimum, Maximum, Average, and Standard Deviation ..........50
Table 9. Artificial Training Set to Predict Gen 9 ..........51
Table 10. Artificial Training Set to Predict Gen 33 ..........51
Table 11. Stepwise Good Generated Model ..........53
LIST OF ACRONYMS AND ABBREVIATIONS
ANN(s)      Artificial Neural Network(s)
BP          Back Propagation
BL          Batch Learning
CBM         Condition Based Maintenance
FFN         Feed Forward Networks
GSS         Ground Station System
IL          Incremental Learning
IMD-HUMS    Integrated Mechanical Diagnostics Health and Usage Management System
LMS         Least-Mean-Squares
MLN         Multi Layer Networks
OBS         On Board System
PM          Predetermined Maintenance
RBFN        Radial Basis Function Networks
SLN         Single Layer Networks
ACKNOWLEDGMENTS
I would like to acknowledge the help of my excellent thesis advisor, Professor Lyn R. Whitaker, and my second reader, Samuel E. Buttrey, for their direction and assistance as this research was developed.
Also, I would like to mention that this thesis could not have been completed without the presence, support and encouragement of my wife Elena and my son Vasilis.
EXECUTIVE SUMMARY
Readiness is a key factor in enabling military forces to stay effective and reliable in a continuously growing and demanding environment. Increased readiness can be achieved by increasing availability through performing efficient maintenance, performing fewer corrective maintenance actions, and identifying more accurate preventive maintenance periods. Today, the United States and allied forces spend billions of dollars on time-based or phased maintenance that overlooks several facts and realities of operational use. Important savings can be gained by using hardware and software to evaluate component health and system condition based on operational usage, and by performing maintenance in relation to statistical and engineering analyses that predict availability and readiness.
Nowadays, the majority of maintenance is accomplished by either the predetermined preventive or the corrective approach. The former has fixed maintenance intervals; the latter is performed after a component has failed. Because both approaches are costly, some industries have started to perform maintenance in a predictive manner, called Condition Based Maintenance (CBM), where equipment condition is the key parameter for setting maintenance intervals and appropriate maintenance tasks.
Condition Based Maintenance (CBM) is a technology-driven approach that seeks to recognize incipient faults before they develop into critical failures, permitting more precise scheduling of preventive maintenance. The factors that have motivated the growth of CBM include the need for reduced maintenance and logistics costs, protection against failure of mission-critical equipment, and improved equipment availability.
In 2003, the US Army began using the Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), an integrated airborne and ground-based system developed by Goodrich Corporation, to support maintenance of the UH-60L. IMD-HUMS is responsible for collecting, processing, analyzing, and storing an enormous amount of data obtained from sensors located throughout the aircraft. The IMD-HUMS improves aircraft availability for operators by identifying potential
problems early so that maintenance can be performed before it becomes an issue that could impact flight operations. The system also provides operators with accurate flight parameter data, monitored automatically on each flight, allowing them to better schedule routine maintenance and, in some cases, avoid unnecessary early repair and overhaul.
Neural networks are used in numerous fields, including medical diagnostics. In this thesis, neural networks are used for machinery diagnostics, specifically for diagnosing the UH-60L helicopter's electrical generator. In order to accomplish this, a database collected from IMD-HUMS is used. The emphasis in this thesis is to develop a neural network that utilizes the data collected from IMD-HUMS, manufactured by Goodrich Corporation, in order to discover patterns that would predict a potential failure of a UH-60L helicopter generator. Many different neural networks are evaluated for their success rate on this fault diagnosis.
As in any prediction/forecasting model, the selection of appropriate model inputs is extremely important. However, in most ANN (Artificial Neural Network) applications, less attention is given to this task. The main reason is that ANNs belong to the class of data-driven approaches, whereas conventional statistical methods are model driven. In the latter, the structure of the model has to be determined first, with the aid of empirical or analytical approaches, before the unknown model parameters can be estimated. Data-driven approaches, on the other hand, have the ability to determine which model inputs are critical, so there is less need for "...a priori rationalization about relationships between variables..." However, presenting a large number of inputs to ANN models and relying on the network to determine the critical inputs usually increases network size. This has a number of disadvantages, such as decreasing processing speed, increasing the amount of data required to estimate the connection weights efficiently, and degrading the performance of the ANN. This is particularly true for complex problems, where the number of potential inputs is large and where no a priori knowledge is available to suggest which inputs to include.
Clementine, the software used in this research, incorporates several features to avoid some of the common pitfalls of ANNs, including sensitivity analysis, network accuracy, and a feedback graph. With these options selected, the sensitivity analysis provides information on which input fields are most important in predicting the output field, the network accuracy reports the percentage of records for which the model's prediction matches the observed value in the data, and the feedback graph depicts the accuracy of the network over time as it learns.
In practice, building an ANN forecasting model involves a lot of trial and error. Consequently, the objective of this thesis is to provide a practical, non-technical introduction to structuring an ANN forecasting model using real operating data of UH-60L helicopters. The success of ANN applications for an individual researcher depends on three key factors. First, the researcher must have the time, patience, and resources to experiment. Second, the ANN software must allow automated routines, such as walk-forward testing, optimization of hidden neurons, and testing of input variable combinations, either through direct programming or the use of batch/script files. Third, the researcher must maintain a good set of records that lists all parameters for each network tested.
This research has verified that ANNs have a position in machinery condition monitoring and diagnostics. However, the limited nature of these results indicates that ANNs will not solve all machinery condition monitoring and diagnostics problems by themselves. They certainly will not completely replace conventional rule-based expert systems. Ultimately, it is anticipated that a symbiotic combination of these two technologies will provide the optimal solution to the machinery condition monitoring and diagnostics problem.
I. INTRODUCTION
Readiness is a key factor in enabling military forces to stay effective and reliable in a continuously growing and demanding environment. Increased readiness can be achieved by increasing availability through performing efficient maintenance, performing fewer corrective maintenance actions, and identifying more accurate preventive maintenance periods. Today, the United States and allied forces spend billions of dollars on time or phased-maintenance approaches that overlook several facts and realities of operational use. Important savings can be gained by using hardware and software to evaluate component health and the conditions of systems based on operational usage and performing maintenance in relation to statistical and engineering analyses that predict availability and readiness.
The emphasis in this thesis is to develop a neural network that utilizes data collected from IMD-HUMS, manufactured by Goodrich Corporation, in order to discover patterns that can predict a failure of a UH-60L helicopter generator. Many different neural networks will be evaluated for their success rate on this fault diagnosis.

A. CONDITION BASED MAINTENANCE
Maintenance is usually carried out either in time-based scheduled periods (so-called preventive maintenance) or as corrective maintenance. Preventive maintenance aims to avoid system or component failure by performing repair, service, or replacement within fixed time intervals. Corrective maintenance, on the other hand, is performed after a failure or an apparent fault has taken place (Davis, A., 1998). For several types of equipment or systems the maintenance action must be done without delay, but for many others it can be delayed depending on the equipment's function. In many cases preventive maintenance can be divided into two groups: Condition-Based Maintenance (CBM) and Predetermined Maintenance (PM). PM is scheduled in time, while CBM mostly has dynamic or on-request intervals (Figure 1).
Figure 1. Overview of Maintenance Terminology
Complete CBM schemes involve a number of capabilities, such as sensing and data acquisition, signal processing, condition and health estimation, prognostics, and decision assistance. Moreover, in order for the user to have access to the system, a Human System Interface (HSI) must be developed. Generally, the integration of various hardware and software components is needed to implement a CBM system.
A complete architecture for CBM systems should cover the range of functions from data collection through the recommendation of specific maintenance actions. The major tasks that assist CBM consist of (http://www.osacbm.org):
• Sensing and data acquisition
• Signal processing and feature extraction
• Production of alarms or alerts
• Fault or failure diagnosis and health evaluation
• Prognostics: projection of health profiles to future health or estimation of remaining useful life
• Decision aiding: maintenance recommendations, or evaluation of asset readiness for a particular operational setting
• Management and control of data flows or test sequences
• Management of historical data storage and historical data access
• System configuration management
• Human system interface
CBM makes use of information collected on equipment through monitoring devices. As equipment becomes more complex, more manufacturers are providing these monitoring devices to assist companies or organizations in handling and maintaining their equipment (Tsang, A., 1995). CBM uses this online data to compare equipment conditions to predefined operating thresholds. Data falling outside these thresholds generate a maintenance alert from the software, signaling a problem or area of concern.
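To make the threshold idea concrete, the following is a minimal sketch in Python; the indicator names and operating limits are purely illustrative assumptions, not values from IMD-HUMS documentation:

    # Hypothetical CBM threshold check: compare monitored condition
    # indicators against predefined operating limits and flag alerts.
    OPERATING_LIMITS = {                  # illustrative values only
        "vibration_rms_g": (0.0, 4.0),
        "oil_temp_degC": (20.0, 110.0),
    }

    def check_condition(readings):
        """Return a list of alert messages for out-of-limit readings."""
        alerts = []
        for indicator, value in readings.items():
            low, high = OPERATING_LIMITS[indicator]
            if not (low <= value <= high):
                alerts.append(f"ALERT: {indicator} = {value} outside [{low}, {high}]")
        return alerts

    print(check_condition({"vibration_rms_g": 5.2, "oil_temp_degC": 95.0}))

In a fielded system the limits would come from engineering analysis of each component; the sketch only shows the comparison-and-alert logic described above.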
B. IMD-HUMS
In 2003, the US Army began using the Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), an integrated airborne and ground-based system developed by Goodrich Corporation, to support maintenance of the UH-60L. IMD-HUMS is responsible for collecting, processing, analyzing, and storing an enormous amount of data obtained from sensors located throughout the aircraft. The IMD-HUMS improves aircraft availability for operators by identifying potential problems early so that maintenance can be performed before it becomes an issue that could impact flight operations. The system also provides operators with accurate flight parameter data, monitored automatically on each flight, allowing them to better schedule routine maintenance and, in some cases, avoid unnecessary early repair and overhaul. The IMD-HUMS consists of two main subsystems: the On-Board System (OBS) and the Ground Station System (GSS) (System Users Manual For IMD-HUMS, 1995).
1. On-Board System (OBS)

The OBS is comprised of the following components (Figure 2):
• Cockpit display unit (CDU)
• Data transfer unit (DTU)
• Remote data concentrator (RDC)
• Main processor unit (MPU)
• 2 junction boxes (JB1/JB2)
• 20 drive train and gearbox accelerometers
• 4 engine accelerometers
• 5 trim and balance accelerometers
• 1 4g body accelerometer for regime recognition
• Main and tail rotor magnetic RPM sensors
• Main rotor blade tracker
• Engine output shaft optical tachometers
The heart of the IMD-HUMS OBS is the Main Processing Unit (MPU). The MPU collects the data from the accelerometers, analyzes the inputs, and records the data, looking for vibration exceedances and events. It calculates time spent in various flight regimes, performs various diagnostic algorithms, and stores the data to an onboard data cartridge. The OBS also provides for crew interaction through a Cockpit Display Unit (CDU) in order to support prompted procedure actions related to power assurance checks, power train analyses, and rotor track and balance data acquisitions. Besides prompted actions, the OBS uses regime information to automatically store power train and rotor vibration data.
2. Ground Station System (GSS)
The GSS is the major user interface for the IMD-HUMS. It performs after-flight debrief and is designed to analyze, process, and compile flight data into useful information for the maintenance crew, logistics teams, the operations department, and engineering support. The IMD-HUMS GSS functions include:
Figure 2. OBS & GSS (From: IMD-HUMS User Manual, 2005)
• Rotor Track and Balance
• Strip Charts of Aircraft Data
• Engine Performance Trending
• Usage Computation and Tracking
• Regime Identification and Processing
• Flight Operations Management
• Fault/BIT Display
• Maintenance Management
C. PREVIOUS WORK
In their 2005 thesis, Willard and Klesch used 36,742 observations from monitored components of 30 UH-60L helicopter generators. The data was collected during the two-year period in which the IMD-HUMS was installed. Each IMD-HUMS acquisition concerning the shaft, spur gear, and bearing of a generator results in 170 variables. Each generator is assigned a binary value, 1 or 0, to classify its known state. The value of one was given to generators that were removed for fault, hence referred to as bad generators. The value of zero was given to generators that were not removed, referred to as good generators. To accomplish this generator classification, maintenance records and photographs from the 101st AVN Division were used.
Principal components and other techniques were applied to reduce the 170 initial predictors to only 10. A logistic regression model and random forest classifiers were used on each generator, and the plotted probabilities of being bad were smoothed and used to predict the current functional condition of generators in the test set. Only Condition Indicators (CI) computed in the last 20 observations of each generator were used in the predictive models, because generators classified as bad were not necessarily bad through their entire two-year history. Due to the highly variable nature of the predictor values, the model had lower success predicting states from just one acquisition. On the other hand, some surprising cases, generators wrongly presumed to be bad and, conversely, a generator wrongly assumed to be good, were classified correctly by this study's approach.
D. AREA OF RESEARCH AND APPROACH
ANNs have a number of traits that make them an attractive alternative to conventionally configured expert systems. First, many are capable of discriminating non-linear relationships. Second, they are capable of functioning with a certain degree of background noise and erroneous information with minimal degradation of their pattern recognition abilities. Third, they have the ability to generalize, classifying previously unseen vector patterns into existing and, in some cases, new output categories. They are also capable of identifying multiple faults. These are all areas where traditional expert systems typically fall short. Moreover, ANNs are data-based rather than rule-based, which means that they may be capable of correctly discriminating relationships previously hidden from the best of "experts".
ANNs are not without their disadvantages. Like all computer algorithms, they are capable only of manipulating numbers and require an engineer to discern the intelligence of their output. Their success is largely limited by the quality of the data they are provided: if the input vectors provided are inadequate to describe the decision space fully, then their likelihood of success is small. Again, they require an engineer to provide the proper inputs. Finally, they may be able to distinguish new relationships, but the relationships themselves remain hidden; all that is seen external to the network are the input and output vectors. It is generally believed that the relationships are somehow encoded in the connection weights and the hidden layers, but meaningful extraction of this information has yet to occur.
ANNs appear to have potential in numerous fields, including machinery diagnostics. The question might be asked whether an ANN should theoretically be capable of recognizing patterns in vibration signatures. It is the scope of this research to determine whether this potential can be realized in the domain of machinery diagnostics, and specifically for the UH-60L helicopter's electrical generator. In order to accomplish this, a database collected from IMD-HUMS is used. Pattern recognition is an essential component of rotating machinery condition forecasting; therefore, this research examines and trains different model structures and shapes in an attempt to identify the patterns "stored" in the weights of the networks.
E. STATISTICAL TOOLS
ANN software prices start at a few hundred dollars and can go to hundreds of thousands, or even more. The most expensive ones are generally packaged with more complete data mining products, which contain ANNs as one of the capabilities offered. This research utilizes one of those statistical packages, Clementine 10.0 produced by SPSS. This product is designed to function on servers and networks and has the ability to handle massive databases. The software also provides some handy features useful in evaluating the function fit by the NN, such as a computation of sensitivities. Through this software, the researcher had the opportunity to apply and, at the same time, train a number of different kinds of ANNs, in an effort to find the most suitable for the database of interest.
In addition, S-PLUS 7.0 from Insightful and Excel from Microsoft, two inexpensive and widely used tools, were used to assist the models in this research.

F. ORGANIZATION OF STUDY
This thesis begins with Chapter I, which briefly introduces the reader to the modern concept of CBM and the tools available to support it, like IMD-HUMS manufactured by Goodrich Corporation. A summary of previous related work is provided, along with the author's area of interest and the tools utilized to achieve the objective of this research.
Chapter II is dedicated to the traditional explanation of ANNs. Theory and pictures are used together to clarify what is commonly known in ANNs as a "black box" solution: data enter the "black box" and a prediction comes out of it.
Chapter III describes the database and the procedures followed to clean and select the final data in use, and presents the steps of the methodology that lead to the results of this research.
Chapter IV briefly presents and discusses the results and outputs of the models that were trained in supporting the goal of this research. Most of the structures of these models are summarized in Appendix B.
Finally, Chapter V summarizes the conclusions and offers recommendations for future research associated with the database and the techniques used.
II. ARTIFICIAL NEURAL NETWORKS OVERVIEW
An Artificial Neural Network (ANN) is an information processing system that has certain performance characteristics in common with biological neural networks. ANNs are parallel in nature, and the fundamental idea behind them is that if something works in nature, it should be able to work in computers. ANNs are data-based vice rule-based, so they possess the potential of being able to operate where analytical solutions are inadequate. They are reputed to be robust and extremely tolerant of noisy data.

A. HISTORY
The ANN concept was first introduced in 1943 by W. McCulloch and W. Pitts, who, while trying to describe how the brain's neurons might work, modeled an ANN using electrical circuits. In 1949, D. Hebb introduced the training of ANNs in his book "The Organization of Behavior," in which he argued that if two nerves fire at the same time, their connection is strengthened, making it more likely that the same two nerves will fire together again. By the 1950s, as computers became more advanced, researchers were eventually able to simulate such a hypothetical ANN. N. Rochester of IBM laboratories was an early pioneer in this field but, unfortunately, his effort failed. Between 1959 and 1962, B. Widrow and M. Hoff developed two models (ADALINE and MADALINE) that recognized binary patterns, introduced a new learning algorithm applying the Least-Mean-Squares (LMS) learning rule, and used it to train adaptive ANNs.
For almost two decades (1960-1980), interest in ANNs faded because of a lack of new ideas and computational power. In 1972, T. Kohonen and J. Anderson independently developed similar networks, both resulting in a collection of analog ADALINE circuits. The first multi-layered network, an unsupervised network, was developed in 1975.
Since 1980, many researchers have boosted the idea and today ANNs are extremely popular as prediction and forecasting tools in a number of areas. The future of ANNs, though, lies in the development of hardware.
B. BIOLOGICAL NEURON
In principle, the brain is composed of almost 10 billion neurons, each attached to about 10,000 other neurons. The main segment of the cell is called the soma or cell body (Figure 3). Neurons have a large number of extensions called dendrites. Each neuron receives electrochemical signals from other neurons connected through different axons. At the synapses, between the dendrites and axons, electrical signals are modulated in various amounts. If the sum of these electrical signals is sufficiently influential to activate the neuron, it produces an electrochemical output along the axon and passes this signal to the other neurons whose dendrites are attached at any of the axon terminals (Norgaard, M., Ravn, O., Poulsen, N., Hansen, L., 2000). These attached neurons may then fire. It is essential to point out that a neuron fires only if the entire signal received at the cell body surpasses a certain level. The neuron either fires or it doesn't; there aren't different levels of firing.
Figure 3. A Biological Neuron (From: Lawrence, J., 1993)
The human brain is composed of these interrelated, electrochemical, broadcasting neurons. From a huge number of extremely simple processing units (each carrying out a weighted sum of its inputs, and then firing a binary output if the total input exceeds a certain level), the brain controls very complex tasks. This is the model on which ANNs are based. Although ANNs have not come close to modeling the complexity of the human brain, they have proven to be excellent at problems that are easy for a human but difficult for a conventional computer, such as image recognition and forecasting based on earlier knowledge.

C. ARTIFICIAL NEURON
The artificial neuron is meant to mimic the major characteristics of the biological neuron. The three crucial mechanisms of the artificial neuron are:
(1) The synapses or connecting links that apply weights $w_j$ to the input values $x_j$, for $j = 1, 2, \ldots, m$ (Figure 4).

(2) A summer that adds all the weighted input values to calculate the input to the activation function, $v = w_0 + \sum_{j=1}^{m} w_j x_j$, where $w_0$ is called the bias (not to be confused with statistical bias in prediction or estimation), an arithmetic value associated with the neuron. It is convenient to consider the bias as the weight for an input $x_0$ whose value is constantly equal to 1, so that $v = \sum_{j=0}^{m} w_j x_j$.

(3) An activation function $g$ that maps $v$ to $g(v)$, the output value of the neuron. This function is a monotone function.

Figure 4. An Artificial Neuron
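As an illustration of these three mechanisms, the following minimal Python sketch computes the summed input and passes it through a sigmoid activation; the inputs and weights are made-up values, not parameters from any model in this thesis:

    import math

    def neuron(x, w, w0):
        # Summer: v = w0 + sum_j wj * xj  (the bias w0 acts as the weight for x0 = 1)
        v = w0 + sum(wj * xj for wj, xj in zip(w, x))
        # Activation: sigmoid g(v), one common monotone choice
        return 1.0 / (1.0 + math.exp(-v))

    # Inputs and weights are illustrative values only.
    print(neuron(x=[0.5, -1.2, 3.0], w=[0.4, 0.1, -0.6], w0=0.2))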
The artificial neuron has two stages of operation: the training stage and the using stage (Fausett, L., 1994). During the former, the neuron can be trained to fire or not for specific input patterns or vectors. In the latter, when a taught input pattern is identified at the input, its relayed output updates the current output. If the input pattern does not belong to the taught list of input vectors or patterns, the firing rule is used to determine whether to fire or not. It is important to state that the firing rule applies to all input patterns, not only the ones on which the neuron was trained.

D. ARCHITECTURE OF NEURAL NETWORKS

1. Single Layer Networks (SLN)
The structure of an ANN consists of the 'input layer' connected to the 'hidden layer,' which is connected to the 'output layer.' The activity of the input units corresponds to the raw data that is fed into the network, while the activity of each hidden unit is determined by the behavior of the input units and the weights on the links between the input and the hidden units (Ripley, B., 1996). The status of the output units depends on the activity of the hidden units and the weights between the hidden and output units (Figure 5).
Figure 5. Single Layer Network
This simple class of network is especially interesting because the hidden units are free to build their own versions of the input. The weights between the input and hidden units decide when each hidden unit is active, and so by adjusting these weights, a hidden unit can choose what it represents.
2. Multi Layer Networks (MLN)
The most frequent ANN model is the multilayer perceptron (MLP). This family of ANN is known as a supervised network because it needs a desired output for learning purposes (Kartalopoulos, S., 1996). The goal of this network is to generate a model that correctly matches the input to the output using historical data, so that the model can then be used to create the output when the desired output is not known. A graphical demonstration of an MLP is shown below (Figure 6).
Figure 6. Multi Layer Network
3. Feed-Forward Networks (FFN)
Feed-forward ANNs are the most accepted and extensively used models in several practical applications. FFNs permit signals to pass one way only: from input to output. There is no feedback or loop, and they tend to be straightforward networks that connect inputs with outputs. They are broadly used in pattern recognition. This structure of organization is also referred to as bottom-up or top-down (Figure 7).
4. Radial Basis Function Networks (RBFN)
In a Radial Basis Function (RBF) network, the mapping from the input to the hidden layer is nonlinear, while the mapping from the hidden layer to the output is linear. Because of their nonlinear approximation properties, RBF networks are capable of modeling complex mappings, which perceptron ANNs can only model by using multiple intermediary layers (Bishop, C., 1995). To use an RBF network we need to specify the hidden unit activation function, the number of processing units, a criterion for modeling a given task, and a training algorithm for finding the parameters of the network.
Figure 7. Feed Forward and RBF Network Representation
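A sketch of this two-stage mapping, assuming Gaussian hidden units; the centers, widths, and output weights below are illustrative assumptions, not parameters from any trained model:

    import math

    def rbf_network(x, centers, widths, out_weights, bias):
        # Nonlinear stage: Gaussian response of each hidden unit to the input.
        hidden = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * s ** 2))
                  for c, s in zip(centers, widths)]
        # Linear stage: weighted sum of the hidden responses.
        return bias + sum(w * h for w, h in zip(out_weights, hidden))

    # All parameter values are made up for illustration.
    print(rbf_network(x=[0.5, 0.5],
                      centers=[[0.0, 0.0], [1.0, 1.0]],
                      widths=[1.0, 1.0],
                      out_weights=[0.7, -0.3],
                      bias=0.1))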
E. LEARNING PROCESS
One of the most essential phases of ANN is the learning process. Learning can be done in a supervised or unsupervised way.
1. Supervised Learning
In supervised learning, both the inputs and the outputs are known. During training, the net runs the inputs and compares its resulting outputs against the known outputs. The differences from this comparison (the errors) are calculated, and the system adjusts the weights that govern the network. The aim is to establish a set of weights that minimizes the error. One well-known method, common to many learning procedures, is Least Mean Squares (LMS) convergence (Duda, R., 2000). In this case, the network learns "offline" because the learning and operation stages are distinct. Supervised learning can be subdivided into the following three general types:
a. Hebbian Learning
Hebbian learning is based on the premise that those connections that
receive the most signal energy should in turn be strengthened. In this type of ANN, connection weights increase in a manner proportional to the magnitude of the signals, provided that both the input through the path and the desired output are high. While historically important and neurologically accurate, it is not widely used in neural computing applications.
b. Delta Rule Learning
Today, the most frequent form of learning in use is the delta rule. Here,
weights are adjusted based on a direct comparison between the actual and desired outputs. Back propagation is one learning rule based on the generalized delta rule:
$W_{ij} = C_1 E_{ij} + C_2 M_{ij} + C_3 X_{ij}$ (2.1)

where $W_{ij}$ is the weight of the connection from the $i$th element in the current layer to the $j$th element of the previous layer; $C_1$, $C_2$, and $C_3$ are coefficients varying from 0 to 1; $E_{ij}$ is the error, proportional to the difference between the actual and desired output of the network; $M_{ij}$ is the momentum term, based on the difference between the previous weight of the given connection and the weight immediately prior to that; and $X_{ij}$ is the activation energy associated with that particular connection (Ripley, B., 1996).
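Equation (2.1) is a generalized form; in its simplest (Widrow-Hoff) version, the delta rule adjusts each weight in proportion to the output error times the corresponding input. A minimal sketch of that simple version, with made-up data rather than the thesis's formulation:

    def delta_rule_step(w, x, target, eta=0.1):
        # One Widrow-Hoff update: w <- w + eta * (t - o) * x
        o = sum(wi * xi for wi, xi in zip(w, x))     # linear output
        error = target - o
        return [wi + eta * error * xi for wi, xi in zip(w, x)]

    w = [0.0, 0.0]                                   # illustrative starting weights
    for _ in range(100):
        w = delta_rule_step(w, x=[1.0, 2.0], target=5.0)
    print(w)                                         # the output w . x is driven toward 5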
c. Competitive Learning
In competitive learning, the output of each processing element is weighted according to the magnitude of its response relative to those of the other processing elements. The "winning" processing element's weighting is then modified according to a comparison between actual and desired outputs. Thus only the strongest activation energies are adjusted; weak signals get progressively weaker unless the magnitudes of their responses become comparable to those of the "winners".
2. Unsupervised Learning
In unsupervised training, the network is provided only with inputs, not with desired outputs. In the training phase, an input pattern is applied to the input layer and the net is permitted to reach equilibrium (a "winner"). Weight changes are then made according to some instructions. The model itself must decide what features it will use to cluster the input data. This type of organization is also known as self-organization or adaptation. Here the network learns "online," because it learns and operates at the same time.
3. Activation Functions
The activation function is basically used to introduce nonlinearity into the net. The activation function $g(\cdot)$ transforms the presented input of an artificial neuron during its activation and determines how influential the output from the neuron should be, based on the sum of the inputs. If the artificial neuron must mimic a biological neuron, the activation function $g(\cdot)$ has to be a simple threshold function returning binary values. However, this is not always the approach that artificial neurons implement. Sometimes it is more powerful and efficient to have a smooth, differentiable activation function. The output from this group of activation functions lies within the range [0,1] or [-1,1], depending on which activation function is applied. Cases where the identity function is used as the activation function do not have these restrictions. While, in principle, inputs and weights have no boundaries and take values within the range $(-\infty, +\infty)$, in practice they often have small values centered around zero or are rescaled to have such small values.
As pointed out before, there are many different activation functions, but the most frequently used are the identity function (Figure 8), the sigmoid (Figure 9), and the hyperbolic tangent (Figure 10). All of these functions should be differentiable, because the back propagation (BP) algorithm requires this property in order to compute the error gradients within the network (Bishop, C., 1995).
An identity function, also known as an identity map or an identity transformation, is a function which does not have any effect. It always returns the same value that was used as its argument. For ANNs, that is reflected by the absence of hidden layers (perceptron).
g(x)=x
Figure 8. Identity Function
A sigmoid function, also known as logistic, is an S-shaped curve that maps all input to the range [0, 1]. It has a limit of 0 as x approaches negative infinity, and 1 as x approaches infinity.
$g(x) = \frac{1}{1 + e^{-x}}$
Figure 9. Sigmoid Function
A hyperbolic tangent function is analogous to a sigmoid, but it maps all of its input to the range [-1, 1]. It has a limit of -1 as x approaches negative infinity and 1 as x approaches infinity. The constants a and b define the possible output range (a) as well as the slope (b). The neuron transforms the stimulus in a nonlinear way, an essential precondition for solving a variety of highly complex problems.
$g(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$

Figure 10. Hyperbolic Tangent Function
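The three activation functions can be written down directly; a small Python sketch:

    import math

    def identity(x):
        return x                                    # g(x) = x

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))           # maps R onto (0, 1)

    def tanh(x):
        return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)   # maps R onto (-1, 1)

    for x in (-2.0, 0.0, 2.0):
        print(x, identity(x), round(sigmoid(x), 4), round(tanh(x), 4))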
4. Gradient Descent

As clarified previously, training a function estimator frequently reduces to finding a value of $\vec{w}$ that minimizes a scalar error function $E(\vec{w})$. This is a typical optimization problem, and various methods have been developed to solve it. The most frequent one is gradient descent. Gradient descent consists in thinking of $E$ as the height of a landscape over the weight space: to locate a minimum, begin from an arbitrary point and march downhill until a minimum is reached (Figure 11) (Sobajic, D., 1993). From the figure we can see that gradient descent will not always converge to an absolute minimum of $E$, but only to a local minimum. In the majority of these cases, this local minimum is good enough, given a realistic initial value of $\vec{w}$.

Figure 11. Gradient Descent
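A minimal gradient-descent loop, assuming for illustration the one-weight quadratic error surface E(w) = (w - 3)^2:

    def grad_E(w):
        return 2.0 * (w - 3.0)      # derivative of the illustrative surface E(w) = (w - 3)^2

    w = 0.0                         # arbitrary starting point
    eta = 0.1                       # step size (learning rate)
    for _ in range(50):
        w -= eta * grad_E(w)        # march downhill: w <- w - eta * dE/dw
    print(w)                        # converges near the minimum at w = 3

On a bowl-shaped surface like this one the loop always reaches the single minimum; on the rugged surfaces of multi-layer nets the same loop can stop at a local minimum, as the figure illustrates.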
5. Back propagation Algorithm
Back propagation (BP) is one of the earliest training algorithms, first developed by P. Werbos and is widely used for training supervised networks. The goal of BP is to minimize the square error of the predictions over all the observations (Fausett, L., 1994). Based on this algorithm, the output error is assumed to be collectively contributed by all connection weights. Basically, at the center of the algorithm, an application of the chain rule for ordered partial derivatives takes place to compute the sensitivity that a cost function has with respect to the environment and weights of the net.
Initial weights are usually chosen as small random values, so that each neuron will adapt a different set of weights. The network's input $z_j$ to a node $j$ is resolved by summing its weighted inputs

$z_j = \sum_i w_{i,j} x_i \quad \forall j$ (2.2)
where $x_i$ designates the input at one node, and $w_{i,j}$ the weight from node $i$ to node $j$. Next, the node's threshold value $\theta$ (bias) is added to the net input $z_j$, and the calculated value is filtered through an activation function, usually a sigmoid function (Figure 12):

$F(z_j) = \frac{1}{1 + e^{-z_j + \theta_j}}$ (2.3)
Figure 12. Sigmoid Function
The sigmoid function is also known as a "squashing" function because it maps its inputs onto a preset range between 0 and 1.
The learning procedure in a BP net has two stages. In the first stage, each input pattern $I_p$ is provided to the net sequentially and propagated forward to the output. In the second stage, a technique called "gradient descent" is applied to minimize the total error on the input patterns within the training set. During this technique, weights are altered in proportion to the negative of the error derivative with respect to each weight

$\Delta w_{j,i} = -\varepsilon \left[ \frac{\partial E}{\partial w_{j,i}} \right]$ (2.4)

with the weights moving toward the steepest descent of the error surface, which is defined by the total error

$E = \frac{1}{2} \sum_p \sum_j (t_{p,j} - o_{p,j})^2$ (2.5)
where $o_{p,j}$ denotes the output of node $j$ in response to pattern $p$, and $t_{p,j}$ is the target output for node $j$. After the error on each pattern is calculated (2.5), all the weights are readjusted in proportion to this error and back-propagated from the outputs to the inputs, applying the gradient descent method. The newly calculated weights decrease the overall error in the net. The idea of gradient descent using only a single weight is presented in the following figure.
Figure 13. Gradient Descent Using One Weight
An application of the chain rule is used to develop the BP learning rule, rewriting the error gradient for each pattern as the product of two partial derivatives. The first partial derivative, $\partial E_p / \partial z_{p,j}$, represents the change in error as a function of the network input, while the second, $\partial z_{p,j} / \partial w_{j,i}$, represents the effect of a weight change on the network input. The error gradient becomes

$\frac{\partial E_p}{\partial w_{j,i}} = \left[ \frac{\partial E_p}{\partial z_{p,j}} \right] \left[ \frac{\partial z_{p,j}}{\partial w_{j,i}} \right]$ (2.6)
Using equation (2.2) for the net input $z_j$ to a node $j$, we can solve directly for the second partial derivative, obtaining the network's output $o_{p,i}$ for pattern $p$ at node $i$:

$\frac{\partial z_{p,j}}{\partial w_{j,i}} = \frac{\partial \left( \sum_k w_{j,k}\, o_{p,k} \right)}{\partial w_{j,i}} = o_{p,i}$ (2.7)
Naming the negative of the first partial derivative the error signal,

$$d_{p,j} = -\frac{\partial E_p}{\partial z_{p,j}} \qquad (2.8)$$
the corresponding change in the weight wi,j with respect to the error Ep becomes
∆pwj,i=η*dp,j*op,j (2.9)
where η is a parameter describing the learning rate. The speed and accuracy of the learning process during the iterations to update the weights also depend on the learning rateη. A low learning rate can guarantee more stable convergence, but a high learning rate can accelerate convergence in several cases.
The next step in the BP algorithm is to calculate d_{p,j} for each node in the net. Equation (2.8) can be rewritten as:
$$d_{p,j} = -\frac{\partial E_p}{\partial o_{p,j}} \cdot \frac{\partial o_{p,j}}{\partial z_{p,j}} \qquad (2.10)$$
To compute the first partial derivative there are two cases to examine:
a. First Case
Assume that j is an output node of the net; then, from equation (2.5), it follows that

$$\frac{\partial E_p}{\partial o_{p,j}} = -(t_{p,j} - o_{p,j}) \qquad (2.11)$$

Substituting equation (2.11) into equation (2.10) gives

$$d_{p,j} = (t_{p,j} - o_{p,j})\, f'(z_{p,j}) \qquad (2.12)$$
b. Second Case
Assume that j is not an output node of the net; then, applying the chain rule again, we obtain

$$\frac{\partial E_p}{\partial o_{p,j}} = \sum_k \frac{\partial E_p}{\partial z_{p,k}} \cdot \frac{\partial z_{p,k}}{\partial o_{p,j}} = \sum_k \frac{\partial E_p}{\partial z_{p,k}} \cdot \frac{\partial}{\partial o_{p,j}} \Big( \sum_i w_{k,i}\, o_{p,i} \Big) = \sum_k \frac{\partial E_p}{\partial z_{p,k}}\, w_{k,j} = -\sum_k d_{p,k}\, w_{k,j} \qquad (2.13)$$

so that, for a hidden node, $d_{p,j} = f'(z_{p,j}) \sum_k d_{p,k}\, w_{k,j}$.
Combining the above cases, we obtain an iterative process for calculating the error signal d_{p,j} for all nodes in the net. These errors can then be used to update the weights.
Because BP uses a gradient descent method, the net tracks the contour of an error surface, with weight updates moving in the direction of steepest descent. For a simple net without hidden layers, it is easy to minimize the error using gradient descent because the error surface is bowl-shaped (Figure 13); the net will always locate an optimal solution at the base of the bowl. Such optimal solutions are called global minima. On the other hand, more complex cases require an extra hidden layer to carry out the solution. Here, error surfaces also become complex, possibly containing many minima. Because some minima are deeper than others, it is possible that gradient descent will not locate a global minimum, and the network may be trapped in a local minimum, which is a suboptimal solution.
It is clear that we want to avoid local minima while training a BP net. Although in some cases this may be difficult to do, in practice it is essential to understand how often and under what circumstances local minima occur, and to study possible approaches for avoiding them. It is known from ANN theory that the more hidden nodes a net has, the less likely it is to encounter a local minimum during training: although additional hidden nodes amplify the complexity of the error surface, the extra dimensionality increases the number of possible escape paths.
The BP algorithm analyzed in this chapter involves only weight changes that are proportional to the derivative of the error. As mentioned for equation (2.9), the larger the learning rate η, the larger the weight changes on each iteration and the faster the net learns (although the magnitude of the learning rate also controls whether the net reaches a stable solution). If the learning rate gets too large, the weight changes no longer approximate a gradient descent procedure, which often results in oscillation of the weights. Obviously, we want the largest learning rate that does not cause oscillation, achieving the best learning speed while minimizing the training time of the net. One technique that has been used is a small modification of the BP algorithm that adds a momentum term.
The idea of momentum is that previous changes in the weights should influence the current direction of movement in weight space. This idea is implemented by the adjusted weight update rule

$$\Delta w_{j,i}(n+1) = \eta\, d_{p,j}\, o_{p,i} + \alpha\, \Delta w_{j,i}(n) \qquad (2.14)$$

where η is the learning rate, α is the momentum parameter, and n indexes the update step. With momentum, once the weights begin to move in a particular direction in weight space, they tend to keep moving in that direction. Momentum can help the net escape a local minimum, in addition to speeding learning, particularly along extensive flat error surfaces.
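Collecting the pieces, the following sketch (illustrative Python, not the Clementine implementation) expresses the error signals of equations (2.12) and (2.13) and the momentum update of equation (2.14); the demonstration values are made up:

```python
import math

def f(z):
    """Sigmoid activation (Eq. 2.3 with the bias folded into z)."""
    return 1.0 / (1.0 + math.exp(-z))

def f_prime(z):
    """Derivative of the sigmoid: f'(z) = f(z) * (1 - f(z))."""
    s = f(z)
    return s * (1.0 - s)

def delta_output(t, o, z):
    """Error signal at an output node (Eq. 2.12)."""
    return (t - o) * f_prime(z)

def delta_hidden(z, deltas_next, weights_next):
    """Error signal at a hidden node j: f'(z_j) * sum_k d_k * w_kj,
    back-propagated from the nodes k that j feeds (Eq. 2.13)."""
    return f_prime(z) * sum(d * w for d, w in zip(deltas_next, weights_next))

def weight_step(d_j, o_i, prev_step, eta=0.25, alpha=0.9):
    """Weight change with momentum (Eqs. 2.9 and 2.14): the previous
    change prev_step keeps the weights moving in the same direction."""
    return eta * d_j * o_i + alpha * prev_step

# Tiny demonstration with made-up values:
d_out = delta_output(t=1.0, o=0.72, z=0.94)
print(weight_step(d_out, o_i=0.5, prev_step=0.01))
```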
In Clementine, the default learning rate is 0.25 and the default momentum
parameter is 0.9. When using BP for a series of problems, much smaller values than these are often used. For especially complex problems, a learning rate of 0.01 is very common.
6. Efficient Algorithms
Selecting the proper value for the learning rate η is not easy. If η has a small value, the learning procedure will be too slow; on the other hand, if η is assigned a large value, the learning procedure may diverge. An acceptable value of η can be established by trial and error, but this is a tedious and wasteful process (Bishop, C., 1995). To deal with this problem, a wide range of efficient learning methods has been developed; here, we briefly present the most essential theoretical ideas underlying them. One of the most basic ideas for speeding up the learning procedure is to use second-order information about the error function. Assuming a quadratic error in one dimension, the best learning rate is the inverse of the second-order derivative (Figure 14), a fact which can aid in designing capable learning methods.
Figure 14. Learning Rate effect on Gradient Descent (From: Fausett, L., 1994)
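As a one-dimensional illustration of this second-order idea (a hypothetical sketch with made-up numbers, not part of the thesis experiments), a quadratic error is minimized in a single step when the learning rate equals the inverse of the second derivative:

```python
# Quadratic error E(w) = a*(w - m)**2 has second derivative E'' = 2a.
# Setting eta = 1/E'' makes one gradient step land exactly on the minimum.
a, m = 5.0, 3.0
grad = lambda w: 2.0 * a * (w - m)   # dE/dw

w = 0.0
eta = 1.0 / (2.0 * a)                # inverse of the second derivative
w -= eta * grad(w)
print(w)                             # 3.0: the minimum, reached in one step
```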
If it is possible to calculate this second-order derivative, then it is feasible to achieve a good learning rate. Unfortunately, the error function might not be quadratic at all. Therefore, setting the learning coefficient to the inverse of the second-order derivative only works near the optimum, in regions where the quadratic approximation is valid; if the second-order derivative is negative, it does not work at all, and particular care must be taken to handle such circumstances. Moreover, a few other issues come up when the dimension of the weight space is larger than one, which is the most common case in practice. The second-order derivative is then no longer a single number but a matrix, called the Hessian and defined as:
$$H = \frac{\partial^2 E}{\partial \mathbf{w}^2} =
\begin{pmatrix}
\dfrac{\partial^2 E}{\partial w_1^2} & \dfrac{\partial^2 E}{\partial w_1\,\partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_1\,\partial w_n} \\
\dfrac{\partial^2 E}{\partial w_2\,\partial w_1} & \dfrac{\partial^2 E}{\partial w_2^2} & \cdots & \dfrac{\partial^2 E}{\partial w_2\,\partial w_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 E}{\partial w_n\,\partial w_1} & \dfrac{\partial^2 E}{\partial w_n\,\partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_n^2}
\end{pmatrix} \qquad (2.15)$$
Sometimes the curvature differs across directions (Figure 15). This can generate a major problem if there is, for instance, a second derivative of one hundred (100) in one direction and a second derivative of one (1) in another. In this case, the learning coefficient must be less than 0.02 (twice the inverse of the largest second derivative) in order to avoid divergence, which means that convergence will be very slow in the direction where the second derivative is one (1). This phenomenon is called 'ill conditioning.' Efficient algorithms frequently try to transform the weight space so that the curvature is the same in all directions; this has to be done cautiously, so that instances where the curvature is negative are handled as well. Some of the best such algorithms are QuickProp, Levenberg-Marquardt, and Conjugate Gradient.
Figure 15. Ill Conditioning. (From: Bishop, C., 1995)
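To see the divergence threshold concretely, the following sketch (illustrative numbers matching the example above, not thesis data) runs gradient descent on a quadratic error with curvatures 100 and 1:

```python
# Illustrative sketch of ill conditioning.
# E(w1, w2) = 50*w1**2 + 0.5*w2**2, so the second derivatives are
# 100 in the w1 direction and 1 in the w2 direction.
def run(eta, steps=100):
    w1, w2 = 1.0, 1.0
    for _ in range(steps):
        w1 -= eta * 100.0 * w1   # gradient in the stiff direction
        w2 -= eta * 1.0 * w2     # gradient in the flat direction
    return w1, w2

print(run(0.019))  # stable, but w2 is still far from 0: slow convergence
print(run(0.021))  # w1 blows up: a rate above 2/100 = 0.02 diverges
```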
7. Batch vs. Incremental Learning
In supervised learning, as described earlier, the error function is frequently defined as a sum of error terms over a finite set of training samples, each consisting of an input-output pair (x_i, y_i). Again, the error function is

$$E = \sum_i E_i \qquad (2.16)$$

with

$$E_i = \frac{1}{2} \big( f_w(x_i) - y_i \big)^2 \qquad (2.17)$$
Applying steepest descent to E is known as "batch learning" (BL), because the gradient of the error has to be calculated on the full training set before the weights are modified. A different way to modify the weights so as to minimize E is "incremental learning" (IL), in which the gradient descent steps are applied to each Ei instead of E. Other common names for IL are online or stochastic learning. The obvious question that arises is which of these methods is best. The answer is not simple and always depends on the particular problem to be solved; a short sketch contrasting the two update schemes follows the list below. Here are a few of the points to consider (Simpson P., 1996):

a. Advantages of Incremental Learning (IL)

• IL is usually faster, particularly when the training set is redundant. In situations where the training set has input and output patterns that are similar, BL wastes time calculating and adding similar gradients before making one weight update.

• IL often results in better outcomes. This happens because the randomness of IL generates noise in the weight updates, and this noise helps the weights jump out of bad local optima.

• IL is capable of tracking changes. As an example, consider a model learning the dynamics of a mechanical system. As this system ages, its properties may slowly evolve, and IL can track this type of drift.

b. Advantages of Batch Learning (BL)

• In IL, noise causes the weights to oscillate continuously around a local optimum, and they never converge to a constant stable value. This is not the case in BL, making it easier to analyze.

• Various acceleration methods operate only in BL, such as some of the algorithms mentioned earlier (QuickProp, Conjugate Gradient).

• Another benefit related to the absence of noise in BL is that the theoretical analysis of the weight dynamics and convergence rates is much simpler.
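As promised above, here is a minimal sketch contrasting the two update schemes on an illustrative one-parameter linear model; the data and learning rate are made up:

```python
import random

# Batch vs. incremental updates on a linear model f_w(x) = w * x with
# squared error (Eqs. 2.16-2.17). The data are illustrative (true w = 2).
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
eta = 0.05

def grad_i(w, x, y):
    return (w * x - y) * x          # dE_i/dw for one sample

# Batch learning: one update from the gradient summed over the full set.
w = 0.0
for epoch in range(100):
    g = sum(grad_i(w, x, y) for x, y in data)
    w -= eta * g
print("batch:", round(w, 4))

# Incremental (online/stochastic) learning: update after every sample,
# visiting the samples in random order each epoch.
w = 0.0
for epoch in range(100):
    random.shuffle(data)
    for x, y in data:
        w -= eta * grad_i(w, x, y)
print("incremental:", round(w, 4))
```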
III. DATA DESCRIPTION AND METHODOLOGY
Vibration analysis is among the most powerful tools available for the detection and isolation of incipient faults in mechanical systems. Among the methods of vibration analysis in use today, and under continuous study, are broadband vibration monitoring, time-domain analysis, and frequency analysis. All have varying degrees of utility in machinery condition monitoring and diagnostics, and all have characteristics that lend themselves particularly well to specific applications. Since the effectiveness of an ANN is directly related to how effectively the chosen inputs define a particular decision space, the selection of the optimum vibration parameters as inputs to the ANNs is critical. Thus, a good understanding of elementary machinery diagnostics techniques is essential.
A. SOURCES OF VIBRATION
In mechanical systems, any mechanical component which periodically comes in contact with a second component to transmit an axial, radial, or torsional load is a potential source of mechanical vibration. In machines with a gear train, the principal components involved with load transfer will be its torsional power source, such as a motor; the gear meshes; the bearings; and those items that interconnect them, the shafts. Additionally, because vibrational isolation is seldom complete, additional extraneous sources of vibration will also be present. The diagnostician is generally interested in extracting the vibrations created by specific machinery components and ignoring the other sources as extraneous noise. In this study, we are particularly interested in the vibrations generated by the rotating machinery’s gears, bearings, and shafts. As such, the discussion will be limited to these sources of vibration.
1. Gear Vibration
In a gear train, the gear mesh is the dominant source of mechanical vibration. This vibration primarily stems from non-uniformity in the transmission of angular motion from one gear to its mate. The non-uniformity of the angular motion occurs due to geometric deviations of the contact surfaces from the ideal involute shape and the elastic deformation that any mechanical system undergoes when transmitting a load (Mark, W.D., 1998). Moreover, torque fluctuations and deflections of the gearbox can also be sources of vibration in gears. Clearly, any damage that occurs to the gear contact surface, as well as to other mechanical linkages to the gear mesh, will also have an effect on the gear's vibration (Mattew, J., and Alfredson, R.J., 1987).
2. Bearings
Bearing vibrations occur for much the same reasons as gears. However, because bearings are not situated directly along the power transmission train and support largely static loads, they characteristically generate a small vibration signal until the damage inflicted upon them reaches advanced stages. Because of the low magnitude of these signals, they are often masked by much stronger gear-related signals. Partially because of this belated detection of trouble, antifriction bearings are among the most common causes of machinery failure in moderately sized machines. The frequencies associated with the bearing-related signals generally depend on the location of the damage, the dimensions of the bearings, and the shaft rotation speed.
3. Shafts
Shafts generally produce vibration signals at their rotational frequency and its harmonics. Shafts are also prone to a number of different faults, all of which register at the shaft rotational frequency. In the case of bent shafts and shaft misalignment, the second harmonic is the dominant frequency in 90 percent of cases (Collacott, R.A., 1979). Imbalance in the shaft or load characteristically generates a dominant signal at the shaft rotational frequency, but there tends to be a phase shift as well. Mechanical looseness can also increase the signal at the shaft rotational frequency, but characteristically involves higher harmonics as well (Hewlett Packard, 1983).
B. DATA COLLECTION
The database used in this research was provided by Goodrich Corporation and collected through the IMD-HUMS installed on 30 UH-60L helicopters. The period of data collection runs from 9/22/2003 to 7/31/2005. The total number of observations utilized was 36,742, and each observation consists of 169 fields. For ease of reference in the database's structure, each generator was assigned a number from 1 to 66. Appendix A summarizes the allocation of generators among the helicopters and the number of recorded observations for each generator.
C. SELECTING VARIABLES
As in any prediction or forecasting model, the selection of appropriate model inputs is extremely important. However, in most ANN applications, less attention is given to this task. The main reason for this is that ANNs belong to the class of data-driven approaches, whereas conventional statistical methods are model-driven. In the latter, the structure of the model has to be determined first, with the aid of empirical or analytical approaches, before the unknown model parameters can be estimated. Data-driven approaches, on the other hand, have the ability to determine which model inputs are critical, so there is less need for "...a priori rationalization about relationships between variables..." However, presenting a large number of inputs to ANN models, and relying on the network to determine the critical model inputs, usually increases network size. This has a number of disadvantages, such as decreasing processing speed, increasing the amount of data required to estimate the connection weights efficiently, and degrading ANN performance. This is particularly true for complex problems, where the number of potential inputs is large and where no a priori knowledge is available to suggest which inputs to include.
1. Input Vector
According to the vibration theory presented above, and after an assiduous study of the entire data set recorded by the IMD-HUMS, the variables which could potentially form a well-informed input vector (all of them, or subsets) for the training process of this research model are listed and briefly explained below (Goodrich Corporation, 1998):
a. Torque
Torque is a measure of how much force acting on an object causes that object to rotate.

b. SO_1 (Shaft Order 1)
SO_1 is the once-per-revolution energy in the signal average and is used to detect shaft imbalance.

c. SO_2 (Shaft Order 2)
SO_2 is the twice-per-revolution energy in the signal average and is used to detect shaft misalignment.

d. SO_3 (Shaft Order 3)
SO_3 is the thrice-per-revolution energy in the signal average and is used to detect shaft disparity.

e. Signal Average RMS
Frequencies that are integer multiples of the basic shaft frequency are enhanced by the averaging process, and other frequencies are relatively attenuated. Depending on the mechanical environment, the signal average generally requires about 100 revolutions to converge to a usable waveform.

f. Residual Kurtosis
The residual analysis first removes all the strong tones from the signal average to produce a residual signal, so as to minimize the interference of these strong tones. Here the process uses kurtosis, which measures the thickness of the tails of the distribution of bearing vibrations after the background signal has been removed (Harris, 2002).

g. Residual RMS
The residual process deals with the root mean square, which is the overall energy level of the vibration data.

h. Side Band Modulation_1
This analysis is designed to reveal any sideband activity that may be the result of certain gear faults, such as eccentricity, misalignment, or looseness. The indicator characterizes the degree of sideband modulation for the first sideband.

i. Gear Distributed Fault
This attribute is an effective detector for distributed gear faults, like wear and multiple tooth cracks. It is a dimensionless measurement calculated from the ratio of explained and unexplained variances of the vibration generated at the meshing of gears.

j. G2_1
G2_1 is an algorithm developed by Goodrich Corporation to compute the ratio of the signal average peak-to-peak and the gear meshing energies.

k. Residual Peak to Peak
The residual process deals with the algebraic difference between the extremes of the vibration quantity.

l. Gear Misalignment_1
Gear Misalignment_1 is a dimensionless measurement resulting from the ratio of the energies of the vibrations produced when gears mesh (Harris, 2002).

m. Ball Energy
Ball Energy is the total energy associated with the bearing ball spin defect frequency and its harmonics.

n. Cage Energy
Cage Energy is the total energy associated with the bearing cage defect frequency and its harmonics. Usually it is detectable only at the later stages of a bearing failure, but some studies show that this indicator may increase before the others.

o. Inner Race Energy
Inner Race Energy is the total energy associated with the bearing inner race defect frequency and its harmonics.

p. Outer Race Energy
Outer Race Energy is the total energy associated with the bearing outer race defect frequency and its harmonics.

q. Envelope RMS
The main purpose of envelope analysis is to sum and normalize the six multiples of frequencies above or below the root mean square value of the vibration.
Table 1 enumerates the above potential predictors for later reference purposes.
Predictors
1. Torque
2. SO_1
3. SO_2
4. SO_3
5. Signal Average RMS
6. Residual Kurtosis
7. Residual RMS
8. Side Band Modulation_1
9. Gear Distributed Fault
10. G2_1
11. Residual Peak to Peak
12. Gear Misalignment_1
13. Ball Energy
14. Cage Energy
15. Inner Race Energy
16. Outer Race Energy
17. Envelope RMS
Table 1. Potential Model Predictors
2. Output Vector
In this research's models, the output vector consists of only one variable: the operating condition of the generator. Each generator is assigned a binary value of 1 or 0 to classify its known state. The value 1 corresponds to generators removed for fault, while the value 0 corresponds to good generators. The fact that each generator is assigned a state of 0 (good) or 1 (bad) does not mean these generators are actually in the assigned state. The given state is based only upon whether a generator was removed for fault or not, according to the maintenance records. A generator with a hidden fault would be assigned a state of 0 (good). Similarly, a generator which was taken out for a malfunction and given a state of 1 (bad) could have been mechanically good (Willard, L., Klesch, G., 2005). Unlike the previous work of Willard and Klesch (2005), the entire history of "good" and "bad" generators is used here. Thus most "bad" generators should have an initial period of "good" followed by "bad" as they fail.
The following table shows the generators that were confirmed as bad, either after maintenance or based on IMD-HUMS data, and assigned the value of 1 (Table 2). The failures of two of the generators, numbers 9 and 33, were detected during operation by a generator warning light. Faults in the remaining four generators, numbers 22, 31, 53, and 56, did not trigger the generator warning light. However, each of the four generators had unusually high SO1 readings upon removal. Three of these generators, numbers 22, 31, and 53, showed evidence of fault or wear. The removal of generator number 56 resulted from an identifiable buzz (Willard, L., Klesch, G., 2005).
Generator   Comments
9           Generator failed during shutdown upon APU generator coming on during start.
22          SO1 near 2 ips. After Spline Adapter Coupler replacement, SO1 returned to 0.05 ips.
31          SO1 at 3 ips. After Spline Adapter Coupler replacement, SO1 remained high, and so the generator was replaced.
33          Generator bad.
53          SO1 at 3 ips. After Spline Adapter Coupler replacement, SO1 returned below 0.05 ips.
56          SO1 over 4 ips. Generator and Spline Adapter Coupler were replaced.
Note: Replacements of generators 9 and 22 were based on maintenance records; the rest were based on Johnny Wright and Ground Station Team, IMD-HUMS Fault Detections, Goodrich Corporation, Draft 5/25/2005 (Ver 117).
Table 2. Bad Generators: Reasons for Replacement
D. DATA PREPROCESSING
Data preprocessing is frequently used to analyze and transform the input and output variables to minimize noise, emphasize essential relationships, identify trends, and flatten the distribution of the variables to aid the ANN in learning the relevant patterns. Since ANNs are pattern matchers, the representation of the data is important in designing a successful network. In most datasets there is a large variability in the scale of range fields. To balance this effect of scale, range fields are transformed so that they all have the same scale. In Clementine, range fields are rescaled to have values between 0 and 1. The transformation used is:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (2.18)$$

where x_i' is the rescaled value of input field x for record i, x_i is the original value of x for record i, x_min is the minimum value of x over all records, and x_max is the maximum value of x over all records.
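As a sketch (illustrative Python, not Clementine itself), the rescaling of equation (2.18) is:

```python
def min_max_rescale(values):
    """Rescale a range field to [0, 1] as in Eq. 2.18.
    Assumes the field is not constant (max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_rescale([2.0, 5.0, 11.0]))  # [0.0, 0.333..., 1.0]
```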
An additional problem for the network representation of the utilized database was that, out of 36,742 total observations, only 1,477 cases referred to bad generators. This fact directly affects the learning procedure of the network by creating a tendency to predict only good generators. For example, Table 3 gives results for an ANN using the default settings in Clementine, where the classification is perfect for "good" generators while the "bad" generators are classified as "good" as well. Changing the ANN architecture consistently gives similar results; most observations are predicted as "good." To correct this situation, the records of bad generators were replicated so that the ratio of good to bad observations was close to 1. We can consider this scheme a weighting technique that emphasizes the input vectors of the bad generators and, concurrently, the information they might convey. The multiplication factors differed for each generator. They were chosen so that each bad generator had about the same number of total observations and so that the number of bad observations was about equal to the number of good observations. Table 4 describes this data manipulation, and Figure 16 presents the Clementine setup used to create the new normalized database.
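The following sketch reproduces the replication idea in illustrative Python. The observation counts come from Table 4, but the rounded factors it computes are close to, not identical with, the factors actually used in the thesis:

```python
# Oversampling sketch: replicate each bad generator's records by a
# factor chosen so that bad totals roughly match the good total and
# each bad generator contributes about equally.
bad_counts = {9: 435, 22: 336, 31: 302, 33: 245, 53: 61, 56: 98}
good_total = 35265
target_per_gen = good_total / len(bad_counts)   # aim for equal shares

factors = {g: round(target_per_gen / n) for g, n in bad_counts.items()}
bad_total = sum(factors[g] * n for g, n in bad_counts.items())
print(factors)                            # multiplication factor per generator
print(round(bad_total / good_total, 2))   # ratio of bad to good, near 1
```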
All Data w/o Gen   Training Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy (Good)
9                  98.479 %            0 %                         100 %
22                 98.173 %            0 %                         100 %
31                 97.111 %            0 %                         100 %
33                 97.657 %            0 %                         100 %
53                 97.151 %            0 %                         100 %
56                 96.928 %            0 %                         100 %
(Original set: 36,742 observations.)
Table 3. Training Set Using Only Original Data

Generator   Observations   Factor   Result
9           435            13       5655
22          336            16       5376
31          302            18       5436
33          245            22       5390
53          61             90       5490
56          98             55       5390
Bad Gen total: 32,737. Good Gen (all data): 35,265. Ratio: 0.93.
Table 4. Multiplication of Bad Observations
Figure 16. Clementine Preprocessing Data
E. DATA SETS
In ANNs, it is a common practice to partition the database into three separate sets called the training, test, and validation sets.
1. Training Set
The training set is the largest set and is used by the ANN to learn the patterns that exist in the data. The training set used for this research consists of the observations of five (5) bad and fifty-six (56) good generators, for a total of about 60,000 records.
2. Test Sets
The test sets, varying in size from 10% to 30% of the training set, are used to evaluate the ability of a trained ANN to generalize to a new set of data. The researcher fits the parameters and the topology of the network that achieves the best results on a test set. Using the features of the software, the size of a test set was always taken to be 25% of the training set. Periodically during the process of training, new test sets are selected from the training set. Although each sample was selected randomly from the training set, the same seed (12345) was used in order to duplicate results across different model fits. Figure 17 shows the format chosen for the training and testing sets.
Figure 17. Model Architecture GUI
For the purposes of training, each record is considered as an observation. Thus it is likely that the test set contains records from each of the 61 generators.
3. Validation Set
The validation set is used last, as a final check on the performance of the trained network. The size of the validation set should balance obtaining a sufficient sample size to evaluate a trained network against having enough remaining observations for both training and testing. The validation set consists of one (1) bad and four (4) good generators. The records from these five generators are not used in training the network; they are used only for validation.
F. NETWORK ARCHITECTURE AND EVALUATION CRITERIA
As a baseline, this analysis uses a model architecture consisting of one input layer containing the predictors for the model, one output layer giving the estimate of the probability that the record is bad, and one hidden layer. Clementine provides six training methods for building ANN models, but this research utilized only the following two (Figure 17):
• Prune: This method starts with a large network and removes (prunes) the weakest units in the hidden and input layers as training proceeds. It is usually slow, but it often yields better results than other methods.

• RBFN: The radial basis function network uses a technique similar to k-means clustering to partition the data based on values of the target field.
Clementine incorporates several features to avoid some of the common pitfalls of ANNs, including sensitivity analysis, network accuracy, and feedback graph. With these options selected, a sensitivity analysis will provide information on which input fields are most important in predicting the output field, network accuracy will provide the percentage of records for which the prediction of the model matches the observed value in the data, and the feedback graph will depict the accuracy of the network over time as it learns (Clementine 10.0 Node Reference, 1999). Moreover, a Confidence level ($N-Binary) is provided for each observation after the prediction, which, in this model with a flag (0 or 1) output, is calculated by using the formula:
$$\$N\text{-Binary} = 2\,\lvert\, 0.5 - \text{RawOutput} \,\rvert \qquad (2.19)$$

where RawOutput is the output unit value scaled so that it lies between 0 and 1.
If the output unit value is below 0.5, the observation is predicted as 0 (false); if it is 0.5 or above, the observation is predicted as 1 (true). For example, if the ANN prediction value is 0.72, the prediction is displayed as "true," and the confidence will be 2|0.5 − 0.72| = 0.44. A portion of the output results is presented in Figure 18.
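A sketch of the flag prediction and the $N-Binary confidence of equation (2.19), assuming the raw output has already been scaled into [0, 1]:

```python
def prediction_and_confidence(raw_output):
    """Flag prediction and $N-Binary confidence (Eq. 2.19)."""
    prediction = raw_output >= 0.5        # 1 (true) at or above 0.5
    confidence = 2 * abs(0.5 - raw_output)
    return prediction, confidence

print(prediction_and_confidence(0.72))    # (True, 0.44)
```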
Figure 18. Clementine Prediction Table
IV. RESULTS AND DISCUSSION
A. MODEL WITH ALL PREDICTORS
The first attempt of this experiment was to predict the state of one (1) bad and four (4) good generators. The good generators were chosen randomly and form a constant group for the rest of the experiment. The bad generators were sequentially chosen, one at a time, in order to observe how the model reacts and how this might affect the learning procedure. Initially, all the predictors were used to form the input vector for the training procedure of the network (Table 1).
Appendix B provides the structure of the models in Tables 5, 6, and 7, along with their sensitivity tables. The training sets used for those models consist of the observations of five (5) bad and fifty-six (56) good generators. The size of the test sets was always taken to be 25% of the training set, while the validation set consists of one (1) bad and four (4) good generators.
It is obvious from Table 5 that all the models behave well through the learning process, and their training accuracy, computed on the test set taken from the training set, is very high (above 99%). Moreover, the prediction accuracy of the models for the group of good generators in the validation set is also high (above 86%). On the other hand, it is noticeable that four (4) models could not predict the bad generator in the validation set at all; to be more specific, they predicted that the generator was good. These models are even unable to predict generators 9 and 33, which are certainly bad because they had been replaced due to total failure and not preventively. The most likely explanation is that the patterns for the bad generators are too close to those of the good ones. It is also likely that the patterns of predictors differ among the bad generators. There are a variety of potential causes for these differences. It is plausible that variability among aircraft, among generators, or even in the placement of the IMD-HUMS accelerometers could cause differences in readings from generator to generator. We also expect that bad generators with different failure causes will have different vibration patterns. Projection pursuit, implemented in the statistical software GGobi, was used to gain a visual perspective of the relationships among the variables. GGobi plots two-dimensional projections of multi-dimensional data, and Figure 19 depicts some of the data leading to the assumption of a unique failure type for each "bad" generator.
Figure 19. Ggobi screen for “bad” generators
Attempting to explore and understand the models' behavior further, we tested five (5) new models to predict the condition of the two generators, 31 and 53, which were well predicted by models 3 and 5 of Table 5. Each model includes the whole database, excluding the data of the validation set, which consists of the "bad" generators in question and the "good" generators 4, 26, 42, and 66. For these models, we exclude from the training set the data of the pair referenced in the table.
The prediction accuracy for generator 53 drops from 78.69 % in the single model 5, to a range of 77.05 % - 59.02 % in the paired models. This seems to indicate that all the bad generators contribute to the prediction (Table 6).
In contrast, for generator 31, the accuracy of models 8 and 9 drops to less than 10%, and for model 10 it is almost 0%. This might indicate that generator 31 is more closely related to the generators that form the corresponding pairs (Table 7). Once again, we notice that the previous assumptions of close bad and good generator patterns and the possibility of unique reasons for generator failures are valid. Additionally, we observe that the changes made to the models did not affect the accuracy for the good generators.
In each row below, the validation set consists of the listed bad generator and the good generators 4, 26, 42, and 66; prediction accuracies are in percent.

Model 1 (training accuracy 99.913): Bad 9: 0; Good 4: 90.73, 26: 96.85, 42: 100, 66: 100
Model 2 (training accuracy 99.721): Bad 22: 0; Good 4: 100, 26: 95.40, 42: 100, 66: 100
Model 3 (training accuracy 99.708): Bad 31: .74; Good 4: 100, 26: 95.66, 42: 100, 66: 100
Model 4 (training accuracy 99.821): Bad 33: 0; Good 4: 100, 26: 99.74, 42: 100, 66: 100
Model 5 (training accuracy 99.448): Bad 53: 78.69; Good 4: 100, 26: 99.77, 42: 100, 66: 100
Model 6 (training accuracy 99.376): Bad 56: 0; Good 4: 100, 26: 86.20, 42: 100, 66: 100
Table 5. Models Predicting Single Generator
Each entry gives first → second prediction accuracy (%), where the second prediction comes from the paired model.

Model 7 (training accuracy 99.976): Bad 53: 78.69 → 77.05; Bad 9: 99.31 → 0; Good 4: 100 → 100, 26: 97.77 → 100, 42: 100 → 100, 66: 100 → 100
Model 8 (training accuracy 99.707): Bad 53: 78.69 → 72.13; Bad 22: 100 → 0; Good 4: 100 → 100, 26: 97.77 → 99.47, 42: 100 → 100, 66: 100 → 100
Model 9 (training accuracy 99.478): Bad 53: 78.69 → 59.02; Bad 31: 100 → 0.33; Good 4: 100 → 100, 26: 97.77 → 95.66, 42: 100 → 100, 66: 100 → 100
Model 10 (training accuracy 99.728): Bad 53: 78.69 → 78.69; Bad 33: 99.18 → 0; Good 4: 100 → 100, 26: 97.77 → 99.87, 42: 100 → 100, 66: 100 → 100
Model 11 (training accuracy 99.0): Bad 53: 78.69 → 70.49; Bad 56: 100 → 6.12; Good 4: 100 → 100, 26: 97.77 → 98.69, 42: 100 → 100, 66: 100 → 100
Table 6. Models Predicting Pair of Generators Including 53
Each entry gives first → second prediction accuracy (%), where the second prediction comes from the paired model.

Model 12 (training accuracy 99.830): Bad 31: .74 → 97.68; Bad 9: 99.54 → 0; Good 4: 100 → 100, 26: 95.66 → 96.85, 42: 100 → 100, 66: 100 → 100
Model 13 (training accuracy 99.795): Bad 31: .74 → 7.62; Bad 22: 100 → 0; Good 4: 100 → 100, 26: 95.66 → 96.98, 42: 100 → 100, 66: 100 → 100
Model 14 (training accuracy 99.780): Bad 31: .74 → 1.99; Bad 33: 100 → 0; Good 4: 100 → 100, 26: 95.66 → 99.87, 42: 100 → 100, 66: 100 → 100
Model 15 (training accuracy 99.478): Bad 31: .74 → 0.33; Bad 53: 100 → 59.02; Good 4: 100 → 100, 26: 95.66 → 95.66, 42: 100 → 100, 66: 100 → 100
Model 16 (training accuracy 99.509): Bad 31: .74 → 82.12; Bad 56: 100 → 24.49; Good 4: 100 → 100, 26: 95.66 → 98.29, 42: 100 → 100, 66: 100 → 100
Table 7. Models Predicting Pair of Generators Including 31
B. ARTIFICIAL TRAINING SETS
Leaving out certain bad generators degrades the performance of the network when fitting the ANN. If including patterns of predictors for those bad generators is important for classification, then there may be other patterns of predictors for bad generators that have not yet been observed and are not included in the data set.
In this section, an attempt is made to predict patterns of bad generators not included in the training set. First, assume that a "bad" generator is any pattern which is not like the "good" generators in the data set. With the large number of good generators in the data set, and the fact that the predictors of the good generators seem to clump together in two dimensions, this seems like a reasonable assumption. The ANN then classifies as "bad" any pattern which is not like the good generators of the training set. To accomplish this, artificial records of "bad" generators are constructed and included in the training set. These artificial records are constructed by simulating values of the predictor variables from uniform distributions whose lower and upper limits are, respectively, the minimum and maximum observed values of each predictor. Table 8 gives these values for each predictor.
Uniform random numbers were generated in Excel. With Excel’s limited memory, only 65,000 uniform random numbers could be generated at one time. With seventeen (17) predictors, limiting the number of these artificial records to 65,000 makes for a sparse set of “not good” records.
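As a sketch of this construction (illustrative Python; the predictor ranges below are placeholders, not the Table 8 values):

```python
import random

# Construct artificial "bad" records: each of the 17 predictors is
# drawn uniformly between its observed minimum and maximum. The
# (min, max) limits here are placeholders for illustration only.
predictor_ranges = {
    "Torque": (0.0, 100.0),
    "SO_1": (0.0, 4.0),
    "Residual RMS": (0.0, 10.0),
    # ... one (min, max) pair per predictor in Table 8
}

def artificial_record():
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in predictor_ranges.items()}

artificial_set = [artificial_record() for _ in range(65000)]
print(len(artificial_set), artificial_set[0])
```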
Table 8. Summary Statistics for Each Predictor Variable, Including the Minimum, Maximum, Average, and Standard Deviation
The following table (Table 9) presents the results of forecasting generator 9 using the above idea of training on a uniformly distributed artificial set, combined with the rest of the generators regardless of their condition. We can see that, even with the artificial training set grown to 1.3 million (1.3M) observations, the model still has no ability to predict the real condition of the bad generator in question, while still having excellent prediction accuracy on the group of good generators. The model starts to show encouraging prediction accuracy (11.72%) after the artificial set reaches 2.6M observations, but at this point it becomes less effective on the group of good generators. When the artificial set reaches a size of 2.75M observations, the model has excellent prediction accuracy (100%) on the generator in question, but now the predictions for the good group become very poor. This suggests once again that the patterns of good and bad generators are closely related. We can also conjecture that, because of the large number of predictor variables, we are seeing the so-called "curse of dimensionality": the complexity of the model grows exponentially with the dimension, rapidly outstripping the computational and memory storage capabilities of computers. For these data, training on more than 2.75M observations becomes infeasible.
The same method is applied to generator 33. The results are analogous and summarized in Table 10, while the structures of the networks are provided in Appendix C.
Artificial Set   Model Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy (Good)
65,000           93.676 %         0 %                         Excellent
650,000          98.140 %         0 %                         Excellent
1.30 M           99.101 %         0 %                         Excellent
2.60 M           99.137 %         11.72 %                     Average
2.75 M           99.243 %         100 %                       Poor
Table 9. Artificial Training Set to Predict Gen 9 (all data without generator 9)

Artificial Set   Model Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy (Good)
65,000           91.983 %         0 %                         Excellent
650,000          98.053 %         0 %                         Excellent
1.30 M           98.915 %         0 %                         Excellent
2.60 M           99.209 %         0.41 %                      Excellent
2.75 M           99.537 %         0.40 %                      Excellent
Table 10. Artificial Training Set to Predict Gen 33 (all data without generator 33)
C. STEPWISE PREDICTORS USAGE
A simpler, but sometimes very effective, way of dealing with high-dimensional data is to reduce the number of dimensions. At this phase of the research we pursued a stepwise approach, similar to the forward selection used for regression models. First, separate networks were trained for each input variable, and the network achieving the best training accuracy was preserved. The effect of adding each of the remaining inputs to this model was then evaluated in turn. This procedure is repeated for one, two, three, etc., predictors until the addition of extra predictors no longer yields a significant improvement in model performance. Like any process, this approach has several disadvantages. The biggest are that it requires substantial computation and that it is unable to capture the importance of combinations of predictors that are insignificant on their own.
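The following sketch outlines the forward-stepwise search in illustrative Python; train_and_score is a hypothetical stand-in for fitting an ANN in Clementine and returning its training accuracy, and the toy scorer exists only to make the example runnable:

```python
# Forward-stepwise predictor search: grow the predictor set one
# variable at a time, keeping the addition that most improves the
# score, and stop when no addition helps significantly.
def forward_selection(candidates, train_and_score, min_gain=0.001):
    selected, best_score = [], 0.0
    while candidates:
        scored = [(train_and_score(selected + [c]), c) for c in candidates]
        score, best = max(scored)
        if score - best_score < min_gain:    # no significant improvement
            break
        selected.append(best)
        candidates.remove(best)
        best_score = score
    return selected, best_score

# Toy scorer for demonstration only: favors a particular subset.
useful = {"ResidualRMS", "BallEnergy"}
dummy = lambda preds: sum(0.4 for p in preds if p in useful)
print(forward_selection(["ResidualRMS", "SO1", "BallEnergy"], dummy))
```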
This phase of the research was the most time-consuming, and many single and combined predictors were evaluated, aiming to find the model that best classified all the generators of the validation set. For brevity, the following table (Table 11) presents only one model, representative of those with the best ability to classify correctly.
Predictors: Residual RMS, Side Band Modulation_1, Residual Peak to Peak, Ball Energy, Inner Race Energy

Model to Predict   Training Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy, Good (4, 26, 42, 66)
Gen 9              86.029 %            39.08 %                     > 85 %
Gen 22             84.036 %            26.49 %                     > 83 %
Gen 31             87.345 %            5.36 %                      > 72 %
Gen 33             85.377 %            40.82 %                     > 90 %
Gen 53             87.321 %            36.07 %                     > 73 %
Gen 56             86.386 %            2.04 %                      > 69 %
Table 11. Best Model Generated by Stepwise Selection
This last phase verifies that, no matter which strategy was applied, the patterns of good and bad generators are so closely related that no model of the structure we trained could distinguish the different pre-assigned conditions of those generators. When a model was a good classifier for the bad generator, it failed to predict the good ones, and vice versa; when the model performed well on the good generators, it failed to recognize the bad ones. The model summarized in Table 11 achieved the best overall performance for the database of this research. Even though the overall prediction accuracy appears low for the records of "bad" generators, the pattern of the predicted probability that a record is "bad," p̂, as a function of time is very different for "bad" generators than for "good" generators. As an example, Figures 20 and 21 give plots of p̂ versus time for a "bad" and a "good" generator, respectively. The "good" generator has values of p̂ close to zero, whereas the "bad" generator's p̂ varies considerably and shows an increasing trend in time. Appendix D contains plots of p̂ versus time for the rest of the generators in the validation set and a sensitivity analysis of this model.
Figure 20. p̂ from "bad" Generator 9
Figure 21. p̂ from "good" Generator 66
V. CONCLUSIONS AND RECOMMENDATIONS
The emphasis in this thesis was to develop an ANN that would utilize the data collected by the Goodrich Corporation's IMD-HUMS in order to discover patterns that would predict a potential failure of a UH-60L helicopter generator. Many different ANNs were evaluated for their success rate at this fault diagnosis.
The first models in this research were trained, tested, and evaluated using only the data collected by IMD-HUMS. The whole database was normalized and populated accordingly. One (1) bad and four (4) good generators, which were left out of the training and test sets, were used for validation of those models. The method was applied sequentially to each bad generator, always maintaining the same group of good generators. Two of the six models showed some ability to classify their bad generator, with prediction accuracies of 78.69% and .74%, respectively. The other four models had no ability to classify the bad generators, although they all predicted the group of good generators very well.
The next phase of the experiment was to generate a uniformly distributed artificial data set and use it along with the original database to form a training/testing set for further classification. Artificial sets of up to 2.75M records were created in an effort to capture the bad generator patterns. Because the learning phase was time- and resource-consuming, the researcher was limited to testing two generators, the ones originally replaced for failure (9 and 33). One model started to show some promising results by the time the researcher reached the computer's limits. Obviously, the "curse of dimensionality" comes into play at this point of the research.
During the last portion of this experiment, the researcher tried to deal with the multi-dimensional problem and, at the same time, shape a model with good generalization behavior. A stepwise strategy was followed and many models were trained with various combinations of predictors. Once again, the produced classifiers were not generally successful in generator prediction.
Concluding, the researcher realized that in each phase of this experiment the bad and good generator patterns are very closely related. This affects the learning procedure of the network by blocking its ability to build a model capable of classifying the good and bad generators concurrently. Additionally, it is possible that the reason for failure of each bad generator is unique, so that the size of this database and the structure of those models are not capable of capturing those patterns in a generalized form. On the other hand, this research explores many paths, identifies various issues concerning the classification of the UH-60L helicopter generators, and finally produces models capable of classifying a large portion of the database in question.
ANNs are a category of artificial intelligence technology that mimics the human brain's skill at identifying patterns. In theory, ANNs are capable of approximating any continuous function. Such flexibility makes for a potentially powerful forecasting tool, but the large number of parameters that must be selected makes the design process difficult. In practice, building an ANN forecasting model involves a great deal of trial and error. Consequently, the objective of this thesis was to provide a practical, non-technical introduction to structuring an ANN forecasting model using real operating data from UH-60L helicopters. The success of ANN applications for an individual researcher depends on three key factors. First, the researcher must have the time, patience, and resources to experiment. Second, the ANN software must allow automated routines, such as walk-forward testing, optimization of hidden neurons, and testing of input variable combinations, either through direct programming or the use of batch/script files. Third, the researcher must maintain a good set of records listing all parameters for each network tested.
This research has verified that ANNs have a position in machinery condition monitoring and diagnostics. However, the limited nature of these results indicates that ANNs will not solve all machinery condition monitoring and diagnostics problems by themselves. They certainly will not completely replace conventional rule-based expert systems. Ultimately, it is anticipated that a symbiotic combination of these two technologies will provide the optimal solution to the machinery condition monitoring and diagnostics problem.
Further work can be conducted on building models with more than one hidden layer, which can explain more complex data but requires experience in manipulating the parameters of the utilized software and good a priori knowledge of the database itself. Furthermore, enhancing the current database with new real data, accompanied by proper maintenance records, will benefit and improve any future effort to predict the generators' condition, regardless of the techniques used by the researcher.
APPENDIX A
Tail numbers of the 30 aircraft: 9126351, 92232, 92235, 92238, 92239, 92241, 92243, 92246, 92250, 92253, 92255, 93277, 93285, 93293, 9326500, 9326506, 9326507, 9326509, 9326515, 9326516, 9326518, 9326519, 9326524, 9426530, 9426533, 9426534, 9426537, 9426545, 9426549, 9926829.

GENERATOR   OBSERVATIONS   FINAL CONDITION      GENERATOR   OBSERVATIONS   FINAL CONDITION
1           526            Good                 35          526            Good
2           980            Good                 36          980            Good
3           313            Good                 37          313            Good
4           439            Good                 38          439            Good
5           282            Good                 39          282            Good
6           651            Good                 40          651            Good
7           476            Good                 41          476            Good
8           7              Good                 42          7              Good
9           435            Bad                  43          923            Good
10          488            Good                 44          825            Good
11          825            Good                 45          545            Good
12          545            Good                 46          917            Good
13          917            Good                 47          617            Good
14          617            Good                 48          487            Good
15          487            Good                 49          873            Good
16          873            Good                 50          583            Good
17          588            Good                 51          423            Good
18          423            Good                 52          —              Good
19          552            Good                 53          61             Bad
20          —              Good                 54          587            Good
21          553            Good                 55          553            Good
22          336            Bad                  56          98             Bad
23          12             Good                 57          250            Good
24          621            Good                 58          621            Good
25          457            Good                 59          457            Good
26          761            Good                 60          761            Good
27          1077           Good                 61          1077           Good
28          298            Good                 62          —              Good
29          1102           Good                 63          1102           Good
30          254            Good                 64          254            Good
31          302            Bad                  65          533            Good
32          287            Good                 66          256            Good
33          245            Bad
34          1166           Good
APPENDIX B
Figure 22. Model 1 (Predict Bad 09 and Good 4, 26, 42, 66)
Figure 23. Model 2 (Predict Bad 22 and Good 4, 26, 42, 66)
Figure 24. Model 3 (Predict Bad 31 and Good 4, 26, 42, 66)
Figure 25. Model 4 (Predict Bad 33 and Good 4, 26, 42, 66)
Figure 26. Model 5 (Predict Bad 53 and Good 4, 26, 42, 66)
Figure 27. Model 6 (Predict Bad 56 and Good 4, 26, 42, 66)
Figure 28. Model 7 (Predict Bad 53, 9 and Good 4, 26, 42, 66)
Figure 29. Model 8 (Predict Bad 53, 22 and Good 4, 26, 42, 66)
Figure 30. Model 9 (Predict Bad 53, 31 and Good 4, 26, 42, 66)
Figure 31. Model 10 (Predict Bad 53, 33 and Good 4, 26, 42, 66)
Figure 32. Model 11 (Predict Bad 53, 56 and Good 4, 26, 42, 66)
Figure 33. Model 12 (Predict Bad 31, 9 and Good 4, 26, 42, 66)
Figure 34. Model 13 (Predict Bad 31, 22 and Good 4, 26, 42, 66)
Figure 35. Model 14 (Predict Bad 31, 33 and Good 4, 26, 42, 66)
Figure 36. Model 15 (Predict Bad 31, 53 and Good 4, 26, 42, 66)
Figure 37. Model 16 (Predict Bad 31, 56 and Good 4, 26, 42, 66)
APPENDIX C
Figure 38. Model 17 (Predict Bad 9 Using Artificial Sets)
Figure 39. Model 18 (Predict Bad 33 Using Artificial Sets)
APPENDIX D
Figure 40. Stepwise Model Using 5 Predictors (Predict Bad 9)
Figure 41. Stepwise Model Using 5 Predictors (Predict Bad 22)
Figure 42. Stepwise Model Using 5 Predictors (Predict Bad 31)
Figure 43. Stepwise Model Using 5 Predictors (Predict Bad 33)
Figure 44. Stepwise Model Using 5 Predictors (Predict Bad 53)
Figure 45. Stepwise Model Using 5 Predictors (Predict Bad 56)
LIST OF REFERENCES
Bechhoefer E., Bernhard A., Use of Non-Gaussian Distribution for Analysis of Shaft Components, IEEE, 2005.
Bechhoefer E., Mayhew E., Mechanical Diagnostics Systems Engineering in IMD-HUMS, IEEE, 2005.
Bechhoefer E., Power D., IMD HUMS Rotor Track and Balance Techniques, IEEE, 2002.
Bishop, C., Neural Networks for Pattern Recognition, Clarendon Press, 1995.
Buttrey S., Koyak R., Read R., Whitaker L., Prognostics of Complex Rotating Machinery: Statistical Concepts and Capabilities of the NPS Team, Naval Postgraduate School, 2003.
Clementine 10.0 User's Guide, SPSS, Integral Solutions, 2005.
Davis A., Handbook of Condition Monitoring, Chapman & Hall, 1998.
Duda R., Hart P., Stork D., Pattern Classification, Second Edition, John Wiley & Sons, 2000.
Elyurek M., Establishing a Vibration Threshold Value Which Ensures a Negligible False Alarm Rate for Each Gear in CH-53 Aircraft Using the Operational Data, Master's Thesis, Department of Operations Analysis, Naval Postgraduate School, 2003.
Fausett L., Fundamentals of Neural Networks, Prentice Hall, 1994.
GGobi Data Visualization System, http://www.ggobi.org, 10 August 2006.
Harris C., Harris' Shock and Vibration Handbook, Fifth Edition, McGraw-Hill, 2002.
Kartalopoulos S., Understanding Neural Networks and Fuzzy Logic, IEEE Press, 1996.
Lawrence J., Introduction to Neural Networks, Fifth Edition, California Scientific, 1993.
Mark, W.D., "Gear Noise Origins," AGARD Conference Proceedings, 1998.
Norgaard M., Ravn O., Poulsen N., Hansen L., Neural Networks for Modelling and Control of Dynamic Systems, Springer, 2000.
Rao S., Mechanical Vibrations, Fourth Edition, Pearson Prentice Hall, 2004.
Ripley B., Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
Simpson P., Neural Network Applications, IEEE Technical Activities Board, 1996.
Sobajic D., Neural Network Computing for the Electric Power Industry, Lawrence Erlbaum Associates, 1993.
Software Requirements Specifications, Goodrich Corporation, 1998.
System Users Manual for Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), U.S. Army UH-60A/L, February 2005.
Technical Manual 1-1520-237-10, Operators Manual for UH-60L Helicopter, Headquarters, Department of the Army, May 2003.
Tsang, A., Condition-Based Maintenance: Tools and Decision Making, Journal of Quality in Maintenance Engineering, 1995.
Willard L., Klesch G., Using Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS) Data to Predict UH-60L Electrical Generator Condition, Master's Thesis, Department of Operations Analysis, Naval Postgraduate School, 2005.
INITIAL DISTRIBUTION LIST
1. Defense Technical Information Center, Ft. Belvoir, Virginia
2. Dudley Knox Library, Naval Postgraduate School, Monterey, California
3. Professor Lyn R. Whitaker, Department of Operations Research, Naval Postgraduate School, Monterey, California
4. Professor Samuel E. Buttrey, Department of Operations Research, Naval Postgraduate School, Monterey, California
5. Maj Tourvalis Evangelos, Volos, Greece