NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA
THESIS
APPLICATION OF NEURAL NETWORKS TO PREDICT UH-60L ELECTRICAL GENERATOR CONDITION USING
(IMD-HUMS) DATA
by
Evangelos Tourvalis
December 2006
Thesis Advisor: Lyn R. Whitaker
Second Reader: Samuel E. Buttrey
Approved for public release; distribution is unlimited
Approved for public release; distribution is unlimited
APPLICATION OF NEURAL NETWORKS TO PREDICT UH-60L ELECTRICAL GENERATOR CONDITION USING (IMD-HUMS) DATA
Evangelos Tourvalis
Major, Hellenic Air Force
Submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE IN OPERATIONS RESEARCH
from the
NAVAL POSTGRADUATE SCHOOL
December 2006
Author: Evangelos Tourvalis
Approved by: Lyn R. Whitaker
Thesis Advisor
Samuel E. Buttrey Second Reader
James N. Eagle
Chairman, Department of Operations Research
ABSTRACT
In 2003, the US Army began using the Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), an integrated airborne and ground-based system developed by Goodrich Corporation, to support maintenance of the UH-60L. IMD-HUMS is responsible for collecting, processing, analyzing, and storing an enormous amount of vibratory and flight regime data obtained from sensors located throughout the aircraft.
The purpose of this research is to predict failures of the UH-60L's electrical generators by applying Artificial Neural Networks (ANNs) to the IMD-HUMS-produced data. ANNs are data-based vice rule-based, thereby possessing the potential to operate where analytical solutions are inadequate. They are reputed to be robust and highly tolerant of noisy data. Software tools such as Clementine 10.0, S-Plus 7.0, and Excel are used to establish these predictions.
This research has verified that ANNs have a position in machinery condition monitoring and diagnostics. However, the limited nature of these results indicates that ANNs will not solve all machinery condition monitoring and diagnostics problems by themselves. They certainly will not completely replace conventional rule-based expert systems. Ultimately, it is anticipated that a symbiotic combination of these two technologies will provide the optimal solution to the machinery condition monitoring and diagnostics problem.
TABLE OF CONTENTS
I. INTRODUCTION .............................................................1
   A. CONDITION BASED MAINTENANCE ..........................1
   B. IMD-HUMS ..............................................................3
      1. On-Board System (OBS) .......................................3
      2. Ground Station System (GSS) ...............................4
   C. PREVIOUS WORK ....................................................6
   D. AREA OF RESEARCH AND APPROACH ......................7
   E. STATISTICAL TOOLS ................................................8
   F. ORGANIZATION OF STUDY .......................................8
II. ARTIFICIAL NEURAL NETWORKS OVERVIEW ................9
   A. HISTORY .................................................................9
   B. BIOLOGICAL NEURON ............................................10
   C. ARTIFICIAL NEURON ..............................................11
   D. ARCHITECTURE OF NEURAL NETWORKS ................12
      1. Single Layer Networks (SLN) ..............................12
      2. Multi Layer Networks (MLN) ...............................12
      3. Feed-Forward Networks (FFN) ............................13
      4. Radial Basis Function Networks (RBFN) ..............13
   E. LEARNING PROCESS ..............................................14
      1. Supervised Learning ...........................................14
         a. Hebbian Learning ..........................................14
         b. Delta Rule Learning .......................................15
         c. Competitive Learning .....................................15
      2. Unsupervised Learning .......................................15
      3. Activation Functions ..........................................16
      4. Gradient Descent ...............................................18
      5. Back Propagation Algorithm ...............................19
         a. First Case ....................................................22
         b. Second Case ................................................22
      6. Efficient Algorithms ...........................................24
      7. Batch vs. Incremental Learning ..........................28
         a. Advantages of Incremental Learning (IL) .........28
         b. Advantages of Batch Learning (BL) .................28
III. DATA DESCRIPTION AND METHODOLOGY .................31
   A. SOURCES OF VIBRATION .......................................31
      1. Gear Vibration ..................................................31
      2. Bearings ...........................................................32
      3. Shafts ..............................................................32
   B. DATA COLLECTION ................................................32
   C. SELECTING VARIABLES ..........................................33
      1. Input Vector .....................................................33
         a. Torque .........................................................33
         b. SO_1 (Shaft Order 1) ....................................33
         c. SO_2 (Shaft Order 2) ....................................33
         d. SO_3 ...........................................................34
         e. Signal Average RMS ......................................34
         f. Residual Kurtosis ..........................................34
         g. Residual RMS ...............................................34
         h. Side Band Modulation_1 ................................34
         i. Gear Distributed Fault ....................................34
         j. G2_1 ............................................................34
         k. Residual Peak to Peak ...................................34
         l. Gear Misalignment_1 .....................................35
         m. Ball Energy ..................................................35
         n. Cage Energy .................................................35
         o. Inner Race Energy ........................................35
         p. Outer Race Energy ........................................35
         q. Envelope RMS ...............................................35
      2. Output Vector ...................................................36
   D. DATA PREPROCESSING ..........................................37
   E. DATA SETS ............................................................40
      1. Training Set ......................................................40
      2. Test Sets ..........................................................40
      3. Validation Set ...................................................41
   F. NETWORK ARCHITECTURE AND EVALUATION CRITERIA ...41
IV. RESULTS AND DISCUSSION .........................................43
   A. MODEL WITH ALL PREDICTORS ..............................43
   B. ARTIFICIAL TRAINING SETS ...................................49
   C. STEPWISE PREDICTORS USAGE ..............................52
V. CONCLUSIONS AND RECOMMENDATIONS ....................55
APPENDIX A ....................................................................59
APPENDIX B ....................................................................61
APPENDIX C ....................................................................69
APPENDIX D ....................................................................71
LIST OF REFERENCES ......................................................75
INITIAL DISTRIBUTION LIST .............................................77
LIST OF FIGURES
Figure 1. Overview of Maintenance Terminology ..........2
Figure 2. OBS & GSS (From: IMD-HUMS User Manual, 2005) ..........5
Figure 3. A Biological Neuron (From: Lawrence, J., 1993) ..........10
Figure 4. An Artificial Neuron ..........11
Figure 5. Single Layer Network ..........12
Figure 6. Multi Layer Network ..........13
Figure 7. Feed Forward and RBF Network Representation ..........14
Figure 8. Identity Function ..........17
Figure 9. Sigmoid Function ..........17
Figure 10. Hyperbolic Tangent Function ..........18
Figure 11. Gradient Descent ..........18
Figure 12. Sigmoid Function ..........19
Figure 13. Gradient Descent Using One Weight ..........20
Figure 14. Learning Rate Effect on Gradient Descent (From: Fausett, L., 1994) ..........25
Figure 15. Ill Conditioning (From: Bishop, C., 1995) ..........27
Figure 16. Clementine Preprocessing Data ..........39
Figure 17. Model Architecture GUI ..........40
Figure 18. Clementine Prediction Table ..........42
Figure 19. Ggobi Screen for "Bad" Generators ..........44
Figure 20. p̂ from "Bad" Generator 9 ..........54
Figure 21. p̂ from "Good" Generator 66 ..........54
Figure 22. Model 1 (Predict Bad 9 and Good 4, 26, 42, 66) ..........61
Figure 23. Model 2 (Predict Bad 22 and Good 4, 26, 42, 66) ..........61
Figure 24. Model 3 (Predict Bad 31 and Good 4, 26, 42, 66) ..........62
Figure 25. Model 4 (Predict Bad 33 and Good 4, 26, 42, 66) ..........62
Figure 26. Model 5 (Predict Bad 53 and Good 4, 26, 42, 66) ..........63
Figure 27. Model 6 (Predict Bad 56 and Good 4, 26, 42, 66) ..........63
Figure 28. Model 7 (Predict Bad 53, 9 and Good 4, 26, 42, 66) ..........
Figure 29. Model 8 (Predict Bad 53, 22 and Good 4, 26, 42, 66) ..........
Figure 30. Model 9 (Predict Bad 53, 31 and Good 4, 26, 42, 66) ..........65
Figure 31. Model 10 (Predict Bad 53, 33 and Good 4, 26, 42, 66) ..........65
Figure 32. Model 11 (Predict Bad 53, 56 and Good 4, 26, 42, 66) ..........66
Figure 33. Model 12 (Predict Bad 31, 9 and Good 4, 26, 42, 66) ..........66
Figure 34. Model 13 (Predict Bad 31, 22 and Good 4, 26, 42, 66) ..........67
Figure 35. Model 14 (Predict Bad 31, 33 and Good 4, 26, 42, 66) ..........67
Figure 36. Model 15 (Predict Bad 31, 53 and Good 4, 26, 42, 66) ..........68
Figure 37. Model 16 (Predict Bad 31, 56 and Good 4, 26, 42, 66) ..........68
Figure 38. Model 17 (Predict Bad 9 Using Artificial Sets) ..........69
Figure 39. Model 18 (Predict Bad 33 Using Artificial Sets) ..........70
Figure 40. Stepwise Model Using 5 Predictors (Predict Bad 9) ..........71
Figure 41. Stepwise Model Using 5 Predictors (Predict Bad 22) ..........71
Figure 42. Stepwise Model Using 5 Predictors (Predict Bad 31) ..........72
Figure 43. Stepwise Model Using 5 Predictors (Predict Bad 33) ..........72
Figure 44. Stepwise Model Using 5 Predictors (Predict Bad 53) ..........73
Figure 45. Stepwise Model Using 5 Predictors (Predict Bad 56) ..........73
LIST OF TABLES
Table 1. Potential Model Predictors ..........35
Table 2. Bad Generators—Reasons for Replacement ..........37
Table 3. Training Set Using Only Original Data ..........38
Table 4. Data Multiplication of Bad Observations ..........39
Table 5. Models Predicting Single Generator ..........46
Table 6. Models Predicting Pair of Generators Including 53 ..........47
Table 7. Models Predicting Pair of Generators Including 31 ..........48
Table 8. Summary Statistics for Each Predictor Variable, Including the Minimum, Maximum, Average, and Standard Deviation ..........50
Table 9. Artificial Training Set to Predict Gen 9 ..........51
Table 10. Artificial Training Set to Predict Gen 33 ..........51
Table 11. Stepwise Good Generated Model ..........53
LIST OF ACRONYMS AND ABBREVIATIONS
ANN(s)      Artificial Neural Network(s)
BP          Back Propagation
BL          Batch Learning
CBM         Condition Based Maintenance
FFN         Feed Forward Networks
GSS         Ground Station System
IL          Incremental Learning
IMD-HUMS    Integrated Mechanical Diagnostics Health and Usage Management System
LMS         Least-Mean-Squares
MLN         Multi Layer Networks
OBS         On Board System
PM          Predetermined Maintenance
RBFN        Radial Basis Function Networks
SLN         Single Layer Networks
ACKNOWLEDGMENTS
I would like to acknowledge the help of my excellent thesis advisor, Professor Lyn R. Whitaker, and my second reader, Samuel E. Buttrey, for their direction and assistance as this research was developed.
Also, I would like to mention that this thesis could not have been completed without the presence, support and encouragement of my wife Elena and my son Vasilis.
EXECUTIVE SUMMARY
Readiness is a key factor in enabling military forces to stay effective and reliable in a continuously growing and demanding environment. Increased readiness can be achieved by increasing availability through performing efficient maintenance, performing fewer corrective maintenance actions, and identifying more accurate preventive maintenance periods. Today, the United States and allied forces spend billions of dollars on time-based or phased maintenance that overlooks several facts and realities of operational use. Important savings can be gained by using hardware and software to evaluate component health and system condition based on operational usage, and by performing maintenance in relation to statistical and engineering analyses that predict availability and readiness.
Nowadays, the majority of maintenance is accomplished by either the predetermined preventive or the corrective approach. The former has fixed maintenance intervals; the latter is performed after a component has failed. Because both approaches are costly, some industries have started to perform maintenance in a predictive manner, called Condition Based Maintenance (CBM), where equipment condition is the key parameter for setting maintenance intervals and appropriate maintenance tasks.
Condition Based Maintenance (CBM) is a technology-driven approach that seeks to recognize incipient faults before they develop into critical failures, permitting more precise scheduling of preventive maintenance. The factors that have motivated the growth of CBM include the need for reduced maintenance and logistics costs, protection against failure of mission-critical equipment, and improved equipment availability.
In 2003, the US Army began using the Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), an integrated airborne and ground-based system developed by Goodrich Corporation, to support maintenance of the UH-60L. IMD-HUMS is responsible for collecting, processing, analyzing, and storing an enormous amount of data obtained from sensors located throughout the aircraft. The IMD-HUMS improves aircraft availability for operators by identifying potential
problems early so that maintenance can be performed before it becomes an issue that could impact flight operations. The system also provides operators with accurate flight parameter data, monitored automatically on each flight, allowing them to better schedule routine maintenance and, in some cases, avoid unnecessary early repair and overhaul.
Neural networks are used in numerous fields, including medical diagnostics. In this thesis, neural networks are used for machinery diagnostics, specifically for diagnosing the UH-60L helicopter's electrical generator. In order to accomplish this, a database collected from IMD-HUMS is used. The emphasis in this thesis is to develop a neural network that utilizes the data collected from IMD-HUMS, manufactured by Goodrich Corporation, in order to discover patterns that would predict a potential failure of a UH-60L helicopter generator. Many different neural networks are evaluated for their success rate on this fault diagnosis.
As in any prediction/forecasting model, the selection of appropriate model inputs is extremely important. However, in most ANN (Artificial Neural Network) applications, less attention is given to this task. The main reason is that ANNs belong to the class of data-driven approaches, whereas conventional statistical methods are model driven. In the latter, the structure of the model has to be determined first, with the aid of empirical or analytical approaches, before the unknown model parameters can be estimated. Data-driven approaches, on the other hand, have the ability to determine which model inputs are critical, so there is less need for "...a priori rationalization about relationships between variables..." However, presenting a large number of inputs to ANN models and relying on the network to determine the critical inputs usually increases network size. This has a number of disadvantages, such as decreasing processing speed, increasing the amount of data required to estimate the connection weights efficiently, and degrading the performance of the ANN. This is particularly true for complex problems, where the number of potential inputs is large and where no a priori knowledge is available to suggest which inputs to include.
Clementine, the software used in this research, incorporates several features to avoid some of the common pitfalls of ANNs, including sensitivity analysis, network accuracy, and a feedback graph. With these options selected, the sensitivity analysis provides information on which input fields are most important in predicting the output field, the network accuracy reports the percentage of records for which the model's prediction matches the observed value in the data, and the feedback graph depicts the accuracy of the network over time as it learns.
In practice, building an ANN forecasting model involves a lot of trial and error. Consequently, the objective of this thesis is to provide a practical, non-technical introduction to structuring an ANN forecasting model using real operating data of UH-60L helicopters. The success of ANN applications for an individual researcher depends on three key factors. First, the researcher must have the time, patience, and resources to experiment. Second, the ANN software must allow automated routines, such as walk-forward testing, optimization of hidden neurons, and testing of input variable combinations, either through direct programming or the use of batch/script files. Third, the researcher must maintain a good set of records that lists all parameters for each network tested.
This research has verified that ANNs have a position in machinery condition monitoring and diagnostics. However, the limited nature of these results indicates that ANNs will not solve all machinery condition monitoring and diagnostics problems by themselves. They certainly will not completely replace conventional rule-based expert systems. Ultimately, it is anticipated that a symbiotic combination of these two technologies will provide the optimal solution to the machinery condition monitoring and diagnostics problem.
I. INTRODUCTION
Readiness is a key factor in enabling military forces to stay effective and reliable in a continuously growing and demanding environment. Increased readiness can be achieved by increasing availability through performing efficient maintenance, performing fewer corrective maintenance actions, and identifying more accurate preventive maintenance periods. Today, the United States and allied forces spend billions of dollars on time or phased-maintenance approaches that overlook several facts and realities of operational use. Important savings can be gained by using hardware and software to evaluate component health and the conditions of systems based on operational usage and performing maintenance in relation to statistical and engineering analyses that predict availability and readiness.
The emphasis in this thesis is to develop a neural network that utilizes data collected from IMD-HUMS, manufactured by Goodrich Corporation, in order to discover patterns that can predict a failure of a UH-60L helicopter generator. Many different neural networks will be evaluated for their success rate on this fault diagnosis.

A. CONDITION BASED MAINTENANCE
Maintenance is usually carried out either in time-based scheduled periods (so-called preventive maintenance) or as corrective maintenance. Preventive maintenance aims to avoid system or component failure by performing repair, service, or replacement within fixed time intervals. Corrective maintenance, on the other hand, is performed after a failure or an apparent fault has taken place (Davis, A., 1998). For several types of equipment or systems the maintenance action must be done without delay, but for many others it can be delayed depending on the equipment's function. In many cases preventive maintenance can be divided into two groups: Condition-Based Maintenance (CBM) and Predetermined Maintenance (PM). PM is scheduled in time, while CBM mostly has dynamic or on-request intervals (Figure 1).
Figure 1. Overview of Maintenance Terminology
Complete CBM schemes involve a number of capabilities, such as sensing and data acquisition, signal processing, condition and health estimation, prognostics, and decision assistance. Moreover, in order for the user to have access to the system, a Human System Interface (HSI) must be developed. Generally, the integration of various hardware and software components is needed to implement a CBM system.
A complete architecture for CBM systems should cover the range of functions from data collection through the recommendation of specific maintenance actions. The major tasks that assist CBM consist of (http://www.osacbm.org):
• Sensing and data acquisition
• Signal processing and feature extraction
• Production of alarms or alerts
• Fault or failure diagnosis and health evaluation
• Prognostics: projection of health profiles to future health or estimation of remaining useful life
• Decision aiding: maintenance recommendations, or evaluation of asset readiness for a particular operational setting
• Management and control of data flows or test sequences
• Management of historical data storage and historical data access
• System configuration management
• Human system interface
CBM makes use of information collected on equipment through monitoring devices. As equipment becomes more complex, more manufacturers are providing these monitoring devices to assist companies or organizations in handling and maintaining their equipment (Tsang, A., 1995). CBM uses this online data to compare equipment conditions to predefined operating thresholds. Data falling outside these thresholds generate a maintenance alert from the software, signaling a problem or area of concern.
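To make the threshold idea concrete, the following is a minimal sketch in Python; the indicator names and operating limits are purely illustrative assumptions, not values from IMD-HUMS documentation:

    # Hypothetical CBM threshold check: compare monitored condition
    # indicators against predefined operating limits and flag alerts.
    OPERATING_LIMITS = {                  # illustrative values only
        "vibration_rms_g": (0.0, 4.0),
        "oil_temp_degC": (20.0, 110.0),
    }

    def check_condition(readings):
        """Return a list of alert messages for out-of-limit readings."""
        alerts = []
        for indicator, value in readings.items():
            low, high = OPERATING_LIMITS[indicator]
            if not (low <= value <= high):
                alerts.append(f"ALERT: {indicator} = {value} outside [{low}, {high}]")
        return alerts

    print(check_condition({"vibration_rms_g": 5.2, "oil_temp_degC": 95.0}))

In a fielded system the limits would come from engineering analysis of each component; the sketch only shows the comparison-and-alert logic described above.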
B. IMD-HUMS
In 2003, the US Army began using the Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), an integrated airborne and ground-based system developed by Goodrich Corporation, to support maintenance of the UH-60L. IMD-HUMS is responsible for collecting, processing, analyzing, and storing an enormous amount of data obtained from sensors located throughout the aircraft. The IMD-HUMS improves aircraft availability for operators by identifying potential problems early so that maintenance can be performed before it becomes an issue that could impact flight operations. The system also provides operators with accurate flight parameter data, monitored automatically on each flight, allowing them to better schedule routine maintenance and, in some cases, avoid unnecessary early repair and overhaul. The IMD-HUMS consists of two main subsystems: the On-Board System (OBS) and the Ground Station System (GSS) (System Users Manual For IMD-HUMS, 1995).
1. On-Board System (OBS)

The OBS is comprised of the following components (Figure 2):
• Cockpit display unit (CDU)
• Data transfer unit (DTU)
• Remote data concentrator (RDC)
• Main processor unit (MPU)
• 2 junction boxes (JB1/JB2)
• 20 drive train and gearbox accelerometers
• 4 engine accelerometers
• 5 trim and balance accelerometers
• 1 4g body accelerometer for regime recognition
• Main and tail rotor magnetic RPM sensors
• Main rotor blade tracker
• Engine output shaft optical tachometers
The heart of the IMD-HUMS OBS is the Main Processing Unit (MPU). The MPU collects the data from the accelerometers, analyzes the inputs, and records the data, looking for vibration exceedances and events. It calculates time spent in various flight regimes, performs various diagnostic algorithms, and stores the data to an onboard data cartridge. The OBS also provides for crew interaction through a Cockpit Display Unit (CDU) in order to support prompted procedure actions related to power assurance checks, power train analyses, and rotor track and balance data acquisitions. Besides prompted actions, the OBS uses regime information to automatically store power train and rotor vibration data.
2. Ground Station System (GSS)
The GSS is the major user interface for the IMD-HUMS. It performs after-flight debrief and is designed to analyze, process, and compile flight data into useful information for the maintenance crew, logistics teams, the operations department, and engineering support. The IMD-HUMS GSS functions include:
Figure 2. OBS & GSS (From: IMD-HUMS User Manual, 2005)
• Rotor Track and Balance
• Strip Charts of Aircraft Data
• Engine Performance Trending
• Usage Computation and Tracking
• Regime Identification and Processing
• Flight Operations Management
• Fault/BIT Display
• Maintenance Management
C. PREVIOUS WORK
In their 2005 thesis, Willard and Klesch used 36,742 observations from monitored components of 30 UH-60L helicopter generators. The data was collected during the two-year period in which the IMD-HUMS was installed. Each IMD-HUMS acquisition concerning the shaft, spur gear, and bearing of a generator results in 170 variables. Each generator is assigned a binary value, 1 or 0, to classify its known state. The value of one was given to generators that were removed for fault, hence referred to as bad generators. The value of zero was given to generators that were not removed, referred to as good generators. To accomplish this generator classification, maintenance records and photographs from the 101st AVN Division were used.
Principal components and other techniques were applied to reduce the 170 initial predictors to only 10. A logistic regression model and random forest classifiers were used on each generator, and the plotted probabilities of being bad were smoothed and used to predict the current functional condition of generators in the test set. Only Condition Indicators (CI) computed in the last 20 observations of each generator were used in the predictive models, because generators classified as bad were not necessarily bad through their entire two-year history. Due to the highly variable nature of the predictor values, the model had lower success predicting states from just one acquisition. On the other hand, some surprising cases, generators wrongly presumed to be bad and, conversely, a generator wrongly assumed to be good, were classified correctly by this study's approach.
D. AREA OF RESEARCH AND APPROACH
ANNs have a number of traits that make them an attractive alternative to conventionally configured expert systems. First, many are capable of discriminating non-linear relationships. Second, they are capable of functioning with a certain degree of background noise and erroneous information with minimal degradation of their pattern recognition abilities. Third, they have the ability to generalize, classifying previously unseen vector patterns into existing and, in some cases, new output categories. They are also capable of identifying multiple faults. These are all areas where traditional expert systems typically fall short. Moreover, ANNs are data-based rather than rule-based, which means that they may be capable of correctly discriminating relationships previously hidden from the best of "experts".
ANNs are not without their disadvantages. Like all computer algorithms, they are capable only of manipulating numbers and require an engineer to discern the intelligence of their output. Their success is largely limited by the quality of the data they are provided: if the input vectors provided are inadequate to describe the decision space fully, then their likelihood of success is small. Again, they require an engineer to provide the proper inputs. Finally, they may be able to distinguish new relationships, but the relationships themselves remain hidden; all that is seen external to the network are the input and output vectors. It is generally believed that the relationships are somehow encoded in the connection weights and the hidden layers, but meaningful extraction of this information has yet to occur.
ANNs appear to have potential in numerous fields, including machinery diagnostics. The question might be asked whether an ANN should theoretically be capable of recognizing patterns in vibration signatures. It is the scope of this research to determine whether this potential can be realized in the domain of machinery diagnostics, and specifically for the UH-60L helicopter's electrical generator. In order to accomplish this, a database collected from IMD-HUMS is used. Pattern recognition is an essential component of rotating machinery condition forecasting; therefore, this research examines and trains different model structures and shapes in an attempt to identify the patterns "stored" in the weights of the networks.
E. STATISTICAL TOOLS
ANN software prices start at a few hundred dollars and can go to hundreds of thousands, or even more. The most expensive ones are generally packaged with more complete data mining products, which contain ANNs as one of the capabilities offered. This research utilizes one of those statistical packages, Clementine 10.0 produced by SPSS. This product is designed to function on servers and networks and has the ability to handle massive databases. The software also provides some handy features useful in evaluating the function fit by the NN, such as a computation of sensitivities. Through this software, the researcher had the opportunity to apply and, at the same time, train a number of different kinds of ANNs, in an effort to find the most suitable for the database of interest.
In addition, S-PLUS 7.0 from Insightful and Excel from Microsoft, two inexpensive and widely used tools, were used to assist the models in this research.

F. ORGANIZATION OF STUDY
This thesis begins with Chapter I, which briefly introduces the reader to the modern concept of CBM and the tools available to support it, like IMD-HUMS manufactured by Goodrich Corporation. A summary of previous related work is provided, along with the author's area of interest and the tools utilized to achieve the objective of this research.
Chapter II is dedicated to the traditional explanation of ANNs. Theory and pictures are used together to clarify what is commonly known in ANNs as a "black box" solution: data enter the "black box" and a prediction comes out of it.
Chapter III describes the database and the procedures followed to clean and select the final data in use, and presents the steps of the methodology that lead to the results of this research.
Chapter IV briefly presents and discusses the results and outputs of the models that were trained in supporting the goal of this research. Most of the structures of these models are summarized in Appendix B.
Finally, Chapter V summarizes the conclusions and offers recommendations for future research associated with the database and the techniques used.
II. ARTIFICIAL NEURAL NETWORKS OVERVIEW
An Artificial Neural Network (ANN) is an information processing system that has certain performance characteristics in common with biological neural networks. ANNs are parallel in nature, and the fundamental idea behind them is that if something works in nature, it should be able to work in computers. ANNs are data-based vice rule-based, so they possess the potential of being able to operate where analytical solutions are inadequate. They are reputed to be robust and extremely tolerant of noisy data.

A. HISTORY
The ANN concept was first introduced in 1943 by W. McCulloch and W. Pitts, who, while trying to describe how the brain's neurons might work, modeled an ANN using electrical circuits. In 1949, D. Hebb introduced the training of ANNs in his book "The Organization of Behavior," in which he argued that if two nerves fire at the same time, their connection is strengthened, making it more likely that the same two nerves will fire together again. By the 1950s, as computers became more advanced, researchers were eventually able to simulate such a hypothetical ANN. N. Rochester of IBM laboratories was an early pioneer in this field but, unfortunately, his effort failed. Between 1959 and 1962, B. Widrow and M. Hoff developed two models (ADALINE and MADALINE) that recognized binary patterns, introduced a new learning algorithm applying the Least-Mean-Squares (LMS) learning rule, and used it to train adaptive ANNs.
For almost two decades (1960-1980), interest in ANNs faded because of a lack of new ideas and computational power. In 1972, T. Kohonen and J. Anderson independently developed similar networks, both resulting in a collection of analog ADALINE circuits. The first multi-layered network, an unsupervised network, was developed in 1975.
Since 1980, many researchers have boosted the idea and today ANNs are extremely popular as prediction and forecasting tools in a number of areas. The future of ANNs, though, lies in the development of hardware.
B. BIOLOGICAL NEURON
In principle, the brain is composed of almost 10 billion neurons, each attached to about 10,000 other neurons. The main segment of the cell is called the soma or cell body (Figure 3). Neurons have a large number of extensions called dendrites. Each neuron receives electrochemical signals from other neurons connected through different axons. At the synapses, between the dendrites and axons, electrical signals are modulated in various amounts. If the sum of these electrical signals is sufficiently influential to activate the neuron, it produces an electrochemical output along the axon and passes this signal to the other neurons whose dendrites are attached at any of the axon terminals (Norgaard, M., Ravn, O., Poulsen, N., Hansen, L., 2000). These attached neurons may then fire. It is essential to point out that a neuron fires only if the entire signal received at the cell body surpasses a certain level. The neuron either fires or it doesn't; there aren't different levels of firing.
Figure 3. A Biological Neuron (From: Lawrence, J., 1993)
The human brain is composed of these interrelated, electrochemical, broadcasting neurons. From a huge number of extremely simple processing units (each carrying out a weighted sum of its inputs, and then firing a binary output if the total input exceeds a certain level), the brain controls very complex tasks. This is the model on which ANNs are based. Although ANNs have not come close to modeling the complexity of the human brain, they have proven to be excellent at problems that are easy for a human but difficult for a conventional computer, such as image recognition and forecasting based on earlier knowledge.

C. ARTIFICIAL NEURON
The artificial neuron is meant to mimic the major characteristics of the biological neuron. The three crucial mechanisms of the artificial neuron are:
(1) The synapses or connecting links that apply weights $w_j$ to the input values $x_j$, for $j = 1, 2, \ldots, m$ (Figure 4).

(2) A summer that adds all the weighted input values to calculate the input to the activation function, $v = w_0 + \sum_{j=1}^{m} w_j x_j$, where $w_0$ is called the bias (not to be confused with statistical bias in prediction or estimation), an arithmetic value associated with the neuron. It is convenient to consider the bias as the weight for an input $x_0$ whose value is constantly equal to 1, so that $v = \sum_{j=0}^{m} w_j x_j$.

(3) An activation function $g$ that maps $v$ to $g(v)$, the output value of the neuron. This function is a monotone function.

Figure 4. An Artificial Neuron
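As an illustration of these three mechanisms, the following minimal Python sketch computes the summed input and passes it through a sigmoid activation; the inputs and weights are made-up values, not parameters from any model in this thesis:

    import math

    def neuron(x, w, w0):
        # Summer: v = w0 + sum_j wj * xj  (the bias w0 acts as the weight for x0 = 1)
        v = w0 + sum(wj * xj for wj, xj in zip(w, x))
        # Activation: sigmoid g(v), one common monotone choice
        return 1.0 / (1.0 + math.exp(-v))

    # Inputs and weights are illustrative values only.
    print(neuron(x=[0.5, -1.2, 3.0], w=[0.4, 0.1, -0.6], w0=0.2))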
The artificial neuron has two stages of operation: the training stage and the using stage (Fausett, L., 1994). During the former, the neuron can be trained to fire or not for specific input patterns or vectors. In the latter, when a taught input pattern is identified at the input, its relayed output updates the current output. If the input pattern does not belong to the taught list of input vectors or patterns, the firing rule is used to determine whether to fire or not. It is important to state that the firing rule applies to all input patterns, not only the ones on which the neuron was trained.

D. ARCHITECTURE OF NEURAL NETWORKS

1. Single Layer Networks (SLN)
The structure of an ANN consists of the 'input layer' connected to the 'hidden layer,' which is connected to the 'output layer.' The activity of the input units corresponds to the raw data that is fed into the network, while the activity of each hidden unit is determined by the behavior of the input units and the weights on the links between the input and the hidden units (Ripley, B., 1996). The status of the output units depends on the activity of the hidden units and the weights between the hidden and output units (Figure 5).
Figure 5. Single Layer Network
This simple class of network is especially interesting because the hidden units are free to build their own versions of the input. The weights between the input and hidden units decide when each hidden unit is active, and so by adjusting these weights, a hidden unit can choose what it represents.
2. Multi Layer Networks (MLN)
The most frequent ANN model is the multilayer perceptron (MLP). This family of ANN is known as a supervised network because it needs a desired output for learning purposes (Kartalopoulos, S., 1996). The goal of this network is to generate a model that correctly matches the input to the output using historical data, so that the model can then be used to create the output when the desired output is not known. A graphical demonstration of an MLP is shown below (Figure 6).
Figure 6. Multi Layer Network
3. Feed-Forward Networks (FFN)
Feed-forward ANNs are the most accepted and extensively used models in several practical applications. FFNs permit signals to pass one way only: from input to output. There is no feedback or loop, and they tend to be straightforward networks that connect inputs with outputs. They are broadly used in pattern recognition. This structure of organization is also referred to as bottom-up or top-down (Figure 7).
4. Radial Basis Function Networks (RBFN)
In a Radial Basis Function (RBF) network, the mapping from the input to the hidden layer is nonlinear, while the mapping from the hidden layer to the output is linear. Because of their nonlinear approximation properties, RBF networks are capable of modeling complex mappings, which perceptron ANNs can only model by using multiple intermediary layers (Bishop, C., 1995). To use an RBF network we need to specify the hidden unit activation function, the number of processing units, a criterion for modeling a given task, and a training algorithm for finding the parameters of the network.
Figure 7. Feed Forward and RBF Network Representation
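A sketch of this two-stage mapping, assuming Gaussian hidden units; the centers, widths, and output weights below are illustrative assumptions, not parameters from any trained model:

    import math

    def rbf_network(x, centers, widths, out_weights, bias):
        # Nonlinear stage: Gaussian response of each hidden unit to the input.
        hidden = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * s ** 2))
                  for c, s in zip(centers, widths)]
        # Linear stage: weighted sum of the hidden responses.
        return bias + sum(w * h for w, h in zip(out_weights, hidden))

    # All parameter values are made up for illustration.
    print(rbf_network(x=[0.5, 0.5],
                      centers=[[0.0, 0.0], [1.0, 1.0]],
                      widths=[1.0, 1.0],
                      out_weights=[0.7, -0.3],
                      bias=0.1))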
E. LEARNING PROCESS
One of the most essential phases of ANN is the learning process. Learning can be done in a supervised or unsupervised way.
1. Supervised Learning
In supervised learning, both the inputs and the outputs are known. During training, the net runs the inputs and compares its resulting outputs against the known outputs. The differences from this comparison (the errors) are calculated, and the system adjusts the weights that govern the network. The aim is to establish a set of weights that minimizes the error. One well-known method, common to many learning procedures, is Least Mean Squares (LMS) convergence (Duda, R., 2000). In this case, the network learns "offline" because the learning and operation stages are distinct. Supervised learning can be subdivided into the following three general types:
a. Hebbian Learning
Hebbian learning is based on the premise that those connections that
receive the most signal energy should in turn be strengthened. In this type of ANN, connection weights increase in a manner proportional to the magnitude of the signals, provided that both the input through the path and the desired output are high. While historically important and neurologically accurate, it is not widely used in neural computing applications.
b. Delta Rule Learning
Today, the most frequent form of learning in use is the delta rule. Here,
weights are adjusted based on a direct comparison between the actual and desired outputs. Back propagation is one learning rule based on the generalized delta rule:
$W_{ij} = C_1 E_{ij} + C_2 M_{ij} + C_3 X_{ij}$ (2.1)

where $W_{ij}$ is the weight of the connection from the $i$th element in the current layer to the $j$th element of the previous layer; $C_1$, $C_2$, and $C_3$ are coefficients varying from 0 to 1; $E_{ij}$ is the error, proportional to the difference between the actual and desired output of the network; $M_{ij}$ is the momentum term, based on the difference between the previous weight of the given connection and the weight immediately prior to that; and $X_{ij}$ is the activation energy associated with that particular connection (Ripley, B., 1996).
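Equation (2.1) is a generalized form; in its simplest (Widrow-Hoff) version, the delta rule adjusts each weight in proportion to the output error times the corresponding input. A minimal sketch of that simple version, with made-up data rather than the thesis's formulation:

    def delta_rule_step(w, x, target, eta=0.1):
        # One Widrow-Hoff update: w <- w + eta * (t - o) * x
        o = sum(wi * xi for wi, xi in zip(w, x))     # linear output
        error = target - o
        return [wi + eta * error * xi for wi, xi in zip(w, x)]

    w = [0.0, 0.0]                                   # illustrative starting weights
    for _ in range(100):
        w = delta_rule_step(w, x=[1.0, 2.0], target=5.0)
    print(w)                                         # the output w . x is driven toward 5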
c. Competitive Learning
In competitive learning, the output of each processing element is weighted according to the magnitude of its response relative to those of the other processing elements. The "winning" processing element's weighting is then modified according to a comparison between actual and desired outputs. Thus only the strongest activation energies are adjusted; weak signals get progressively weaker unless the magnitudes of their responses become comparable to those of the "winners".
2. Unsupervised Learning
In unsupervised training, the network is provided only with inputs, not with desired outputs. In the training phase, an input pattern is applied to the input layer and the net is permitted to reach equilibrium (a "winner"). Weight changes are then made according to some instructions. The model itself must decide what features it will use to cluster the input data. This type of organization is also known as self-organization or adaptation. Here the network learns "online," because it learns and operates at the same time.
3. Activation Functions
The activation function is basically used to introduce nonlinearity into the net. The activation function $g(\cdot)$ transforms the presented input of an artificial neuron during its activation and determines how influential the output from the neuron should be, based on the sum of the inputs. If the artificial neuron must mimic a biological neuron, the activation function $g(\cdot)$ has to be a simple threshold function returning binary values. However, this is not always the approach that artificial neurons implement. Sometimes it is more powerful and efficient to have a smooth, differentiable activation function. The output from this group of activation functions lies within the range [0,1] or [-1,1], depending on which activation function is applied. Cases where the identity function is used as the activation function do not have these restrictions. While, in principle, inputs and weights have no boundaries and take values within the range $(-\infty, +\infty)$, in practice they often have small values centered around zero or are rescaled to have such small values.
As pointed out before, there are many different activation functions, but the most frequently used are the identity function (Figure 8), the sigmoid (Figure 9), and the hyperbolic tangent (Figure 10). All of these functions should be differentiable, because the back propagation (BP) algorithm requires this property in order to compute the error gradients within the network (Bishop, C., 1995).
An identity function, also known as an identity map or an identity transformation, is a function which does not have any effect. It always returns the same value that was used as its argument. For ANNs, that is reflected by the absence of hidden layers (perceptron).
g(x)=x
Figure 8. Identity Function
A sigmoid function, also known as logistic, is an S-shaped curve that maps all input to the range [0, 1]. It has a limit of 0 as x approaches negative infinity, and 1 as x approaches infinity.
$g(x) = \frac{1}{1 + e^{-x}}$
Figure 9. Sigmoid Function
A hyperbolic tangent function is analogous to a sigmoid, but it maps all of its input to the range [-1, 1]. It has a limit of -1 as x approaches negative infinity and 1 as x approaches infinity. The constants a and b define the possible output range (a) as well as the slope (b). The neuron transforms the stimulus in a nonlinear way, an essential precondition for solving a variety of highly complex problems.
$g(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$

Figure 10. Hyperbolic Tangent Function
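The three activation functions can be written down directly; a small Python sketch:

    import math

    def identity(x):
        return x                                    # g(x) = x

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))           # maps R onto (0, 1)

    def tanh(x):
        return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)   # maps R onto (-1, 1)

    for x in (-2.0, 0.0, 2.0):
        print(x, identity(x), round(sigmoid(x), 4), round(tanh(x), 4))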
4. Gradient Descent

As clarified previously, training a function estimator frequently reduces to finding a value of $\vec{w}$ that minimizes a scalar error function $E(\vec{w})$. This is a typical optimization problem, and various methods have been developed to solve it. The most frequent one is gradient descent. Gradient descent consists in thinking of $E$ as the height of a landscape over the weight space: to locate a minimum, begin from an arbitrary point and march downhill until a minimum is reached (Figure 11) (Sobajic, D., 1993). From the figure we can see that gradient descent will not always converge to an absolute minimum of $E$, but only to a local minimum. In the majority of these cases, this local minimum is good enough, given a realistic initial value of $\vec{w}$.

Figure 11. Gradient Descent
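A minimal gradient-descent loop, assuming for illustration the one-weight quadratic error surface E(w) = (w - 3)^2:

    def grad_E(w):
        return 2.0 * (w - 3.0)      # derivative of the illustrative surface E(w) = (w - 3)^2

    w = 0.0                         # arbitrary starting point
    eta = 0.1                       # step size (learning rate)
    for _ in range(50):
        w -= eta * grad_E(w)        # march downhill: w <- w - eta * dE/dw
    print(w)                        # converges near the minimum at w = 3

On a bowl-shaped surface like this one the loop always reaches the single minimum; on the rugged surfaces of multi-layer nets the same loop can stop at a local minimum, as the figure illustrates.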
5. Back propagation Algorithm
Back propagation (BP) is one of the earliest training algorithms, first developed by P. Werbos and is widely used for training supervised networks. The goal of BP is to minimize the square error of the predictions over all the observations (Fausett, L., 1994). Based on this algorithm, the output error is assumed to be collectively contributed by all connection weights. Basically, at the center of the algorithm, an application of the chain rule for ordered partial derivatives takes place to compute the sensitivity that a cost function has with respect to the environment and weights of the net.
Initial weights are usually chosen as small random values, so that each neuron will adapt a different set of weights. The network's input $z_j$ to a node $j$ is resolved by summing its weighted inputs

$z_j = \sum_i w_{i,j} x_i \quad \forall j$ (2.2)
where $x_i$ designates the input at one node, and $w_{i,j}$ the weight from node $i$ to node $j$. Next, the node's threshold value $\theta$ (bias) is added to the net input $z_j$, and the calculated value is filtered through an activation function, usually a sigmoid function (Figure 12):

$F(z_j) = \frac{1}{1 + e^{-z_j + \theta_j}}$ (2.3)
Figure 12. Sigmoid Function
The sigmoid function is also known as a "squashing" function because it maps its inputs onto a preset range between 0 and 1.
The learning procedure in a BP net has two stages. In the first stage, each input pattern $I_p$ is provided to the net sequentially and propagated forward to the output. In the second stage, a technique called "gradient descent" is applied to minimize the total error on the input patterns within the training set. During this technique, weights are altered in proportion to the negative of the error derivative with respect to each weight

$\Delta w_{j,i} = -\varepsilon \left[ \frac{\partial E}{\partial w_{j,i}} \right]$ (2.4)

with the weights moving toward the steepest descent of the error surface, which is defined by the total error

$E = \frac{1}{2} \sum_p \sum_j (t_{p,j} - o_{p,j})^2$ (2.5)
where $o_{p,j}$ denotes the output of node $j$ in response to pattern $p$, and $t_{p,j}$ is the target output for node $j$. After the error on each pattern is calculated (2.5), all the weights are readjusted in proportion to this error and back-propagated from the outputs to the inputs, applying the gradient descent method. The newly calculated weights decrease the overall error in the net. The idea of gradient descent using only a single weight is presented in the following figure.
Figure 13. Gradient Descent Using One Weight
An application of the chain rule is used to develop the BP learning rule, rewriting the error gradient for each pattern as the product of two partial derivatives. The first partial derivative, $\partial E_p / \partial z_{p,j}$, represents the change in error as a function of the network input, while the second, $\partial z_{p,j} / \partial w_{j,i}$, represents the effect of a weight change on the network input. The error gradient becomes

$\frac{\partial E_p}{\partial w_{j,i}} = \left[ \frac{\partial E_p}{\partial z_{p,j}} \right] \left[ \frac{\partial z_{p,j}}{\partial w_{j,i}} \right]$ (2.6)
Using equation (2.2) for the net input $z_j$ to a node $j$, we can solve directly for the second partial derivative, obtaining the network's output $o_{p,i}$ for pattern $p$ at node $i$:

$\frac{\partial z_{p,j}}{\partial w_{j,i}} = \frac{\partial \left( \sum_k w_{j,k}\, o_{p,k} \right)}{\partial w_{j,i}} = o_{p,i}$ (2.7)
Naming the negative of the first partial derivative the error signal,

$$d_{p,j} = -\frac{\partial E_p}{\partial z_{p,j}} \qquad (2.8)$$
the corresponding change in the weight wi,j with respect to the error Ep becomes
∆pwj,i=η*dp,j*op,j (2.9)
where η is a parameter describing the learning rate. The speed and accuracy of the learning process during the iterations to update the weights also depend on the learning rateη. A low learning rate can guarantee more stable convergence, but a high learning rate can accelerate convergence in several cases.
The next step in the BP algorithm is to calculate d_{p,j} for each node in the net. Equation (2.8) can be rewritten as:
$$d_{p,j} = -\frac{\partial E_p}{\partial o_{p,j}} \cdot \frac{\partial o_{p,j}}{\partial z_{p,j}} \qquad (2.10)$$
To compute the first partial derivative there are two cases to examine:
a. First Case
Assume that j is an output node of the net; then, from equation (2.5), it follows that

$$\frac{\partial E_p}{\partial o_{p,j}} = -(t_{p,j} - o_{p,j}) \qquad (2.11)$$

Substituting equation (2.11) into equation (2.10) gives

$$d_{p,j} = (t_{p,j} - o_{p,j})\, f'(z_{p,j}) \qquad (2.12)$$
b. Second Case
Assume that j is not an output node of the net; then, applying the chain rule again, we obtain

$$\frac{\partial E_p}{\partial o_{p,j}} = \sum_k \frac{\partial E_p}{\partial z_{p,k}} \cdot \frac{\partial z_{p,k}}{\partial o_{p,j}} = \sum_k \frac{\partial E_p}{\partial z_{p,k}} \cdot \frac{\partial}{\partial o_{p,j}} \Big( \sum_i w_{k,i}\, o_{p,i} \Big) = \sum_k \frac{\partial E_p}{\partial z_{p,k}}\, w_{k,j} = -\sum_k d_{p,k}\, w_{k,j} \qquad (2.13)$$

so that, for a hidden node, $d_{p,j} = f'(z_{p,j}) \sum_k d_{p,k}\, w_{k,j}$.
Combining the above cases, we obtain an iterative process for calculating the error signal d_{p,j} for all nodes in the net. These errors can then be used to update the weights.
Because BP uses a gradient descent method, the net tracks the contour of an error surface, with weight updates moving in the direction of steepest descent. For a simple net without hidden layers, it is easy to minimize the error using gradient descent because the error surface is bowl-shaped (Figure 13); the net will always locate an optimal solution at the base of the bowl. Such optimal solutions are called global minima. On the other hand, more complex cases require an extra hidden layer to carry out the solution. Here, error surfaces also become complex, possibly containing many minima. Because some minima are deeper than others, it is possible that gradient descent will not locate a global minimum, and the network may be trapped in a local minimum, which is a suboptimal solution.
It is clear that we want to avoid local minima while training a BP net. Although in some cases this may be difficult to do, in practice it is essential to understand how often and under what circumstances local minima occur, and to study possible approaches for avoiding them. It is known from ANN theory that the more hidden nodes a net has, the less likely it is to encounter a local minimum during training: although additional hidden nodes amplify the complexity of the error surface, the extra dimensionality increases the number of possible escape paths.
The BP algorithm analyzed in this chapter involves only weight changes that are proportional to the derivative of the error. As mentioned for equation (2.9), the larger the learning rate η, the larger the weight changes on each iteration and the faster the net learns (although the magnitude of the learning rate also controls whether the net reaches a stable solution). If the learning rate gets too large, the weight changes no longer approximate a gradient descent procedure, which often results in oscillation of the weights. Obviously, we want the largest learning rate that does not cause oscillation, achieving the best learning speed while minimizing the training time of the net. One technique that has been used is a small modification of the BP algorithm that adds a momentum term.
The idea of momentum is that previous changes in the weights should influence the current direction of movement in weight space. This idea is implemented by the adjusted weight update rule

$$\Delta w_{j,i}(n+1) = \eta\, d_{p,j}\, o_{p,i} + \alpha\, \Delta w_{j,i}(n) \qquad (2.14)$$

where η is the learning rate, α is the momentum parameter, and n indexes the update step. With momentum, once the weights begin to move in a particular direction in weight space, they tend to keep moving in that direction. Momentum can help the net escape a local minimum, in addition to speeding learning, particularly along extensive flat error surfaces.
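Collecting the pieces, the following sketch (illustrative Python, not the Clementine implementation) expresses the error signals of equations (2.12) and (2.13) and the momentum update of equation (2.14); the demonstration values are made up:

```python
import math

def f(z):
    """Sigmoid activation (Eq. 2.3 with the bias folded into z)."""
    return 1.0 / (1.0 + math.exp(-z))

def f_prime(z):
    """Derivative of the sigmoid: f'(z) = f(z) * (1 - f(z))."""
    s = f(z)
    return s * (1.0 - s)

def delta_output(t, o, z):
    """Error signal at an output node (Eq. 2.12)."""
    return (t - o) * f_prime(z)

def delta_hidden(z, deltas_next, weights_next):
    """Error signal at a hidden node j: f'(z_j) * sum_k d_k * w_kj,
    back-propagated from the nodes k that j feeds (Eq. 2.13)."""
    return f_prime(z) * sum(d * w for d, w in zip(deltas_next, weights_next))

def weight_step(d_j, o_i, prev_step, eta=0.25, alpha=0.9):
    """Weight change with momentum (Eqs. 2.9 and 2.14): the previous
    change prev_step keeps the weights moving in the same direction."""
    return eta * d_j * o_i + alpha * prev_step

# Tiny demonstration with made-up values:
d_out = delta_output(t=1.0, o=0.72, z=0.94)
print(weight_step(d_out, o_i=0.5, prev_step=0.01))
```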
In Clementine, the default learning rate is 0.25 and the default momentum
parameter is 0.9. When using BP for a series of problems, much smaller values than these are often used. For especially complex problems, a learning rate of 0.01 is very common.
6. Efficient Algorithms
Selecting the proper value for the learning rate η is not easy. If η has a small value, the learning procedure will be too slow; on the other hand, if η is assigned a large value, the learning procedure may diverge. An acceptable value of η can be established by trial and error, but this is a tedious and wasteful process (Bishop, C., 1995). To deal with this problem, a wide range of efficient learning methods has been developed; here, we briefly present the most essential theoretical ideas underlying them. One of the most basic ideas for speeding up the learning procedure is to use second-order information about the error function. Assuming a quadratic error in one dimension, the best learning rate is the inverse of the second-order derivative (Figure 14), a fact which can aid in designing capable learning methods.
Figure 14. Learning Rate effect on Gradient Descent (From: Fausett, L., 1994)
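As a one-dimensional illustration of this second-order idea (a hypothetical sketch with made-up numbers, not part of the thesis experiments), a quadratic error is minimized in a single step when the learning rate equals the inverse of the second derivative:

```python
# Quadratic error E(w) = a*(w - m)**2 has second derivative E'' = 2a.
# Setting eta = 1/E'' makes one gradient step land exactly on the minimum.
a, m = 5.0, 3.0
grad = lambda w: 2.0 * a * (w - m)   # dE/dw

w = 0.0
eta = 1.0 / (2.0 * a)                # inverse of the second derivative
w -= eta * grad(w)
print(w)                             # 3.0: the minimum, reached in one step
```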
If it is possible to calculate this second-order derivative, then it is feasible to achieve a good learning rate. Unfortunately, the error function might not be quadratic at all. Therefore, setting the learning coefficient to the inverse of the second-order derivative only works near the optimum, in regions where the quadratic approximation is valid; if the second-order derivative is negative, it does not work at all, and particular care must be taken to handle such circumstances. Moreover, a few other issues come up when the dimension of the weight space is larger than one, which is the most common case in practice. The second-order derivative is then no longer a single number but a matrix, called the Hessian and defined as:
$$H = \frac{\partial^2 E}{\partial \mathbf{w}^2} =
\begin{pmatrix}
\dfrac{\partial^2 E}{\partial w_1^2} & \dfrac{\partial^2 E}{\partial w_1\,\partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_1\,\partial w_n} \\
\dfrac{\partial^2 E}{\partial w_2\,\partial w_1} & \dfrac{\partial^2 E}{\partial w_2^2} & \cdots & \dfrac{\partial^2 E}{\partial w_2\,\partial w_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 E}{\partial w_n\,\partial w_1} & \dfrac{\partial^2 E}{\partial w_n\,\partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_n^2}
\end{pmatrix} \qquad (2.15)$$
Sometimes the curvature differs across directions (Figure 15). This can generate a major problem if there is, for instance, a second derivative of one hundred (100) in one direction and a second derivative of one (1) in another. In this case, the learning coefficient must be less than 0.02 (twice the inverse of the largest second derivative) in order to avoid divergence, which means that convergence will be very slow in the direction where the second derivative is one (1). This phenomenon is called 'ill conditioning.' Efficient algorithms frequently try to transform the weight space so that the curvature is the same in all directions; this has to be done cautiously, so that instances where the curvature is negative are handled as well. Some of the best such algorithms are QuickProp, Levenberg-Marquardt, and Conjugate Gradient.
Figure 15. Ill Conditioning. (From: Bishop, C., 1995)
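To see the divergence threshold concretely, the following sketch (illustrative numbers matching the example above, not thesis data) runs gradient descent on a quadratic error with curvatures 100 and 1:

```python
# Illustrative sketch of ill conditioning.
# E(w1, w2) = 50*w1**2 + 0.5*w2**2, so the second derivatives are
# 100 in the w1 direction and 1 in the w2 direction.
def run(eta, steps=100):
    w1, w2 = 1.0, 1.0
    for _ in range(steps):
        w1 -= eta * 100.0 * w1   # gradient in the stiff direction
        w2 -= eta * 1.0 * w2     # gradient in the flat direction
    return w1, w2

print(run(0.019))  # stable, but w2 is still far from 0: slow convergence
print(run(0.021))  # w1 blows up: a rate above 2/100 = 0.02 diverges
```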
7. Batch vs. Incremental Learning
In supervised learning, as described earlier, the error function is frequently defined as a sum of error terms over a finite set of training samples, each consisting of an input-output pair (x_i, y_i). Again, the error function is

$$E = \sum_i E_i \qquad (2.16)$$

with

$$E_i = \frac{1}{2} \big( f_w(x_i) - y_i \big)^2 \qquad (2.17)$$
Applying steepest descent to E is known as "batch learning" (BL), because the gradient of the error has to be calculated on the full training set before the weights are modified. A different way to modify the weights so as to minimize E is "incremental learning" (IL), in which the gradient descent steps are applied to each Ei instead of E. Other common names for IL are online or stochastic learning. The obvious question that arises is which of these methods is best. The answer is not simple and always depends on the particular problem to be solved; a short sketch contrasting the two update schemes follows the list below. Here are a few of the points to consider (Simpson P., 1996):

a. Advantages of Incremental Learning (IL)

• IL is usually faster, particularly when the training set is redundant. In situations where the training set has input and output patterns that are similar, BL wastes time calculating and adding similar gradients before making one weight update.

• IL often results in better outcomes. This happens because the randomness of IL generates noise in the weight updates, and this noise helps the weights jump out of bad local optima.

• IL is capable of tracking changes. As an example, consider a model learning the dynamics of a mechanical system. As this system ages, its properties may slowly evolve, and IL can track this type of drift.

b. Advantages of Batch Learning (BL)

• In IL, noise causes the weights to oscillate continuously around a local optimum, and they never converge to a constant stable value. This is not the case in BL, making it easier to analyze.

• Various acceleration methods operate only in BL, such as some of the algorithms mentioned earlier (QuickProp, Conjugate Gradient).

• Another benefit related to the absence of noise in BL is that the theoretical analysis of the weight dynamics and convergence rates is much simpler.
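As promised above, here is a minimal sketch contrasting the two update schemes on an illustrative one-parameter linear model; the data and learning rate are made up:

```python
import random

# Batch vs. incremental updates on a linear model f_w(x) = w * x with
# squared error (Eqs. 2.16-2.17). The data are illustrative (true w = 2).
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
eta = 0.05

def grad_i(w, x, y):
    return (w * x - y) * x          # dE_i/dw for one sample

# Batch learning: one update from the gradient summed over the full set.
w = 0.0
for epoch in range(100):
    g = sum(grad_i(w, x, y) for x, y in data)
    w -= eta * g
print("batch:", round(w, 4))

# Incremental (online/stochastic) learning: update after every sample,
# visiting the samples in random order each epoch.
w = 0.0
for epoch in range(100):
    random.shuffle(data)
    for x, y in data:
        w -= eta * grad_i(w, x, y)
print("incremental:", round(w, 4))
```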
III. DATA DESCRIPTION AND METHODOLOGY
Vibration analysis is among the most powerful tools available for the detection and isolation of incipient faults in mechanical systems. Among the methods of vibration analysis in use today, and under continuous study, are broadband vibration monitoring, time-domain analysis, and frequency analysis. All have varying degrees of utility in machinery condition monitoring and diagnostics, and all have characteristics that lend themselves particularly well to specific applications. Since the effectiveness of an ANN is directly related to how effectively the chosen inputs define a particular decision space, the selection of the optimum vibration parameters as inputs to the ANNs is critical. Thus, a good understanding of elementary machinery diagnostics techniques is essential.
A. SOURCES OF VIBRATION
In mechanical systems, any mechanical component which periodically comes in contact with a second component to transmit an axial, radial, or torsional load is a potential source of mechanical vibration. In machines with a gear train, the principal components involved with load transfer will be its torsional power source, such as a motor; the gear meshes; the bearings; and those items that interconnect them, the shafts. Additionally, because vibrational isolation is seldom complete, additional extraneous sources of vibration will also be present. The diagnostician is generally interested in extracting the vibrations created by specific machinery components and ignoring the other sources as extraneous noise. In this study, we are particularly interested in the vibrations generated by the rotating machinery’s gears, bearings, and shafts. As such, the discussion will be limited to these sources of vibration.
1. Gear Vibration
In a gear train, the gear mesh is the dominant source of mechanical vibration. This vibration primarily stems from non-uniformity in the transmission of angular motion from one gear to its mate. The non-uniformity of the angular motion occurs due to geometric deviations of the contact surfaces from the ideal involute shape and the elastic deformation that any mechanical system undergoes when transmitting a load (Mark, W.D., 1998). Moreover, torque fluctuations and deflections of the gearbox can also be sources of vibration in gears. Clearly, any damage that occurs to the gear contact surface, as well as to other mechanical linkages to the gear mesh, will also have an effect on the gear's vibration (Mattew, J., and Alfredson, R.J., 1987).
2. Bearings
Bearing vibrations occur for much the same reasons as gears. However, because bearings are not situated directly along the power transmission train and support largely static loads, they characteristically generate a small vibration signal until the damage inflicted upon them reaches advanced stages. Because of the low magnitude of these signals, they are often masked by much stronger gear-related signals. Partially because of this belated detection of trouble, antifriction bearings are among the most common causes of machinery failure in moderately sized machines. The frequencies associated with the bearing-related signals generally depend on the location of the damage, the dimensions of the bearings, and the shaft rotation speed.
3. Shafts
Shafts generally produce vibration signals at their rotational frequency and its harmonics. Shafts are also prone to a number of different faults, all of which register at the shaft rotational frequency. In the case of bent shafts and shaft misalignment, the second harmonic is the dominant frequency in 90 percent of cases (Collacott, R.A., 1979). Imbalance in the shaft or load characteristically generates a dominant signal at the shaft rotational frequency, but there tends to be a phase shift as well. Mechanical looseness can also increase the signal at the shaft rotational frequency, but characteristically involves higher harmonics as well (Hewlett Packard, 1983).
B. DATA COLLECTION
The database used in this research was provided by Goodrich Corporation and collected through the IMD-HUMS installed on 30 UH-60L helicopters. The period of data collection runs from 9/22/2003 to 7/31/2005. The total number of observations utilized was 36,742, and each observation consists of 169 fields. For ease of reference in the database's structure, each generator was assigned a number from 1 to 66. Appendix A summarizes the allocation of generators among the helicopters and the number of recorded observations for each generator.
C. SELECTING VARIABLES
As in any prediction or forecasting model, the selection of appropriate model inputs is extremely important. However, in most ANN applications, less attention is given to this task. The main reason for this is that ANNs belong to the class of data-driven approaches, whereas conventional statistical methods are model-driven. In the latter, the structure of the model has to be determined first, with the aid of empirical or analytical approaches, before the unknown model parameters can be estimated. Data-driven approaches, on the other hand, have the ability to determine which model inputs are critical, so there is less need for "...a priori rationalization about relationships between variables..." However, presenting a large number of inputs to ANN models, and relying on the network to determine the critical model inputs, usually increases network size. This has a number of disadvantages, such as decreasing processing speed, increasing the amount of data required to estimate the connection weights efficiently, and degrading ANN performance. This is particularly true for complex problems, where the number of potential inputs is large and where no a priori knowledge is available to suggest which inputs to include.
1. Input Vector
According to the vibration theory presented above, and after an assiduous study of the entire data set recorded by the IMD-HUMS, the variables which could potentially form a well-informed input vector (all of them, or subsets) for the training process of this research model are listed and briefly explained below (Goodrich Corporation, 1998):
a. Torque
Torque is a measure of how much force acting on an object causes that object to rotate.

b. SO_1 (Shaft Order 1)
SO_1 is the once-per-revolution energy in the signal average and is used to detect shaft imbalance.

c. SO_2 (Shaft Order 2)
SO_2 is the twice-per-revolution energy in the signal average and is used to detect shaft misalignment.

d. SO_3 (Shaft Order 3)
SO_3 is the thrice-per-revolution energy in the signal average and is used to detect shaft disparity.

e. Signal Average RMS
Frequencies that are integer multiples of the basic shaft frequency are enhanced by the averaging process, and other frequencies are relatively attenuated. Depending on the mechanical environment, the signal average generally requires about 100 revolutions to converge to a usable waveform.

f. Residual Kurtosis
The residual analysis first removes all the strong tones from the signal average to produce a residual signal, so as to minimize the interference of these strong tones. Here the process uses kurtosis, which measures the thickness of the tails of the distribution of bearing vibrations after the background signal has been removed (Harris, 2002).

g. Residual RMS
The residual process deals with the root mean square, which is the overall energy level of the vibration data.

h. Side Band Modulation_1
This analysis is designed to reveal any sideband activity that may be the result of certain gear faults, such as eccentricity, misalignment, or looseness. The indicator characterizes the degree of sideband modulation for the first sideband.

i. Gear Distributed Fault
This attribute is an effective detector for distributed gear faults, like wear and multiple tooth cracks. It is a dimensionless measurement calculated from the ratio of explained and unexplained variances of the vibration generated at the meshing of gears.

j. G2_1
G2_1 is an algorithm developed by Goodrich Corporation to compute the ratio of the signal average peak-to-peak and the gear meshing energies.

k. Residual Peak to Peak
The residual process deals with the algebraic difference between the extremes of the vibration quantity.

l. Gear Misalignment_1
Gear Misalignment_1 is a dimensionless measurement resulting from the ratio of the energies of the vibrations produced when gears mesh (Harris, 2002).

m. Ball Energy
Ball Energy is the total energy associated with the bearing ball spin defect frequency and its harmonics.

n. Cage Energy
Cage Energy is the total energy associated with the bearing cage defect frequency and its harmonics. Usually it is detectable only at the later stages of a bearing failure, but some studies show that this indicator may increase before the others.

o. Inner Race Energy
Inner Race Energy is the total energy associated with the bearing inner race defect frequency and its harmonics.

p. Outer Race Energy
Outer Race Energy is the total energy associated with the bearing outer race defect frequency and its harmonics.

q. Envelope RMS
The main purpose of envelope analysis is to sum and normalize the six multiples of frequencies above or below the root mean square value of the vibration.
Table 1 enumerates the above potential predictors for later reference purposes.
Predictors
1. Torque
2. SO_1
3. SO_2
4. SO_3
5. Signal Average RMS
6. Residual Kurtosis
7. Residual RMS
8. Side Band Modulation_1
9. Gear Distributed Fault
10. G2_1
11. Residual Peak to Peak
12. Gear Misalignment_1
13. Ball Energy
14. Cage Energy
15. Inner Race Energy
16. Outer Race Energy
17. Envelope RMS
Table 1. Potential Model Predictors
2. Output Vector
In this research's models, the output vector consists of only one variable: the operating condition of the generator. Each generator is assigned a binary value of 1 or 0 to classify its known state. The value 1 corresponds to generators removed for fault, while the value 0 corresponds to good generators. The fact that each generator is assigned a state of 0 (good) or 1 (bad) does not mean these generators are actually in the assigned state. The given state is based only upon whether a generator was removed for fault or not, according to the maintenance records. A generator with a hidden fault would be assigned a state of 0 (good). Similarly, a generator which was taken out for a malfunction and given a state of 1 (bad) could have been mechanically good (Willard, L., Klesch, G., 2005). Unlike the previous work of Willard and Klesch (2005), the entire history of "good" and "bad" generators is used here. Thus most "bad" generators should have an initial period of "good" followed by "bad" as they fail.
The following table shows the generators that were confirmed as bad, either after maintenance or based on IMD-HUMS data, and assigned the value of 1 (Table 2). The failures of two of the generators, numbers 9 and 33, were detected during operation by a generator warning light. Faults in the remaining four generators, numbers 22, 31, 53, and 56, did not trigger the generator warning light. However, each of the four generators had unusually high SO1 readings upon removal. Three of these generators, numbers 22, 31, and 53, showed evidence of fault or wear. The removal of generator number 56 resulted from an identifiable buzz (Willard, L., Klesch, G., 2005).
Generator   Comments
9           Generator failed during shutdown upon APU generator coming on during start.
22          SO1 near 2 ips. After Spline Adapter Coupler replacement, SO1 returned to 0.05 ips.
31          SO1 at 3 ips. After Spline Adapter Coupler replacement, SO1 remained high, and so the generator was replaced.
33          Generator bad.
53          SO1 at 3 ips. After Spline Adapter Coupler replacement, SO1 returned below 0.05 ips.
56          SO1 over 4 ips. Generator and Spline Adapter Coupler were replaced.
Note: Replacements of generators 9 and 22 were based on maintenance records; the rest were based on Johnny Wright and Ground Station Team, IMD-HUMS Fault Detections, Goodrich Corporation, Draft 5/25/2005 (Ver 117).
Table 2. Bad Generators: Reasons for Replacement
D. DATA PREPROCESSING
Data preprocessing is frequently used to analyze and transform the input and output variables to minimize noise, emphasize essential relationships, identify trends, and flatten the distribution of the variables to aid the ANN in learning the relevant patterns. Since ANNs are pattern matchers, the representation of the data is important in designing a successful network. In most datasets there is a large variability in the scale of range fields. To balance this effect of scale, range fields are transformed so that they all have the same scale. In Clementine, range fields are rescaled to have values between 0 and 1. The transformation used is:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (2.18)$$

where x_i' is the rescaled value of input field x for record i, x_i is the original value of x for record i, x_min is the minimum value of x over all records, and x_max is the maximum value of x over all records.
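As a sketch (illustrative Python, not Clementine itself), the rescaling of equation (2.18) is:

```python
def min_max_rescale(values):
    """Rescale a range field to [0, 1] as in Eq. 2.18.
    Assumes the field is not constant (max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_rescale([2.0, 5.0, 11.0]))  # [0.0, 0.333..., 1.0]
```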
An additional problem for the network representation of the utilized database was that, out of 36,742 total observations, only 1,477 cases referred to bad generators. This fact directly affects the learning procedure of the network by creating a tendency to predict only good generators. For example, Table 3 gives results for an ANN using the default settings in Clementine, where the classification is perfect for "good" generators while the "bad" generators are classified as "good" as well. Changing the ANN architecture consistently gives similar results; most observations are predicted as "good." To correct this situation, the records of bad generators were replicated so that the ratio of good to bad observations was close to 1. We can consider this scheme a weighting technique that emphasizes the input vectors of the bad generators and, concurrently, the information they might convey. The multiplication factors differed for each generator. They were chosen so that each bad generator had about the same number of total observations and so that the number of bad observations was about equal to the number of good observations. Table 4 describes this data manipulation, and Figure 16 presents the Clementine setup used to create the new normalized database.
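The following sketch reproduces the replication idea in illustrative Python. The observation counts come from Table 4, but the rounded factors it computes are close to, not identical with, the factors actually used in the thesis:

```python
# Oversampling sketch: replicate each bad generator's records by a
# factor chosen so that bad totals roughly match the good total and
# each bad generator contributes about equally.
bad_counts = {9: 435, 22: 336, 31: 302, 33: 245, 53: 61, 56: 98}
good_total = 35265
target_per_gen = good_total / len(bad_counts)   # aim for equal shares

factors = {g: round(target_per_gen / n) for g, n in bad_counts.items()}
bad_total = sum(factors[g] * n for g, n in bad_counts.items())
print(factors)                            # multiplication factor per generator
print(round(bad_total / good_total, 2))   # ratio of bad to good, near 1
```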
All Data w/o Gen   Training Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy (Good)
9                  98.479 %            0 %                         100 %
22                 98.173 %            0 %                         100 %
31                 97.111 %            0 %                         100 %
33                 97.657 %            0 %                         100 %
53                 97.151 %            0 %                         100 %
56                 96.928 %            0 %                         100 %
(Original set: 36,742 observations.)
Table 3. Training Set Using Only Original Data

Generator   Observations   Factor   Result
9           435            13       5655
22          336            16       5376
31          302            18       5436
33          245            22       5390
53          61             90       5490
56          98             55       5390
Bad Gen total: 32,737. Good Gen (all data): 35,265. Ratio: 0.93.
Table 4. Multiplication of Bad Observations
Figure 16. Clementine Preprocessing Data
E. DATA SETS
In ANNs, it is a common practice to partition the database into three separate sets called the training, test, and validation sets.
1. Training Set
The training set is the largest set and is used by the ANN to learn the patterns that exist in the data. The training set used for this research consists of the observations of five (5) bad and fifty-six (56) good generators, for a total of about 60,000 records.
2. Test Sets
The test sets, varying in size from 10% to 30% of the training set, are used to evaluate the ability of a trained ANN to generalize to a new set of data. The researcher fits the parameters and the topology of the network that achieves the best results on a test set. Using the features of the software, the size of a test set was always taken to be 25% of the training set. Periodically during the process of training, new test sets are selected from the training set. Although each sample was selected randomly from the training set, the same seed (12345) was used in order to duplicate results across different model fits. Figure 17 shows the format chosen for the training and testing sets.
Figure 17. Model Architecture GUI
For the purposes of training, each record is considered as an observation. Thus it is likely that the test set contains records from each of the 61 generators.
3. Validation Set
The validation set is used last, as a final check on the performance of the trained network. The size of the validation set should balance obtaining a sufficient sample size to evaluate a trained network against having enough remaining observations for both training and testing. The validation set consists of one (1) bad and four (4) good generators. The records from these five generators are not used in training the network; they are used only for validation.
F. NETWORK ARCHITECTURE AND EVALUATION CRITERIA
As a baseline, this analysis uses a model architecture consisting of one input layer containing the predictors for the model, one output layer giving the estimate of the probability that the record is bad, and one hidden layer. Clementine provides six training methods for building ANN models, but this research utilized only the following two (Figure 17):
• Prune: This method starts with a large network and removes (prunes) the weakest units in the hidden and input layers as training proceeds. It is usually slow, but it often yields better results than other methods.

• RBFN: The radial basis function network uses a technique similar to k-means clustering to partition the data based on values of the target field.
Clementine incorporates several features to avoid some of the common pitfalls of ANNs, including sensitivity analysis, network accuracy, and feedback graph. With these options selected, a sensitivity analysis will provide information on which input fields are most important in predicting the output field, network accuracy will provide the percentage of records for which the prediction of the model matches the observed value in the data, and the feedback graph will depict the accuracy of the network over time as it learns (Clementine 10.0 Node Reference, 1999). Moreover, a Confidence level ($N-Binary) is provided for each observation after the prediction, which, in this model with a flag (0 or 1) output, is calculated by using the formula:
$$\$N\text{-Binary} = 2\,\lvert\, 0.5 - \text{RawOutput} \,\rvert \qquad (2.19)$$

where RawOutput is the output unit value scaled so that it lies between 0 and 1.
If the output unit value is below 0.5, the observation is predicted as 0 (false); if it is 0.5 or above, the observation is predicted as 1 (true). For example, if the ANN prediction value is 0.72, the prediction is displayed as "true," and the confidence will be 2|0.5 − 0.72| = 0.44. A portion of the output results is presented in Figure 18.
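A sketch of the flag prediction and the $N-Binary confidence of equation (2.19), assuming the raw output has already been scaled into [0, 1]:

```python
def prediction_and_confidence(raw_output):
    """Flag prediction and $N-Binary confidence (Eq. 2.19)."""
    prediction = raw_output >= 0.5        # 1 (true) at or above 0.5
    confidence = 2 * abs(0.5 - raw_output)
    return prediction, confidence

print(prediction_and_confidence(0.72))    # (True, 0.44)
```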
Figure 18. Clementine Prediction Table
IV. RESULTS AND DISCUSSION
A. MODEL WITH ALL PREDICTORS
The first attempt of this experiment was to predict the state of one (1) bad and four (4) good generators. The good generators were chosen randomly and form a constant group for the rest of the experiment. The bad generators were sequentially chosen, one at a time, in order to observe how the model reacts and how this might affect the learning procedure. Initially, all the predictors were used to form the input vector for the training procedure of the network (Table 1).
Appendix B provides the structure of the models in Tables 5, 6, and 7, along with their sensitivity tables. The training sets used for those models consist of the observations of five (5) bad and fifty-six (56) good generators. The size of the test sets was always taken to be 25% of the training set, while the validation set consists of one (1) bad and four (4) good generators.
It is obvious from Table 5 that all the models behave well through the learning process, and their training accuracy, computed on the test set taken from the training set, is very high (above 99%). Moreover, the prediction accuracy of the models for the group of good generators in the validation set is also high (above 86%). On the other hand, it is noticeable that four (4) models could not predict the bad generator in the validation set at all; to be more specific, they predicted that the generator was good. These models are even unable to predict generators 9 and 33, which are certainly bad because they had been replaced due to total failure and not preventively. The most likely explanation is that the patterns for the bad generators are too close to those of the good ones. It is also likely that the patterns of predictors differ among the bad generators. There are a variety of potential causes for these differences. It is plausible that variability among aircraft, among generators, or even in the placement of the IMD-HUMS accelerometers could cause differences in readings from generator to generator. We also expect that bad generators with different failure causes will have different vibration patterns. Projection pursuit, implemented in the statistical software GGobi, was used to gain a visual perspective of the relationships among the variables. GGobi plots two-dimensional projections of multi-dimensional data, and Figure 19 depicts some of the data leading to the assumption of a unique failure type for each "bad" generator.
Figure 19. Ggobi screen for “bad” generators
Attempting to explore and understand the models' behavior further, we tested five (5) new models to predict the condition of the two generators, 31 and 53, which were well predicted by models 3 and 5 of Table 5. Each model includes the whole database, excluding the data of the validation set, which consists of the "bad" generators in question and the "good" generators 4, 26, 42, and 66. For these models, we exclude from the training set the data of the pair referenced in the table.
The prediction accuracy for generator 53 drops from 78.69 % in the single model 5, to a range of 77.05 % - 59.02 % in the paired models. This seems to indicate that all the bad generators contribute to the prediction (Table 6).
In contrast, for generator 31, the accuracy of models 8 and 9 drops to less than 10%, and for model 10 it is almost 0%. This might indicate that generator 31 is more closely related to the generators that form the corresponding pairs (Table 7). Once again, we notice that the previous assumptions of close bad and good generator patterns and the possibility of unique reasons for generator failures are valid. Additionally, we observe that the changes made to the models did not affect the accuracy for the good generators.
In each row below, the validation set consists of the listed bad generator and the good generators 4, 26, 42, and 66; prediction accuracies are in percent.

Model 1 (training accuracy 99.913): Bad 9: 0; Good 4: 90.73, 26: 96.85, 42: 100, 66: 100
Model 2 (training accuracy 99.721): Bad 22: 0; Good 4: 100, 26: 95.40, 42: 100, 66: 100
Model 3 (training accuracy 99.708): Bad 31: .74; Good 4: 100, 26: 95.66, 42: 100, 66: 100
Model 4 (training accuracy 99.821): Bad 33: 0; Good 4: 100, 26: 99.74, 42: 100, 66: 100
Model 5 (training accuracy 99.448): Bad 53: 78.69; Good 4: 100, 26: 99.77, 42: 100, 66: 100
Model 6 (training accuracy 99.376): Bad 56: 0; Good 4: 100, 26: 86.20, 42: 100, 66: 100
Table 5. Models Predicting Single Generator
Each entry gives first → second prediction accuracy (%), where the second prediction comes from the paired model.

Model 7 (training accuracy 99.976): Bad 53: 78.69 → 77.05; Bad 9: 99.31 → 0; Good 4: 100 → 100, 26: 97.77 → 100, 42: 100 → 100, 66: 100 → 100
Model 8 (training accuracy 99.707): Bad 53: 78.69 → 72.13; Bad 22: 100 → 0; Good 4: 100 → 100, 26: 97.77 → 99.47, 42: 100 → 100, 66: 100 → 100
Model 9 (training accuracy 99.478): Bad 53: 78.69 → 59.02; Bad 31: 100 → 0.33; Good 4: 100 → 100, 26: 97.77 → 95.66, 42: 100 → 100, 66: 100 → 100
Model 10 (training accuracy 99.728): Bad 53: 78.69 → 78.69; Bad 33: 99.18 → 0; Good 4: 100 → 100, 26: 97.77 → 99.87, 42: 100 → 100, 66: 100 → 100
Model 11 (training accuracy 99.0): Bad 53: 78.69 → 70.49; Bad 56: 100 → 6.12; Good 4: 100 → 100, 26: 97.77 → 98.69, 42: 100 → 100, 66: 100 → 100
Table 6. Models Predicting Pair of Generators Including 53
Each entry gives first → second prediction accuracy (%), where the second prediction comes from the paired model.

Model 12 (training accuracy 99.830): Bad 31: .74 → 97.68; Bad 9: 99.54 → 0; Good 4: 100 → 100, 26: 95.66 → 96.85, 42: 100 → 100, 66: 100 → 100
Model 13 (training accuracy 99.795): Bad 31: .74 → 7.62; Bad 22: 100 → 0; Good 4: 100 → 100, 26: 95.66 → 96.98, 42: 100 → 100, 66: 100 → 100
Model 14 (training accuracy 99.780): Bad 31: .74 → 1.99; Bad 33: 100 → 0; Good 4: 100 → 100, 26: 95.66 → 99.87, 42: 100 → 100, 66: 100 → 100
Model 15 (training accuracy 99.478): Bad 31: .74 → 0.33; Bad 53: 100 → 59.02; Good 4: 100 → 100, 26: 95.66 → 95.66, 42: 100 → 100, 66: 100 → 100
Model 16 (training accuracy 99.509): Bad 31: .74 → 82.12; Bad 56: 100 → 24.49; Good 4: 100 → 100, 26: 95.66 → 98.29, 42: 100 → 100, 66: 100 → 100
Table 7. Models Predicting Pair of Generators Including 31
B. ARTIFICIAL TRAINING SETS
Leaving out certain bad generators degrades the performance of the network when fitting the ANN. If including patterns of predictors for those bad generators is important for classification, then there may be other patterns of predictors for bad generators that have not yet been observed and are not included in the data set.
In this section, an attempt is made to predict patterns of bad generators not included in the training set. First, assume that a "bad" generator is any pattern which is not like the "good" generators in the data set. With the large number of good generators in the data set, and the fact that the predictors of the good generators seem to clump together in two dimensions, this seems like a reasonable assumption. The ANN then classifies as "bad" any pattern which is not like the good generators of the training set. To accomplish this, artificial records of "bad" generators are constructed and included in the training set. These artificial records are constructed by simulating values of the predictor variables from uniform distributions whose lower and upper limits are, respectively, the minimum and maximum observed values of each predictor. Table 8 gives these values for each predictor.
Uniform random numbers were generated in Excel. With Excel’s limited memory, only 65,000 uniform random numbers could be generated at one time. With seventeen (17) predictors, limiting the number of these artificial records to 65,000 makes for a sparse set of “not good” records.
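As a sketch of this construction (illustrative Python; the predictor ranges below are placeholders, not the Table 8 values):

```python
import random

# Construct artificial "bad" records: each of the 17 predictors is
# drawn uniformly between its observed minimum and maximum. The
# (min, max) limits here are placeholders for illustration only.
predictor_ranges = {
    "Torque": (0.0, 100.0),
    "SO_1": (0.0, 4.0),
    "Residual RMS": (0.0, 10.0),
    # ... one (min, max) pair per predictor in Table 8
}

def artificial_record():
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in predictor_ranges.items()}

artificial_set = [artificial_record() for _ in range(65000)]
print(len(artificial_set), artificial_set[0])
```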
Table 8. Summary Statistics for Each Predictor Variable, Including the Minimum, Maximum, Average, and Standard Deviation
The following table (Table 9) presents the results of forecasting generator 9 using the above idea of training on a uniformly distributed artificial set, combined with the rest of the generators regardless of their condition. We can see that, even with the artificial training set grown to 1.3 million (1.3M) observations, the model still has no ability to predict the real condition of the bad generator in question, while still having excellent prediction accuracy on the group of good generators. The model starts to show encouraging prediction accuracy (11.72%) after the artificial set reaches 2.6M observations, but at this point it becomes less effective on the group of good generators. When the artificial set reaches a size of 2.75M observations, the model has excellent prediction accuracy (100%) on the generator in question, but now the predictions for the good group become very poor. This suggests once again that the patterns of good and bad generators are closely related. We can also conjecture that, because of the large number of predictor variables, we are seeing the so-called "curse of dimensionality": the complexity of the model grows exponentially with the dimension, rapidly outstripping the computational and memory storage capabilities of computers. For these data, training on more than 2.75M observations becomes infeasible.
The same method is applied to generator 33. The results are analogous and summarized in Table 10, while the structures of the networks are provided in Appendix C.
Artificial Set   Model Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy (Good)
65,000           93.676 %         0 %                         Excellent
650,000          98.140 %         0 %                         Excellent
1.30 M           99.101 %         0 %                         Excellent
2.60 M           99.137 %         11.72 %                     Average
2.75 M           99.243 %         100 %                       Poor
Table 9. Artificial Training Set to Predict Gen 9 (all data without generator 9)

Artificial Set   Model Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy (Good)
65,000           91.983 %         0 %                         Excellent
650,000          98.053 %         0 %                         Excellent
1.30 M           98.915 %         0 %                         Excellent
2.60 M           99.209 %         0.41 %                      Excellent
2.75 M           99.537 %         0.40 %                      Excellent
Table 10. Artificial Training Set to Predict Gen 33 (all data without generator 33)
C. STEPWISE PREDICTORS USAGE
A simpler, but sometimes very effective, way of dealing with high-dimensional data is to reduce the number of dimensions. At this phase of the research we pursued a stepwise approach, similar to the forward selection used for regression models. First, separate networks were trained for each input variable, and the network achieving the best training accuracy was preserved. The effect of adding each of the remaining inputs to this model was then evaluated in turn. This procedure is repeated for one, two, three, etc., predictors until the addition of extra predictors no longer yields a significant improvement in model performance. Like any process, this approach has several disadvantages. The biggest are that it requires substantial computation and that it is unable to capture the importance of combinations of predictors that are insignificant on their own.
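The following sketch outlines the forward-stepwise search in illustrative Python; train_and_score is a hypothetical stand-in for fitting an ANN in Clementine and returning its training accuracy, and the toy scorer exists only to make the example runnable:

```python
# Forward-stepwise predictor search: grow the predictor set one
# variable at a time, keeping the addition that most improves the
# score, and stop when no addition helps significantly.
def forward_selection(candidates, train_and_score, min_gain=0.001):
    selected, best_score = [], 0.0
    while candidates:
        scored = [(train_and_score(selected + [c]), c) for c in candidates]
        score, best = max(scored)
        if score - best_score < min_gain:    # no significant improvement
            break
        selected.append(best)
        candidates.remove(best)
        best_score = score
    return selected, best_score

# Toy scorer for demonstration only: favors a particular subset.
useful = {"ResidualRMS", "BallEnergy"}
dummy = lambda preds: sum(0.4 for p in preds if p in useful)
print(forward_selection(["ResidualRMS", "SO1", "BallEnergy"], dummy))
```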
This phase of the research was the most time-consuming, and many single and combined predictors were evaluated, aiming to find the model that best classified all the generators of the validation set. For brevity, the following table (Table 11) presents only one model, representative of those with the best ability to classify correctly.
Predictors: Residual RMS, Side Band Modulation_1, Residual Peak to Peak, Ball Energy, Inner Race Energy

Model to Predict   Training Accuracy   Prediction Accuracy (Bad)   Prediction Accuracy, Good (4, 26, 42, 66)
Gen 9              86.029 %            39.08 %                     > 85 %
Gen 22             84.036 %            26.49 %                     > 83 %
Gen 31             87.345 %            5.36 %                      > 72 %
Gen 33             85.377 %            40.82 %                     > 90 %
Gen 53             87.321 %            36.07 %                     > 73 %
Gen 56             86.386 %            2.04 %                      > 69 %
Table 11. Best Model Generated by Stepwise Selection
This last phase verifies that, no matter which strategy was applied, the patterns of good and bad generators are so closely related that no model of the structure we trained could distinguish the different pre-assigned conditions of those generators. When a model was a good classifier for the bad generator, it failed to predict the good ones, and vice versa; when the model performed well on the good generators, it failed to recognize the bad ones. The model summarized in Table 11 achieved the best overall performance for the database of this research. Even though the overall prediction accuracy appears low for the records of "bad" generators, the pattern of the predicted probability that a record is "bad," p̂, as a function of time is very different for "bad" generators than for "good" generators. As an example, Figures 20 and 21 give plots of p̂ versus time for a "bad" and a "good" generator, respectively. The "good" generator has values of p̂ close to zero, whereas the "bad" generator's p̂ varies considerably and shows an increasing trend in time. Appendix D contains plots of p̂ versus time for the rest of the generators in the validation set and a sensitivity analysis of this model.
Figure 20. p̂ from "bad" Generator 9
Figure 21. p̂ from "good" Generator 66
V. CONCLUSIONS AND RECOMMENDATIONS
The emphasis in this thesis was to develop an ANN that would utilize the data collected by the Goodrich Corporation's IMD-HUMS in order to discover patterns that would predict a potential failure of a UH-60L helicopter generator. Many different ANNs were evaluated for their success rate at this fault diagnosis.
The first models in this research were trained, tested, and evaluated using only the data collected by IMD-HUMS. The whole database was normalized and populated accordingly. One (1) bad and four (4) good generators, which were left out of the training and test sets, were used for validation of those models. The method was applied sequentially to each bad generator, always maintaining the same group of good generators. Two of the six models showed some ability to classify their bad generator, with prediction accuracies of 78.69% and .74%, respectively. The other four models had no ability to classify the bad generators, although they all predicted the group of good generators very well.
The next phase of the experiment was to generate a uniformly distributed artificial data set and use it along with the original database to form a training/testing set for further classification. Artificial sets of up to 2.75M records were created in an effort to capture the bad generator patterns. Because the learning phase was time- and resource-consuming, the researcher was limited to testing two generators, the ones originally replaced for failure (9 and 33). One model started to show some promising results by the time the researcher reached the computer's limits. Obviously, the "curse of dimensionality" comes into play at this point of the research.
During the last portion of this experiment, the researcher tried to deal with the multi-dimensional problem and, at the same time, shape a model with good generalization behavior. A stepwise strategy was followed and many models were trained with various combinations of predictors. Once again, the produced classifiers were not generally successful in generator prediction.
Concluding, the researcher realized that in each phase of this experiment the bad and good generator patterns are very closely related. This affects the learning procedure of the network by blocking its ability to build a model capable of classifying the good and bad generators concurrently. Additionally, it is possible that the reason for failure of each bad generator is unique, so that the size of this database and the structure of those models are not capable of capturing those patterns in a generalized form. On the other hand, this research explores many paths, identifies various issues concerning the classification of the UH-60L helicopter generators, and finally produces models capable of classifying a large portion of the database in question.
ANNs are a category of artificial intelligence technology that mimics the human brain's skill at identifying patterns. In theory, ANNs are capable of approximating any continuous function. Such flexibility makes for a potentially powerful forecasting tool, but the large number of parameters that must be selected makes the design process difficult. In practice, building an ANN forecasting model involves a great deal of trial and error. Consequently, the objective of this thesis was to provide a practical, non-technical introduction to structuring an ANN forecasting model using real operating data from UH-60L helicopters. The success of ANN applications for an individual researcher depends on three key factors. First, the researcher must have the time, patience, and resources to experiment. Second, the ANN software must allow automated routines, such as walk-forward testing, optimization of hidden neurons, and testing of input variable combinations, either through direct programming or the use of batch/script files. Third, the researcher must maintain a good set of records listing all parameters for each network tested.
This research has verified that ANNs have a position in machinery condition monitoring and diagnostics. However, the limited nature of these results indicates that ANNs will not solve all machinery condition monitoring and diagnostics problems by themselves. They certainly will not completely replace conventional rule-based expert systems. Ultimately, it is anticipated that a symbiotic combination of these two technologies will provide the optimal solution to the machinery condition monitoring and diagnostics problem.
Further work can be conducted on building models with more than one hidden layer, which can explain more complex data but requires experience in manipulating the parameters of the utilized software and good a priori knowledge of the database itself. Furthermore, enhancing the current database with new real data, accompanied by proper maintenance records, will benefit and improve any future effort to predict the generators' condition, regardless of the techniques used by the researcher.
APPENDIX A
Tail numbers of the 30 aircraft: 9126351, 92232, 92235, 92238, 92239, 92241, 92243, 92246, 92250, 92253, 92255, 93277, 93285, 93293, 9326500, 9326506, 9326507, 9326509, 9326515, 9326516, 9326518, 9326519, 9326524, 9426530, 9426533, 9426534, 9426537, 9426545, 9426549, 9926829.

GENERATOR   OBSERVATIONS   FINAL CONDITION      GENERATOR   OBSERVATIONS   FINAL CONDITION
1           526            Good                 35          526            Good
2           980            Good                 36          980            Good
3           313            Good                 37          313            Good
4           439            Good                 38          439            Good
5           282            Good                 39          282            Good
6           651            Good                 40          651            Good
7           476            Good                 41          476            Good
8           7              Good                 42          7              Good
9           435            Bad                  43          923            Good
10          488            Good                 44          825            Good
11          825            Good                 45          545            Good
12          545            Good                 46          917            Good
13          917            Good                 47          617            Good
14          617            Good                 48          487            Good
15          487            Good                 49          873            Good
16          873            Good                 50          583            Good
17          588            Good                 51          423            Good
18          423            Good                 52          —              Good
19          552            Good                 53          61             Bad
20          —              Good                 54          587            Good
21          553            Good                 55          553            Good
22          336            Bad                  56          98             Bad
23          12             Good                 57          250            Good
24          621            Good                 58          621            Good
25          457            Good                 59          457            Good
26          761            Good                 60          761            Good
27          1077           Good                 61          1077           Good
28          298            Good                 62          —              Good
29          1102           Good                 63          1102           Good
30          254            Good                 64          254            Good
31          302            Bad                  65          533            Good
32          287            Good                 66          256            Good
33          245            Bad
34          1166           Good
APPENDIX B
Figure 22. Model 1 (Predict Bad 09 and Good 4, 26, 42, 66)
Figure 23. Model 2 (Predict Bad 22 and Good 4, 26, 42, 66)
Figure 24. Model 3 (Predict Bad 31 and Good 4, 26, 42, 66)
Figure 25. Model 4 (Predict Bad 33 and Good 4, 26, 42, 66)
Figure 26. Model 5 (Predict Bad 53 and Good 4, 26, 42, 66)
Figure 27. Model 6 (Predict Bad 56 and Good 4, 26, 42, 66)
Figure 28. Model 7 (Predict Bad 53, 9 and Good 4, 26, 42, 66)
Figure 29. Model 8 (Predict Bad 53, 22 and Good 4, 26, 42, 66)
Figure 30. Model 9 (Predict Bad 53, 31 and Good 4, 26, 42, 66)
Figure 31. Model 10 (Predict Bad 53, 33 and Good 4, 26, 42, 66)
Figure 32. Model 11 (Predict Bad 53, 56 and Good 4, 26, 42, 66)
Figure 33. Model 12 (Predict Bad 31, 9 and Good 4, 26, 42, 66)
Figure 34. Model 13 (Predict Bad 31, 22 and Good 4, 26, 42, 66)
Figure 35. Model 14 (Predict Bad 31, 33 and Good 4, 26, 42, 66)
Figure 36. Model 15 (Predict Bad 31, 53 and Good 4, 26, 42, 66)
Figure 37. Model 16 (Predict Bad 31, 56 and Good 4, 26, 42, 66)
APPENDIX C
Figure 38. Model 17 (Predict Bad 9 Using Artificial Sets)
Figure 39. Model 18 (Predict Bad 33 Using Artificial Sets)
APPENDIX D
Figure 40. Stepwise Model Using 5 Predictors (Predict Bad 9)
Figure 41. Stepwise Model Using 5 Predictors (Predict Bad 22)
Figure 42. Stepwise Model Using 5 Predictors (Predict Bad 31)
Figure 43. Stepwise Model Using 5 Predictors (Predict Bad 33)
Figure 44. Stepwise Model Using 5 Predictors (Predict Bad 53)
Figure 45. Stepwise Model Using 5 Predictors (Predict Bad 56)
LIST OF REFERENCES
Bechhoefer E., Bernhard A., Use of Non-Gaussian Distribution for Analysis of Shaft Components, IEEE, 2005.
Bechhoefer E., Mayhew E., Mechanical Diagnostics Systems Engineering in IMD-HUMS, IEEE, 2005.
Bechhoefer E., Power D., IMD HUMS Rotor Track and Balance Techniques, IEEE, 2002.
Bishop, C., Neural Networks for Pattern Recognition, Clarendon Press, 1995.
Buttrey S., Koyak R., Read R., Whitaker L., Prognostics of Complex Rotating Machinery: Statistical Concepts and Capabilities of the NPS Team, Naval Postgraduate School, 2003.
Clementine 10.0 User's Guide, SPSS, Integral Solutions, 2005.
Davis A., Handbook of Condition Monitoring, Chapman & Hall, 1998.
Duda R., Hart P., Stork D., Pattern Classification, Second Edition, John Wiley & Sons, 2000.
Elyurek M., Establishing a Vibration Threshold Value Which Ensures a Negligible False Alarm Rate for Each Gear in CH-53 Aircraft Using the Operational Data, Master's Thesis, Department of Operations Analysis, Naval Postgraduate School, 2003.
Fausett L., Fundamentals of Neural Networks, Prentice Hall, 1994.
GGobi Data Visualization System, http://www.ggobi.org, 10 August 2006.
Harris C., Harris' Shock and Vibration Handbook, Fifth Edition, McGraw-Hill, 2002.
Kartalopoulos S., Understanding Neural Networks and Fuzzy Logic, IEEE Press, 1996.
Lawrence J., Introduction to Neural Networks, Fifth Edition, California Scientific, 1993.
Mark, W.D., "Gear Noise Origins," AGARD Conference Proceedings, 1998.
Norgaard M., Ravn O., Poulsen N., Hansen L., Neural Networks for Modelling and Control of Dynamic Systems, Springer, 2000.
Rao S., Mechanical Vibrations, Fourth Edition, Pearson Prentice Hall, 2004.
Ripley B., Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
Simpson P., Neural Network Applications, IEEE Technical Activities Board, 1996.
Sobajic D., Neural Network Computing for the Electric Power Industry, Lawrence Erlbaum Associates, 1993.
Software Requirements Specifications, Goodrich Corporation, 1998.
System Users Manual for Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS), U.S. Army UH-60A/L, February 2005.
Technical Manual 1-1520-237-10, Operators Manual for UH-60L Helicopter, Headquarters, Department of the Army, May 2003.
Tsang, A., Condition-Based Maintenance: Tools and Decision Making, Journal of Quality in Maintenance Engineering, 1995.
Willard L., Klesch G., Using Integrated Mechanical Diagnostics Health and Usage Management System (IMD-HUMS) Data to Predict UH-60L Electrical Generator Condition, Master's Thesis, Department of Operations Analysis, Naval Postgraduate School, 2005.
INITIAL DISTRIBUTION LIST
1. Defense Technical Information Center, Ft. Belvoir, Virginia
2. Dudley Knox Library, Naval Postgraduate School, Monterey, California
3. Professor Lyn R. Whitaker, Department of Operations Research, Naval Postgraduate School, Monterey, California
4. Professor Samuel E. Buttrey, Department of Operations Research, Naval Postgraduate School, Monterey, California
5. Maj Tourvalis Evangelos, Volos, Greece