Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems tesi di laurea magistrale Mining Invariant Relationships for Failure Analysis of Batch Software Systems Anno Accademico 2012/2013 relatori Ch.mo Prof. Stefano Russo Ch.mo Prof. Marcello Cinque correlatori Ch.mo Ing. Flavio Frattini Ch.mo Dr. Santonu Sarkar, Infosys Ltd., Bangalore, INDIA candidato Agostino Savignano Matr. M63000176 Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica This thesis is in part the result of a research work done during the InStep internship experience with Infosys Ltd Mining Invariant Relationships for Failure Analysis of Batch Software Systems multinational provider of business consulting, information technology, software engineering and outsourcing services. 160.000 employees, 8 billion revenues. Next Gen Computing Labs, within the Infosys campus of Bangalore, India. June-September 2013 Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Context A batch software system is a system where jobs run without end user interaction, or can be scheduled to run as resources permit. This class of systems shows a very repetitive behaviour. They are often used: to process large amount of transaction data for scientific computation as there may be failures in the execution of jobs, malfunctions have to be detected and actions have to be taken accordingly; manual detection is inefficient. A feasible approach to automatically and dynamically characterise the behaviour of a system, in order to detect execution problems, is represented by flow intensity invariants. Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems A simple example of flow intensity invariant A platform for the batch processing of files: each file goes through a stage of elaboration. The processing time is proportional to the size Measuring the file size and the time spent in a stage, I(x) and I(y), (the flow intensities), the equation I ( y ) k I ( x) invariant relationship characterising the expected behaviour of the batch system. If there is an execution problem (e.g., file processing hangs) the equation does not hold any more. Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Contribution An approach for the automatic discovery of a set of flow intensity invariant relationships in batch software systems, useful to characterise the expected behaviour of the system. A detection mechanism that helps the operations personnel in identifying deviations of the actual system behaviour from the correct expected one is also presented. The application of a tool implementing the approach, MinerVa, in two real world batch systems: A SaaS application by Infosys * SCoPE, the Supercomputer of the Federico II University * S. Sarkar, R. Ganesan, M. Cinque, F. Frattini, S. Russo, and A. Savignano, “Mining Invariants from SaaS Application Logs”, accepted for publication in the proceedings of the European Dependable Computing Conference, May 2014, Newcastle upon Tyne, UK Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Approach Source of knowledge of the system behaviour: event logs Objective automatically mine a set of invariant relationships between time series measurements from system logs (learning phase) use such relationships for detecting deviations of the system behaviour (detection phase) Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Approach Data Flow Learning phase Off-line Past observations RESULTS: a set of linear relationships, representing a model of the system Detection phase Either on-line or off-line Uses the model to predict the behaviour of a system on new observations Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Sampling: from Event Logs to Time Series PROBLEM How to retrieve time series from event logs SOLUTION With a certain time interval: retrieve the events in the application logs count the events, according to the different flow intensities the “sampling time” is determined using a tupling algorithm, applied on particular events, e.g., job arrivals or completions. Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Mining: from Time Series to Invariants PROBLEM How to discover relationships between time series SOLUTION For each possible pair of flow intensities: Perform a linear regression using: AutoRegressive models with eXogenous inputs (ARX) Recursive Least Squares (RLS) algorithm Test the goodness of the regression: R2 coefficient of determination All the pairwise relationships which have a R2 value greater than a threshold are considered as linear : their regressed coefficient parameters are considered invariants Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Detection: from Invariants to Alerts PROBLEM How to detect a deviation from the expected behaviour SOLUTION Using the regressed coefficients to predict the expected behaviour For each regressed invariant relationship: compute the expected output compare the expected output with the actual output If the difference between the expected and the actual output is greater than an adaptive threshold, raise an alert. prediction interval based threshold Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems SaaS CPG application by Infosys A platform provided as a service by means of virtualized machines that: Receives transaction details data daily from several subscribers all around the world. Stores the data and performs analysis on the stored data. The work items consisting of several records are provided as a file in a specific format and with different frequencies by the various customers. Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems SaaS CPG application by Infosys: Mining 9 months of observations Tupling on job completions Sampling time: 10 min Identified flow intensities: 33 108.000 workitems Linear regressions performed: 528 Discovered invariant relationships: 11 Time spent in post-processing validation - Time spent in archiving (i1) Time spent in transaction validation - # rows completed (i4) # files arrived - # files completed (i7) # files started - # files completed (i8) # files arrived - # files started (i9) # rows started - # rows completed(i10) ... Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems SaaS CPG application: Broken Invariants Month M1 M2 M4 Samples 4320 4464 4598 Threshold … i1 i7 i8 … i9 10% 4035 … 3865 3754 3182 … p.i. 5 … 14 13 4 … 10% 4204 … 3988 3903 3050 … p.i. 2 … 17 18 6 … 10% 4306 … 4208 4106 3422 … p.i. 0 … 8 12 4 ... Detected broken invariants with p.i. threshold correspond or include the SLA violations discussed with the operations personnel. Maintenance staff confirmed the possibility of violations for invariants i1-i11. Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems SCoPE: Mining 8 months of observations Tupling on job arrivals Sampling time: 60 s Regression performed on a subset of node processing the same kind of jobs Identified flow intensities: 15 1.104.139 jobs Linear regressions performed: 105 Discovered invariant relationships: 69 # jobs active - # jobs arrived (i5) WORKLOAD_VMEMORY - # jobs started (i13) # jobs active - # jobs started (i16) # jobs started - # jobs arrived (i19) virtual memory - # of cores (i40) CPU% - # jobs active (i48) memory - # jobs started (i56) memory - CPU% (i66) Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems SCoPE: Broken Invariants Month Samples i5 i13 i16 i19 i40 i48 i66 M1 44640 13 0 0 431 41443 3328 6442 M2 43200 3 0 0 555 42097 4026 6743 M3 44640 2 0 1 371 44172 2958 6800 i5, i13, i16 are almost never broken i21 is broken for more than the 50% of samples and i40 is broken for almost each samples i19, i36, i48, i56, i59, i63, i66 are broken for less than the 2% of samples. Scuola Politecnica e delle Scienze di Base Corso di Laurea Magistrale in Ingegneria Informatica Mining Invariant Relationships for Failure Analysis of Batch Software Systems Conclusions and Future Work It is possible to automatically discover invariants in batch systems They are useful to the operation personnel to reduce the quantity of data to be analysed for understanding the system behaviour in case of execution problems In some cases, to detect the execution problem itself (SLA violations) Future avenues of research: Use logs from different layers Improve the detection mechanism: Different adaptive threshold Improve the mining process: Perform a continuous invariant mining: forgetting factor RLS algorithm. Invariant identification with Prediction Error Method instead of the Least Squared Method Model relationships between multiple time series instead of only pairwise relationships Special thanks to Infosys and Dr. Sarkar, project mentor of the internship experience
© Copyright 2024 ExpyDoc