ADD
From MAGGIE
Network Anomaly Detection and Diagnosis - Noman Latif
Problem Statement
The performance of a network depends on its bandwidth/throughput, availability, delay, packet loss, and jitter, which can alter the order in which data packets arrive. Consequently, the performance of computer networks is regularly measured using tools such as pathChirp, pathload, Thrulay (TCP), ABWE and traceroute. However, such tools only measure network performance metrics; they do not identify changes in their trends. This project aims to detect and report significant changes in the trends of observed network performance metrics by studying, designing and later implementing relevant algorithms. We intend to provide a platform that automatically detects an anomaly in the network and reports the event to the relevant entities for further investigation. We term this activity Event Detection. We define Event Diagnosis as a technique that investigates the cause of a performance change. If time permits, we shall extend the scope of the project to include event diagnosis.
Description
Quantifying network performance and evaluating its behaviour is a non-trivial task, particularly identifying network anomalies and diagnosing their causes. Over the years, tools such as PingER, IEPM and perfSONAR have been collecting statistics reflecting network metrics; however, these data archives are of no use unless useful inferences can be made from them.
The proposed project intends to implement a collection of algorithms that parse data archives of end-to-end network performance statistics and determine whether anomalies exist in the patterns. This shall encompass the study of time series data and trend analysis. Identifying these anomalies and the patterns they form will help us detect and diagnose anomalies at runtime.
Later, the project shall extend the implementation to include diagnosis of the identified events. This shall include the study and implementation of algorithms such as Principal Component Analysis (PCA), enabling us to determine what caused the change in the pattern of network performance: a route change, congestion that increased latency, or the failure of a link or a node.
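The project description does not say how PCA would be applied. One common approach, assumed here purely for illustration, is to learn the dominant pattern from historical measurements and then see which metric deviates most from that pattern in a new observation. The sketch below uses only the standard library; all function names are hypothetical, and power iteration stands in for a full eigendecomposition.

```python
from math import sqrt


def center(rows):
    """Return column means and the mean-centered copy of the rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return means, centered


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=100):
    """Leading principal component via power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    dims = len(cov)
    v = [1.0 / sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all: keep the current vector
            break
        v = [x / norm for x in w]
    return v


def residual(observation, means, component):
    """Part of an observation not explained by the principal component."""
    x = [o - m for o, m in zip(observation, means)]
    score = sum(a * b for a, b in zip(x, component))
    return [a - score * b for a, b in zip(x, component)]
```

Under this assumption, the index of the largest absolute residual points to the metric (or link) that departs most from the historical pattern, which is a starting point for diagnosis rather than a diagnosis in itself.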
Performance measurement infrastructures developed by SLAC and NIIT are already in place and gather statistics from various monitoring nodes using tools such as pathChirp, pathload, Thrulay (TCP), ABWE, traceroute and iperf. The project will use the data collected by these services to design and implement algorithms that solve the problem of anomaly detection and diagnosis.
Analysis
Time-based sets of data would be gathered at various nodes, i.e. historical data and incoming data. Parallel instances of the implementation would be deployed on hundreds of links to observe the parameters. The data would be analyzed using a time series algorithm to detect changes. The incoming data would be compared with the historical data; a change in trend, or the crossing of a threshold value, will trigger a detected event.
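The comparison of incoming data against historical data can be sketched as follows. This is a minimal illustration, not the project's algorithm: the class name, the window sizes and the "mean plus or minus k standard deviations" threshold are all assumptions. A sample is flagged when it falls outside the threshold, and an event is reported only once several consecutive samples have been flagged, so that a single outlier does not raise an alarm.

```python
from collections import deque
from statistics import mean, pstdev


class ThresholdDetector:
    """Hypothetical detector: history window plus a buffer of flagged samples."""

    def __init__(self, history_size=20, trigger_size=3, sensitivity=2.0):
        self.history = deque(maxlen=history_size)  # recent "normal" samples
        self.trigger = []                          # consecutive flagged samples
        self.trigger_size = trigger_size
        self.sensitivity = sensitivity             # k in "k standard deviations"

    def update(self, value):
        """Feed one sample; return True when an event is detected."""
        # Warm-up: no decisions until the history window is full.
        if len(self.history) < self.history.maxlen:
            self.history.append(value)
            return False

        mu = mean(self.history)
        sigma = pstdev(self.history)
        if abs(value - mu) > self.sensitivity * sigma:
            self.trigger.append(value)
            if len(self.trigger) >= self.trigger_size:
                # Sustained change: restart the history from the new level.
                self.history.clear()
                self.history.extend(self.trigger)
                self.trigger.clear()
                return True
            return False

        # Normal sample: forget any isolated outliers and extend the history.
        self.trigger.clear()
        self.history.append(value)
        return False
```

One instance of such a detector would be kept per link and per metric, and `update` would be called for each new measurement as it arrives.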
This process will first gather the data and extract the required parameter, perform any changes if needed, and then send the data to the detection algorithm. As soon as an event is detected, a notification would be sent to the administrator, or the application would write a log entry. Once detection is done, the next step would be to develop an event diagnosis algorithm to diagnose where and why the problem has occurred.
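The flow described above (gather, extract, optionally transform, detect, then notify or log) might be wired together roughly like this. The function `run_pipeline` and its parameters are hypothetical names chosen for the sketch; the extraction, transformation, detection and notification steps are passed in as plain callables so that any detection algorithm can be plugged in.

```python
import logging

logger = logging.getLogger("event_detection")


def run_pipeline(records, extract, detect, transform=None, notify=None):
    """Run each record through extract -> transform -> detect.

    records:   iterable of raw measurement records
    extract:   callable returning the parameter of interest from a record
    detect:    callable returning True when the value signals an event
    transform: optional callable applied to the extracted value
    notify:    optional callable(record, value) used to alert an administrator
    """
    events = []
    for record in records:
        value = extract(record)
        if transform is not None:
            value = transform(value)
        if detect(value):
            events.append((record, value))
            # Always log the event; notify only if a callback was supplied.
            logger.warning("event detected: %r", record)
            if notify is not None:
                notify(record, value)
    return events
```

The list of events that is returned could later serve as the input to an event diagnosis step.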
System architecture
Event Detection
From this figure, we can see that there are three modules:
1. Preprocessing
2. Detection Engine
3. Visualization
All of the modules have sub-modules. The data we receive contains a lot of extra information, so before an event can be detected it has to go through pre-processing. The main steps performed during this phase are:
1) Data Formatting
2) Data Extraction
3) Regularization (if required)
The detection module consists of all the steps specified in the algorithm. The visualization module will highlight the results so that they can be compared.
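The three pre-processing steps listed above can be sketched as follows. The input format is an assumption made for the example (one measurement per line, an integer timestamp followed by `key=value` fields), and the function names are hypothetical. Regularization is shown as averaging samples into fixed-width time bins, which is one of several possible ways to turn irregularly spaced measurements into a regular series.

```python
from statistics import mean


def format_line(line):
    """Data formatting: parse 'ts key=value key=value ...' into a dict."""
    ts, *fields = line.split()
    record = {"ts": int(ts)}
    for field in fields:
        key, _, val = field.partition("=")
        record[key] = val
    return record


def extract(record, metric):
    """Data extraction: keep only the timestamp and the metric of interest."""
    return record["ts"], float(record[metric])


def regularize(samples, interval):
    """Regularization: average samples into fixed-width bins.

    Bins with no samples hold None so that gaps stay visible.
    """
    if not samples:
        return []
    samples = sorted(samples)
    start = samples[0][0]
    n_bins = (samples[-1][0] - start) // interval + 1
    bins = [[] for _ in range(n_bins)]
    for ts, value in samples:
        bins[(ts - start) // interval].append(value)
    return [
        (start + i * interval, mean(b) if b else None)
        for i, b in enumerate(bins)
    ]
```

For example, three lines with timestamps 0, 30 and 130 and an interval of 60 produce three bins: the first averages the two early samples, the second is empty, and the third holds the last sample.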
Progress
1. Study of plateau Algorithm (Completed)
2. Implementation of plateau Algorithm (Completed)
3. Analysis of algorithm results and manual results using iperf data (Current)

