RTPingERData

From MAGGIE

Project Title

Real Time PingER Data

Project Aim

The aim of this project is to provide a near real-time display of PingER data.

Motivation

Currently the PingER data is gathered from the monitoring sites on a daily basis by the archive site. This means that the data at the archive site (e.g. SLAC) is typically a day out of date. There are ways to look at the cached (non-gathered) data at the monitor sites; however, this is not integrated with the main PingER analysis/visualization infrastructure. Access to real-time data at the central archive site will enable quicker detection of anomalous events. It will also enable visualization of the PingER data in close to real time (e.g. within an hour of the measurements being made). This is important for providing information to manned Network Operations Centers (NOCs) such as ESnet's in Berkeley, California, Internet2's at Indiana University, or NTC/PERN's in Pakistan.

Project Description

Currently the PingER archive site (pinger.slac.stanford.edu) uses a Perl script called getdata.pl that is run by a Unix cron job on a daily basis (around 1:00am Pacific Time) to gather (typically using an HTTP GET via Lynx) the previous day's data (as defined at the monitoring host) from the 30+ PingER monitoring sites. getdata.pl is the archive-side counterpart of ping_data.pl at the monitor site. There is also an interactive interface to ping_data.pl (see for example ping_data.pl). If a day is missed then the script is run manually to get the missing day's data. This data is compressed and saved in flat files at the archive site. The filenames are of the form: /nfs/slac/g/net/pinger/pingerdata/hep/data/$mon_node/ping-$year-$mon-$mday.txt.gz, i.e. one file per monitoring node per day. Here $mon_node is the name of the monitoring host (e.g. pinger2.niit.edu.pk), $year is the 4 digit year (e.g. 2006), $mon is the month number (e.g. 3=March) and $mday is the day of the month. If there is already data for the monitor site and day in question then the existing file is replaced.
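The file naming and replace-on-rewrite behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the getdata.pl code (which is Perl); the function names archive_path and save_day are hypothetical, and whether the month and day are zero-padded in the real filenames is an assumption exposed here as a flag.

```python
import gzip
import os
import tempfile
from datetime import date

# Archive root as quoted in the text above.
ARCHIVE_ROOT = "/nfs/slac/g/net/pinger/pingerdata/hep/data"


def archive_path(mon_node: str, day: date, root: str = ARCHIVE_ROOT,
                 zero_pad: bool = False) -> str:
    """Build <root>/<mon_node>/ping-<year>-<mon>-<mday>.txt.gz."""
    # Assumption: the text does not say whether month/day are zero-padded.
    fmt = "{:02d}" if zero_pad else "{:d}"
    mon, mday = fmt.format(day.month), fmt.format(day.day)
    return os.path.join(root, mon_node, f"ping-{day.year}-{mon}-{mday}.txt.gz")


def save_day(text: str, mon_node: str, day: date,
             root: str = ARCHIVE_ROOT) -> str:
    """Compress `text` and replace any existing file for that node and day."""
    path = archive_path(mon_node, day, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over
    # the old file so readers never see a half-written archive file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    os.close(fd)
    with gzip.open(tmp, "wt", encoding="utf-8") as out:
        out.write(text)
    os.replace(tmp, path)
    return path
```

Writing to a temporary file and renaming it is one way to make the replacement safe when the gather runs more often than once a day; the existing script may handle this differently.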

The initial goal is to automate the gathering of data at 30 minute intervals. Currently getdata.pl takes about 6 minutes to run and consumes about 12 seconds of user time and 2.5 seconds of system time (on a 2GHz Pentium 4 running Linux) to gather the data from all monitoring nodes. This elapsed time is mainly due to the network delays in getting the data. Initially the new data for today will simply replace preceding data for the same monitor node and date. A small part of the project will be to investigate whether it makes sense to provide a feature to gather only the new data, and if so to implement it. A typical file transferred (e.g. for $mon_node="pcgiga.cern.ch") contains about 3 MBytes. Transfers take between 1 and 40 seconds (with an average of 10 seconds) depending on the amount of data and the speed of the network between SLAC and the monitoring host. The longest time is for binp.inp.nsk.su, and pcgiga.cern.ch takes 20 seconds. Initially we will probably only gather the data on a 30 minute basis for a few major monitoring hosts.
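As a starting point for the "gather only the new data" investigation, the sketch below merges a freshly fetched copy of today's data into what is already archived, keeping only records not seen before. It is written in Python for illustration, and it assumes (this is not stated in the text) that the data is line-oriented with one measurement record per line and that each record is unique, e.g. because it carries a timestamp. The function name merge_new_lines is hypothetical.

```python
def merge_new_lines(existing: str, fetched: str) -> str:
    """Return `existing` plus any lines of `fetched` not already present.

    Assumption: one record per line, and identical lines are duplicates.
    """
    merged = existing.splitlines()
    seen = set(merged)
    for line in fetched.splitlines():
        # Skip blank lines and records that are already archived.
        if line and line not in seen:
            seen.add(line)
            merged.append(line)
    return "\n".join(merged) + ("\n" if merged else "")
```

This only avoids rewriting records at the archive; saving network time would also need the monitoring site to send just the records after a given time, which is part of what the investigation would have to decide.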

The follow-on goal is to include today's data in the analysis and visualization. Currently the gathered data is analyzed and saved by the analyze-hourly.pl Perl script. This produces a table with a column for each hour of the day and a row for each monitor host/remote host pair. This table may be displayed via the web by means of the Perl script pingtable.pl (see for example PingER Reports). The student will need to study analyze-hourly.pl and pingtable.pl and modify them to enable display of today's data via pingtable.pl.
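The shape of the hourly table (one row per monitor/remote pair, one column per hour) can be illustrated with the Python sketch below. It is not taken from analyze-hourly.pl: the input record layout (monitor, remote, epoch seconds, RTT in ms), the use of mean RTT as the cell value, and bucketing by UTC hour are all assumptions made for the example, and hourly_table is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_table(samples):
    """Map (monitor, remote) -> list of 24 mean RTTs (None where no data).

    `samples` is an iterable of (monitor, remote, epoch_seconds, rtt_ms)
    tuples; this record layout is hypothetical.
    """
    sums = defaultdict(lambda: [0.0] * 24)
    counts = defaultdict(lambda: [0] * 24)
    for monitor, remote, epoch, rtt_ms in samples:
        # Assumption: hours are bucketed in UTC.
        hour = datetime.fromtimestamp(epoch, tz=timezone.utc).hour
        sums[(monitor, remote)][hour] += rtt_ms
        counts[(monitor, remote)][hour] += 1
    table = {}
    for pair, totals in sums.items():
        table[pair] = [
            totals[h] / counts[pair][h] if counts[pair][h] else None
            for h in range(24)
        ]
    return table
```

For today's data, the columns for hours that have not yet been measured simply stay empty (None here), so a display script has to tolerate partially filled rows.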

Alternatively, rather than running it at each site, one uses the script (getdata.pl) to fetch the most recent data from the specified monitoring site. It will be based on ping_data_plot.pl, with modifications to enhance the form to allow selection of the monitoring site, easier selection of the monitoring node (shown by alias name with the country code first, to allow easier searching/grouping), and selection of the data format (to enable just getting the data as well as plotting). After the monitor site and remote site(s) are selected from the form, ping_data_plot.pl (or a new test version) will be called and will use getdata.pl to get the data (initially at least for just one day, depending on the time taken; the default is today). It will tell the user to be patient while the data is being fetched, and warn if the site is not pingable, there is no data, etc. After getting the data, it will plot graphs of RTT and loss (probably overplotted, as in the FNAL graphs, to save screen space) for each selected pair (monitoring host, remote host), and also offer the choice of seeing the frequency distribution. We may add other metric choices besides RTT and loss, e.g. IPDV, minimum RTT, etc.
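The per-pair quantities mentioned above (loss, minimum RTT and a frequency distribution of RTT) could be computed along the following lines. This Python sketch is illustrative only: the input format (one entry per ping sent, None for a lost ping), the 10 ms default bin width, and the name summarize are assumptions, not part of ping_data_plot.pl.

```python
from collections import Counter


def summarize(rtts_ms, bin_ms=10):
    """Summarize one monitor/remote pair.

    `rtts_ms` holds one entry per ping sent: the RTT in ms, or None if the
    ping was lost (hypothetical input format).
    """
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else None
    # Frequency distribution: number of RTTs per bin, keyed by the lower
    # edge of the bin in ms.
    hist = Counter(int(r // bin_ms) * bin_ms for r in received)
    return {
        "loss_pct": loss_pct,
        "min_rtt_ms": min(received) if received else None,
        "rtt_hist": dict(sorted(hist.items())),
    }
```

For example, summarize([12.0, 18.5, None, 31.0]) reports 25% loss, a minimum RTT of 12.0 ms, and two samples in the 10 ms bin plus one in the 30 ms bin.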

Follow-on goals (maybe part of other projects) will be to:

  • provide more real-time visualization of the data;
  • provide anomalous event detection for the PingER data (work is already in progress on anomalous event detection; the work here will be to extend/apply the techniques to the PingER data).

Requirements

  • The student will need to be or become proficient in Unix/Linux, Perl, CGI scripts and JavaScript.
  • The code will need to be production quality. Guidelines on how to write Perl will be provided (see IEPM Perl Coding Style and Coding Style). Help on writing clean code is available here and here.
  • The student will need to apply for and get a Unix account at SLAC and will be provided access to the relevant computers, files and databases (contact Umar Kalim for details).