Astronomy teaches us about the universe we live in. To date, three types of particles are known to travel from astrophysical sources (the stars) to the earth: photons (light), protons, and neutrinos. Our observations of the sky are limited to these messengers. At high energies (>1 TeV), the universe is most transparent to neutrinos. We pursue high energy neutrino astronomy to view the most energetic particle production mechanisms in the universe. The Smoot Group at LBL is an active participant in the Antarctic Muon And Neutrino Detector Array (AMANDA) collaboration.

DATA HANDLING AT AMANDA: A NEW CONCEPT

AMANDA is an experiment under construction. The first phototube was placed in the south pole ice cap almost 10 years ago with a simple data acquisition system (DAQ), which has evolved into the current DAQ digitizing the signals from almost 300 phototubes. In the next 10 years, we hope to see ICE CUBE constructed with another 5000 phototubes. As the scale of the experiment changes, we need to plan a coherent data handling system, in the same way that we plan our detector development. AMANDA software products must become a coherent system of maintainable components, from the DAQ to the analysis graphical interface.

The table below contains the data sizes for the 1997 and 1998 data sets, with projections for the next few years, out to 2005 when we expect the detector to reach the ICE CUBE scale. The raw data will be mined into manageable sets of interesting events, listed as GRB and up-going L2.

data set      1997        1998        1999        2000        2005 (ICE CUBE)
raw           0.5 Tbyte   1.4 Tbyte   2.3 Tbyte   3.5 Tbyte   20 Tbyte
calibration   ?           ?           ?           ?           ?
GRB           10 Gbyte    28 Gbyte    46 Gbyte    71 Gbyte    400 Gbyte
up-going L1   100 Gbyte   280 Gbyte   460 Gbyte   710 Gbyte   4 Tbyte
up-going L2   10 Gbyte    28 Gbyte    46 Gbyte    71 Gbyte    400 Gbyte

In addition to disk space, at almost every step the data must be transferred to a disk cache for processing. High bandwidth links between the mass storage media and the processors, and ultimately high bandwidth links between our collaborators and a central system, are crucial to efficient physics analysis.

Here is a list of the elements which contribute to a data handling system. More detail on each is given below.

- mass storage
- software trigger and data stream selection
- data distribution
- production system
A friendly analysis environment requires peripherals integrated into the above system.


Furthermore, a coherent design requires agreements with the collaborating institutions on cooperative efforts between the DAQ, calibration, and reconstruction groups.

The AMANDA data handling system will be designed according to the scale of the raw data. By the 1997 season, 10 strings of ~50 phototubes had been deployed in the south pole ice cap. Three additional strings with 60 phototubes were deployed for the 1998 season. Six more strings plus one digital string are scheduled for the 1999 season. The digital string will have dual readout, for the analogue signal transported to the ice surface and for the digital signal, and will therefore contribute to the raw data like two of the previous strings. The plan is then to deploy digital strings at a rate of 8 to 16 a year until the detector reaches the full cubic kilometer scale with 80 strings in about 2005. The table above lists the size of the current data for the 1997 and 1998 seasons along with predictions for the years to come, scaled by the number of strings.

Also listed is the amount of data in the subset data streams. The GRB stream is selected from the raw data by taking all triggers within ±15 minutes of a gamma ray burst. Bursts occur about once a day, so given a 30 minute window, this selects about 2% of the raw data. The "up-going L1" stream is selected by requiring a reconstructed track with a zenith angle > 50 degrees. Since most of the data is the result of atmospheric muons traversing the detector from top to bottom, this filter selects about 20% of the data. The rate is adjustable by selecting higher zenith angles. We expect that the zenith angle distribution will give us our first signal of neutrinos from astrophysical sources between 80 and 110 degrees (ref), so leaving our "level 1" selection loose is probably desirable. The "up-going L2" stream is selected with a zenith angle > 80 degrees and a minimum of 3 unscattered photons contributing to the track fit. These cuts are tightened to make a small, well-constructed data sample for distribution.
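As a quick check on the 2% figure, the duty-cycle arithmetic is simple; here it is as a minimal Python sketch, assuming (as above) an average of one burst per day:

    # Fraction of raw data kept by the GRB stream, assuming an average of
    # one gamma ray burst per day and a +-15 minute window around each.
    window_minutes = 30.0
    minutes_per_day = 24.0 * 60.0
    fraction = window_minutes / minutes_per_day
    print(f"GRB stream keeps {fraction:.1%} of the raw data")  # ~2.1%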

Since AMANDA has a long history, a data handling system cannot be retroactively fitted to the old data. It would be impossible, for example, to redesign the DAQ for the 1997 data. Our current approach is to start building a system for the mass storage, software trigger, data stream selection, and data distribution center, and then negotiate cooperative efforts between the DAQ, calibration, and reconstruction groups. To build an integrated system, the AMANDA principal investigators will need to acknowledge the importance of such a system and provide the management structure for such a project. An integrated system is built not from one person's creativity, but from contributors with a unified concept. In my experience, the most successful experiments are those where software managers listen to new concepts and evaluate their impact on the project. Each new concept is then either propagated into the unified concept, or it is the manager's responsibility to critique it and redirect the effort.

Given this situation, it is appropriate to outline the contribution that NERSC computing at LBL can make to the AMANDA/ICE CUBE software system.


NERSC operates several computing facilities at LBL, including the HPSS mass storage system, the Cray T3E, and the PDSF cluster. These systems provide disk space, processing power, ethernet and internet connections, and software products for a wide variety of scientific investigations, including high energy and nuclear physics experiments.

MASS STORAGE

HPSS is a high performance mass storage facility operated by LBL's National Energy Research Scientific Computing Center (NERSC). Its current capacity is 100 Tbytes, and it serves many scientific experiments. The 1997 data currently reside in 8,009 files. The 1998 data currently reside in 1,258 tar files, each containing about 25 compressed raw data files. Because HPSS is designed as a mass storage facility, high bandwidth access is available through HSI and PFTP to some of the LBL computer facilities, and FTP is available for general data distribution. We currently use HSI to transfer data to our processing cluster and FTP to transfer data to the local computer in the physics department. It is generally agreed that the biggest limitation to computing today is not computer cycles or disk space but connectivity; our analyses are in general limited by the time required to transfer files. As new technologies become available, we hope to remain flexible enough to take advantage of them.
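To illustrate how a staging step might be scripted, here is a minimal Python sketch that shells out to the HSI client to pull one file from HPSS into a local disk cache. The paths and file names are hypothetical, and the exact "get local : remote" invocation should be checked against the installed HSI version:

    import subprocess
    from pathlib import Path

    def stage_from_hpss(hpss_path, cache_dir):
        """Copy one file from HPSS to a local disk cache via HSI."""
        local = Path(cache_dir) / Path(hpss_path).name
        # 'get local : remote' is one common HSI form; verify locally.
        subprocess.run(["hsi", f"get {local} : {hpss_path}"], check=True)
        return local

    # Hypothetical usage: stage one 1998 tar file for processing.
    # stage_from_hpss("/amanda/raw/1998/run0042.tar", "/scratch/cache")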

SOFTWARE TRIGGER and DATA STREAM SELECTION

Several software-derived quantities are being investigated as trigger and selection criteria for the AMANDA data.

The data were collected with a hardware trigger consisting of a global coincidence. In 1997 the threshold was set to 16 PMT signals. In 1998 it was accidentally reduced to 12 PMT signals. In 1999 it will be set to 18 PMT signals. For ICE CUBE, a local coincidence trigger is planned.
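A global coincidence of this kind reduces to counting distinct fired channels; here is a minimal sketch, where the event representation (a list of PMT channel numbers) is a hypothetical stand-in:

    def passes_majority_trigger(fired_channels, threshold):
        """Global coincidence: keep the event if at least `threshold`
        distinct PMT channels fired within the trigger window
        (16 in 1997, 12 in 1998, 18 planned for 1999)."""
        return len(set(fired_channels)) >= threshold

    # Hypothetical usage on one event's list of channel numbers:
    # keep = passes_majority_trigger(event_channels, threshold=18)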

The "up-going L1" stream is the result of running a "LEVEL 1 - up-going" trigger at LBL. For this, hits are calibrated, cleaned up and dead, noisy or funky PMTs are removed from the data. The Line fit is found, and events are passed if zenith angle >50 degrees. All events passing the selection criteria are reconstructed. The "up-going L2" stream is the result of selecting events with reconstructed trajectory zenith angle>80 and at least 3 direct hits. These cuts are dialed in to produce a manageable data sample for the average user's disk. Further cuts on the trajectory zenith angle, direct and indirect hits, and track length are left to be selected by the user.

A "GRB" stream is selected by replacing the "LEVEL 1 - up-going" trigger with a "LEVEL 1 - GRB" trigger. Gamma Ray Bursts are detected by a network of satellite born gamma ray observatories. AMANDA events that are within +- 15 minutes of the gps time of the burst are selected. Hits are cleaned up and dead, noisy or funky channels are removed from the data. The line fit is then found. The trajectory reconstruction is performed on all "GRB" candidates. Further cuts are left to be selected by the user.

In reality, we obtain more than two analyses from these two streams. From the up-going stream, we can count the number of up-going muons as a function of zenith angle, and we can measure the energy distribution of muons at various zenith angles. From the GRB stream, we can count the number of muons coincident with each GRB. We can measure the time delay between the photon and neutrino signals, possibly placing limits on the mass of the muon neutrino or discovering how neutrinos are produced in relation to photons. We can make a sky map of the directions of the neutrino signals and compare it to the GRB sky map. We can measure the energy of the neutrinos which accompany GRBs. Finally, we can use the GRB sample to measure the zenith angle distribution of atmospheric muons, extending the "up-going" zenith angle coverage into the "down-going" region. Measuring this distribution for high energy tracks may give us our first astrophysical neutrino signal.

Applying the LEVEL 1 trigger and reconstructing the data take a significant amount of CPU power. The LEVEL 1 trigger takes 10 msec/event on a 400 MHz processor; the reconstruction takes 200 msec/event. The raw 1997 data contain 1.4 billion events. Processing all events through the LEVEL 1 trigger takes 16 days on ten 400 MHz processors, and reconstructing the data passing the LEVEL 1 - up-going trigger takes another 65 days on the same ten processors.
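The arithmetic behind those estimates, assuming the ~20% LEVEL 1 pass rate quoted above:

    events = 1.4e9                    # raw 1997 events
    cpus = 10                         # ten 400 MHz processors
    day = 86400.0                     # seconds per day
    l1_days = events * 0.010 / cpus / day           # 10 msec/event
    reco_days = events * 0.20 * 0.200 / cpus / day  # 200 msec/event on 20%
    print(f"LEVEL 1: {l1_days:.0f} days")           # ~16 days
    print(f"reconstruction: {reco_days:.0f} days")  # ~65 days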

We have several choices of parallel processing farms at LBL. Last spring the Cray T3E was used to filter the data with the Swedish filter. The disadvantage of the Cray is that our code, including CERNLIB, is not supported on this platform, so a full port was necessary. In addition, the operating system doesn't support pipes, which are a fundamental design feature of the AMANDA code, so significant modifications had to be made. Another option is the PDSF cluster. This cluster is designed to meet the needs of ATLAS and STAR; since AMANDA is small by comparison but will be growing over the next few years, this is an excellent option. If we become a big user of this system, we will need to allocate monetary resources to the facility. Our final option is the PC Cluster Project being studied by the Future Technologies group. This cluster contains 32 400 MHz machines with 2 dual-processor front end machines. If we are willing to use the tools they are developing as research projects, we will be given 25% of their processing power.

DATA DISTRIBUTION

After making the data streams, distributing them to the collaboration as easily as possible is a high priority. For the near future it will be possible to put small samples of a few tens of Gbytes on disk and give public access to them through anonymous FTP. In the long term, however, even small samples are too big for local disks. Staging the streams from HPSS to disk can be accomplished through a combination of Web tools and FTP "push" processes. At this point, we have to decide whether it is useful to add user-defined filters (such as muff options) to the Web tools to produce more specific, but much smaller, samples to serve out. Most large collaborations have adopted a framework which allows users to define modules for this purpose. The PDSF cluster has a well designed data distribution system based on this approach. An interesting possibility here at LBL exists in the Clipper Project, a research project aimed at improving the quality of service for ESnet. As a "test case," we could be among the first users of new high bandwidth links between LBL and Argonne National Laboratory. This project is also planning to study high bandwidth links to the University of Wisconsin and the California university system. With these being places of interest to the AMANDA collaboration (in addition to other normal-bandwidth areas), we have a good match for a project to test their resources.
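An FTP "push" of a finished stream file could be as simple as the following sketch, using Python's standard ftplib; the host and paths are hypothetical:

    from ftplib import FTP

    def push_stream_file(local_path, host, remote_dir):
        """Push one data-stream file to a collaborating site's
        anonymous FTP area."""
        with FTP(host) as ftp:       # hypothetical server name
            ftp.login()              # anonymous login
            ftp.cwd(remote_dir)
            with open(local_path, "rb") as f:
                name = local_path.rsplit("/", 1)[-1]
                ftp.storbinary(f"STOR {name}", f)

    # push_stream_file("/data/streams/upgoing_L2_1998.tar",
    #                  "ftp.example.edu", "/pub/amanda")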

PRODUCTION SYSTEM

This outlines the resources available in the LBL group for use by the AMANDA collaboration. Various combinations of these will be used to build a working system of data filtering, reconstruction, storage, and distribution. We plan to spend the effort on each aspect needed to produce both a working solution now and a scalable product.

This page produced by Jodi Lamoureux
Feb 16, 1998

Send comments to:
JILamoureux@lbl.gov
