Astronomy teaches us about the universe we live in. To date, three types of particles are known to travel from astrophysical sources (the stars) to the earth: photons (light), protons, and neutrinos. Our observations of the sky are limited to these messengers. At high energies (>1 TeV), the universe is most transparent to neutrinos. We pursue high energy neutrino astronomy to view the most energetic particle production mechanisms in the universe. The Smoot Group at LBL is an active participant in the Antarctic Muon And Neutrino Detector Array (AMANDA) collaboration.

DATA HANDLING AT AMANDA: A NEW CONCEPT

AMANDA is an experiment under construction. The first phototube was placed in the south pole ice cap almost 10 years ago with a simple data acquisition system (DAQ), which has evolved into the current DAQ digitizing the signals from almost 300 phototubes. In the next 10 years, we hope to see ICE CUBE constructed with another 5000 phototubes. As the scale of the experiment changes, we need to plan a coherent data handling system, in the same way that we plan our detector development. AMANDA software products must become a coherent system of maintainable components, from the DAQ to the analysis graphical interface.

The table below contains the data sizes for the 1997 and 1998 data sets, with projections for the next few years, out to 2005 when we expect the detector to reach the ICE CUBE scale. The raw data will be mined into manageable sets of interesting events, listed as GRB and up-going L2.

data set      1997        1998        1999        2000        2005 (ICE CUBE)
raw           0.5 Tbyte   1.4 Tbyte   2.3 Tbyte   3.5 Tbyte   20 Tbyte
calibration   ?           ?           ?           ?           ?
GRB           10 Gbyte    28 Gbyte    46 Gbyte    71 Gbyte    400 Gbyte
up-going L1   100 Gbyte   280 Gbyte   460 Gbyte   710 Gbyte   4 Tbyte
up-going L2   10 Gbyte    28 Gbyte    46 Gbyte    71 Gbyte    400 Gbyte

In addition to disk space, at almost every step the data must be transferred to a disk cache for processing. High bandwidth links between the mass storage media and the processors, and ultimately high bandwidth links between our collaborators and a central system, are crucial to efficient physics analysis.

Here is a list of the elements which contribute to a data handling system. More detail on each is given below.

- mass storage
- software trigger and data stream selection
- data distribution
- production system
A friendly analysis environment requires peripherals integrated into the above system.


Furthermore, a coherent design requires agreements with the collaborating institutions on cooperative efforts between the DAQ, calibration, and reconstruction groups.

The AMANDA data handling system will be designed according to the scale of the raw data. By the 1997 season, 10 strings of ~50 phototubes had been deployed in the south pole ice cap. Three additional strings with 60 phototubes were deployed for the 1998 season. Six more strings plus one digital string are scheduled for the 1999 season. The digital string will have dual readout, for the analogue signal transported to the ice surface and for the digital signal, and will therefore contribute to the raw data like two of the previous strings. The plan is then to deploy digital strings at a rate of 8 to 16 a year until the detector reaches the full cubic kilometer scale with 80 strings in about 2005. The table above lists the size of the current data for the 1997 and 1998 seasons along with predictions for the years to come, scaled by the number of strings.

Also listed is the amount of data in the subset data streams. The GRB stream is selected from the raw data by taking all triggers within ±15 minutes of a gamma ray burst. Bursts occur about once a day, so given a 30 minute window, this selects about 2% of the raw data. The "up-going L1" stream is selected by requiring a reconstructed track with a zenith angle > 50 degrees. Since most of the data is the result of atmospheric muons traversing the detector from top to bottom, this filter selects about 20% of the data. The rate is adjustable by selecting higher zenith angles. We expect that the zenith angle distribution will give us our first signal of neutrinos from astrophysical sources between 80 and 110 degrees (ref), so leaving our "level 1" selection loose is probably desirable. The "up-going L2" stream is selected with a zenith angle > 80 degrees and a minimum of 3 unscattered photons contributing to the track fit. These cuts are tightened to make a small, well-constructed data sample for distribution.
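As a quick check on the 2% figure, the duty-cycle arithmetic is simple; here it is as a minimal Python sketch, assuming (as above) an average of one burst per day:

    # Fraction of raw data kept by the GRB stream, assuming an average of
    # one gamma ray burst per day and a +-15 minute window around each.
    window_minutes = 30.0
    minutes_per_day = 24.0 * 60.0
    fraction = window_minutes / minutes_per_day
    print(f"GRB stream keeps {fraction:.1%} of the raw data")  # ~2.1%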

Since AMANDA has a long history, a data handling system cannot be retroactively fitted to the old data. It would be impossible, for example, to redesign the DAQ for the 1997 data. Our current approach is to start building a system for the mass storage, software trigger, data stream selection, and data distribution center, and then negotiate cooperative efforts between the DAQ, calibration, and reconstruction groups. To build an integrated system, the AMANDA principal investigators will need to acknowledge the importance of such a system and provide the management structure for such a project. An integrated system is built not from one person's creativity, but from contributors with a unified concept. In my experience, the most successful experiments are those where software managers listen to new concepts and evaluate their impact on the project. Each new concept is then either propagated into the unified concept, or it is the manager's responsibility to critique it and redirect the effort.

Given this situation, it is appropriate to outline the contribution that NERSC computing at LBL can make to the AMANDA/ICE CUBE software system.


NERSC operates several computing facilities at LBL, including the HPSS mass storage system, the Cray T3E, and the PDSF cluster. These systems provide disk space, processing power, ethernet and internet connections, and software products for a wide variety of scientific investigations, including high energy and nuclear physics experiments.

MASS STORAGE

HPSS is a high performance mass storage facility operated by LBL's National Energy Research Scientific Computing Center (NERSC). Its current capacity is 100 Tbytes, and it serves many scientific experiments. The 1997 data currently reside in 8,009 files. The 1998 data currently reside in 1,258 tar files, each containing about 25 compressed raw data files. Because HPSS is designed as a mass storage facility, high bandwidth access is available through HSI and PFTP to some of the LBL computer facilities, and FTP is available for general data distribution. We currently use HSI to transfer data to our processing cluster and FTP to transfer data to the local computer in the physics department. It is generally agreed that the biggest limitation to computing today is not computer cycles or disk space but connectivity; our analyses are in general limited by the time required to transfer files. As new technologies become available, we hope to remain flexible enough to take advantage of them.
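To illustrate how a staging step might be scripted, here is a minimal Python sketch that shells out to the HSI client to pull one file from HPSS into a local disk cache. The paths and file names are hypothetical, and the exact "get local : remote" invocation should be checked against the installed HSI version:

    import subprocess
    from pathlib import Path

    def stage_from_hpss(hpss_path, cache_dir):
        """Copy one file from HPSS to a local disk cache via HSI."""
        local = Path(cache_dir) / Path(hpss_path).name
        # 'get local : remote' is one common HSI form; verify locally.
        subprocess.run(["hsi", f"get {local} : {hpss_path}"], check=True)
        return local

    # Hypothetical usage: stage one 1998 tar file for processing.
    # stage_from_hpss("/amanda/raw/1998/run0042.tar", "/scratch/cache")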

SOFTWARE TRIGGER and DATA STREAM SELECTION

Several software-derived quantities are being investigated as trigger and selection criteria for the AMANDA data.

The data were collected with a hardware trigger consisting of a global coincidence. In 1997 the threshold was set to 16 PMT signals. In 1998 it was accidentally reduced to 12 PMT signals. In 1999 it will be set to 18 PMT signals. For ICE CUBE, a local coincidence trigger is planned.
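A global coincidence of this kind reduces to counting distinct fired channels; here is a minimal sketch, where the event representation (a list of PMT channel numbers) is a hypothetical stand-in:

    def passes_majority_trigger(fired_channels, threshold):
        """Global coincidence: keep the event if at least `threshold`
        distinct PMT channels fired within the trigger window
        (16 in 1997, 12 in 1998, 18 planned for 1999)."""
        return len(set(fired_channels)) >= threshold

    # Hypothetical usage on one event's list of channel numbers:
    # keep = passes_majority_trigger(event_channels, threshold=18)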

The "up-going L1" stream is the result of running a "LEVEL 1 - up-going" trigger at LBL. For this, hits are calibrated, cleaned up and dead, noisy or funky PMTs are removed from the data. The Line fit is found, and events are passed if zenith angle >50 degrees. All events passing the selection criteria are reconstructed. The "up-going L2" stream is the result of selecting events with reconstructed trajectory zenith angle>80 and at least 3 direct hits. These cuts are dialed in to produce a manageable data sample for the average user's disk. Further cuts on the trajectory zenith angle, direct and indirect hits, and track length are left to be selected by the user.

A "GRB" stream is selected by replacing the "LEVEL 1 - up-going" trigger with a "LEVEL 1 - GRB" trigger. Gamma Ray Bursts are detected by a network of satellite born gamma ray observatories. AMANDA events that are within +- 15 minutes of the gps time of the burst are selected. Hits are cleaned up and dead, noisy or funky channels are removed from the data. The line fit is then found. The trajectory reconstruction is performed on all "GRB" candidates. Further cuts are left to be selected by the user.

In reality, we obtain more than two analyses from these two streams. From the up-going stream, we can count the number of up-going muons as a function of zenith angle, and we can measure the energy distribution of muons at various zenith angles. From the GRB stream, we can count the number of muons coincident with each GRB. We can measure the time delay between the photon and neutrino signals, possibly placing limits on the mass of the muon neutrino or discovering how neutrinos are produced in relation to photons. We can make a sky map of the directions of the neutrino signals and compare it to the GRB sky map. We can measure the energy of the neutrinos which accompany GRBs. Finally, we can use the GRB sample to measure the zenith angle distribution of atmospheric muons, extending the "up-going" zenith angle coverage into the "down-going" region. Measuring this distribution for high energy tracks may give us our first astrophysical neutrino signal.

Applying the LEVEL 1 trigger and reconstructing the data take a significant amount of CPU power. The LEVEL 1 trigger takes 10 msec/event on a 400 MHz processor; the reconstruction takes 200 msec/event. The raw 1997 data contain 1.4 billion events. Processing all events through the LEVEL 1 trigger takes 16 days on ten 400 MHz processors, and reconstructing the data passing the LEVEL 1 - up-going trigger takes another 65 days on the same ten processors.
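The arithmetic behind those estimates, assuming the ~20% LEVEL 1 pass rate quoted above:

    events = 1.4e9                    # raw 1997 events
    cpus = 10                         # ten 400 MHz processors
    day = 86400.0                     # seconds per day
    l1_days = events * 0.010 / cpus / day           # 10 msec/event
    reco_days = events * 0.20 * 0.200 / cpus / day  # 200 msec/event on 20%
    print(f"LEVEL 1: {l1_days:.0f} days")           # ~16 days
    print(f"reconstruction: {reco_days:.0f} days")  # ~65 days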

We have several choices of parallel processing farms at LBL. Last spring the Cray T3E was used to filter the data with the Swedish filter. The disadvantage of the Cray is that our code, including CERNLIB, is not supported on this platform, so a full port was necessary. In addition, the operating system doesn't support pipes, which are a fundamental design feature of the AMANDA code, so significant modifications had to be made. Another option is the PDSF cluster. This cluster is designed to meet the needs of ATLAS and STAR; since AMANDA is small by comparison but will be growing over the next few years, this is an excellent option. If we become a big user of this system, we will need to allocate monetary resources to the facility. Our final option is the PC Cluster Project being studied by the Future Technologies group. This cluster contains 32 400 MHz machines with 2 dual-processor front end machines. If we are willing to use the tools they are developing as research projects, we will be given 25% of their processing power.

DATA DISTRIBUTION

After making the data streams, distributing them to the collaboration as easily as possible is a high priority. For the near future it will be possible to put small samples of a few tens of Gbytes on disk and give public access to them through anonymous FTP. In the long term, however, even small samples are too big for local disks. Staging the streams from HPSS to disk can be accomplished through a combination of Web tools and FTP "push" processes. At this point, we have to decide whether it is useful to add user-defined filters (such as muff options) to the Web tools to produce more specific, but much smaller, samples to serve out. Most large collaborations have adopted a framework which allows users to define modules for this purpose. The PDSF cluster has a well designed data distribution system based on this approach. An interesting possibility here at LBL exists in the Clipper Project, a research project aimed at improving the quality of service for ESnet. As a "test case," we could be among the first users of new high bandwidth links between LBL and Argonne National Laboratory. This project is also planning to study high bandwidth links to the University of Wisconsin and the California university system. With these being places of interest to the AMANDA collaboration (in addition to other normal-bandwidth areas), we have a good match for a project to test their resources.
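An FTP "push" of a finished stream file could be as simple as the following sketch, using Python's standard ftplib; the host and paths are hypothetical:

    from ftplib import FTP

    def push_stream_file(local_path, host, remote_dir):
        """Push one data-stream file to a collaborating site's
        anonymous FTP area."""
        with FTP(host) as ftp:       # hypothetical server name
            ftp.login()              # anonymous login
            ftp.cwd(remote_dir)
            with open(local_path, "rb") as f:
                name = local_path.rsplit("/", 1)[-1]
                ftp.storbinary(f"STOR {name}", f)

    # push_stream_file("/data/streams/upgoing_L2_1998.tar",
    #                  "ftp.example.edu", "/pub/amanda")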

PRODUCTION SYSTEM

This outlines the resources available in the LBL group for use by the AMANDA collaboration. Various combinations of these will be used to build a working system of data filtering, reconstruction, storage, and distribution. We plan to spend the effort on each aspect needed to produce both a working solution now and a scalable product.

This page produced by Jodi Lamoureux
Feb 16, 1998

Send comments to:
JILamoureux@lbl.gov
