Performance Based Management
Self-Assessment Report
October 2003

Computer Information Resource Management

Introduction/Background

Contractor

DOE Office

Contract No.:  DE-AC03-76SF00515
Point of Contact:  Bob Cowles
Telephone No.:  (650) 926-4965
E-mail:  rdc@slac.stanford.edu
IMD: Name:  Melna Jones
Telephone No.:  (510) 637-1741
CO Name:  Tyndal Lindler
Telephone No.:  (650) 926-5076 (SLAC)
E-mail: tyndal.lindler@oak.doe.gov

Date of last assessment: October 2002

Departmental Overview

Laboratory Mission

The Stanford Linear Accelerator Center is the lead Department of Energy (DOE) laboratory for electron-based high energy physics. It is dedicated to research in elementary particle physics, accelerator physics and in allied fields that can make use of its synchrotron radiation facilities—including biology, chemistry, geology, materials science and environmental engineering. Operated on behalf of the DOE by Stanford University, SLAC is a national user facility serving universities, industry and other research institutions throughout the world. Its mission can be summarized as follows:

Organizational Mission

The Computer Information Resource Management functional area is responsible for coordinating Information Management activities within the Laboratory. This coordination includes encouraging information standards that ensure broad availability of information resources, and encouraging computer and system procurements that have Laboratory support and are part of Laboratory-wide information planning practices.

The Computer Information Resource Management functional area self-assessment is based on, and measured against, performance objectives and standards reflected in the SLAC contract. These objectives and standards were defined by SLAC managers and DOE points of contact to address customer satisfaction, cost efficiency, and contract compliance.

Identification of Self-Assessment Report Staff

Names, titles, affiliations of participants

Robert Cowles, Computer Security Officer

Richard Mount, Director of SLAC Computing Services (SCS)

Scope of Self-Assessment

Items of Interest in 2003

The BaBar program has been extremely productive. Thanks to continued luminosity increases in PEP II, BaBar has recorded 1.2 petabytes of data. To accommodate this volume of data, SLAC Computing Services (SCS) and BaBar physicists have been expanding computing resources as rapidly as possible. Three different areas deserve special mention:

Architecture    Operating System   Clock Speed   Processors per System   Number of Systems
UltraSparc II   Solaris            440 MHz       1                       900
Pentium III     Linux              866 MHz       2                       512
Pentium III     Linux              1.4 GHz       2                       512
Pentium IV      Linux              2.6 GHz       2                       128

Intel Systems Summary

Windows 2000 Infrastructure and Windows XP Client Support

Active Directory has improved security through its use of Kerberos and has added administrative functionality. With the new Active Directory infrastructure, software upgrades and security updates are rolled out to Windows XP clients through a combination of the Windows Update service, Group Policy Objects, and local Software Update Services (SUS) servers. Commonly used software and updates (e.g., MS Office) are installed automatically on Windows XP clients using Group Policy Objects. By implementing this Active Directory based Windows infrastructure and migrating to Windows XP clients, SLAC has greatly reduced the cost of keeping up with critical security patches. In the last few months, while many institutions have had large numbers of computers infected, only a handful of SLAC’s managed Windows systems were compromised.
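As an illustration of how a Group Policy Object can point Windows XP clients at a local update server, the sketch below builds the Automatic Updates policy values that Microsoft documents for SUS-managed clients (`WUServer`, `WUStatusServer`, `UseWUServer`, `AUOptions`, and the scheduled-install values) and renders them in `.reg` file form. It is a minimal sketch, not SLAC's actual configuration: the server URL, the install day and hour, and the helper names `sus_client_policy` and `to_reg_file` are hypothetical, and in practice these values would normally be set through the Group Policy administrative template rather than a registry file.

```python
# Minimal sketch (hypothetical server URL and schedule): the Automatic Updates
# policy values that make a Windows XP client pull approved patches from a
# local Software Update Services (SUS) server instead of the public site.

WU_KEY = r"HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
AU_KEY = WU_KEY + r"\AU"


def sus_client_policy(server_url: str, install_day: int = 0,
                      install_hour: int = 3) -> dict:
    """Return {registry key: {value name: data}} for a SUS-managed client.

    install_day: 0 = every day, 1-7 = Sunday..Saturday.
    install_hour: 0-23, local time of the scheduled install.
    """
    if not 0 <= install_day <= 7:
        raise ValueError("install_day must be 0-7")
    if not 0 <= install_hour <= 23:
        raise ValueError("install_hour must be 0-23")
    return {
        WU_KEY: {
            "WUServer": server_url,        # where clients fetch approved updates
            "WUStatusServer": server_url,  # where clients report install status
        },
        AU_KEY: {
            "NoAutoUpdate": 0,             # keep Automatic Updates enabled
            "AUOptions": 4,                # auto-download and schedule the install
            "ScheduledInstallDay": install_day,
            "ScheduledInstallTime": install_hour,
            "UseWUServer": 1,              # use WUServer instead of Microsoft's site
        },
    }


def to_reg_file(policy: dict) -> str:
    """Render the policy in the text format regedit imports (.reg)."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in policy.items():
        lines.append(f"[{key.replace('HKLM', 'HKEY_LOCAL_MACHINE', 1)}]")
        for name, data in values.items():
            if isinstance(data, int):
                lines.append(f'"{name}"=dword:{data:08x}')  # DWORD as 8 hex digits
            else:
                lines.append(f'"{name}"="{data}"')          # REG_SZ string
        lines.append("")
    return "\r\n".join(lines)


print(to_reg_file(sus_client_policy("http://sus.example.org")))
```

Because the values live under the `Policies` key, a GPO can set them centrally for every managed client, which is what lets patches be approved once on the local server and then installed on a schedule without per-machine work.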

NetIQ software was procured for monitoring Windows servers, including Exchange and web servers, and is currently being implemented.

Exchange 2003

As a result of delays in getting the 1st tier storage system ready for Exchange 2000, the decision was made to decommission the Exchange 2000 test systems and begin testing Exchange 2003. The plan is to complete the conversion from Exchange 5.5 to Exchange 2003 by Q2 FY04.

Windows Storage Systems

Windows storage has doubled over the past year and currently stands at 4 TB. To manage this growth cost-effectively, a two-tier storage architecture is being implemented. The 1st tier will serve critical data and will reside in a SAN environment, which provides better uptime and recovery functionality, such as point-in-time copies, than conventional direct-attached storage. The basic functionality has been put into production. The 2nd tier will serve ordinary file data that has lower requirements for uptime and recovery. The hardware for the 2nd tier has been procured and will be installed, tested, and moved into production during FY04.
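The growth figures above imply a simple projection. This is a minimal sketch, assuming Windows storage keeps doubling once per year from the current 4 TB; the report only states last year's doubling, so continued doubling is an assumption, and the function name is hypothetical.

```python
def project_storage_tb(current_tb: float, years: int,
                       doubling_period_years: float = 1.0) -> list:
    """Projected storage in TB after 0, 1, ..., `years` years,
    assuming steady exponential growth with the given doubling period."""
    if years < 0 or doubling_period_years <= 0:
        raise ValueError("years must be >= 0 and doubling period > 0")
    return [current_tb * 2 ** (y / doubling_period_years)
            for y in range(years + 1)]


print(project_storage_tb(4.0, 2))  # [4.0, 8.0, 16.0]
```

If the trend held, the environment would reach 16 TB within two years; that growth pressure is what the two-tier design is meant to relieve, by keeping only critical data on the SAN tier.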

System Administrator Training

System administrators have received the following training:

Seismic Stability Upgrades

The computer center’s raised flooring is being replaced systematically in phases. The initial phases are intended to 1) improve the seismic stability and reliability of the computer center, and 2) remove under-floor obstructions to cooling airflow.

Phase 1 of this project was completed in FY02. As computers become smaller and denser, equipment weight per square foot increases. In FY03 a complete redesign study was undertaken to reconsider the raised floor design in light of these increased floor loads. Because this study required a substantial amount of time and research, Phase 2 was not performed in FY03; it will restart in FY04. The study determined that the best course of action is to proceed with Phase 2 on the 2nd floor before replacing the raised floor on the 1st floor. The resulting project timing will allow us to move some heavier loads to the 1st floor, thereby slowing the increase in the load placed on the 2nd floor.
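The point about density can be made concrete with the uniformly distributed load: equipment weight divided by the floor area it occupies. This is a minimal sketch; the rack weights, the footprint, and the floor rating below are hypothetical illustrations, not figures from the floor study.

```python
def floor_load_psf(equipment_lb: float, footprint_ft2: float) -> float:
    """Uniformly distributed floor load in pounds per square foot."""
    if footprint_ft2 <= 0:
        raise ValueError("footprint must be positive")
    return equipment_lb / footprint_ft2


# Hypothetical racks sharing the same 8 sq ft footprint (rack plus clearance).
FOOTPRINT_FT2 = 8.0
HYPOTHETICAL_RATING_PSF = 250.0  # illustrative raised-floor rating only

for label, weight_lb in [("older, sparser rack", 900.0),
                         ("newer, denser rack", 2400.0)]:
    load = floor_load_psf(weight_lb, FOOTPRINT_FT2)
    status = "exceeds" if load > HYPOTHETICAL_RATING_PSF else "is within"
    print(f"{label}: {load:.1f} psf ({status} the assumed rating)")
```

Packing more hardware into the same footprint raises the load per square foot even when the number of racks is unchanged, which is why the floor design itself had to be revisited.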

Electrical Power Improvements for Central Servers

Controlled measurements, together with weather-related power outages, provided an opportunity to perform load studies. These studies determine how long the UPS systems can carry the full load of the computer center and establish the need for additional battery packs as more hardware is installed. The UPS systems are intended to keep SLAC’s computing and networking infrastructure running long enough to bring the planned backup generators online.
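The load studies rest on a first-order relationship: runtime is roughly usable stored energy divided by load, and the number of battery packs needed follows from the runtime required to bring the generators online. This is a minimal sketch; the energy per pack, the load, and the target runtime are hypothetical values. The linear model is also a simplification, since real battery runtime falls off faster than linearly at high load, which is one reason measured load studies are still needed.

```python
import math


def runtime_minutes(usable_kwh: float, load_kw: float) -> float:
    """First-order UPS runtime: usable stored energy divided by load."""
    if load_kw <= 0:
        raise ValueError("load must be positive")
    return usable_kwh / load_kw * 60.0


def extra_packs_needed(load_kw: float, target_minutes: float,
                       kwh_per_pack: float, installed_packs: int) -> int:
    """Additional battery packs needed to carry `load_kw` for `target_minutes`
    (linear model, hypothetical pack size)."""
    required_kwh = load_kw * target_minutes / 60.0
    packs_total = math.ceil(required_kwh / kwh_per_pack)
    return max(0, packs_total - installed_packs)


# Hypothetical: 4 packs of 25 kWh each carrying a 400 kW load.
print(runtime_minutes(4 * 25.0, 400.0))           # 15.0 (minutes)
print(extra_packs_needed(400.0, 20.0, 25.0, 4))   # 2 more packs for 20 minutes
```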

Given the load studies and the projected growth of BaBar computing, three additional UPS systems (450 kVA) will be needed in the next two fiscal years. These UPS systems, along with the chillers required to remove the resulting increased heat load from the raised floor, will need to be placed outside the computer center on a concrete pad. Because our existing three UPS systems must be moved from the 4th floor of the computer center due to live load limits, the pad will need to be large enough to hold those systems in addition to the anticipated growth.

In FY03, work began on upgrading Substation 7, an electrical substation behind the computer center, from 1 MW to 2.5 MW, with future expansion capability to 4 MW. This upgrade is necessary to supply enough power to computing equipment for BaBar and the SLAC scientific program, and is scheduled to be completed by Q3 FY04. In the second half of FY04, equipment and wiring will need to be installed to distribute this power to both the 1st and 2nd floors of the computer center.

To provide the additional cooling capacity required by the growing BaBar and SLAC computing loads, two 27-ton Stulz air cooling units were added in FY03. In FY04, as the raised floor replacement project continues, we plan to install three additional 27-ton Stulz units, whose installation was delayed by the engineering redesign of the raised floor bracing plan.
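Cooling capacity in tons of refrigeration converts directly to heat removal in kilowatts (1 ton = 12,000 BTU/h, about 3.517 kW), and since nearly all electrical power drawn by computing equipment ends up as heat, the two can be compared. This is a minimal sketch; it counts only the 27-ton Stulz units named in this report and ignores any other existing cooling, and the function names and the example IT load are hypothetical.

```python
KW_PER_TON = 3.516853  # 1 ton of refrigeration = 12,000 BTU/h ~= 3.517 kW


def cooling_capacity_kw(units: int, tons_per_unit: float = 27.0) -> float:
    """Heat-removal capacity of `units` identical cooling units, in kW."""
    return units * tons_per_unit * KW_PER_TON


def cooling_headroom_kw(it_load_kw: float, units: int,
                        tons_per_unit: float = 27.0) -> float:
    """Capacity left after removing the heat of the given IT load
    (a negative result means a cooling shortfall)."""
    return cooling_capacity_kw(units, tons_per_unit) - it_load_kw


print(f"2 units (added in FY03): {cooling_capacity_kw(2):.1f} kW")
print(f"5 units (after the FY04 plan): {cooling_capacity_kw(5):.1f} kW")
print(f"headroom for a hypothetical 150 kW load on 2 units: "
      f"{cooling_headroom_kw(150.0, 2):.1f} kW")
```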

With the rapid increase in power consumption within the computer center, it has become apparent that we need to monitor, continuously and in real time, the power consumed by each of our power distribution units. In FY03, we began procuring such a system. In the 1st quarter of FY04, this power monitoring system will be installed on the most critical power distribution panels, and we plan to extend it to all power panels in the future.
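Continuous monitoring supports two basic checks: whether any individual panel is approaching its rating, and how fast total load is rising relative to the 2.5 MW capacity of the upgraded Substation 7. This is a minimal sketch of both, assuming readings arrive as kW per panel; the panel names, the 80% alert threshold, and the helper names are hypothetical and are not features of the system SLAC procured.

```python
from dataclasses import dataclass


@dataclass
class PanelReading:
    panel: str       # hypothetical panel identifier
    load_kw: float   # instantaneous measured load
    rated_kw: float  # panel capacity expressed in kW


def panels_over_threshold(readings, threshold=0.8):
    """Names of panels whose load exceeds `threshold` of their rating."""
    return [r.panel for r in readings if r.load_kw > threshold * r.rated_kw]


def substation_utilization(readings, substation_kw=2500.0):
    """Fraction of substation capacity drawn by all monitored panels."""
    return sum(r.load_kw for r in readings) / substation_kw


def trend_kw_per_day(samples):
    """Least-squares slope of (day, total_kw) samples, in kW per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(kw for _, kw in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples must come from at least two different days")
    sxy = sum((d - mean_x) * (kw - mean_y) for d, kw in samples)
    return sxy / sxx


readings = [PanelReading("PDU-A", 90.0, 100.0), PanelReading("PDU-B", 50.0, 100.0)]
print(panels_over_threshold(readings))  # ['PDU-A']
print(substation_utilization(readings))
print(trend_kw_per_day([(0, 140.0), (7, 147.0), (14, 154.0)]))  # 1.0
```

The trend function is what turns monitoring into trending: dividing the remaining substation headroom by the fitted slope gives a rough estimate of how long the current capacity will last.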

Performance Side-Bar Indicators

HPSS and Objectivity database

One indicator of system growth is the HPSS mass storage system, which acts as the primary repository of the Objectivity database holding the data collected by the BaBar experiment. During FY03 the amount of data stored grew from nearly 1 petabyte at the start of the year to over 1.2 petabytes at the end. The growth rate slowed relative to the previous year because BaBar decided to stop storing redundant data.
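The FY03 figures translate into a rough growth rate. This is a minimal sketch that treats the approximate start and end values (about 1.0 PB and 1.2 PB) as exact and uses decimal units (1 PB = 1,000 TB); the function name is hypothetical.

```python
def growth_summary(start_pb: float, end_pb: float, months: int = 12):
    """Return (fractional growth, average increase in TB per month),
    assuming decimal units (1 PB = 1,000 TB)."""
    if start_pb <= 0 or months <= 0:
        raise ValueError("start_pb and months must be positive")
    increase_pb = end_pb - start_pb
    return increase_pb / start_pb, increase_pb * 1000.0 / months


fraction, tb_per_month = growth_summary(1.0, 1.2)
print(f"growth ~{fraction:.0%}, ~{tb_per_month:.1f} TB/month")
```

Because the start and end values are given only approximately, the result (roughly 20% annual growth, or about 17 TB per month) is an indicator of scale rather than a precise rate.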

Compute Farms

As in FY02, the SLAC compute farms continued their rapid growth. The table below shows the growth in capacity available to the scientific program. While the percentage growth was smaller than in the previous year, the absolute growth was as large as the previous year’s. The capacity units below are the total gigahertz across the farms; experience shows that physics code production is directly related to the clock speed ratings of current processors, which gives us a useful basis for comparison and scaling.
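The total-gigahertz capacity measure can be computed from the system table earlier in this report by multiplying each row's clock speed by its processor count and number of systems. This is a minimal sketch, assuming that table lists the farm systems and that its processor column means processors per system; the ARDA farm is left out because the report gives no clock speed for it, and the names `FARM_SYSTEMS` and `total_ghz` are hypothetical.

```python
# (architecture, OS, clock in GHz, processors per system, number of systems)
FARM_SYSTEMS = [
    ("UltraSparc II", "Solaris", 0.440, 1, 900),
    ("Pentium III", "Linux", 0.866, 2, 512),
    ("Pentium III", "Linux", 1.4, 2, 512),
    ("Pentium IV", "Linux", 2.6, 2, 128),
]


def row_ghz(clock_ghz: float, cpus: int, count: int) -> float:
    """Aggregate clock rate contributed by one row of the table."""
    return clock_ghz * cpus * count


def total_ghz(systems) -> float:
    """Total gigahertz across all rows: the capacity unit used for the farms."""
    return sum(row_ghz(ghz, cpus, count) for _, _, ghz, cpus, count in systems)


for arch, os_name, ghz, cpus, count in FARM_SYSTEMS:
    print(f"{arch:<14} {os_name:<8} {row_ghz(ghz, cpus, count):8.1f} GHz")
print(f"total: {total_ghz(FARM_SYSTEMS):.1f} GHz")
```

Summing raw clock rates treats a gigahertz on an UltraSparc II and on a Pentium IV as equivalent; that is the simplification the report accepts in exchange for a single, easily scaled capacity figure.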

An additional Linux compute farm is in operation – Accelerator Research (ARDA) has a farm consisting of 32 VALinux machines.

Status of Goals during FY03

  1. Migrate to Windows 2000 infrastructure and Windows XP clients.
     Status: We are making significant progress towards this goal and expect to be close to the initial schedule for completion.
  2. Develop Performance Measures based on the tools available to the Laboratory.
     Status: No significant progress has been made in this area.
  3. Implement a monitoring solution for the Windows infrastructure.
     Status: Software has been purchased and is in the process of being implemented.
  4. Implement 2nd tier storage for the Windows environment.
  5. Continue to provide resources to support planned increases in the BaBar requirements for computing resources.
     Status: Substantial increases in computing power and in disk and tape storage have been installed, and more are planned for early FY04.
  6. Continue raised floor replacement along with seismic bracing in additional areas of the central server machine room.
     Status: Little progress was made in FY03 due to the engineering design study on floor loading and the subsequent decision on the best way to proceed with the raised floor replacement.

Improvement Action Plan/Goals

Goals for FY04

  1. Complete the migration to Windows 2000 infrastructure and Windows XP clients. Evaluate and upgrade to Windows Server 2003.

  2. Develop Performance Measures based on the tools available to the Laboratory.

  3. Complete implementation of a monitoring solution for the Windows infrastructure.

  4. Complete implementation of the 2nd tier storage for the Windows environment.

  5. Continue to provide resources to support planned increases in the BaBar requirements for computing resources.

  6. Continue replacement of the raised flooring on the 2nd floor of the computer center.

  7. Begin replacement of the raised flooring on the 1st floor of the computer center.

  8. Distribute power from the upgraded Substation 7 to the 1st and 2nd floors of the computer center.

  9. Install three additional 27-ton Stulz air cooling units.

  10. Install a power monitoring and trending system.