The Stanford Linear Accelerator Center is the lead Department of Energy (DOE) laboratory for electron-based high energy physics. It is dedicated to research in elementary particle physics, accelerator physics and in allied fields that can make use of its synchrotron radiation facilities—including biology, chemistry, geology, materials science and environmental engineering. Operated on behalf of the DOE by Stanford University, SLAC is a national user facility serving universities, industry and other research institutions throughout the world. Its mission can be summarized as follows:
The Computer Information Resource Management functional area is responsible for coordinating Information Management activities within the Laboratory. This coordination includes encouraging information standards that ensure broad availability of information resources, and ensuring that computer and systems procurements have Laboratory support and fit within Laboratory-wide information planning practices.
The Computer Information Resource Management functional area self-assessment is based on, and measured against, the performance objectives and standards reflected in the SLAC contract. These were defined by SLAC managers and DOE points of contact to address customer satisfaction, cost efficiency, and contract compliance.
Names, titles, affiliations of participants
Robert Cowles, Computer Security Officer
Richard Mount, Director of SLAC Computing Services (SCS)
Items of Interest in 2004
The BaBar program has been extremely productive. By virtue of the continued luminosity increases in PEP II, BaBar has recorded 1.7 petabytes of data. To accommodate this volume of data, SLAC Computing Services (SCS) and BaBar physicists have been expanding computing resources as rapidly as possible. Three different areas deserve special mention:
Architecture  | Operating System | Speed  | Processors | Number of Systems
UltraSparc II | Solaris          | 440MHz | 1          | 900
Pentium III   | Linux            | 866MHz | 2          | 512
Pentium III   | Linux            | 1.4GHz | 2          | 512
Pentium IV    | Linux            | 2.6GHz | 2          | 384
Intel Systems Summary
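As a rough cross-check, the aggregate clock capacity implied by the systems summary above can be computed as clock speed times processors per system times number of systems (the same "total gigahertz" measure used for the compute farms elsewhere in this report):

```python
# Back-of-envelope aggregate capacity of the systems listed above,
# computed as clock speed (GHz) x processors per system x number of systems.
farms = [
    ("UltraSparc II", 0.440, 1, 900),
    ("Pentium III",   0.866, 2, 512),
    ("Pentium III",   1.400, 2, 512),
    ("Pentium IV",    2.600, 2, 384),
]

total_ghz = sum(clock * procs * systems for _, clock, procs, systems in farms)
print(f"Total capacity: {total_ghz:.1f} GHz")  # -> Total capacity: 4713.2 GHz
```

Note that, as the Compute Farms section cautions, raw gigahertz across different architectures is only a rough guide to delivered capacity.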
Windows 2000/2003 Infrastructure and Windows XP Client Support
Active Directory has provided increased security through its use of Kerberos, as well as additional administrative functionality. Through the new Active Directory infrastructure, software upgrades and security updates are rolled out to Windows XP clients through a combination of the Windows Update service, Group Policy Objects, and local Software Update Services servers. Commonly used software and updates (e.g., MS Office) are automatically installed on Windows XP clients using Group Policy Objects. By implementing this Active-Directory-based Windows infrastructure and migrating to Windows XP clients, SLAC has greatly reduced the cost of keeping up with critical security patches. In the last few months, while many institutions have had large numbers of computers infected, only a handful of SLAC’s managed Windows systems were compromised. The Windows print queue was centralized, and testing has begun on how to implement Windows XP Service Pack 2.
NetIQ software was procured for monitoring Windows servers, including Exchange and web servers, and is currently used to monitor machine and service uptime. Additional functionality will be added over the coming year.
Quest Reporter software, which scans Windows machine logs and tracks software inventories, was purchased as well.
A web-based system from which users download and install supported software for the Windows environment is being implemented. The system tracks the usage of licensed software in a database, and confirms the user’s agreement to the licensing terms before the software is downloaded.
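As an illustration only (a hypothetical sketch, not the actual SLAC system), the core logic of such a service — refusing a download until the user's agreement to the license terms has been recorded, and logging usage in a database — might look like:

```python
import sqlite3

# Hypothetical sketch: gate downloads on recorded license agreement,
# and log each download for usage tracking. Names are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE agreements (user TEXT, package TEXT)")
db.execute("CREATE TABLE downloads (user TEXT, package TEXT)")

def accept_license(user, package):
    """Record that a user has agreed to a package's license terms."""
    db.execute("INSERT INTO agreements VALUES (?, ?)", (user, package))

def download(user, package):
    """Permit a download only after license acceptance; log the usage."""
    ok = db.execute(
        "SELECT 1 FROM agreements WHERE user = ? AND package = ?",
        (user, package)).fetchone()
    if not ok:
        raise PermissionError("license terms not yet accepted")
    db.execute("INSERT INTO downloads VALUES (?, ?)", (user, package))
    return f"{package}.msi"  # hypothetical installer filename

accept_license("alice", "labview")
print(download("alice", "labview"))  # -> labview.msi
```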
Additional Firewalled Systems
A firewall was implemented for the Physical Electronics group (PEL). Additional protection was needed for hardware reliant on software drivers that could only be used with an unsupported operating system.
A firewall was also implemented for the GLAST Clean Room Monitoring System. Additional firewalls are in the process of being built for other GLAST systems (e.g., Online Detector System).
To accommodate the new PeopleSoft infrastructure, a firewall protecting the new network containing this infrastructure (EPN2) is being built for the Business Services Division.
Exchange 2003
The conversion from Exchange 5.5 to Exchange 2003 was completed in August 2004. One Exchange 5.5 server remains up for synchronization between the Exchange accounts database and the SLAC Institutional and SCS Account Resources databases. After a method is implemented to synchronize directly with Windows Active Directory, the last Exchange 5.5 server will be shut down and testing/implementation of RPC over HTTP will begin. Due to performance issues discovered in pre-production testing, the Exchange 2003 databases were placed on 4 Sun StorEdge 6120 Storage Arrays. The 6120s are large enough to provide capacity for expected growth through the end of FY06, but keeping the growth in check will be a challenge.
Windows Storage Systems
The amount of Windows storage was doubled in FY04, to its current 8 TB. To manage the growth effectively, quota software has been procured and implemented. Quotas are initially set at 500MB for new users and 10GB for new groups. Users can ask for increases up to 2GB; groups will be given additional space equally as it is procured. If a large amount of space is needed and is not available, bulk storage can be purchased through SCS. Software was also purchased to improve the management and backup of centrally provided, network-accessible storage for Windows systems. The goal is for data to be retrievable from backup within 4 hours.
Security Scanning
During FY04, we increased our use of Internet Security Systems Internet Scanner 7. We currently scan about six days per week for the latest Windows patches; twice a week, the scans generate emailed reports to Windows system administrators whose systems are missing patches. This appears to have increased the level of compliance dramatically. One Security group member attended a class on ISS Site Protector and will be implementing this product during FY05 to enable scheduled scans across the network and easier report generation and distribution.
System administrator training
System administrators have received the following training:
Networking
The SLAC Local Area Network (LAN) is based on a core of interconnected, redundant Cisco 6509 router/switches. Building switches, the farm core, and the border router are connected to the core routers by 1Gbps links. The number of nodes connected to the network is around 10,000. Rolling upgrades of the building switches are in progress to replace obsolete equipment, enable security upgrades, and add capacity and speed for end nodes. Desktops are being upgraded, as requested, from 10Mbps to 100Mbps links.
The compute and data server farm machines are connected to farm switches at 100Mbps, which in turn are connected to the farm core switch at 1Gbps. The increasing throughput demands and size of the compute and data farms will require that we upgrade the connections between the farm switches and the farm core switch to 10Gbps during FY05. We are also determining how to upgrade the core and border routers to 10Gbps to accommodate the planned upgrade of the offsite ESnet link to 10Gbps in FY05.
External Networking
External networking is vital to SLAC’s scientific mission. Traffic flows over a 622 Mbps link to ESnet, plus a 1Gbps link to Internet2 via Stanford University. In FY04, all newly acquired raw data from the BaBar experiment was moved in close to real time to Padova, Italy, for initial reconstruction, allowing INFN to meet its agreed commitment to BaBar computing. This traffic formed the largest single flow over ESnet.
Seismic Stability Upgrades
Areas of the computer center’s raised flooring are being systematically replaced both to improve seismic stability and reliability of the computer center and to remove under floor obstructions to cooling airflow. As computers get smaller and denser, the increased load results in higher weights per square foot. In FY03, a complete redesign study was undertaken to consider the raised floor design in light of these increased floor loads.
One more area of the computer center’s machine room on the 2nd floor was completed in FY04. The project has now been redirected to the area currently used for tape storage on the 1st floor. Upgrading this area will allow heavier loads to be moved to the 1st floor, thereby slowing the increase in load currently being placed on the 2nd floor.
Electrical Power Improvements for Central Servers
In FY03, work began on upgrading an electrical substation behind the computer center (Substation 7), from 1 MW to 2.5 MW with future expansion capability to 4MW. This upgrade is necessary to provide enough power to computing equipment for BaBar and the SLAC scientific program. The substation upgrade was completed by the end of FY04. In FY05, it will be necessary to install equipment and wiring to distribute this electrical power to both the 1st and 2nd floors of the computer center.
Controlled measurements plus weather-related power outages provided the opportunity to perform load studies to determine the length of time the UPS systems can provide total power for the computer center and to establish needs for additional battery packs as more hardware is installed. The UPS systems are intended to provide power to maintain SLAC’s computing and networking infrastructure while (planned) backup generators are being brought online.
Given the load studies and the projections for growth of BaBar computing, an additional 3 UPS (450 KVA) systems will be needed in the next two fiscal years. These UPS systems will need to be placed outside of the computer center. Because some of our existing 3 UPS systems must be moved from the 4th floor of the computer center due to live-load limits, the pad will need to be large enough to incorporate those systems in addition to the anticipated growth. Conceptual and architectural designs were begun in FY04 for a location and building to house the UPS systems and the switchgear for power distribution from substation 7.
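The kind of estimate such a load study informs can be sketched as follows; the 450 KVA rating comes from the text above, while the power factor, battery energy, and outage load are hypothetical placeholder values, not measured SLAC figures:

```python
# Rough UPS capacity/runtime estimate. Only the 450 KVA rating and the
# count of 3 units come from the report; everything else is assumed.
ups_units = 3
rating_kva = 450
power_factor = 0.8           # assumed
battery_kwh_per_unit = 150   # assumed usable battery energy per UPS

max_load_kw = ups_units * rating_kva * power_factor
load_kw = 600                # assumed computer-center load during an outage
runtime_min = ups_units * battery_kwh_per_unit / load_kw * 60

print(f"Capacity: {max_load_kw:.0f} kW, "
      f"runtime at {load_kw} kW: {runtime_min:.0f} min")
```

The point of the study is exactly to replace the assumed figures here with measured ones, so that battery-pack purchases can be sized to bridge the gap until backup generators come online.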
To handle the increased cooling required by the growing BaBar and SLAC computing loads, an additional Stulz air handling unit was installed in FY04.
Computing Research and Development
SLAC computing receives funding for a number of computing research and development activities from the DOE and other funding agencies. The performance of such activities is peer-reviewed and assessed at appropriate intervals by the program offices concerned. Current research projects include the Particle Physics Data Grid, Internet Performance Monitoring, and the development of a huge-memory architecture for scientific data analysis.
Performance Side-Bar Indicators
HPSS and Disk-based databases
A sign of system growth is the mass storage system HPSS, which acts as the primary repository of the data collected by the BaBar experiment. The amount of data stored grew from 1.2 petabytes at the start of FY04 to over 1.7 petabytes at the end of the year, as shown in the figure below.
The growth of disk space and its allocation to scientific data storage is shown below.
Delivery of Data to Remote Scientists
The figure below shows the external network traffic from SLAC over ESnet, dominated by scientific data flows. SLAC is identified by ESnet, along with Fermilab, as the leading distributor of data to remote scientists.
Compute Farms
As in FY03, SLAC Compute Farms continued their rapid growth. The table below shows the growth in capacity available to the scientific program. The capacity units are total gigahertz across the farms; while not all physics programs scaled as expected from Pentium III to Pentium IV, this remains a useful rough guide to total capacity.
An additional Linux cluster is in operation in the form of a 64 node dual Pentium IV Myrinet cluster. The former 32 node cluster has been relegated to a development platform, and this new facility is available for lab-wide use in the development of parallel algorithms.
Status of Goals during FY04
The migration to Windows 2000 infrastructure and Windows XP clients has been completed. The Windows Server 2003 upgrade is in progress.
The new metric of scientific data distribution was introduced.
Monitoring of system and service uptime and automated service restart has been implemented. Implementation of additional functionality is ongoing.
Implementation in progress.
BaBar computing needs are planned on a forward-looking basis by the BaBar Computing Steering Committee and approved by the BaBar International Finance Committee which includes DOE representatives. The agreed SLAC commitment to the expansion of BaBar computing was met in full.
A second area of raised flooring on the 2nd floor of the computer center was replaced. There was a major redesign and seismic review due to the ever-increasing density, and resulting weight, of today’s computer racks. As a result of the review, we are installing custom pedestals of oversized dimensions and a solid-bar welded stringer system. In the summer of FY04, it was determined that the next phase of the replacement project should be moved to the 1st floor, allowing us to transfer the heaviest and hottest loads to a floor with a lower seismic shear factor.
A new design for the raised flooring on the 1st floor is underway that will increase the height to 18 inches in order to accommodate the increasing heat loads produced by today’s higher-density computers. The current compute clusters supporting the BaBar program consume approximately 16 kW per rack, or nearly 500 watts per square foot. At the same time, we intend to adopt a single design for all future raised-floor replacements that meets seismic requirements through the use of Teflon isolation plates beneath the racks.
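A quick arithmetic check ties the two quoted power-density figures together: a rack drawing about 16 kW at roughly 500 watts per square foot implies an effective footprint of about 32 square feet per rack once aisles and clearances are averaged in.

```python
# Relationship between the two power-density figures quoted above.
rack_watts = 16_000      # ~16 kW per rack
watts_per_sqft = 500     # ~500 W per square foot

sqft_per_rack = rack_watts / watts_per_sqft
print(sqft_per_rack)  # -> 32.0
```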
With substation 7 having been upgraded at the end of FY04, we are now moving into the implementation phase of distributing power to the 1st and 2nd floors. A new bus bar delivery system is being considered for the distribution of power such that we will have greater flexibility and redundancy. This will allow us to isolate racks between different sources of power quickly and more easily such that work can be done on one source of power without affecting critical computing services.
One new Stulz air handler was installed and put into operation on the 2nd floor in conjunction with the FY04 phase of the raised-floor replacement project. Two additional Stulz units will be installed on the 1st floor at the same time the raised flooring is replaced.
A power monitoring system was installed in FY04 on 11 major distribution panels that are sourced from substation 8. These meters provide instantaneous usage loads as well as historical trending, fault monitoring and analysis.
Improvement Action Plan/Goals
Goals for FY05
Back to Index Page