Mark Crane
Review of the Real-Time 99 conference in Santa Fe, NM, June 1999
The first day of the conference was split into a number of different tracks. I chose the hardware track, which included a half-day tutorial on VME / CompactPCI followed by an afternoon discussing real-time Linux.
In the first session a number of things stood out. The biggest issue is the move from parallel backplanes to serial buses for interconnecting computing elements. Intel's Next Generation I/O, or NGIO, poses the largest threat to VME and CompactPCI. The problem with parallel buses is that the individual signal lines become skewed in time as they cross the backplane; with a serial bus this is not an issue, since everything is self-clocked. The Web site for NGIO is WWW.NGIOforum.org. Another major issue is that the costs for VME and CompactPCI are driven by the packaging and the interconnect, not necessarily the computing components on the boards, which is why commodity devices do not always show a cost reduction in these form factors. The biggest advantage of CompactPCI is that PCI driver software written for commodity devices is reusable on CompactPCI boards. The issue with VME is that few users write PC-based drivers for control systems on VME.
The second half of the first day focused on real-time Linux, or RTLinux. This is free real-time scheduling software that lets you run real-time tasks under a small real-time scheduler and then run the normal Linux operating system on top of it; Linux itself runs as the idle task in that scheduler. It solves the problem where you have very simple real-time needs but also want upper-layer support such as networking and screen I/O, all from one operating system on a commodity device. Designs using this model put strict real-time code at the lowest level and handle any non-real-time needs in normal Linux. An interesting point is that you can insert your own scheduling algorithm in place of what is shipped with the code, which would allow you to do round-robin or priority scheduling to suit your needs. The Web site for this is linux-rep.fnal.gov/realtime.
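To make the "Linux as the idle task" idea concrete, here is a minimal sketch in plain C. It is not the RTLinux API; the task names and data structure are my own illustration of a fixed-priority scheduler in which the ordinary Linux kernel is simply the lowest-priority entry, run only when no real-time task is ready.

```c
#include <stdio.h>
#include <stdbool.h>

/* Toy model of the RTLinux idea: real-time tasks have fixed priorities,
 * and the ordinary Linux kernel is just the lowest-priority "task",
 * executed only when nothing with real-time deadlines wants the CPU.
 * (Illustrative only; not the actual RTLinux scheduler or its API.) */

typedef struct {
    const char *name;
    int priority;        /* higher number = more urgent              */
    bool ready;          /* does this task want the CPU right now?   */
} task_t;

static task_t tasks[] = {
    { "motor-control (RT)",  10, false },
    { "data-capture (RT)",    5, false },
    { "Linux kernel (idle)",  0, true  },  /* always willing to run  */
};
static const int ntasks = sizeof tasks / sizeof tasks[0];

/* Pick the highest-priority ready task; swapping this function out is
 * the analogue of installing your own scheduling algorithm. */
static task_t *pick_next(void)
{
    task_t *best = NULL;
    for (int i = 0; i < ntasks; i++)
        if (tasks[i].ready && (!best || tasks[i].priority > best->priority))
            best = &tasks[i];
    return best;
}

int main(void)
{
    /* Simulate a few "ticks"; the RT tasks become ready on ticks 1 and 3,
     * and Linux only runs on the ticks where no RT work is pending. */
    for (int tick = 0; tick < 5; tick++) {
        tasks[0].ready = (tick == 1);
        tasks[1].ready = (tick == 3);
        printf("tick %d: run %s\n", tick, pick_next()->name);
    }
    return 0;
}
```

Replacing pick_next() with a round-robin version would correspond to substituting your own scheduling algorithm for the one shipped with the code, as mentioned above.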
The next four days of the conference were split among the different parts of real-time data acquisition systems. The focus was on large-scale experiments, with a few small experiments thrown in for good measure. All projects were physics experiments, most of them acquiring data from events and processing that data in more or less real time. Below are general observations from these four days; I have detailed notes on each of the presentations for future reference.
- The majority of the real-time operating system needs are met using VxWorks from Wind River. Everyone has support issues with Wind River, but it seems there isn't a good alternative. Of course CERN uses LynxOS, but they seem to be the only ones.
- The majority of non-real-time multi-node CPU requirements are met with Linux running on PCs in server farms. This is the standard model, if you will, with anywhere from ten to 400 PCs clustered together to perform the computing. There were some Sun, Windows NT, and SGI servers to round things out.
- Everyone uses shared memory somewhere in their designs, whether it's local shared memory between processes on the same CPU or remote reflected memory over a network. There are differing remote protocols; some are proprietary and some use commercial networks such as Ethernet. The standout here is the KEK BELLE folks, who have built a shared memory network on top of Fast Ethernet using UDP/IP protocols (see the sketch at the end of these notes).
- There is some ATM in experiment designs from two years ago, when ATM was the hot protocol. Most of the designs have now moved to Fast Ethernet, and current and next-generation designs are looking at Gigabit Ethernet as a solution. CERN and Fermi are still using ATM and having good results.
- There are issues with hard disks, of course. The preference seems to be Seagate for reliability, but the ultimate move is to flash disk drives.
- Many projects designed in the last few years are stockpiling chips and other devices to protect their investments. They have found that technology is moving fast and they were not sufficiently future-proofed. One standout here is SCI (Scalable Coherent Interface), done by Dolphin for CERN; a number of people standardized on SCI, but it's almost a dead technology.
- Many people view reliability as a system-level issue and build in fault tolerance to sidestep the question of reliability. They are not specifically building high-reliability devices; instead they use redundant systems and system-level design choices to work around the reliability issue.
- An interesting bit of history for this RT group is that two years ago the designs were nearly all ATM, and in the actual implementations they have now gone to Fast Ethernet due to complexities in the ATM camp.
- One of the biggest points to come out is to stall your final design decisions and criteria as long as possible. This allows you to choose the best technology at the latest possible time, especially now that things are moving so quickly. Also, do not choose a technology that is still in committee; wait until it is approved and then use the implementation.
- CORBA, C++, and object-oriented design are seen as the correct way to implement a system and its software; very few systems are doing it otherwise. There is also a good showing of CORBA meeting the performance requirements of many of the faster systems.
- Designs using the Web interface and Java are very strong.
- People still do not trust "idiot lights" in a control system display. They still require actual digits in their displays.
- There is a lot of EPICS in the slow control systems, but I didn't see many systems using EPICS to distribute control information to local devices in the fast acquisition systems.
- BaBar has a very interesting EPICS interface to a database and should be studied further. It is strange that I had to go two thousand miles to find out what the people who attend our architecture meetings are doing.
- I am still having a hard time with the difference between the control system philosophy and the data acquisition philosophy. There is a difference here but the lines are blurred. It seems there are two specific camps and they don't communicate well.
- This is a truly global community with people from all continents, from many different governments, and levels of technology. There is commonality in all of the designs where many of the diagrams, if simplified, would look nearly identical. Must be a human condition.
- I have found that there is a specific set of slides, which I call Euro-slides, that nearly all European presenters use. This is a slide with text on the left-hand side and text on the right-hand side, in two columns, or else text on the left with a drawing on the right. This is consistent with the EPICS collaboration presentations as well. Different schools?
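As an aside on the KEK BELLE shared-memory-over-Fast-Ethernet observation above, here is a minimal sketch of what reflective shared memory over UDP/IP might look like, in plain C with POSIX sockets. This is not the BELLE code; the packet layout, port number, peer address, and function names are my own assumptions, and a real design would add sequence numbers, loss handling, and a memory region actually mapped into the data acquisition processes.

```c
/* Sketch: "reflective" shared memory over UDP (not the actual BELLE code).
 * One node updates a word in its local copy of the shared region and sends
 * the (offset, value) pair to its peers, which apply it to their copies. */
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define SHM_WORDS 1024          /* size of the reflected region (words)   */
#define SHM_PORT  5555          /* assumed UDP port for update packets    */

static uint32_t shm_local[SHM_WORDS];   /* this node's copy of the region */

/* Wire format for one update: which word changed and its new value. */
struct shm_update {
    uint32_t offset;            /* word index into the region             */
    uint32_t value;             /* new value                              */
};

/* Write locally and reflect the change to a peer node over UDP. */
static int shm_write(int sock, const struct sockaddr_in *peer,
                     uint32_t offset, uint32_t value)
{
    struct shm_update u = { htonl(offset), htonl(value) };
    shm_local[offset] = value;
    return sendto(sock, &u, sizeof u, 0,
                  (const struct sockaddr *)peer, sizeof *peer) < 0 ? -1 : 0;
}

/* On a receiving node (socket bound to SHM_PORT), apply one update. */
int shm_poll(int sock)
{
    struct shm_update u;
    ssize_t n = recv(sock, &u, sizeof u, 0);
    if (n != sizeof u)
        return -1;
    uint32_t off = ntohl(u.offset);
    if (off < SHM_WORDS)
        shm_local[off] = ntohl(u.value);
    return 0;
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return 1;

    struct sockaddr_in peer = { 0 };
    peer.sin_family = AF_INET;
    peer.sin_port = htons(SHM_PORT);
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr);  /* assumed peer node */

    shm_write(sock, &peer, 42, 0xdeadbeef);  /* reflect one word to peer  */
    printf("word 42 is now 0x%08x locally\n", shm_local[42]);
    return 0;
}
```

The appeal of this style of design, as I understood it, is that it runs over commodity Fast Ethernet hardware and a standard UDP/IP stack rather than a proprietary reflective memory interconnect.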