Author: Robert C. Sass
Last modified on Wednesday, July 26, 2006
The Problem
The Embedded PC (EPC) micro cards have an Ethernet interface that we have been trying to use, with limited success. RMX has never had an embeddable TCP/IP stack, i.e., one that can be used in a nucleus-only environment like ours. Mark Crane therefore made many modifications to another stack, called Fusion, so that it would run under RMX. This mostly works, but under heavy load in PR02 and PR04 the micro hangs totally after a relatively short period of running Ethernet. We think that both SLCnet and Ethernet become unavailable; at the very least, you can't use MD386 to see what's going on. A further obstacle to debugging is that the EPC BIOS trashes parts of memory on reset, so even a crash dump is mostly useless.
Therefore it would be nice to have a third way into the micro. Enter the serial port. In configuring the EPC card, we usurp the COM port interrupts but the COM2 port I/O addresses are still accessible. We have terminal servers by which we can access an EPC serial port remotely. We also have a callable interface to the RMX System Debugger (SDB) by which we can access all of the "v" or view commands to examine RMX data structures and information.
That's nice if the system is still functional enough to schedule a task that can respond to the user's commands and send the result to the serial port. The question is: how dead is it? What if it's hung in some ISR or interrupt task? What if RMX itself is not functional? The clock interrupt is the most basic element that must be operational for the system to be functional at all. In addition to the SDB commands, we need a way to access the system at the most basic level, so it seems we also need to extend our hook into the Clock ISR.
The Solution
We take a two-pronged approach to providing a window into the dead system. When the EPC micro starts, a task called crashmain is created at the highest user (i.e., non-interrupt) task priority. It works as follows:
Do Forever
    Once/second see if there is an input char on the COM2 port
    If there is then
        read and echo chars until input buffer is full or a CR is entered
        send the entered command to SDB and get its output
        send the SDB output to the COM2 port
    end if
end do
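The loop above can be sketched in C roughly as follows. This is only an illustration, not the actual crashmain source: the crash_hooks callbacks (standing in for COM2 port I/O, the SDB callable interface and the RMX sleep call), the names cmd_buf, cmd_feed and crash_poll_once, and the 80-character buffer size are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMD_MAX 80               /* assumed buffer size, not from crashmain */

typedef struct {
    char   text[CMD_MAX + 1];    /* command collected so far, NUL-terminated */
    size_t len;
} cmd_buf;

void cmd_reset(cmd_buf *b)
{
    assert(b != NULL);
    memset(b, 0, sizeof *b);
}

/* Feed one received character. Returns 1 when the command is complete
 * (CR seen or buffer full), 0 while more input is expected. */
int cmd_feed(cmd_buf *b, char c)
{
    if (c == '\r' || b->len >= CMD_MAX)
        return 1;
    b->text[b->len++] = c;
    b->text[b->len] = '\0';
    return b->len == CMD_MAX;
}

/* Hypothetical hooks; the real task would bind these to COM2 port I/O,
 * the SDB callable interface and an RMX sleep call. */
typedef struct {
    int         (*rx_ready)(void);            /* nonzero if a char is waiting */
    char        (*rx_char)(void);             /* read one char */
    void        (*tx_char)(char c);           /* write one char */
    void        (*tx_string)(const char *s);  /* write a whole reply */
    const char *(*sdb_call)(const char *cmd); /* run a "v" command */
    void        (*sleep_ms)(unsigned ms);
} crash_hooks;

/* One pass of the "Do Forever" body. */
void crash_poll_once(const crash_hooks *h, cmd_buf *b)
{
    if (!h->rx_ready()) {
        h->sleep_ms(1000);          /* idle: look for input once per second */
        return;
    }
    cmd_reset(b);
    for (;;) {
        char c;
        if (!h->rx_ready()) {       /* input in progress: scan at 100 Hz */
            h->sleep_ms(10);
            continue;
        }
        c = h->rx_char();
        h->tx_char(c);              /* echo what was typed */
        if (cmd_feed(b, c))
            break;
    }
    h->tx_string(h->sdb_call(b->text));
}
```

Keeping the line collection (cmd_feed) separate from the port access means the buffer logic can be exercised on a desktop without any hardware.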
It's about as simple as it gets. The SDB callable interface will give results back for any valid "v" command and reject any garbage input with a "syntax error". The "v" commands are documented in the iRMX System Debugger Reference manual. We have several copies here or you can see the online version at http://www.tenasys.com/irmx_manuals.htm
Additionally we provide a routine in crashmain that is called by the Clock ISR. This routine initially just writes a 'C' character to the serial port every few seconds to show that the Clock is still running. The function of this routine can be expanded arbitrarily depending on the results of our initial debug efforts.
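A heartbeat of this kind might look something like the sketch below. It is not the real routine: the 100 Hz tick rate, the three-second period, and the names heartbeat and heartbeat_tick are assumptions for illustration. The suspended flag reflects the behavior described under Usage, where the heartbeat pauses while a command is being processed.

```c
#include <assert.h>
#include <stddef.h>

#define TICKS_PER_SEC  100   /* assumed 10 ms clock tick */
#define HEARTBEAT_SECS 3     /* "every few seconds"; exact period assumed */

typedef struct {
    unsigned ticks;      /* ticks counted since the last heartbeat */
    int      suspended;  /* nonzero while a command is being processed */
} heartbeat;

/* Called once per clock tick from the ISR hook. Returns 'C' when a
 * heartbeat character should be written to the serial port, else 0.
 * The caller writes the character; nothing here blocks. */
char heartbeat_tick(heartbeat *hb)
{
    assert(hb != NULL);
    if (hb->suspended)
        return 0;
    if (++hb->ticks < TICKS_PER_SEC * HEARTBEAT_SECS)
        return 0;
    hb->ticks = 0;
    return 'C';
}
```

Returning the character rather than writing it keeps the counting logic independent of the port access, which is the part most likely to change as the Clock ISR hook is expanded.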
The Usage
The first thing to do is to telnet into the terminal server and port connected to the micro you want to debug. The checkpr.com file in slccom: does the telnet for you based on the micro name, or you can do it manually. The names, ports, userids and passwords can vary with time, so you may need to check with the appropriate people to get this info. Here's the current configuration:
Micro | Server Name | Port | Userid |
LI34 | b5as | 4 | pepii |
PR04 | tty04pep00 | 33 | eoic |
PR02 | tty02pep00 | 1 | eoic |
The following example is from a UNIX session connecting to the terminal server b5as and port number 2005 used with LI34 in the test closet. The VMS command format is telnet b5as/port=2005. Bold type is your input.
rcs@slcsun1 $ telnet b5as 2005
Trying 134.79.48.219...
Connected to b5as.SLAC.Stanford.EDU.
Escape character is '^]'.
User Access Verification
Username: pepii
Password:
Password OK
So now you're in. The crashmain program doesn't spontaneously send anything and only scans for new input once/second to keep its system impact to a minimum. It echoes whatever you type, so when you enter the first character, wait for it to echo back before you enter the rest of the command. Once it detects the first character it scans for input at 100 Hz, so you can type as fast as you want. Press Enter when you've finished a valid command or just want to terminate your input. The output from SDB is then sent to the terminal.
A word about output speed. Again, to minimize system impact and to be able to use this program on the running system, characters are output only at 100 Hz so you'll need to be a bit patient and wait a few seconds for the command results.
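To put numbers on that, here is a small sketch of output throttled to one character per 10 ms tick. The names tx_queue, tx_load, tx_next and tx_time_ms are made up for the example and are not taken from crashmain; only the 100 Hz rate comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define OUTPUT_HZ 100UL   /* from the text: characters go out at 100 Hz */

typedef struct {
    const char *data;   /* pending SDB output */
    size_t      len;    /* total characters to send */
    size_t      next;   /* index of the next character to send */
} tx_queue;

void tx_load(tx_queue *q, const char *data, size_t len)
{
    assert(q != NULL);
    q->data = data;
    q->len  = len;
    q->next = 0;
}

/* Call once per 10 ms tick. Returns the next character to write to the
 * port, or -1 when the queue is empty. */
int tx_next(tx_queue *q)
{
    if (q->next >= q->len)
        return -1;
    return (unsigned char)q->data[q->next++];
}

/* Time in milliseconds needed to send nchars characters at OUTPUT_HZ. */
unsigned long tx_time_ms(size_t nchars)
{
    return (unsigned long)nchars * 1000UL / OUTPUT_HZ;
}
```

For example, tx_time_ms(300) is 3000, so a 300-character reply takes about three seconds to appear.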
Another niggling detail is that the SDB output overwrites your input command because Enter just sends <CR> and the SDB output doesn't have a leading line feed. If you want to see the command you typed as well as the output, you can type <CTRL J> which is a line feed before you type Enter. Cheap trick but it works.
Here's some sample output. The vk command displays all ready and sleeping tasks. The leading and trailing 'C's are the heartbeat of the clock ISR. When a command is entered, the heartbeat is suspended until the command is complete.
CCCCCCCCvk
Ready tasks: 15d0 0268
Sleeping tasks: 0270 0fa8
1058 1070 10a0
10b0 10e8 1100
1110
1128 1168 1258
1268 1290 12c0 12e8
13b8
13e0 1408 1430
1458 1480 14a8 14d0
14f8
1520 1548 1570
1598 15c0 1718 1be8
1bf8
1c10 1d38 1d50
1d60 1f10 2108 2660
2b58
2b68 2d70 3138
3630 3640 3848 3c10
4190
41a0 43a8
CCCC
The vd command displays a job's object directory and 258 is the root job.
vd 258
Directory size: 00c8 Entries used: 0049
ATM_LI34LOOP 2b68   ACT_LI34LOOP 2d70   SlcnetPorts  1128   BitbusInt    2108
MD386        1150   CTL_LI34LOP2 3630   ATM_LI34LOP2 3640   CTL_LI34LOP3 4190
ATM_LI34LOP3 41a0   MES_LI34LOP2 3138   ACT_LI34LOP2 3848   MES_LI34LOP3 3c10
ACT_LI34LOP3 43a8   CRATTASK     13e0   STATTASK     14a8   FBCKTASK     1520
CRATMAIN     13c8   STATMAIN     1490   MICROMAIN    1020   MSG_TASK     1290
FBCKMAIN     1508   MBEMAIN      15a8   MSGMAIN      1278   CRASHMAIN    15d0
ANLGTASK     14d0   CAMVTASK     13b8   TIMETASK     1408   ANLGMAIN     14b8
MSGBWT       12c0   DBMAIN       12d0   CAMVMAIN     13a0   ERRMAIL      1268
RQMONITOR    0fb0   ERRMTIM      1258   MD386Login   1168   RQSYSINFO    0fb8
DBS_TASK     12e8   EEPROINT     1070   FNSTIMER     1058   TCPeSERV     10b0
PORT_REQUEST 1118   NET_REQUEST  10f0   TIMEMAIN     13f0   HTTPSERV     10a0
MGNTMAIN     1418   BPMOMAIN     1440   TESTMAIN     1468   SLCnetIntTsk 10e8
KLYSMAIN     14e0   MCOM_MAIN    1530   MPSCMAIN     1558   BCOMMAIN     1580
TESTTASK     1480   MPSCTASK     1570   SLCnetDriver 10d0   SLCnetServer 1100
SLCnetTimer  1110   KISTTASK     1548   MBE_TASK     15c0   KLYSTASK     14f8
BPMOTASK     1458   MGNTTASK     1430   BCOMTASK     1598   MBCDEXER     1718
TIMERRSND    1be8   TIMENONMIS   1bf8   TIM360HZ     1c10   BPMPROC      1d38
BPMFDBK      1d50   BPTO         1d60   TIMDWNLD     1f10   MES_LI34LOOP 2660
CTL_LI34LOOP 2b58
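The tokens listed by vk can be matched against the names in the vd directory; for instance, the ready task 15d0 in the vk sample is cataloged as CRASHMAIN in the vd listing. If you capture the output to a file, a small helper along these lines could do the lookup on your workstation. The function name_for_token is hypothetical and assumes the directory text is a run of whitespace-separated "NAME token" pairs, as in the sample.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Search whitespace-separated "NAME token" pairs for the given token.
 * On a match, copies the name into out (at most cap-1 chars) and
 * returns 1; returns 0 if the token is not in the directory text. */
int name_for_token(const char *pairs, const char *token,
                   char *out, size_t cap)
{
    char name[32], tok[16];
    int  used;

    assert(cap > 0);
    while (sscanf(pairs, " %31s %15s%n", name, tok, &used) == 2) {
        if (strcmp(tok, token) == 0) {
            strncpy(out, name, cap - 1);
            out[cap - 1] = '\0';
            return 1;
        }
        pairs += used;   /* advance past the pair just read */
    }
    return 0;
}
```

Called with the text "MSGMAIN 1278 CRASHMAIN 15d0 ANLGTASK 14d0" and the token "15d0", it yields CRASHMAIN.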
http://www.slac.stanford.edu/grp/cd/soft/pepii/slaconly/how-to/use_term.html gives you the varying control sequences required to exit telnet depending how you connected to the terminal server.
That's all there is to it. Have fun in the new Millennium.