Antares-RAID-sparcLinux-HOWTO

Thom Coates (tdc3@psu.edu), Carl Munio, Jim Ludemann

v0.1, 28 April 2000
This document describes how to install, configure, and maintain a hardware RAID built around the 5070 SBUS host based RAID controller by Antares Microsystems. Other topics of discussion include RAID levels, the 5070 controller GUI, and 5070 command line. A complete command reference for the 5070's K9 kernel and Bourne-like shell is included.

1. Preamble

Copyright 2000 by Thomas D. Coates, Jr. This document's source is licensed under the terms of the GNU General Public License. Permission to use, copy, modify, and distribute this document without fee for any purpose commercial or non-commercial is hereby granted, provided that the authors' names and this notice appear in all copies and/or supporting documents; and that the location where a freely available unmodified version of this document may be obtained is given. This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY, either expressed or implied. While every effort has been taken to ensure the accuracy of the information documented herein, the author(s)/editor(s)/maintainer(s)/contributor(s) assume NO RESPONSIBILITY for any errors, or for any damages, direct or consequential, as a result of the use of the information documented herein. A complete copy of the GNU General Public License may be obtained from: Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. Portions of this document are adapted and/or re-printed from the 5070 installation guide and man pages with permission of Antares Microsystems, Inc., Campbell CA.

2. Acknowledgements and Thanks

3. New Versions

4. Introduction

The Antares 5070 is a high performance, versatile, yet relatively inexpensive host based RAID controller. Its embedded operating system (K9 kernel) is modelled on the Plan 9 operating system whose design is discussed in several papers from AT&T (see the "Further Reading" section). K9 is a kernel targeted at embedded controllers of small to medium complexity (e.g. ISDN-ethernet bridges, RAID controllers, etc). It supports multiple lightweight processes (i.e. without memory management) on a single CPU with a non-pre-emptive scheduler. Device driver architecture is based on Plan 9 (and Unix SVR4) streams. Concurrency control mechanisms include semaphores and signals.

The 5070 has three single ended ultra 1 SCSI channels and two onboard serial interfaces, one of which provides command line access via a connected serial terminal or modem. The other is used to upgrade the firmware. The command line is robust, implementing many of the essential Unix commands (e.g. dd, ls, cat, etc.) and a scaled down Bourne shell for scripting. The Unix command set is augmented with RAID specific configuration commands and scripts. In addition to the command line interface, an ASCII text based GUI is provided to permit easy configuration of level 0, 1, 3, 4, and 5 RAIDs.

4.1 5070 Main Features

5. Background

Much of the information pertaining to RAID levels in this section is adapted from the software-raid-HOWTO by Linas Vepstas. See the acknowledgements section for the URL where the full document may be obtained.

RAID is an acronym for "Redundant Array of Inexpensive Disks" and is used to create large, reliable disk storage systems out of individual hard disk drives. There are two basic ways of implementing a RAID: software or hardware. The main advantage of a software RAID is low cost. However, since the OS of the host system must manage the RAID directly, there is a substantial penalty in performance. Furthermore, if the RAID is also the boot device, a drive failure could prove disastrous since the operating system and utility software needed to perform the recovery are located on the RAID. The primary advantages of hardware RAID are performance and improved reliability. Since all RAID operations are handled by a dedicated CPU on the controller, the host system's CPU is never bothered with RAID related tasks. In fact the host OS is completely oblivious to the fact that its SCSI drives are really virtual RAID drives. When a drive fails on the 5070 it can be replaced on-the-fly with a drive from the spares pool and its data reconstructed without the host's OS ever knowing anything has happened.

5.1 RAID Levels

The different RAID levels have different performance, redundancy, storage capacity, reliability and cost characteristics. Most, but not all levels of RAID offer redundancy against drive failure. There are many different levels of RAID which have been defined by various vendors and researchers. The following describes the first 7 RAID levels in the context of the Antares 5070 hardware RAID implementation.

5.2 RAID Linear

RAID-linear is a simple concatenation of drives to create a larger virtual drive. It is handy if you have a number of small drives, and wish to create a single, large drive. This concatenation offers no redundancy, and in fact decreases the overall reliability: if any one drive fails, the combined drive will fail.

SUMMARY

5.3 Level 1

Also referred to as "mirroring". Two (or more) drives, all of the same size, each store an exact copy of all data, disk-block by disk-block. Mirroring gives strong protection against drive failure: if one drive fails, there is another with an exact copy of the same data. Mirroring can also help improve performance in I/O-laden systems, as read requests can be divided up between several drives. Unfortunately, mirroring is also one of the least efficient schemes in terms of storage: two mirrored drives can store no more data than a single drive.

SUMMARY

5.4 Striping

Striping is the underlying concept behind all of the other RAID levels. A stripe is a contiguous sequence of disk blocks. A stripe may be as short as a single disk block, or may consist of thousands. The RAID drivers split up their component drives into stripes; the different RAID levels differ in how they organize the stripes, and what data they put in them. The interplay between the size of the stripes, the typical size of files in the file system, and their location on the drive is what determines the overall performance of the RAID subsystem.

5.5 Level 0

Similar to RAID-linear, except that the component drives are divided into stripes and then interleaved. Like RAID-linear, the result is a single larger virtual drive. Also like RAID-linear, it offers no redundancy, and therefore decreases overall reliability: a single drive failure will knock out the whole thing. However, the 5070 hardware RAID 0 is the fastest of any of the schemes listed here.

SUMMARY

5.6 Level 2 and 3

RAID-2 is seldom used anymore, and to some degree has been made obsolete by modern hard disk technology. RAID-2 is similar to RAID-4, but stores ECC information instead of parity. Since all modern disk drives incorporate ECC under the covers, this offers little additional protection. RAID-2 can offer greater data consistency if power is lost during a write; however, battery backup and a clean shutdown can offer the same benefits. RAID-3 is similar to RAID-4, except that it uses the smallest possible stripe size.

SUMMARY

5.7 Level 4

RAID-4 interleaves stripes like RAID-0, but it requires an additional drive to store parity information. The parity is used to offer redundancy: if any one of the drives fails, the data on the remaining drives can be used to reconstruct the data that was on the failed drive. Given N data disks, and one parity disk, the parity stripe is computed by taking one stripe from each of the data disks, and XOR'ing them together. Thus, the storage capacity of an (N+1)-disk RAID-4 array is that of N disks, which is a lot better than mirroring (N+1) drives, and is almost as good as a RAID-0 setup for large N. Note that for N=1, where there is one data disk, and one parity disk, RAID-4 is a lot like mirroring, in that each of the two disks is a copy of the other. However, RAID-4 does NOT offer the read-performance of mirroring, and offers considerably degraded write performance. In brief, this is because updating the parity requires a read of the old parity, before the new parity can be calculated and written out. In an environment with lots of writes, the parity disk can become a bottleneck, as each write must access the parity disk.
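The XOR relationship described above can be tried out with ordinary shell arithmetic. What follows is only a sketch and not the 5070's firmware logic: each "stripe" is reduced to a single byte value, and the function names parity and reconstruct are made up for this illustration.

```shell
#!/bin/bash
# Illustration only: every "stripe" is one byte (0-255).
# parity and reconstruct are hypothetical helper names.

# XOR all arguments together; the result is the parity stripe.
parity() {
    local p=0 v
    for v in "$@"; do
        p=$(( p ^ v ))
    done
    echo "$p"
}

# Rebuild a lost stripe by XOR'ing the parity with every surviving stripe.
# Usage: reconstruct <parity> <surviving stripe> [<surviving stripe> ...]
reconstruct() {
    parity "$@"
}

# Three data stripes (N=3) and their parity.
p=$(parity 0x5A 0x3C 0xF0)

# Pretend the drive holding 0x3C failed and recover its stripe.
lost=$(reconstruct "$p" 0x5A 0xF0)

printf 'parity=0x%02X recovered=0x%02X\n' "$p" "$lost"
```

Because XOR is its own inverse, the same loop serves both to compute the parity and to recover a missing stripe, which is why a single parity drive is enough to survive the loss of any one drive.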

SUMMARY

5.8 Level 5

RAID-5 avoids the write-bottleneck of RAID-4 by alternately storing the parity stripe on each of the drives. However, write performance is still not as good as for mirroring, as the parity stripe must still be read and XOR'ed before it is written. Read performance is also not as good as it is for mirroring, as, after all, there is only one copy of the data, not two or more. RAID-5's principal advantage over mirroring is that it offers redundancy and protection against single-drive failure, while offering far more storage capacity when used with three or more drives.
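To picture how rotating the parity spreads the write load, here is a small sketch. The rotation order actually used by the 5070 is not covered in this document, so the rule below (parity moves to the next drive on each successive stripe row and wraps around) is purely an assumption, and parity_drive is a hypothetical name.

```shell
#!/bin/bash
# Assumed layout, for illustration only: parity for stripe row R
# lives on drive (R mod number_of_drives).

# Usage: parity_drive <row> <number_of_drives>
parity_drive() {
    echo $(( $1 % $2 ))
}

# Show where parity would land for the first six rows of a 3-drive set.
row=0
while [ "$row" -lt 6 ]; do
    echo "row $row: parity on drive $(parity_drive "$row" 3)"
    row=$(( row + 1 ))
done
```

Under this assumed layout every drive holds the parity for one row in three, so no single drive has to absorb all of the parity writes as the dedicated parity drive does in RAID-4.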

SUMMARY

6. Installation

NOTE: The installation procedure given here for the SBUS controller is similar to that found in the manual. It has been modified so minor variations in the SPARCLinux installation may be included.

6.1 SBUS Controller Compatibility

The 5070 / Linux 2.2 combination was tested on SPARCstation (5, 10, & 20), Ultra 1, and Ultra 2 Creator. The 5070 was also tested on Linux with Symmetrical Multiprocessing (SMP) support on a dual processor Ultra 2 Creator 3D with no problems. Other 5070 / Linux / hardware combinations may work as well.

6.2 Hardware Installation Procedure

If your system is already up and running, you must halt the operating system.

GNOME:

  1. From the login screen right click the "Options" button.
  2. On the popup menu select System -> Halt.
  3. Click "Yes" when the verification box appears

KDE:

  1. From the login screen right click shutdown.
  2. On the popup menu select shutdown by right clicking its radio button.
  3. Click OK

XDM:

  1. Log in as root
  2. Left click on the desktop to bring up the pop-up menu
  3. Select "New Shell"
  4. When the shell opens type "halt" at the prompt and press return

Console Login (systems without X windows):

  1. Log in as root
  2. Type "halt"

All Systems:

Wait for the message "power down" or "system halted" before proceeding. Turn off your SPARCstation system (Note: Your system may have turned itself off following the power down directive), its video monitor, external disk expansion boxes, and any other peripherals connected to the system. Be sure to check that the green power LED on the front of the system enclosure is not lit and that the fans inside the system are not running. Do not disconnect the system power cord.

SPARCstation 4, 5, 10, 20 & UltraSPARC Systems:

  1. Remove the top cover on the CPU enclosure. On a SPARCstation 10, this is done by loosening the captive screw at the top right corner of the back of the CPU enclosure, then tilting the top of the enclosure forward while using a Phillips screwdriver to press the plastic tab on the top left corner.
  2. Decide which SBUS slot you will use. Any slot will do. Remove the filler panel for that slot by removing the two screws and rectangular washers that hold it in.
  3. Remove the SBUS retainer (commonly called the handle) by pressing outward on one leg of the retainer while pulling it out of the hole in the printed circuit board.
  4. Insert the board into the SBUS slot you have chosen. To insert the board, first engage the top of the 5070 RAIDium backpanel into the backpanel of the CPU enclosure, then rotate the board into a level position and mate the SBUS connectors. Make sure that the SBUS connectors are completely engaged.
  5. Snap the nylon board retainers inside the SPARCstation over the 5070 RAIDium board to secure it inside the system.
  6. Secure the 5070 RAIDium SBUS backpanel to the system by replacing the rectangular washers and screws that held the original filler panel in place.
  7. Replace the top cover by first mating the plastic hooks on the front of the cover to the chassis, then rotating the cover down over the unit until the plastic tab in back snaps into place. Tighten the captive screw on the upper right corner.

Ultra Enterprise Servers, SPARCserver 1000 & 2000 Systems, SPARCserver 6X0 MP Series:

  1. Remove the two Allen screws that secure the CPU board to the card cage. These are located at each end of the CPU board backpanel.
  2. Remove the CPU board from the enclosure and place it on a static-free surface.
  3. Decide which SBUS slot you will use. Any slot will do. Remove the filler panel for that slot by removing the two screws and rectangular washers that hold it in. Save these screws and washers.
  4. Remove the SBUS retainer (commonly called the handle) by pressing outward on one leg of the retainer while pulling it out of the hole in the printed circuit board.
  5. Insert the board into the SBUS slot you have chosen. To insert the board, first engage the top of the 5070 RAIDium backpanel into the backpanel of the CPU enclosure, then rotate the board into a level position and mate the SBUS connectors. Make sure that the SBUS connectors are completely engaged.
  6. Secure the 5070 RAIDium board to the CPU board with the nylon screws and standoffs provided on the CPU board. The standoffs may have to be moved so that they match the holes used by the SBUS retainer, as the standoffs are used in different holes for an MBus module. Replace the screws and rectangular washers that originally held the filler panel in place, securing the 5070 RAIDium SBus backpanel to the system enclosure.
  7. Re-insert the CPU board into the CPU enclosure and re-install the Allen-head retaining screws that secure the CPU board.

All Systems:

  1. Mate the external cable adapter box to the 5070 RAIDium and gently tighten the two screws that extend through the cable adapter box.
  2. Connect the three cables from your SCSI devices to the three 68-pin SCSI-3 connectors on the Antares 5070 RAIDium. The three SCSI cables must always be reconnected in the same order after a RAID set has been established, so you should clearly mark the cables and disk enclosures for future disassembly and reassembly.
  3. Configure the attached SCSI devices to use SCSI target IDs other than 7, as that is taken by the 5070 RAIDium itself. Configuring the target number is done differently on various devices. Consult the manufacturer's installation instructions to determine the method appropriate for your device.
  4. As you are likely to be installing multiple SCSI devices, make sure that all SCSI buses are properly terminated. This means a terminator is installed only at each end of each SCSI bus daisy chain.
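Before setting jumpers or switches it can help to check the planned target IDs for each channel on paper. The sketch below tests two things: that ID 7 (used by the 5070 RAIDium itself) is not assigned to a device, and that no ID is used twice on the same channel. The function name check_ids is hypothetical, and the check says nothing about termination or cabling.

```shell
#!/bin/bash
# Usage: check_ids <id> [<id> ...]
# Pass the target IDs planned for ONE SCSI channel.
check_ids() {
    local seen=" " id
    for id in "$@"; do
        if [ "$id" -eq 7 ]; then
            echo "ID 7 is reserved for the 5070 itself" >&2
            return 1
        fi
        case $seen in
            *" $id "*)
                echo "ID $id is used twice on this channel" >&2
                return 1
                ;;
        esac
        seen="$seen$id "
    done
    echo "ok"
}

check_ids 0 1 5
```

Run it once per channel, since the same ID may legitimately appear on different channels.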

Verifying the Hardware Installation:

These steps are optional but recommended. First, power-on your system and interrupt the booting process by pressing the "Stop" and "a" keys (or the "break" key if you are on a serial terminal) simultaneously as soon as the Solaris release number is shown on the screen. This will force the system to run the Forth Monitor in the system EPROM, which will display the "ok" prompt. This gives you access to many useful low-level commands, including:

ok show-devs
. . .
/iommu@f,e0000000/sbus@f,e0001000/SUNW,isp@1,8800000
. . .

The first line in the response shown above means that the 5070 RAIDium host adapter has been properly recognized. If you don't see a line like this, you may have a hardware problem.

Next, to see a listing of all the SCSI devices in your system, you can use the probe-scsi-all command, but first you must prepare your system as follows:

ok setenv auto-boot? false
ok reset
ok probe-scsi-all

This will tell you the type, target number, and logical unit number of every SCSI device recognized in your system. The 5070 RAIDium board will report itself attached to an ISP controller at target 0 with two Logical Unit Numbers (LUNs): 0 for the virtual hard disk drive, and 7 for the connection to the Graphical User Interface (GUI). Note: the GUI communication channel on LUN 7 is currently unused under Linux. See the discussion under "SCSI Monitor Daemon (SMON)" in the "Advanced Topics" section for more information.

REQUIRED: Perform a reconfiguration boot of the operating system:

ok boot -r

If no image appears on your screen within a minute, you most likely have a hardware installation problem. In this case, go back and check each step of the installation procedure. This completes the hardware installation procedure.

6.3 Serial Terminal

If you have a serial terminal at your disposal (e.g. DEC-VT420) it may be connected to the controller's serial port using a 9 pin DIN male to DB25 male serial cable. Otherwise you will need to supplement the above cable with a null modem adapter to connect the RAID controller's serial port to the serial port on either the host computer or a PC. The terminal emulators I have successfully used include Minicom (on Linux), Kermit (on Caldera's Dr. DOS), and Hyperterminal (on a Windows CE palmtop); however, any decent terminal emulation software should work. The basic settings are 9600 baud, no parity, 8 data bits, and 1 stop bit.

6.4 Hard Drive Plant

Choosing the brand and capacity of the drives that will form the hard drive physical plant is up to you. I do have some recommendations:

7. 5070 Onboard Configuration

Before diving into the RAID configuration I need to define a few terms.

The text based GUI can be started by typing "agui"

:raid; agui

at the husky prompt on the serial terminal (or emulator).

Agui is a simple ASCII based GUI that runs on the RaidRunner console port and enables one to configure the RaidRunner. The only argument agui takes is the type of terminal that is connected to the RaidRunner console. Currently supported terminals are dtterm, vt100 and xterm. The default is dtterm.

Each agui screen is split into two areas, data and menu. The data area, which generally uses all but the last line of the screen, displays the details of the information under consideration. The menu area, which generally is the bottom line of the screen, displays a strip menu with a title then list of options or sub-menus. Each option has one character enclosed in square brackets (e.g. [Q]uit) which is the character to type to select that option. Each menu line allows you to refresh the screen data (in case another process on the RaidRunner writes to the console). The refresh character may also be used during data entry if the screen is overwritten. The refresh character is either <Control-l> or <Control-r>.

When agui starts, it reads the configuration of the RaidRunner and probes for every possible backend. As it probes for each backend, its "name" is displayed in the bottom left corner of the screen.

7.1 Main Screen Options

<Figure 1: Main Screen>

The Main screen is the first screen displayed. It provides a summary of the RaidRunner configuration. At the top is the RaidRunner model, version and serial number. Next is a line displaying, for each controller, the SCSI IDs for each host port (labeled A, B, C, etc) and the total and currently available amounts of memory. The next set of lines displays the ranks of devices on the RaidRunner. Each device follows the nomenclature <device_type>c.s.l, where device_type can be D for disk or T for tape, c is the internal channel the device is attached to, s is the SCSI ID (Rank) of the device on that channel, and l is the SCSI LUN of the device (typically 0).

The next set of lines provides a summary of the Raid Sets configured on the RaidRunner. The summary includes the raid set name, its type, its size, the amount of cache allocated to it and a comma separated list of its backends. See rconf in the "Advanced Topics" section for a full description of the above.

Next are the spare devices configured. Each spare is named (<device_type>c.s.l format), followed by its size (in 512-byte blocks), its spin state (Hot or Warm), its controller allocation, and finally its current status (Used/Unused, Faulty/Working). If used, the raid set that uses it is nominated.
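The device naming scheme used on the Main screen (for example D1.5.0) can be taken apart mechanically, which is handy when keeping notes on which physical drive sits where. The sketch below runs on the host, not on the RaidRunner, and parse_backend is a hypothetical helper name.

```shell
#!/bin/bash
# Usage: parse_backend <name>     e.g. parse_backend D1.5.0
# Prints: <device type> <channel> <SCSI ID (rank)> <LUN>
parse_backend() {
    local name=$1 kind rest channel id lun
    kind=${name%%[0-9]*}      # leading letter: D or T
    rest=${name#"$kind"}      # remainder: c.s.l
    IFS=. read -r channel id lun <<EOF
$rest
EOF
    case $kind in
        D) kind=disk ;;
        T) kind=tape ;;
        *) echo "unknown device type in '$name'" >&2; return 1 ;;
    esac
    echo "$kind $channel $id $lun"
}

parse_backend D1.5.0
```

For D1.5.0 this prints "disk 1 5 0": a disk on channel 1 with SCSI ID (rank) 5 and LUN 0.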

At the bottom of the data area, the number of controllers, channels, ranks and devices are displayed.

The menu line allows one to quit agui or select further actions or sub-menus.

These selections are described in detail below.

7.2 [Q]uit

Exit the agui main screen and return to the husky ( :raid; ) prompt.

7.3 [R]aidSets:

<Figure 2: RAIDSet Configuration Screen>

The Raid Set Configuration screen displays a Raid Set in the data area and provides a menu which allows you to Add, Delete, Modify, Install (changes) and Scroll through all other raid sets (First, Last, Next and Previous). If no raid sets have been configured, only the screen title and menu is displayed. All attributes of the raid set are displayed. For information on each attribute of the raid set, see the rconf command in the "Advanced Topics" section. The menu line allows one to leave the Raid Set Configuration screen or select further actions:

7.4 [H]ostports:

<Figure 3: Host Port Configuration Screen>

The Host Port Configuration screen displays, for each controller, each host port (labelled A, B, C, etc for port number 0, 1, 2, etc) and the assigned SCSI ID. If your RaidRunner has external switches for host port SCSI ID selection, you may only exit ([Q]uit) from this screen. If your RaidRunner does NOT have external switches for host port SCSI ID selection, then you may modify (and hence install) the SCSI ID for any host port. The menu line allows one to leave the Host Port Configuration screen or select further actions (if NO external host):

7.5 [S]pares:

<Figure 4: Spare Device Configuration Screen>

The Spare Device Configuration screen displays all configured spare devices in the data area and provides a menu which allows you to Add, Delete, Modify and Install (changes) spare devices. If no spare devices have been configured, only the screen title and menu are displayed. Each spare device displayed shows its name (in <device_type>c.s.l format), its size in 512-byte blocks, its spin status (Hot or Warm), its controller allocation, and finally its current status (Used/Unused, Faulty/Working). If used, the raid set that uses it is nominated. For information on each attribute of a spare device, see the rconf command in the "Advanced Topics" section. The menu line allows one to leave the Spare Device Configuration screen or select further actions:

7.6 [M]onitor:

<Figure 5: SCSI Monitor Screen>

The SCSI Monitor Configuration screen displays a table of SCSI monitors configured for the RaidRunner. Up to four SCSI monitors may be configured. The table columns are entitled Controller, Host Port, SCSI LUN and Protocol and each line of the table shows the appropriate SCSI Monitor attribute. For details on SCSI Monitor attributes, see the rconf command in the "Advanced Topics" section. The menu line allows one to leave the SCSI Monitor Configuration screen or modify and install the table.

7.7 [G]eneral:

<Figure 6: General Screen>

The General screen has a blank data area and a menu which allows one to Quit and return to the main screen, or to select further sub-menus which provide information about Devices, the System Message Logger, Global Environment variables and throughput Statistics.

7.8 [P]robe

The probe option re-scans the SCSI channels and updates the backend list with the hardware it finds.

7.9 Example RAID Configuration Session

The generalized procedure for configuration consists of three steps arranged in the following order:

  1. Configuring the Host Port(s)
  2. Assigning Spares
  3. Configuring the RAID set

Note that there is a minimum number of backends required for the various supported RAID levels:

In this example we will configure a RAID 5 using six 2.04 gigabyte drives. The total capacity of the virtual drive will be approximately 10.2 gigabytes (five drives' worth; the equivalent of one drive is used for redundancy). This same configuration procedure can be used to configure other levels of RAID sets by changing the type parameter.
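The capacity figure above follows from the RAID level descriptions in the "Background" section and can be worked out ahead of time. The sketch below is only an estimate: it works in whole megabytes (2.04 gigabytes written as 2040), assumes all drives are the same size, and ignores any space the controller or a filesystem may reserve. The name usable_mb is hypothetical.

```shell
#!/bin/bash
# Usage: usable_mb <raid_level> <drive_count> <drive_size_mb>
usable_mb() {
    local level=$1 n=$2 size=$3
    case $level in
        0)     echo $(( n * size )) ;;          # striping, no redundancy
        1)     echo "$size" ;;                  # every drive holds the same data
        3|4|5) echo $(( (n - 1) * size )) ;;    # one drive's worth holds parity
        *)     echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

# The example in this section: RAID 5 on six 2.04 gigabyte drives.
usable_mb 5 6 2040
```

This prints 10200, i.e. about 10.2 gigabytes. The spare drive configured later in this section does not count toward the total, since it holds no data until it replaces a failed drive.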

  1. Power on the computer with the serial terminal connected to the RaidRunner's serial port.
  2. When the husky ( :raid; ) prompt appears, start the GUI by typing "agui" and pressing return.
  3. When the main screen appears, select "H" for [H]ostport configuration
  4. On some models of RaidRunner the host port is not configurable. If you have only a [Q]uit option here then there is nothing further to be done for the host port configuration; note the values and skip to step 6. If you have add/modify options then your host port is software configurable.
  5. If there is no entry for a host port on this screen, add an entry with the parameters: controller=0, hostport=0, SCSI ID=0. Don't forget to [I]nstall your changes. If there is already an entry present, note the values (they will be used in a later step).
  6. From this point onward I will assume the following hardware configuration:
    1. There are 7 - 2.04 gig drives connected as follows:
      1. 2 drives on SCSI channel 0 with SCSI IDs 0 and 1 (backends 0.0.0, and 0.1.0, respectively).
      2. 3 drives on SCSI channel 1 with SCSI IDs 0, 1 and 5 (backends 1.0.0, 1.1.0, and 1.5.0).
      3. 2 drives on SCSI channel 2 with SCSI IDs 0 and 1 (backends 2.0.0 and 2.1.0).
    2. Therefore:
      1. Rank 0 consists of backends 0.0.0, 1.0.0, 2.0.0
      2. Rank 1 consists of backends 0.1.0, 1.1.0, 2.1.0
      3. Rank 5 contains only the backend 1.5.0
    3. The RaidRunner is assigned to controller 0, hostport 0
  7. Press Q to [Q]uit the hostports screen and return to the Main screen.
  8. Press S to enter the [S]pares screen
  9. Select A to [A]dd a new spare to the spares pool. A list of available backends will be displayed and you will be prompted for the following information:
     Enter the device name to add to spares - from above:
     enter D1.5.0
  10. Select I to [I]nstall your changes
  11. Select Q to [Q]uit the spares screen and return to the Main screen
  12. Select R from the Main screen to enter the [R]aidsets screen.
  13. Select A to [A]dd a new RAID set. You will be prompted for each of the RAID set parameters. The prompts and responses are given below.
    1. Enter the name of Raid Set: cim_homes (or whatever you want to call it).
    2. Raid set type [0,1,3,5]: 5
    3. Enter initial host interface - ctlr,hostport,scsilun: 0.0.0
       Now a list of the available backends will be displayed in the form:
       0 - D0.0.0
       1 - D1.0.0
       2 - D2.0.0
       3 - D0.1.0
       4 - D1.1.0
       5 - D2.1.0
    4. Enter index from above - Q to Quit: 1 press return 2 press return 3 press return 4 press return 5 press return Q
  14. After pressing Q you will be returned to the Raid Sets screen. You should see the newly configured Raid set displayed in the data area.
  15. Press I to [I]nstall the changes
      <Figure 12: The RaidSets screen of the GUI showing the newly configured RAID 5>
  16. Press Q to exit the RaidSet screen and return to the Main screen
  17. Press Q to [Q]uit agui and exit to the husky prompt.
  18. type "reboot" then press enter. This will reboot the RaidRunner (not the host machine.)
  19. When the RaidRunner reboots it will prepare the drives for the newly configured RAID. NOTE: Depending on the size of the RAID this could take a few minutes to a few hours. For the above example it takes the 5070 approximately 10 - 20 minutes to stripe the RAID set.
  20. Once you see the husky prompt again the RAID is ready for use. You can then proceed with the Linux configuration.
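The rank assignments assumed in step 6 of the session above follow directly from the SCSI IDs: all backends that share an ID belong to the same rank. The following sketch groups a list of backend names that way. It is a note-keeping aid for the host side, not something the RaidRunner needs, and the function name ranks is hypothetical.

```shell
#!/bin/bash
# Reads backend names (one per line, e.g. D0.0.0) on standard input
# and prints the members of each rank. The rank is the middle field.
ranks() {
    awk -F. '
        { members[$2] = members[$2] " " $0 }
        END { for (r in members) print "rank " r ":" members[r] }
    ' | sort
}

# The seven drives assumed in the example session.
printf '%s\n' D0.0.0 D0.1.0 D1.0.0 D1.1.0 D1.5.0 D2.0.0 D2.1.0 | ranks
```

For the example hardware this lists ranks 0 and 1 with three backends each and rank 5 with the single backend D1.5.0, which is the drive added to the spares pool.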

8. Linux Configuration

These instructions cover setting up the virtual RAID drives on RedHat Linux 6.1. Setting it up under other Linux distributions should not be a problem. The same general instructions apply.

If you are new to Linux you may want to consider installing Linux from scratch since the RedHat installer will do most of the configuration work for you. If so, skip to the section titled "New Linux Installation." Otherwise go to the "Existing Linux Installation" section (next).

8.1 Existing Linux Installation

Follow these instructions if you already have RedHat Linux installed on your system and you do not want to re-install. If you are installing the RAID as part of a new RedHat Linux installation (or are re-installing) skip to the "New Linux Installation" section.

QLogic SCSI Driver

The driver can either be loaded as a module or compiled into your kernel. If you want to boot from the RAID then you may want to use a kernel with compiled in QLogic support (see the kernel-HOWTO available from http://www.linuxdoc.org). To use the modular driver, become the superuser and add the following line to /etc/conf.modules:

alias qlogicpti /lib/modules/preferred/scsi/qlogicpti 

Change the above path to wherever your SCSI modules live. Then add the following line to your /etc/fstab (with the appropriate changes for device and mount point; see the fstab man page if you are unsure):

/dev/sdc1 /home ext2 defaults 1 2

Or, if you prefer to use a SYSV initialization script, create a file called "raid" in the /etc/rc.d/init.d directory with the following contents (NOTE: while there are a few good reasons to start the RAID using a script, one of the aforementioned methods would be preferable):

#!/bin/bash
# SysV init script: load the qlogicpti driver and mount the RAID volume.

case "$1" in
      start)
            echo "Loading raid module"
            /sbin/modprobe qlogicpti
            echo
            echo "Checking and Mounting raid volumes..."
            mount -t ext2 -o check /dev/sdc1 /home
            # Lock file tells the init system that this service is running.
            touch /var/lock/subsys/raid
            ;;
      stop)
            echo "Unmounting raid volumes"
            umount /home
            echo "Removing raid module(s)"
            /sbin/rmmod qlogicpti
            rm -f /var/lock/subsys/raid
            echo
            ;;
      restart)
            "$0" stop
            "$0" start
            ;;
      *)
            echo "Usage: raid {start|stop|restart}"
            exit 1
            ;;
esac

exit 0

You will need to edit this example and substitute your device name(s) in place of /dev/sdc1 and mount point(s) in place of /home. The next step is to make the script executable by root by doing:

chmod 0700 /etc/rc.d/init.d/raid

Now use your run level editor of choice (tksysv, ksysv, etc.) to add the script to the appropriate run level.

Device mappings

Linux uses dynamic device mappings. You can determine whether the drives were found by typing:

more /proc/scsi/scsi

One or more of the entries should look something like this:

Host: scsi1 Channel: 00 Id: 00 Lun: 00 
Vendor: ANTARES Model: CX106 Rev: 0109
Type: Direct-Access ANSI SCSI revision: 02

There may also be one which looks like this:

Host: scsi1 Channel: 00 Id: 00 Lun: 07
Vendor: ANTARES Model: CX106-SMON Rev: 0109
Type: Direct-Access ANSI SCSI revision: 02

This is the SCSI monitor communications channel, which is currently unused under Linux (see SMON in the "Advanced Topics" section below).
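Picking the controller's entries out of /proc/scsi/scsi by eye gets tedious on a host with many SCSI devices. The sketch below lists only the ANTARES entries and flags the monitor channel. It assumes the entry layout shown in the listings above (Host line followed by a Vendor line) and a model name without embedded spaces; list_antares is a hypothetical function name.

```shell
#!/bin/bash
# Usage: list_antares < /proc/scsi/scsi
# Prints one line per ANTARES device: host, SCSI ID, LUN, model, and
# whether it is a virtual disk or the SMON monitor channel.
list_antares() {
    awk '
        $1 == "Host:" { host = $2; id = $6; lun = $8 }
        $1 == "Vendor:" && $2 == "ANTARES" {
            model = $4
            note = (model ~ /SMON/) ? "monitor channel, do not use" : "virtual disk"
            print host, "id", id, "lun", lun, model, "(" note ")"
        }
    '
}
```

With the two entries shown above as input, the first is reported as a virtual disk on LUN 00 and the second as the monitor channel on LUN 07.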

To locate the drives (following reboot) type:

dmesg | more

Locate the section of the boot messages pertaining to your SCSI devices. You should see something like this:

qpti0: IRQ 53 SCSI ID 7 (Firmware v1.31.32)(Firmware 1.25 96/10/15) 
[Ultra Wide, using single ended interface]
QPTI: Total of 1 PTI Qlogic/ISP hosts found, 1 actually in use.
scsi1 : PTI Qlogic,ISP SBUS SCSI irq 53 regs at fd018000 PROM node ffd746e0
 

This indicates that the SCSI controller was properly recognized. Below this, look for the disk section:

Vendor ANTARES Model: CX106 Rev: 0109
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi1, channel 0, id 0, lun 0
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 20971200 [10239 MB] [10.2 GB]

Note the line that reads "Detected scsi disk sdc ...": it tells you that this virtual disk has been mapped to device /dev/sdc. Following partitioning, the first partition will be /dev/sdc1, the second will be /dev/sdc2, etc. There should be one of the above disk sections for each virtual disk that was detected. There may also be an entry like the following:

Vendor ANTARES Model: CX106-SMON Rev: 0109
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdd at scsi1, channel 0, id 0, lun 7
SCSI device sdd: hdwr sector= 512 bytes. Sectors= 20971200 [128 MB] [128.2 MB]

BEWARE: this is not a drive. DO NOT try to fdisk, mkfs, or mount it!! Doing so WILL hang your system.
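
The device mapping can also be pulled out of the boot messages with a small filter. This is only a sketch: the function name antares_disks is invented here, and it assumes the "Vendor" and "Detected scsi disk" lines appear in the order shown above.

```shell
#!/bin/sh
# antares_disks (hypothetical helper): read dmesg-style text on stdin.
# For each "Detected scsi disk" line that follows an ANTARES vendor line,
# print the device name and whether it is a virtual drive or the SMON channel.
antares_disks() {
    awk '
        /^ *Vendor:? / {
            antares = ($0 ~ /ANTARES/)   # remember who the next disk belongs to
            smon    = ($0 ~ /SMON/)
        }
        /Detected scsi disk/ && antares {
            print "/dev/" $4, (smon ? "SMON-do-not-use" : "virtual-drive")
            antares = 0
        }
    '
}

# Typical use on the host:
#   dmesg | antares_disks
```

Only devices reported as virtual-drive are candidates for fdisk, mkfs, or mount.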

Partitioning

A virtual drive appears to the host operating system as a large but otherwise ordinary SCSI drive. Partitioning is performed using fdisk or your favorite utility. You will have to give the virtual drive a disk label when fdisk is started. Using the choice "Custom with autoprobed defaults" seems to work well. See the man page for the given utility for details.

Installing a filesystem

Installing a filesystem is no different from any other SCSI drive:

mkfs -t <filesystem_type> /dev/<device>

for example:

mkfs -t ext2 /dev/sdc1

Mounting

If QLogic SCSI support is compiled into your kernel OR you are loading the "qlogicpti" module at boot from /etc/conf.modules then add the following line(s) to /etc/fstab:

/dev/<device> <mount point> ext2 defaults 1 1
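
For instance, with the device /dev/sdc1 and mount point /home used earlier, the entry would read "/dev/sdc1 /home ext2 defaults 1 1". The following sketch appends such an entry only if the device is not already listed. The function name add_fstab_entry is an invention for this example; try it on a copy of the file before pointing it at /etc/fstab.

```shell
#!/bin/sh
# add_fstab_entry FSTAB DEVICE MOUNTPOINT (hypothetical helper)
# Appends "DEVICE MOUNTPOINT ext2 defaults 1 1" to FSTAB unless DEVICE
# already appears as the first field of some line.
add_fstab_entry() {
    fstab=$1; dev=$2; mnt=$3
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$fstab"; then
        echo "$dev already listed in $fstab" >&2
        return 0
    fi
    printf '%s %s ext2 defaults 1 1\n' "$dev" "$mnt" >> "$fstab"
}

# Typical use (as root, after testing on a copy):
#   add_fstab_entry /etc/fstab /dev/sdc1 /home
```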

If you are using a SystemV initialization script to load/unload the module you must mount/unmount the drives there as well. See the example script above.

8.2 New Linux Installation

This is the easiest way to install the RAID since the RedHat installer program will do most of the work for you.

  1. Configure the host port, RAID sets, and spares as outlined in "Onboard Configuration." Your computer must be on to perform this step since the 5070 is powered from the SBUS. It does not matter whether the computer has an operating system installed at this point; all we need is power to the controller card.
  2. Begin the RedHat SparcLinux installation.
  3. The installation program will auto detect the 5070 controller and load the Qlogic driver.
  4. Your virtual RAID drives will appear as ordinary SCSI hard drives to be partitioned and formatted during the installation. NOTE: When using the graphical partitioning utility during the RedHat installation DO NOT designate any partition on the virtual drives as type RAID since they are already hardware managed virtual RAID drives. The RAID selection on the partitioning utilities screen is for setting up a software RAID. IMPORTANT NOTE: you may see a small SCSI drive (usually ~128 MB) on the list of available drives. DO NOT select this drive for use. It is the SMON communication channel, NOT a drive. If setup tries to use it the installer will hang.
  5. That's it, the installation program takes care of everything else!

9. Maintenance

9.1 Activating a spare

When running a RAID 3 or 5 (if you configured one or more drives to be spares) the 5070 will detect when a drive goes offline and automatically select a spare from the spares pool to replace it. The data will be rebuilt on-the-fly. The RAID will continue operating normally during the re-construction process (i.e. it can be read from and written to just as if nothing had happened). When a backend fails you will see messages similar to the following displayed on the 5070 console:

930 secs: Redo:1:1 Retry:1 (DIO_cim_homes_D1.1.0_q1) CDB=28(Read_10)Re-/Selection
 Time-out @682400+16
932 secs: Redo:1:1 Retry:2 (DIO_cim_homes_D1.1.0_q1) CDB=28(Read_10)Re-/Selection
 Time-out @682400+16
933 secs: Redo:1:1 Retry:3 (DIO_cim_homes_D1.1.0_q1) CDB=28(Read_10)Re-/Selection
 Time-out @682400+16
934 secs: CIO_cim_homes_q3 R5_W(3412000, 16): Pre-Read drive 4 (D1.1.0)
 fails with result "Re-/Selection Time-out"
934 secs: CIO_cim_homes_q2 R5: Drained alternate jobs for drive 4 (D1.1.0)
934 secs: CIO_cim_homes_q2 R5: Drained alternate jobs for drive 4 (D1.1.0)
 RPT 1/0
934 secs: CIO_cim_homes_q2 R5_W(524288, 16): Initial Pre-Read drive 4 (D1.1.0)
 fails with result "Re-/Selection Time-out"
935 secs: Redo:1:0 Retry:1 (DIO_cim_homes_D1.0.0_q1) CDB=28(Read_10)SCSI
 Bus ~Reset detected @210544+16
936 secs: Failed:1:1 Retry:0 (rconf) CDB=2A(Write_10)Re-/Selection Time-out
 @4194866+128

Then you will see the spare being pulled from the spares pool, spun up, tested, engaged, and the data reconstructed.

937 secs: autorepair pid=1149 /raid/cim_homes: Spinning up spare device
938 secs: autorepair pid=1149 /raid/cim_homes: Testing spare device/dev/hd/1.5.0/data
939 secs: autorepair pid=1149 /raid/cim_homes: engaging hot spare ...
939 secs: autorepair pid=1149 /raid/cim_homes: reconstructing drive 4 ...
939 secs: 1054
939 secs: Rebuild on /raid/cim_homes/repair: Max buffer 2800 in 7491 reads,
 priority 6 sleep 500

The rebuild script will print out its progress after every 10% of the job is completed:

939 secs: Rebuild on /raid/cim_homes/repair @ 0/7491
1920 secs: Rebuild on /raid/cim_homes/repair @ 1498/7491
2414 secs: Rebuild on /raid/cim_homes/repair @ 2247/7491
2906 secs: Rebuild on /raid/cim_homes/repair @ 2996/7491
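
If the console output is being captured to a file on another machine, the progress lines can be turned into percentages. This is a sketch only; the function name rebuild_percent is invented, and it assumes the "Rebuild on ... @ done/total" layout shown above.

```shell
#!/bin/sh
# rebuild_percent (hypothetical helper): read captured 5070 console text on
# stdin and print a rounded percentage for every "@ done/total" progress line.
rebuild_percent() {
    sed -n 's|.*Rebuild on .* @ \([0-9][0-9]*\)/\([0-9][0-9]*\).*|\1 \2|p' |
    while read -r blocks total; do
        [ "$total" -gt 0 ] || continue            # avoid dividing by zero
        echo "$(( ($blocks * 100 + $total / 2) / $total ))%"
    done
}

# Typical use, where console.log is a hypothetical capture file:
#   rebuild_percent < console.log
```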

9.2 Re-integrating a repaired drive into the RAID (levels 3 and 5)

After you have replaced the bad drive you must re-integrate it into the RAID set using the following procedure.

  1. Start the text GUI
  2. Look at the list of backends for the RAID set(s).
  3. Backends that have been marked faulty will have a (-) to the right of their ID ( e.g. D1.1.0- ).
  4. If you set up spares the ID of the faulty backend will be followed by the ID of the spare that has replaced it ( e.g. D1.1.0-D1.5.0 ) .
  5. Write down the ID(s) of the faulty backend(s) (NOT the spares).
  6. Press Q to exit agui
  7. At the husky prompt type:
    replace <name> <backend> 
    

Where <name> is whatever you named the raid set and <backend> is the ID of the backend that is being re-integrated into the RAID. If a spare was in use it will be automatically returned to the spares pool. Be patient, reconstruction can take a few minutes to several hours depending on the RAID level and the size. Fortunately, you can use the RAID as you normally would during this process.
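
To avoid transcription mistakes in step 5, the replace commands can be drafted on the host from the backend list as written down. This sketch runs in an ordinary host shell, not in husky. The function name replace_cmds is invented, and the raid set name cim_homes in the usage line is an assumption borrowed from the console messages above.

```shell
#!/bin/sh
# replace_cmds NAME BACKEND_LIST (hypothetical helper, host side only)
# BACKEND_LIST is the comma separated list from the text GUI. For every
# backend marked faulty with a trailing "-" (optionally followed by the ID
# of the spare that replaced it) print the matching husky replace command.
replace_cmds() {
    name=$1
    printf '%s\n' "$2" | tr ',' '\n' |
    sed -n 's/^ *\(D[0-9.]*\)-.*$/\1/p' |
    while read -r backend; do
        echo "replace $name $backend"
    done
}

# Typical use:
#   replace_cmds cim_homes "D0.0.0,D1.1.0-D1.5.0,D2.0.0"
```

The printed lines are then typed at the husky prompt once the repaired drives are back in place.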

10. Troubleshooting / Error Messages

10.1 Out of band temperature detected...

10.2 ... failed ... cannot have more than 1 faulty backend.

10.3 When booting I see: ... Sun disklabel: bad magic 0000 ... unknown partition table.

11. Bugs

None yet! Please send bug reports to tdc3@psu.edu

12. Frequently Asked Questions

12.1 How do I reset/erase the onboard configuration?

At the husky prompt issue the following command:

rconf -init

This will delete all of the RAID configuration information but not the global variables and scsi monitors. To remove ALL configuration information type:

rconf -fullinit

Use these commands with caution!

12.2 How can I tell if a drive in my RAID has failed?

In the text GUI faulty backends appear with a (-) to the right of their ID. For example the list of backends:

D0.0.0,D1.0.0-,D2.0.0,D0.1.0,D1.1.0,D2.1.0

Indicates that backend (drive) D1.0.0 is either faulty or not present. If you assigned spares (RAID 3 or 5) then you should also see that one or more spares are in use. Both the main and the RaidSets screens will show information on faulty/not present drives in a RAID set.
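
A backend list copied from the GUI can be summarized on the host with a short filter. This is a sketch under the assumption that a faulty backend is written as ID- or ID-SPARE, as described in the maintenance section; the function name faulty_backends is invented.

```shell
#!/bin/sh
# faulty_backends (hypothetical helper): read a comma separated backend list
# on stdin and describe every entry carrying the "-" fault marker.
faulty_backends() {
    tr ',' '\n' | awk -F '-' '
        NF > 1 {
            if ($2 != "") printf "%s faulty, spare %s in use\n", $1, $2
            else          printf "%s faulty or not present, no spare shown\n", $1
        }
    '
}

# Typical use:
#   echo "D0.0.0,D1.0.0-,D2.0.0,D0.1.0,D1.1.0,D2.1.0" | faulty_backends
```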

13. Advanced Topics: 5070 Command Reference

In addition to the text based GUI the RAID configuration may also be manipulated from the husky prompt ( the : raid; prompt) of the onboard controller. This section describes commands that a user can input interactively or via a script file to the K9 kernel. Since K9 is an ANSI C Application Programming Interface (API) a shell is needed to interpret user input and form output. Only one shell is currently available and it is called husky. The K9 kernel is modelled on the Plan 9 operating system whose design is discussed in several papers from AT&T (See the "Further Reading" section for more information). K9 is a kernel targeted at embedded controllers of small to medium complexity (e.g. ISDN-ethernet bridges, RAID controllers, etc). It supports multiple lightweight processes (i.e. without memory management) on a single CPU with a non-pre-emptive scheduler. Device driver architecture is based on Plan 9 (and Unix SVR4) STREAMS. Concurrency control mechanisms include semaphores and signals. The husky shell is modelled on a scaled down Unix Bourne shell.

Using the built-in commands the user can write new scripts thus extending the functionality of the 5070. The commands (adapted from the 5070 man pages) are extensive and are described below.

13.1 AUTOBOOT - script to automatically create all raid sets and scsi monitors

13.2 AUTOFAULT - script to automatically mark a backend faulty after a drive failure

13.3 AUTOREPAIR - script to automatically allocate a spare and reconstruct a raid set

After parsing its arguments (command and environment) autorepair gets a spare device from the RaidRunner's spares spool. It then engages it in write-only mode and reads the complete raid device which reconstructs the data on the spare. The read is from the raid file system repair entrypoint. Reading from this entrypoint causes a read of a block immediately followed by a write of that block. The read/write sequence is atomic (i.e. is not interruptible). Once the reconstruction has completed, a check is made to ensure the spare did not fail during reconstruction and if not, the access mode of the spare device is set to the access mode of the raid set. The process that reads the repair entrypoint is rebuild.

This device reconstruction will take anywhere from 10 minutes to one and a half hours depending on both the size and speed of the backends and the amount of activity the host is generating.

During device reconstruction, pairs of numbers will be printed indicating each 10% of data reconstructed. The pairs of numbers are separated by a slash character, the first number being the number of blocks reconstructed so far and the second being the number of blocks to be reconstructed. Further status about the rebuild can be gained from running rebuild.

When the spare is allocated both the number of spares currently used on the backend and the spare device name are printed. The number of spares on a backend is referred to as the depth of spares on the backend. Thus prior to re-engaging the spare after a reconstruction a check can be made to see if the depth is the same. If it is not, then the spare reconstruction failed and reconstruction using another spare is underway (or no spares are available), and hence we don't re-engage the drive.

13.4 BIND - combine elements of the namespace

13.5 BUZZER - get the state or turn on or off the buzzer

13.6 CACHE - display information about and delete cache ranges

13.7 CACHEDUMP - Dump the contents of the write cache to battery backed-up ram

13.8 CACHERESTORE - Load the cache with data from battery backed-up ram

13.9 CAT - concatenate files and print on the standard output

13.10 CMP - compare the contents of 2 files

13.11 CONS - console device for Husky

bind -k cons /dev/cons

On a Unix system this is equivalent to:

bind -k unixfd /dev/cons

On a DOS system this is equivalent to:

bind -k doscon /dev/cons

On target hardware using a SCN2681 chip this is equivalent to:

bind -k scn2681 /dev/cons

13.12 DD - copy a file (disk, etc)

The number after the "+" is the number of fractional blocks (i.e. blocks that are less than the block size) involved. This number will usually be zero; it is non-zero only when physical media with alignment requirements are involved.

A write failure while outputting the last block in the previous example would cause the following output:

Write failed
8+0 records in
7+0 records out
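
When dd output is captured to a file, the two record counts can be compared mechanically. A minimal sketch, assuming the "full+fractional records in/out" layout shown above; the function name dd_check is invented and only the full block counts are compared.

```shell
#!/bin/sh
# dd_check (hypothetical helper): read captured dd status lines on stdin and
# report whether as many full blocks were written as were read.
dd_check() {
    awk -F '[+ ]' '
        /records in/  { in_full  = $1 }
        /records out/ { out_full = $1 }
        END {
            if (in_full == out_full) print "complete"
            else print "short by " (in_full - out_full) " block(s)"
        }
    '
}

# Typical use, where dd.log is a hypothetical capture file:
#   dd_check < dd.log
```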

13.13 DEVSCMP - Compare a file's size against a given value

13.14 DFORMAT - Perform formatting functions on a backend disk drive

13.15 DIAGS - script to run a diagnostic on a given device

13.16 DPART - edit a scsihd disk partition table

13.17 DUP - open file descriptor device

13.18 ECHO - display a line of text

13.19 ENV - environment variables file system

13.20 ENVIRON - RaidRunner Global environment variables - names and effects

To over-ride the number of parity buffers for ALL raid 3's (and set 128 parity buffers) set

: raid ; setenv RAID3_Default_PBUFS 128

If you set a default for all raid sets of a particular type, but want ONE of them to be different then set up a variable for that particular raid set as its value will over-ride the default. In the above example, where all Raid Type 3 will have 128 parity buffers, you could set the variable

: raid ; setenv RAID3_Dbase_PBUFS 56 

which will allow the raid 3 raid set named 'Dbase' to have 56 parity buffers, but all other raid 3's defined on the RaidRunner will have 128.
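
The precedence rule can be illustrated with a small host-side sketch. This is not how the RaidRunner implements the lookup and it does not run in husky. It only mimics the rule in an ordinary POSIX shell, with the function name pbufs_for invented for the purpose.

```shell
#!/bin/sh
# pbufs_for LEVEL NAME (hypothetical illustration, host shell only)
# Prints the value of RAID<LEVEL>_<NAME>_PBUFS if it is set, otherwise the
# value of RAID<LEVEL>_Default_PBUFS, otherwise the word "unset".
# LEVEL and NAME must be valid shell identifier fragments.
pbufs_for() {
    level=$1; name=$2
    eval "specific=\${RAID${level}_${name}_PBUFS:-}"
    eval "fallback=\${RAID${level}_Default_PBUFS:-}"
    if [ -n "$specific" ]; then
        echo "$specific"
    elif [ -n "$fallback" ]; then
        echo "$fallback"
    else
        echo "unset"
    fi
}

# Mirror the two setenv lines above, then query two raid set names:
#   RAID3_Default_PBUFS=128; RAID3_Dbase_PBUFS=56
#   pbufs_for 3 Dbase
#   pbufs_for 3 Other
```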

13.21 EXEC - cause arguments to be executed in place of this shell

13.22 EXIT - exit a K9 process

13.23 EXPR - evaluation of numeric expressions

13.24 FALSE - returns the K9 false status

13.25 FIFO - bi-directional fifo buffer of fixed size

13.26 GET - select one value from list

13.27 GETIV - get the value of an internal RaidRunner variable

13.28 HELP - print a list of commands and their synopses

13.29 HUSKY - shell for K9 kernel

13.30 HWCONF - print various hardware configuration details

13.31 HWMON - monitoring daemon for temperature, fans, PSUs.

13.32 INTERNALS - Internal variables used by RaidRunner to change dynamics of running kernel

13.33 KILL - send a signal to the nominated process

13.34 LED - turn on/off LED's on RaidRunner

13.35 LFLASH - flash a led on RaidRunner

13.36 LINE - copies one line of standard input to standard output

13.37 LLENGTH - return the number of elements in the given list

13.38 LOG - like zero with additional logging of accesses

13.39 LRANGE - extract a range of elements from the given list

13.40 LS - list the files in a directory

13.41 LSEARCH - find a pattern in a list

13.42 LSUBSTR - replace a character in all elements of a list

13.43 MEM - memory mapped file (system)

13.44 MDEBUG - exercise and display statistics about memory allocation

13.45 MKDIR - create directory (or directories)

13.46 MKDISKFS - script to create a disk filesystem

13.47 MKHOSTFS - script to create a host port filesystem

13.48 MKRAID - script to create a raid given a line of output of rconf

13.49 MKRAIDFS - script to create a raid filesystem

13.50 MKSMON - script to start the scsi monitor daemon smon

13.51 MKSTARGD - script to initialize a scsi target daemon for a given raid set

13.52 MSTARGD - monitor for stargd

13.53 NICE - Change the K9 run-queue priority of a K9 process

13.54 NULL - file to throw away output in

13.55 PARACC - display information about hardware parity accelerator

13.56 PEDIT - Display/modify SCSI backend Mode Parameters Pages

13.57 PIPE - two way interprocess communication

13.58 PRANKS - print or set the accessible backend ranks for the current controller

13.59 PRINTENV - print one or all GLOBAL environment variables

13.60 PS - report process status

13.61 PSCSIRES - print SCSI-2 reservation table for all or specific monikers

13.62 PSTATUS - print the values of hardware status registers

13.63 RAIDACTION - script to gather/reset stats or stop/start a raid set's stargd

13.64 RAID0 - raid 0 device

13.65 RAID1 - raid 1 device

13.66 RAID3 - raid 3 device

13.67 RAID4 - raid 4 device

13.68 RAID5 - raid 5 device

13.69 RAM - ram based file system

13.70 RANDIO - simulate random reads and writes

13.71 RCONF, SPOOL, HCONF, MCONF, CORRUPT-CONFIG - raid configuration and spares management

13.72 REBOOT - exit K9 on target hardware + return to monitor

13.73 REBUILD - raid set reconstruction utility

13.74 REPAIR - script to allocate a spare to a raid set's failed backend

13.75 REPLACE - script to restore a backend in a raid set

13.76 RM - remove the file (or files)

13.77 RMON - Power-On Diagnostics and Bootstrap

13.78 RRSTRACE - disassemble scsihpmtr monitor data

13.79 RSIZE - estimate the memory usage for a given raid set

13.80 SCN2681 - access a scn2681 (serial IO device) as console

13.81 SCSICHIPS - print various details about a controller's scsi chips

13.82 SCSIHD - SCSI hard disk device (a SCSI initiator)

13.83 SCSIHP - SCSI target device

13.84 SET - set (or clear) an environment variable

13.85 SCSIHPMTR - turn on host port debugging

13.86 SETENV - set a GLOBAL environment variable

13.87 SDLIST - Set or display an internal list of attached disk drives

13.88 SETIV - set an internal RaidRunner variable

13.89 SHOWBAT - display information about battery backed-up ram

13.90 SHUTDOWN - script to place the RaidRunner into a shutdown or quiescent state

13.91 SLEEP - sleep for the given number of seconds

13.92 SMON - RaidRunner SCSI monitor daemon

13.93 SOS - pulse the buzzer to emit sos's

13.94 SPEEDTST - Generate a set number of sequential writes then reads

13.95 SPIND - Spin up or down a disk device

13.96 SPINDLE - Modify Spindle Synchronization on a disk device

13.97 SRANKS - set the accessible backend ranks for a controller

13.98 STARGD - daemon for SCSI-2 target

13.99 STAT - get status information on the named files (or stdin)

13.100 STATS - Print cumulative performance statistics on a Raid Set or Cache Range

13.101 STRING - perform a string operation on a given value

13.102 SUFFIX - Suffixes permitted on some big decimal numbers

13.103 SYSLOG - device to send system messages for logging

13.104 SYSLOGD - initialize or access messages in the system log area

  1. EMERG: messages of an extremely serious nature from which the RaidRunner cannot recover
  2. ALERT: messages of a serious nature from which the RaidRunner can only partially recover
  3. CRIT: messages of a serious nature from which the RaidRunner can almost fully recover
  4. ERR: messages indicating internal errors
  5. WARNING: messages of a serious nature from which the RaidRunner can fully recover, for example automatic allocation of hot spare to Raid 1, 3 or 5 file system.
  6. NOTICE: messages logged via writes to syslog device
  7. INFO: informative messages
  8. DEBUG: debugging messages
  9. REPEATS: Indicates that the previous message has been repeated N times every S seconds since its initial entry.

13.105 TEST - condition evaluation command

13.106 TIME - Print the number of seconds since boot (or reset of clock)

13.107 TRAP - intercept a signal and perform some action

13.108 TRUE - returns the K9 true status

13.109 STTY or TTY - print the user's terminal mount point or terminfo status

13.110 UNSET - delete one or more environment variables

13.111 UNSETENV - unset (delete) a GLOBAL environment variable

13.112 VERSION - print out the version of the RaidRunner kernel

13.113 WAIT - wait for a process (or my children) to terminate

13.114 WARBLE - periodically pulse the buzzer

13.115 XD - dump given file(s) in hexa-decimal to standard out

13.116 ZAP - write zeros to a file

SYNOPSIS: zap [-b blockSize] [-f byteVal] count offset <>[3] store

DESCRIPTION: zap writes count * 8192 bytes of zeros at byte position offset * 8192 into file store (which is opened and associated with file descriptor 3). Both count and offset may have a suffix. The optional "-b" switch allows the block size to be set to blockSize bytes. The default block size is 8192 bytes. The optional "-f" switch allows the fill character to be set to byteVal which should be a number in the range 0 to 255 (inclusive). The default fill character is 0 (i.e. zero). Every 100 write operations the current count is output (usually overwriting the previous count output). Errors on the write operations are ignored.

SEE ALSO: suffix
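
The arithmetic in the description can be checked on the host before running zap. This sketch uses the default block size of 8192 bytes and plain integer arguments only (no suffixes); the function name zap_extent is invented for the example.

```shell
#!/bin/sh
# zap_extent COUNT OFFSET (hypothetical helper, host shell only)
# Prints the starting byte position and the number of bytes that
# "zap COUNT OFFSET" would cover with the default 8192 byte block size.
# Suffixed values (see suffix) are not handled here.
zap_extent() {
    count=$1; offset=$2; block=8192
    echo "start=$(( $offset * $block )) length=$(( $count * $block ))"
}

# Typical use:
#   zap_extent 4 2
```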

13.117 ZCACHE - Manipulate the zone optimization IO table of a Raid Set's cache

13.118 ZERO - file when read yields zeros continuously

13.119 ZLABELS - Write zeros to the front and end of Raid Sets

14. Advanced Topics: SCSI Monitor Daemon (SMON)

Another way of communicating with the onboard controller from the host operating system is using the SCSI Monitor (SMON) facility. SMON provides an ASCII communication channel on an assigned SCSI ID and LUN. The commands discussed in section 7 may also be issued over this channel to manipulate the RAID configuration and operation. This mechanism is utilized under Solaris to provide a communication channel between an X Based GUI and the RAID controller. It is currently un-utilized under Linux. See the description of the smon daemon in the 5070 command reference above.

15. Further Reading