POST POSITIONS: THE PCI STORAGE REVOLUTION
Steve Katz
Issue: June 1, 2009

PCI attachment (based on the PCIe architecture) provides maximum bandwidth at minimum cost per storage unit by eliminating the secondary bus bridges required by all other connection methods. Since the host server already includes a PCI (Peripheral Component Interconnect) bridge to provide communication between the frontside (system) bus, with its attached CPU(s), and the peripherals, connecting storage directly to that bridge creates the closest, widest-bandwidth interface possible, yielding high-bandwidth, cost-effective direct-attached storage.

HOW IT WORKS

Using a simple host adapter card plugged into a multilane PCI Express slot on the host computer's motherboard, we create an external PCIe I/O port for cable attachment. This is not a bus bridge; it is simply a pass-through that allows cable attachment to the host PCIe bus.
We then extend the PCIe bus over an external cable to a data storage system containing a PCIe-to-SAS/SATA adapter. Normally, that adapter is a "hardware RAID" controller with its own high-speed processor, enabling RAID functionality for high performance and fault tolerance. The connection from the PCIe host to that storage system RAID controller runs at the full bandwidth of the PCIe link it is attached to, which ensures the utmost in data throughput rates.
First-generation PCIe signals at 2.5GT/s per lane, providing 500MB/s of bandwidth per "lane" (250MB/s in each direction of the full-duplex link). Today, common PCIe slots are x4, x8 and x16, i.e., 4, 8 or 16 lanes. At 500MB/s per lane, the ubiquitous x8 (eight-lane) slot provides a theoretical maximum of 4,000MB/s; in practice, real-world hardware achieves about 3,200MB/s.
Second-generation PCIe doubles the signaling rate to 5.0GT/s, for 1GB/s of bandwidth per lane. Thus, a second-generation PCIe x8 slot can provide up to 8,000MB/s of theoretical throughput. This is faster than can be provided by 8Gb/s Fibre Channel, 12X InfiniBand, 10Gb Ethernet or any other external bus architecture.
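
As a back-of-the-envelope check, the short Python sketch below tabulates theoretical slot bandwidth by generation and lane width. The per-lane figures are the ones quoted above, counting both directions of the full-duplex link; this is a sketch of the arithmetic, not a measurement of any particular hardware.

    # Theoretical PCIe slot bandwidth from the per-lane figures quoted
    # above (full-duplex aggregate: Gen1 = 500MB/s, Gen2 = 1,000MB/s).
    PER_LANE_MBPS = {1: 500, 2: 1000}  # PCIe generation -> MB/s per lane

    def slot_bandwidth_mbps(generation: int, lanes: int) -> int:
        """Theoretical transfer rate for a PCIe slot of the given width."""
        return PER_LANE_MBPS[generation] * lanes

    for gen in (1, 2):
        for lanes in (4, 8, 16):
            print(f"PCIe Gen{gen} x{lanes}: {slot_bandwidth_mbps(gen, lanes):,}MB/s")

    # Gen1 x8 -> 4,000MB/s theoretical (about 3,200MB/s in practice)
    # Gen2 x8 -> 8,000MB/s theoretical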
The resident hardware PCIe RAID controller, co-located with the disk storage subsystem, handles disk drive management without burdening host CPU resources the way a software RAID system would.
By giving each RAID controller a small batch of disk devices, e.g., eight disks per controller, we can maximize controller throughput and approach the theoretical maximum data transfer rates of the disk devices themselves. Eight 7,200rpm SATA drives, each with an 80MB/s maximum transfer rate, provide up to 640MB/s in aggregate.
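
The arithmetic is simple enough to model directly. Here is a minimal Python sketch using the 80MB/s per-drive figure above; it deliberately ignores RAID parity and controller overhead, so it is an upper bound rather than a benchmark:

    # Aggregate streaming throughput of a small RAID set, ignoring
    # parity and controller overhead (a deliberate simplification).
    DRIVE_MBPS = 80  # max transfer rate of one 7,200rpm SATA drive

    def raid_set_mbps(drives: int, per_drive_mbps: int = DRIVE_MBPS) -> int:
        return drives * per_drive_mbps

    print(raid_set_mbps(8))  # 640MB/s for eight drives on one controller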

NOW FOR A SWITCH

Since PCIe is a serial, switched architecture, the way to aggregate connected devices is with switches. The biggest difference from Fibre Channel or Ethernet is that the switches run at higher speeds and sit electrically closer to the CPU.
We do this either to add storage capacity, to improve data transfer rates by throughput aggregation, or both.
Replacing the single-port host card with a two-port PCIe switch (a PCIe card) allows connection of two PCIe RAID controllers to a single slot, using two cables. The two controllers may sit in two different storage shelves, or be co-located in the same shelf in two different I/O modules, each addressing eight of the 16 resident drives. Aggregate data transfer increases from 640MB/s to about 1.28GB/s, all with only 16 disks.
We can continue adding PCIe switches, which introduce minimal latency, to aggregate bandwidth until the host PCIe bus is saturated. With PCIe 2.0 and a 16-lane connection, that happens at 16GB/s. At 1GB/s per Gen 2 lane and disks that read and write at 80MB/s burst speeds, saturation takes about 200 disk drives (200 x 80MB/s = 16,000MB/s). Using 1TB SATA disks, this builds an array of 200TB at a system cost of about $1/GB. 1.5TB SATA drives are now available, and 2TB drives are on most suppliers' roadmaps for mid-2009.
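
The scale-out math can be sketched the same way. The Python snippet below uses the drive and lane rates quoted above and treats switch latency as negligible, per the argument in this section; it is a sizing estimate, not a guarantee of delivered throughput:

    # How many 80MB/s drives saturate a PCIe Gen2 x16 host link,
    # and what the resulting array looks like with 1TB drives.
    import math

    LANE_MBPS_GEN2 = 1000  # MB/s per PCIe 2.0 lane (full-duplex aggregate)
    DRIVE_MBPS = 80        # burst rate of one SATA drive
    DRIVE_TB = 1           # capacity per drive

    def drives_to_saturate(lanes: int) -> int:
        link_mbps = lanes * LANE_MBPS_GEN2
        return math.ceil(link_mbps / DRIVE_MBPS)

    n = drives_to_saturate(16)
    print(f"{n} drives saturate a Gen2 x16 link")  # 200 drives
    print(f"array capacity: {n * DRIVE_TB}TB")     # 200TB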
There isn't any other technology that accomplishes so much so inexpensively. Expansion is easy and comes with virtually no hit in performance!

OOPS, OUT OF SLOTS!

PCI Express is a simple switched architecture, and devices for building bus expansion products are commonly available. A simple rack-mount unit contains a multi-slot PCIe backplane connected (by external PCIe cable) to a host computer, providing additional PCIe slots that are an extension of the computer's internal bus. Performance is the same as the host computer's internal slots; they just happen to be external. Any PCIe-compatible peripheral may be plugged into such an expansion chassis, just as though it were plugged into the host.
This kind of product lets us use additional PCIe switches to connect more storage units, or plug in HD-plus video capture cards, 10GigE cards or any other PCIe devices.
This is the great thing about PCIe: it's ubiquitous and already used for just about everything that connects to modern computers. It's also backwards compatible with original PCI, so software written for original (parallel) PCI devices will also work with serial PCIe hardware. PCIe devices can also be "hot pluggable," added to or removed from a system without powering it down.

IN SUMMARY

PCI-attached storage is the way of the future. It's high bandwidth, high performance and low cost, and it expands easily with virtually no performance penalty. Scalable systems can handle hundreds of SD streams, tens of HD streams and multiple 2K and 4K workflows at cost points previously unattainable.

Steve Katz is an executive at JMR Electronics (www.jmr.com) in Chatsworth, CA. His 12 years there follow a career in aerospace applications engineering and field sales, as well as time at HP in product design.