Saturday, January 17, 2015

Crazy Performance From Something So Small

So, I did a refresh on my home machine recently, or really just bought an entirely new machine... I picked up a used Dell Precision T7500 workstation (24 GB memory, 2 x Xeon W5590 processors). I also bought a used Fusion-io ioDrive 160 GB SLC flash memory device. I knew it was going to be fast, but I was surprised at just how fast such a little card turned out to be.

I'm running Fedora 21 "Workstation" on this system. The driver for this card, called "VSL", is available from fusionio.com, but you need to create an account first to access it. It also appears there is a newer version of the driver/firmware if you pay for a support contract. I used version 2.3.11 of the driver, which lists support for Fedora 17. The driver is written for older kernels, so I had to change it a bit to work with 3.x kernels. Let me know if you're interested in the changes needed for newer kernels.

Anyhow, here is a quick peek at the performance numbers on this system using the fio tool...

--snip--
# fio --bs=4k --direct=1 --rw=randread --ioengine=libaio --iodepth=64 --name=/dev/fioa --size=10G
/dev/fioa: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [750.3MB/0KB/0KB /s] [192K/0/0 iops] [eta 00m:00s]
/dev/fioa: (groupid=0, jobs=1): err= 0: pid=1406: Sat Jan 17 11:00:38 2015
  read : io=10240MB, bw=763767KB/s, iops=190941, runt= 13729msec
    slat (usec): min=1, max=172, avg= 2.85, stdev= 2.61
    clat (usec): min=199, max=3604, avg=331.24, stdev=77.36
     lat (usec): min=201, max=3625, avg=334.22, stdev=77.33
    clat percentiles (usec):
     |  1.00th=[  245],  5.00th=[  253], 10.00th=[  270], 20.00th=[  294],
     | 30.00th=[  318], 40.00th=[  326], 50.00th=[  330], 60.00th=[  330],
     | 70.00th=[  334], 80.00th=[  350], 90.00th=[  402], 95.00th=[  426],
     | 99.00th=[  454], 99.50th=[  462], 99.90th=[  540], 99.95th=[ 2544],
     | 99.99th=[ 2992]
    bw (KB  /s): min=673840, max=768568, per=100.00%, avg=763737.48, stdev=18102.33
    lat (usec) : 250=3.48%, 500=96.39%, 750=0.04%, 1000=0.01%
    lat (msec) : 2=0.03%, 4=0.06%
  cpu          : usr=23.24%, sys=62.81%, ctx=254638, majf=0, minf=664
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=2621440/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=10240MB, aggrb=763767KB/s, minb=763767KB/s, maxb=763767KB/s, mint=13729msec, maxt=13729msec

Disk stats (read/write):
  fioa: ios=2607327/0, merge=31/0, ticks=815401/0, in_queue=815145, util=99.34%
--snip--

--snip--
# fio --bs=4k --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/fioa --size=10G
/dev/fioa: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0KB/747.3MB/0KB /s] [0/191K/0 iops] [eta 00m:00s]
/dev/fioa: (groupid=0, jobs=1): err= 0: pid=1433: Sat Jan 17 11:01:49 2015
  write: io=10240MB, bw=746955KB/s, iops=186738, runt= 14038msec
    slat (usec): min=1, max=192, avg= 3.33, stdev= 2.83
    clat (usec): min=192, max=3048, avg=338.28, stdev=70.32
     lat (usec): min=194, max=3052, avg=341.74, stdev=70.41
    clat percentiles (usec):
     |  1.00th=[  262],  5.00th=[  282], 10.00th=[  298], 20.00th=[  310],
     | 30.00th=[  318], 40.00th=[  322], 50.00th=[  330], 60.00th=[  334],
     | 70.00th=[  342], 80.00th=[  366], 90.00th=[  398], 95.00th=[  414],
     | 99.00th=[  454], 99.50th=[  478], 99.90th=[ 1144], 99.95th=[ 2024],
     | 99.99th=[ 2800]
    bw (KB  /s): min=660624, max=765872, per=99.99%, avg=746907.14, stdev=25759.49
    lat (usec) : 250=0.32%, 500=99.39%, 750=0.18%, 1000=0.01%
    lat (msec) : 2=0.06%, 4=0.05%
  cpu          : usr=23.67%, sys=68.75%, ctx=110028, majf=0, minf=431
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=2621440/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=10240MB, aggrb=746955KB/s, minb=746955KB/s, maxb=746955KB/s, mint=14038msec, maxt=14038msec

Disk stats (read/write):
  fioa: ios=109/2595463, merge=110/28, ticks=9/814160, in_queue=813744, util=99.39%
--snip--

So, in the first of those tests (100% random, 100% read with 4K IOs), I'm getting about 191K IOPS (190,941 on average, peaking around 192K)! And in the second test (100% random, 100% write with 4K IOs), it's about 187K IOPS (186,738 on average, peaking around 191K)! That's pretty fast for such a little package... just a single PCIe flash device.
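As a sanity check on the fio output above, the reported numbers should hang together: bandwidth is roughly IOPS times block size, and with a fixed queue depth the IOPS should be roughly the queue depth divided by the average latency (Little's law). Here is a minimal sketch of that arithmetic in Python. The function and variable names are mine, purely for illustration, and the inputs are copied by hand from the two random IO runs.

```python
# Cross-check the fio random IO results (values copied from the output above).
BLOCK_SIZE_KIB = 4   # --bs=4k
IODEPTH = 64         # --iodepth=64


def bandwidth_kib_s(iops, block_kib=BLOCK_SIZE_KIB):
    """Bandwidth in KiB/s implied by an IOPS figure at a given block size."""
    return iops * block_kib


def iops_from_latency(iodepth, avg_lat_usec):
    """Little's law: throughput = requests in flight / mean latency."""
    return iodepth / (avg_lat_usec / 1_000_000)


# Reported averages: read iops=190941, write iops=186738
read_bw = bandwidth_kib_s(190941)
write_bw = bandwidth_kib_s(186738)

# Reported average total latency: read 334.22 usec, write 341.74 usec
read_iops_est = iops_from_latency(IODEPTH, 334.22)
write_iops_est = iops_from_latency(IODEPTH, 341.74)

print(f"read : {read_bw} KiB/s implied, fio reported 763767KB/s")
print(f"write: {write_bw} KiB/s implied, fio reported 746955KB/s")
print(f"read : ~{read_iops_est:.0f} IOPS from latency, fio reported 190941")
print(f"write: ~{write_iops_est:.0f} IOPS from latency, fio reported 186738")
```

Both estimates land within about one percent of what fio printed, which suggests the queue of 64 requests was kept full and the device latency (roughly a third of a millisecond per IO at this depth) is what sets the IOPS ceiling in these runs.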

And for some sequential IO tests with a much larger IO size...

--snip--
# fio --bs=4m --direct=1 --rw=read --ioengine=libaio --iodepth=64 --name=/dev/fioa --size=10G
/dev/fioa: (g=0): rw=read, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
Jobs: 1 (f=1): [R] [92.9% done] [800.0MB/0KB/0KB /s] [200/0/0 iops] [eta 00m:01s]
/dev/fioa: (groupid=0, jobs=1): err= 0: pid=1452: Sat Jan 17 11:06:41 2015
  read : io=10240MB, bw=819392KB/s, iops=200, runt= 12797msec
    slat (usec): min=110, max=19743, avg=4959.85, stdev=8374.12
    clat (msec): min=92, max=392, avg=312.69, stdev=20.35
     lat (msec): min=92, max=411, avg=317.65, stdev=18.69
    clat percentiles (msec):
     |  1.00th=[  212],  5.00th=[  302], 10.00th=[  302], 20.00th=[  302],
     | 30.00th=[  322], 40.00th=[  322], 50.00th=[  322], 60.00th=[  322],
     | 70.00th=[  322], 80.00th=[  322], 90.00th=[  322], 95.00th=[  322],
     | 99.00th=[  322], 99.50th=[  334], 99.90th=[  392], 99.95th=[  392],
     | 99.99th=[  392]
    bw (KB  /s): min=442593, max=835584, per=97.73%, avg=800802.08, stdev=80018.74
    lat (msec) : 100=0.12%, 250=1.33%, 500=98.55%
  cpu          : usr=0.07%, sys=3.44%, ctx=1028, majf=0, minf=65543
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=0.6%, 32=1.2%, >=64=97.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=2560/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=10240MB, aggrb=819392KB/s, minb=819392KB/s, maxb=819392KB/s, mint=12797msec, maxt=12797msec

Disk stats (read/write):
  fioa: ios=20256/0, merge=0/0, ticks=1799659/0, in_queue=1806118, util=99.29%
--snip--

--snip--
# fio --bs=4m --direct=1 --rw=write --ioengine=libaio --iodepth=64 --name=/dev/fioa --size=10G
/dev/fioa: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
Jobs: 1 (f=1): [W] [100.0% done] [0KB/767.3MB/0KB /s] [0/191/0 iops] [eta 00m:00s]
/dev/fioa: (groupid=0, jobs=1): err= 0: pid=1448: Sat Jan 17 11:06:11 2015
  write: io=10240MB, bw=786157KB/s, iops=191, runt= 13338msec
    slat (usec): min=124, max=20466, avg=5167.94, stdev=8529.73
    clat (msec): min=99, max=412, avg=326.06, stdev=21.06
     lat (msec): min=99, max=413, avg=331.23, stdev=19.41
    clat percentiles (msec):
     |  1.00th=[  225],  5.00th=[  314], 10.00th=[  314], 20.00th=[  314],
     | 30.00th=[  334], 40.00th=[  334], 50.00th=[  334], 60.00th=[  334],
     | 70.00th=[  334], 80.00th=[  334], 90.00th=[  334], 95.00th=[  334],
     | 99.00th=[  334], 99.50th=[  351], 99.90th=[  412], 99.95th=[  412],
     | 99.99th=[  412]
    bw (KB  /s): min=407157, max=802816, per=98.28%, avg=772616.08, stdev=74921.31
    lat (msec) : 100=0.12%, 250=1.17%, 500=98.71%
  cpu          : usr=3.31%, sys=2.05%, ctx=1139, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=0.6%, 32=1.2%, >=64=97.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=2560/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=10240MB, aggrb=786156KB/s, minb=786156KB/s, maxb=786156KB/s, mint=13338msec, maxt=13338msec

Disk stats (read/write):
  fioa: ios=59/20405, merge=55/0, ticks=7/1888035, in_queue=1893181, util=98.52%
--snip--

So with 100% sequential, 100% read using 4M IOs we see 800 MB/sec; with the same test using writes, I'm seeing 767 MB/sec. Pretty fast! I'm not sure where the bottleneck is here... I believe this card is PCIe 2.0 x4, so the bus may be the limiting factor. I'm not sure, so I'll have to look into it. Either way, the random IO performance is really where it's at, and I am very much impressed.
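To put the bus question in numbers, here is a rough sketch of the theoretical PCIe link bandwidth next to the measured sequential read rate. It assumes the standard per-lane signalling rates (2.5 GT/s for PCIe 1.x, 5 GT/s for PCIe 2.0) with 8b/10b encoding, and it ignores packet and protocol overhead, so real-world throughput will be somewhat lower than these ceilings. The helper name is hypothetical, and I have not confirmed which link speed this card actually negotiates.

```python
def pcie_bandwidth_mb_s(gt_per_s, lanes, encoding=8 / 10):
    """Theoretical one-direction link bandwidth in MB/s (10^6 bytes/s).

    gt_per_s * encoding gives usable Gbit/s per lane (8b/10b encoding for
    PCIe 1.x and 2.0); dividing by 8 converts bits to bytes.
    """
    return gt_per_s * encoding * lanes / 8 * 1000


gen1_x4 = pcie_bandwidth_mb_s(2.5, 4)   # PCIe 1.x x4
gen2_x4 = pcie_bandwidth_mb_s(5.0, 4)   # PCIe 2.0 x4

# fio reported bw=819392KB/s for the 4M sequential read; fio 2.x KB = 1024 bytes
measured_mb_s = 819392 * 1024 / 1_000_000

print(f"PCIe 1.x x4 ceiling: {gen1_x4:.0f} MB/s")
print(f"PCIe 2.0 x4 ceiling: {gen2_x4:.0f} MB/s")
print(f"measured seq. read : {measured_mb_s:.0f} MB/s "
      f"({measured_mb_s / gen2_x4:.0%} of Gen2 x4, "
      f"{measured_mb_s / gen1_x4:.0%} of Gen1 x4)")
```

If the link really is running at PCIe 2.0 x4, the measured rate is well under half of the theoretical ceiling, so the bus would be an unlikely bottleneck. If it negotiated at 1.x speeds instead, the card would be fairly close to the practical limit of the link. The negotiated speed and width should show up in the LnkSta line of `lspci -vv` for the device.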