
Background

NVMe has already established itself as the storage of choice for modern workloads that require low latency and high performance. In the cloud it is available as ephemeral NVMe resources that can be easily attached to compute instances. Although this option is within easy reach, it comes with several challenges:

  • The storage is ephemeral in nature and cannot be used where persistence is required, because the contents do not survive an instance stop.
  • Compute instances cannot simply be stopped; doing so effectively requires removing and recreating the instance, a constraint that applications then have to accommodate unnecessarily.
  • Only certain storage-optimized instance types (i3, i2) provide direct-attached NVMe SSDs.
  • The minimum capacity is 475 GiB, even if the application will use only a tenth of that capacity.
  • It hinders any solution that requires separation of storage and compute.
  • Elastic Block Store (EBS) attached to a single compute instance peaks at about 40K IOPS.

MayaScale provides Elastic NVMe using NVMe over Fabrics (NVMe/TCP) on the AWS cloud for high-performance shared storage. Its performance is compared below against running locally on the ephemeral local NVMe SSDs.

Benchmark Setup

MayaScale works best when configured with multiple cores, which provides the same parallelism seen with NVMe hardware queues; in addition, a compute instance's network and storage performance is tied to the number of provisioned vCPUs. The software itself has a very low memory requirement. The following i3 instances can be used for the MayaScale server.

| Instance Type | vCPU | Memory (GiB) | Network  | Capacity | Total HW Queues |
|---------------|------|--------------|----------|----------|-----------------|
| i3.large      | 2    | 15.25        | ~10 Gbps |          |                 |
| i3.xlarge     | 4    | 30.5         | ~10 Gbps | 0.8 TB   | 4               |
| i3.2xlarge    | 8    | 61           | ~10 Gbps | 1.7 TB   | 8               |
| i3.4xlarge    | 16   | 122          | ~10 Gbps | 3.5 TB   | 32              |
| i3.8xlarge    | 32   | 244          | 10 Gbps  | 6.9 TB   | 64              |
| i3.16xlarge   | 64   | 488          | 25 Gbps  | 13.8 TB  | 248             |


The benchmark setup consists of:

  • A server node of one of the i3 instance types above, running the MayaScale software
  • A client node of instance type m5a.2xlarge, which attaches the remote namespace over NVMe/TCP (sketched below)
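
The exact provisioning and discovery workflow is MayaScale-specific, but the client-side attach step follows the standard Linux NVMe/TCP initiator flow. The sketch below is a minimal illustration only, assuming nvme-cli is installed on the client and using placeholder values for the server address and subsystem NQN:

```python
import subprocess

# Placeholder values -- substitute the MayaScale server node's IP address and
# the subsystem NQN reported by its discovery service (both hypothetical here).
TARGET_IP = "10.0.0.10"
TARGET_NQN = "nqn.2019-08.com.example:mayascale-vol1"

def connect_nvme_tcp(ip: str, nqn: str, port: int = 4420) -> None:
    """Attach a remote NVMe/TCP namespace using standard Linux tooling."""
    # Load the NVMe/TCP initiator module on the client.
    subprocess.run(["modprobe", "nvme-tcp"], check=True)
    # Connect to the remote subsystem; 4420 is the conventional NVMe/TCP port.
    subprocess.run(
        ["nvme", "connect", "-t", "tcp", "-a", ip, "-s", str(port), "-n", nqn],
        check=True,
    )
    # The new namespace shows up as /dev/nvmeXnY and can be used like a local SSD.
    subprocess.run(["nvme", "list"], check=True)

if __name__ == "__main__":
    connect_nvme_tcp(TARGET_IP, TARGET_NQN)
```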

FIO Performance Report

The FIO benchmark was run with the options numjobs=4, iodepth=32, and bs=4k.
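
For reference, a minimal sketch of how each run could be driven and summarized, assuming fio is available on the node under test and using a hypothetical device path (/dev/nvme1n1); the option names and JSON output fields follow standard fio usage:

```python
import json
import subprocess

DEVICE = "/dev/nvme1n1"  # hypothetical: the local SSD or the attached MayaScale namespace

def run_fio(rw: str, device: str = DEVICE) -> dict:
    """Run a 4K random-I/O fio job matching the benchmark options and return parsed JSON."""
    cmd = [
        "fio",
        "--name=bench",
        f"--filename={device}",
        f"--rw={rw}",            # "randread" or "randwrite"
        "--bs=4k",
        "--iodepth=32",
        "--numjobs=4",
        "--ioengine=libaio",
        "--direct=1",
        "--runtime=60",
        "--time_based",
        "--group_reporting",
        "--output-format=json",
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)

def summarize(result: dict, rw: str) -> None:
    """Print IOPS and mean completion latency (us), matching the units in the table below."""
    side = "read" if "read" in rw else "write"
    job = result["jobs"][0]
    iops = job[side]["iops"]
    lat_us = job[side]["clat_ns"]["mean"] / 1000.0
    print(f"{rw}: {iops / 1000:.1f}K IOPS, {lat_us:.2f} us mean latency")

if __name__ == "__main__":
    for pattern in ("randread", "randwrite"):
        summarize(run_fio(pattern), pattern)
```

The same script can be pointed at the local ephemeral SSD on the server node and at the attached MayaScale namespace on the client node to produce the Local and Client columns below.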

| Instance Type | IO pattern   | Block Size | Local IOPS | Client IOPS | Local Latency (us) | Client Latency (us) |
|---------------|--------------|------------|------------|-------------|--------------------|---------------------|
| i3.large      | random write | 4K         |            |             |                    |                     |
| i3.large      | random read  | 4K         |            |             |                    |                     |
| i3.xlarge     | random write | 4K         | 71.8K      | 71.3K       | 1781.89            | 1793.06             |
| i3.xlarge     | random read  | 4K         | 210K       | 175K        | 609.61             | 727.94              |
| i3.2xlarge    | random write | 4K         | 181K       | 180K        | 706.52             | 706.55              |
| i3.2xlarge    | random read  | 4K         | 413K       | 189K        | 308.56             | 674.64              |
| i3.4xlarge    | random write | 4K         |            |             |                    |                     |
| i3.4xlarge    | random read  | 4K         |            |             |                    |                     |
| i3.8xlarge    | random write | 4K         |            |             |                    |                     |

The results show that NVMe over Fabrics is a powerful technology that can match local NVMe performance with only a little added latency for read operations, limited only by the 10 Gbps network bandwidth used in the testing.
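
As a rough sanity check (assuming the full 10 Gbps link is available to NVMe/TCP traffic and ignoring protocol overhead): 10 Gbps is about 1.25 GB/s, and 1.25 GB/s divided by 4 KiB per I/O is roughly 305K IOPS. The client read results (175K to 189K IOPS) sit below that ceiling, while local reads on the larger instances (210K to 413K IOPS) exceed it, which is consistent with the network link, rather than the software, being the limiting factor.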

