Hyperflex Design, Performance and Benchmarking with Flash and NVMe

These are my notes from the Cisco Hyperflex talk about benchmarking different setups and versions of HX: some tips on taking benchmarks, which tools you can use, and some information about how the HX Data Platform works. The following information about HX may already be outdated. Because these are notes, I don't plan to rewrite them more nicely. Sorry.

What's New in Hyperflex 2.6

  • Support for M5 server models
  • All-Flash now supports NVMe caching disks
  • Native snapshot-based replication
  • Mixed clusters with M4 and M5 servers
  • Support for self-encrypting disks
  • Expanded compute-only node support
  • Hyperflex Connect: a native HTML5 GUI running on the cluster itself

DR

Snapshots are transported in the background to another Hyperflex cluster on another site: built-in disaster recovery.

All-Flash System Design

The Cisco HX Data Platform is the software-defined storage behind the scenes of Hyperflex.

  • Distributed filesystem over all nodes
  • Deduplication and compression
  • Scale-out log-structured FS design
  • No "legacy" filesystems behind it
  • VAAI support for snapshots and provisioning tasks.

The storage controller is a VM on each node and has direct access to the disks of the host. The storage is presented to the hypervisor as an NFS volume.

Differences from a Hybrid configuration

  • The SSD read cache (a deduplicated read cache) exists only in Hybrid, where it acts like an L2 cache
  • Index of the full filesystem on every host, so no cross-network lookups are needed to find a block in the FS
  • Based on metadata
  • Log-structured FS -> the cleaner process cleans the log and marks blocks that can be rewritten (flash-friendly!); see the toy sketch after this list
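
To make the log-structured idea concrete, here is a toy Python sketch (illustration only, not how the HX Data Platform is actually implemented): overwrites just append to the log, and the cleaner later drops superseded records so whole regions become free at once, which is exactly what flash wants.

    # Toy log-structured store: writes append, a cleaner compacts.
    # Purely illustrative; not the actual HX Data Platform design.

    class LogStore:
        def __init__(self):
            self.log = []        # list of (block_id, data) records, append-only
            self.index = {}      # block_id -> position of the latest record

        def write(self, block_id, data):
            # Overwrites never touch old data in place; they just append.
            self.index[block_id] = len(self.log)
            self.log.append((block_id, data))

        def read(self, block_id):
            return self.log[self.index[block_id]][1]

        def clean(self):
            # Keep only the latest record per block; stale records are
            # dropped, so their space can be reclaimed in one sweep.
            live = [(bid, data) for pos, (bid, data) in enumerate(self.log)
                    if self.index[bid] == pos]
            self.log = live
            self.index = {bid: pos for pos, (bid, _) in enumerate(live)}

    store = LogStore()
    store.write("a", b"v1")
    store.write("a", b"v2")   # supersedes v1, the old record becomes garbage
    store.clean()             # log shrinks to one live record
    assert store.read("a") == b"v2"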

Life of an IO (Summary)

The IOVisor intercepts the IO when the datastore wants to write it to disk and sends the IO to the controller VM (a sketch of the destage flow follows the list below).

  • Destage (runs on every node)
    • The secondary log gets promoted to the active log
    • The full log gets deduplicated on the primary copies of the data
    • Data is then written three ways (Replication Factor 3) to SSD / HDD
    • After the commit, the data is purged from the log and the caching SSD
    • In hybrid systems, recent writes are also added to the SSD cache
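
A minimal Python sketch of that destage flow; every name and data structure here is invented for illustration, and the real pipeline is obviously far more involved:

    # Toy sketch of the destage flow described above. All names and
    # structures are invented; this is not Cisco's implementation.
    import hashlib

    RF = 3  # Replication Factor 3: three copies of every block

    def destage(active_log, nodes):
        """Dedupe the full log, then write each unique block RF ways."""
        # Step 1: deduplicate by content fingerprint.
        unique = {}
        for block in active_log:
            fp = hashlib.sha256(block).hexdigest()
            unique.setdefault(fp, block)

        # Step 2: three-way write each unique block to persistent storage.
        for fp, block in unique.items():
            start = int(fp, 16) % len(nodes)       # pick a primary node
            for i in range(RF):                    # ... plus RF-1 replicas
                nodes[(start + i) % len(nodes)].append((fp, block))

        # Step 3: after the commit, purge the log (and the caching SSD).
        active_log.clear()

    nodes = [[] for _ in range(4)]                 # 4 nodes' capacity tiers
    log = [b"blockA", b"blockB", b"blockA"]        # promoted active log
    destage(log, nodes)
    assert sum(len(n) for n in nodes) == 2 * RF    # 2 unique blocks, 3 copies each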

Dynamic Data Distribution

The IO from a VM is distributed to the storage controllers of all nodes in the cluster, so all of the cache and disk IO can be used simultaneously. The backend network of the UCS-based HX nodes is used to transmit the data to the other nodes.
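
A rough sketch of that effect, assuming simple hash-based placement (the talk didn't cover the actual placement algorithm): blocks from a single VM land roughly evenly on all nodes, so every node's cache and disks contribute.

    # Toy hash-based placement: stripe a VM's blocks over all controllers.
    # The real HX placement logic may differ; this only shows the effect.
    import hashlib
    from collections import Counter

    def owner(block_id, num_nodes):
        h = hashlib.md5(block_id.encode()).hexdigest()
        return int(h, 16) % num_nodes

    # 10,000 blocks from one VM spread roughly evenly over a 4-node
    # cluster, so the cache and disk IO of every node is used at once.
    counts = Counter(owner(f"vm1-block-{i}", 4) for i in range(10_000))
    print(counts)   # e.g. Counter({0: 25xx, 1: 25xx, 2: 24xx, 3: 24xx})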

Scaling

  • Compute and capacity scale independently
  • The IOVisor accesses the storage on the converged nodes

Limits and Stuff

  • RF2 max. capacity is 591 TB!
  • Keep usage under 70% for best performance (see the helper after this list).
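
Back-of-the-envelope only, and my own rule of thumb rather than a Cisco formula: assuming the 591 TB figure is raw capacity, usable space is roughly raw divided by the replication factor, capped at the 70% fill level (dedupe and compression savings ignored).

    # Rough usable-capacity estimate; my own rule of thumb, not Cisco's.
    def usable_tb(raw_tb, replication_factor, fill_limit=0.70):
        """Raw capacity divided by the replication factor, then capped
        at the recommended 70% fill level for best performance."""
        return raw_tb / replication_factor * fill_limit

    print(usable_tb(591, 2))   # RF2 at the 591 TB limit -> ~207 TB usable
    print(usable_tb(591, 3))   # same raw capacity with RF3 -> ~138 TB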

Tools to measure performance

  • IOMeter
    • Generates a highly dedupable data pattern, so there is no real workload on the underlying disks because it's all deduped.
  • FIO
    • Better; self-compiled and highly customizable
    • Runs can be started from a remote master machine
    • Output is not easily readable and gets cluttered (see the fio sketch after this list)
  • Vdbench
    • Server/client execution via SSH
    • Easy HTML output
  • Cisco HX Bench
    • CCO login required to download
    • An OVA VM which starts new VMs in the cluster for load generation
    • Included test scenarios, or build your own
    • Results shown in a nice WebUI
  • HCIBench from VMware, which executes vdbench on Photon VMs
  • HammerDB for database workload testing
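
Since fio's output is the main complaint, here is a small Python wrapper I would use to start a run and pull out the few numbers that matter. The job parameters are examples only, and the JSON field names match fio 3.x as far as I know, so double-check them against your fio version.

    # Start a fio run with JSON output and extract IOPS / latency.
    # Job parameters are examples; verify the JSON field names
    # against your fio version (this matches fio 3.x output).
    import json
    import subprocess

    cmd = [
        "fio",
        "--name=hx-bench",
        "--ioengine=libaio", "--direct=1",   # bypass the page cache
        "--rw=randrw", "--rwmixread=70",     # 70/30 random read/write mix
        "--bs=4k", "--size=10g",             # size it past the read cache
        "--time_based", "--runtime=60",
        "--refill_buffers",                  # fresh random buffers: no dedupe help
        "--output-format=json",
    ]

    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    job = json.loads(out)["jobs"][0]
    print("read IOPS:    ", round(job["read"]["iops"]))
    print("write IOPS:   ", round(job["write"]["iops"]))
    print("read lat (us):", job["read"]["lat_ns"]["mean"] / 1000)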

Performance testing tips

  • While testing, exceed the read cache to see the real performance (see the sizing helper after this list).
  • With Intel Skylake CPUs, the memory configurations for best RAM performance are 384 GB or 768 GB, not 512 GB (balanced across the six memory channels per socket).
  • Dedup and compression cannot be turned off in Hyperflex!
  • 40 GbE brings more IOPS, lower latency, and more stability.
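
For the first tip, a tiny helper to sanity-check that the benchmark working set actually exceeds the cache tier; the drive size, node count, and 50% headroom below are made-up example values.

    # Sanity check: is the benchmark working set bigger than the cache?
    # Cache size, node count, and headroom are made-up example values.
    def min_working_set_tb(cache_drive_tb, nodes, headroom=1.5):
        """Total caching capacity across the cluster plus headroom, so
        the test is guaranteed to spill past the cache to the capacity
        tier."""
        return cache_drive_tb * nodes * headroom

    # e.g. four nodes with a 1.6 TB NVMe caching drive each:
    print(min_working_set_tb(1.6, 4))   # -> 9.6 TB minimum test dataset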