Benchmark

Environment

  • EC2 and the S3 bucket are located in the same region

fio

Create a 100GB test file:

fio --name=create_100gb_file \
    --filename=/mnt/fuse/100gb \
    --ioengine=libaio \
    --direct=1 \
    --group_reporting \
    --fallocate=none \
    --create_on_open=1 \
    --end_fsync=1 \
    --size=100000M \
    --rw=write \
    --bs=10M \
    --numjobs=1

Prepare the cache disk, register the volume, and load the test file into the local cache:

$ lsblk

$ sudo mkfs.xfs /dev/nvme1n1

$ sudo mkdir /mnt/fuse /data

$ sudo chmod 0777 /mnt/fuse /data -R

$ sudo mount /dev/nvme1n1 /data

$ sudo mapfs add vol_benchmark aws <AWSAccessKey> <AWSSecretKey> <S3-BucketName> <Region> cache_dir=/data

$ mapfs load /mnt/fuse/100gb

Read Performance with cache:

fio --name=read_benchmark \
    --ioengine=libaio \
    --direct=1 \
    --group_reporting \
    --runtime=60 \
    --size=100000M \
    --readonly \
    --filename=/mnt/fuse/100gb \
    --rw=<read|randread> \
    --bs=<4k|1M> \
    --numjobs=<1|32|128>

Note:

mapfs uses direct I/O for the entire I/O path and bypasses the OS page cache, so there is no need to drop the page cache after each fio run.
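
The placeholders in the read command expand to 2 x 2 x 3 = 12 runs. A minimal sketch for generating them, assuming a POSIX shell; the helper name read_matrix is hypothetical and not part of fio or mapfs. The commands are only printed, so they can be reviewed before anything is executed:

```shell
#!/bin/sh
# Print one fio command per combination of the read benchmark parameters.
# read_matrix is an illustrative helper, not part of fio or mapfs.
read_matrix() {
    for rw in read randread; do
        for bs in 4k 1M; do
            for jobs in 1 32 128; do
                echo "fio --name=read_benchmark --ioengine=libaio --direct=1" \
                     "--group_reporting --runtime=60 --size=100000M --readonly" \
                     "--filename=/mnt/fuse/100gb --rw=$rw --bs=$bs --numjobs=$jobs"
            done
        done
    done
}

read_matrix            # review the 12 generated commands
# read_matrix | sh     # uncomment to run them one after another
```

Printing first keeps the sketch safe to paste; piping the output to sh runs the combinations sequentially.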
 

Network Write Performance:

fio --name=write_benchmark \
    --directory=/mnt/fuse \
    --ioengine=libaio \
    --direct=1 \
    --group_reporting \
    --fallocate=none \
    --create_on_open=1 \
    --end_fsync=1 \
    --runtime=60 \
    --time_based \
    --size=100G \
    --rw=write \
    --bs=<1M|4M|8M> \
    --numjobs=<1|32|128>

Note:

After each fio run, delete the large test files that fio generated from cloud storage:

$ rm -f /mnt/fuse/write_benchmark.*
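
The write runs and the cleanup step can be interleaved in one loop. A minimal sketch, assuming a POSIX shell; write_matrix is a hypothetical helper name, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Print each write benchmark command followed by the cleanup command.
# write_matrix is an illustrative helper, not part of fio or mapfs.
write_matrix() {
    for bs in 1M 4M 8M; do
        for jobs in 1 32 128; do
            echo "fio --name=write_benchmark --directory=/mnt/fuse --ioengine=libaio" \
                 "--direct=1 --group_reporting --fallocate=none --create_on_open=1" \
                 "--end_fsync=1 --runtime=60 --time_based --size=100G --rw=write" \
                 "--bs=$bs --numjobs=$jobs"
            # Remove the files fio created before the next combination starts.
            echo "rm -f /mnt/fuse/write_benchmark.*"
        done
    done
}

write_matrix           # 9 fio commands, each followed by its cleanup
```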

 

c6id Series: performance overview

Each instance type below has one NVMe SSD instance store volume.

Instance Type  Baseline IOPS  Peak IOPS  Baseline Throughput (MB/s)  Peak Throughput (MB/s)  Baseline Bandwidth (Mbps)  Peak Bandwidth (Mbps)
c6id.large     3600           40000      81.25                       1250                    650                        10000
c6id.xlarge    6000           40000      156.25                      1250                    1250                       10000
c6id.2xlarge   12000          40000      312.5                       1250                    2500                       10000
c6id.4xlarge   20000          40000      625                         1250                    5000                       10000
c6id.8xlarge   40000          40000      1250                        1250                    10000                      10000
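
The throughput columns in this table are consistent with the bandwidth columns divided by 8 (bits to bytes). A small sketch of that conversion, assuming awk is available; mbps_to_mbs is a hypothetical helper name:

```shell
#!/bin/sh
# Convert megabits per second to megabytes per second (8 bits per byte).
# mbps_to_mbs is an illustrative helper name.
mbps_to_mbs() {
    awk -v mbps="$1" 'BEGIN { printf "%g\n", mbps / 8 }'
}

# Baseline and peak bandwidth values taken from the table above.
for mbps in 650 1250 2500 5000 10000; do
    echo "$mbps Mbps = $(mbps_to_mbs "$mbps") MB/s"
done
```

For example, the c6id.large baseline of 650 Mbps corresponds to 81.25 MB/s, and the 10000 Mbps peak corresponds to 1250 MB/s.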

Benchmark Results

c6id.large

2 vCPU, 4GB RAM
On-Demand Linux pricing: 0.1155 USD per Hour
Network: Up to 12.5Gbps
NVMe SSD: 1 x 118 GiB Instance Store (data is lost when the EC2 instance is stopped)
Network Read
Speed        70.8 MiB/s
CPU (cores)  57.48%
RSS (KB)     404785
%MEM         10.36%

 

Throughput (bs=1M)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
Bandwidth    152 MiB/s   4839 MiB/s  6440 MiB/s  152 MiB/s   152 MiB/s   152 MiB/s
CPU (cores)  3.36%       75.22%      162.88%     3%          4%          4%
RSS (KB)     390373      408380      410700      224360      232953      233083
%MEM         10%         10.45%      10.51%      6%          6%          6%

 

IOPS (4k block size)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
IOPS         39k         182k        132k        8.2k        34.1k       34.2k
CPU (cores)  31.68%      93.89%      80%         27.2%       97.96%      101.48%
RSS (KB)     397051      418732      403244      212342      213160      213258
%MEM         10.16%      10.72%      10.32%      5.43%       5.46%       5.46%

 

WRITE
Block size   1MB                     4MB
numjobs      1           32          1           32
Bandwidth    791 MiB/s   831 MiB/s   986 MiB/s   850 MiB/s
CPU (cores)  152.61%     140%        161.14%     145.84%
RSS (KB)     663370      674742      731934      697581
%MEM         16.98%      17.27%      18.73%      17.85%

 

c6id.xlarge

4 vCPU, 8GB RAM
On-Demand Linux pricing: 0.231 USD per Hour
Network: baseline 1.25Gbps (156MB/s), up to 12.5Gbps; VPC credits can be exhausted within one minute
NVMe SSD: 1 x 237 GiB Instance Store

 

Network Read
Speed        141.8 MiB/s
CPU (cores)  85%
RSS (KB)     630752
%MEM         7.92%

 

Throughput (bs=1M)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
Bandwidth    305 MiB/s   9754 MiB/s  11.3 GiB/s  309 MiB/s   305 MiB/s   305 MiB/s
CPU (cores)  6.62%       177%        312%        8.64%       9.72%       8.90%
RSS (KB)     717340      718324      742341      316319      316609      316707
%MEM         9.01%       9.02%       9.32%       3.97%       3.97%       3.98%

 

 

IOPS (4k block size)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
IOPS         53.1k       319k        198k        8.2k        68.2k       68.3k
CPU (cores)  41%         199%        172%        26%         223%        236%
RSS (KB)     719117      697986      473843      307216      308542      309972
%MEM         9.03%       8.76%       5.95%       3.86%       3.87%       3.89%

 

 

WRITE
Block size   1MB                     4MB
numjobs      1           32          1           32
Bandwidth    917 MiB/s   1200 MiB/s  1144 MiB/s  879 MiB/s
CPU (cores)  163%        180%        191%        165%
RSS (KB)     1119798     1073321     1061085     1066169
%MEM         14.06%      13.47%      13.32%      13.38%

 

 

c6id.2xlarge

8 vCPU, 16GB RAM
On-Demand Linux pricing: 0.4620 USD per Hour
Network: Up to 12.5Gbps
NVMe SSD: 1 x 474 GiB Instance Store

Network Read
Speed        283.3 MiB/s
CPU (cores)  112%
RSS (KB)     753241
%MEM         4.68%

 

 

Throughput (bs=1M)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
Bandwidth    615 MiB/s   19.0 GiB/s  18.8 GiB/s  612 MiB/s   610 MiB/s   610 MiB/s
CPU (cores)  17%         427%        589%        18%         20%         22%
RSS (KB)     721862      778279      814246      258948      258213      260116

 

IOPS (4k block size)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
IOPS         51.2k       472k        282k        7.7k        116k        103k
CPU (cores)  43%         384%        347%        30%         385%        391%
RSS (KB)     788131      775740      514511      242278      244264      244523

 

WRITE
Block size   1MB                     4MB
numjobs      1           32          1           32
Bandwidth    1319 MiB/s  1254 MiB/s  1255 MiB/s  1250 MiB/s
CPU (cores)  186%        183%        193%        182%
RSS (KB)     1999296     2032385     2076298     2052195
%MEM         12.43%      12.64%      12.91%      12.76%

 

c6id.4xlarge

16 vCPU, 32GB RAM
On-Demand Linux pricing: 0.9240 USD per Hour
Network: Up to 12.5Gbps
NVMe SSD: 1 x 950 GiB Instance Store

Network Read
Speed        568.2 MiB/s
CPU (cores)  125%
RSS (KB)     796710
%MEM         2.47%

 

 

Throughput (bs=1M)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
Bandwidth    1202 MiB/s  38.0 GiB/s  45.8 GiB/s  1222 MiB/s  1221 MiB/s  1221 MiB/s
CPU (cores)  29%         645%        1236%       26%         31%         32%
RSS (KB)     759424      844728      932179      352842      352286      350937

 

 

IOPS (4k block size, iouring mode)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
IOPS         147k        1010k       1124k       9.5k        112k        124k
CPU (cores)  41%         591%        705%        17%         250%        285%
RSS (KB)     719477      742808      751704      208218      209705      209728

 

 

WRITE
Block size   1MB                     4MB                     8MB
numjobs      1           32          1           32          1           32
Bandwidth    1429 MiB/s  1370 MiB/s  1375 MiB/s  1380 MiB/s  1387 MiB/s  1051 MiB/s
CPU (cores)  169%        174%        190%        180%        206%        162%
RSS (KB)     2447013     2532508     2617603     2576234     2617686     2612897
%MEM         7.57%       7.84%       8.10%       7.97%       8.10%       8.09%

 

c6id.8xlarge

32 vCPU, 64GB RAM
On-Demand Linux pricing: 1.8480 USD per Hour
Network: 12.5Gbps
NVMe SSD: 1 x 1900 GiB Instance Store

 

Network Read
Speed        1124 MiB/s
CPU (cores)  150%
RSS (KB)     1194014
%MEM         1.84%

 

Throughput (bs=1M)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
Bandwidth    1671 MiB/s  51.8 GiB/s  80.4 GiB/s  1471 MiB/s  2487 MiB/s  2482 MiB/s
CPU (cores)  35%         820%        2666%       36%         68%         71%
RSS (KB)     906482      925460      888389      442029      445317      445010
%MEM         1.40%       1.43%       1.37%       0.68%       0.69%       0.69%

 

 

IOPS (4k block size, iouring mode)
Pattern      Sequential                          Random
numjobs      1           32          128         1           32          128
IOPS         147k        829k        818k        9.6k        165k        201k
CPU (cores)  40%         1893%       1889%       17%         345%        432%
RSS (KB)     721082      751799      731597      262532      262532      262532
%MEM         1.11%       1.16%       1.13%       0.41%       0.41%       0.41%

 

 

WRITE
Block size   1MB                     4MB                     8MB
numjobs      1           32          1           32          1           32
Bandwidth    1460 MiB/s  1457 MiB/s  1453 MiB/s  1419 MiB/s  1399 MiB/s  1434 MiB/s
CPU (cores)  166%        167%        192%        185%        197%        186%
RSS (KB)     2443310     2584623     2721486     2673981     2726733     2731521
%MEM         3.77%       3.99%       4.20%       4.13%       4.21%       4.22%

 

Performance Notes

1. On c6id.large, xlarge, 2xlarge, and 4xlarge, EBS and network performance each have a baseline level and a burstable upper limit.

2. "mapfs mount <VolumeName> <MountPoint>" mounts in traditional mode by default.

To enable "fuse over iouring" mode, specify "iouring" during mount:

 $ mapfs mount <VolumeName> <MountPoint> iouring

Note:

Specifying iouring does not guarantee that "fuse over iouring" is enabled. It also requires Linux kernel version 6.18 or later, which is typically available in these distributions:

  • Amazon Linux 2023, kernel 6.18
  • Ubuntu 26.04
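
Before mounting with iouring, the kernel requirement can be checked from the shell. A minimal sketch, assuming a POSIX shell and a release string of the form MAJOR.MINOR...; kernel_at_least is a hypothetical helper, not a mapfs command, and it checks only the kernel version, which is one of the conditions described above:

```shell
#!/bin/sh
# kernel_at_least WANT [RELEASE]
# Succeeds when RELEASE (default: output of `uname -r`) is at least WANT,
# where WANT has the form MAJOR.MINOR. Illustrative helper only.
kernel_at_least() {
    want=$1
    have=${2:-$(uname -r)}
    have_major=${have%%.*}
    rest=${have#*.}
    have_minor=${rest%%[!0-9]*}      # keep the leading digits of the minor part
    want_major=${want%%.*}
    want_minor=${want#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if kernel_at_least 6.18; then
    echo "kernel $(uname -r): new enough for the iouring mount option"
else
    echo "kernel $(uname -r): older than 6.18, expect traditional mode"
fi
```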

3. Advantages and limitations of iouring mode

Traditional mount: When the CPU count exceeds 16 and IOPS exceed 300K, a single lock contention point can cause CPU usage to spike while I/O performance stays flat or even degrades sharply. Adding more CPUs makes the degradation worse because of scheduling pressure, frequent L3 cache invalidations, and the CPU time spent on the single kernel spinlock.

iouring mode mount: Removes the single lock contention point and scales better when the CPU count exceeds 16 and IOPS exceed 300K.

However, when IOPS exceed 1M and the CPU count reaches 32 or more, fuse uring threads may degrade into polling mode, causing high CPU usage without a corresponding increase in I/O performance.
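
The thresholds above can be read as a rough rule of thumb. The sketch below only encodes that reading; suggest_mount_mode is a hypothetical helper, and mapfs itself does not select a mode this way:

```shell
#!/bin/sh
# suggest_mount_mode CPUS IOPS
# Prints a mount mode suggestion based on the thresholds in the notes above.
# Illustrative only; the cutoffs are taken from the text, not from mapfs.
suggest_mount_mode() {
    cpus=$1
    iops=$2
    if [ "$cpus" -ge 32 ] && [ "$iops" -gt 1000000 ]; then
        echo "iouring (polling risk)"
    elif [ "$cpus" -gt 16 ] && [ "$iops" -gt 300000 ]; then
        echo "iouring"
    else
        echo "traditional (default)"
    fi
}

suggest_mount_mode 32 500000     # many CPUs and high IOPS
suggest_mount_mode 8 100000      # below both thresholds
```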

4. Common benchmark bottlenecks:

  • EBS Throughput and IOPS limitations
  • c6id instances include an instance store NVMe SSD. If you use other instance types, such as t3.small with an attached gp3 SSD volume, you will be limited by gp3 throughput and IOPS.
  • Network bandwidth
  • If you use burstable EC2 instances (such as t3 series), CPU performance may fluctuate between baseline and burst limits.