Documentation

Benchmark

Environment

The EC2 instance and the S3 bucket are located in the same region.

fio

Create a 100GB test file:

fio --name=create_100gb_file \
--filename=/mnt/fuse/100gb \
--ioengine=libaio \
--direct=1 \
--group_reporting \
--fallocate=none \
--create_on_open=1 \
--end_fsync=1 \
--size=100000M \
--rw=write \
--bs=10M \
--numjobs=1
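With fio's default unit base, the M suffix means MiB, so --size=100000M is 100000 x 1024 x 1024 bytes (about 97.7 GiB). The following is a minimal sketch for checking that the created file has the expected size. The helper name expected_bytes is hypothetical, and the stat call assumes GNU coreutils.

```shell
# Hypothetical helper: bytes for a fio size given in MiB (fio's default "M" unit).
expected_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

want=$(expected_bytes 100000)
echo "expected size: ${want} bytes"

# Compare against the real file only when it exists (path taken from the fio command above).
f=/mnt/fuse/100gb
if [ -f "$f" ]; then
    have=$(stat -c %s "$f")    # GNU stat: prints the file size in bytes
    if [ "$have" -eq "$want" ]; then
        echo "size OK"
    else
        echo "size mismatch: ${have} bytes"
    fi
fi
```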

Load the test file into local cache:

$ lsblk

$ sudo mkfs.xfs /dev/nvme1n1

$ sudo mkdir /mnt/fuse /data

$ sudo chmod -R 0777 /mnt/fuse /data

$ sudo mount /dev/nvme1n1 /data

$ sudo mapfs add vol_benchmark aws <AWSAccessKey> <AWSSecretKey> <S3-BucketName> <Region> cache_dir=/data

$ mapfs load /mnt/fuse/100gb

Read Performance with cache:

fio --name=read_benchmark \
--ioengine=libaio \
--direct=1 \
--group_reporting \
--runtime=60 \
--size=100000M \
--readonly \
--filename=/mnt/fuse/100gb \
--rw=<read|randread> \
--bs=<4k|1M> \
--numjobs=<1|32|128>
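The read benchmark covers every combination of access pattern, block size, and job count. A sketch of a loop over that matrix is shown here. The helper fio_read_cmd and the RUN variable are hypothetical names introduced for illustration; the fio options are the ones from the command above. By default the loop only prints the commands (dry run).

```shell
# Hypothetical helper: build the read benchmark command for one combination.
fio_read_cmd() {
    echo "fio --name=read_benchmark --ioengine=libaio --direct=1" \
         "--group_reporting --runtime=60 --size=100000M --readonly" \
         "--filename=/mnt/fuse/100gb --rw=$1 --bs=$2 --numjobs=$3"
}

for rw in read randread; do
    for bs in 4k 1M; do
        for jobs in 1 32 128; do
            cmd=$(fio_read_cmd "$rw" "$bs" "$jobs")
            if [ "${RUN:-0}" = "1" ]; then
                $cmd           # run fio only when RUN=1 is set
            else
                echo "$cmd"    # dry run: print the command
            fi
        done
    done
done
```

Save the block as a script and run it with RUN=1 in the environment to execute the 12 combinations one after another.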

Note:

mapfs uses direct I/O for the entire I/O path and does not use the OS page cache.

Therefore, there is no need to drop the page cache after a fio test.

Note:

After each fio write test, delete the large test files that fio generated from the cloud storage:

$ rm -f /mnt/fuse/write_benchmark.*

c6id Series: performance overview

Each instance type has one NVMe SSD instance store volume.

Instance Type	Baseline IOPS	Peak IOPS	Baseline Throughput (MB/s)	Peak Throughput (MB/s)	Baseline Bandwidth (Mbps)	Peak Bandwidth (Mbps)
c6id.large	3600	40000	81.25	1250	650	10000
c6id.xlarge	6000	40000	156.25	1250	1250	10000
c6id.2xlarge	12000	40000	312.5	1250	2500	10000
c6id.4xlarge	20000	40000	625	1250	5000	10000
c6id.8xlarge	40000	40000	1250	1250	10000	10000
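The throughput columns in this table are the bandwidth columns converted from bits to bytes: MB/s = Mbps / 8. A small sketch that reproduces the baseline throughput values from the baseline bandwidth values; the helper name mbps_to_mbytes is hypothetical.

```shell
# Hypothetical helper: convert megabits per second to megabytes per second.
mbps_to_mbytes() {
    awk -v mbps="$1" 'BEGIN { printf "%g\n", mbps / 8 }'
}

# Baseline bandwidth values (Mbps) from the table above.
for mbps in 650 1250 2500 5000 10000; do
    echo "${mbps} Mbps = $(mbps_to_mbytes "$mbps") MB/s"
done
```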

Benchmark Results

c6id.large

2 vCPU, 4GB RAM

On-Demand Linux pricing: 0.1155 USD per Hour

Network: Up to 12.5Gbps

NVMe SSD: 1 x 118 GiB Instance Store (data is lost after EC2 stop)

Network Read
Speed	70.8 MiB/s
CPU(cores)	57.48%
RSS(KB)	404785
%MEM	10.36%

Throughput (bs=1M)
Bandwidth/numjobs	Sequential			Random
Bandwidth/numjobs	1	32	128	1	32	128
Bandwidth	152 MiB/s	4839 MiB/s	6440 MiB/s	152 MiB/s	152 MiB/s	152 MiB/s
CPU (cores)	3.36%	75.22%	162.88%	3%	4%	4%
RSS(KB)	390373	408380	410700	224360	232953	233083
%MEM	10%	10.45%	10.51%	6%	6%	6%

IOPS (4k block size)
IOPS/numjobs	Sequential			Random
IOPS/numjobs	1	32	128	1	32	128
IOPS	39k	182k	132k	8.2k	34.1k	34.2k
CPU (cores)	31.68%	93.89%	80%	27.2%	97.96%	101.48%
RSS(KB)	397051	418732	403244	212342	213160	213258
%MEM	10.16%	10.72%	10.32%	5.43%	5.46%	5.46%
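To compare an IOPS figure with a bandwidth figure, multiply IOPS by the block size. A minimal sketch, assuming fio's "k" in IOPS means 1000 and a 4 KiB block is 4096 bytes; the helper name iops_to_mibs is hypothetical.

```shell
# Hypothetical helper: bandwidth in MiB/s for a given IOPS and block size in bytes.
iops_to_mibs() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / (1024 * 1024) }'
}

# 39k sequential IOPS at a 4 KiB block size (numjobs=1 in the table above).
echo "39k IOPS x 4 KiB = $(iops_to_mibs 39000 4096) MiB/s"
```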

WRITE
BS/numjobs	1MB		4MB
BS/numjobs	1	32	1	32
Bandwidth	791MiB/s	831MiB/s	986MiB/s	850MiB/s
CPU (cores)	152.61%	140%	161.14%	145.84%
RSS(KB)	663370	674742	731934	697581
%MEM	16.98%	17.27%	18.73%	17.85%

c6id.xlarge

4 vCPU, 8GB RAM
On-Demand Linux pricing: 0.231 USD per Hour
Network: baseline 1.25Gbps (156MB/s), up to 12.5Gbps; VPC credits can be exhausted within one minute
NVMe SSD: 1 x 237 GiB Instance Store

Network Read
Speed	141.8 MiB/s
CPU(cores)	85%
RSS(KB)	630752
%MEM	7.92%

Throughput (bs=1M)
Bandwidth/numjobs	Sequential			Random
Bandwidth/numjobs	1	32	128	1	32	128
Bandwidth	305 MiB/s	9754 MiB/s	11.3 GiB/s	309 MiB/s	305 MiB/s	305 MiB/s
CPU (cores)	6.62%	177%	312%	8.64%	9.72%	8.90%
RSS(KB)	717340	718324	742341	316319	316609	316707
%MEM	9.01%	9.02%	9.32%	3.97%	3.97%	3.98%

IOPS (4k block size)
IOPS/numjobs	Sequential			Random
IOPS/numjobs	1	32	128	1	32	128
IOPS	53.1k	319k	198k	8.2k	68.2k	68.3k
CPU (cores)	41%	199%	172%	26%	223%	236%
RSS(KB)	719117	697986	473843	307216	308542	309972
%MEM	9.03%	8.76%	5.95%	3.86%	3.87%	3.89%

WRITE
BS/numjobs	1MB		4MB
BS/numjobs	1	32	1	32
Bandwidth	917 MiB/s	1200 MiB/s	1144 MiB/s	879 MiB/s
CPU (cores)	163%	180%	191%	165%
RSS(KB)	1119798	1073321	1061085	1066169
%MEM	14.06%	13.47%	13.32%	13.38%

c6id.2xlarge

8 vCPU, 16GB RAM
On-Demand Linux pricing: 0.4620 USD per Hour
Network: Up to 12.5Gbps
NVMe SSD: 1 x 474 GiB Instance Store

Network Read
Speed	283.3 MiB/s
CPU(cores)	112%
RSS(KB)	753241
%MEM	4.68%

Throughput (bs=1M)
Bandwidth/numjobs	Sequential			Random
Bandwidth/numjobs	1	32	128	1	32	128
Bandwidth	615MiB/s	19.0GiB/s	18.8GiB/s	612MiB/s	610MiB/s	610MiB/s
CPU (cores)	17%	427%	589%	18%	20%	22%
RSS(KB)	721862	778279	814246	258948	258213	260116

IOPS (4k block size)
IOPS/numjobs	Sequential			Random
IOPS/numjobs	1	32	128	1	32	128
IOPS	51.2k	472k	282k	7.7k	116k	103k
CPU (cores)	43%	384%	347%	30%	385%	391%
RSS(KB)	788131	775740	514511	242278	244264	244523

WRITE
BS/numjobs	1MB		4MB
BS/numjobs	1	32	1	32
Bandwidth	1319MiB/s	1254MiB/s	1255MiB/s	1250MiB/s
CPU (cores)	186%	183%	193%	182%
RSS(KB)	1999296	2032385	2076298	2052195
%MEM	12.43%	12.64%	12.91%	12.76%

c6id.4xlarge

16 vCPU, 32GB RAM
On-Demand Linux pricing: 0.9240 USD per Hour
Network: Up to 12.5Gbps
NVMe SSD: 1 x 950 GiB Instance Store

Network Read
Speed	568.2 MiB/s
CPU(cores)	125%
RSS(KB)	796710
%MEM	2.47%

Throughput (bs=1M)
Bandwidth/numjobs	Sequential			Random
Bandwidth/numjobs	1	32	128	1	32	128
Bandwidth	1202MiB/s	38.0GiB/s	45.8GiB/s	1222MiB/s	1221MiB/s	1221MiB/s
CPU (cores)	29%	645%	1236%	26%	31%	32%
RSS(KB)	759424	844728	932179	352842	352286	350937

IOPS (4k block size, iouring)
IOPS/numjobs	Sequential			Random
IOPS/numjobs	1	32	128	1	32	128
IOPS	147k	1010k	1124k	9.5k	112k	124k
CPU(cores)	41%	591%	705%	17%	250%	285%
RSS(KB)	719477	742808	751704	208218	209705	209728

WRITE
BS/numjobs	1MB		4MB		8MB
BS/numjobs	1	32	1	32	1	32
Bandwidth	1429MiB/s	1370MiB/s	1375MiB/s	1380MiB/s	1387MiB/s	1051MiB/s
CPU (cores)	169%	174%	190%	180%	206%	162%
CPU (usr+sys)
RSS(KB)	2447013	2532508	2617603	2576234	2617686	2612897
%MEM	7.57%	7.84%	8.10%	7.97%	8.10%	8.09%

c6id.8xlarge

32 vCPU, 64GB RAM
On-Demand Linux pricing: 1.8480 USD per Hour
Network: 12.5Gbps
NVMe SSD: 1 x 1900 GiB Instance Store

Network Read
Speed	1124 MiB/s
CPU(cores)	150%
RSS(KB)	1194014
%MEM	1.84%

Throughput (bs=1M)
Bandwidth/numjobs	Sequential			Random
Bandwidth/numjobs	1	32	128	1	32	128
Bandwidth	1671 MiB/s	51.8 GiB/s	80.4 GiB/s	1471 MiB/s	2487 MiB/s	2482 MiB/s
CPU (cores)	35%	820%	2666%	36%	68%	71%
CPU (usr+sys)
RSS(KB)	906482	925460	888389	442029	445317	445010
%MEM	1.40%	1.43%	1.37%	0.68%	0.69%	0.69%

IOPS (4k block size, iouring)
IOPS/numjobs	Sequential			Random
IOPS/numjobs	1	32	128	1	32	128
IOPS	147k	829k	818k	9.6k	165k	201k
CPU(cores)	40%	1893%	1889%	17%	345%	432%
CPU(usr+sys)
RSS(KB)	721082	751799	731597	262532	262532	262532
%MEM	1.11%	1.16%	1.13%	0.41%	0.41%	0.41%

WRITE
BS/numjobs	1MB		4MB		8MB
BS/numjobs	1	32	1	32	1	32
Bandwidth	1460MiB/s	1457MiB/s	1453MiB/s	1419MiB/s	1399MiB/s	1434MiB/s
CPU (cores)	166%	167%	192%	185%	197%	186%
CPU (usr+sys)
RSS(KB)	2443310	2584623	2721486	2673981	2726733	2731521
%MEM	3.77%	3.99%	4.20%	4.13%	4.21%	4.22%

Performance Notes

1. On c6id.large, xlarge, 2xlarge, and 4xlarge, EBS and network performance each have a baseline level and a burstable upper limit.

2. "mapfs mount <VolumeName> <MountPoint>" mounts in traditional mode by default.

To enable "fuse over iouring" mode, specify "iouring" when mounting:

$ mapfs mount <VolumeName> <MountPoint> iouring

Note:

Specifying iouring does not guarantee that "fuse over iouring" is enabled. It also requires Linux kernel version >= 6.18, which is typically available in these distributions:

Amazon Linux 2023, kernel 6.18
Ubuntu 26.04
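Before mounting with iouring, the running kernel can be compared against the 6.18 requirement stated above. This is a minimal sketch, not part of mapfs: the helper name kernel_at_least is hypothetical, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeed if kernel version $2 is at least $1.
kernel_at_least() {
    req=$1
    cur=${2%%-*}    # drop the distro suffix, e.g. "6.8.0-45-generic" -> "6.8.0"
    lowest=$(printf '%s\n' "$req" "$cur" | sort -V | head -n 1)
    [ "$lowest" = "$req" ]
}

if kernel_at_least 6.18 "$(uname -r)"; then
    echo "kernel $(uname -r) meets the 6.18 requirement"
else
    echo "kernel $(uname -r) is older than 6.18; fuse over iouring is not expected to be enabled"
fi
```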

3. Advantages and limitations of iouring mode

Traditional mount: when the instance has more than 16 CPUs and IOPS exceed 300K, contention on a single lock can cause CPU usage to spike while I/O performance stalls or even degrades sharply. Adding more CPUs makes the degradation worse because of scheduling pressure, frequent L3 cache invalidations, and CPU time spent on the single kernel spinlock.

iouring mode mount: removes the single lock contention point and scales better when the CPU count exceeds 16 and IOPS exceed 300K.

Its limitation is that when IOPS exceed 1M and the CPU count reaches 32 or more, the fuse uring threads may degrade into polling mode, causing high CPU usage without a corresponding increase in I/O performance.

4. Common benchmark bottlenecks:

EBS Throughput and IOPS limitations
c6id instances include an instance store NVMe SSD. If you use another instance type, such as t3.small with an attached gp3 SSD volume, you are limited by the gp3 throughput and IOPS.
Network bandwidth
If you use burstable EC2 instances (such as t3 series), CPU performance may fluctuate between baseline and burst limits.
