Akamai Throughput and Quality Report
Terminology
For clarity and consistency throughout this report, the following definitions apply:
- CPU: Software-based video encoding using x264 (H.264 codec) and x265 (H.265 codec).
- GPU: Hardware-accelerated video encoding using NVENC (NVIDIA Encoder) with the H.264 and H.265 codecs.
- VPU: Hardware-accelerated processing using Quadra (NETINT Encoder) with the H.264 and H.265 codecs.
The terms “CPU,” “GPU,” and “VPU” may be used interchangeably with “x264/x265,” “NVENC,” and “Quadra,” respectively.
Contents
- Throughput Test
Objective: Measure the processing speed of CPU, GPU, and VPU in frames per second (FPS) for video encoding.
- Quality Test
Objective: Evaluate output fidelity using metrics like VMAF after processing test videos.
- Summary
Objective: Summarize the throughput and quality test results for CPU (x264/x265), GPU (NVENC), and VPU (Quadra).
Test Environment
These tests were performed on the following Akamai Virtual Machines.
- CPU: NETINT Quadra T1U x2 Small, which has 12 Cores and 24GB RAM.
- VPU: NETINT Quadra T1U x2 Small, which has 12 Cores and 24GB RAM.
  Note: Only one Quadra T1U was used for these tests.
- GPU: RTX4000 Ada x1 Large, which has 16 Cores and 64GB RAM.
Source Content
The following files were used for the throughput and quality testing.
Input Name | Resolution | Frames | Bit Depth | FPS | Keyframe Interval (frames, 2 sec) |
---|---|---|---|---|---|
blue_sky_1920x1080p25_420.yuv | 1920x1080 | 217 | 8 | 25 | 50 |
crowd_run_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
ducks_take_off_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
in_to_tree_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
old_town_cross_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
park_joy_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
pedestrian_area_1920x1080p25_420.yuv | 1920x1080 | 375 | 8 | 25 | 50 |
riverbed_1920x1080p25_420.yuv | 1920x1080 | 250 | 8 | 25 | 50 |
rush_hour_1920x1080p25_420.yuv | 1920x1080 | 500 | 8 | 25 | 50 |
station2_1920x1080p25_420.yuv | 1920x1080 | 313 | 8 | 25 | 50 |
sunflower_1920x1080p25_420.yuv | 1920x1080 | 500 | 8 | 25 | 50 |
tractor_1920x1080p25_420.yuv | 1920x1080 | 690 | 8 | 25 | 50 |
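The Interval column appears to be the keyframe interval in frames for a 2-second GOP, that is, twice the frame rate. A minimal sketch of that derivation, together with clip duration, follows. The function names are illustrative and are not part of the original test harness.

```python
def keyframe_interval(fps: int, gop_seconds: int = 2) -> int:
    """Keyframe interval in frames for a GOP lasting gop_seconds."""
    return fps * gop_seconds


def clip_duration(frames: int, fps: int) -> float:
    """Clip length in seconds."""
    return frames / fps


# (frames, fps) copied from two rows of the source content table
clips = {
    "blue_sky_1920x1080p25_420.yuv": (217, 25),
    "crowd_run_1920x1080p50_420.yuv": (500, 50),
}

for name, (frames, fps) in clips.items():
    print(name, keyframe_interval(fps), f"{clip_duration(frames, fps):.2f}s")
```

The resulting interval is the value substituted for `{interval}` in the encoder commands.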
Bitrate
The following bitrates were used in the testing:
- 1 Mbps
- 1.5 Mbps
- 3.5 Mbps
- 7.5 Mbps
Presets
The following presets were used for CPU and GPU:
- CPU: Medium for both x264 and x265
- GPU: P7 (Highest Quality Preset)
Versions
Codec | Version |
---|---|
CPU (x264) | r3213 570f6c7 |
CPU (x265) | 5163c32d7 |
VPU | 5.0.0 |
GPU | Driver: 570.86.10, CUDA: 12.8 |
Commands
The following are the commands used to perform the validation.
AVC/H.264
CPU
x264 --aq-mode 0 --no-scenecut --bframes 3 --b-adapt 0 --rc-lookahead 16 \
--input-depth {input_depth} --output-depth {output_depth} \
--input-res {resolution} --fps {fps} --bitrate {bitrate} --vbv-bufsize {2*bitrate} \
--keyint {interval} --min-keyint {interval} --preset {preset} --frames {total_frames} \
-o {output} {input}
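As an illustration of how the placeholders in the x264 template might be filled in, here is a minimal sketch that builds the argument list for one clip at 3.5 Mbps. It assumes the bitrate is passed in kbit/s, which is the unit x264 uses for `--bitrate` and `--vbv-bufsize`. The function name and the output filename are hypothetical.

```python
import shlex


def build_x264_cmd(input_path, output_path, resolution, fps, bitrate_kbps,
                   interval, total_frames, preset="medium",
                   input_depth=8, output_depth=8):
    """Fill the x264 template with concrete values and return an argv list."""
    return [
        "x264", "--aq-mode", "0", "--no-scenecut", "--bframes", "3",
        "--b-adapt", "0", "--rc-lookahead", "16",
        "--input-depth", str(input_depth), "--output-depth", str(output_depth),
        "--input-res", resolution, "--fps", str(fps),
        "--bitrate", str(bitrate_kbps),
        "--vbv-bufsize", str(2 * bitrate_kbps),  # {2*bitrate} in the template
        "--keyint", str(interval), "--min-keyint", str(interval),
        "--preset", preset, "--frames", str(total_frames),
        "-o", output_path, input_path,
    ]


# Output filename is a made-up example; the other values come from the tables.
cmd = build_x264_cmd("blue_sky_1920x1080p25_420.yuv", "blue_sky_3500k.264",
                     "1920x1080", fps=25, bitrate_kbps=3500,
                     interval=50, total_frames=217)
print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command be passed to `subprocess.run` without shell quoting concerns.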
NVENC GPU
ffmpeg -y -vsync 0 -hwaccel cuda -s:v {resolution} \
-r {fps} -i {input} -c:v h264_nvenc -pix_fmt yuv420p -preset {preset} -rc cbr \
-bufsize {bitrate} -tune hq -bf 3 -b_ref_mode middle -b_adapt 0 -rc-lookahead 15 -vsync 0 \
-b_qfactor 1 -spatial-aq 0 -temporal-aq 0 -b:v {bitrate} -maxrate {bitrate} -minrate {bitrate} \
-profile:v high -g {interval} -spatial-aq 0 -temporal-aq 1 -vsync 0 -vframes {total_frames} {output}
Quadra VPU
ffmpeg -y -vsync 0 -s {resolution} -r {fps} \
-i {input} -c:v h264_ni_quadra_enc \
-xcoder-params level=0:frameRate={fps}:RcEnable=1:vbvBufferSize=2000:bitrate={bitrate}:intraPeriod={interval}:gopPresetIdx=-1:entropyCodingMode=1:lookaheadDepth=16:cuLevelRCEnable=0:rdoLevel=1:EnableRdoQuant=1 \
-vframes {total_frames} {output}
HEVC/H.265
CPU
x265 --aq-mode 0 --no-scenecut --bframes 3 --b-adapt 0 --rc-lookahead 16 \
--input-depth {input_depth} --output-depth {output_depth} \
--input-res {resolution} --fps {fps} --bitrate {bitrate} \
--vbv-bufsize {2*bitrate} --keyint {interval} --min-keyint {interval} \
--preset {preset} --frames {total_frames} \
-o {output} {input}
NVENC GPU
ffmpeg -y -vsync 0 -hwaccel cuda -s:v {resolution} \
-r {fps} -i {input} -c:v hevc_nvenc -pix_fmt yuv420p -preset {preset} -rc cbr \
-bufsize {bitrate} -tune hq -bf 5 -b_ref_mode middle -b_adapt 0 -rc-lookahead 15 -vsync 0 \
-b_qfactor 1 -spatial-aq 0 -temporal-aq 0 -b:v {bitrate} -maxrate {bitrate} -minrate {bitrate} \
-profile:v main -g {interval} -spatial-aq 0 -temporal-aq 1 -vsync 0 -vframes {total_frames} {output}
Quadra VPU
ffmpeg -y -vsync 0 -s {resolution} -r {fps} \
-i {input} -c:v h265_ni_quadra_enc \
-xcoder-params level=0:frameRate={fps}:RcEnable=1:vbvBufferSize=2000:bitrate={bitrate}:intraPeriod={interval}:gopPresetIdx=-1:entropyCodingMode=1:lookaheadDepth=16:cuLevelRCEnable=0:rdoLevel=1:EnableRdoQuant=1 \
-vframes {total_frames} {output}
Throughput Analysis
The analysis below evaluates throughput using the blue_sky_1920x1080p25_420.yuv source file at a bitrate of 3.125 Mbps for Quadra (VPU), x264/x265 (CPU), and NVENC (GPU), with both H.264 and H.265 and varying instance counts.
Varying numbers of parallel instances were run to determine total frames per second (fps); the tables below report the rate per instance.
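The report does not describe the harness used to run the parallel instances. The sketch below shows one way such a measurement could be made, assuming each instance is a separate encoder process and that fps is the frame count divided by wall-clock time. The function names are hypothetical, and a do-nothing Python process stands in for the encoder so the example runs anywhere.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd, frames):
    """Run one encoder process and return its frames per second."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return frames / (time.perf_counter() - start)


def run_parallel(cmd, instances, frames):
    """Launch `instances` copies of cmd at once; return the fps of each."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        futures = [pool.submit(timed_run, cmd, frames) for _ in range(instances)]
        return [f.result() for f in futures]


# Stand-in command so the sketch runs without an encoder installed;
# a real run would use one of the encoder commands listed earlier.
placeholder_cmd = [sys.executable, "-c", "pass"]
per_instance_fps = run_parallel(placeholder_cmd, instances=4, frames=217)
print(f"mean fps per instance: {sum(per_instance_fps) / len(per_instance_fps):.2f}")
print(f"total fps: {sum(per_instance_fps):.2f}")
```

Threads are sufficient here because each worker only waits on an external process.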
AVC/H.264
Instances | CPU (fps per instance) | VPU (fps per instance) | GPU (fps per instance) |
---|---|---|---|
4 | 37.13 | 105.87 | 62.33 |
8 | 18.85 | 55.10 | 34.37 |
10 | 15.23 | 46.28 | 28.19 |
16 | 9.60 | 29.63 | 18.52 |
32 | 4.83 | 16.00 | 9.16 |
HEVC/H.265
Instances | CPU (fps per instance) | VPU (fps per instance) | GPU (fps per instance) |
---|---|---|---|
4 | 17.85 | 124.14 | 54.31 |
8 | 9.81 | 70.09 | 30.70 |
10 | 7.60 | 56.58 | 25.11 |
16 | 4.83 | 36.49 | 16.43 |
32 | 2.47 | 18.99 | 8.53 |
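Because the tables list fps per instance, total throughput at each instance count can be recovered by multiplying by the number of instances. A minimal sketch, assuming every instance ran at the tabulated rate:

```python
# fps per instance from the tables above: instances -> (CPU, VPU, GPU)
H264 = {4: (37.13, 105.87, 62.33), 8: (18.85, 55.10, 34.37),
        10: (15.23, 46.28, 28.19), 16: (9.60, 29.63, 18.52),
        32: (4.83, 16.00, 9.16)}
HEVC = {4: (17.85, 124.14, 54.31), 8: (9.81, 70.09, 30.70),
        10: (7.60, 56.58, 25.11), 16: (4.83, 36.49, 16.43),
        32: (2.47, 18.99, 8.53)}


def aggregate_fps(table):
    """Total fps at each instance count: instances * fps per instance."""
    return {n: tuple(round(n * fps, 2) for fps in row)
            for n, row in table.items()}


for codec, table in (("H.264", H264), ("HEVC", HEVC)):
    for n, (cpu, vpu, gpu) in aggregate_fps(table).items():
        print(f"{codec} x{n}: CPU {cpu}  VPU {vpu}  GPU {gpu}")
```

Under this reading the aggregate figure changes far less than the per-instance one as instances are added (for example, 32 Quadra H.264 instances at 16.00 fps total 512 fps), so the per-instance decline largely reflects a fixed encoding budget being divided among more streams.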
Summary
The throughput data underscores the scalability and efficiency of VPUs (Quadra) for both H.264 and HEVC encoding. As instance counts increase, per-instance CPU and GPU throughput drops sharply: CPU falls to 4.83 FPS (H.264) and 2.47 FPS (HEVC) at 32 instances, while GPU declines to 9.16 FPS (H.264) and 8.53 FPS (HEVC).
In contrast, VPUs sustain higher per-instance throughput, achieving 16.00 FPS (H.264) and 18.99 FPS (HEVC) at 32 instances. This highlights VPUs’ stronger scalability under heavy workloads compared to CPUs and GPUs.
VPUs outperform both CPUs and GPUs at every instance count tested. Even at 4 instances the GPU trails the VPU (62.33 FPS vs. 105.87 FPS for H.264, and 54.31 FPS vs. 124.14 FPS for HEVC). Across the tables, Quadra delivers roughly 1.6x to 1.7x the throughput of GPU P7 (highest quality) for H.264 and 2.2x to 2.3x for HEVC.
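The VPU-to-GPU throughput ratio can be checked directly from the tables. The short sketch below computes the ratio at each instance count; the variable and function names are illustrative.

```python
# fps per instance from the throughput tables: instances -> (VPU, GPU)
VPU_GPU = {
    "H.264": {4: (105.87, 62.33), 8: (55.10, 34.37), 10: (46.28, 28.19),
              16: (29.63, 18.52), 32: (16.00, 9.16)},
    "HEVC": {4: (124.14, 54.31), 8: (70.09, 30.70), 10: (56.58, 25.11),
             16: (36.49, 16.43), 32: (18.99, 8.53)},
}


def speedups(rows):
    """Ratio of VPU fps to GPU fps at each instance count."""
    return {n: vpu / gpu for n, (vpu, gpu) in rows.items()}


for codec, rows in VPU_GPU.items():
    ratios = speedups(rows)
    print(f"{codec}: {min(ratios.values()):.2f}x to {max(ratios.values()):.2f}x")
```

The ratio stays within a narrow band for each codec, which suggests the VPU advantage does not depend strongly on the instance count in this test.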
Quality Analysis
This section presents a detailed quality analysis of video encoding performed on CPU (software-based encoding with x264/x265), GPU (hardware-accelerated encoding with NVENC), and VPU (hardware-accelerated encoding with Quadra). The primary objective is to evaluate and compare the perceptual video quality delivered by each encoding across varying bitrate conditions.
GPU vs VPU (Quadra)
Results
VMAF BD-rate (%) | GPU (H.264) | GPU (H.265) |
---|---|---|
Quadra | -4.49 | -14.62 |
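- Negative numbers indicate a bitrate advantage for Quadra: it needs that much less bitrate, in percent, to reach the same VMAF score as the reference encoder.
- GPU used the P7 preset (highest quality).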
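The report does not state how its BD-rate figures were calculated. For orientation, the sketch below is a simplified BD-rate calculation that interpolates log-bitrate linearly between measured points and averages the difference over the shared VMAF range; the standard Bjontegaard method uses a cubic or piecewise-cubic fit instead. The rate and VMAF points are hypothetical placeholders, not measurements from this report.

```python
import math


def _log_rate_at(points, q):
    """Linearly interpolate log(bitrate) at quality q from (bitrate, quality) pairs."""
    pts = sorted(points, key=lambda p: p[1])
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= q <= q1:
            t = (q - q0) / (q1 - q0)
            return math.log(r0) + t * (math.log(r1) - math.log(r0))
    raise ValueError("quality outside curve range")


def bd_rate(reference, test, steps=1000):
    """Average bitrate difference (%) of test vs. reference at equal quality."""
    lo = max(min(q for _, q in reference), min(q for _, q in test))
    hi = min(max(q for _, q in reference), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")
    total = 0.0
    for i in range(steps + 1):
        q = min(hi, lo + (hi - lo) * i / steps)  # clamp against float drift
        diff = _log_rate_at(test, q) - _log_rate_at(reference, q)
        total += diff * (0.5 if i in (0, steps) else 1.0)  # trapezoid rule
    return (math.exp(total / steps) - 1.0) * 100.0


# Hypothetical (kbps, VMAF) points, for illustration only
reference = [(1000, 60.0), (1500, 70.0), (3500, 85.0), (7500, 93.0)]
test = [(rate * 0.9, vmaf) for rate, vmaf in reference]
print(f"BD-rate: {bd_rate(reference, test):.2f}%")
```

In this made-up case the test encoder reaches every quality level at 90% of the reference bitrate, so the result is a BD-rate of about -10%.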
BD-Rate Graphs
[Figures: BD-rate graphs for H.264 and H.265, GPU vs. Quadra]
Summary
- For H.264 encoding, Quadra shows a clear improvement over GPU-based NVENC H.264 encoding (using the high-quality P7 preset), with a VMAF BD-rate of -4.49. In other words, Quadra needs about 4.5% less bitrate than the GPU’s top-tier setting to reach the same VMAF score. The difference, while not massive, is meaningful given that NVENC P7 is tuned for maximum quality.
- For HEVC encoding, the advantage is far more pronounced, with a VMAF BD-rate of -14.62. Quadra needs roughly 15% less bitrate than GPU-based H.265 encoding (again using the P7 preset) for the same VMAF score across the tested bitrate range. This large negative value underscores Quadra’s quality and bitrate efficiency, especially in high-compression scenarios where NVENC P7, despite its high-quality tuning, shows more quality degradation.
CPU vs VPU (Quadra)
Results
VMAF BD-rate (%) | CPU (H.264) | CPU (H.265) |
---|---|---|
Quadra | -2.29 | -6.74 |
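- Negative numbers indicate a bitrate advantage for Quadra: it needs that much less bitrate, in percent, to reach the same VMAF score as the reference encoder.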
BD-Rate Graphs
[Figures: BD-rate graphs for H.264 and H.265, CPU vs. Quadra]
Summary
- For H.264 encoding, Quadra demonstrates a quality advantage over CPU-based x264 encoding, with a VMAF BD-rate of -2.29. This negative value indicates that Quadra reaches the same VMAF score as the CPU with about 2.3% less bitrate across the tested range. While the improvement is moderate, it shows Quadra preserving visual fidelity more efficiently than CPU encoding, likely due to optimized hardware or algorithmic efficiency.
- For HEVC encoding, the gap widens, with a VMAF BD-rate of -6.74. Quadra outperforms CPU-based x265 encoding, particularly at lower bitrates where HEVC compression artifacts tend to be more pronounced. This advantage highlights Quadra’s quality and bitrate efficiency under demanding compression scenarios, making it a strong contender against traditional CPU encoding.
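To make the BD-rate figures more concrete, they can be converted into an approximate equivalent bitrate. The sketch below applies each reported percentage to a 3.5 Mbps reference stream. Since BD-rate is an average over the tested range, the result is a rough guide rather than a per-bitrate guarantee. The function name is illustrative.

```python
# VMAF BD-rate (%) of Quadra against each reference, from the results tables
BD_RATE = {
    ("GPU", "H.264"): -4.49,
    ("GPU", "H.265"): -14.62,
    ("CPU", "H.264"): -2.29,
    ("CPU", "H.265"): -6.74,
}


def equivalent_bitrate(reference_mbps, bd_rate_percent):
    """Bitrate Quadra would need, on average, to match the reference quality."""
    return reference_mbps * (1 + bd_rate_percent / 100)


for (encoder, codec), bd in BD_RATE.items():
    mbps = equivalent_bitrate(3.5, bd)
    print(f"{encoder} {codec} at 3.5 Mbps ~ Quadra at {mbps:.2f} Mbps")
```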
Conclusion
This report presents an evaluation of CPU (x264/x265), GPU (NVENC), and VPU (Quadra) across two critical dimensions: throughput and video quality, assessed using the VMAF metric. The analysis uses 12 diverse video sequences tested at multiple bitrate levels.
Quadra demonstrates strong throughput scalability, sustaining 16.00 FPS (H.264) and 18.99 FPS (HEVC) per instance at 32 instances, while CPU and GPU performance degrade sharply (e.g., CPU at 4.83 FPS H.264 and 2.47 FPS HEVC, GPU at 9.16 FPS H.264 and 8.53 FPS HEVC at 32 instances). This is complemented by a significant quality advantage: Quadra outperforms CPU with VMAF BD-rate differences of -2.29 (H.264) and -6.74 (HEVC), and GPU with -4.49 (H.264) and -14.62 (HEVC).
These metrics highlight the Quadra VPU’s ability to deliver superior perceptual quality with reasonable settings, particularly in high-compression HEVC scenarios, making it ideal for real-time streaming, multi-instance encoding, and bandwidth-constrained applications. Even against GPU’s highest quality preset (P7), Quadra’s quality edge persists, with notable robustness at lower bitrates.
As this testing shows, Quadra VPU is the premier choice, offering a combination of scalability, efficiency, and quality to meet diverse video processing demands, with Quadra vs. GPU P7 (highest quality) showing significant advantages in both quality (up to about 15% bitrate savings at equal VMAF) and throughput (roughly 1.6x for H.264 to 2.3x for HEVC).