Version: v5.3.0

Akamai Throughput and Quality Report

Terminology

For clarity and consistency throughout this report, the following definitions apply:

  • CPU: Software-based video encoding utilizing x264 (H.264 codec) and x265 (H.265 codec).
  • GPU: Hardware-accelerated video encoding using NVENC (NVIDIA Encoder) with the H.264 and H.265 codecs.
  • VPU: Hardware-accelerated processing using Quadra (NETINT Encoder) with the H.264 and H.265 codecs.

Note: The terms “CPU,” “GPU,” and “VPU” may be used interchangeably with “x264/x265,” “NVENC,” and “Quadra,” respectively.

Contents

  1. Throughput Test

Objective: Measure the processing speed of CPU, GPU, and VPU in frames per second (FPS) for video encoding.

  2. Quality Test

Objective: Evaluate output fidelity using metrics like VMAF after processing test videos.

  3. Summary

Objective: Summarize the throughput and quality test results for CPU (x264/x265), GPU (NVENC), and VPU (Quadra).

Test Environment

These tests were performed on the following Akamai Virtual Machines.

  • CPU: NETINT Quadra T1U x2 Small, which has 12 cores and 24 GB RAM.
  • VPU: NETINT Quadra T1U x2 Small, which has 12 cores and 24 GB RAM.
    Note: Only one Quadra T1U was used for these tests.
  • GPU: RTX4000 Ada x1 Large, which has 16 cores and 64 GB RAM.

Source Content

The following files were used for the capacity and quality testing.

| Input Name | Resolution | Frames | Bit Depth | FPS | Interval (frames, 2 sec) |
| --- | --- | --- | --- | --- | --- |
| blue_sky_1920x1080p25_420.yuv | 1920x1080 | 217 | 8 | 25 | 50 |
| crowd_run_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
| ducks_take_off_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
| in_to_tree_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
| old_town_cross_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
| park_joy_1920x1080p50_420.yuv | 1920x1080 | 500 | 8 | 50 | 100 |
| pedestrian_area_1920x1080p25_420.yuv | 1920x1080 | 375 | 8 | 25 | 50 |
| riverbed_1920x1080p25_420.yuv | 1920x1080 | 250 | 8 | 25 | 50 |
| rush_hour_1920x1080p25_420.yuv | 1920x1080 | 500 | 8 | 25 | 50 |
| station2_1920x1080p25_420.yuv | 1920x1080 | 313 | 8 | 25 | 50 |
| sunflower_1920x1080p25_420.yuv | 1920x1080 | 500 | 8 | 25 | 50 |
| tractor_1920x1080p25_420.yuv | 1920x1080 | 690 | 8 | 25 | 50 |
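The Interval column appears to follow directly from the frame rate and a 2-second keyframe spacing. A minimal sketch of that relationship, assuming this is how the values were derived (the function name is illustrative and not part of the test harness):

```python
def keyframe_interval(fps: int, seconds: int = 2) -> int:
    """Frames between keyframes for a fixed spacing in seconds."""
    return fps * seconds

# The two frame rates that appear in the source content table.
for fps in (25, 50):
    print(f"{fps} fps -> interval of {keyframe_interval(fps)} frames")
```

The result is what gets passed as {interval} to the keyint, -g, and intraPeriod options in the commands.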

Bitrate

The following bitrates were used in the testing:

  • 1 Mbps
  • 1.5 Mbps
  • 3.5 Mbps
  • 7.5 Mbps

Presets

The following presets were used for CPU and GPU:

  • CPU: Medium for both x264 and x265
  • GPU: P7 (Highest Quality Preset)

Versions:

| Codec | Version |
| --- | --- |
| CPU (x264) | r3213 570f6c7 |
| CPU (x265) | 5163c32d7 |
| VPU | 5.0.0 |
| GPU | Driver: 570.86.10, CUDA: 12.8 |

Commands

The following are the commands used to perform the validation.

AVC/H.264

CPU

x264 --aq-mode 0 --no-scenecut --bframes 3 --b-adapt 0 --rc-lookahead 16  \
--input-depth {input_depth} --output-depth {output_depth} \
--input-res {resolution} --fps {fps} --bitrate {bitrate} --vbv-bufsize {2*bitrate} \
--keyint {interval} --min-keyint {interval} --preset {preset} --frames {total_frames} \
-o {output} {input}
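To make the placeholders concrete, the following sketch fills the x264 template for one test case. It is an illustration under stated assumptions rather than the harness used for the report: the helper name and the output filename are hypothetical, and the bitrate is assumed to be given in kbit/s (the unit x264 expects for --bitrate), so 3.5 Mbps becomes 3500.

```python
import shlex

def build_x264_cmd(input_path, output_path, resolution, fps, bitrate_kbps,
                   interval, total_frames, preset="medium",
                   input_depth=8, output_depth=8):
    """Fill the x264 template from this section with concrete values."""
    return [
        "x264", "--aq-mode", "0", "--no-scenecut", "--bframes", "3",
        "--b-adapt", "0", "--rc-lookahead", "16",
        "--input-depth", str(input_depth), "--output-depth", str(output_depth),
        "--input-res", resolution, "--fps", str(fps),
        "--bitrate", str(bitrate_kbps),
        "--vbv-bufsize", str(2 * bitrate_kbps),  # template uses 2x the bitrate
        "--keyint", str(interval), "--min-keyint", str(interval),
        "--preset", preset, "--frames", str(total_frames),
        "-o", output_path, input_path,
    ]

# blue_sky at 3.5 Mbps; the output filename is a made-up example.
cmd = build_x264_cmd("blue_sky_1920x1080p25_420.yuv", "blue_sky_3500k.264",
                     "1920x1080", 25, 3500, 50, 217)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, which avoids quoting problems if it is later passed to subprocess.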

GPU

ffmpeg -y -vsync 0 -hwaccel cuda -s:v {resolution} \
-r {fps} -i {input} -c:v h264_nvenc -pix_fmt yuv420p -preset {preset} -rc cbr \
-bufsize {bitrate} -tune hq -bf 3 -b_ref_mode middle -b_adapt 0 -rc-lookahead 15 -vsync 0 \
-b_qfactor 1 -spatial-aq 0 -temporal-aq 0 -b:v {bitrate} -maxrate {bitrate} -minrate {bitrate} \
-profile:v high -g {interval} -spatial-aq 0 -temporal-aq 1 -vsync 0 -vframes {total_frames} {output}

VPU

ffmpeg -y -vsync 0 -s {resolution} -r {fps} \
-i {input} -c:v h264_ni_quadra_enc \
-xcoder-params level=0:frameRate={fps}:RcEnable=1:vbvBufferSize=2000:bitrate={bitrate}:intraPeriod={interval}:gopPresetIdx=-1:entropyCodingMode=1:lookaheadDepth=16:cuLevelRCEnable=0:rdoLevel=1:EnableRdoQuant=1 \
-vframes {total_frames} {output}

HEVC/H.265

CPU

x265 --aq-mode 0 --no-scenecut --bframes 3 --b-adapt 0 --rc-lookahead 16 \
--input-depth {input_depth} --output-depth {output_depth} \
--input-res {resolution} --fps {fps} --bitrate {bitrate} \
--vbv-bufsize {2*bitrate} --keyint {interval} --min-keyint {interval} \
--preset {preset} --frames {total_frames} \
-o {output} {input}

GPU

ffmpeg -y -vsync 0 -hwaccel cuda -s:v {resolution} \
-r {fps} -i {input} -c:v hevc_nvenc -pix_fmt yuv420p -preset {preset} -rc cbr \
-bufsize {bitrate} -tune hq -bf 5 -b_ref_mode middle -b_adapt 0 -rc-lookahead 15 -vsync 0 \
-b_qfactor 1 -spatial-aq 0 -temporal-aq 0 -b:v {bitrate} -maxrate {bitrate} -minrate {bitrate} \
-profile:v main -g {interval} -spatial-aq 0 -temporal-aq 1 -vsync 0 -vframes {total_frames} {output}

VPU

ffmpeg -y -vsync 0 -s {resolution} -r {fps} \
-i {input} -c:v h265_ni_quadra_enc \
-xcoder-params level=0:frameRate={fps}:RcEnable=1:vbvBufferSize=2000:bitrate={bitrate}:intraPeriod={interval}:gopPresetIdx=-1:entropyCodingMode=1:lookaheadDepth=16:cuLevelRCEnable=0:rdoLevel=1:EnableRdoQuant=1 \
-vframes {total_frames} {output}

Throughput Analysis

The following analysis evaluates throughput for Quadra (VPU), software encoding (CPU), and NVIDIA NVENC (GPU) across two codecs, H.264 and H.265, using the blue_sky_1920x1080p25_420.yuv source file at a bitrate of 3.125 Mbps. Each encoder was run with varying numbers of parallel instances, and throughput was measured in frames per second (fps); the tables report fps per instance.
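The report does not state how the parallel runs were launched or timed. One plausible approach, sketched here purely as an assumption, is to start N copies of the same encoder command at once and divide the frame count by the wall-clock time of the whole batch. The function name and the stand-in command are hypothetical.

```python
import subprocess
import sys
import time

def fps_per_instance(cmd, instances, frames_per_run):
    """Launch `instances` copies of `cmd` at once and return per-instance fps
    based on the wall-clock time until the slowest copy finishes."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
             for _ in range(instances)]
    for proc in procs:
        proc.wait()
    elapsed = time.perf_counter() - start
    return frames_per_run / elapsed

# Stand-in command so the sketch runs anywhere; swap in one of the encoder
# command lines from the Commands section to measure real throughput.
placeholder_cmd = [sys.executable, "-c", "pass"]
for n in (4, 8, 10, 16, 32):
    fps = fps_per_instance(placeholder_cmd, n, frames_per_run=217)
    print(f"{n:>2} instances: {fps:.2f} fps per instance")
```

With the placeholder command the printed numbers are meaningless; only the structure of the measurement is the point. The encoders also print their own fps figures, which may be what the report used instead.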

AVC/H.264

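| Instances | CPU (fps per instance) | VPU (fps per instance) | GPU (fps per instance) |
| --- | --- | --- | --- |
| 4 | 37.13 | 105.87 | 62.33 |
| 8 | 18.85 | 55.10 | 34.37 |
| 10 | 15.23 | 46.28 | 28.19 |
| 16 | 9.60 | 29.63 | 18.52 |
| 32 | 4.83 | 16.00 | 9.16 |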

HEVC/H.265

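| Instances | CPU (fps per instance) | VPU (fps per instance) | GPU (fps per instance) |
| --- | --- | --- | --- |
| 4 | 17.85 | 124.14 | 54.31 |
| 8 | 9.81 | 70.09 | 30.70 |
| 10 | 7.60 | 56.58 | 25.11 |
| 16 | 4.83 | 36.49 | 16.43 |
| 32 | 2.47 | 18.99 | 8.53 |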
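Because the tables list fps per instance, the aggregate throughput of each platform can be estimated by multiplying by the instance count. The sketch below does that with the values from the two tables; it assumes every instance in a run achieved the reported per-instance rate, which the report does not state explicitly.

```python
# fps per instance from the throughput tables: {instances: (CPU, VPU, GPU)}
H264 = {4: (37.13, 105.87, 62.33), 8: (18.85, 55.10, 34.37),
        10: (15.23, 46.28, 28.19), 16: (9.60, 29.63, 18.52),
        32: (4.83, 16.00, 9.16)}
H265 = {4: (17.85, 124.14, 54.31), 8: (9.81, 70.09, 30.70),
        10: (7.60, 56.58, 25.11), 16: (4.83, 36.49, 16.43),
        32: (2.47, 18.99, 8.53)}

def aggregate(table):
    """Total fps across all instances = instances * fps per instance."""
    return {n: tuple(round(n * fps, 2) for fps in row)
            for n, row in table.items()}

for name, table in (("H.264", H264), ("H.265", H265)):
    print(name)
    for n, (cpu, vpu, gpu) in aggregate(table).items():
        print(f"  {n:>2} instances: CPU {cpu:7.2f}  VPU {vpu:7.2f}  GPU {gpu:7.2f}")
```

Under that assumption, the H.264 VPU aggregate rises from about 423 fps at 4 instances to 512 fps at 32, while the CPU aggregate stays close to 150 fps throughout.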

Summary

The throughput data shows that the VPU (Quadra) sustains the highest per-instance throughput for both H.264 and H.265 encoding at every instance count tested. As instance counts increase, CPU and GPU throughput drops sharply: the CPU falls to 4.83 fps (H.264) and 2.47 fps (H.265) at 32 instances, while the GPU falls to 9.16 fps (H.264) and 8.53 fps (H.265). At the same load, the VPU still delivers 16.00 fps (H.264) and 18.99 fps (H.265) per instance, which points to better scalability under heavy workloads than either the CPU or the GPU.

VPUs outperform CPUs at every instance count and also stay ahead of GPUs throughout. GPUs trail VPUs even at 4 instances (62.33 fps vs. 105.87 fps for H.264, and 54.31 fps vs. 124.14 fps for H.265), so the VPU delivers roughly 1.7x (H.264) to 2.3x (H.265) the throughput of the GPU at its P7 (highest quality) preset.
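The VPU-to-GPU comparison can be checked directly from the per-instance figures. A short sketch using the 4-instance and 32-instance rows of the throughput tables (the helper name is illustrative):

```python
# (VPU fps, GPU fps) per instance, from the throughput tables in this report.
PAIRS = {
    ("H.264", 4): (105.87, 62.33), ("H.264", 32): (16.00, 9.16),
    ("H.265", 4): (124.14, 54.31), ("H.265", 32): (18.99, 8.53),
}

def speedup(vpu_fps, gpu_fps):
    """How many times faster the VPU is than the GPU."""
    return vpu_fps / gpu_fps

for (codec, n), (vpu, gpu) in PAIRS.items():
    print(f"{codec} @ {n:>2} instances: VPU/GPU = {speedup(vpu, gpu):.2f}x")
```

The ratio stays fairly stable between the lightest and heaviest loads, at about 1.7x for H.264 and between 2.2x and 2.3x for H.265.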

Quality Analysis

This section presents a detailed quality analysis of video encoding performed on CPU (software-based encoding with x264/x265), GPU (hardware-accelerated encoding with NVENC), and VPU (hardware-accelerated encoding with Quadra). The primary objective is to evaluate and compare the perceptual video quality delivered by each encoder across varying bitrate conditions.

GPU vs VPU (Quadra)

Results

| VMAF BD-Rate | GPU (H.264) | GPU (H.265) |
| --- | --- | --- |
| Quadra | -4.49 | -14.62 |

Note: The GPU used the P7 preset for highest quality.

BD-Rate Graphs

H.264: [Graphs: x264-1, x264-2, x264-3]

H.265: [Graphs: x265-1, x265-2, x265-3]

Summary

  • For H.264 encoding, Quadra shows a clear improvement over GPU-based NVENC encoding (using the highest-quality P7 preset), with a VMAF BD-Rate of -4.49. A negative BD-Rate means Quadra needs less bitrate on average to reach the same VMAF score. The difference is modest but meaningful, given that P7 is NVENC’s highest-quality preset.

  • For H.265 encoding, the advantage is considerably larger, with a VMAF BD-Rate of -14.62 against GPU-based encoding (again using the P7 preset). This large negative value indicates that Quadra holds quality better across the tested bitrate range, especially in high-compression scenarios where NVENC P7 shows more noticeable degradation.
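To put the BD-Rate figures in practical terms, the sketch below converts them into an equivalent bitrate. It assumes the values are percentages (consistent with the "up to 15%" figure in the conclusion) and that a negative value is a bitrate saving at equal VMAF. The function name is illustrative.

```python
def equivalent_bitrate(reference_mbps: float, bd_rate_percent: float) -> float:
    """Average bitrate needed to match the reference encoder's VMAF,
    given a BD-Rate in percent (negative means a bitrate saving)."""
    return reference_mbps * (1 + bd_rate_percent / 100)

# BD-Rate values from the results table; 3.5 Mbps is one of the tested bitrates.
for codec, bd_rate in (("H.264", -4.49), ("H.265", -14.62)):
    mbps = equivalent_bitrate(3.5, bd_rate)
    print(f"{codec}: GPU at 3.5 Mbps ~ Quadra at {mbps:.2f} Mbps")
```

BD-Rate is an average over the whole tested bitrate range, so the saving at any single operating point may be larger or smaller than this estimate.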

CPU vs VPU (Quadra)

Results

| VMAF BD-Rate | CPU (H.264) | CPU (H.265) |
| --- | --- | --- |
| Quadra | -2.29 | -6.74 |

BD-Rate Graphs

H.264: [Graphs: gpu-x264-1, gpu-x264-2, gpu-x264-3]

H.265: [Graphs: gpu-x265-1, gpu-x265-2, gpu-x265-3]

Summary

  • For H.264 encoding, Quadra demonstrates a quality advantage over CPU-based x264 encoding, with a VMAF BD-Rate difference of -2.29. This negative value indicates Quadra delivers better perceptual quality than the CPU across the tested bitrates. While the improvement is moderate, it showcases Quadra’s ability to preserve visual fidelity more effectively than CPU encoding, likely due to optimized hardware or algorithmic efficiency.

  • For H.265 encoding, the quality gap widens significantly, with a VMAF BD-Rate difference of -6.74. Quadra outperforms CPU-based x265 encoding, particularly at lower bitrates where H.265 compression artifacts tend to be more pronounced. This substantial advantage highlights Quadra’s robustness in maintaining quality under demanding compression scenarios, making it a strong contender against traditional CPU encoding.

Conclusion

This report presents a comprehensive evaluation of CPU (x264/x265), GPU (NVENC), and VPU (Quadra) across two critical dimensions: throughput and video quality, assessed using the VMAF metric. The analysis leverages 12 diverse video sequences tested at four bitrate levels.

Quadra demonstrates strong throughput scalability, sustaining 16.00 fps (H.264) and 18.99 fps (H.265) per instance at 32 instances, while CPU and GPU performance degrades sharply (CPU at 4.83 fps for H.264 and 2.47 fps for H.265; GPU at 9.16 fps for H.264 and 8.53 fps for H.265 at 32 instances). This is complemented by a quality advantage: Quadra outperforms the CPU with VMAF BD-Rate differences of -2.29 (H.264) and -6.74 (H.265), and the GPU with -4.49 (H.264) and -14.62 (H.265). These metrics highlight Quadra’s ability to deliver better perceptual quality with reasonable settings, particularly in high-compression H.265 scenarios, which makes it well suited to real-time streaming, multi-instance encoding, and bandwidth-constrained applications. Even against the GPU’s highest quality preset (P7), Quadra’s quality edge persists, with notable robustness at lower bitrates.

In conclusion, Quadra (VPU) emerges as the strongest option in these tests, combining scalability, efficiency, and quality to meet diverse video processing demands. Against the GPU at its P7 (highest quality) preset, Quadra shows significant advantages in both quality (up to 15%) and throughput (about 2x).