High-Throughput ACL Pattern Matching via GPU-Accelerated Convolution

Thu, 01 Jun 2023 00:00:00 +0000

Overview

A high-throughput network packet filtering engine built at Huawei R&D. The system treats each Access Control List (ACL) rule as a 5-dimensional filter and checks incoming packets against millions of rules in parallel on NVIDIA Tesla V100 GPUs, using a convolution-style linear scan.

Key Contributions

GPU-Accelerated Pattern Matching Engine

Each ACL rule is a 5-d filter over the packet 5-tuple (source IP, destination IP, source port, destination port, protocol). The engine applies a convolution-style linear scan over millions of rules and processes incoming packets in parallel, which yields large throughput gains over traditional CPU-based approaches.
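The per-packet logic can be illustrated with a minimal CPU reference sketch. It is not the original GPU kernel: the `Packet` and `Rule` types, the value-plus-mask encoding for IPs, the inclusive port ranges, and the "protocol 0 means any" convention are all assumptions made for illustration. The scan returns on the first rule whose five dimensions all accept the packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 5-tuple extracted from a packet header.
struct Packet {
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
    std::uint8_t protocol;
};

// Hypothetical rule encoding (an assumption, not the original layout):
// IPs as value + mask, ports as inclusive ranges, protocol 0 = "any".
struct Rule {
    std::uint32_t src_ip, src_mask;
    std::uint32_t dst_ip, dst_mask;
    std::uint16_t src_port_lo, src_port_hi;
    std::uint16_t dst_port_lo, dst_port_hi;
    std::uint8_t protocol;
};

// True only when all five dimensions of the rule accept the packet.
bool matches(const Rule& r, const Packet& p) {
    assert(r.src_port_lo <= r.src_port_hi && r.dst_port_lo <= r.dst_port_hi);
    return ((p.src_ip ^ r.src_ip) & r.src_mask) == 0 &&
           ((p.dst_ip ^ r.dst_ip) & r.dst_mask) == 0 &&
           p.src_port >= r.src_port_lo && p.src_port <= r.src_port_hi &&
           p.dst_port >= r.dst_port_lo && p.dst_port <= r.dst_port_hi &&
           (r.protocol == 0 || r.protocol == p.protocol);
}

// Linear scan in rule order; returns the index of the first full match,
// or -1 if no rule matches. Returning at the first hit is the early exit.
long first_match(const std::vector<Rule>& rules, const Packet& p) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (matches(rules[i], p)) return static_cast<long>(i);
    }
    return -1;
}
```

On a GPU the same comparison would run across many threads at once. How the original engine maps packets and rule chunks onto threads is not stated here, so the sketch leaves that out.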

Structure-of-Arrays (SoA) Rule Representation

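Proposed a compact Structure-of-Arrays representation of 26 bytes/rule. Each rule field lives in its own contiguous array, so adjacent threads read adjacent addresses and memory accesses coalesce. The system stores up to 5M rules in about 124 MiB of GPU memory (5M × 26 bytes = 130 MB).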
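A Structure-of-Arrays table and its memory footprint can be sketched as follows. The field breakdown is hypothetical: it is one split that happens to total 26 bytes (four 32-bit IP values and masks, four 16-bit port bounds, one protocol byte, one action byte), and the original layout may differ. `RuleTableSoA`, `kBytesPerRule`, and `table_bytes` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SoA rule table: one contiguous array per field, so that
// thread i reading element i touches addresses next to thread i+1.
struct RuleTableSoA {
    std::vector<std::uint32_t> src_ip, src_mask, dst_ip, dst_mask;
    std::vector<std::uint16_t> src_port_lo, src_port_hi;
    std::vector<std::uint16_t> dst_port_lo, dst_port_hi;
    std::vector<std::uint8_t> protocol, action;

    std::size_t size() const { return src_ip.size(); }

    // Appends one rule by pushing each field onto its own array.
    void add(std::uint32_t sip, std::uint32_t smask,
             std::uint32_t dip, std::uint32_t dmask,
             std::uint16_t sp_lo, std::uint16_t sp_hi,
             std::uint16_t dp_lo, std::uint16_t dp_hi,
             std::uint8_t proto, std::uint8_t act) {
        assert(sp_lo <= sp_hi && dp_lo <= dp_hi);
        src_ip.push_back(sip);       src_mask.push_back(smask);
        dst_ip.push_back(dip);       dst_mask.push_back(dmask);
        src_port_lo.push_back(sp_lo); src_port_hi.push_back(sp_hi);
        dst_port_lo.push_back(dp_lo); dst_port_hi.push_back(dp_hi);
        protocol.push_back(proto);   action.push_back(act);
    }
};

// Payload bytes per rule under the assumed field split: 16 + 8 + 2 = 26.
constexpr std::size_t kBytesPerRule = 4 * sizeof(std::uint32_t) +
                                      4 * sizeof(std::uint16_t) +
                                      2 * sizeof(std::uint8_t);

// Total payload bytes for n rules, ignoring allocator overhead.
constexpr std::size_t table_bytes(std::size_t n) { return n * kBytesPerRule; }
```

Under this assumed split, `table_bytes(5000000)` is 130,000,000 bytes, which is just under 124 MiB and agrees with the stated footprint. A packed array of structs would cost the same 26 bytes per rule, but neighboring threads would then read addresses 26 bytes apart, and that access pattern does not coalesce well.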

Throughput & Optimization Results

100M packets/second at 1K rules
~80K packets/second sustained at 5M rules
Early-termination on first full match minimizes wasted computation
GPU Profiler: collects GPU specs on the fly and sweeps block and batch sizes. For the 5M-rule case, optimal batch partitioning cuts kernel time by ~30% and delivers up to a 1.5× total-time speedup.
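The block/batch sweep can be sketched as a small grid search. This is an illustration under assumptions, not the original profiler: `SweepResult`, `time_once`, and `sweep` are hypothetical names, and the `measure` callback stands in for whatever launches the matching kernel with a given configuration and returns its elapsed time. In a CUDA build, the candidate block sizes could be bounded by the `maxThreadsPerBlock` field that `cudaGetDeviceProperties` reports.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Best configuration found by the sweep, with its measured time.
struct SweepResult {
    std::size_t block_size;
    std::size_t batch_size;
    double seconds;
};

// Wall-clock time of one call to `work`, in seconds.
double time_once(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Tries every (block, batch) pair and keeps the one with the lowest
// measured time. `measure` is a caller-supplied hook (an assumption here)
// that runs the workload with that configuration and returns seconds.
SweepResult sweep(
    const std::vector<std::size_t>& block_sizes,
    const std::vector<std::size_t>& batch_sizes,
    const std::function<double(std::size_t, std::size_t)>& measure) {
    assert(!block_sizes.empty() && !batch_sizes.empty());
    SweepResult best{0, 0, std::numeric_limits<double>::infinity()};
    for (std::size_t block : block_sizes) {
        for (std::size_t batch : batch_sizes) {
            const double t = measure(block, batch);
            if (t < best.seconds) best = SweepResult{block, batch, t};
        }
    }
    return best;
}
```

A caller would typically wrap a kernel launch plus synchronization in `time_once` inside `measure`, and might average several runs per configuration to reduce timing noise.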

Tech Stack: C, C++, PyTorch, Linux Kernel, CPU/GPU Profiling

Systems | Mohammad Abdul Hadi
