High-Throughput ACL Pattern Matching via GPU-Accelerated Convolution
Overview
A high-throughput network packet filtering engine built at Huawei R&D. The system treats each Access Control List (ACL) rule as a 5-dimensional filter and matches incoming packets in parallel on NVIDIA Tesla V100 GPUs, using a convolution-style linear scan over rule sets of up to millions of entries.
Key Contributions
GPU-Accelerated Pattern Matching Engine
Each ACL rule is a filter over the packet 5-tuple (source IP, destination IP, source port, destination port, protocol). Every incoming packet is slid across the full rule list, convolution-style, and packets are processed in parallel on the GPU, for substantially higher throughput than traditional CPU-based matching.
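The matching rule can be illustrated with a small sequential reference in Python. This is a sketch under assumptions, not the GPU kernel: the field layout (masked IP prefixes, inclusive port ranges), the `PROTO_ANY` wildcard value, and the names `Rule`, `Packet`, `matches`, and `first_match` are all hypothetical. It shows what one packet's scan computes, including stopping at the first full match.

```python
import ipaddress
from typing import NamedTuple, Optional, Sequence

PROTO_ANY = 255  # hypothetical wildcard value, not taken from the original system


class Rule(NamedTuple):
    # Assumed layout: masked prefix match on IPs, inclusive ranges on ports.
    src_ip: int
    src_mask: int
    dst_ip: int
    dst_mask: int
    sport_lo: int
    sport_hi: int
    dport_lo: int
    dport_hi: int
    proto: int
    action: int


class Packet(NamedTuple):
    src_ip: int
    dst_ip: int
    sport: int
    dport: int
    proto: int


def ip(text: str) -> int:
    """Convert dotted-quad IPv4 text to a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


def matches(rule: Rule, pkt: Packet) -> bool:
    """A full match requires all five dimensions to agree."""
    return (
        (pkt.src_ip & rule.src_mask) == (rule.src_ip & rule.src_mask)
        and (pkt.dst_ip & rule.dst_mask) == (rule.dst_ip & rule.dst_mask)
        and rule.sport_lo <= pkt.sport <= rule.sport_hi
        and rule.dport_lo <= pkt.dport <= rule.dport_hi
        and rule.proto in (PROTO_ANY, pkt.proto)
    )


def first_match(rules: Sequence[Rule], pkt: Packet) -> Optional[int]:
    """Linear scan in rule order; return the index of the first full match."""
    for index, rule in enumerate(rules):
        if matches(rule, pkt):
            return index  # early termination: later rules are never examined
    return None
```

For example, with a first rule that admits TCP traffic from 10.0.0.0/8 to port 80 and a catch-all second rule, a packet from 10.1.2.3 to port 80 resolves to index 0, while the same packet sent to port 443 falls through to index 1.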
Structure-of-Arrays (SoA) Rule Representation
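Designed a compact structure-of-arrays representation at 26 bytes/rule. Because each field lives in its own contiguous array, adjacent threads read adjacent elements of the same field, which enables coalesced memory access. At this size, 5M rules occupy about 130 MB (roughly 124 MiB) of GPU memory.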
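To make the layout and the memory arithmetic concrete, here is a minimal host-side sketch in Python. Only the 26-byte total comes from the text above; the split into ten columns (four 4-byte IP fields, four 2-byte port bounds, and two 1-byte fields) is one assumed layout that happens to add up to 26, and `COLUMNS`, `RuleTable`, and `footprint_bytes` are hypothetical names.

```python
from array import array

# Assumed column layout: (field name, array type code).
# 'I' is 4 bytes on common platforms, 'H' is 2 bytes, 'B' is 1 byte.
COLUMNS = (
    ("src_ip", "I"), ("src_mask", "I"),
    ("dst_ip", "I"), ("dst_mask", "I"),
    ("sport_lo", "H"), ("sport_hi", "H"),
    ("dport_lo", "H"), ("dport_hi", "H"),
    ("proto", "B"), ("action", "B"),
)


class RuleTable:
    """Structure of arrays: one contiguous typed array per rule field."""

    def __init__(self):
        self.cols = {name: array(code) for name, code in COLUMNS}

    def append(self, **fields):
        # Rule i is spread across index i of every column.
        for name, _ in COLUMNS:
            self.cols[name].append(fields[name])

    def __len__(self):
        return len(self.cols["proto"])

    def bytes_per_rule(self):
        return sum(col.itemsize for col in self.cols.values())


def footprint_bytes(num_rules, bytes_per_rule=26):
    """Total rule storage, ignoring any alignment padding."""
    return num_rules * bytes_per_rule
```

Under this layout, `footprint_bytes(5_000_000)` is 130,000,000 bytes, and dividing by 2**20 gives just under 124 MiB. The coalescing benefit comes from the column order: a group of threads handling rules i, i+1, i+2, ... reads consecutive elements of one column rather than striding across 26-byte records.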
Throughput & Optimization Results
- 100M packets/second at 1K rules
- ~80K packets/second sustained at 5M rules
- Early termination on the first full match skips the remaining rules and avoids wasted computation
- GPU profiler: collects GPU specs at runtime and sweeps block and batch sizes; the best batch partitioning cut kernel time by ~30% and delivered up to a 1.5× total-time speedup in the 5M-rule case
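The block/batch sweep can be sketched as a plain grid search. This is an illustration under assumptions rather than the original profiler: `timed` and `sweep` are hypothetical names, the candidate sizes are supplied by the caller, and `run(block, batch)` stands in for whatever launches the matching kernel. For GPU work, `run` would need to block until the kernel has finished (for example by synchronizing the device) so that the wall-clock timing is meaningful.

```python
import itertools
import time


def timed(run, repeats=3):
    """Wrap a workload so it reports its best-of-N wall-clock time in seconds."""
    def measure(block, batch):
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(block, batch)  # assumed to return only once the work is done
            samples.append(time.perf_counter() - start)
        return min(samples)
    return measure


def sweep(measure, block_sizes, batch_sizes):
    """Try every (block, batch) pair and return the fastest one plus all timings."""
    results = {
        (block, batch): measure(block, batch)
        for block, batch in itertools.product(block_sizes, batch_sizes)
    }
    best = min(results, key=results.get)
    return best, results
```

A call such as `sweep(timed(launch), [128, 256, 512], [1024, 4096, 16384])` would return the fastest pair for a hypothetical `launch` function. Taking the minimum over a few repeats is one simple way to damp timing noise; a real profiler might also discard warm-up runs.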
Tech Stack: C, C++, PyTorch, Linux Kernel, CPU/GPU Profiling