High-Throughput ACL Pattern Matching via GPU-Accelerated Convolution

Overview

A high-throughput network packet filtering engine, built at Huawei R&D, that matches incoming packets against millions of Access Control List (ACL) rules in parallel on NVIDIA Tesla V100 GPUs. Each rule is treated as a 5-dimensional filter, and matching is performed as a convolution-style linear scan over the rule set.

Key Contributions

GPU-Accelerated Pattern Matching Engine

Treats each ACL rule as a 5-d filter over (source IP, destination IP, source port, destination port, protocol). Every incoming packet is compared against the rule set in a convolution-style linear scan, with many packets processed in parallel on the GPU. The resulting throughput, reported under Throughput & Optimization Results, is far higher than that of traditional CPU-based approaches.
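The matching logic can be illustrated with a minimal host-side C++ sketch. It is a CPU reference for the per-packet scan, not the GPU kernel itself. The Packet and Rule types, the value/mask encoding for IPs, the inclusive port ranges, and the convention that protocol 0 means "any" are all assumptions made for illustration; the source does not specify the rule encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet header: the five fields named in the text.
struct Packet {
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
    std::uint8_t  protocol;
};

// Hypothetical rule encoding (an assumption, not the original format):
// value/mask match on IPs, inclusive ranges on ports, protocol 0 = any.
struct Rule {
    std::uint32_t src_ip, src_mask;
    std::uint32_t dst_ip, dst_mask;
    std::uint16_t src_port_lo, src_port_hi;
    std::uint16_t dst_port_lo, dst_port_hi;
    std::uint8_t  protocol;
};

// A packet matches a rule only if all five dimensions match.
inline bool matches(const Rule& r, const Packet& p) {
    assert(r.src_port_lo <= r.src_port_hi && r.dst_port_lo <= r.dst_port_hi);
    return ((p.src_ip & r.src_mask) == (r.src_ip & r.src_mask))
        && ((p.dst_ip & r.dst_mask) == (r.dst_ip & r.dst_mask))
        && p.src_port >= r.src_port_lo && p.src_port <= r.src_port_hi
        && p.dst_port >= r.dst_port_lo && p.dst_port <= r.dst_port_hi
        && (r.protocol == 0 || r.protocol == p.protocol);
}

// Linear scan over the rule list; returns the index of the first
// matching rule, or -1 if no rule matches.
inline long first_match(const std::vector<Rule>& rules, const Packet& p) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (matches(rules[i], p)) return static_cast<long>(i);
    }
    return -1;
}
```

On a GPU, the same comparison would presumably be distributed across threads (for example, by packet or by rule range), but that mapping is not described in the source.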

Structure-of-Arrays (SoA) Rule Representation

Proposed a novel compact representation of 26 bytes/rule. Each rule field is stored in its own contiguous array, so adjacent threads read adjacent elements and memory accesses coalesce. At 26 bytes/rule, 5M rules occupy 130 MB (about 124 MiB) of GPU memory.
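A sketch of what such a layout might look like follows. The field breakdown is hypothetical: it is one plausible split that adds up to 26 bytes (four 4-byte IP values and masks, four 2-byte port bounds, a 1-byte protocol, and a 1-byte action), and the source does not confirm it. The RuleTableSoA name and the action field are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Structure-of-Arrays rule table: one array per field,
// so element i of every array belongs to rule i.
struct RuleTableSoA {
    std::vector<std::uint32_t> src_ip, src_mask, dst_ip, dst_mask;   // 4 x 4 B
    std::vector<std::uint16_t> src_port_lo, src_port_hi,
                               dst_port_lo, dst_port_hi;             // 4 x 2 B
    std::vector<std::uint8_t>  protocol, action;                     // 2 x 1 B

    void push(std::uint32_t sip, std::uint32_t smask,
              std::uint32_t dip, std::uint32_t dmask,
              std::uint16_t sp_lo, std::uint16_t sp_hi,
              std::uint16_t dp_lo, std::uint16_t dp_hi,
              std::uint8_t proto, std::uint8_t act) {
        assert(sp_lo <= sp_hi && dp_lo <= dp_hi);
        src_ip.push_back(sip);       src_mask.push_back(smask);
        dst_ip.push_back(dip);       dst_mask.push_back(dmask);
        src_port_lo.push_back(sp_lo); src_port_hi.push_back(sp_hi);
        dst_port_lo.push_back(dp_lo); dst_port_hi.push_back(dp_hi);
        protocol.push_back(proto);   action.push_back(act);
    }

    std::size_t size() const { return src_ip.size(); }
};

// Payload bytes per rule under the assumed field breakdown.
constexpr std::size_t kBytesPerRule =
    4 * sizeof(std::uint32_t) + 4 * sizeof(std::uint16_t) + 2 * sizeof(std::uint8_t);
static_assert(kBytesPerRule == 26, "assumed layout should total 26 bytes per rule");

// Total payload size for n_rules rules (ignores allocator overhead).
constexpr std::size_t table_bytes(std::size_t n_rules) {
    return n_rules * kBytesPerRule;
}
```

With this layout, table_bytes(5000000) evaluates to 130,000,000 bytes, which is just under 124 MiB. Because there is no per-rule struct, no padding is added, and a thread handling rule i reads src_ip[i] while its neighbor reads src_ip[i + 1], which is the access pattern that coalesces on a GPU.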

Throughput & Optimization Results

  • 100M packets/second at 1K rules
  • ~80K packets/second sustained at 5M rules
  • Early termination on the first full match, so the remaining rules are not scanned for that packet
  • GPU profiler: collects GPU specs at run time and sweeps block and batch sizes. For the 5M-rule case, optimal batch partitioning reduces kernel time by ~30% and delivers up to a 1.5× total-time speedup.
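The block/batch sweep performed by the profiler can be sketched as a simple exhaustive search over candidate configurations. This is a minimal illustration under stated assumptions, not the profiler's actual implementation: SweepResult, sweep, and the measure callback are hypothetical names, and measure stands in for "launch the matching kernel once with this configuration and return the elapsed seconds".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Best configuration found by the sweep (hypothetical type).
struct SweepResult {
    std::size_t block_size;
    std::size_t batch_size;
    double      seconds;
};

// Tries every (block size, batch size) pair and keeps the fastest.
// `measure` is a hypothetical callback that runs the workload once
// with the given configuration and returns the elapsed time in seconds.
inline SweepResult sweep(
    const std::vector<std::size_t>& block_sizes,
    const std::vector<std::size_t>& batch_sizes,
    const std::function<double(std::size_t, std::size_t)>& measure) {
    assert(!block_sizes.empty() && !batch_sizes.empty());
    SweepResult best{0, 0, std::numeric_limits<double>::infinity()};
    for (std::size_t block : block_sizes) {
        for (std::size_t batch : batch_sizes) {
            const double t = measure(block, batch);
            if (t < best.seconds) best = SweepResult{block, batch, t};
        }
    }
    return best;
}
```

In practice the candidate lists would likely be derived from the device properties queried at run time (for example, the maximum threads per block), and each measurement would be repeated to reduce timing noise. Both of those details are assumptions here.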

Tech Stack: C, C++, PyTorch, Linux Kernel, CPU/GPU Profiling
