Security | Mohammad Abdul Hadi

High-Throughput ACL Pattern Matching via GPU-Accelerated Convolution

Thu, 01 Jun 2023 00:00:00 +0000

Overview

A high-throughput network packet filtering engine built at Huawei R&D. The system treats each Access Control List (ACL) rule as a 5-dimensional filter and, for many incoming packets in parallel, runs a convolution-style linear scan over millions of rules on NVIDIA Tesla V100 GPUs.

Key Contributions

GPU-Accelerated Pattern Matching Engine

Models each ACL rule as a 5-d filter over (source IP, destination IP, source port, destination port, protocol). Every incoming packet is compared against the rule set in a convolution-style linear scan, with many packets processed in parallel, which yields far higher throughput than traditional CPU-based approaches (figures under Throughput & Optimization Results).
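
The matching logic can be sketched on the CPU as follows. This is a minimal illustration, not the production GPU kernel: the `Rule` fields (value/mask pairs for IPs, inclusive ranges for ports), the `ANY_PROTO` wildcard, and the `first_match` helper are all assumptions made for the example. On a GPU, the outer loop over packets would run in parallel.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# (src_ip, dst_ip, src_port, dst_port, proto), IPs as 32-bit integers
Packet = Tuple[int, int, int, int, int]

ANY_PROTO = 0  # hypothetical wildcard value for the protocol field


@dataclass(frozen=True)
class Rule:
    src_ip: int    # network address, already masked
    src_mask: int
    dst_ip: int
    dst_mask: int
    sport_lo: int  # inclusive port ranges
    sport_hi: int
    dport_lo: int
    dport_hi: int
    proto: int
    action: int    # illustrative: 0 = deny, 1 = permit

    def matches(self, pkt: Packet) -> bool:
        src, dst, sport, dport, proto = pkt
        return (
            (src & self.src_mask) == self.src_ip
            and (dst & self.dst_mask) == self.dst_ip
            and self.sport_lo <= sport <= self.sport_hi
            and self.dport_lo <= dport <= self.dport_hi
            and self.proto in (ANY_PROTO, proto)
        )


def first_match(rules: Sequence[Rule], pkt: Packet) -> Optional[int]:
    """Linear scan that stops at the first rule matching all five dimensions."""
    for i, rule in enumerate(rules):
        if rule.matches(pkt):
            return i
    return None


def ip(a: int, b: int, c: int, d: int) -> int:
    return (a << 24) | (b << 16) | (c << 8) | d


rules = [
    # deny TCP port 22 from 10.0.0.0/8
    Rule(ip(10, 0, 0, 0), 0xFF000000, 0, 0, 0, 65535, 22, 22, 6, 0),
    # permit TCP ports 80-443 to 192.168.1.0/24
    Rule(0, 0, ip(192, 168, 1, 0), 0xFFFFFF00, 0, 65535, 80, 443, 6, 1),
    # default deny
    Rule(0, 0, 0, 0, 0, 65535, 0, 65535, ANY_PROTO, 0),
]

pkt_ssh = (ip(10, 1, 2, 3), ip(192, 168, 1, 5), 40000, 22, 6)
pkt_web = (ip(8, 8, 8, 8), ip(192, 168, 1, 5), 40000, 443, 6)
pkt_dns = (ip(8, 8, 8, 8), ip(1, 1, 1, 1), 1, 53, 17)

print(first_match(rules, pkt_ssh))  # 0
print(first_match(rules, pkt_web))  # 1
print(first_match(rules, pkt_dns))  # 2
```

Returning on the first full match is what keeps the average cost below a full scan whenever a matching rule sits early in the list.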

Structure-of-Arrays (SoA) Rule Representation

Proposed a novel compact representation of 26 bytes per rule, laid out so that neighboring GPU threads read neighboring memory addresses (coalesced access). The system stores up to 5M rules in about 124 MiB (130 MB) of GPU memory.
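
As a rough illustration of the layout and the memory arithmetic, the sketch below keeps one typed array per field. The field names and widths are hypothetical: they were chosen only because they sum to 26 bytes, and the actual per-field breakdown is not stated here.

```python
from array import array

# Hypothetical field widths in bytes; they sum to 26, but the real layout may differ.
FIELD_BYTES = {
    "src_ip": 4, "src_mask": 4, "dst_ip": 4, "dst_mask": 4,
    "sport_lo": 2, "sport_hi": 2, "dport_lo": 2, "dport_hi": 2,
    "proto": 1, "action": 1,
}
# Unsigned typecodes with these item sizes on common platforms.
TYPECODE = {4: "I", 2: "H", 1: "B"}

BYTES_PER_RULE = sum(FIELD_BYTES.values())


class RuleTableSoA:
    """Structure of arrays: rule i is the i-th element of every column."""

    def __init__(self) -> None:
        self.cols = {name: array(TYPECODE[w]) for name, w in FIELD_BYTES.items()}

    def append(self, **fields: int) -> None:
        for name in FIELD_BYTES:
            self.cols[name].append(fields[name])

    def __len__(self) -> int:
        return len(self.cols["proto"])


def table_bytes(n_rules: int) -> int:
    return n_rules * BYTES_PER_RULE


table = RuleTableSoA()
table.append(src_ip=0x0A000000, src_mask=0xFF000000, dst_ip=0, dst_mask=0,
             sport_lo=0, sport_hi=65535, dport_lo=22, dport_hi=22,
             proto=6, action=0)

print(BYTES_PER_RULE)                            # 26
print(table_bytes(5_000_000))                    # 130000000
print(round(table_bytes(5_000_000) / 2**20, 1))  # 124.0
```

With this layout, thread i reading `cols["src_ip"][i]` touches an address adjacent to the one thread i+1 reads, which is the access pattern GPUs can coalesce. An array-of-structures layout would instead place those reads 26 bytes apart.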

Throughput & Optimization Results

100M packets/second at 1K rules
~80K packets/second sustained at 5M rules
Early-termination on first full match minimizes wasted computation
GPU Profiler: Collects GPU specs on the fly and sweeps block and batch sizes. For the 5M-rule case, optimal batch partitioning cuts kernel time by about 30% and delivers up to a 1.5× total-time speedup.
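
The block/batch sweep can be pictured as a small grid search. The sketch below is an assumption about its general shape, not the profiler itself: `sweep`, `measure`, and `fake_measure` are hypothetical names, and the cost function is a stand-in so the example runs without a GPU. A real `measure` would launch the kernel with the given configuration and return its elapsed time.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int]  # (block_size, batch_size)


def sweep(
    measure: Callable[[int, int], float],
    block_sizes: Iterable[int],
    batch_sizes: Iterable[int],
    repeats: int = 3,
) -> Tuple[Config, Dict[Config, float]]:
    """Time every (block, batch) pair and return the fastest one.

    Each configuration is measured `repeats` times and the minimum is kept,
    which damps one-off noise such as warm-up effects.
    """
    timings: Dict[Config, float] = {}
    for cfg in product(block_sizes, batch_sizes):
        timings[cfg] = min(measure(*cfg) for _ in range(repeats))
    best = min(timings, key=timings.get)
    return best, timings


def fake_measure(block: int, batch: int) -> float:
    # Stand-in cost with its minimum at block=256, batch=4096 (made-up values).
    return 1.0 + abs(block - 256) / 256 + abs(batch - 4096) / 4096


best, timings = sweep(fake_measure, [64, 128, 256, 512], [1024, 4096, 16384])
print(best)  # (256, 4096)
```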

Tech Stack: C, C++, PyTorch, Linux Kernel, CPU/GPU Profiling

Malware Filter Framework (MFF) — CNN Optimization

Thu, 01 Sep 2022 00:00:00 +0000

Overview

Optimization and engineering overhaul of Huawei’s production Malware Filter Framework (MFF) at Anshi Lab. The work combined architectural improvements to the deep learning model with low-level systems engineering to achieve a dramatic performance improvement in a production security pipeline.

CNN Optimization via Atrous Spatial Pyramid Pooling

Replaced standard convolutions in MFF with dilated (atrous) convolutions arranged as Atrous Spatial Pyramid Pooling (ASPP), so the model captures multi-scale features without increasing the number of parameters. Combined with feature profiling and memory caching, this achieved a 315% performance boost over the baseline.
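
To show why dilation widens the receptive field at a fixed kernel size, here is a 1-D sketch in plain Python. It is illustrative only: MFF's actual layers, dimensions, and dilation rates are not given here, the rates 1, 2, 4 are assumptions, and the extra branches and fusing convolution of a full ASPP block are omitted.

```python
from typing import Dict, List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], rate: int) -> List[float]:
    """Zero-padded, same-length 1-D cross-correlation with dilation `rate`.

    Assumes an odd kernel length so the kernel is centred on each position.
    """
    k = len(w)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + (j - k // 2) * rate  # taps are spaced `rate` apart
            if 0 <= idx < len(x):
                acc += wj * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, rate: int) -> int:
    return (kernel_size - 1) * rate + 1


def aspp1d(x: Sequence[float], w_by_rate: Dict[int, Sequence[float]]) -> List[List[float]]:
    """One branch per dilation rate; branch outputs are concatenated per position."""
    branches = [dilated_conv1d(x, w, r) for r, w in w_by_rate.items()]
    return [list(feats) for feats in zip(*branches)]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]  # three weights regardless of the dilation rate

print(dilated_conv1d(impulse, kernel, 1))  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(dilated_conv1d(impulse, kernel, 4))  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print([receptive_field(3, r) for r in (1, 2, 4)])  # [3, 5, 9]

features = aspp1d(impulse, {1: kernel, 2: kernel, 4: kernel})
print(features[0])  # [0.0, 0.0, 1.0]
```

Position 0 only sees the impulse through the rate-4 branch, which is the multi-scale effect: each branch uses the same three weights but covers a different span of the input.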

Model Lifecycle & Engineering Excellence

LLMOps Pipeline Management: Directed multiple lifecycle components including Model Versioners, Validators, Regression Testing, Runtimes, Schedulers, Domain/Data-Drift Detectors, and Retrainers.
Module Refactoring: Drove codebase restructuring and introduced industry-leading testing and software build practices to improve engineering efficiency.
Technology Map: Linux C, user-space process development, kernel module development, memory allocation optimization, and low-level performance instrumentation.

Tech Stack: C, TensorFlow, Linux Kernel, CI/CD, MLflow, Weights & Biases, Docker, Kubernetes

Semantic LLM — Comprehensive Binary Analysis for Malware Detection

Thu, 01 Sep 2022 00:00:00 +0000

Overview

A full-stack agentic AI system for zero-day binary malware analysis, built at Huawei R&D’s Anshi Lab. The system operates at the intersection of LLM for Security (binary analysis) and Security for LLM (adversarial robustness), and is deployed across heterogeneous hardware including Huawei NPU, GPU clusters, and IoT edge devices.

Semantic Function Model (SFM)

Developed two architecture variants for function-level binary analysis:

Tokenless Instruction Set Transformer — takes 32-dimensional architecture-specific instruction sets as input, eliminating the need for a separate tokenizer.
Intermediate Representation Tokenizer — lifts binaries to LLVM IR with a POV Normalization Engine for architecture-agnostic analysis.

Semantic Program Model (SPM)

Replaced self-attention in a transformer with Holographic Reduced Representations (HRR). HRR binding maps XOR-like logic onto the query-key interaction with O(T log T) complexity, enabling analysis of malicious binaries with 100,000+ functions. The symbolic binding operations also act as a natural filter for adversarial noise, making the model inherently resistant to adversarial attacks.
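
The HRR primitive underneath this is binding by circular convolution and approximate unbinding by circular correlation. The sketch below shows only that primitive, not the SPM attention layer. For readability it uses naive O(d²) loops where an FFT-based version would normally be used, and the dimension, seed, and helper names are assumptions for the example.

```python
import math
import random
from typing import Dict, List, Sequence


def bind(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Circular convolution (naive O(d^2); FFT-based versions are faster)."""
    d = len(a)
    return [sum(a[i] * b[(j - i) % d] for i in range(d)) for j in range(d)]


def unbind(trace: Sequence[float], key: Sequence[float]) -> List[float]:
    """Circular correlation: a noisy, approximate inverse of bind() for `key`."""
    d = len(key)
    return [sum(key[i] * trace[(i + j) % d] for i in range(d)) for j in range(d)]


def rand_vec(d: int, rng: random.Random) -> List[float]:
    # Elements drawn from N(0, 1/d), so vectors have roughly unit norm.
    s = 1.0 / math.sqrt(d)
    return [rng.gauss(0.0, s) for _ in range(d)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def cleanup(noisy: Sequence[float], codebook: Dict[str, Sequence[float]]) -> str:
    """Return the name of the codebook vector most similar to `noisy`."""
    return max(codebook, key=lambda name: dot(noisy, codebook[name]))


rng = random.Random(0)
d = 512
k1, v1, k2, v2 = (rand_vec(d, rng) for _ in range(4))

# Superpose two key-value bindings in a single d-dimensional trace.
trace = [p + q for p, q in zip(bind(k1, v1), bind(k2, v2))]

codebook = {"v1": v1, "v2": v2}
print(cleanup(unbind(trace, k1), codebook))
print(cleanup(unbind(trace, k2), codebook))
```

Unbinding returns the stored value plus noise, and the cleanup step picks the nearest known vector. That tolerance to noise in the retrieved vector is the property the adversarial-robustness claim above leans on.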

Malware Analyst LLM

Utilized Mixture-of-Experts (MoE) routing across the SFM and SPM pathways. Extended the container framework with Agent Client Protocol (ACP) and Model Context Protocol (MCP) infrastructure that dynamically coordinates:

6 multi-turn Online Agents including: Program Encoder Signature Generator, KNN Search, CFG Segment Classifier (GAT), LLM4Decompile Code Generation, and Pangu-R1 reasoning for explainability.
10+ Autonomous Tools including: Ghidra Pro Disassembler, LLVM IR Lifter, Static and Dynamic Behavior Logger (Emulator), and Chroma RAG-DB.

Downstream Capabilities

Downstream capabilities include Function DNA Matching, vulnerability auditing, and cross-architecture code similarity search, all built on high-dimensional function and program embeddings from SFM and SPM.
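
Embedding-based similarity search reduces to nearest-neighbor lookup under a similarity metric. The following sketch assumes cosine similarity and brute-force search. The function names and 3-d vectors are made up for illustration, and real SFM/SPM embeddings would be far higher-dimensional and likely served from a vector index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def knn(query: Sequence[float], index: Dict[str, Sequence[float]],
        k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force top-k by cosine similarity, best match first."""
    scored = ((name, cosine(query, vec)) for name, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical function embeddings keyed by "<function>_<architecture>".
index = {
    "memcpy_arm": [0.9, 0.1, 0.0],
    "strlen_x86": [0.0, 1.0, 0.0],
    "aes_round_mips": [0.1, 0.1, 1.0],
}
query = [1.0, 0.0, 0.0]  # stand-in for an embedding of an x86 memcpy

top = knn(query, index, k=2)
print([name for name, _ in top])  # ['memcpy_arm', 'aes_round_mips']
```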

Heterogeneous Hardware Deployment

Huawei NPU: W8A8 dynamic quantization via HiFloat (HF8) using CANN & ModelSlim.
IoT/Edge: Progressive Teacher–Student Distillation.
GPU Clusters: Mixed-precision training against open-source and in-house malware repositories.
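
To make "W8A8 dynamic" concrete: weights are quantized to 8 bits once, while activation scales are computed at run time from each input. The sketch below uses plain symmetric int8 as a stand-in. It does not implement the HiFloat8 format and does not use the CANN or ModelSlim APIs, and every function name in it is hypothetical.

```python
from typing import List, Sequence, Tuple

QuantVec = Tuple[List[int], float]  # (int8 values, scale)


def quantize(vec: Sequence[float]) -> QuantVec:
    """Symmetric int8 quantization: value is approximately q * scale."""
    amax = max(abs(v) for v in vec)
    scale = amax / 127 if amax > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in vec], scale


def quantize_weights(W: Sequence[Sequence[float]]) -> List[QuantVec]:
    """Done once, offline: one scale per output row."""
    return [quantize(row) for row in W]


def w8a8_matvec(qW: Sequence[QuantVec], x: Sequence[float]) -> List[float]:
    """Dynamic part: the activation scale is recomputed for every input."""
    qx, sx = quantize(x)
    out = []
    for qrow, sw in qW:
        acc = sum(a * b for a, b in zip(qrow, qx))  # integer accumulation
        out.append(acc * sw * sx)                   # dequantize the result
    return out


W = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
x = [1.0, -2.0, 0.5]

exact = [sum(w * v for w, v in zip(row, x)) for row in W]
approx = w8a8_matvec(quantize_weights(W), x)
print(exact)
print(approx)
```

The two printed vectors should agree to within a small quantization error. The error shrinks when scales are finer-grained, for example per row as done for the weights here.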

Tech Stack: Python, C++, Assembly, PyTorch, Hugging Face (Transformers, PEFT), LoRA/QLoRA, CANN, vLLM, NVIDIA TensorRT-LLM, DSPy, PydanticAI, CrewAI, MCP, Pinecone, CPU/GPU/NPU Profiling