ARSeek: Identifying API Resource Using Code and Discussion on Stack Overflow

Abstract

ARSeek identifies API resources by jointly analyzing code snippets and natural-language discussions on Stack Overflow. By combining code-based and text-based signals, ARSeek accurately locates the most relevant API documentation and usage examples for a developer query, outperforming approaches that rely on a single modality.

Publication
In IEEE/ACM International Conference on Program Comprehension (ICPC)

Overview

When developers look for API resources, they rely on two complementary information sources: code (showing how an API is called) and discussion (explaining why and when to use it). ARSeek is a retrieval system that jointly exploits both modalities from Stack Overflow to identify the most relevant API resources for a given developer query.

Approach

  • Code signal: Extracts API call patterns and signatures from the code blocks of Stack Overflow answers and matches them against the API names and context in the query.
  • Discussion signal: Applies semantic analysis based on a pre-trained model (PTM) to Stack Overflow post titles and bodies to estimate how conceptually relevant each post is to the query.
  • Joint ranking: Fuses both signals into a unified ranking model that outperforms either modality used alone.
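
The three steps above can be illustrated with a minimal sketch. This is not ARSeek's implementation: the regular expression, the function names, the post dictionary layout, and the weight alpha are all hypothetical, and a bag-of-words cosine similarity stands in for the PTM-based semantic analysis so that the example runs with the standard library alone.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Assumption: a name directly followed by "(" is treated as an API call.
CALL_PATTERN = re.compile(r"\b([A-Za-z_][\w.]*)\s*\(")


def extract_api_calls(code: str) -> set[str]:
    """Return the (possibly dotted) names found in call position."""
    return set(CALL_PATTERN.findall(code))


def code_score(query_apis: set[str], code: str) -> float:
    """Fraction of the query's API names that are called in the snippet."""
    if not query_apis:
        return 0.0
    calls = extract_api_calls(code)
    # Also keep the last dotted component so "json.loads" matches "loads".
    names = calls | {c.split(".")[-1] for c in calls}
    return len(query_apis & names) / len(query_apis)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def text_score(query: str, post_text: str) -> float:
    """Bag-of-words cosine similarity (stand-in for a PTM-based score)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(post_text))
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def rank_posts(query, query_apis, posts, alpha=0.5):
    """Rank posts by a weighted sum of the code and discussion scores.

    alpha is a hypothetical fusion weight; the source does not say how
    ARSeek combines the two signals.
    """
    scored = []
    for post in posts:
        fused = alpha * code_score(query_apis, post["code"]) + (
            1 - alpha
        ) * text_score(query, post["text"])
        scored.append((post["id"], fused))
    # Highest score first; ties broken by post id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


posts = [
    {"id": "a", "text": "How to parse a JSON string in Python",
     "code": "data = json.loads(s)"},
    {"id": "b", "text": "Sorting a list of tuples",
     "code": "items.sort(key=len)"},
]
ranking = rank_posts("parse json string", {"loads"}, posts)
```

In this toy example the post whose code calls the queried API and whose text shares words with the query is ranked first. Setting alpha to 1.0 or 0.0 reduces the ranking to a code-only or text-only baseline, which is the kind of comparison the results below describe.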

Key Results

ARSeek consistently outperforms code-only and text-only baselines on the API resource identification benchmark, demonstrating that the two modalities provide complementary signals that are stronger in combination.

Published at: IEEE/ACM International Conference on Program Comprehension (ICPC) 2022 · Citations: 6

Mohammad Abdul Hadi
AI Security Researcher (Sr. Software Engineer)

AI Security Researcher at Huawei R&D — LLM architecture, malware analysis, and agentic multi-agent systems. 150+ citations across A* and A-rated conferences.
