On the Effectiveness of Pretrained Models for API Learning

Abstract

Pre-trained models (PTMs) have shown great promise in various software engineering tasks. In this work, we study the effectiveness of PTMs for API learning: specifically, how well they can learn and recommend API usage sequences from code and natural language. We evaluate multiple PTM variants on two key tasks, API sequence completion and cross-lingual API mapping, and compare them against non-PTM baselines on curated benchmarks.

Publication
In IEEE/ACM International Conference on Program Comprehension (ICPC)

Overview

Pre-trained language models (PTMs) such as BERT, CodeBERT, and GPT variants have transformed natural language processing (NLP) and are increasingly applied to software engineering tasks. This paper presents a systematic empirical study of PTM effectiveness for API learning, that is, the task of understanding, completing, and recommending API usage sequences from mixed natural-language and code inputs.
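
To make the API sequence completion task concrete, the following is a minimal sketch of a bigram frequency model, the kind of simple non-pretrained approach a PTM might be compared against. It is illustrative only: the training sequences, the function names train_bigram and complete, and the choice of a bigram model are assumptions made for this example, not the benchmarks or baselines used in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical API usage sequences (illustrative, not from the paper's benchmarks).
TRAIN_SEQUENCES = [
    ["File.open", "File.read", "File.close"],
    ["File.open", "File.write", "File.close"],
    ["File.open", "File.read", "File.close"],
]


def train_bigram(sequences):
    """Count how often each API call is immediately followed by another."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev_call, next_call in zip(seq, seq[1:]):
            counts[prev_call][next_call] += 1
    return counts


def complete(counts, prefix, max_len=5):
    """Greedily extend a prefix with the most frequent next call.

    Stops when the last call has no known successor or max_len is reached.
    """
    seq = list(prefix)
    while seq and len(seq) < max_len:
        successors = counts.get(seq[-1])
        if not successors:
            break
        next_call, _ = successors.most_common(1)[0]
        seq.append(next_call)
    return seq


if __name__ == "__main__":
    model = train_bigram(TRAIN_SEQUENCES)
    print(complete(model, ["File.open"]))
```

Given the prefix File.open, this sketch proposes File.read and then File.close, because those are the most frequent continuations in the toy data. A PTM-based approach would replace the frequency table with a learned model conditioned on the full code and natural-language context.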

Research Questions

  • How effective are PTMs at API sequence completion compared to non-PTM approaches?
  • Does domain-specific pre-training (e.g., code-focused PTMs) outperform general PTMs for API learning?
  • How well do PTMs generalize across programming languages for cross-lingual API mapping?

Key Findings

  • PTMs consistently outperform traditional baselines on API learning tasks, particularly in low-resource settings.
  • Code-specific PTMs (e.g., CodeBERT) provide measurable gains over general-purpose PTMs on code-centric subtasks.
  • Cross-lingual transfer is effective, with PTMs showing strong generalization across Java and Python API benchmarks.

Published at: IEEE/ACM International Conference on Program Comprehension (ICPC) 2022 · Citations: 19

Mohammad Abdul Hadi
AI Security Researcher (Sr. Software Engineer)

AI Security Researcher at Huawei R&D, focusing on LLM architecture, malware analysis, and agentic multi-agent systems. 150+ citations across A* and A-rated conferences.
