On the Effectiveness of Pretrained Models for API Learning

Mohammad Abdul Hadi, Ibrahim Nur Baki Yusuf, Ferdian Thung, Xuan-Bach D. Le, Lingxiao Jiang, Fatemeh H. Fard, David Lo

May 2022


Abstract

Pre-trained models (PTMs) have shown great promise in various software engineering tasks. In this work, we study the effectiveness of PTMs for API learning — specifically, how well they can learn and recommend API usage sequences from code and natural language. We evaluate multiple PTM variants across key tasks including API sequence completion and cross-lingual API mapping, comparing against non-PTM baselines on curated benchmarks.

Type

Conference paper

Publication

In IEEE/ACM International Conference on Program Comprehension (ICPC)

Overview

Pre-trained language models (PTMs) such as BERT, CodeBERT, and GPT variants have transformed NLP and are increasingly applied to software engineering tasks. This paper presents a systematic empirical study of PTM effectiveness specifically for API learning — the task of understanding, completing, and recommending API usage sequences from mixed natural-language and code inputs.
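To make the API sequence completion task concrete, here is a minimal sketch of a frequency-based (bigram) completer, the kind of simple non-PTM approach a PTM could be compared against. It is an illustration only, not the paper's method or one of its baselines: the function names `train_bigram` and `complete`, and the toy Java-style API sequences, are all hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count, for each API call, which calls follow it in the training data."""
    follows = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            follows[prev][nxt] += 1
    return follows


def complete(follows, prefix, k=1):
    """Suggest up to k next API calls, ranked by how often they followed
    the last call of the prefix. Returns [] for an empty or unseen prefix."""
    if not prefix:
        return []
    successors = follows.get(prefix[-1], Counter())
    return [api for api, _ in successors.most_common(k)]


# Hypothetical toy corpus of API usage sequences (not from the paper's benchmarks).
corpus = [
    ["FileReader.new", "BufferedReader.new",
     "BufferedReader.readLine", "BufferedReader.close"],
    ["FileReader.new", "BufferedReader.new",
     "BufferedReader.readLine", "BufferedReader.readLine",
     "BufferedReader.close"],
    ["FileWriter.new", "BufferedWriter.new",
     "BufferedWriter.write", "BufferedWriter.close"],
]

model = train_bigram(corpus)
print(complete(model, ["FileReader.new", "BufferedReader.new"]))
# prints ['BufferedReader.readLine']
```

A model like this only sees the immediately preceding call and has no access to the natural-language description of the task, which is one reason the comparison against models pre-trained on large code and text corpora is of interest.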

Research Questions

How effective are PTMs at API sequence completion compared to non-PTM approaches?
Does domain-specific pre-training (e.g., code-focused PTMs) outperform general PTMs for API learning?
How well do PTMs generalize across programming languages for cross-lingual API mapping?

Key Findings

PTMs consistently outperform traditional baselines on API learning tasks, particularly in low-resource settings.
Code-specific PTMs (e.g., CodeBERT) provide measurable gains over general-purpose PTMs on code-centric subtasks.
Cross-lingual transfer is effective, with PTMs showing strong generalization across Java and Python API benchmarks.

Published at: IEEE/ACM International Conference on Program Comprehension (ICPC) 2022 · Citations: 19
