Improving domain-specific neural code generation with few-shot meta-learning

  • Zhen Yang
  • Jacky Wai Keung
  • Zeyu Sun
  • Yunfei Zhao
  • Ge Li
  • Zhi Jin
  • Shuo Liu
  • Yishu Li

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

Context: Neural code generation aims to automatically generate code snippets guided by Natural Language Descriptions (NLDs). In recent years, various neural code generation models for mainstream Programming Languages (PLs), such as Java and Python, have been proposed and have demonstrated significant success in prior studies. Nonetheless, due to the scarcity of available training examples for some domain-specific PLs, such as Solidity, Bash, and Clojure, simply adopting previous neural models may lead to overfitting and inadequate learning. Objective: To overcome this challenge, we propose MetaCoder, a novel meta-learning code generation approach that efficiently extracts general-purpose knowledge from a large-scale source language and rapidly adapts to domain-specific scenarios, even with relatively few samples. Method: MetaCoder employs MAML, a powerful few-shot meta-learning method, to construct a transfer learning framework. This framework learns general-purpose knowledge from large-scale source languages and applies it to domain-specific target languages. To acquire more general-purpose knowledge, heterogeneous sub-tasks are constructed from the source language during the pre-training phase of MAML. To this end, combining CodeBERT and K-means, we design an unsupervised category assignment method for code generation samples, thereby exploiting the n-way k-shot rule to construct the heterogeneous sub-tasks. Consequently, MetaCoder can be applied to the code generation field. Results: We evaluate MetaCoder with both tree-based (e.g., TreeGen) and sequence-based (e.g., CodeGPT) backbones on two domain-specific PLs, Solidity and Bash. Extensive experiments demonstrate the superior performance of our approach compared to baselines and verify its capability for code generation visually in practice. Conclusion: MetaCoder effectively extracts general-purpose knowledge from large-scale source languages, thereby enhancing model performance.
Therefore, we highly recommend MetaCoder as a code generation approach for domain-specific PLs.
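The unsupervised category assignment described in the abstract (clustering sample embeddings, then drawing n-way k-shot sub-tasks from the clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: generic random vectors stand in for CodeBERT embeddings, the K-means loop is a plain NumPy version, and all function names and parameters are illustrative assumptions.

```python
import numpy as np

def kmeans_labels(X, n_clusters, n_iter=50, seed=0):
    """Assign each sample a pseudo-category via a basic K-means loop.

    X is an (n_samples, dim) array of embeddings; in MetaCoder these
    would come from CodeBERT, but any vectors work for the sketch.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        # Recompute centers; leave empty clusters unchanged.
        for c in range(n_clusters):
            pts = X[labels == c]
            if len(pts):
                centers[c] = pts.mean(axis=0)
    return labels

def sample_n_way_k_shot(labels, n_way, k_shot, seed=0):
    """Build one sub-task: pick n_way pseudo-categories, k_shot samples each."""
    rng = np.random.default_rng(seed)
    # Only categories with at least k_shot members are eligible.
    eligible = [c for c in np.unique(labels)
                if np.count_nonzero(labels == c) >= k_shot]
    classes = rng.choice(eligible, n_way, replace=False)
    task = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        task.extend(rng.choice(idx, k_shot, replace=False).tolist())
    return task

# Stand-in embeddings for 100 NLD-code pairs.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 8))
labels = kmeans_labels(X, n_clusters=10)
task = sample_n_way_k_shot(labels, n_way=5, k_shot=2)
print(len(task))  # n_way * k_shot sample indices forming one sub-task
```

Each such sub-task would then serve as one episode in the MAML pre-training phase; repeating the sampling yields the heterogeneous sub-tasks the paper describes.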

Original language: English
Article number: 107365
Journal: Information and Software Technology
Volume: 166
DOIs: Yes
Publication status: Published - Feb 2024
Externally published: Yes

Keywords

  • Code generation
  • Few-shot learning
  • Meta-learning
  • Transfer learning

