TY - GEN
T1 - A Multi-Modal Transformer-based Code Summarization Approach for Smart Contracts
AU - Yang, Zhen
AU - Keung, Jacky
AU - Yu, Xiao
AU - Gu, Xiaodong
AU - Wei, Zhengyuan
AU - Ma, Xiaoxue
AU - Zhang, Miao
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - Code comments have been an important part of computer programs, greatly facilitating the understanding and maintenance of source code. However, high-quality code comments are often unavailable in smart contracts, the increasingly popular programs that run on the blockchain. In this paper, we propose a Multi-Modal Transformer-based (MMTrans) code summarization approach for smart contracts. Specifically, MMTrans learns the representation of source code from two heterogeneous modalities of the Abstract Syntax Tree (AST), i.e., Structure-based Traversal (SBT) sequences and graphs. The SBT sequence provides the global semantic information of the AST, while the graph convolution focuses on the local details. MMTrans uses two encoders to extract global and local semantic information from the two modalities respectively, and then uses a joint decoder to generate code comments. Both the encoders and the decoder employ the multi-head attention structure of the Transformer to enhance the ability to capture long-range dependencies between code tokens. We build a dataset with over 300K pairs of smart contracts, and evaluate MMTrans on it. The experimental results demonstrate that MMTrans outperforms the state-of-the-art baselines by a substantial margin in terms of four evaluation metrics, and can generate higher-quality comments.
AB - Code comments have been an important part of computer programs, greatly facilitating the understanding and maintenance of source code. However, high-quality code comments are often unavailable in smart contracts, the increasingly popular programs that run on the blockchain. In this paper, we propose a Multi-Modal Transformer-based (MMTrans) code summarization approach for smart contracts. Specifically, MMTrans learns the representation of source code from two heterogeneous modalities of the Abstract Syntax Tree (AST), i.e., Structure-based Traversal (SBT) sequences and graphs. The SBT sequence provides the global semantic information of the AST, while the graph convolution focuses on the local details. MMTrans uses two encoders to extract global and local semantic information from the two modalities respectively, and then uses a joint decoder to generate code comments. Both the encoders and the decoder employ the multi-head attention structure of the Transformer to enhance the ability to capture long-range dependencies between code tokens. We build a dataset with over 300K pairs of smart contracts, and evaluate MMTrans on it. The experimental results demonstrate that MMTrans outperforms the state-of-the-art baselines by a substantial margin in terms of four evaluation metrics, and can generate higher-quality comments.
KW - Code Summarization
KW - Graph Convolution
KW - Smart Contracts
KW - Structure-based Traversal
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85113206073&partnerID=8YFLogxK
U2 - 10.1109/ICPC52881.2021.00010
DO - 10.1109/ICPC52881.2021.00010
M3 - Conference contribution
AN - SCOPUS:85113206073
T3 - IEEE International Conference on Program Comprehension
SP - 1
EP - 12
BT - Proceedings - 2021 IEEE/ACM 29th International Conference on Program Comprehension, ICPC 2021
T2 - 29th IEEE/ACM International Conference on Program Comprehension, ICPC 2021
Y2 - 20 May 2021 through 21 May 2021
ER -