Associate Professor
School of Computer Science, Shanghai Jiao Tong University
Contact:
Room 1208, Software Building, No.800 Dongchuan Road, Shanghai, China
Email:
Research Interests:
My research focuses on large language models for natural and programming languages. I develop efficient machine learning methodologies for software code.
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
In Proceedings of the 48th International Conference on Software Engineering (ICSE 2026). Rio de Janeiro, Brazil, April 12 - 18, 2026.
(CCF-A)
[paper]
[code]
LongCodeZip: Compress Long Context for Code Language Models
In Proceedings of the 40th International Conference on Automated Software Engineering (ASE 2025), Seoul, Korea, Nov. 16-20, 2025
(CCF-A)
[paper]
[code]
Transplant Then Regenerate: A New Paradigm for Text Data Augmentation
In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025). Suzhou, China, Nov 5 - 9, 2025.
(CCF-B)
[code]
LastingBench: Defend Benchmarks Against Knowledge Leakage
In Findings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025 Findings). Suzhou, China, Nov 5 - 9, 2025.
(CCF-B)
[paper]
[code]
Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers
In Proceedings of the 47th International Conference on Software Engineering (ICSE 2025). Ottawa, Ontario, Canada, April 27 - May 3, 2025.
(CCF-A)
[paper]
[code]
[bibtex]
On the Effectiveness of Large Language Models in Domain-Specific Code Generation
ACM Transactions on Software Engineering and Methodology (TOSEM 2024)
(CCF-A, ESI Highly Cited Paper)
[paper]
How Effectively Do Code Language Models Understand Poor-Readability Code?
In Proceedings of the 39th ACM/IEEE International Conference on Automated Software Engineering (ASE 2024). Sacramento, California, United States, Oct 27 - Nov 1, 2024.
(CCF-A)
[paper]
[code]
[bibtex]
VarGAN: Adversarial Learning of Variable Semantic Representations
IEEE Transactions on Software Engineering (TSE 2024)
(CCF-A)
[paper]
[code]
On the Evaluation of Neural Code Translation: Taxonomy and Benchmark
In Proceedings of the 38th International Conference on Automated Software Engineering (ASE 2023), Kirchberg, Luxembourg, Sept. 11-15, 2023
(CCF-A)
[paper]
[slides]
[code]
InfeRE: Step-by-Step Regex Generation via Chain of Inference
In Proceedings of the 38th International Conference on Automated Software Engineering (ASE 2023), Kirchberg, Luxembourg, Sept. 11-15, 2023
(CCF-A)
[paper]
[slides]
[code]
[bibtex]
Self-Supervised Query Reformulation for Code Search
In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023), San Francisco, California, United States, Dec 3-9, 2023
(CCF-A)
[paper]
[slides]
[code]
[bibtex]
Diet Code Is Healthy: Simplifying Programs for Pre-Trained Models of Code
In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022), Singapore, Nov 14-18, 2022
(CCF-A)
[paper]
[slides]
[code]
[bibtex]
Cross-Domain Deep Code Search with Meta Learning
In Proceedings of the 44th International Conference on Software Engineering (ICSE 2022)
(CCF-A)
[paper]
[code]
[slides]
[bibtex]
I am grateful to the wonderful students I have been collaborating with
Alumni
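, Scenario-Knowledge-Enhanced Automatic Java Code Generation, 2024.9.1-2025.2.25, Principal Investigator
, Malicious Code Sample Generation Based on Large Language Models, 2023.5.1-2024.4.31, Principal Investigator
| Program Committee | ASE (2025), ACL (2023), EMNLP (2021, 2022, 2023), COLING (2020, 2022, 2024), IJCAI (2023), EACL (2023) |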
| Reviewer Board | Automated Software Engineering (AUSE), Empirical Software Engineering (EMSE) |
| Journal Reviewer | TSE, TOSEM, EMSE, IST, JSS, FCS |