CV
Basics
| Name | Hongliang Lu |
| Label | Graduate Student & RL Algorithm Engineer |
| Email | lhl@pku.edu.cn |
| Url | https://AuroraLHL.github.io |
| Summary | Graduate student at Peking University specializing in Agentic RL, Multi-Agent systems, and Self-Play training paradigms. Research focuses on enhancing LLM agent capabilities through novel RL approaches. Experienced RL algorithm engineer with internships at Alibaba and Moonshot AI. |
Work
-
2025.05 - 2025.09 RL Algorithm Engineer (Summer Internship)
Alibaba Group - Quark - Large Model Training and Application - RL Team
Led a deep search project that uses Agentic RL to enhance LLM reasoning and retrieval. Designed a multi-dimensional reward system and pioneered the Search Self-Play training paradigm.
- Agentic RL
- Search Self-Play
- LLM Tool-Calling
- Verl Framework
- 10+ point improvement on multiple benchmarks
-
2025.01 - 2025.05 RL Algorithm Engineer
Moonshot AI (Kimi) - RL Team - WebAgent Project
Built WebAgent training infrastructure with PPO/GRPO adaptations. Developed end-to-end agent capabilities through an SFT+RL pipeline and constructed evaluation benchmarks.
- WebAgent Development
- PPO/GRPO Implementation
- Agent Evaluation
- SFT+RL Pipeline
Education
-
2023.09 - 2026.07 Beijing, China
M.S.
Peking University, College of Engineering
Mechanical Engineering (Industrial Engineering Management)
- Python Data Science
- Deep Learning
- Machine Learning
- Optimization
- Reinforcement Learning
- LLM Foundations
-
2019.09 - 2023.06 Beijing, China
B.S.
Peking University, College of Engineering
Robotics Engineering
- Robotics Systems
- Control Theory
- Machine Learning
Awards
- 2022.01.01
Third Prize in Beijing University Student Mathematics Competition
Beijing Municipal Education Commission
33rd Beijing University Student Mathematics Competition - Third Prize.
Publications
-
2026.01.01 Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
Under Review at ICLR 2026
Hongliang Lu*, Yuhang Wen*, Pengyu Cheng, Ruijin Ding, Haotian Xu, Jiaqi Guo, Chutian Wang, Haonan Chen, Xiaoxi Jiang, Guanjun Jiang. Pioneering self-play methodology for enhancing agent capabilities.
-
2025.01.01 OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling
ICML 2025
Hongliang Lu*, Zhonglin Xie*, Yaoyu Wu, Can Ren, Yuxuan Chen, Zaiwen Wen. A novel framework for optimization modeling using bidirectional data synthesis.
Skills
| Agentic RL & Agent Systems | Agentic RL, WebAgent/OSAgent, Self-Play Training, PPO/GRPO Algorithms, Agent Capability Enhancement |
| Programming & Frameworks | Python (Expert), C++, MATLAB, Verl Framework, PyTorch, TensorFlow |
| Research & Development | Search Self-Play, Multi-dimensional Rewards, Agent Evaluation, Benchmark Development, Academic Writing |
Languages
| Chinese | Native speaker |
| English | Fluent |
Interests
| Agentic Reinforcement Learning | Agent-based RL, Multi-agent Systems, Self-Play Training, Agent Capability Enhancement |
| WebAgent & OSAgent Systems | Web Interaction Agents, Operating System Agents, Agent Evaluation, Environment Construction |
Projects
- 2025.05 - Present
Search Self-Play Training Paradigm
Pioneered a novel self-play training methodology that significantly enhances LLM agent capabilities without supervision, achieving 10+ point improvements on multiple benchmarks.
- Self-Play Training
- Agent Capability
- LLM Enhancement
- Benchmark Improvement
- 2025.05 - 2025.09
Agentic RL for Deep Search
Developed deep search capabilities using Agentic RL with multi-dimensional reward design and forced reflection mechanisms to enhance LLM reasoning and retrieval.
- Agentic RL
- Deep Search
- Multi-dimensional Rewards
- Verl Framework