Hongliang Lu
Welcome to my personal homepage! I am a third-year M.E. student in Mechanical Engineering at the College of Engineering, Peking University, advised by Prof. Zaiwen Wen. I received my B.E. degree in Robotics Engineering from Peking University in 2023.
My research interests primarily lie at the intersection of Reinforcement Learning and Large Language Models:
- RL for LLMs: developing data-efficient RL algorithms and improving post-training effectiveness to boost model performance and align models with human preferences;
- Agentic RL: designing novel RL methods to enhance LLM agents, with a recent focus on self-play to push the frontier of agent capabilities through adversarial training;
- LLMs for Optimization: leveraging large language models to assist optimization modeling.
I have interned at two leading AI companies. At Alibaba’s QuarkLLM team (2025.05-2025.09), I contributed to the Deep Search project, designing RL algorithms to strengthen the Deep Search agent’s performance on tasks requiring multi-step reasoning and complex retrieval. Previously, at Moonshot AI (2025.01-2025.05), I worked on data synthesis and RL training for their WebAgent.
news
| Oct 22, 2025 | We are excited to release our latest research work in Agentic RL: “Search Self-Play: Pushing the Frontier of Agent Capability without Supervision”! 🚀 The paper has been submitted to ICLR 2026 and explores novel self-play training methods for enhancing agent capabilities without supervision. |
|---|---|
| May 01, 2025 | Our paper “OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling” has been accepted as a poster presentation at ICML 2025! 🎉 |
selected publications
latest posts
| Nov 08, 2024 | Scaling Law |
|---|---|
| Jul 26, 2024 | Transformer Architecture Explained: Attention is All You Need |
| Jul 26, 2024 | Understanding Attention Mechanism: Self-Attention and Attention Models |