cv

Basics

Name Hongliang Lu
Label Graduate Student & RL Algorithm Engineer
Email lhl@pku.edu.cn
Url https://AuroraLHL.github.io
Summary Graduate student at Peking University specializing in Agentic RL, multi-agent systems, and self-play training paradigms. Research focuses on enhancing LLM agent capabilities through RL. Has worked as an RL algorithm engineer at Alibaba and Moonshot AI.

Work

  • 2025.05 - 2025.09
    RL Algorithm Engineer (Summer Internship)
    Alibaba Group - Quark - Large Model Training and Application - RL Team
    Led a deep search project that used Agentic RL to enhance LLM reasoning and retrieval. Designed multi-dimensional reward systems and pioneered the Search Self-Play training paradigm.
    • Agentic RL
    • Search Self-Play
    • LLM Tool-Calling
    • Verl Framework
    • 10+ point benchmark improvement
  • 2025.01 - 2025.05
    RL Algorithm Engineer
    Moonshot AI (Kimi) - RL Team - WebAgent Project
    Built WebAgent training infrastructure with PPO/GRPO adaptations. Developed end-to-end agent capabilities through an SFT+RL pipeline and constructed evaluation benchmarks.
    • WebAgent Development
    • PPO/GRPO Implementation
    • Agent Evaluation
    • SFT+RL Pipeline

Education

  • 2023.09 - 2026.07

    Beijing, China

    M.S.
    Peking University, College of Engineering
    Mechanical Engineering (Industrial Engineering Management)
    • Python Data Science
    • Deep Learning
    • Machine Learning
    • Optimization
    • Reinforcement Learning
    • LLM Foundations
  • 2019.09 - 2023.06

    Beijing, China

    B.S.
    Peking University, College of Engineering
    Robotics Engineering
    • Robotics Systems
    • Control Theory
    • Machine Learning

Awards

Publications

Skills

Agentic RL & Agent Systems
  • Agentic RL
  • WebAgent/OSAgent
  • Self-Play Training
  • PPO/GRPO Algorithms
  • Agent Capability Enhancement
Programming & Frameworks
  • Python (Expert)
  • C++
  • MATLAB
  • Verl Framework
  • PyTorch
  • TensorFlow
Research & Development
  • Search Self-Play
  • Multi-dimensional Rewards
  • Agent Evaluation
  • Benchmark Development
  • Academic Writing

Languages

Chinese: Native speaker
English: Fluent

Interests

Agentic Reinforcement Learning
  • Agent-based RL
  • Multi-agent Systems
  • Self-Play Training
  • Agent Capability Enhancement
WebAgent & OSAgent Systems
  • Web Interaction Agents
  • Operating System Agents
  • Agent Evaluation
  • Environment Construction

Projects

  • 2025.05 - Present
    Search Self-Play Training Paradigm
    Pioneered a self-play training methodology that enhances LLM agent capabilities without supervision, achieving improvements of 10+ points on multiple benchmarks.
    • Self-Play Training
    • Agent Capability
    • LLM Enhancement
    • Benchmark Improvement
  • 2025.05 - 2025.09
    Agentic RL for Deep Search
    Developed deep search capabilities using Agentic RL, combining multi-dimensional reward design with forced reflection mechanisms to enhance LLM reasoning and retrieval.
    • Agentic RL
    • Deep Search
    • Multi-dimensional Rewards
    • Verl Framework