Overview
OptMATH is a scalable framework for synthesizing high-quality optimization modeling datasets. Large language models (LLMs) have shown remarkable capabilities in understanding natural language and performing complex reasoning tasks, but translating natural language descriptions of optimization problems into solver-ready formats remains a significant challenge.
The framework consists of a bidirectional pipeline that:
- Generates problem data (PD) with controllable complexity from seed mathematical formulations (MF)
- Creates natural language (NL) descriptions through backtranslation
- Validates the correspondence between NL and PD through forward modeling and rejection sampling
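The loop below is a minimal sketch of this pipeline. The helper callables (`generate_instance`, `solve`, `backtranslate`, `forward_model`) are hypothetical placeholders standing in for the paper's components, not the actual implementation:

```python
from typing import Callable, Optional

def synthesize_pair(
    generate_instance: Callable[[], dict],   # seed MF -> problem data (PD)
    solve: Callable[[dict], float],          # PD -> optimal objective value
    backtranslate: Callable[[dict], str],    # PD -> natural language (NL)
    forward_model: Callable[[str], dict],    # NL -> candidate PD (via an LLM)
    tol: float = 1e-6,
    max_attempts: int = 8,
) -> Optional[dict]:
    """One pass of the MF -> PD -> NL -> verification loop (a sketch)."""
    pd = generate_instance()
    reference_obj = solve(pd)        # ground-truth objective for this instance
    nl = backtranslate(pd)           # NL description via backtranslation
    for _ in range(max_attempts):    # rejection sampling on solution equivalence
        candidate = forward_model(nl)
        if abs(solve(candidate) - reference_obj) <= tol:
            return {"nl": nl, "data": pd, "model": candidate}
    return None                      # reject pairs that cannot be verified
```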

Key Features
- Bidirectional Data Synthesis: A novel pipeline that generates high-quality optimization modeling data through both forward and reverse processes
- Comprehensive Problem Coverage: Spans 60+ diverse optimization problem types across multiple domains including logistics, supply chain, manufacturing, and finance
- Controllable Complexity: Generates problems with adjustable difficulty levels through feedback-driven parameter tuning
- Mathematical Validation: Employs rigorous rejection sampling based on solution equivalence to ensure dataset quality (a minimal sketch of this check follows this list)
- Proven Effectiveness: Models fine-tuned on OptMATH achieve state-of-the-art performance across multiple benchmarks
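The solution-equivalence test referenced above can be illustrated with gurobipy. This is a simplified sketch assuming both the ground-truth and the forward-modeled formulations are available as gurobipy models; it is not the paper's exact acceptance criterion:

```python
import gurobipy as gp
from gurobipy import GRB

def objectives_match(model_a: gp.Model, model_b: gp.Model, tol: float = 1e-6) -> bool:
    """Accept a synthesized pair only if both formulations solve to the same objective."""
    for m in (model_a, model_b):
        m.Params.OutputFlag = 0  # silence solver logging
        m.optimize()
        if m.Status != GRB.OPTIMAL:
            return False  # reject if either model fails to solve to optimality
    return abs(model_a.ObjVal - model_b.ObjVal) <= tol
```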
Dataset
OptMATH consists of two main components designed for training and evaluation:
OptMATH-Train
- Over 200k high-quality, diverse optimization problems
- Covers optimization scenarios including logistics, supply chain, manufacturing, and more

OptMATH-Bench
- Coverage of various problem types (LP, MILP, IP, NLP, SOCP); the table below shows the problem-type distribution across benchmarks
| Problem Type | NL4OPT | MAMO EasyLP | MAMO ComplexLP | OptMATH-Bench |
|---|---|---|---|---|
| IP | 26.94% | 58.74% | 15.64% | 11.92% |
| LP | 61.63% | 0.61% | 62.08% | 8.81% |
| MILP | 11.43% | 40.65% | 22.28% | 67.87% |
| NLP | 0.00% | 0.00% | 0.00% | 5.18% |
| SOCP | 0.00% | 0.00% | 0.00% | 6.22% |
Problem Length Analysis

- OptMATH features significantly longer problem descriptions (2.9× longer than MAMO EasyLP), challenging LLMs with more complex reasoning requirements.
Dataset Distribution

OptMATH effectively surrounds the other benchmarks in the embedding space, which helps explain the transfer-learning benefits observed after fine-tuning.
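A sketch of how such an embedding-space comparison can be produced is shown below; the embedding model and 2-D projection are assumptions for illustration, and the paper's exact setup may differ:

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def plot_benchmark_overlap(corpora: dict[str, list[str]]) -> None:
    """corpora maps a benchmark name to its list of problem descriptions."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder
    texts = [t for ts in corpora.values() for t in ts]
    coords = PCA(n_components=2).fit_transform(embedder.encode(texts))
    start = 0
    for name, ts in corpora.items():
        xy = coords[start:start + len(ts)]
        plt.scatter(xy[:, 0], xy[:, 1], s=5, label=name)
        start += len(ts)
    plt.legend()
    plt.title("Benchmark coverage in embedding space")
    plt.show()
```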
Results
We use the LLaMA-Factory framework for fine-tuning. For more details, please refer to https://github.com/hiyouga/LLaMA-Factory.
Main Results
The table below presents a comprehensive comparison of model performance across various benchmarks. Our fine-tuned models, especially OptMATH-Qwen2.5-32B, consistently outperform both baseline models and other fine-tuned models, demonstrating that training with OptMATH-Train significantly enhances optimization modeling capabilities.
| Types | Models | NL4OPT | MAMO EasyLP | MAMO ComplexLP | OptMATH-Bench | IndustryOR | OptiBench | Macro AVG |
|---|---|---|---|---|---|---|---|---|
| Baseline | Llama3.1-8B | 0.0% | 0.2% | 0.0% | 0.0% | 0.0% | 0.0% | 0.1% |
| | Qwen2.5-7B | 86.9% | 83.6% | 21.8% | 1.6% | 10.0% | 36.2% | 40.0% |
| | GPT-3.5-turbo | 78.0% | 79.3% | 33.2% | 15.0% | 21.0% | 47.4% | 51.4% |
| | GPT-4 | 89.0% | 87.3% | 49.3% | 16.6% | 33.3% | 68.6% | 57.4% |
| | Deepseek-V3 | 95.9% | 88.3% | 51.1% | 32.6% | 37.0% | 71.6% | 62.8% |
| | OptiMUS (GPT-4o-2024-05-13) | 78.8% | 77.0% | 43.6% | 20.2% | 31.0% | 45.8% | 49.4% |
| | Qwen2.5-32B | 92.7% | 82.2% | 44.6% | 9.3% | 16.0% | 47.6% | 48.7% |
| Fine-tuning | ORLM-Llama-3-8B (reported) | 85.7% | 82.3% | 37.4% | * | 38.0% | * | 60.9% |
| | ORLM-Llama-3-8B (reproduced) | 84.5% | 74.9% | 34.1% | 2.6% | 24.0% | 51.1% | 45.2% |
| | OptMATH-Llama3.1-8B (pass@1) | 55.5% | 73.9% | 40.8% | 24.4% | 18.0% | 55.5% | 44.7% |
| | OptMATH-Qwen2.5-7B (pass@1) | 94.7% | 86.5% | 51.2% | 24.4% | 20.0% | 57.9% | 55.8% |
| | OptMATH-Qwen2.5-32B (pass@1) | 95.9% | 89.9% | 54.1% | 34.7% | 31.0% | 66.1% | 62.0% |
| Pass@8 | OptMATH-Llama3.1-8B | 97.6% | 94.2% | 71.6% | 51.6% | 37.0% | 66.6% | 69.8% |
| | OptMATH-Qwen2.5-7B | 98.4% | 94.5% | 72.5% | 56.0% | 38.0% | 68.1% | 71.3% |
| | OptMATH-Qwen2.5-32B | 97.9% | 93.9% | 75.4% | 67.4% | 47.0% | 76.8% | 76.4% |
Ablation Study
Ablation study on Model Size
To investigate the effectiveness of OptMATH training across different model scales, we conducted experiments using Qwen2.5 models ranging from 0.5B to 32B parameters. Due to computational constraints, we used a randomly sampled subset of 100,000 training examples. As shown in the figure below, all models exhibit substantial performance improvements after OptMATH-Train fine-tuning.

Ablation study on Data Size
As shown in the figure below, the performance of Qwen2.5-1.5B across different benchmarks varies with the amount of training data. The model demonstrates notable improvements in optimization modeling capabilities even with a small portion of the OptMATH-Train dataset. The performance gains gradually level off as more training data is added, showing a typical diminishing returns pattern.

Interactive Exploration
Model Performance Across Benchmarks
This visualization compares the performance of different models across all benchmarks. Models are grouped into three categories: Baseline models, Fine-tuned models (pass@1), and Fine-tuned models with multiple sampling (pass@8).
Benchmark Scenario Distribution
This visualization compares the application scenario distribution across different optimization benchmarks. OptMATH-Bench stands out with the most diverse and balanced coverage of industrial scenarios, including significant representation in manufacturing, telecommunications, aviation, and other sectors that are underrepresented in other benchmarks.
Data Complexity Distribution
This visualization demonstrates the complexity of different optimization benchmarks in terms of variables and constraints. OptMATH-Bench contains problems with significantly higher median variable and constraint counts, providing a more realistic and challenging test environment for optimization algorithms compared to other benchmarks.
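For reference, variable and constraint counts of the kind plotted here can be read directly off a gurobipy model; the file name below is hypothetical:

```python
import gurobipy as gp

# Load a generated instance (hypothetical file name) and report its size.
model = gp.read("optmath_instance.lp")
print(f"variables:   {model.NumVars}")
print(f"constraints: {model.NumConstrs}")
```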
AutoFormulation Example
Prompt
Below is an operations research question. Build a mathematical model and corresponding python code using `gurobipy` that appropriately addresses the question.
# Question:
{{Question}}
# Notes:
- First provide problem analysis and its mathematical formulation before the Python code
- The Python code should start with the following lines:
```python
import gurobipy as gp
from gurobipy import GRB
```
- Make sure the model variable is named `model`.
- Avoid using "<" and ">" in Gurobi constraints; instead, use "<=" or ">=" as appropriate.
- Carefully determine whether the variable is an integer or a continuous variable.
# Response:
(Provide your response here, keep the notes above in mind)
Input (Natural Language Description):
A logistics company must deliver packages to four cities: City 0, City 1, City 2, and City 3, minimizing the total delivery cost while ensuring each city is visited exactly once and the route forms a continuous loop without subtours. The delivery costs between cities are as follows: City 0 to City 1 costs 616 units, City 0 to City 2 costs 567 units, City 0 to City 3 costs 143 units, City 1 to City 0 costs 388 units, City 1 to City 2 costs 947 units, City 1 to City 3 costs 628 units, City 2 to City 0 costs 495 units, City 2 to City 1 costs 650 units, City 2 to City 3 costs 604 units, City 3 to City 0 costs 77 units, City 3 to City 1 costs 630 units, and City 3 to City 2 costs 23 units. The company uses binary decision variables to indicate whether a route is used and assigns position variables to each city to prevent subtours. The position variable for City 0 is fixed at 0, while the position variables for City 1, City 2, and City 3 must be integers between 0 and 3. The objective is to minimize the total delivery cost, which is the sum of the costs of all selected routes, including the term 0 times the position variable for City 0. Constraints ensure each city is visited and departed exactly once. Subtour elimination is enforced by ensuring that if a route from City i to City j is used, the position of City j must be greater than the position of City i, except for City 0, which is fixed as the starting point. These constraints are implemented through specific relationships between the route variables and the position variables, guaranteeing the delivery route forms a single loop without smaller loops. The constraints also explicitly bound the position variables for City 1, City 2, and City 3 to be between 0 and 3, and ensure that the position variables are correctly related to the route variables to eliminate subtours.
Output (OptMATH Model Response):
Problem Analysis:
We are tasked with solving a Traveling Salesman Problem (TSP) variant where a logistics company needs to deliver packages to four cities (City 0, City 1, City 2, and City 3) with the objective of minimizing the total delivery cost. The company must ensure that each city is visited exactly once, and the route forms a continuous loop without subtours. The delivery costs between cities are provided, and we need to use binary decision variables to indicate whether a route is used and assign position variables to each city to prevent subtours.
Mathematical Formulation:
Decision Variables:
- $x_{ij}$: binary variable indicating whether the route from City $i$ to City $j$ is used.
- $u_i$: position variable for City $i$, where $u_0 = 0$ and $u_i$ is an integer between 0 and 3 for $i = 1, 2, 3$.

Objective Function:

$$\min \sum_{i=0}^{3} \sum_{j=0,\, j \neq i}^{3} c_{ij}\, x_{ij}$$

where $c_{ij}$ is the cost of traveling from City $i$ to City $j$.

Constraints:
- Each city is visited exactly once:

$$\sum_{j=0,\, j \neq i}^{3} x_{ij} = 1 \quad \forall i \in \{0, 1, 2, 3\}$$

- Each city is departed exactly once:

$$\sum_{i=0,\, i \neq j}^{3} x_{ij} = 1 \quad \forall j \in \{0, 1, 2, 3\}$$

- Subtour elimination constraints:

$$u_i - u_j + 4 x_{ij} \leq 3 \quad \forall i, j \in \{1, 2, 3\},\; i \neq j$$

- Position variable bounds:

$$u_0 = 0, \qquad 1 \leq u_i \leq 3 \quad \forall i \in \{1, 2, 3\}$$
Python Code:
```python
import gurobipy as gp
from gurobipy import GRB

# Define the cities and the cost matrix
cities = [0, 1, 2, 3]
cost = {
    (0, 1): 616, (0, 2): 567, (0, 3): 143,
    (1, 0): 388, (1, 2): 947, (1, 3): 628,
    (2, 0): 495, (2, 1): 650, (2, 3): 604,
    (3, 0): 77, (3, 1): 630, (3, 2): 23
}

# Create the model
model = gp.Model("TSP")

# Decision variables
x = model.addVars(cities, cities, vtype=GRB.BINARY, name="x")
u = model.addVars(cities, vtype=GRB.INTEGER, name="u")

# Set the objective function
model.setObjective(gp.quicksum(cost[i, j] * x[i, j] for i, j in cost), GRB.MINIMIZE)

# Each city is visited exactly once
model.addConstrs((gp.quicksum(x[i, j] for j in cities if j != i) == 1 for i in cities), name="visit_once")

# Each city is departed exactly once
model.addConstrs((gp.quicksum(x[i, j] for i in cities if i != j) == 1 for j in cities), name="depart_once")

# Subtour elimination constraints
model.addConstrs((u[i] - u[j] + 4 * x[i, j] <= 3 for i in cities if i != 0 for j in cities if j != 0 and i != j), name="subtour_elim")

# Position variable bounds
model.addConstr(u[0] == 0, name="u0")
model.addConstrs((u[i] >= 1 for i in cities if i != 0), name="u_lower_bound")
model.addConstrs((u[i] <= 3 for i in cities if i != 0), name="u_upper_bound")

# Optimize the model
model.optimize()

# Print the solution
if model.status == GRB.OPTIMAL:
    print("Optimal solution found!")
    print(f"Total delivery cost: {model.objVal}")
    print("Route:")
    for i in cities:
        for j in cities:
            if i != j and x[i, j].x > 0.5:
                print(f"City {i} -> City {j} (Cost: {cost[i, j]})")
else:
    print("No optimal solution found.")
```
Explanation:
- Decision Variables: `x[i, j]` is a binary variable that indicates whether the route from City i to City j is used. `u[i]` is an integer variable representing the position of City i in the route.
- Objective Function: The objective is to minimize the total delivery cost, which is the sum of the costs of all selected routes.
- Constraints: The constraints ensure that each city is visited and departed exactly once, and that the route forms a single loop without subtours. The subtour elimination constraints are implemented using the position variables `u[i]`.
- Optimization: The model is optimized using Gurobi, and the optimal route and total delivery cost are printed.
Backtranslation Example
Our bidirectional framework first generates problem instances from mathematical formulations, then creates natural language descriptions through backtranslation. This example shows the transformation process.
An Example of Backtranslation

The seed is a set multicover formulation:

$$\min \sum_{i} c_i x_i \quad \text{s.t.} \quad \sum_{i \in S_j} x_i \geq k_j \;\; \forall j, \qquad x_i \in \{0, 1\}$$

where:
- $c_i$ represents the cost coefficient for each set
- $x_i$ is a binary decision variable indicating whether set $i$ is selected
- $S_j$ represents the set of all sets containing element $j$
- $k_j$ represents the minimum number of times element $j$ needs to be covered

A generated problem instance (LP format):

```
Minimize
  80000 Selected[1] + 40000 Selected[2] + 20000 Selected[3] + 10000 Selected[4]
  + 80000 Selected[5] + 90000 Selected[6]
Subject To
  MultiCover_e1: Selected[1] + Selected[3] + Selected[5] + Selected[6] >= 4
  ......
  MultiCover_e10: Selected[1] + Selected[4] + Selected[5] + Selected[6] >= 4
Bounds
Binaries
  Selected[1] Selected[2] Selected[3] Selected[4] Selected[5] Selected[6]
End
```

The backtranslated natural language description:
A city is planning the layout of emergency medical stations. There are 6 candidate locations for building medical stations, each with different construction costs:
Location 1: Construction cost $80,000; Location 2: Construction cost $40,000 ......
The city is divided into 10 districts, each requiring different numbers of medical stations for coverage due to population density and emergency medical needs:
Districts 1 and 2: require coverage by at least 4 stations; District 3: requires coverage by at least 2 stations ......
Each candidate location can cover specific districts:
Location 1 covers districts: 1, 2, 6, 10; Location 2 covers districts: 3, 5, 6, 9 ......
The objective is to decide which locations should be selected for building medical stations, minimizing the total construction cost while meeting the coverage requirements for each district. Each location can only be selected or not selected (binary decision).
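To close the loop, the generated instance can be solved directly from its LP file and compared against the model built from the natural language description via forward modeling. The file name below is hypothetical, and `objectives_match` refers to the sketch given earlier:

```python
import gurobipy as gp

# Solve the generated problem data directly (gp.read loads an LP/MPS file).
reference = gp.read("multicover_instance.lp")  # hypothetical file name
reference.Params.OutputFlag = 0
reference.optimize()
print(f"Reference optimal cost: {reference.ObjVal}")
# A (NL, PD) pair is kept only if the forward-modeled program reproduces this value.
```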
Citation
```bibtex
@misc{lu2025optmathscalablebidirectionaldata,
      title={OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling},
      author={Hongliang Lu and Zhonglin Xie and Yaoyu Wu and Can Ren and Yuxuan Chen and Zaiwen Wen},
      year={2025},
      eprint={2502.11102},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.11102},
}
```
Contact
We hope our work is valuable for your research or projects. If you are interested in our research, or have bug reports or comments, please feel free to contact any of the authors:
- Hongliang Lu, lhl@pku.edu.cn
- Zhonglin Xie, zlxie@pku.edu.cn
- Zaiwen Wen, wenzw@pku.edu.cn