Overview
OptMATH is a scalable framework for synthesizing high-quality optimization modeling datasets. Large language models (LLMs) have shown remarkable capabilities in understanding natural language and performing complex reasoning tasks, but translating natural language descriptions of optimization problems into solver-ready formats remains a significant challenge.
The framework consists of a bidirectional pipeline that:
- Generates problem data (PD) with controllable complexity from seed mathematical formulations (MF)
- Creates natural language (NL) descriptions through backtranslation
- Validates the correspondence between NL and PD through forward modeling and rejection sampling
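The loop below is a minimal sketch of this pipeline. The helper callables (`generate_instance`, `solve`, `backtranslate`, `forward_model`) are hypothetical placeholders standing in for the paper's components, not the actual implementation:

```python
from typing import Callable, Optional

def synthesize_pair(
    generate_instance: Callable[[], dict],   # seed MF -> problem data (PD)
    solve: Callable[[dict], float],          # PD -> optimal objective value
    backtranslate: Callable[[dict], str],    # PD -> natural language (NL)
    forward_model: Callable[[str], dict],    # NL -> candidate PD (via an LLM)
    tol: float = 1e-6,
    max_attempts: int = 8,
) -> Optional[dict]:
    """One pass of the MF -> PD -> NL -> verification loop (a sketch)."""
    pd = generate_instance()
    reference_obj = solve(pd)        # ground-truth objective for this instance
    nl = backtranslate(pd)           # NL description via backtranslation
    for _ in range(max_attempts):    # rejection sampling on solution equivalence
        candidate = forward_model(nl)
        if abs(solve(candidate) - reference_obj) <= tol:
            return {"nl": nl, "data": pd, "model": candidate}
    return None                      # reject pairs that cannot be verified
```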

Key Features
- Bidirectional Data Synthesis: A novel pipeline that generates high-quality optimization modeling data through both forward and reverse processes
- Comprehensive Problem Coverage: Spans 60+ diverse optimization problem types across multiple domains including logistics, supply chain, manufacturing, and finance
- Controllable Complexity: Generates problems with adjustable difficulty levels through feedback-driven parameter tuning
- Mathematical Validation: Employs rigorous rejection sampling based on solution equivalence to ensure dataset quality (a minimal sketch of this check follows this list)
- Proven Effectiveness: Models fine-tuned on OptMATH achieve state-of-the-art performance across multiple benchmarks
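The solution-equivalence test referenced above can be illustrated with gurobipy. This is a simplified sketch assuming both the ground-truth and the forward-modeled formulations are available as gurobipy models; it is not the paper's exact acceptance criterion:

```python
import gurobipy as gp
from gurobipy import GRB

def objectives_match(model_a: gp.Model, model_b: gp.Model, tol: float = 1e-6) -> bool:
    """Accept a synthesized pair only if both formulations solve to the same objective."""
    for m in (model_a, model_b):
        m.Params.OutputFlag = 0  # silence solver logging
        m.optimize()
        if m.Status != GRB.OPTIMAL:
            return False  # reject if either model fails to solve to optimality
    return abs(model_a.ObjVal - model_b.ObjVal) <= tol
```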
Dataset
OptMATH consists of two main components designed for training and evaluation:
OptMATH-Train
- Over 200k high-quality, diverse optimization problems
- Covers optimization scenarios including logistics, supply chain, manufacturing, and more

OptMATH-Bench
- Coverage of various problem types (LP, MILP, IP, NLP, SOCP); the table below shows the problem-type distribution across benchmarks
| Problem Type | NL4OPT | MAMO EasyLP | MAMO ComplexLP | OptMATH-Bench |
|---|---|---|---|---|
| IP | 26.94% | 58.74% | 15.64% | 11.92% |
| LP | 61.63% | 0.61% | 62.08% | 8.81% |
| MILP | 11.43% | 40.65% | 22.28% | 67.87% |
| NLP | 0.00% | 0.00% | 0.00% | 5.18% |
| SOCP | 0.00% | 0.00% | 0.00% | 6.22% |
Problem Length Analysis

- OptMATH features significantly longer problem descriptions (2.9× longer than MAMO EasyLP), challenging LLMs with more complex reasoning requirements.
Dataset Distribution

OptMATH effectively surrounds the other benchmarks in the embedding space, which helps explain the transfer-learning benefits observed after fine-tuning.
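A sketch of how such an embedding-space comparison can be produced is shown below; the embedding model and 2-D projection are assumptions for illustration, and the paper's exact setup may differ:

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def plot_benchmark_overlap(corpora: dict[str, list[str]]) -> None:
    """corpora maps a benchmark name to its list of problem descriptions."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder
    texts = [t for ts in corpora.values() for t in ts]
    coords = PCA(n_components=2).fit_transform(embedder.encode(texts))
    start = 0
    for name, ts in corpora.items():
        xy = coords[start:start + len(ts)]
        plt.scatter(xy[:, 0], xy[:, 1], s=5, label=name)
        start += len(ts)
    plt.legend()
    plt.title("Benchmark coverage in embedding space")
    plt.show()
```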
Results
We use the LLaMA-Factory framework for fine-tuning. For more details, please refer to https://github.com/hiyouga/LLaMA-Factory.
Main Results
The table below presents a comprehensive comparison of model performance across various benchmarks. Our fine-tuned models, especially OptMATH-Qwen2.5-32B, consistently outperform both baseline models and other fine-tuned models, demonstrating that training with OptMATH-Train significantly enhances optimization modeling capabilities.
| Types | Models | NL4OPT | MAMO EasyLP | MAMO ComplexLP | OptMATH-Bench | IndustryOR | OptiBench | Macro AVG |
|---|---|---|---|---|---|---|---|---|
| Baseline | Llama3.1-8B | 0.0% | 0.2% | 0.0% | 0.0% | 0.0% | 0.0% | 0.1% |
| | Qwen2.5-7B | 86.9% | 83.6% | 21.8% | 1.6% | 10.0% | 36.2% | 40.0% |
| | GPT-3.5-turbo | 78.0% | 79.3% | 33.2% | 15.0% | 21.0% | 47.4% | 51.4% |
| | GPT-4 | 89.0% | 87.3% | 49.3% | 16.6% | 33.3% | 68.6% | 57.4% |
| | Deepseek-V3 | 95.9% | 88.3% | 51.1% | 32.6% | 37.0% | 71.6% | 62.8% |
| | OptiMUS (GPT-4o-2024-05-13) | 78.8% | 77.0% | 43.6% | 20.2% | 31.0% | 45.8% | 49.4% |
| | Qwen2.5-32B | 92.7% | 82.2% | 44.6% | 9.3% | 16.0% | 47.6% | 48.7% |
| Fine-tuning | ORLM-Llama-3-8B (reported) | 85.7% | 82.3% | 37.4% | * | 38.0% | * | 60.9% |
| | ORLM-Llama-3-8B (reproduced) | 84.5% | 74.9% | 34.1% | 2.6% | 24.0% | 51.1% | 45.2% |
| | OptMATH-Llama3.1-8B (pass@1) | 55.5% | 73.9% | 40.8% | 24.4% | 18.0% | 55.5% | 44.7% |
| | OptMATH-Qwen2.5-7B (pass@1) | 94.7% | 86.5% | 51.2% | 24.4% | 20.0% | 57.9% | 55.8% |
| | OptMATH-Qwen2.5-32B (pass@1) | 95.9% | 89.9% | 54.1% | 34.7% | 31.0% | 66.1% | 62.0% |
| Pass@8 | OptMATH-Llama3.1-8B | 97.6% | 94.2% | 71.6% | 51.6% | 37.0% | 66.6% | 69.8% |
| | OptMATH-Qwen2.5-7B | 98.4% | 94.5% | 72.5% | 56.0% | 38.0% | 68.1% | 71.3% |
| | OptMATH-Qwen2.5-32B | 97.9% | 93.9% | 75.4% | 67.4% | 47.0% | 76.8% | 76.4% |
Ablation Study
Ablation study on Model Size
To investigate the effectiveness of OptMATH training across different model scales, we conducted experiments using Qwen2.5 models ranging from 0.5B to 32B parameters. Due to computational constraints, we used a randomly sampled subset of 100,000 training examples. As shown in the figure below, all models exhibit substantial performance improvements after OptMATH-Train fine-tuning.

Ablation study on Data Size
As shown in the figure below, the performance of Qwen2.5-1.5B across different benchmarks varies with the amount of training data. The model demonstrates notable improvements in optimization modeling capabilities even with a small portion of the OptMATH-Train dataset. The performance gains gradually level off as more training data is added, showing a typical diminishing returns pattern.

Interactive Exploration
Model Performance Across Benchmarks
This visualization compares the performance of different models across all benchmarks. Models are grouped into three categories: Baseline models, Fine-tuned models (pass@1), and Fine-tuned models with multiple sampling (pass@8).
Benchmark Scenario Distribution
This visualization compares the application scenario distribution across different optimization benchmarks. OptMATH-Bench stands out with the most diverse and balanced coverage of industrial scenarios, including significant representation in manufacturing, telecommunications, aviation, and other sectors that are underrepresented in other benchmarks.
Data Complexity Distribution
This visualization demonstrates the complexity of different optimization benchmarks in terms of variables and constraints. OptMATH-Bench contains problems with significantly higher median variable and constraint counts, providing a more realistic and challenging test environment for optimization algorithms compared to other benchmarks.
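For reference, variable and constraint counts of the kind plotted here can be read directly off a gurobipy model; the file name below is hypothetical:

```python
import gurobipy as gp

# Load a generated instance (hypothetical file name) and report its size.
model = gp.read("optmath_instance.lp")
print(f"variables:   {model.NumVars}")
print(f"constraints: {model.NumConstrs}")
```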
AutoFormulation Example
Prompt
Below is an operations research question. Build a mathematical model and corresponding python code using `gurobipy` that appropriately addresses the question.
# Question:
{{Question}}
# Notes:
- First provide problem analysis and its mathematical formulation before the Python code
- The Python code should start with the following lines:
```python
import gurobipy as gp
from gurobipy import GRB
```
- Make sure the model variable is named `model`.
- Avoid using "<" and ">" in Gurobi constraints; instead, use "<=" or ">=" as appropriate.
- Carefully determine whether the variable is an integer or a continuous variable.
# Response:
(Provide your response here, keep the notes above in mind)
Input (Natural Language Description):
A logistics company must deliver packages to four cities: City 0, City 1, City 2, and City 3, minimizing the total delivery cost while ensuring each city is visited exactly once and the route forms a continuous loop without subtours. The delivery costs between cities are as follows: City 0 to City 1 costs 616 units, City 0 to City 2 costs 567 units, City 0 to City 3 costs 143 units, City 1 to City 0 costs 388 units, City 1 to City 2 costs 947 units, City 1 to City 3 costs 628 units, City 2 to City 0 costs 495 units, City 2 to City 1 costs 650 units, City 2 to City 3 costs 604 units, City 3 to City 0 costs 77 units, City 3 to City 1 costs 630 units, and City 3 to City 2 costs 23 units. The company uses binary decision variables to indicate whether a route is used and assigns position variables to each city to prevent subtours. The position variable for City 0 is fixed at 0, while the position variables for City 1, City 2, and City 3 must be integers between 0 and 3. The objective is to minimize the total delivery cost, which is the sum of the costs of all selected routes, including the term 0 times the position variable for City 0. Constraints ensure each city is visited and departed exactly once. Subtour elimination is enforced by ensuring that if a route from City i to City j is used, the position of City j must be greater than the position of City i, except for City 0, which is fixed as the starting point. These constraints are implemented through specific relationships between the route variables and the position variables, guaranteeing the delivery route forms a single loop without smaller loops. The constraints also explicitly bound the position variables for City 1, City 2, and City 3 to be between 0 and 3, and ensure that the position variables are correctly related to the route variables to eliminate subtours.
Output (OptMATH Model Response):
Problem Analysis:
We are tasked with solving a Traveling Salesman Problem (TSP) variant where a logistics company needs to deliver packages to four cities (City 0, City 1, City 2, and City 3) with the objective of minimizing the total delivery cost. The company must ensure that each city is visited exactly once, and the route forms a continuous loop without subtours. The delivery costs between cities are provided, and we need to use binary decision variables to indicate whether a route is used and assign position variables to each city to prevent subtours.
Mathematical Formulation:
Decision Variables:
- $x_{ij}$: binary variable indicating whether the route from City $i$ to City $j$ is used.
- $u_i$: position variable for City $i$, where $u_0 = 0$ and $u_i$ is an integer between 0 and 3 for $i = 1, 2, 3$.

Objective Function:

$$\min \sum_{i=0}^{3} \sum_{j=0,\, j \neq i}^{3} c_{ij}\, x_{ij}$$

where $c_{ij}$ is the cost of traveling from City $i$ to City $j$.

Constraints:
- Each city is visited exactly once:

$$\sum_{j=0,\, j \neq i}^{3} x_{ij} = 1 \quad \forall i \in \{0, 1, 2, 3\}$$

- Each city is departed exactly once:

$$\sum_{i=0,\, i \neq j}^{3} x_{ij} = 1 \quad \forall j \in \{0, 1, 2, 3\}$$

- Subtour elimination constraints:

$$u_i - u_j + 4 x_{ij} \leq 3 \quad \forall i, j \in \{1, 2, 3\},\; i \neq j$$

- Position variable bounds:

$$u_0 = 0, \qquad 1 \leq u_i \leq 3 \quad \forall i \in \{1, 2, 3\}$$
Python Code:
```python
import gurobipy as gp
from gurobipy import GRB

# Define the cities and the cost matrix
cities = [0, 1, 2, 3]
cost = {
    (0, 1): 616, (0, 2): 567, (0, 3): 143,
    (1, 0): 388, (1, 2): 947, (1, 3): 628,
    (2, 0): 495, (2, 1): 650, (2, 3): 604,
    (3, 0): 77, (3, 1): 630, (3, 2): 23
}

# Create the model
model = gp.Model("TSP")

# Decision variables
x = model.addVars(cities, cities, vtype=GRB.BINARY, name="x")
u = model.addVars(cities, vtype=GRB.INTEGER, name="u")

# Set the objective function
model.setObjective(gp.quicksum(cost[i, j] * x[i, j] for i, j in cost), GRB.MINIMIZE)

# Each city is visited exactly once
model.addConstrs((gp.quicksum(x[i, j] for j in cities if j != i) == 1 for i in cities), name="visit_once")

# Each city is departed exactly once
model.addConstrs((gp.quicksum(x[i, j] for i in cities if i != j) == 1 for j in cities), name="depart_once")

# Subtour elimination constraints
model.addConstrs((u[i] - u[j] + 4 * x[i, j] <= 3 for i in cities if i != 0 for j in cities if j != 0 and i != j), name="subtour_elim")

# Position variable bounds
model.addConstr(u[0] == 0, name="u0")
model.addConstrs((u[i] >= 1 for i in cities if i != 0), name="u_lower_bound")
model.addConstrs((u[i] <= 3 for i in cities if i != 0), name="u_upper_bound")

# Optimize the model
model.optimize()

# Print the solution
if model.status == GRB.OPTIMAL:
    print("Optimal solution found!")
    print(f"Total delivery cost: {model.objVal}")
    print("Route:")
    for i in cities:
        for j in cities:
            if i != j and x[i, j].x > 0.5:
                print(f"City {i} -> City {j} (Cost: {cost[i, j]})")
else:
    print("No optimal solution found.")
```
Explanation:
- Decision Variables: `x[i, j]` is a binary variable that indicates whether the route from City i to City j is used. `u[i]` is an integer variable representing the position of City i in the route.
- Objective Function: The objective is to minimize the total delivery cost, which is the sum of the costs of all selected routes.
- Constraints: The constraints ensure that each city is visited and departed exactly once, and that the route forms a single loop without subtours. The subtour elimination constraints are implemented using the position variables `u[i]`.
- Optimization: The model is optimized using Gurobi, and the optimal route and total delivery cost are printed.
Backtranslation Example
Our bidirectional framework first generates problem instances from mathematical formulations, then creates natural language descriptions through backtranslation. This example shows the transformation process.
An Example of Backtranslation

The seed is a set multicover formulation:

$$\min \sum_{i} c_i x_i \quad \text{s.t.} \quad \sum_{i \in S_j} x_i \geq k_j \;\; \forall j, \qquad x_i \in \{0, 1\}$$

where:
- $c_i$ represents the cost coefficient for each set
- $x_i$ is a binary decision variable indicating whether set $i$ is selected
- $S_j$ represents the set of all sets containing element $j$
- $k_j$ represents the minimum number of times element $j$ needs to be covered

A generated problem instance (LP format):

```
Minimize
  80000 Selected[1] + 40000 Selected[2] + 20000 Selected[3] + 10000 Selected[4]
  + 80000 Selected[5] + 90000 Selected[6]
Subject To
  MultiCover_e1: Selected[1] + Selected[3] + Selected[5] + Selected[6] >= 4
  ......
  MultiCover_e10: Selected[1] + Selected[4] + Selected[5] + Selected[6] >= 4
Bounds
Binaries
  Selected[1] Selected[2] Selected[3] Selected[4] Selected[5] Selected[6]
End
```

The backtranslated natural language description:
A city is planning the layout of emergency medical stations. There are 6 candidate locations for building medical stations, each with different construction costs:
Location 1: Construction cost $80,000; Location 2: Construction cost $40,000 ......
The city is divided into 10 districts, each requiring different numbers of medical stations for coverage due to population density and emergency medical needs:
Districts 1 and 2: require coverage by at least 4 stations; District 3: requires coverage by at least 2 stations ......
Each candidate location can cover specific districts:
Location 1 covers districts: 1, 2, 6, 10; Location 2 covers districts: 3, 5, 6, 9 ......
The objective is to decide which locations should be selected for building medical stations, minimizing the total construction cost while meeting the coverage requirements for each district. Each location can only be selected or not selected (binary decision).
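To close the loop, the generated instance can be solved directly from its LP file and compared against the model built from the natural language description via forward modeling. The file name below is hypothetical, and `objectives_match` refers to the sketch given earlier:

```python
import gurobipy as gp

# Solve the generated problem data directly (gp.read loads an LP/MPS file).
reference = gp.read("multicover_instance.lp")  # hypothetical file name
reference.Params.OutputFlag = 0
reference.optimize()
print(f"Reference optimal cost: {reference.ObjVal}")
# A (NL, PD) pair is kept only if the forward-modeled program reproduces this value.
```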
Citation
```bibtex
@misc{lu2025optmathscalablebidirectionaldata,
      title={OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling},
      author={Hongliang Lu and Zhonglin Xie and Yaoyu Wu and Can Ren and Yuxuan Chen and Zaiwen Wen},
      year={2025},
      eprint={2502.11102},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.11102},
}
```
Contact
We hope our work is valuable for your research or projects. If you are interested in our research, or have bug reports or comments, please feel free to contact any of the authors:
- Hongliang Lu, lhl@pku.edu.cn
- Zhonglin Xie, zlxie@pku.edu.cn
- Zaiwen Wen, wenzw@pku.edu.cn