A Deep Reinforcement Learning (DRL)-Based Transformer Method for Solving the Open Shop Scheduling Problem
Quick Answer
This paper shows that a Transformer-based scheduling policy for the Open Shop Scheduling Problem (OSSP), trained only on small instances, is competitive with classical dispatching heuristics on large instances (40x40 to 100x100), with average makespan gaps of 12.89-15.12% relative to a standard lower bound.
Quick Take
The policy is trained on small Taillard benchmarks (4x4 to 10x10) and applied without retraining to randomly generated instances from 40x40 to 100x100. On these instances it reaches average makespan gaps of 12.89-15.12% relative to a standard lower bound, substantially outperforming SPT and LPT while remaining close to EST.
Key Points
- Developed a Transformer-based model for OSSP using an encoder-decoder architecture with multi-head attention.
- Achieved average makespan gaps of 12.89-15.12%, relative to a standard lower bound, on large randomly generated instances.
- Trained on Taillard benchmark instances (4x4 to 10x10) using only the processing-time matrix as input.
- Outperformed classical heuristics like SPT and LPT, remaining competitive with EST.
- Indicates potential for learning-based alternatives in large-scale scheduling problems.
Article Content
arXiv:2606.13682v1 (Announce Type: new)
Abstract: The open shop scheduling problem (OSSP) arises in many industrial and service settings but remains computationally challenging as the number of jobs and machines increases. While exact methods quickly become intractable, classical dispatching rules and metaheuristics may require substantial tuning to maintain solution quality at large scales.
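For readers new to OSSP: each of n jobs must be processed exactly once on each of m machines, in any order; a machine handles one operation at a time, and a job cannot be on two machines at once. The makespan is the completion time of the last operation. The Python sketch below checks these constraints and computes the makespan. The schedule representation (a list of (job, machine, start) tuples) and the function names are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation (not from the paper): p[j][k] is the processing
# time of job j on machine k; a schedule is a list of (job, machine, start) tuples.
Schedule = List[Tuple[int, int, int]]


def _has_overlap(intervals: List[Tuple[int, int]]) -> bool:
    """True if any two half-open intervals [start, end) overlap."""
    intervals = sorted(intervals)
    # After sorting by start, any overlap also shows up between neighbours.
    return any(a_end > b_start for (_, a_end), (b_start, _) in zip(intervals, intervals[1:]))


def open_shop_makespan(p: List[List[int]], schedule: Schedule) -> int:
    """Validate an open shop schedule and return its makespan."""
    n, m = len(p), len(p[0])
    all_ops = {(j, k) for j in range(n) for k in range(m)}
    if len(schedule) != n * m or {(j, k) for j, k, _ in schedule} != all_ops:
        raise ValueError("each (job, machine) operation must appear exactly once")
    if any(s < 0 for _, _, s in schedule):
        raise ValueError("start times must be non-negative")

    # A job may be on only one machine at a time (its operation order is free).
    for j in range(n):
        if _has_overlap([(s, s + p[j][k]) for jj, k, s in schedule if jj == j]):
            raise ValueError(f"job {j} is processed on two machines at once")
    # A machine may process only one operation at a time.
    for k in range(m):
        if _has_overlap([(s, s + p[j][k]) for j, kk, s in schedule if kk == k]):
            raise ValueError(f"machine {k} is double-booked")

    # Makespan = completion time of the last operation.
    return max(s + p[j][k] for j, k, s in schedule)


p = [[3, 2], [2, 4]]
schedule = [(0, 0, 0), (1, 1, 0), (0, 1, 4), (1, 0, 4)]
print(open_shop_makespan(p, schedule))  # 6
```

In this 2x2 example each job visits the machines in a different order, which is exactly the freedom that distinguishes an open shop from a job shop.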
This study develops a Transformer-based scheduling policy for OSSP using an encoder-decoder architecture with multi-head attention. The model is trained on Taillard benchmark instances (4x4, 5x5, 7x7, and 10x10) using only the processing-time matrix as input and produces feasible schedules with makespans typically within 15-30% of best-known values.
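The abstract does not describe the decoding procedure, so the following is only a hedged illustration of one common way an attention-based pointer policy can guarantee feasible schedules. At each step it scores every (job, machine) operation with scaled dot-product attention, masks operations already scheduled, picks the most probable one, and places it at its earliest feasible start. The hand-set query vector `QUERY` and the three per-operation features are hypothetical stand-ins for the trained encoder-decoder, which would use learned multi-head attention over embeddings of the processing-time matrix.

```python
import math
from typing import List, Tuple

# Illustrative only: QUERY is a hand-set, hypothetical stand-in for the trained
# decoder state, and the features below are hypothetical. Only the mechanics
# (attention scores, masking, earliest-start placement) are being shown.
QUERY = [-1.0, 1.0, 1.0]  # prefers short operations of heavily loaded jobs/machines


def decode(p: List[List[int]], query: List[float] = QUERY) -> Tuple[List[Tuple[int, int, int]], int]:
    n, m = len(p), len(p[0])
    p_max = max(max(row) for row in p)
    job_free, mach_free = [0] * n, [0] * m  # earliest time each job / machine is free
    job_left = [sum(row) for row in p]  # remaining work per job
    mach_left = [sum(p[j][k] for j in range(n)) for k in range(m)]  # remaining load per machine
    ops = [(j, k) for j in range(n) for k in range(m)]
    done, schedule = set(), []
    scale = math.sqrt(len(query))

    for _ in range(n * m):
        scores = []
        for j, k in ops:
            if (j, k) in done:
                scores.append(float("-inf"))  # mask: each operation is scheduled once
            else:
                key = [p[j][k] / p_max, job_left[j] / p_max, mach_left[k] / p_max]
                scores.append(sum(q * x for q, x in zip(query, key)) / scale)
        # Softmax over the masked scores; exp(-inf) == 0, so masked ops get zero probability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        probs = [w / total for w in weights]
        j, k = ops[probs.index(max(probs))]  # greedy decoding (sampling is the alternative)

        # Earliest start that respects both the job and the machine: always feasible.
        start = max(job_free[j], mach_free[k])
        job_free[j] = mach_free[k] = start + p[j][k]
        job_left[j] -= p[j][k]
        mach_left[k] -= p[j][k]
        done.add((j, k))
        schedule.append((j, k, start))

    return schedule, max(job_free)


schedule, makespan = decode([[3, 2], [2, 4]])
print(schedule, makespan)
```

Because the mask removes scheduled operations and every chosen operation starts only after both its job and its machine are free, any scoring function (trained or not) yields a feasible schedule; training only affects which feasible schedule is produced.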
To evaluate scalability, the trained policy is applied without retraining to randomly generated instances from 40x40 to 100x100 and compared against classical dispatching heuristics, including SPT, LPT, MWKR, and EST. Across these large instances, the Transformer achieved average gaps of 12.89-15.12% relative to a standard lower bound. Compared with EST, the Transformer remained competitive, typically within a modest margin, while substantially outperforming SPT and LPT.
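The baselines and the gap metric can be sketched in Python. The abstract does not spell out the exact rule definitions or which lower bound it uses, so this sketch assumes common textbook variants: a greedy list scheduler repeatedly picks the unscheduled operation with the best priority (SPT: shortest processing time; LPT: longest; MWKR: most work remaining in the job; EST: earliest possible start) and places it at its earliest start. It uses the classic open shop bound LB = max(longest job total, busiest machine load), with gap = 100 * (Cmax - LB) / LB. The function names, tie-breaking, and the 20x20 random instance with processing times in [1, 99] are hypothetical choices; the size is kept below the paper's 40x40 to 100x100 only so the pure-Python demo runs quickly.

```python
import random
from typing import List, Tuple

Op = Tuple[int, int]  # (job, machine)


def lower_bound(p: List[List[int]]) -> int:
    """Classic open shop bound: no schedule beats the longest job or the busiest machine."""
    n, m = len(p), len(p[0])
    longest_job = max(sum(row) for row in p)
    busiest_machine = max(sum(p[j][k] for j in range(n)) for k in range(m))
    return max(longest_job, busiest_machine)


def dispatch(p: List[List[int]], rule: str) -> int:
    """Greedy list scheduling with an assumed priority rule; returns the makespan."""
    n, m = len(p), len(p[0])
    job_free, mach_free = [0] * n, [0] * m
    job_left = [sum(row) for row in p]  # remaining work per job (used by MWKR)

    def est(o: Op) -> int:
        return max(job_free[o[0]], mach_free[o[1]])

    # Priority keys (smaller is better); the trailing op makes ties deterministic.
    keys = {
        "SPT": lambda o: (p[o[0]][o[1]], est(o), o),
        "LPT": lambda o: (-p[o[0]][o[1]], est(o), o),
        "MWKR": lambda o: (-job_left[o[0]], est(o), o),
        "EST": lambda o: (est(o), p[o[0]][o[1]], o),
    }
    todo = {(j, k) for j in range(n) for k in range(m)}
    while todo:
        j, k = min(todo, key=keys[rule])
        start = est((j, k))
        job_free[j] = mach_free[k] = start + p[j][k]
        job_left[j] -= p[j][k]
        todo.remove((j, k))
    return max(job_free)


def gap_percent(makespan: int, lb: int) -> float:
    """Relative gap to the lower bound, in percent."""
    return 100.0 * (makespan - lb) / lb


if __name__ == "__main__":
    rng = random.Random(0)
    n = m = 20  # hypothetical size; processing times drawn uniformly from [1, 99]
    p = [[rng.randint(1, 99) for _ in range(m)] for _ in range(n)]
    lb = lower_bound(p)
    for rule in ("SPT", "LPT", "MWKR", "EST"):
        print(f"{rule}: gap {gap_percent(dispatch(p, rule), lb):.2f}%")
```

Because LB is a valid lower bound on any feasible makespan, every gap computed this way is non-negative, and a 0% gap certifies an optimal schedule.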
These results indicate that a Transformer policy trained on small OSSP instances can generalize to substantially larger problems and provide a feature-light, learning-based alternative to classical dispatching rules.