Learning the Traveling Salesman Problem
An introduction to deep learning for combinatorial optimization
Note: this post is still in progress, but I’m leaving it up for now :)
Learning Goals
- Dive into deep learning for combinatorial optimization with a single standalone example
The Traveling Salesman Problem (TSP) has attracted a lot of interest from the deep learning community since 2016. It is regarded as a solid benchmark problem for research on deep learning for combinatorial optimization, for several reasons:
- The problem has been studied for over 50 years.
- Already-developed solvers such as Concorde TSP allow for both validation and some forms of supervised approaches.
- …
Autoregressive vs non-autoregressive
Autoregressive -> reinforcement learning
Non-autoregressive -> supervised
The two seminal papers:
- Pointer Networks (Oriol Vinyals, Meire Fortunato, Navdeep Jaitly): sequence to sequence, always starts at the first city, supervised
- Attention, Learn to Solve Routing Problems! (Wouter Kool, Herke van Hoof, Max Welling): reinforcement learning, transformers, autoregressive
In this post, I will implement a version similar to “Attention, Learn to Solve Routing Problems!”. However, the code is simplified to solve only the TSP. Ignoring the Vehicle Routing Problem and the Orienteering Problem lets us write much simpler code and keep only the parts essential to this one problem.
Differences from the original code:
- Solves just the TSP instead of the TSP, the Vehicle Routing Problem, and the Orienteering Problem
- The transformer blocks use LayerNorm instead of BatchNorm
- The baseline is an exponential moving average tracked per TSP instance, instead of a greedy rollout (see the sketch below)
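To make the baseline difference concrete, here is a minimal sketch of a per-instance exponential moving average baseline for REINFORCE. The class name, the beta value, and the bookkeeping details are my own assumptions, not the post’s actual code.

import torch

class EmaBaseline:
    def __init__(self, num_instances, beta=0.9):
        # beta is an assumed smoothing factor; one baseline value per TSP instance
        self.beta = beta
        self.values = torch.zeros(num_instances)
        self.seen = torch.zeros(num_instances, dtype=torch.bool)

    def update(self, idx, tour_lengths):
        # First observation of an instance: use its tour length directly.
        # Afterwards: blend the stored baseline with the new tour length.
        blended = self.beta * self.values[idx] + (1 - self.beta) * tour_lengths
        new = torch.where(self.seen[idx], blended, tour_lengths)
        self.values[idx] = new
        self.seen[idx] = True
        return new

The REINFORCE advantage for each instance is then its tour length minus this baseline value.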
What does autoregressive mean? In this context, it means the TSP solution is built incrementally: at each step, the model adds one more node to the tour, and this continues until every node has been added.
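As a toy illustration (not the model from the paper), the loop below appends one node per step and masks out already-visited nodes; the function and variable names are made up for this sketch.

import torch

def build_tour(score_next, num_nodes):
    # Autoregressive construction: add one node per step until the tour is complete.
    tour = [0]                                   # assume the tour starts at node 0
    visited = torch.zeros(num_nodes, dtype=torch.bool)
    visited[0] = True
    for _ in range(num_nodes - 1):
        scores = score_next(tour)                # [num_nodes] scores for the next node
        scores = scores.masked_fill(visited, float("-inf"))  # never revisit a node
        nxt = int(scores.argmax())
        tour.append(nxt)
        visited[nxt] = True
    return tour

# Random scores stand in for a learned model; the mechanics are the same.
print(build_tour(lambda tour: torch.rand(10), num_nodes=10))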
Encoder-decoder architecture. The model consists of two parts: an encoder and a decoder.
import torch
from torch import nn

class TspModel(nn.Module):
    def __init__(self, emb):
        super().__init__()
        self.encoder = TspEncoder(emb)
        self.decoder = TspDecoder(emb)

    def forward(self, x, strategy):
        """
        x: [batch_size, num_nodes, 2] city coordinates

        returns:
            log_probs: [batch_size, num_nodes - 1]
            tours: [batch_size, num_nodes]
        """
        h = self.encoder(x)
        log_probs, tours = self.decoder(h, strategy)
        return log_probs, tours
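A quick usage sketch, assuming the TspDecoder that appears later in the post accepts a "greedy" strategy string; the embedding size and batch shapes are illustrative assumptions.

model = TspModel(emb=128)
x = torch.rand(32, 20, 2)                    # 32 instances, 20 cities, coordinates in [0, 1]
log_probs, tours = model(x, strategy="greedy")
print(log_probs.shape, tours.shape)          # expected: [32, 19] and [32, 20]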
The encoder consists of two pieces.
- A linear layer that projects the 2D xy coordinates into the embedding dimension
- An encoder-only transformer consisting of a stack of multi-head attention blocks
class TspEncoder(nn.Module):
    def __init__(self, emb, blocks=3, heads=8):
        super().__init__()
        # Project the 2D xy coordinates into the embedding dimension
        self.xy = nn.Linear(2, emb)
        # Stack of multi-head attention blocks applied in sequence
        self.blocks = nn.Sequential(
            *(MultiheadAttentionBlock(emb, heads) for _ in range(blocks))
        )

    def forward(self, x):
        h = self.xy(x)
        return self.blocks(h)
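The MultiheadAttentionBlock is not shown in this section. Here is a minimal sketch of what it could look like, assuming LayerNorm (per the differences list above), residual connections, and a standard feed-forward sublayer; the post’s actual block may differ.

from torch import nn

class MultiheadAttentionBlock(nn.Module):
    def __init__(self, emb, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(emb)
        self.ff = nn.Sequential(
            nn.Linear(emb, 4 * emb),
            nn.ReLU(),
            nn.Linear(4 * emb, emb),
        )
        self.norm2 = nn.LayerNorm(emb)

    def forward(self, h):
        # Self-attention sublayer with a residual connection and LayerNorm
        attn_out, _ = self.attn(h, h, h)
        h = self.norm1(h + attn_out)
        # Position-wise feed-forward sublayer with a residual connection and LayerNorm
        return self.norm2(h + self.ff(h))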