Maze generation with Variational Autoencoders

Juna Salviati
5 min readFeb 1, 2022


Photo by Clark Van Der Beken on Unsplash

“People change in the maze. Oh, find the cup if you can. But be very wary; you could just lose yourselves along the way.” (Albus Dumbledore)

In this post, we will approach the maze generation problem with Machine Learning. We will use Variational Autoencoders and TensorFlow to train a neural net and to generate new mazes starting from a generated training set.

(Sources here)

Variational Autoencoders 101

(A short summary — readapted for mazes — from

Before understanding what Variational Autoencoders are, it’s better to take a step back to know what “Autoencoders” are.

Autoencoders are a kind of neural network trained to produce an output that wants to be as similar as possible to the input.

(Image from

The net architecture is composed by an encoder and a decoder:

  • the encoder is responsible for reducing the number of features describing the input (performing a dimensionality reduction — that is, data compression). This process will “project” the data from the original initial space to the latent space.
  • the decoder is responsible to decompress the data from the encoded latent space back to the initial space (reversing the process).
Image from Wikipedia

Performing encoding/decoding, what we want to achieve is to minimize the loss of information when encoding and to minimize error when decoding.

The idea to generate new images is to get a point in the latent space and then to decode it. To do so, we have to be sure that no matter how we choose a random sample from the latent space, it will always be decoded to meaningful content — something that Autoencoders are not suitable for.

In order to make the generative process available, a “regularization” is introduced in Variational Autoencoders (VAE). The “regularization” gives the space two properties: continuity (decoding from two sampled points which are near does not give back two complete different results) and completeness (when decoding a random point from the latent space we always get meaningful content back).

Instead of sampling inputs as points in the latent space, they are sampled as normal distributions. Variational Autoencoders make those distributions overlap.

When we take a point on the intersection of n distributions we obtain something that, decoded, has properties from both the distributions.

Loss for VAE is composed by the reconstruction loss plus the regularization term.

Generating the training set

We will generate the training set with a script (

The script will use the “binary tree” algorithm with probability p = 0.5 to produce a number of maze images.

Borrowing the algorithm output from the previous post (, we will just replace “#” with 1s and “ “ (spaces) with 0s to output a black and white image for each generated maze.

In main(), we will use those methods to generate 665 training images and 332 test images of size 36*36 pixels.

Here you can see how the images resulting from this script look like:

Output from maze training set generation script

Building the neural network

To build the neural network we have to define both the encoder and the decoder.

We define the size to build the original dimension of the image as the edge of the maze multiplied by 3, as a consequence of how we draw the training images — “carve_maze” carves a corridor in a 3x3 block for each maze matrix cell (see here for details).

Looking at the summary for the encoder we can see how the dimensions reduce from the input to the latent space.

In a case of a 36 * 3 maze size, the original dimension is:

(36 * 3) * (36 * 3) = 11664 (input as a single array with 11664 elements)

Dense layers reduce dimensions to 64 and then to 2.

“Sampling” is an additional special layer which allows the reparameterization trick: instead of mapping input data point to points, it maps them to normal multivariate distributions.

The decoder takes a sample from the latent space and “decompress” it:

At this point, everything is in place to train the VAE: after reading the train and test images, we customize the loss to track and set the checkpoint directory to allow weight saving during the training. Here we also define an EarlyStopping callback to stop the training when we don’t see an improvement in 10 epochs.

fit() function returns an history, which can be used to plot the loss:

We can plot the latent space by taking samples out of it to draw:

Generating a maze (this function is improperly called “get_prediction”) can be done by sampling a random point from the latent space and decoding it:

After loading the weights, we can plot the latent space and/or get an output to save to an image:

Here it is one of the produced outputs. It’s for sure coming from the VAE, because the images from the training set are spanning trees while here we can see loops (corridors are drawn in black).

VAE generated maze

About the output

Observing samples decoded from latent space we can see that from the very center of the space we decode…a lattice.

It is clear why if we think about vector sampled near the center of the latent space as an interpolation of all the inputs.

This theory is confirmed by picking a sample very close to the center (scale=10). Here we obtain a quasi-lattice:

A sample in the middle of the latent space is decoded as a quasi-lattice

While we get samples away from the center we obtain better looking mazes (not guarantee to be always connected):

A non completely connected VAE generated maze
A VAE generated good looking maze
Some VAE generated mazes are “furry”


I would like to thank Gregory Sech for kindly supporting this exploration and giving advices.



Juna Salviati

Full-time Human-computer interpreter. Opinions are my own.