Building a Unity environment for training an agent with Python

At the end of my previous reinforcement learning post, I showed you a demo of a Deep Q-Network agent acting in a Unity environment. I built that environment with the Unity Machine Learning Agents Toolkit.

In this post, we will see how to build this learning environment and use it with Python. The Banana Collector environment is an example from the ml-agents repository that I modified to make it simpler. I will walk you through the whole process from start to finish.

Tutorial made with Unity 2017.4.xx LTS and mlagents 0.7.0.

Unity Hub

For this tutorial, we will need the Unity Hub.

Once it is installed and opened, go to Installs > Official Releases and install the Unity 2017.4.xx LTS release. Make sure to add "Linux Build Support" in the installer wizard.

Downloading the resources

For the following steps, we will use the example environments available in the ml-agents repository.

We will also need to download the TensorFlow Sharp plugin.

Setting the project

After creating a new project, you should get something similar to the following image.

Unity interface with annotations

We will now set some values in the Project Settings.

  • Go to Edit > Project Settings > Player.

  • In the Inspector panel, under Resolution and Presentation, check the Run in Background option.

  • In the Other Settings tab, add ENABLE_TENSORFLOW to the Scripting Define Symbols and set the Scripting Runtime Version to .NET 4.6 (it will ask you to restart Unity).

That's it for the project settings. Now we will open an example and make a few changes before exporting our learning environment.

Getting assets

From the ml-agents repository that you downloaded, drag and drop the contents of UnitySDK/Assets/ into the Project panel. You will also need to install TFSharpPlugin.unitypackage, which you can import the same way, by dragging and dropping it into the Project panel.

Now we can open and modify the Banana Collector example.

  • Open the Banana scene from Assets > ML-Agents > Examples > BananaCollectors > Scenes.

  • The Hierarchy panel should now list the objects of the scene.

Modifying the Banana Collector example

We can simplify the project by deleting, in the Hierarchy panel, the RLArea(x) copies and the Agent(x) copies under RLArea, so that only one agent remains.

Testing the environment

To test the environment, select the Agent object in the Hierarchy panel. Its properties appear in the Inspector. There, under Banana Agent (Script), set the Brain value to BananaPlayer.

Now let's look at what BananaPlayer contains. In the Project panel, open BananaPlayer in Assets > ML-Agents > Examples > BananaCollectors > Brains.

Under Discrete Player Actions, you can explore the key mapping and change it if you like. Now click on Play and try the environment yourself.

Keys to play the environment

Changing the actions (optional)

Vector actions + definition

According to the description of the Banana Collector environment in the Unity example documentation, the vector action space has 11 possible actions organized in four branches. We can find this definition in the Inspector of BananaPlayer, under Brain Parameters.

Vector Action details

However, we want something manageable like a vector with five possible actions: Forward, Backward, Rotate Left, Rotate Right and No Action.

We will see how to modify the code.

Modifying the C# code

In Assets > ML-Agents > Examples > BananaCollectors > Scripts, open BananaAgent.cs. It will launch MonoDevelop.

Opening the BananaAgent.cs script

Lines 100 to 142 contain the implementation of the vector action. We will modify this block to get a single branch with five actions: No Action (0), Forward (1), Backward (2), Rotate Clockwise and Rotate Counterclockwise.
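To make the target layout concrete, here is a minimal Python sketch of how a single-branch action index can be decoded into a movement command. It only mirrors the logic and is not the C# code of BananaAgent.cs. The names Action and decode are hypothetical, and the indices 3 and 4 for the two rotations are an assumption that follows the order of the list above.

```python
from enum import IntEnum


class Action(IntEnum):
    """Single branch with five discrete actions."""
    NO_ACTION = 0
    FORWARD = 1
    BACKWARD = 2
    ROTATE_CLOCKWISE = 3          # assumed index
    ROTATE_COUNTERCLOCKWISE = 4   # assumed index


def decode(action_index):
    """Return (forward, rotation), each one in {-1, 0, 1}.

    forward:  1 = move forward, -1 = move backward, 0 = stay
    rotation: 1 = clockwise, -1 = counterclockwise, 0 = no rotation
    """
    action = Action(int(action_index))  # raises ValueError outside 0..4
    forward = {Action.FORWARD: 1, Action.BACKWARD: -1}.get(action, 0)
    rotation = {
        Action.ROTATE_CLOCKWISE: 1,
        Action.ROTATE_COUNTERCLOCKWISE: -1,
    }.get(action, 0)
    return forward, rotation
```

With one branch, each step carries exactly one integer, so the agent either moves or rotates but never both at once. The C# block has to apply the same idea: read the single value of the action vector and choose the movement from it.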

Modifying the vector actions definition in BananaPlayer

Changing the vector action definition and key-mapping

Now that we have modified the code, we must also change the Vector Actions definition in BananaPlayer. Don't forget to edit the Discrete Player Actions according to the previous settings if you want to test the environment again.

/!\ Once we are satisfied, we must copy the same vector action definition to the BananaLearning brain object. /!\


Building the environment

We will set the control of the agent to BananaLearning. For that, select the Agent in the Hierarchy panel and in the Inspector, set the Brain value to BananaLearning.

We will also allow an external program to take control of the agent. Select Academy in the Hierarchy panel, then check the Control checkbox in the Banana Academy section.

Now we will produce an executable that we can use from Python. Go to File > Build Settings …

If the Scenes In Build list is empty, click on Add Open Scenes.

Select your Target Platform and then click on Build.

You should get a <name_build>_Data/ folder and a <name_build>.<architecture> executable.

Build setting window

Using our built environment with Python

We will need to install the Unity ML-Agents Toolkit.

pip install mlagents

When I wrote this post, the current version was 0.7.0. To get the same version, you can pin it with pip install mlagents==0.7.0.
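Here is a minimal sketch of a script that drives the agent with random actions. It assumes the mlagents 0.7.0 Python API (UnityEnvironment imported from mlagents.envs, with reset and step returning one BrainInfo per brain), a single brain exposed by the build, and the five-action branch defined earlier. ENV_FILE, run_random_agent and main are hypothetical names, and the path is only a placeholder.

```python
import os
import random

# Hypothetical path: change it to the filename of your build executable.
ENV_FILE = "./Banana.x86_64"


def run_random_agent(env, n_actions=5, max_steps=300, train_mode=False):
    """Play one episode with uniformly random actions and return the score.

    `env` must behave like mlagents.envs.UnityEnvironment in 0.7.0:
    reset() and step() return a dict that maps brain names to BrainInfo.
    """
    brain_name = env.brain_names[0]  # assumed: a single brain in the build
    env_info = env.reset(train_mode=train_mode)[brain_name]
    score = 0.0
    for _ in range(max_steps):
        action = random.randrange(n_actions)  # single branch, five actions
        env_info = env.step([action])[brain_name]
        score += env_info.rewards[0]  # reward of the only agent
        if env_info.local_done[0]:  # episode finished for that agent
            break
    return score


def main(file_name=ENV_FILE, worker_id=0):
    # Imported here so the helper above can be used without mlagents.
    from mlagents.envs import UnityEnvironment

    env = UnityEnvironment(file_name=file_name, worker_id=worker_id)
    try:
        print("Score:", run_random_agent(env))
    finally:
        env.close()  # always release the port used by this worker_id


if __name__ == "__main__":
    if os.path.exists(ENV_FILE):
        main()
    else:
        print("Build not found at %r, edit ENV_FILE first." % ENV_FILE)
```

Passing train_mode=False asks Unity to run at normal speed, which makes the random walk easy to watch. Closing the environment in the finally block matters because a build that is still open keeps its worker_id busy.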

Replace the environment filename with the path to your build executable.

If you run the Python script, you should see your agent moving randomly in the environment.

Banana Collector environment


If you get the following error

AttributeError: 'UnityEnvironment' object has no attribute 'communicator'

it means that the previous environment instance is not closed yet. You can wait a bit, or pass a different worker_id if you are in a hurry.
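If you do not want to pick the worker_id by hand, a small helper can try several values in turn. This is only a sketch: open_env_on_free_worker is a hypothetical name, and it catches Exception broadly on purpose, because I am not certain which exception type each mlagents version raises when a worker is still busy.

```python
def open_env_on_free_worker(make_env, max_workers=5):
    """Try worker_id 0, 1, 2, ... until make_env(worker_id) succeeds.

    make_env is any callable that builds the environment, for example:
        lambda w: UnityEnvironment(file_name="path/to/build", worker_id=w)
    Returns the tuple (env, worker_id).
    """
    last_error = None
    for worker_id in range(max_workers):
        try:
            return make_env(worker_id), worker_id
        except Exception as err:  # deliberately broad, see the note above
            last_error = err
    raise RuntimeError(
        "no free worker_id below %d" % max_workers
    ) from last_error
```

The helper takes a factory instead of importing mlagents itself, so the retry logic stays independent of the toolkit version.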

That’s it! Have fun with Unity and training your agent(s)!