  • 05/17/19 05:32 AM

    Unity ML Agents

    Engines and Middleware


    Learn about Unity ML-Agents in this article by Micheal Lanham, a tech innovator and an avid Unity developer, consultant, manager, and author of multiple Unity games, graphics projects, and books.

    Unity has embraced machine learning, and deep reinforcement learning in particular, with the aim of producing a working deep reinforcement learning (DRL) SDK for game and simulation developers. Fortunately, the team at Unity, led by Danny Lange, has succeeded in developing a robust, cutting-edge DRL engine capable of impressive results. Unity uses a proximal policy optimization (PPO) model as the basis for its DRL engine; this model is significantly more complex than a basic DQN and may differ from it in some ways.

    This article will introduce the Unity ML-Agents tools and SDK for building DRL agents to play games and simulations. While this tool is both powerful and cutting-edge, it is also easy to use and provides a few tools to help us learn concepts as we go. Be sure you have Unity installed before proceeding.

    Installing ML-Agents

    In this section, we cover a high-level overview of the steps you will need to take in order to successfully install the ML-Agents SDK. This material is still in beta and has already changed significantly from version to version. Now, jump on your computer and follow these steps:

    1. Be sure you have Git installed on your computer and that it works from the command line. Git is a very popular source code management system, and there are plenty of resources on how to install and use it for your platform. After you have installed Git, verify that it works by test-cloning a repository (any repository will do).

    2. Open a command window or a regular shell. Windows users can open an Anaconda window.

    3. Change to a working folder where you want to place the new code and enter the following command (Windows users may want to use C:\ML-Agents):

      git clone https://github.com/Unity-Technologies/ml-agents


    4. This will clone the ml-agents repository onto your computer and create a new folder with the same name. You may want to take the extra step of adding the version to the folder name. Unity, and pretty much the whole AI space, is in continuous transition at the moment, which means new changes are always arriving. At the time of writing, we will clone to a folder named ml-agents.6, like so:

      git clone https://github.com/Unity-Technologies/ml-agents ml-agents.6


    5. Create a new virtual environment for ml-agents and set it to Python 3.6, like so:

      conda create -n ml-agents python=3.6

    If you prefer a different environment manager, consult its documentation for the equivalent command.


    6. Activate the environment, again, using Anaconda:

      activate ml-agents


    7. Install TensorFlow. With Anaconda, we can do this using the following:

      pip install tensorflow==1.7.1


    8. Install the Python packages. On Anaconda, enter the following:

      cd ML-Agents          # from the root folder
      cd ml-agents.6        # or cd ml-agents, depending on the folder name you chose
      cd ml-agents
      pip install -e .      # use pip3 install -e . if pip points to Python 2
    9. This will install all the required packages for the ML-Agents SDK and may take several minutes. Be sure to leave this window open, as we will use it again shortly.
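    Once pip finishes, you can sanity-check the environment from the same window. The following is a minimal sketch; it assumes the packages install under the import names tensorflow and mlagents, which is the case for this beta:

```python
import importlib.util
import sys

def installed(module):
    """Return True if a module can be imported in the active environment."""
    return importlib.util.find_spec(module) is not None

# We created the environment with python=3.6, so expect "Python 3 6" here.
print("Python", sys.version_info.major, sys.version_info.minor)
for module in ("tensorflow", "mlagents"):
    print(module, "OK" if installed(module) else "MISSING")
```

    If either package reports MISSING, re-run the pip commands from step 8 before continuing.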


    This should complete the setup of the Unity Python SDK for ML-Agents. In the next section, we will learn how to set up and train one of the many example environments provided by Unity.

    Training an agent

    We can now jump in and look at examples where deep reinforcement learning (DRL) is put to use. Fortunately, the ML-Agents toolkit provides several examples to demonstrate the power of the engine. Open up Unity or the Unity Hub and follow these steps:

    1. Click on the Open project button at the top of the Project dialog.
    2. Locate and open the UnitySDK project folder as shown in the following screenshot:

      Opening the Unity SDK Project
    3. Wait for the project to load and then open the Project window at the bottom of the editor. If you are asked to update the project, say yes or continue. Thus far, all of the agent code has been designed to be backward compatible.
    4. Locate and open the GridWorld scene as shown in this screenshot:
      Opening the GridWorld example scene
    5. Select the GridAcademy object in the Hierarchy window. 
    6. Then direct your attention to the Inspector window, and beside the Brains, click the target icon to open the Brain selection dialog:

      Inspecting the GridWorld example environment
    7. Select the GridWorldPlayer brain. This brain is a player brain, meaning that a player, you, can control the game.
    8. Press the Play button at the top of the editor and watch the grid environment form. Since the game is currently set to a player brain, you can use the WASD controls to move the cube. The goal is much like that of the FrozenPond environment we built a DQN for earlier. That is, you have to move the blue cube to the green + symbol and avoid the red X.

    Feel free to play the game as much as you like. Note how the game only runs for a certain amount of time and is not turn-based. In the next section, we will learn how to run this example with a DRL agent.
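    The rules you just played by can be sketched in a few lines of Python. This is a toy reconstruction, not Unity's actual implementation; the grid size, goal and pit positions, and reward values are illustrative:

```python
# Toy GridWorld rules: reach the goal (+1), hit the pit (-1),
# pay a small time penalty otherwise. Positions and rewards are illustrative.
MOVES = {"w": (0, 1), "s": (0, -1), "a": (-1, 0), "d": (1, 0)}

def step(pos, key, goal=(4, 4), pit=(2, 2), size=5):
    """Apply one WASD move and return (new_pos, reward, done)."""
    dx, dy = MOVES[key]
    x = min(max(pos[0] + dx, 0), size - 1)   # clamp movement to the grid
    y = min(max(pos[1] + dy, 0), size - 1)
    if (x, y) == goal:
        return (x, y), 1.0, True
    if (x, y) == pit:
        return (x, y), -1.0, True
    return (x, y), -0.01, False              # small per-step penalty

pos, reward, done = step((0, 0), "d")
print(pos, reward, done)
```

    The small negative step reward is what discourages the agent from wandering indefinitely once training begins.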

    What's in a brain?

    One of the brilliant aspects of the ML-Agents platform is the ability to switch from player control to AI/agent control very quickly and seamlessly. In order to do this, Unity uses the concept of a brain. A brain may be either player-controlled, a player brain, or agent-controlled, a learning brain. The brilliant part is that you can build a game and test it as a player, and then turn the game loose on an RL agent. This has the added benefit of making any game written in Unity controllable by an AI with very little effort.

    Training an RL agent with Unity is fairly straightforward to set up and run. Unity uses Python externally to build the learning brain model. Using Python makes far more sense since, as we have already seen, several DL libraries are built on top of it. Follow these steps to train an agent for the GridWorld environment:

    1. Select the GridAcademy again and switch the Brains from GridWorldPlayer to GridWorldLearning as shown:

      Switching the brain to use GridWorldLearning
    2. Click on the Control option at the end. This simple setting is what tells the brain it may be controlled externally. Be sure to double-check that the option is enabled.
    3. Select the trueAgent object in the Hierarchy window, and then, in the Inspector window, change the Brain property under the Grid Agent component to a GridWorldLearning brain:

      Setting the brain on the agent to GridWorldLearning
    4. For this sample, we want to switch our Academy and Agent to use the same brain, GridWorldLearning. Make sure you have an Anaconda or Python window open and set to the ML-Agents/ml-agents folder or your versioned ml-agents folder. 
    5. Run the following command in the Anaconda or Python window using the ml-agents virtual environment:
      mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
    6. This will start the Unity PPO trainer and run the agent example as configured. At some point, the command window will prompt you to run the Unity editor with the environment loaded.
    7. Press Play in the Unity editor to run the GridWorld environment. Shortly after, you should see the agent training with the results being output in the Python script window:

      Running the GridWorld environment in training mode
    8. Note how the mlagents-learn script is the Python code that builds the RL model to run the agent. As you can see from the output of the script, there are several parameters, or what we refer to as hyper-parameters, that need to be configured.
    9. Let the agent train for several thousand iterations and note how quickly it learns. The internal model here, called PPO, has been shown to be a very effective learner at multiple forms of tasks and is very well suited for game development. Depending on your hardware, the agent may learn to perfect this task in less than an hour.
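    The hyper-parameters reported by the script come from config/trainer_config.yaml. For orientation, here is an excerpt in the spirit of the GridWorldLearning section; the keys are standard trainer settings in this beta, but treat the values as illustrative and check your cloned file for the real ones:

```yaml
GridWorldLearning:
  batch_size: 32          # experiences per gradient update
  buffer_size: 256        # experiences gathered before each update
  beta: 5.0e-3            # entropy bonus strength (encourages exploration)
  hidden_units: 256       # neurons per hidden layer
  learning_rate: 3.0e-4   # initial step size, decayed over training
  max_steps: 5.0e5        # total simulation steps for the run
  summary_freq: 2000      # steps between TensorBoard summaries
```

    Tuning these values is how you trade off training speed against stability for a given environment.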

    Keep the agent training and look at more ways to inspect the agent's training progress in the next section.

    Monitoring training with TensorBoard

    Training an agent with RL, or any DL model for that matter, is often not a simple task and requires some attention to detail. Fortunately, TensorFlow ships with a set of graph tools called TensorBoard that we can use to monitor training progress. Follow these steps to run TensorBoard:

    1. Open an Anaconda or Python window. Activate the ml-agents virtual environment. Don't shut down the window running the trainer; we need to keep that going.
    2. Navigate to the ML-Agents/ml-agents folder and run the following command:
      tensorboard --logdir=summaries
    3. This will run TensorBoard with its own built-in web server. You can load the page using the URL that is shown after you run the previous command.
    4. Enter the URL for TensorBoard as shown in the window, or use localhost:6006 or machinename:6006 in your browser. After an hour or so, you should see something similar to the following:

      The TensorBoard graph window
    5. In the preceding screenshot, you can see each of the various graphs denoting an aspect of training. Understanding each of these graphs is important to understanding how your agent is training, so we will break down the output from each section:
    • Environment: This section shows how the agent is performing overall in the environment. A closer look at each of the graphs is shown in the following screenshot with their preferred trend:

    A closer look at the Environment section plots

    • Cumulative Reward: This is the total reward the agent is maximizing. You generally want to see this going up, but there are reasons why it may fall. It is best to keep rewards in the range of -1 to 1; if you see rewards outside this range on the graph, you will want to correct that as well.
    • Episode Length: It is usually a better sign if this value decreases. After all, shorter episodes mean more training. However, keep in mind that the episode length could increase out of need, so this one can go either way.
    • Lesson: This represents which lesson the agent is on and is intended for Curriculum Learning.
    • Losses: This section shows graphs that represent the calculated loss or cost of the policy and value. A screenshot of this section is shown next, again with arrows showing the optimum preferences:

      Losses and preferred training direction


    • Policy Loss: This determines how much the policy is changing over time. The policy is the piece that decides the actions, and in general, this graph should be showing a downward trend, indicating that the policy is getting better at making decisions.
    • Value Loss: This is the mean or average loss of the value function. It essentially models how well the agent is predicting the value of its next state. Initially, this value should increase, and then, after the reward stabilizes, it should decrease.
    • Policy: PPO uses the concept of a policy rather than a model to determine the quality of actions. The next screenshot shows the policy graphs and their preferred trend:

      Policy graphs and preferred trends

    • Entropy: This represents how much the agent is exploring. You want this value to decrease as the agent learns more about its surroundings and needs to explore less.
    • Learning Rate: Currently, this value is set to decrease linearly over time.
    • Value Estimate: This is the mean value estimate over all of the states the agent has visited. This value should increase, representing the growth of the agent's knowledge, and then stabilize.

    6. Let the agent run to completion and keep TensorBoard running.

    7. Go back to the Anaconda/Python window that was training the brain and run this command:

    mlagents-learn config/trainer_config.yaml --run-id=secondRun --train

    8. You will again be prompted to press Play in the editor; be sure to do so. Let the agent start the training and run for a few sessions. As you do so, monitor the TensorBoard window and note how the secondRun is shown on the graphs. Feel free to let this agent run to completion as well, but you can stop it now if you want to.
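    The Entropy and Learning Rate trends described above are easy to reproduce numerically. The following is a minimal sketch; the initial learning rate of 3.0e-4 and the four-action distributions are illustrative values, not numbers read from ML-Agents:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def linear_lr(step, max_steps, lr0=3.0e-4):
    """Learning rate decayed linearly to zero over the training run."""
    return lr0 * (1.0 - step / max_steps)

# A uniform policy over four actions is maximally exploratory: ln(4) ~ 1.386.
uniform = entropy([0.25, 0.25, 0.25, 0.25])
# As the agent learns, the distribution sharpens and entropy drops:
confident = entropy([0.9, 0.05, 0.03, 0.02])
print(uniform, confident)
# Halfway through a 500k-step run, the learning rate has halved:
print(linear_lr(250_000, 500_000))
```

    A falling entropy curve alongside a rising cumulative reward is the healthy pattern to look for on the TensorBoard graphs.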

    In previous versions of ML-Agents, you needed to build a Unity executable first as a game-training environment and run that. The external Python brain would still run the same. This method made it very difficult to debug any code issues or problems with your game. All of these difficulties were corrected with the current method.

    Now that we have seen how easy it is to set up and train an agent, we will see in the next section how an agent can be run directly in Unity without an external Python brain.

    Running an agent

    Using Python to train works well, but it is not something a real game would ever use. Ideally, what we want to be able to do is build a TensorFlow graph and use it in Unity. Fortunately, a library called TensorFlowSharp was constructed that allows .NET to consume TensorFlow graphs. This allows us to build offline TFModels and later inject them into our game. Unfortunately, we can only use trained models in this manner and not train them, at least not yet.

    Let's see how this works using the graph we just trained for the GridWorld environment and use it as an internal brain in Unity. Follow the exercise in the next section to set up and use an internal brain:

    1. Download the TFSharp plugin; the ML-Agents documentation links to the package.
    2. From the editor menu, select Assets | Import Package | Custom Package... 
    3. Locate the asset package you just downloaded and use the import dialogs to load the plugin into the project.
    4. From the menu, select Edit | Project Settings. This will open the Settings window (new in 2018.3)
    5. Under the Player options, locate Scripting Define Symbols, set the text to ENABLE_TENSORFLOW, and enable Allow Unsafe Code, as shown in this screenshot:

      Setting the ENABLE_TENSORFLOW flag
    6. Locate the GridWorldAcademy object in the Hierarchy window and make sure it is using the GridWorldLearning brain. Turn the Control option off under the Brains section of the Grid Academy script.
    7. Locate the GridWorldLearning brain in the Assets/Examples/GridWorld/Brains folder and make sure the Model parameter is set in the Inspector window, as shown in this screenshot:

      Setting the model for the brain to use
    8. The Model should already be set to the GridWorldLearning model. In this example, we are using the TFModel that is shipped with the GridWorld example.
    9. Press Play to run the editor and watch the agent control the cube.

    Right now, we are running the environment with the pre-trained Unity brain. In the next section, we will look at how to use the brain we trained in the previous section.

    Loading a trained brain

    All of the Unity samples come with pre-trained brains you can use to explore the samples. Of course, we want to be able to load our own TF graphs into Unity and run them. Follow the next steps in order to load a trained graph:

    1. Locate the ML-Agents/ml-agents/models/firstRun-0 folder. Inside this folder, you should see a file named GridWorldLearning.bytes. Drag this file into the Unity editor into the Project/Assets/ML-Agents/Examples/GridWorld/TFModels folder, as shown:

      Dragging the bytes graph into Unity
    2. This will import the graph into the Unity project as a resource and rename it GridWorldLearning 1. It does this because the default model already has the same name.
    3. Locate the GridWorldLearning brain in the Brains folder, select it in the Inspector window, and drag the new GridWorldLearning 1 model onto the Model slot under Brain Parameters:

      Loading the Graph Model slot in the brain
    4. We won't need to change any other parameters at this point, but pay special attention to how the brain is configured. The defaults will work for now.
    5. Press Play in the Unity editor and watch the agent run through the game successfully.
    6. How long you trained the agent will largely determine how well it plays the game. If you let it complete the training, the agent should be on par with the pre-trained Unity agent.
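    If you are unsure where your trained graph ended up, a few lines of Python can locate it. This is a convenience sketch that assumes the folder layout described in step 1 (models/<run-id>-0/<brain>.bytes); adjust the paths if your run used different names:

```python
from pathlib import Path

def find_trained_model(run_id="firstRun", root="models"):
    """Locate the frozen .bytes graph that mlagents-learn wrote for a run.

    Assumes the layout described above: models/<run_id>-0/<brain>.bytes.
    Returns None if no matching file exists yet.
    """
    matches = sorted(Path(root).glob(f"{run_id}-*/**/*.bytes"))
    return matches[-1] if matches else None

model = find_trained_model("firstRun")
print(model or "no trained model found yet")
```

    Run it from the ML-Agents/ml-agents folder; the printed path is the file to drag into the Unity TFModels folder.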


    If you found this article interesting, you can explore Hands-On Deep Learning for Games to understand the core concepts of deep learning and deep reinforcement learning by applying them to develop games. Hands-On Deep Learning for Games will give an in-depth view of the potential of deep learning and neural networks in game development.
