MARL used in PredPreyGrass

The Predator, Prey, Grass project (PredPreyGrass) is implemented using multi-agent reinforcement learning (MARL). The customized RLlib multi-agent environment is trained using Proximal Policy Optimization (PPO).

The learning agents, Predators (red) and Prey (blue), both expend energy moving around and replenish it by eating. Prey eat Grass (green), and Predators eat Prey when they end up on the same grid cell. In the base case, for simplicity, the agents obtain all of the energy of the Prey or Grass they eat. In reality, however, useful energy is not converted at 100%: ecological efficiency is only around 10% in most cases. The ecological efficiency can be tuned in the configuration files.
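The energy transfer described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `energy_gain` and the parameter `efficiency` are hypothetical, and the real configuration key for ecological efficiency may be named differently.

```python
def energy_gain(food_energy: float, efficiency: float = 1.0) -> float:
    """Energy an eater gains from a food item.

    `efficiency` is a hypothetical name for the ecological efficiency:
    1.0 reproduces the base case (all energy is transferred),
    0.1 approximates the roughly 10% efficiency found in nature.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return food_energy * efficiency


# Base case: a Predator eating a Prey with 5.0 energy gains all of it.
base_case = energy_gain(5.0)

# More realistic case: only about a tenth of the energy is transferred.
realistic_case = energy_gain(5.0, efficiency=0.1)
```

Keeping the efficiency as a single multiplier makes the base case and the more realistic setting differ only by one configuration value.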

Predators die of starvation when their energy reaches zero; Prey die either of starvation or by being eaten by a Predator. Learning agents reproduce asexually when eating raises their energy level above a certain threshold. Learning agents learn to execute movement actions, based on their partial observations of the environment, to maximize cumulative reward. Because of the relatively low ecological efficiency found in nature, food chains are usually short, typically no more than five steps. We therefore believe the implemented PredPreyGrass environment is already a decent abstraction of biological reality.
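The starvation and reproduction rules can be sketched as a single per-step check. This is a simplified illustration under stated assumptions: the names `Agent`, `update_agent`, `reproduction_threshold`, and `offspring_energy` are hypothetical, and the way the parent hands energy to its offspring is an assumption, since the text only says that reproduction happens above a threshold.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Agent:
    kind: str      # "predator" or "prey"
    energy: float


def update_agent(
    agent: Agent,
    reproduction_threshold: float,
    offspring_energy: float,
) -> Tuple[bool, Optional[Agent]]:
    """Apply starvation and asexual reproduction to one agent.

    Returns (alive, offspring). The offspring is None unless the agent
    reproduced during this step.
    """
    # Starvation: an agent whose energy has dropped to zero dies.
    if agent.energy <= 0:
        return False, None

    # Reproduction: above the threshold, the agent spawns one offspring
    # of the same kind. Assumption: the parent pays the offspring's
    # starting energy out of its own reserve.
    if agent.energy > reproduction_threshold:
        agent.energy -= offspring_energy
        return True, Agent(agent.kind, offspring_energy)

    return True, None
```

A Prey that is eaten by a Predator would be removed separately, when the environment resolves agents that share a grid cell.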

**Display 1:** Single Agent Reinforcement Learning

**Display 2:** Multi Agent Reinforcement Learning