AI Olympics With RealAIGym: Is AI Ready for Athletic Intelligence in the Real World?

Motivation

As artificial intelligence gains new capabilities, it becomes important to evaluate it on real-world tasks. While software such as ChatGPT has recently revolutionized certain areas of AI, athletic intelligence still seems elusive in the AI community. To build robots that can perform a wide variety of dynamic tasks in uncertain environments, their physical, or athletic, intelligence has to be improved. This is quite challenging: in particular, the fields of robotics and reinforcement learning (RL) lack standardized benchmarking tasks on real hardware. To facilitate reproducibility and stimulate algorithmic advancements, the AI Olympics competition is being held at IJCAI 2023, based on the RealAIGym project. The challenge involves two stages, simulation and real-robot experiments, in which teams (and their agents) compete for the highest score and some cool prizes! We invite people from all communities (AI/ML/RL/optimal control/heuristics/etc.) to take on the competition and submit their best efforts at canonical dynamic tasks on simple underactuated robotic systems. The inspiration for these tasks comes from acrobats and gymnasts, like the athletes seen below at the 2016 Olympics!

From Men’s Horizontal Bar Final – Artistic Gymnastics | Rio 2016 Replay

The Challenge

For the challenge, we will use a canonical 2-link robot system in two different configurations. When the shoulder actuator is active and the elbow is passive, the system functions as a Pendubot; when the shoulder is passive and the elbow is active, it functions as an Acrobot (named after the acrobat athletes seen above).

The challenge consists of the following task, which has to be carried out first in simulation; the 4 best teams will then be selected to run experiments on the real robots: swing up and stabilize an underactuated 2-link system (Acrobot and/or Pendubot). The swing-up starts from the initial position in which the robot points straight down. Participating teams can choose to work on the Acrobot swing-up, the Pendubot swing-up, or both. For scoring and prizes, Acrobot and Pendubot are treated as two separate tracks, i.e. Acrobot scores/papers will be compared only against those of other Acrobot teams. For each track, 2 teams will be selected from the simulation stage to participate in the real-robot stage, and one final winner will be selected per track.

The performance and robustness of the swing-up and stabilize controllers will be judged based on a custom scoring system. The final score is the average of the performance score and the robustness score for the acrobot/pendubot system. The final scores of the submissions will be added to the RealAIGym leaderboard.

 

Acrobot Swing Up with Threshold Line
Pendubot Swing Up with Threshold Line

The acrobot/pendubot is simulated with a Runge-Kutta 4 integrator with a timestep of dt=0.002 s for T=10 s. The initial configuration is \mathbf{x}_{0}=(0.0, 0.0, 0.0, 0.0) (hanging down) and the goal is the unstable fixed point at the upright configuration \mathbf{x}_{g}=(\pi, 0.0, 0.0, 0.0). For the performance score, the upright position counts as reached when the end-effector is above the threshold line; for the robustness score, it counts as reached when the distance to the goal in state coordinates is below \mathbf{\epsilon} = (0.1, 0.1, 0.5, 0.5).
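For intuition, here is a minimal sketch of this simulation setup in Python. The dynamics function below is a placeholder, not the real double pendulum model; the actual plant and integrator live in the competition repository.

```python
import numpy as np

def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta 4 step of dx/dt = f(x, u)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def toy_dynamics(x, u):
    """Placeholder for the double pendulum plant.
    x = (q1, q2, q1_dot, q2_dot), u = (tau1, tau2)."""
    q_dot = x[2:]
    q_ddot = u - 0.1 * q_dot  # dummy dynamics, NOT the real model
    return np.concatenate([q_dot, q_ddot])

dt, T = 0.002, 10.0                          # timestep and horizon from the rules
x = np.array([0.0, 0.0, 0.0, 0.0])           # x0: hanging down
x_goal = np.array([np.pi, 0.0, 0.0, 0.0])    # unstable upright fixpoint
eps = np.array([0.1, 0.1, 0.5, 0.5])         # goal tolerance (robustness score)

for _ in range(int(round(T / dt))):
    u = np.zeros(2)                          # your controller's torque goes here
    x = rk4_step(toy_dynamics, x, u, dt)

print("goal reached:", bool(np.all(np.abs(x - x_goal) < eps)))
```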

Acrobot Robot
Pendubot Robot

The task for the controller is to swing up and balance the acrobot/pendubot and keep the end-effector above the threshold line. The performance score compares different controllers in simulation, assuming all model parameters are already well known.

For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria, illustrated in code after the list, are:

  • Swingup Success c_{success}: Whether the swing-up was successful, i.e. if the end-effector is above the threshold line at the end of the simulation.
  • Swingup time c_{time}: The time it takes for the acrobot/pendubot to reach the goal region above the threshold line and stay there. If the end-effector enters the goal region but falls below the line before the simulation time is over, the swing-up is not considered successful! The swing-up time is the time at which the end-effector enters the goal region and does not leave it again until the end.
  • Energy c_{energy}: The mechanical energy used during the execution.
  • Max Torque c_{\tau, max}: The peak torque that was used during the execution.
  • Integrated Torque c_{\tau,integ}: The time integral of the applied torque over the execution duration.
  • Torque Cost c_{\tau, cost}: A quadratic cost on the used torques. (c_{\tau, cost} = \sum \tau^T R \tau with R=1)
  • Torque Smoothness c_{\tau, smooth}: The standard deviation of the changes in the torque signal.
  • Velocity Cost c_{vel, cost}: A quadratic cost on the joint velocities (\dot{\mathbf{q}}) reached during the execution (c_{vel, cost} = \sum \mathbf{\dot{q}}^T \mathbf{Q} \mathbf{\dot{q}} with \mathbf{Q} = identity).
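To make these concrete, a rough sketch of how the trajectory-based criteria could be computed from recorded arrays follows. This is illustrative only, not the official scoring code; in particular, the energy definition is an assumption, and the swing-up success/time detection (which needs the threshold-line geometry) is omitted.

```python
import numpy as np

def performance_criteria(T, X, U, dt=0.002):
    """Illustrative criteria for a recorded trajectory (not the official code).
    T: (N,) times, X: (N, 4) states (q1, q2, q1_dot, q2_dot), U: (N, 2) torques."""
    qd = X[:, 2:]                                  # joint velocities
    power = np.sum(U * qd, axis=1)                 # instantaneous mechanical power
    return {
        "energy": np.sum(np.abs(power)) * dt,      # assumed definition of energy used
        "tau_max": np.max(np.abs(U)),              # peak torque
        "tau_integ": np.sum(np.abs(U)) * dt,       # time integral of |torque|
        "tau_cost": np.sum(U * U),                 # quadratic torque cost, R = 1
        "tau_smooth": np.std(np.diff(U, axis=0)),  # std of changes in the torque signal
        "vel_cost": np.sum(qd * qd),               # quadratic velocity cost, Q = I
    }
```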

These criteria are used to calculate the overall Real AI Score with the formula:

S = c_{success} \left( \omega_{time}\left(1-\frac{c_{time}}{n_{time}}\right) +\omega_{energy}\left(1-\frac{c_{energy}}{n_{energy}}\right) +\omega_{\tau, max}\left(1-\frac{c_{\tau, max}}{n_{\tau, max}}\right) +\omega_{\tau, integ}\left(1-\frac{c_{\tau, integ}}{n_{\tau, integ}}\right) +\omega_{\tau, cost}\left(1-\frac{c_{\tau, cost}}{n_{\tau, cost}}\right) +\omega_{\tau, smooth}\left(1-\frac{c_{\tau, smooth}}{n_{\tau, smooth}}\right) +\omega_{vel, cost}\left(1-\frac{c_{vel, cost}}{n_{vel, cost}}\right)\right)

Each criterion is normalized and subtracted from one, so lower costs yield a higher score: a failed swing-up scores 0, while a successful swing-up with negligible costs approaches a score of 1.

The weights and normalizations are:

Criterion           Normalization   Weight
Swingup Time        10.0            0.2
Energy              100.0           0.1
Max Torque          6.0             0.1
Integrated Torque   60.0            0.1
Torque Cost         360.0           0.1
Torque Smoothness   12.0            0.2
Velocity Cost       1000.0          0.2
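Combining the formula with these values, the performance score could be computed along the following lines (a minimal sketch mirroring the formula above, not the official leaderboard script):

```python
# Normalizations and weights from the table above.
NORMALIZATION = {"time": 10.0, "energy": 100.0, "tau_max": 6.0, "tau_integ": 60.0,
                 "tau_cost": 360.0, "tau_smooth": 12.0, "vel_cost": 1000.0}
WEIGHTS = {"time": 0.2, "energy": 0.1, "tau_max": 0.1, "tau_integ": 0.1,
           "tau_cost": 0.1, "tau_smooth": 0.2, "vel_cost": 0.2}

def real_ai_score(criteria, success):
    """Real AI Score: failed swing-ups score 0, lower costs score higher."""
    if not success:
        return 0.0
    return sum(w * (1.0 - criteria[k] / NORMALIZATION[k]) for k, w in WEIGHTS.items())

# Hypothetical criterion values for illustration:
criteria = {"time": 4.2, "energy": 60.0, "tau_max": 4.0, "tau_integ": 30.0,
            "tau_cost": 150.0, "tau_smooth": 2.0, "vel_cost": 400.0}
print(f"Real AI Score: {real_ai_score(criteria, success=True):.3f}")
```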

The performance leaderboards for the acrobot and pendubot systems can be found here.

The robustness leaderboard compares different control methods under perturbed simulations, e.g. with noise or delays. The task for the controller is to swing up and balance the acrobot/pendubot even under these perturbations. For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:

  • Model inaccuracies c_{model}: The model parameters, that have been determined with system identification, will never be perfectly accurate. To assess inaccuracies in these parameters, we vary the independent model parameters one at a time in the simulator while using the original model parameters in the controller.
  • Measurement noise c_{vel, noise}: The controllers’ outputs depend on the measured system state. With the quasi-direct drive (QDD) actuators used here, the online velocity measurements are noisy. Hence, it is important for transferability that a controller can handle at least this amount of noise in the measured data. The controllers are tested with and without a low-pass noise filter.
  • Torque noise c_{\tau, noise}: Not only the measurements are noisy; the torque that is actually applied also deviates from the value the controller requests.
  • Torque response c_{\tau, response}: The torque requested by the controller will in general not be constant but change during the execution. The motor, however, is sometimes unable to react immediately to large torque changes and will instead overshoot or undershoot the desired value. This behavior is modeled by applying the torque \tau = \tau_{t-1} + k_{resp} (\tau_{des} - \tau_{t-1}) instead of the desired torque \tau_{des} (see the sketch after this list). Here, \tau_{t-1} is the applied motor torque from the last time step and k_{resp} is a factor that scales the responsiveness: k_{resp}=1 means the torque response is perfect, while k_{resp}\neq 1 means the motor over- or undershoots the desired torque.
  • Time delay c_{delay}: When operating on a real system there will always be time delays due to communication and reaction times.
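As a small illustration, the torque response and torque noise perturbations might be modeled along these lines (a sketch with assumed parameter values; the repository implements the actual perturbations in its simulator):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def perturbed_torque(tau_des, tau_prev, k_resp=0.8, noise_std=0.1):
    """First-order torque response model from above plus additive torque noise.
    k_resp = 1 reproduces the desired torque exactly; k_resp != 1 over/undershoots."""
    tau = tau_prev + k_resp * (tau_des - tau_prev)
    return tau + rng.normal(0.0, noise_std, size=np.shape(tau_des))

tau = perturbed_torque(np.array([2.0, 0.0]), tau_prev=np.zeros(2))
```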

For each criterion, the relevant quantity is varied in N=21 steps (for the model inaccuracies, each independent model parameter is varied separately), and the criterion's score is the fraction of successful swing-ups.
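Schematically, each robustness criterion then reduces to a sweep of the following form, where run_swingup_with is a hypothetical stand-in for a full perturbed simulation that reports whether the goal region was reached:

```python
import numpy as np

def run_swingup_with(delay):
    """Hypothetical stand-in: simulate the full swing-up under the given
    time delay and report success (replaced here by a dummy rule)."""
    return delay < 0.03

delays = np.linspace(0.0, 0.05, 21)                       # N = 21 perturbation levels
c_delay = np.mean([run_swingup_with(d) for d in delays])  # fraction of successes
print(f"c_delay = {c_delay:.2f}")
```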

These criteria are used to calculate the overall Real AI Score with the formula:

S = \omega_{model} c_{model} + \omega_{vel, noise} c_{vel, noise} + \omega_{\tau, noise} c_{\tau, noise} + \omega_{\tau, response} c_{\tau, response} + \omega_{delay} c_{delay}

The weights are:

\omega_{model} = \omega_{vel, noise} = \omega_{\tau, noise} = \omega_{\tau, response} = \omega_{delay} = 0.2

With all weights equal, the robustness score is simply the mean of the five criteria.
 
The robustness leaderboards for the acrobot and pendubot systems can be found here.

Protocol

The two stages of the challenge are as follows:

For the simulation stage of the competition, we use the following repository from the RealAIGym Project: Double Pendulum (https://github.com/dfki-ric-underactuated-lab/double_pendulum). The documentation of the project for installation, double pendulum dynamics, repository structure, hardware, and controllers can be found here (https://dfki-ric-underactuated-lab.github.io/double_pendulum/index.html). Please follow the installation instructions to start developing your controllers.

You have to develop a new controller for the given simulator (plant). The controller can then be tested for the leaderboard using the instructions given for the Acrobot here: Robustness Scoring and Performance Scoring. Similar Pendubot scoring scripts are available here (performance) and here (robustness).

To develop a new controller, you can use any of the many examples given in the repo. A good starting point is the set of controllers given here. Your controller must inherit from the AbstractController class provided in the repository. See here for the documentation on how to write your controller using the AbstractController class.
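For orientation, a minimal controller skeleton might look as follows. The import path and the get_control_output(x, t) signature are assumptions based on the repository's examples; check the documentation for the exact interface:

```python
import numpy as np
# Import path is an assumption; see the repository documentation.
from double_pendulum.controller.abstract_controller import AbstractController

class MyController(AbstractController):
    """Minimal skeleton; replace the zero torque with your control law."""

    def __init__(self, torque_limit=6.0):
        super().__init__()
        self.torque_limit = torque_limit

    def get_control_output(self, x, t=None):
        # x = (q1, q2, q1_dot, q2_dot); return torques for (shoulder, elbow).
        # Acrobot track: the shoulder torque must stay zero.
        # Pendubot track: the elbow torque must stay zero.
        u = np.zeros(2)
        return np.clip(u, -self.torque_limit, self.torque_limit)
```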

Once you’ve developed a new controller and are happy with the results, please follow these submission guidelines:

  • Create a fork of the repository.
  • Add a Dockerfile to your forked repository that includes all the custom libraries you’ve installed/used that are not part of the double pendulum dependencies. This allows us to use the Dockerfile to recreate your environment with the correct libraries to run the submitted controller. For a tutorial on how to make a Dockerfile, we can recommend the official Docker website.
  • Add your developed controllers to the forked repository. Important: do not change the plant/dynamics/integrator; doing so may result in outright disqualification of the team! Remember to use the AbstractController class.
  • Submit the URL of the fork along with a 2-4 page paper about the developed method and the results to ijcai-23@dfki.de with [AI Olympics] in the email subject. Please follow these guidelines for the paper:
    • Page Limit: 2-4 Pages including references
    • Include the standard plots for position, velocity, and torque with respect to time in the paper. For an example, see timeseries.png here. These plots are generated after simulation if you use the provided function plot_timeseries(T, X, U); see the usage sketch after this list.
    • Include the tables for performance and robustness metrics against the baseline controllers made available on the RealAIGym leaderboards.
    • Include the robustness bar chart as generated here.
    • Use the following template: IJCAI 2023 Formatting Guidelines.
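To produce the required time-series plots, the flow looks roughly like this. Module paths, the model-parameter file name, and the Simulator interface below are assumptions based on the repository's example scripts; adapt them to the actual API:

```python
# Sketch only: paths and signatures are assumptions; see the repo's examples.
from double_pendulum.model.model_parameters import model_parameters
from double_pendulum.model.plant import DoublePendulumPlant
from double_pendulum.simulation.simulation import Simulator
from double_pendulum.utils.plotting import plot_timeseries

from my_controller import MyController  # the skeleton shown earlier (hypothetical module)

mpar = model_parameters(filepath="model_parameters.yml")  # illustrative file name
plant = DoublePendulumPlant(model_pars=mpar)
sim = Simulator(plant=plant)

T, X, U = sim.simulate(
    t0=0.0, x0=[0.0, 0.0, 0.0, 0.0], tf=10.0, dt=0.002,
    controller=MyController(), integrator="runge_kutta",
)
plot_timeseries(T, X, U)  # generates the position/velocity/torque plots
```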

The submitted code and papers will be reviewed, and we will re-run the leaderboard benchmarks to compute the final scores. The scores, together with the paper reviews, will determine the 4 best teams, which will run their controllers on the real systems at the IJCAI 2023 AI Olympics!

The real system stage/hardware challenge protocol will be released on 15 June 2023.

Schedule/Important Dates

Keynote Speakers

Prizes

The winners of the competition will have the opportunity to win prizes worth over $2000 combined!

Organizing Committee

Organizing Institutions

Sponsors