AI Olympics With RealAIGym: Is AI Ready for Athletic Intelligence in the Real World?

Motivation

As artificial intelligence gains new capabilities, it becomes important to evaluate it on real-world tasks. While software such as ChatGPT has recently revolutionized certain areas of AI, athletic intelligence still seems elusive to the AI community. To build robots that can perform a wide variety of dynamic tasks in uncertain environments, their physical or athletic intelligence has to be improved. This is quite challenging. In particular, the fields of robotics and reinforcement learning (RL) lack standardized benchmarking tasks on real hardware. To facilitate reproducibility and stimulate algorithmic advancements, the AI Olympics competition is being held at IJCAI 2023, based on the RealAIGym project. The challenge involves two stages, simulation and real-robot experiments, in which teams (and their agents) compete for the highest score to win some cool prizes! We invite people from all communities (AI/ML/RL/Optimal Control/Heuristics/etc.) to take part and submit their best efforts at standard dynamic tasks on simple, standard underactuated robotic systems. These tasks are motivated by the acrobats and athletes seen below at the 2016 Olympics!

The Challenge

For the challenge, we will use a canonical 2-link robot system with two different configurations. When the actuator in the shoulder joint is active and the elbow is passive, it functions as a Pendubot. And when the shoulder actuator is passive and the elbow is active, it functions as an Acrobot (inspired by the acrobat athlete seen above).

The challenge consists of the following task, which has to be carried out first in simulation; the 4 best teams will then be selected to carry out the experiments on real robots: swing up and stabilize an underactuated 2-link system (Acrobot and/or Pendubot). The swing-up starts from an initial position with the robot pointing straight down. Participating teams can choose to work on the Acrobot swing-up, the Pendubot swing-up, or both. For scoring and prizes, Acrobot and Pendubot are treated as 2 separate tracks, i.e. Acrobot scores/papers are compared only against other Acrobot teams. For each track, 2 teams will be selected from the simulation stage to participate in the real-robot stage, and one final winner will be selected for each track.

The performance and robustness of the swing-up and stabilization controllers will be judged with a custom scoring system. The final score is the average of the performance score and the robustness score for the acrobot/pendubot system. The final scores of the submissions will be added to the RealAIGym leaderboard.

The acrobot/pendubot is simulated with a Runge-Kutta 4 integrator with a timestep of $dt=0.002s$ for $T=10s$. The initial configuration is $\mathbf{x}_{0}=(0.0,0.0,0.0,0.0)$ (hanging down) and the goal is the unstable fixed point at the upright configuration $\mathbf{x}_{g}=(\pi,0.0,0.0,0.0)$. For the performance score, the upright position counts as reached when the end-effector is above the threshold line; for the robustness score, it counts as reached when the distance to the goal in every state coordinate is below $\mathbf{\epsilon} = (0.1, 0.1, 0.5, 0.5)$.
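The integration scheme and the robustness goal test described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's simulator: the derivative function `f(x, u)` stands in for the double pendulum dynamics (not reproduced here), and the helper names `rk4_step` and `in_goal_region` are hypothetical. The goal test assumes a plain per-coordinate comparison without angle wrapping.

```python
import numpy as np

DT = 0.002     # integrator timestep [s], as stated above
T_END = 10.0   # simulated duration [s]
X_GOAL = np.array([np.pi, 0.0, 0.0, 0.0])  # upright: (q1, q2, dq1, dq2)
EPS = np.array([0.1, 0.1, 0.5, 0.5])        # per-coordinate tolerance

def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta 4 step for dx/dt = f(x, u), with u held constant."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def in_goal_region(x, x_goal=X_GOAL, eps=EPS):
    """True if every state coordinate is within eps of the goal.
    Assumption: no wrapping of the joint angles."""
    return bool(np.all(np.abs(np.asarray(x, dtype=float) - x_goal) < eps))
```

With $dt = 0.002$ s and $T = 10$ s, a full rollout consists of 5000 such steps.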

Performance Score

The task for the controller is to swing up and balance the acrobot/pendubot and keep the end-effector above the threshold line. The performance score compares the performance of different controllers in simulation, assuming all parameters are already well known.

For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:

Swingup Success $c_{success}$ : Whether the swing-up was successful, i.e. if the end-effector is above the threshold line at the end of the simulation.
Swingup time $c_{time}$ : The time it takes for the robot to reach the goal region above the threshold line and stay there. If the end-effector enters the goal region but falls below the line before the simulation ends, the swing-up is not considered successful! The swing-up time is the time at which the end-effector enters the goal region and does not leave it until the end.
Energy $c_{energy}$ : The mechanical energy used during the execution.
Max Torque $c_{\tau, max}$ : The peak torque that was used during the execution.
Integrated Torque $c_{\tau,integ}$ : The time integral of the used torque over the execution.
Torque Cost $c_{\tau, cost}$ : A quadratic cost on the used torques. ( $c_{\tau, cost} = \sum \tau^T R \tau$ with $R=1$ )
Torque Smoothness $c_{\tau, smooth}$ : The standard deviation of the changes in the torque signal.
Velocity Cost $c_{vel, cost}$ : A quadratic cost on the joint velocities ( $\dot{\mathbf{q}}$ ) reached during the execution ( $c_{vel, cost} = \dot{\mathbf{q}}^T \mathbf{Q} \dot{\mathbf{q}}$ with $\mathbf{Q}=$ identity)
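Several of these criteria can be computed directly from a recorded trajectory. The sketch below is an illustrative Python version under stated assumptions, not the leaderboard's scoring code: it assumes arrays `T` (time, shape `(N,)`), `X` (states $(q_1, q_2, \dot q_1, \dot q_2)$, shape `(N, 4)`) and `U` (torques, shape `(N, 2)`) on a uniform time grid, plus a boolean array marking when the end-effector is above the threshold line. Summing the velocity cost over time and integrating $|\tau|$ over both joints are assumptions, the function names are hypothetical, and the energy criterion is left out because its exact definition is not given here.

```python
import numpy as np

def swingup_time(T, above):
    """Time from which the end-effector stays above the threshold until the end.
    Returns None if it is below the line at the final step (swing-up failed)."""
    T = np.asarray(T, dtype=float)
    above = np.asarray(above, dtype=bool)
    if not above[-1]:
        return None
    below = np.flatnonzero(~above)  # indices where the end-effector is below the line
    first_stay = 0 if below.size == 0 else below[-1] + 1
    return float(T[first_stay])

def torque_and_velocity_criteria(T, X, U):
    """Torque and velocity criteria from a trajectory (assumes a uniform timestep)."""
    T = np.asarray(T, dtype=float)
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    dt = T[1] - T[0]
    dq = X[:, 2:]  # joint velocities (dq1, dq2)
    return {
        "tau_max": float(np.max(np.abs(U))),               # peak torque
        "tau_integ": float(np.sum(np.abs(U)) * dt),        # assumption: sum of |tau| dt, both joints
        "tau_cost": float(np.sum(U * U)),                  # sum of tau^T R tau with R = 1
        "tau_smooth": float(np.std(np.diff(U, axis=0))),   # std of step-to-step torque changes
        "vel_cost": float(np.sum(dq * dq)),                # assumption: summed over time, Q = identity
    }
```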

These criteria are used to calculate the overall Real AI Score with the formula:

$$
S = c_{success} \left( \omega_{time}\frac{c_{time}}{n_{time}} +\omega_{energy}\frac{c_{energy}}{n_{energy}} +\omega_{\tau, max}\frac{c_{\tau, max}}{n_{\tau, max}} +\omega_{\tau, integ}\frac{c_{\tau, integ}}{n_{\tau, integ}} +\omega_{\tau, cost}\frac{c_{\tau, cost}}{n_{\tau, cost}} +\omega_{\tau, smooth}\frac{c_{\tau, smooth}}{n_{\tau, smooth}} +\omega_{vel, cost}\frac{c_{vel, cost}}{n_{vel, cost}}\right)
$$

The weights and normalizations are:

Criterion	Normalization ($n$)	Weight ($\omega$)
Swingup Time	10.0	0.2
Energy	100.0	0.1
Max Torque	6.0	0.1
Integrated Torque	60.0	0.1
Torque Cost	360	0.1
Torque Smoothness	12.0	0.2
Velocity Cost	1000.0	0.2
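With the table above, the score is a weighted sum of normalized criteria gated by the success flag. A minimal Python sketch of the formula (the dictionary keys and the function name are hypothetical; the criterion values are assumed to have been computed beforehand):

```python
# (weight, normalization) per criterion, taken from the table above
WEIGHTS_AND_NORMS = {
    "time": (0.2, 10.0),
    "energy": (0.1, 100.0),
    "tau_max": (0.1, 6.0),
    "tau_integ": (0.1, 60.0),
    "tau_cost": (0.1, 360.0),
    "tau_smooth": (0.2, 12.0),
    "vel_cost": (0.2, 1000.0),
}

def performance_score(success, criteria):
    """S = c_success * sum_i w_i * c_i / n_i, with c_success in {0, 1}."""
    weighted = sum(w * criteria[name] / n for name, (w, n) in WEIGHTS_AND_NORMS.items())
    return (1.0 if success else 0.0) * weighted
```

Because the weights sum to 1, a run in which every criterion equals its normalization constant yields $S = 1$ under this formula, and an unsuccessful swing-up always yields $S = 0$.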

The performance leaderboards for the acrobot and pendubot systems can be found here.

Robustness Score

The robustness leaderboard compares the performance of different control methods by perturbing the simulation e.g. with noise or delay. The task for the controller is to swing-up and balance the acrobot/pendubot even with these perturbations. For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:

Model inaccuracies $c_{model}$ : The model parameters, which are determined by system identification, are never perfectly accurate. To assess the effect of inaccuracies in these parameters, we vary the independent model parameters one at a time in the simulator while the controller uses the original model parameters.
Measurement noise $c_{vel, noise}$ : The controllers' outputs depend on the measured system state. With the quasi-direct drive (QDD) motors, the online velocity measurements are noisy. Hence, for transferability it is important that a controller can handle at least this amount of noise in the measured data. The controllers are tested with and without a low-pass noise filter.
Torque noise $c_{\tau, noise}$ : Not only the measurements are noisy, but also the torque that the controller outputs is not always exactly the desired value.
Torque response $c_{\tau, response}$ : The torque requested by the controller will in general not be constant but will change during the execution. The motor, however, sometimes cannot react immediately to large torque changes and will instead overshoot or undershoot the desired value. This behavior is modeled by applying the torque $\tau = \tau_{t-1} + k_{resp} (\tau_{des} - \tau_{t-1})$ instead of the desired torque $\tau_{des}$ . Here, $\tau_{t-1}$ is the applied motor torque from the last time step and $k_{resp}$ is the factor that scales the responsiveness. $k_{resp}=1$ means the torque response is perfect, while $k_{resp}\neq 1$ means the motor is over/undershooting the desired torque.
Time delay $c_{delay}$ : When operating on a real system there will always be time delays due to communication and reaction times.
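The perturbations listed above can be modeled as small wrappers around the controller's inputs and outputs. The Python sketch below is illustrative only: the torque-response update follows the formula given above, while the Gaussian noise model, the first-order exponential low-pass filter and the fixed-step delay buffer are assumptions that need not match what the repository's robustness scripts do. All names are hypothetical.

```python
from collections import deque

import numpy as np

def torque_response(tau_des, tau_prev, k_resp):
    """tau = tau_prev + k_resp * (tau_des - tau_prev); k_resp = 1 is a perfect response."""
    return tau_prev + k_resp * (tau_des - tau_prev)

def add_noise(values, sigma, rng):
    """Additive zero-mean Gaussian noise (assumed noise model)."""
    return values + rng.normal(0.0, sigma, size=np.shape(values))

class LowPassFilter:
    """First-order exponential filter y <- alpha * x + (1 - alpha) * y (assumed filter type)."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.y = None

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        self.y = x if self.y is None else self.alpha * x + (1.0 - self.alpha) * self.y
        return self.y

class Delay:
    """Returns the input from n_steps calls ago; outputs `initial` until the buffer fills."""

    def __init__(self, n_steps, initial):
        self.buffer = deque([initial] * n_steps, maxlen=n_steps + 1)

    def __call__(self, x):
        self.buffer.append(x)   # the oldest entry is dropped once maxlen is reached
        return self.buffer[0]
```

For example, $k_{resp} < 1$ makes the applied torque lag behind the request, while $k_{resp} > 1$ makes it overshoot.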

For each criterion, the perturbed quantity is varied in $N=21$ steps (for the model inaccuracies, 21 steps per independent model parameter), and the score is the percentage of successful swing-ups.

These criteria are used to calculate the overall Real AI Score with the formula:

$$
S = \omega_{model} c_{model} + \omega_{vel, noise} c_{vel, noise} + \omega_{\tau, noise} c_{\tau, noise} + \omega_{\tau, response} c_{\tau, response} + \omega_{delay} c_{delay}
$$

The weights are:

$$
\omega_{model} = \omega_{vel, noise} = \omega_{\tau, noise} = \omega_{\tau, response} = \omega_{delay} = 0.2
$$
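Combining the robustness formula with the rule that the final score is the average of the performance and robustness scores gives the sketch below. It assumes each criterion's outcome is available as a list of per-run success flags (21 per perturbed quantity, so the model-inaccuracy list is longer when several parameters are varied) and expresses percentages as fractions in $[0, 1]$; the function names are hypothetical.

```python
import numpy as np

N_STEPS = 21  # perturbation values tested per quantity
ROBUSTNESS_WEIGHTS = {"model": 0.2, "vel_noise": 0.2, "tau_noise": 0.2,
                      "tau_response": 0.2, "delay": 0.2}

def success_rate(successes):
    """Fraction of successful swing-ups (the percentage divided by 100)."""
    return float(np.mean(np.asarray(successes, dtype=float)))

def robustness_score(success_flags):
    """S = sum_k w_k * c_k, where c_k is the success rate for criterion k."""
    return sum(w * success_rate(success_flags[k]) for k, w in ROBUSTNESS_WEIGHTS.items())

def final_score(performance, robustness):
    """Average of the performance score and the robustness score."""
    return 0.5 * (performance + robustness)

# Usage: a controller that fails every delay test but passes everything else
flags = {k: [True] * N_STEPS for k in ROBUSTNESS_WEIGHTS}
flags["delay"] = [False] * N_STEPS
```

In this usage example, four of the five equally weighted criteria are fully successful, so the robustness score is $4 \times 0.2 = 0.8$.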

The robustness leaderboards for the acrobot and pendubot systems can be found here.

Protocol

The two stages of the challenge are as follows:

Simulation Stage

For the simulation stage of the competition, we use the following repository from the RealAIGym Project: Double Pendulum (https://github.com/dfki-ric-underactuated-lab/double_pendulum). The documentation of the project for installation, double pendulum dynamics, repository structure, hardware, and controllers can be found here (https://dfki-ric-underactuated-lab.github.io/double_pendulum/index.html). Please follow the installation instructions to start developing your controllers.

You have to develop a new controller for the given simulator (plant). The controller can then be tested for the leaderboard using the instructions given for the Acrobot here: Robustness Scoring, and Performance Scoring. Similar Pendubot scoring scripts are available here (performance) and here (robustness).

To develop a new controller, you can build on any of the many examples given in the repo; a good starting point is the controllers given here. Your controller must inherit from the AbstractController class provided in the repository. See here for documentation on how to write your controller using the AbstractController class.
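A controller plugs into the framework by subclassing AbstractController and implementing its control hook. The skeleton below assumes the import path `double_pendulum.controller.abstract_controller` and the hook `get_control_output_(self, x, t=None)` returning the two joint torques, as we understand them from the repository documentation; please check the linked docs before relying on either. The PD law and the class name `ClippedPDController` are hypothetical placeholders that only show where a control law goes; this is not a swing-up controller. A stand-in base class is defined only so that the sketch runs without the repository installed.

```python
import numpy as np

try:
    # Assumed import path of the base class in the double_pendulum repository
    from double_pendulum.controller.abstract_controller import AbstractController
except ImportError:
    # Hypothetical stand-in so this sketch runs without the repository installed
    class AbstractController:
        def __init__(self):
            pass

class ClippedPDController(AbstractController):
    """Hypothetical placeholder: PD on the actuated joint only, clipped to the torque limit."""

    def __init__(self, actuated_joint=1, kp=10.0, kd=1.0, u_max=6.0):
        super().__init__()
        self.actuated_joint = actuated_joint  # 1 = elbow (Acrobot), 0 = shoulder (Pendubot)
        self.kp, self.kd, self.u_max = kp, kd, u_max
        self.goal = np.array([np.pi, 0.0, 0.0, 0.0])

    def get_control_output_(self, x, t=None):
        """Return torques [u1, u2] in Nm; the passive joint receives zero torque."""
        err = self.goal - np.asarray(x, dtype=float)
        j = self.actuated_joint
        u = np.zeros(2)
        u[j] = self.kp * err[j] + self.kd * err[2 + j]
        return np.clip(u, -self.u_max, self.u_max)
```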

Once you’ve developed a new controller and are happy with the results, please follow these submission guidelines:

Create a fork of the repository.
Add a Dockerfile to your forked repository that includes all the custom libraries you’ve installed/used that are not part of the double pendulum dependencies. This allows us to use the Dockerfile to recreate your environment with the correct libraries to run the submitted controller. For a tutorial on how to write a Dockerfile, we recommend the official Docker website.
Add your developed controllers to the forked repository. Important: Do not change the plant/dynamics/integrator, as this may result in outright disqualification of the team! Remember to use the AbstractController class.
Submit the URL of the fork along with a 2-4 page paper about the method developed and the results to ijcai-23@dfki.de with [AI Olympics] in the email subject. Please follow these guidelines for the paper:
- Page Limit: 2-4 Pages including references
- Include the standard plots for position, velocity, and torque with respect to time in the paper. For an example, see timeseries.png here. These plots are generated after simulation if you use the provided function plot_timeseries(T, X, U).
- Include the tables for performance and robustness metrics against the baseline controllers made available on the RealAIGym leaderboards.
- Include the robustness bar chart as generated here.
- Use the following template: IJCAI 2023 Formatting Guidelines.

The submitted code and papers will be reviewed, and we will re-run the leaderboard benchmarks to compute the final scores. The scores and the paper reviews will be used to determine the best 4 teams, which will run their controllers on the real systems at the IJCAI 2023 AI Olympics!

The results are in! The following teams are selected from the Simulation Stage to go on to the Real-Robot Stage:

Athletic Intelligence Olympics challenge with Model-Based Reinforcement Learning by Alberto Dalla Libera, Niccolo’ Turcato, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres.
Solving the swing-up and balance task for the Acrobot and Pendubot with SAC by Chi Zhang and Akhil Sathuluri.
Swing up for Acrobot and Pendubot using Reinforcement Learning by Raghav Soni, Hemanth Patel, and Krishna Chaitanya.
Deep Reinforcement Learning for Pendubot by Theo Vincent and Boris Belousov.

We congratulate all the winning teams! The results of the controllers for the simulation stage can be found in the double pendulum leaderboards: Acrobot Simulation Performance, Acrobot Simulation Robustness, Pendubot Simulation Performance, and Pendubot Simulation Robustness.

Real-Robot Stage

We’ve created the following protocol for the remote hardware experiments for the Real-Robot stage of the competition.

Protocol for Scheduling Experiment Slots:

Scheduling will be handled via a shared Google calendar sent to the teams. The calendar is also publicly available and can be seen here: https://calendar.google.com/calendar/u/1?cid=NGQxMjg0NmE3MGFlNzQ5YmU1YWE1NWI0NTM3OTI1NDViYzZiMDQ5NmMxMjY3ZDMyZTc3MGY3MTBiZWMzMTFlMEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t
Each team is allotted a total of 20 hours for experiments. They can create 1-3 hour slots in the shared calendar and invite the following organizers for the meeting slot: Shivesh Kumar, Felix Wiebe, and Shubham Vyas. Once any one of the organizers confirms the meeting, the experiment slot is confirmed.
Of the 20 hours allotted, the last 2 hours are reserved for the final test, in which the controllers will be evaluated for the hardware leaderboard.
At the start of the slot, a Microsoft Teams meeting will be started for the live stream along with Q&A for debugging.
After the end of the slot, teams will be provided up to 1 hour extra for copying the data back to their computers.

Protocol For Running Experiments in the given Slot:

The double pendulum Acrobot/Pendubot is set up at DFKI RIC, Bremen, so that the teams can access the robot via a local control PC running Ubuntu.
The experiments on the real robot will be carried out remotely using VPN+SSH.
A live video stream will be provided via a Microsoft Teams call, and video files of the runs will be provided after the experiments.
First, a VPN must be connected to enter the private network setup for the experiments. For this, each team will be provided with a VPN config file.
We use/support the WireGuard VPN on Ubuntu. To install it, the teams have to install the following packages via apt: wireguard-tools, wireguard, and resolvconf. This can be done with the command: sudo apt-get install wireguard-tools wireguard resolvconf
After installing, you can go to the folder containing the provided VPN config file and run the following to start the VPN: wg-quick up wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
To exit the VPN, run: wg-quick down wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
Once you are within the VPN, you can SSH to the control computer whose IP address will be provided at the start of each experiment session.
For SSH, a username and password will be provided to each team, and the following command can be used: ssh <username>@<IP Address>. (Hint: ssh -Y <username>@<IP Address> can be used to view the plots after experiments without copying the data, although this can sometimes cause issues.)
Once logged in to the control PC via SSH, teams can execute scripts remotely and copy data to and from the PC. The data can be found … Tools such as scp/git are suggested for transferring code/data. (Hint: A tutorial on scp to copy data: https://linuxize.com/post/how-to-use-scp-command-to-securely-transfer-files/)
The double pendulum library and the motor drivers are installed at the root of the control PC, so they should be available to all teams/users.

Some rules and information for the hardware experiments regarding experiment duration and safety limits:

Each attempt must not exceed a total time duration of 60 seconds (swing-up + stabilization)
Friction compensation on both joints is allowed in both pendubot and acrobot configurations. The teams are free to choose a friction compensation model of their choice but the utilized torque on the passive joint must not exceed 0.5 Nm.
The controller must inherit from the AbstractController class provided in the project repository.
The following hardware restrictions must be respected by the controller:
Control Loop Frequency: 500 Hz max, usually around 400 Hz.
Torque Limit: 6 Nm
Velocity Limit: 20 rad/s
Position Limits: ±360 degrees for both joints
When the motors exceed these limits, the controller is (usually) automatically switched off and a damper is applied to bring the system to zero velocity. Once zero velocity is achieved, experiments can start again.
When the motors are initially enabled, they set the “zero position”. This happens every time they are enabled.
For the hardware experiments, the system parameters are identical for the Acrobot and Pendubot configurations but differ from those used in simulation. We have done the basic system identification, and the teams can re-train their controllers using the following system parameters for the hardware: https://github.com/dfki-ric-underactuated-lab/double_pendulum/blob/main/data/system_identification/identified_parameters/design_C.1/model_1.0/model_parameters.yml
A person will be watching the experiments and will have access to an Emergency Stop.
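Before a hardware session, it can help to check commands and states against the limits listed above on the client side. The sketch below is a hypothetical helper, not part of the lab's software (the hardware applies its own safety shutdown); it clips the active joint to 6 Nm and the passive joint to the 0.5 Nm friction-compensation bound, and checks the position and velocity limits. All names are illustrative.

```python
import numpy as np

TORQUE_LIMIT = 6.0                # Nm, active joint
PASSIVE_TORQUE_LIMIT = 0.5        # Nm, friction compensation on the passive joint
VELOCITY_LIMIT = 20.0             # rad/s
POSITION_LIMIT = 2.0 * np.pi      # rad, i.e. +-360 degrees
MAX_ATTEMPT_DURATION = 60.0       # s, swing-up + stabilization
MIN_CONTROL_PERIOD = 1.0 / 500.0  # s, from the 500 Hz maximum loop rate

def clip_command(u, active_joint):
    """Clip torques: full limit on the active joint, friction-compensation bound on the passive one."""
    limits = np.full(2, PASSIVE_TORQUE_LIMIT)
    limits[active_joint] = TORQUE_LIMIT
    return np.clip(np.asarray(u, dtype=float), -limits, limits)

def state_within_limits(x):
    """True if both joint positions and velocities are inside the hardware limits."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(np.abs(x[:2]) <= POSITION_LIMIT)
                and np.all(np.abs(x[2:]) <= VELOCITY_LIMIT))

def attempt_allowed(elapsed_time):
    """True while an attempt is still within the 60 s budget."""
    return elapsed_time <= MAX_ATTEMPT_DURATION
```

For the Acrobot the active joint is the elbow (index 1); for the Pendubot it is the shoulder (index 0).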

The results are in! The hardware challenge results can be seen in the following leaderboards: Acrobot Real System Leaderboard and Pendubot Real System Leaderboard. The winning team for both the Acrobot and the Pendubot setups is the group of Alberto Dalla Libera, Niccolo’ Turcato, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres (team username turcato-niccolo in the leaderboards)! We congratulate them on winning the competition! We also congratulate Chi Zhang and Akhil Sathuluri (team username chiniklas in the leaderboards) on finishing as runners-up.

Winning and Runner-up teams with Competition Organizers. From left to right: Chi Zhang, Niccolo’ Turcato, Shivesh Kumar, and Boris Belousov.

Schedule/Important Dates

Important Dates

Competition Day Schedule

Keynote Speakers

Prof. Frank Kirchner

Prof. Dr. Jan Peters

Prof. Sylvain Calinon

Prof. Andreas Mueller

Prizes

The winners of the competition will have the opportunity to win prizes worth over $2000 combined!

The prizes are Quasi-Direct Drive (QDD) Motor sets from our sponsors mjbots and Cubemars! The winning teams can then use these to create their own double pendulum setups or other robots!

The prizes will be shipped directly to the desired address given by the winning teams and the prize certificates will be awarded in-person at IJCAI 2023.