Motivation
As artificial intelligence gains new capabilities, it becomes important to evaluate it on real-world tasks. While software such as ChatGPT has recently revolutionized certain areas of AI, athletic intelligence remains elusive in the AI community. To build better robots that can perform a wide variety of dynamic tasks in uncertain environments, the physical, or athletic, intelligence of robots has to be improved. This is quite challenging: in particular, the fields of robotics and reinforcement learning (RL) lack standardized benchmarking tasks on real hardware. To facilitate reproducibility and stimulate algorithmic advancements, the AI Olympics competition is being held at IJCAI 2023, based on the RealAIGym project. The challenge involves two stages, simulation and real-robot experiments, in which teams (and their agents) compete for the highest score and some cool prizes! We invite people from all communities (AI/ML/RL/optimal control/heuristics/etc.) to try this competition and submit their best efforts at canonical dynamic tasks on simple underactuated robotic systems. The motivation for these tasks comes from acrobatic performers and athletes, as seen below from the 2016 Olympics!
The Challenge
For the challenge, we will use a canonical 2-link robot system in two different configurations. When the actuator in the shoulder joint is active and the elbow is passive, it functions as a Pendubot; when the shoulder actuator is passive and the elbow is active, it functions as an Acrobot (inspired by the acrobat athletes seen above).
The challenge consists of the following task, which has to be carried out first in simulation; the 4 best teams are then selected to carry out experiments on the real robots: swing up and stabilize an underactuated 2-link system, the Acrobot and/or the Pendubot. The swing-up is carried out from an initial position with the robot pointing straight down. The participating teams can decide to work on the Acrobot swing-up, the Pendubot swing-up, or both. For scoring and prizes, Acrobot and Pendubot are treated as 2 separate tracks, i.e. the Acrobot scores/papers are compared only against those of other Acrobot teams. For each track, 2 teams will be selected from the simulation stage to participate in the real-robot stage, and one final winner will be selected for each track.
The performance and robustness of the swing-up and stabilize controllers will be judged based on a custom scoring system. The final score is the average of the performance score and the robustness score for the acrobot/pendubot system. The final scores of the submissions will be added to the RealAIGym leaderboard.
The acrobot/pendubot is simulated with a Runge-Kutta 4 integrator with a timestep of dt=0.002s for T=10s. The initial configuration is \mathbf{x}_{0}=(0.0,0.0,0.0,0.0) (hanging down) and the goal is the unstable fixed point at the upright configuration \mathbf{x}_{g}=(\pi,0.0,0.0,0.0). The upright position is considered reached, for the performance score, when the end-effector is above the threshold line and, for the robustness score, when the distance to the goal in state coordinates is below \mathbf{\epsilon} = (0.1, 0.1, 0.5, 0.5).
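For reference, one classical Runge-Kutta 4 step can be sketched as below. The competition's simulator already implements this, so this is purely illustrative; the toy dynamics used in the check is an assumption, not the pendulum model.

```python
import numpy as np

def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta 4 step for dynamics xdot = f(x, u)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Toy check on xdot = -x (closed form: x(t) = exp(-t)), using the
# competition's step size dt = 0.002 s over T = 10 s.
dt, n_steps = 0.002, 5000
x = np.array([1.0])
for _ in range(n_steps):
    x = rk4_step(lambda x, u: -x, x, None, dt)
```

With dt = 0.002 s the RK4 global error on this toy problem is far below measurement noise, which is why such a small step is a reasonable stand-in for continuous dynamics.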
The task for the controller is to swing up and balance the acrobot/pendubot and keep the end-effector above the threshold line. The performance score compares the performance of different controllers in simulation, assuming all model parameters are already well known.
For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:
- Swingup Success c_{success}: Whether the swing-up was successful, i.e. if the end-effector is above the threshold line at the end of the simulation.
- Swingup time c_{time}: The time it takes for the acrobot to reach the goal region above the threshold line and stay there. If the end-effector enters the goal region but falls below the line before the simulation time is over, the swing-up is not considered successful! The swing-up time is the time at which the end-effector enters the goal region and does not leave it until the end.
- Energy c_{energy}: The mechanical energy used during the execution.
- Max Torque c_{\tau, max}: The peak torque that was used during the execution.
- Integrated Torque c_{\tau,integ}: The time integral over the used torque over the execution duration.
- Torque Cost c_{\tau, cost}: A quadratic cost on the used torques. (c_{\tau, cost} = \sum \tau^T R \tau with R=1)
- Torque Smoothness c_{\tau, smooth}: The standard deviation of the changes in the torque signal.
- Velocity Cost c_{vel, cost}: A quadratic cost on the joint velocities (\dot{\mathbf{q}}) reached during the execution (c_{vel} = \mathbf{\dot{q}}^T \mathbf{Q} \mathbf{\dot{q}} with \mathbf{Q} = identity).
These criteria are used to calculate the overall Real AI Score with the formula:
S = c_{success} \left( \omega_{time}\frac{c_{time}}{n_{time}} +\omega_{energy}\frac{c_{energy}}{n_{energy}} +\omega_{\tau, max}\frac{c_{\tau, max}}{n_{\tau, max}} +\omega_{\tau, integ}\frac{c_{\tau, integ}}{n_{\tau, integ}} +\omega_{\tau, cost}\frac{c_{\tau, cost}}{n_{\tau, cost}} +\omega_{\tau, smooth}\frac{c_{\tau, smooth}}{n_{\tau, smooth}} +\omega_{vel, cost}\frac{c_{vel, cost}}{n_{vel, cost}}\right)

The weights and normalizations are:
Criterion | Normalization | Weight |
---|---|---|
Swingup Time | 10.0 | 0.2 |
Energy | 100.0 | 0.1 |
Max Torque | 6.0 | 0.1 |
Integrated Torque | 60.0 | 0.1 |
Torque Cost | 360.0 | 0.1 |
Torque Smoothness | 12.0 | 0.2 |
Velocity Cost | 1000.0 | 0.2 |
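Taken literally, the score formula and the table above can be sketched as follows. The function and dictionary names are illustrative; the official scoring scripts in the double_pendulum repository are authoritative.

```python
# Weights and normalizations copied from the table above.
WEIGHTS = {
    "time": 0.2, "energy": 0.1, "tau_max": 0.1, "tau_integ": 0.1,
    "tau_cost": 0.1, "tau_smooth": 0.2, "vel_cost": 0.2,
}
NORMS = {
    "time": 10.0, "energy": 100.0, "tau_max": 6.0, "tau_integ": 60.0,
    "tau_cost": 360.0, "tau_smooth": 12.0, "vel_cost": 1000.0,
}

def real_ai_score(success, criteria):
    """Real AI Score, following the formula above literally:
    S = c_success * sum(weight * criterion / normalization)."""
    c_success = 1.0 if success else 0.0
    return c_success * sum(
        WEIGHTS[k] * criteria[k] / NORMS[k] for k in WEIGHTS
    )
```

Since the weights sum to 1.0, a controller whose every criterion equals its normalization constant scores exactly 1.0 if the swing-up succeeds, and 0.0 otherwise.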
The performance leaderboards for the acrobot and pendubot systems can be found here.
The robustness leaderboard compares the performance of different control methods by perturbing the simulation e.g. with noise or delay. The task for the controller is to swing-up and balance the acrobot/pendubot even with these perturbations. For the evaluation, multiple criteria are evaluated and weighted to calculate an overall score (Real AI Score). The criteria are:
- Model inaccuracies c_{model}: The model parameters, that have been determined with system identification, will never be perfectly accurate. To assess inaccuracies in these parameters, we vary the independent model parameters one at a time in the simulator while using the original model parameters in the controller.
- Measurement noise c_{vel, noise}: The controllers’ outputs depend on the measured system state. With quasi-direct drive (QDD) actuators, the online velocity measurements are noisy. Hence, it is important for transferability that a controller can handle at least this amount of noise in the measured data. The controllers are tested with and without a low-pass noise filter.
- Torque noise c_{\tau, noise}: Not only the measurements are noisy, but also the torque that the controller outputs is not always exactly the desired value.
- Torque response c_{\tau, response}: The requested torque of the controller will in general not be constant but will change during the execution. The motor, however, is sometimes not able to react immediately to large torque changes and will instead overshoot or undershoot the desired value. This behavior is modeled by applying the torque \tau = \tau_{t-1} + k_{resp} (\tau_{des} - \tau_{t-1}) instead of the desired torque \tau_{des}. Here, \tau_{t-1} is the applied motor torque from the last time step and k_{resp} is the factor that scales the responsiveness. k_{resp}=1 means the torque response is perfect, while k_{resp}\neq 1 means the motor over/undershoots the desired torque.
- Time delay c_{delay}: When operating on a real system there will always be time delays due to communication and reaction times.
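The first-order torque-response model from the criteria above can be written directly as a one-line update (function name is illustrative):

```python
def apply_torque_response(tau_des, tau_prev, k_resp):
    """Torque actually applied by the motor: the previous torque moved
    toward the desired torque by a factor k_resp, per the model
    tau = tau_{t-1} + k_resp * (tau_des - tau_{t-1}).
    k_resp = 1.0 reproduces the desired torque exactly."""
    return tau_prev + k_resp * (tau_des - tau_prev)
```

For example, with k_resp = 0.5 the motor covers only half the distance to the requested torque each time step, behaving like a low-pass filter on the control signal.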
For each criterion, the perturbation is varied in N=21 steps (for the model inaccuracies, in N=21 steps per independent model parameter), and the score is the percentage of successful swing-ups.
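A criterion score can then be sketched as the fraction of perturbation levels at which the swing-up still succeeds. Here `simulate_ok` and the delay range are hypothetical placeholders for a perturbed simulation run and a plausible perturbation sweep:

```python
import numpy as np

def criterion_score(simulate_ok, values):
    """Fraction of perturbation levels with a successful swing-up.
    `simulate_ok(v)` is a placeholder callback that runs one perturbed
    simulation at perturbation level v and returns True on success."""
    results = [simulate_ok(v) for v in values]
    return sum(results) / len(results)

# Hypothetical example: vary a time delay in N = 21 steps from 0 s to 0.04 s.
delays = np.linspace(0.0, 0.04, 21)
# Stand-in for a real simulation: pretend the controller tolerates <= 21 ms.
score = criterion_score(lambda v: v <= 0.021, delays)
```

In this toy sweep 11 of the 21 delay levels succeed, so the criterion score is 11/21 ≈ 0.52.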
These criteria are used to calculate the overall Real AI Score with the formula:
S = \omega_{model} c_{model} + \omega_{vel, noise} c_{vel, noise} + \omega_{\tau, noise} c_{\tau, noise} + \omega_{\tau, response} c_{\tau, response} + \omega_{delay} c_{delay}

The weights are:
\omega_{model} = \omega_{vel, noise} = \omega_{\tau, noise} = \omega_{\tau, response} = \omega_{delay} = 0.2

Protocol
The two stages of the challenge are as follows:
For the simulation stage of the competition, we use the following repository from the RealAIGym Project: Double Pendulum (https://github.com/dfki-ric-underactuated-lab/double_pendulum). The documentation of the project for installation, double pendulum dynamics, repository structure, hardware, and controllers can be found here (https://dfki-ric-underactuated-lab.github.io/double_pendulum/index.html). Please follow the installation instructions to start developing your controllers.
You have to develop a new controller for the given simulator (plant). The controller can then be tested for the leaderboard using the instructions given for the Acrobot here: Robustness Scoring, and Performance Scoring. Similar Pendubot scoring scripts are available here (performance) and here (robustness).
To develop a new controller, you can use any of the many examples given in the repo. A good starting point is the set of controllers given here. Your controller must inherit from the AbstractController class provided in the repository. See here for the documentation on how to write your controller using the AbstractController class.
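As an illustration of the expected shape of a controller, here is a toy PD controller sketch. The class name, gains, and goal handling are placeholders, and a real submission must inherit from the repository's AbstractController and follow its documented method signatures rather than this standalone class:

```python
import numpy as np

# In a real submission, inherit from the repository's base class, e.g.:
#   from double_pendulum.controller.abstract_controller import AbstractController
# (verify the import path and required method names against the docs).

class HoldUpController:
    """Toy PD controller that pushes the state toward the upright goal
    x_g = (pi, 0, 0, 0). Gains are illustrative, not tuned."""

    def __init__(self, torque_limit=6.0, active_joint=0):
        self.Kp = np.array([10.0, 10.0])      # illustrative position gains
        self.Kd = np.array([1.0, 1.0])        # illustrative damping gains
        self.goal = np.array([np.pi, 0.0])    # upright joint configuration
        self.torque_limit = torque_limit      # 6 Nm hardware limit
        self.active_joint = active_joint      # 0 = Pendubot, 1 = Acrobot

    def get_control_output(self, x, t=None):
        """Map the state x = (q1, q2, qd1, qd2) to a torque pair."""
        q, qd = np.asarray(x[:2]), np.asarray(x[2:])
        tau = self.Kp * (self.goal - q) - self.Kd * qd
        tau[1 - self.active_joint] = 0.0  # passive joint gets no torque
        return np.clip(tau, -self.torque_limit, self.torque_limit)
```

Note how the underactuation constraint is enforced explicitly: whichever joint is passive for the chosen track gets zero commanded torque, and everything else is clipped to the 6 Nm limit.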
Once you’ve developed a new controller and are happy with the results, please follow these submission guidelines:
- Create a fork of the repository.
- Add a Dockerfile to your forked repository that includes all the custom libraries you’ve installed/used that are not part of the double pendulum dependencies. This allows us to use the Dockerfile to recreate your environment with the correct libraries to run the submitted controller. For a tutorial on how to make a Dockerfile, we can recommend the official Docker website.
- Add your developed controllers to the forked repository. Important: Do not change the plant/dynamics/integrator (This may result in an outright disqualification of the team)!! Remember to use the AbstractController class.
- Submit the URL of the fork along with a 2-4 page paper about the developed method and the results to ijcai-23@dfki.de with [AI Olympics] in the email subject. Please follow these guidelines for the paper:
- Page Limit: 2-4 Pages including references
- Include the standard plots for position, velocity, and torque with respect to time in the paper. For an example, see timeseries.png here. These plots are generated after simulation if you use the provided function plot_timeseries(T, X, U).
- Include the tables for performance and robustness metrics against the baseline controllers made available on the RealAIGym leaderboards.
- Include the robustness bar chart as generated here.
- Use the following template: IJCAI 2023 Formatting Guidelines.
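The Dockerfile requested in the guidelines above might look like the minimal sketch below. The base image and the extra dependency are placeholders; include whatever your controller actually needs on top of the double pendulum dependencies.

```dockerfile
# Hypothetical example only: adjust base image and packages to your setup.
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y python3 python3-pip git

# Install the forked repository (assumed to be the Docker build context).
COPY . /double_pendulum
RUN pip3 install /double_pendulum

# Example of a custom library your controller uses that is not part of
# the double pendulum dependencies (placeholder).
RUN pip3 install stable-baselines3
```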
The submitted code and papers will be reviewed and the leaderboard benchmarks will be re-run by us to compute the final scores. The scores as well as the paper reviews will be used to determine the best 4 teams which will carry out the experiments using their controllers on the real systems at IJCAI 2023 AI Olympics!
The results are in! The following teams are selected from the Simulation Stage to go on to the Real-Robot Stage:
- Athletic Intelligence Olympics challenge with Model-Based Reinforcement Learning by Alberto Dalla Libera, Niccolo’ Turcato, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres.
- Solving the swing-up and balance task for the Acrobot and Pendubot with SAC by Chi Zhang and Akhil Sathuluri.
- Swing up for Acrobot and Pendubot using Reinforcement Learning by Raghav Soni, Hemanth Patel, and Krishna Chaitanya.
- Deep Reinforcement Learning for Pendubot by Theo Vincent, and Boris Belousov.
We congratulate all the winning teams! The results of the controllers for the simulation stage can be found in the double pendulum leaderboards: Acrobot Simulation Performance, Acrobot Simulation Robustness, Pendubot Simulation Performance, and Pendubot Simulation Robustness.
We’ve created the following protocol for the remote hardware experiments for the Real-Robot stage of the competition.
Protocol for Scheduling Experiment Slots:
- The scheduling will be handled by a common Google calendar sent to the teams. The calendar is also available to the public and can be seen here: https://calendar.google.com/calendar/u/1?cid=NGQxMjg0NmE3MGFlNzQ5YmU1YWE1NWI0NTM3OTI1NDViYzZiMDQ5NmMxMjY3ZDMyZTc3MGY3MTBiZWMzMTFlMEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t
- Each team is allotted a total of 20 hours for experiments. They can create 1-3 hour slots in the shared calendar and invite the following organizers for the meeting slot: Shivesh Kumar, Felix Wiebe, and Shubham Vyas. Once any one of the organizers confirms the meeting, the experiment slot is confirmed.
- Of the maximum 20 hours, the last 2 hours are reserved for the final test, in which the controllers will be evaluated for the hardware leaderboard.
- At the start of the slot, a Microsoft Teams meeting will be started for the live stream along with Q&A for debugging.
- After the end of the slot, teams will be provided up to 1 hour extra for copying the data back to their computers.
Protocol For Running Experiments in the given Slot:
- The double pendulum (Acrobot/Pendubot) is prepared at DFKI RIC, Bremen, such that the teams can access the robot via a local control PC running Ubuntu.
- The experiments on the real robot will be carried out remotely using VPN+SSH.
- A video stream via Microsoft Teams call and video file post-experiment runs will be provided.
- First, a VPN must be connected to enter the private network setup for the experiments. For this, each team will be provided with a VPN config file.
- We use/support the WireGuard VPN on Ubuntu. To install the VPN, the teams have to install the following packages via apt: wireguard-tools, wireguard, and resolvconf. This can be done via the command: sudo apt-get install wireguard-tools wireguard resolvconf
- After installing, you can go to the folder containing the provided VPN config file and run the following to start the VPN: wg-quick up wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
- To exit the VPN, run: wg-quick down wg-client.conf (Hint: Sometimes one has to provide the full path of wg-client.conf)
- Once you are within the VPN, you can SSH to the control computer whose IP address will be provided at the start of each experiment session.
- For SSH, a username and password will be provided to each team. The following command can be used: ssh <username>@<IP Address>. (Hint: ssh -Y <username>@<IP Address> can be used to view the plots after experiments without copying the data. This can sometimes cause issues though.)
- Once on the control PC via SSH, teams can execute scripts remotely and copy data in and out of the PC. Tools such as scp or git are suggested for transferring code/data. (Hint: a tutorial on using scp to copy data: https://linuxize.com/post/how-to-use-scp-command-to-securely-transfer-files/)
- The double pendulum library along with the motor drivers is installed on the control PC at the root level, and hence should be available to all teams/users.
Some rules and information for the hardware experiments regarding experiment duration and safety limits:
- Each attempt must not exceed a total time duration of 60 seconds (swing-up + stabilization)
- Friction compensation on both joints is allowed in both pendubot and acrobot configurations. The teams are free to choose a friction compensation model of their choice but the utilized torque on the passive joint must not exceed 0.5 Nm.
- The controller must inherit from the AbstractController class provided in the project repository.
- The following hardware restriction must be respected by the controller:
- Control Loop Frequency: 500 Hz max (usually around 400 Hz)
- Torque Limit: 6Nm
- Velocity Limit: 20 rad/s
- Position Limits: ±360 degrees for both joints
- When the motors exceed these limits, the controller is (usually) automatically switched off and a damper is applied to bring the system to zero velocity. Once zero velocity is achieved, experiments can start again.
- When the motors are initially enabled, they set the “zero position”. This happens every time they are enabled.
- For the hardware experiments, the system parameters are the same for the Acrobot and Pendubot configurations but differ from those used in simulation. We have done basic system identification, and the teams can re-train their controllers using the following system parameters for the hardware: https://github.com/dfki-ric-underactuated-lab/double_pendulum/blob/main/data/system_identification/identified_parameters/design_C.1/model_1.0/model_parameters.yml
- A person will be watching the experiments and will have access to an Emergency Stop.
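As an illustration of the friction-compensation rule above, a simple Coulomb-plus-viscous compensation might look like the sketch below. The friction model and coefficients are placeholders (teams are free to choose their own model); only the 0.5 Nm cap on the passive joint is from the rules.

```python
import numpy as np

def friction_compensation(qd, coulomb, viscous, passive_joint,
                          passive_limit=0.5):
    """Coulomb + viscous friction compensation torque per joint.
    `coulomb` and `viscous` are per-joint coefficient arrays (placeholders);
    the rules cap the torque on the passive joint at 0.5 Nm."""
    tau = coulomb * np.sign(qd) + viscous * qd
    tau[passive_joint] = np.clip(tau[passive_joint],
                                 -passive_limit, passive_limit)
    return tau
```

For instance, a fast-moving passive joint whose modeled friction exceeds 0.5 Nm simply gets the capped value, keeping the submission within the rules.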
The results are in! The hardware challenge results can be seen in the following leaderboards: Acrobot Real System Leaderboard and Pendubot Real System Leaderboard. The winning team for both the Acrobot and Pendubot setups is the group of Alberto Dalla Libera, Niccolo’ Turcato, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres (team username turcato-niccolo in the leaderboards)! We congratulate them on winning the competition! We also congratulate Chi Zhang and Akhil Sathuluri (team username chiniklas in the leaderboards) for finishing as runners-up.
Schedule/Important Dates
The tentative challenge schedule is as follows:
31 March 2023: Website & Competition Release
15 June 2023 (extended from 1 June 2023): Submission Deadline for Simulation Stage
22 June 2023 (extended from 15 June 2023): Simulation Stage Winners Announcement
11 August 2023 (extended from 30 July 2023): End of Remote Hardware Experiments
19-25 August 2023: IJCAI 2023
* All the deadlines are at 23:59 Anywhere on Earth
The competition takes place at IJCAI 2023 on 24 August 2023 from 14:00 to 18:30 (local time) in Room Almaty 6104. The schedule is as follows:
14:00-14:30 Keynote I: Robotics and AI: Future Trends and Impact by Dr. Frank Kirchner
14:30-15:00 Keynote II: Inductive Biases for Robot Reinforcement Learning by Dr. Jan Peters
15:00-15:30 Coffee Break
15:30-16:00 Keynote III: Robot learning from few examples by exploiting the structure and geometry of data by Dr. Sylvain Calinon
16:00-16:30 Keynote IV: On the Geometry of Floating Base and Space Robots by Dr. Andreas Mueller
16:30-17:00 Introduction to RealAIGym Project by Dr. Shivesh Kumar & Shubham Vyas
17:00-17:20 Athletic Intelligence Olympics challenge with Model-Based Reinforcement Learning by Alberto Dalla Libera, Niccolo’ Turcato, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres.
17:20-17:40 Solving the swing-up and balance task for the Acrobot and Pendubot with SAC by Chi Zhang and Akhil Sathuluri.
17:40-18:00 Deep Reinforcement Learning for Pendubot by Theo Vincent, and Boris Belousov.
Winners will be announced at the closing ceremony on 25th August, 16:00 – 17:00 in Room Kashgar A-H.
Keynote Speakers
Prof. Dr. Dr. h.c. Frank Kirchner is the Executive Director of the German Research Center for Artificial Intelligence, Bremen, and is responsible for the Robotics Innovation Center, one of the largest centers for AI and Robotics in Europe. Founded in 2006 as the DFKI Laboratory, it builds on the basic research of the Robotics Working Group headed by Kirchner at the University of Bremen. There, Kirchner has held the Chair of Robotics in the Department of Mathematics and Computer Science since 2002. He is one of the leading experts in the field of biologically inspired behavior and motion sequences of highly redundant, multifunctional robot systems and machine learning for robot control.
Prof. Dr. Jan Peters has been a full professor (W3) for Intelligent Autonomous Systems at the Computer Science Department of the Technische Universitaet Darmstadt since 2011 and, at the same time, head of the research department on Systems AI for Robot Learning (SAIROL) at the German Research Center for Artificial Intelligence (DFKI GmbH) since 2022. He is also a founding research faculty member of the Hessian Center for Artificial Intelligence. Jan Peters has received the Dick Volz Best 2007 US Ph.D. Thesis Runner-Up Award, the Robotics: Science & Systems – Early Career Spotlight, the INNS Young Investigator Award, and the IEEE Robotics & Automation Society’s Early Career Award, as well as numerous best paper awards. In 2015, he received an ERC Starting Grant; he was appointed IEEE Fellow in 2019, ELLIS Fellow in 2020, and AAIA Fellow in 2021.
Prof. Sylvain Calinon is a Senior Research Scientist at the Idiap Research Institute and a Lecturer at the Ecole Polytechnique Fédérale de Lausanne (EPFL). He heads the Robot Learning & Interaction group at Idiap, with expertise in robot learning, optimal control, and human-robot collaboration. His research focuses on human-centered robotics applications in which robots can acquire new skills from only a few demonstrations and interactions. It requires the development of models that can rely on the structure and geometry of data, the development of optimal control techniques that can exploit task variations and coordination patterns, and the development of intuitive interfaces to acquire meaningful demonstrations. Website: https://calinon.ch
Prof. Andreas Mueller obtained diploma degrees in mathematics (1997), electrical engineering (1998), and mechanical engineering (2000), and a Ph.D. in mechanics (2004). He received his habilitation in mechanics (2008) and is currently professor and director of the Institute of Robotics at Johannes Kepler University, Linz, Austria. His current research interests include holistic modeling, model-based and optimal control of mechatronic systems, redundant robotic systems, parallel kinematic machines, and biomechanics.
Prizes
The winners of the competition will have the opportunity to win prizes worth over $2000 combined!
The prizes are Quasi-Direct Drive (QDD) Motor sets from our sponsors mjbots and Cubemars! The winning teams can then use these to create their own double pendulum setups or other robots!
The prizes will be shipped directly to the desired address given by the winning teams and the prize certificates will be awarded in-person at IJCAI 2023.
Video Proceedings
The video proceedings of the competition event at IJCAI 2023 can be found here: https://youtube.com/playlist?list=PLpz4XBkUoo6x2xlUy5lByGVEVYfQ24g5C&feature=shared