The task

Our team from KAIRA was invited to the Europe Embodied hackathon, where we picked the robotic arm challenge. The goal was to build a vision-to-robot pipeline that can balance a ball on top of a plate.

It’s a classic double-integrator control problem: the plate tilt sets the ball’s acceleration, so it can be solved with plain feedback control, no AI required.

The pipeline

Experiment setup: RealSense camera over an ArUco-tracked plate held by a Franka Panda arm
  • Perception — RealSense D405 overhead → ArUco plate localization → HSV ball detection → homography to plate coords (u,v), rim normalized to 1.0.
  • State estimation — constant-velocity Kalman filter; outlier gating + velocity for damping + latency lead (readout extrapolated forward along velocity to cancel camera delay).
  • Calibration — Kabsch for camera→robot extrinsic; automatic axis-map (tilt→roll) so no manual sign tuning. (see Challenge 1)
  • Control — PD/PID on the ball error → desired tilt (θx, θy) → absolute pose level_pose · R_tilt streamed to a Cartesian-impedance node. (see Challenge 2)
  • Safety — tilt clamp, slew limit, lost-ball re-level, return-to-level on exit.
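To make the state-estimation step more concrete, here is a minimal sketch of a constant-velocity Kalman filter for one plate axis, with outlier gating and a latency lead. It is an illustration, not the hackathon code: the class name `AxisKalman`, the noise variances, the 4-sigma gate, the 60 Hz rate and the 50 ms latency are all assumed values picked for the example.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one plate axis (u or v)."""
    pos: float = 0.0
    vel: float = 0.0
    # Entries of the symmetric covariance matrix [[p00, p01], [p01, p11]].
    p00: float = 1.0
    p01: float = 0.0
    p11: float = 1.0
    accel_var: float = 4.0   # assumed variance of unmodelled acceleration
    meas_var: float = 1e-4   # assumed measurement noise variance
    gate_sigma: float = 4.0  # assumed gate: reject beyond this many sigmas

    def predict(self, dt):
        """Propagate the state by dt with F = [[1, dt], [0, 1]]."""
        self.pos += self.vel * dt
        p00 = self.p00 + 2.0 * dt * self.p01 + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        q = self.accel_var
        # Add process noise for a random-acceleration model.
        self.p00 = p00 + q * dt ** 4 / 4.0
        self.p01 = p01 + q * dt ** 3 / 2.0
        self.p11 = self.p11 + q * dt ** 2

    def update(self, z):
        """Fuse a position measurement; return False if it was gated out."""
        innovation = z - self.pos
        s = self.p00 + self.meas_var  # innovation variance
        if innovation * innovation > self.gate_sigma ** 2 * s:
            return False  # outlier: keep the prediction, skip the update
        k0, k1 = self.p00 / s, self.p01 / s  # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01
        return True

    def lead(self, latency):
        """Extrapolate the position forward to compensate camera delay."""
        return self.pos + self.vel * latency


# Usage: a ball drifting at 0.2 plate-units/s, sampled at 60 Hz,
# with one bogus detection injected at frame 60.
dt = 1.0 / 60.0
kf = AxisKalman()
rejected = []
for k in range(1, 121):
    kf.predict(dt)
    z = 3.0 if k == 60 else 0.2 * k * dt
    if not kf.update(z):
        rejected.append(k)
lead_pos = kf.lead(0.05)  # expected position 50 ms ahead
```

Running one filter per axis keeps the math to scalars. The velocity estimate serves two purposes here: it is the damping signal for the controller, and it drives the forward extrapolation in `lead`.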

Challenge 1 — aligning the coordinate systems

The ball position is measured in the image frame (u,v), while the robot tilts about its end-effector x/y axes. The two differ by an unknown rotation and sign flip that changes any time the camera moves. Calibrating this map wasn’t easy. We tuned it by hand at first, then switched to measuring it:

We wrote a calibration script that starts with the ball in the middle of the plate and sends a short tilt pulse in one direction to measure where the ball rolls. Repeating this several times gives an accurate map.
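One way to turn those pulse measurements into a map is a least-squares fit of a 2x2 matrix from commanded tilt to observed ball displacement. The sketch below assumes that approach; the function names, the `TRUE_MAP` used to fake the measurements and the pulse amplitude are hypothetical, and the script we actually ran may have differed.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("pulses do not span both tilt axes")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec2(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def fit_axis_map(pulses):
    """Least-squares fit of M in d = M t.

    pulses: list of ((tilt_x, tilt_y), (du, dv)) pairs, i.e. the commanded
    tilt pulse and the ball displacement it produced in plate coordinates.
    """
    dt_sum = [[0.0, 0.0], [0.0, 0.0]]  # sum of d t^T
    tt_sum = [[0.0, 0.0], [0.0, 0.0]]  # sum of t t^T
    for t, d in pulses:
        for i in range(2):
            for j in range(2):
                dt_sum[i][j] += d[i] * t[j]
                tt_sum[i][j] += t[i] * t[j]
    return matmul2(dt_sum, inv2(tt_sum))


def tilt_for_push(axis_map, push_uv):
    """Tilt command that moves the ball along push_uv in plate coordinates."""
    return matvec2(inv2(axis_map), push_uv)


# Usage with simulated pulses. TRUE_MAP is a made-up rotation plus scaling,
# standing in for the real (unknown) camera-to-robot relationship.
TRUE_MAP = [[0.0, -1.2], [1.1, 0.0]]
commands = [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]
pulses = [(t, matvec2(TRUE_MAP, t)) for t in commands]
axis_map = fit_axis_map(pulses)
tilt = tilt_for_push(axis_map, (1.0, 0.0))
```

Because the fit absorbs rotation, sign and scale in one matrix, moving the camera only requires rerunning the pulses, with no signs to flip by hand.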

Challenge 2 — tuning the PID

Ball-on-plate is a classic double integrator. Besides a single lecture in the introductory robotics class, our team had zero experience with this type of problem. We started by implementing a PD controller and, once that looked decent, moved to a PID controller.

Tuning the parameters took hours, and at first we were just turning the dials at random. Once we got more methodical towards the end, after an all-nighter, things started to fall into place.
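A small simulation shows why the step from PD to PID matters. The sketch below models one axis as a double integrator and assumes a solid ball rolling without slipping at small angles, so acceleration is about (5/7)·g·tilt. The gains, the 0.01 rad plate bias, the tilt clamp and the slew rate are illustrative numbers, not the values we tuned on the robot.

```python
G = 9.81
# Assumption: solid ball, rolling without slipping, small tilt angles.
ACCEL_PER_RAD = 5.0 / 7.0 * G


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


class TiltPID:
    """PID on ball position with a tilt clamp and a slew-rate limit."""

    def __init__(self, kp, ki, kd, max_tilt=0.1, max_rate=2.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.max_tilt = max_tilt  # rad
        self.max_rate = max_rate  # rad/s
        self.integral = 0.0
        self.tilt = 0.0

    def step(self, pos, vel, dt, target=0.0):
        err = target - pos
        self.integral += err * dt
        if self.ki > 0.0:
            # Anti-windup: the integral term alone may not exceed the clamp.
            limit = self.max_tilt / self.ki
            self.integral = clamp(self.integral, -limit, limit)
        # Damping acts on the measured velocity, not on the error derivative.
        raw = self.kp * err + self.ki * self.integral - self.kd * vel
        raw = clamp(raw, -self.max_tilt, self.max_tilt)
        max_step = self.max_rate * dt
        self.tilt += clamp(raw - self.tilt, -max_step, max_step)
        return self.tilt


def simulate(controller, x0=0.05, bias=0.01, dt=0.01, steps=1500):
    """Return the final ball position (m) after steps * dt seconds.

    bias models a plate that is not perfectly level (rad).
    """
    x, v = x0, 0.0
    for _ in range(steps):
        tilt = controller.step(x, v, dt)
        a = ACCEL_PER_RAD * (tilt + bias)
        v += a * dt
        x += v * dt
    return x


x_pd = simulate(TiltPID(kp=1.3, ki=0.0, kd=0.7))
x_pid = simulate(TiltPID(kp=1.3, ki=1.0, kd=0.7))
```

With these numbers the PD controller parks the ball several millimetres off target, because it needs a standing error to cancel the bias, while the integral term removes that offset.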

Challenge 3 - Communicating with the robot arm

The easiest way we found to communicate with a Franka robot arm was through third-party Python libraries. What we didn’t know was that these don’t give us low-level real-time control of the robot. In hindsight, we spent too much time working with these libraries instead of switching to libfranka directly.

This single switch played a big role in getting our latency low enough to balance the ball.

Takeaways

  1. In a vision-in-the-loop system, calibration and latency beat controller cleverness.
  2. Next steps: faster camera to cut latency at the source; second view / stereo for depth so the estimator leans less on extrapolation; learned controller to absorb the plate-dependent distortion instead of calibrating it by hand.
  3. Don’t burn all your energy when there are big events on the final day, so you don’t sleep through the networking opportunities.

Thanks to RoboTUM and ESRA for hosting and thanks to my team for pulling insane all-nighters.