
1 Month Challenge: Building an AI Vision System for Clash Royale

A deep dive into creating a real-time AI system that detects units, classifies card types, estimates tower health, and recognizes cards from Clash Royale gameplay footage.

The Challenge

I set myself an ambitious goal: build a complete AI vision system for Clash Royale in just one month. The system needed to handle multiple computer vision tasks simultaneously—detect units on the battlefield, classify what type of unit each one is, estimate tower health percentages, and recognize which cards are in hand. All of this had to work in real-time on gameplay footage.

Clash Royale is a fast-paced mobile strategy game where unit positioning and timing are crucial. For an AI system to be useful for analysis or automation, it needs to understand the complete game state at any moment—what units are deployed, where they are, what condition your towers are in, and what cards you can play.

Technical Architecture

I chose YOLO (You Only Look Once) for object detection because of its excellent balance between speed and accuracy. Real-time performance was non-negotiable—the system needed to process frames fast enough to keep up with live gameplay without introducing noticeable lag.
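Part of what makes YOLO-style detection usable here is its post-processing: the network emits many overlapping candidate boxes per unit, which are filtered with non-maximum suppression (NMS). The sketch below is a minimal pure-Python version of that filtering step; the box format, scores, and threshold are illustrative assumptions, not the exact pipeline used in this project:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box in each cluster of overlapping detections."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two overlapping detections of the same unit plus one distinct unit:
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.6, 0.8]
print(nms(boxes, scores))  # → [0, 2]
```

In practice YOLO implementations run this on the GPU over hundreds of boxes per frame, but the logic is the same: suppress any detection that heavily overlaps a higher-confidence one.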

For unit classification, I trained a custom ResNet-18 model on thousands of annotated Clash Royale units. This smaller ResNet variant was chosen specifically for its efficiency—it's lightweight enough to run alongside YOLO without overwhelming the GPU, while still providing excellent classification accuracy.

The tower health estimation required a different approach. Instead of traditional object detection, I used template matching combined with color analysis to extract health bar values. Card recognition used a combination of feature matching and a small classification network trained specifically on card images.
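The health-bar idea can be shown with a toy version of the color-analysis step: once template matching locates the bar, count the fraction of its pixels still in the bar's "filled" color range. The pixel values and thresholds below are made-up stand-ins for illustration, not the actual game colors:

```python
def health_fraction(bar_pixels, is_filled):
    """Estimate health as the fraction of bar pixels matching the filled color."""
    filled = sum(1 for px in bar_pixels if is_filled(px))
    return filled / len(bar_pixels)

# Hypothetical predicate: red-ish RGB pixels count as "filled".
def looks_red(px):
    r, g, b = px
    return r > 180 and g < 80 and b < 80

# Toy row sampled across a located health bar: 6 filled pixels, 4 depleted.
row = [(220, 40, 30)] * 6 + [(40, 40, 40)] * 4
print(round(health_fraction(row, looks_red) * 100))  # → 60
```

A real implementation would sample the bar region from the frame and tolerate lighting variation, but a per-pixel color test like this is what makes ±3% accuracy plausible without any learned model.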

Data Collection Challenge

One of the biggest challenges was getting enough training data. Computer vision models are data-hungry, and I needed thousands of annotated frames showing different units, positions, lighting conditions, and game scenarios.

This is where RoyaleTrainer.com came in (which became its own separate project). I built a crowdsourcing platform where the Clash Royale community could help annotate training data. Over 500 contributors provided 82,000+ annotations, which became the foundation for training these models.

Optimization for Real-Time Performance

Getting the system to run in real-time required extensive optimization. I converted all models to ONNX format, which provides significant performance improvements over native PyTorch inference. Model quantization reduced the precision from FP32 to FP16 without meaningful accuracy loss, effectively doubling inference speed.
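FP16 quantization works because network weights tolerate the reduced precision, while halving the bytes per weight also halves memory traffic, often the real inference bottleneck. The round-trip below uses the standard library's half-float format to show the kind of rounding FP16 introduces; it illustrates the precision loss, not the actual ONNX conversion step:

```python
import struct

def to_fp16(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]

w = 0.123456789
print(to_fp16(w))            # nearest representable half-precision value
print(abs(to_fp16(w) - w))   # small but nonzero rounding error
```

Errors of this size are negligible relative to typical weight magnitudes and the noise in the data, which is why the accuracy loss from FP32 to FP16 is usually unmeasurable.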

I also implemented smart batching and frame skipping strategies. Not every task needed to run on every frame—card detection could run every few frames since cards don't change that frequently, while unit detection needed higher frequency updates to track fast-moving troops.
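The frame-skipping idea amounts to a per-task stride: each task declares how often it needs to run, and the main loop dispatches it only when its stride divides the frame index. A minimal sketch, with hypothetical task names and stride values chosen for illustration:

```python
# Hypothetical strides: unit detection every frame, tower health every 5th
# frame, card recognition every 15th frame (cards change slowly).
TASK_STRIDES = {"units": 1, "tower_health": 5, "cards": 15}

def tasks_for_frame(frame_idx):
    """Return the tasks scheduled to run on this frame."""
    return [task for task, stride in TASK_STRIDES.items() if frame_idx % stride == 0]

for i in range(3):
    print(i, tasks_for_frame(i))
# Frame 0 runs everything; frames 1-4 run only unit detection, and so on.
```

Amortized this way, the per-frame cost is dominated by the one task that genuinely needs every frame, which is what keeps the whole pipeline above 30 FPS.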

Results and Accuracy

The final system achieved impressive results across all tasks. Unit detection reached 94% mAP (mean average precision), unit classification hit 97% accuracy, tower health estimation was accurate within ±3%, and card recognition achieved 99% accuracy.

More importantly, the system maintained real-time performance, processing frames at 30+ FPS on a mid-range GPU. This made it practical for actual use rather than just a research project.

Key Takeaways

Building this system in one month taught me the importance of scoping projects realistically and making pragmatic technical choices. I couldn't build perfect models for everything, so I focused on getting each component good enough and then optimizing the integration between components.

The community aspect through RoyaleTrainer was crucial. Building tools that enable others to help you achieve your goals is a powerful force multiplier. Without those 500+ contributors, collecting that much training data myself would have taken months.

Finally, this project reinforced how important performance optimization is for computer vision systems. An accurate model that's too slow to use is less valuable than a slightly less accurate model that runs in real-time. Understanding this trade-off and making smart optimization choices is what separates research projects from production systems.

PyTorch · YOLO · ResNet-18 · Computer Vision · Game AI · Python