9 min read · AI/ML

Py-ClashBot: Full Game Automation with Computer Vision

Building an automation bot that handles complete Clash Royale gameplay—from farming gold to competitive matches—using PyTorch and OpenCV.

Beyond Simple Automation

Most game bots handle simple, repetitive tasks—clicking specific coordinates, following fixed patterns, or grinding predictable content. Py-ClashBot was different: it needed to play Clash Royale autonomously, making strategic decisions in real-time against human opponents. This required understanding the complete game state and responding intelligently to unpredictable situations.

Clash Royale is a real-time strategy game where you deploy units (cards) to attack your opponent's towers while defending your own. Every match is unique—different opponents, different decks, different strategies. The bot needed computer vision to understand what's happening on screen and decision-making logic to choose appropriate actions.

System Architecture

Py-ClashBot consists of three main components: a perception system that understands the game state, a decision engine that chooses actions, and an execution layer that carries out those actions through simulated input.
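
The three-stage split can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the project's actual code: `GameState`, `Action`, `perceive`, `decide`, `execute`, `tick`, the field names, the elixir cutoff, and the tile coordinates are all hypothetical. Perception is stubbed with pre-parsed values instead of running a vision model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GameState:
    """Structured snapshot produced by perception (hypothetical fields)."""
    elixir: int
    hand: list[str]
    enemy_units: list[str] = field(default_factory=list)


@dataclass
class Action:
    """A card to play and the arena tile to play it on (hypothetical)."""
    card: str
    tile: tuple[int, int]


def perceive(frame: dict) -> GameState:
    # Stub: a real perception stage would run detection models on pixels.
    return GameState(frame["elixir"], frame["hand"], frame["enemy_units"])


def decide(state: GameState) -> Optional[Action]:
    # Placeholder rule: answer any visible threat with the first card in hand.
    # The cutoff of 3 elixir and the tile are arbitrary illustration values.
    if state.enemy_units and state.hand and state.elixir >= 3:
        return Action(card=state.hand[0], tile=(9, 20))
    return None  # Nothing worth doing this tick.


def execute(action: Action, log: list[str]) -> None:
    # Stub: a real execution layer would send simulated input to the game.
    log.append(f"play {action.card} at {action.tile}")


def tick(frame: dict, log: list[str]) -> None:
    """One pass through perception, decision, and execution."""
    action = decide(perceive(frame))
    if action is not None:
        execute(action, log)
```

Because each stage only talks to the next through `GameState` and `Action`, any one of them can be swapped for a stub while the others are tested.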

The perception system uses the AI Vision models I developed (detailed in another blog post) to detect units, classify cards, track elixir, and monitor tower health. This provides a structured representation of the game state that the decision engine can reason about.

The decision engine implements gameplay strategies: when to play defensively versus aggressively, which cards to play in response to opponent moves, when to time pushes, and how to manage elixir. I developed these rules by analyzing thousands of matches and studying high-level strategy.

Computer Vision Foundation

The bot relies heavily on PyTorch models for unit detection and classification, with OpenCV handling lower-level image processing tasks like template matching for UI elements. ONNX conversion was crucial for performance—the bot needed to process frames quickly enough to react in real-time during fast-paced matches.
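
Template matching for a UI element comes down to sliding a small reference image over the frame and scoring every position. OpenCV's `cv2.matchTemplate` does this efficiently on real images. The pure-Python sketch below only illustrates the idea, using a sum-of-squared-differences score on tiny grayscale grids. The function name `match_template_ssd` and the toy `frame` and `button` data are made up for illustration.

```python
def match_template_ssd(image, template):
    """Return ((row, col), score) of the best match; a lower score is better."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    # Try every top-left position where the template fits inside the image.
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(tpl_h)
                for j in range(tpl_w)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy 4x5 "screenshot" with a 2x2 "button" embedded at row 1, column 2.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
button = [
    [9, 8],
    [7, 9],
]
position, score = match_template_ssd(frame, button)
```

A score of zero means a pixel-exact match. On real screenshots the best score is rarely zero, so a match is usually accepted only when the score falls under some tuned cutoff.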

One challenge was handling visual noise: animations, particle effects, and overlapping units that could confuse the detection models. I tackled this with augmentations applied to the training data (simulated overlays, targeted masking, and simulated VFX such as blurs and fireball spells) combined with a large volume of labeled data. Having extensive labeled data for each class allowed me to bake the game's random noise directly into the model's training.
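
As a rough illustration of the overlay idea, the sketch below paints one random rectangle over a training image so the model sees partially hidden units. It is a simplified stand-in, not the augmentation pipeline used in the project: the `occlude` function, its parameters, and the nested-list grayscale image are assumptions chosen to keep the example dependency-free.

```python
import random


def occlude(image, rng, max_size=3, fill=255):
    """Return a copy of a grayscale image with one random rectangle painted over."""
    height, width = len(image), len(image[0])
    box_h = rng.randint(1, min(max_size, height))
    box_w = rng.randint(1, min(max_size, width))
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    out = [row[:] for row in image]  # Copy so the original sample is untouched.
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            out[r][c] = fill
    return out


rng = random.Random(0)  # Seeded so the augmentation is reproducible.
sample = [[10] * 6 for _ in range(6)]
augmented = occlude(sample, rng)
```

In practice the filled rectangle would be replaced by something closer to the game's own effects, such as a pasted spell sprite or a blurred patch.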

Strategic Decision Making

Making intelligent gameplay decisions required encoding strategic knowledge into rules and heuristics. The bot understands concepts like elixir advantage (having more resources than your opponent), card cycling (playing cheap cards to access better ones), and counter-play (deploying units that effectively counter opponent threats).
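
These three concepts can be written down as small heuristics. The sketch below is illustrative only: the `CARD_COST` values and the `COUNTERS` table are placeholder data, and `pick_counter`, `elixir_trade`, and `cheapest_cycle_card` are hypothetical helpers rather than functions from the bot.

```python
from typing import Optional

# Placeholder data for illustration; not authoritative game values.
CARD_COST = {"skeletons": 1, "knight": 3, "mini_pekka": 4, "giant": 5}
COUNTERS = {"giant": ["mini_pekka", "skeletons"]}


def pick_counter(threat: str, hand: list[str], elixir: int) -> Optional[str]:
    """Counter-play: cheapest affordable card in hand that answers the threat."""
    options = [
        card for card in COUNTERS.get(threat, [])
        if card in hand and CARD_COST[card] <= elixir
    ]
    return min(options, key=lambda card: CARD_COST[card], default=None)


def elixir_trade(threat: str, answer: str) -> int:
    """Elixir advantage: positive when the defender spent less than the attacker."""
    return CARD_COST[threat] - CARD_COST[answer]


def cheapest_cycle_card(hand: list[str], elixir: int) -> Optional[str]:
    """Card cycling: cheapest affordable card, played to reach better ones."""
    affordable = [card for card in hand if CARD_COST[card] <= elixir]
    return min(affordable, key=lambda card: CARD_COST[card], default=None)
```

For example, with these placeholder costs, answering a 5-elixir threat with a 1-elixir card is a +4 trade for the defender.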

The decision-making is handled by a single monolithic function that's still in active development. Rather than splitting strategy across modular subsystems, the bot uses one central decision maker that processes the game state and chooses actions. This approach keeps the logic straightforward while I continue refining its strategic capabilities.

Handling Edge Cases

Real-world game automation involves countless edge cases. Network lag could desynchronize the bot's understanding of game state. UI changes from updates could break detection. Unexpected opponent strategies could confuse the decision logic. Building a robust bot meant anticipating and handling these failure modes.

I implemented safety mechanisms: confidence thresholds for detections, fallback behaviors when uncertain, periodic state validation to catch desynchronization, and graceful error handling that could recover from temporary issues without crashing. The bot needed to be resilient enough to run unsupervised for hours.
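
Two of these mechanisms, confidence thresholds and fallback behavior, fit in a few lines. This is a minimal sketch under assumptions: the `Detection` type, the `MIN_CONFIDENCE` value of 0.6, and the `"fallback"` sentinel are hypothetical and not taken from the bot's code.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float


MIN_CONFIDENCE = 0.6  # Assumed cutoff; a real value would be tuned per model.


def filter_detections(detections, threshold=MIN_CONFIDENCE):
    """Keep only detections the model is reasonably sure about."""
    return [d for d in detections if d.confidence >= threshold]


def safe_decide(decide, detections):
    """Run decide() on trusted detections; fall back instead of crashing."""
    trusted = filter_detections(detections)
    if not trusted:
        return "fallback"  # Nothing trustworthy on screen: take the safe default.
    try:
        return decide(trusted)
    except Exception:
        # A bug in the decision logic should not stop an unattended run.
        return "fallback"
```

The caller decides what the fallback means, for example waiting one tick and re-reading the screen before acting.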

Performance and Results

Py-ClashBot successfully automated the entire Clash Royale gameplay loop. It could farm gold consistently, progress accounts through leagues, and even compete at moderate trophy levels. The bot's win rate varied by trophy range—higher against casual players, lower against skilled human opponents with optimized decks.

The open-source nature of the project attracted a community of users who contributed improvements, reported bugs, and shared strategies. This collaborative development significantly improved the bot's capabilities beyond what I could achieve alone.

Technical Challenges

One of the most difficult challenges was timing. Clash Royale is fast-paced, and delays of even a few hundred milliseconds can matter. The bot's entire perception-decision-action loop needed to complete quickly enough to respond to threats in real-time. This required careful optimization at every stage.
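
One way to keep the loop honest about latency is to time each stage and compare the total against a budget. The sketch below assumes a 200 ms budget purely for illustration; `timed_tick` and the stage names are hypothetical.

```python
import time


def timed_tick(stages, budget_s=0.2):
    """Run (name, fn) stages in order, feeding each output to the next stage.

    Returns (timings, within_budget), where timings maps each stage name to
    its duration in seconds.
    """
    timings = {}
    value = None
    start = time.perf_counter()
    for name, fn in stages:
        stage_start = time.perf_counter()
        value = fn(value)
        timings[name] = time.perf_counter() - stage_start
    total = time.perf_counter() - start
    return timings, total <= budget_s


timings, within_budget = timed_tick([
    ("perceive", lambda _frame: {"elixir": 5}),  # Stub stages for illustration.
    ("decide", lambda state: "wait"),
    ("execute", lambda action: None),
])
```

Per-stage timings show where to spend optimization effort, since a slow perception step and a slow decision step call for different fixes.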

Another challenge was adapting to meta changes. As the game balanced cards and introduced new ones, the bot's strategies needed updates. Building a system flexible enough to accommodate new cards and counter-strategies without requiring complete rewrites was essential for long-term viability.
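
One common way to get that flexibility is to keep card knowledge in data rather than in code, so a balance change or new card becomes a registry update. The sketch below is an assumption about how this could look, not a description of the bot's internals: `CardInfo`, `CARD_REGISTRY`, `register_card`, `strong_against`, and the card entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardInfo:
    cost: int
    counters: tuple[str, ...] = ()  # Threats this card is considered good against.


CARD_REGISTRY: dict[str, CardInfo] = {}


def register_card(name, cost, counters=()):
    """Add or update a card; re-registering a name overwrites the old entry."""
    CARD_REGISTRY[name] = CardInfo(cost, tuple(counters))


def strong_against(threat):
    """Sorted names of registered cards that list the threat as a target."""
    return sorted(
        name for name, info in CARD_REGISTRY.items() if threat in info.counters
    )


# Placeholder entries; costs and matchups are illustrative only.
register_card("giant", 5)
register_card("mini_pekka", 4, counters=["giant"])
# A newly released card only needs a new entry, not new decision code.
register_card("new_card", 3, counters=["giant"])
answers = strong_against("giant")
```

The decision logic queries the registry instead of naming cards directly, so it does not need to change when entries are added or rebalanced.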

Ethical Considerations

Game automation exists in a gray area ethically and legally. While technically interesting to build, automation bots can negatively impact game communities and violate terms of service. This project was primarily an exploration of computer vision and AI capabilities, demonstrating what's possible with current technology.

The open-source nature of Py-ClashBot also serves an educational purpose—showing how game automation works helps game developers build better anti-cheat systems and helps players understand the techniques that automated opponents might use.

Key Lessons

This project taught me that building AI systems for dynamic, adversarial environments is fundamentally different from static prediction tasks. The bot needed to be robust to visual noise, adaptive to changing strategies, and fast enough for real-time interaction—requirements that pushed my understanding of computer vision and software engineering.

Building ML vision models for something as visually complex as Clash Royale, with 107+ units and 150+ cards that each have unique visual signatures and animations, parallels real-world CV applications like autonomous driving and industrial automation. The core problems in a game environment (many classes, visual noise, and real-time detection) are the same ones that show up in high-value commercial applications, so getting this to work demonstrates techniques that should carry over to more serious problems.

I also learned the importance of modular architecture at the component level. Keeping perception, decision-making, and execution separate made the system easier to debug, test, and improve, even with the decision logic itself living in a single function. When one component had issues, I could isolate and fix it without touching the others.

PyTorch, OpenCV, ONNX, Computer Vision, Game Automation, Python, AI