A little course on Reinforcement Learning Environments for evaluating and training Language Models.
Unlike classic fine-tuning, RL environments let models explore and improve beyond what curated datasets can teach.
In this course, we'll build a Tic Tac Toe environment and use it to transform a Small Language Model
(LiquidAI/LFM2-2.6B) into a master player that beats gpt-5-mini.
➡️ Start here: Chapter 1 - Agents, Environments, and LLMs
🎥 Video walkthrough @ AI Engineer
🤗🕹️ Play against Mr. Tic Tac Toe
- AI Engineers: You are familiar with classic LLM fine-tuning techniques (Supervised Fine-Tuning) but have little to no experience with Reinforcement Learning.
- Traditional RL Practitioners: You know how RL works, but you want to learn how to apply it to Language Models.
- Curious Tinkerers: You keep hearing about "reasoning models" and RL post-training, and you want to see how it works under the hood.
- Agents, Environments, and LLMs: mapping Reinforcement Learning concepts to the LLM domain.
- Verifiers: an open-source library to build RL environments as software artifacts.
- Developing a Tic Tac Toe environment with Verifiers
- Evaluating existing models with RL environments
- Training preparation and synthetic data generation for Supervised Fine-Tuning
- Supervised Fine-Tuning warm-up
- Reinforcement Learning training to teach our model Tic Tac Toe
- Reinforcement Learning pt.2: towards Tic Tac Toe mastery
- What did not work: a post-mortem of my failed Tic Tac Toe experiments
- What we have learned and the future
This course is not affiliated with any of the following projects:
| Project | Description |
|---|---|
| Verifiers | An open-source library by Prime Intellect for building RL environments as software artifacts |
| Liquid AI models | Small, fast Language Models based on a novel architecture |
| vLLM | High-throughput and memory-efficient serving engine for LLMs |
Stefano Fiorucci/anakin87
- 🏗️ AI orchestration by day (Haystack developer)
- Small Language Models post-training, RL tinkering by night 🌙
I built this course from hands-on experimentation. If you spot any errors, please open a GitHub issue.
Feel free to follow me on my social profiles: GitHub, LinkedIn, X, Hugging Face.