A little course on Reinforcement Learning Environments for evaluating and training Language Models.
Unlike classic fine-tuning, RL environments let models explore and improve beyond what curated datasets can teach.
In this course, we'll build a Tic Tac Toe environment and use it to transform a Small Language Model
(LiquidAI/LFM2-2.6B) into a master player that beats gpt-5-mini.
➡️ Start here: Chapter 1 - Agents, Environments, and LLMs
🎥 Video walkthrough @ AI Engineer
🤗🕹️ Play against Mr. Tic Tac Toe
- AI Engineers: You are familiar with classic LLM fine-tuning techniques (Supervised Fine-Tuning) but have little to no experience with Reinforcement Learning.
- Traditional RL Practitioners: You know how RL works, but you want to learn how to apply it to Language Models.
- Curious Tinkerers: You keep hearing about "reasoning models" and RL post-training, and you want to see how it works under the hood.
- Agents, Environments, and LLMs: mapping Reinforcement Learning concepts to the LLM domain.
- Verifiers: an open-source library to build RL environments as software artifacts.
- Developing a Tic Tac Toe environment with Verifiers
- Evaluating existing models with RL environments
- Training preparation and synthetic data generation for Supervised Fine-Tuning
- Supervised Fine-Tuning warm-up
- Reinforcement Learning training to teach our model Tic Tac Toe
- Reinforcement Learning pt.2: towards Tic Tac Toe mastery
- What did not work: a post-mortem of my failed Tic Tac Toe experiments
- What we have learned and the future
This course is not affiliated with any of the following projects:
| Project | Description |
|---|---|
| Verifiers | An open-source library by Prime Intellect for building RL environments as software artifacts |
| Liquid AI models | Small, fast Language Models based on a novel architecture |
| vLLM | High-throughput and memory-efficient serving engine for LLMs |
Stefano Fiorucci/anakin87
- 🏗️ AI orchestration by day (Haystack developer)
- Small Language Models post-training, RL tinkering by night 🌙
I built this course from hands-on experimentation. If you spot any errors, please open a GitHub issue.
Feel free to follow me on my social profiles: GitHub, LinkedIn, X, Hugging Face.