Search results for "grpo"
🌱 A little course on Reinforcement Learning Environments for evaluating and training Language Models
Agentic RAG R1 Framework via Reinforcement Learning
Autonomous Agents (LLMs) research papers. Updated Daily.
Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc
Curated list of ChatGPT prompts from the top-rated GPTs in the GPTs Store. Prompt engineering, prompt attacks, and prompt protection. Advanced prompt engineering papers.
2026, the year of swarm agents: a collection of AI agent resources covering swarm agents, agent teams, AI coding, skills, memory, evolution, agentic RL, and more.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Open Framework for AI Agents to play Red Alert through Reinforcement Learning
Robust, fast, scalable, and sandboxed open-source online code execution system for humans and AI.
