LLM inference in C/C++
Unified framework for building enterprise RAG pipelines with small, specialized models