1 package • ⭐ 76,155 total stars
A high-throughput and memory-efficient inference and serving engine for LLMs