freshcrate

Search results for "inference"

91 results found (Python)
ContextPilot v0.4.1 🌿 Growing ⭐79

Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.

lm-proxy v3.2.2 🌿 Growing ⭐114

OpenAI-compatible HTTP LLM proxy / gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch). Lightweight, extensible Python/FastAPI: use as library or standalone service.

rasputin-memory v0.9.1 🌱 Seedling ⭐17

The memory system your AI agent deserves. 4-stage hybrid retrieval (Vector + BM25 + Knowledge Graph + Neural Reranker) in <150ms. Self-hosted, $0/query, built for agents that need to actually rememb

claude-code-plugins-plus-skills v4.26.0 🌳 Mature ⭐1,995

423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.

mcp-memory-service v10.39.1 🌳 Mature ⭐1,643

Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API + knowledge graph + autonomous consolidation.

cyllama 0.2.11 🌱 Seedling ⭐22

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

litellm v1.83.7-stable 🌳 Mature ⭐42,951

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi

Constrained-Text-Generation-Studio 0.0.0 🌿 Growing ⭐216

Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) workshop, jointly held at (COLING 2022)

LLM-Agents-Ecosystem-Handbook 0.0.0 🌳 Mature ⭐508

One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.

pinecone-python-client v8.1.2 🌿 Growing ⭐432

The Pinecone Python client

droid-llm-hunter v1.0.0 🌿 Growing ⭐100

Droid LLM Hunter is a tool to scan for vulnerabilities in Android applications using Large Language Models (LLMs).

RAGLight 3.4.7 🌳 Mature ⭐658

RAGLight is a modular framework for Retrieval-Augmented Generation (RAG). It makes it easy to plug in different LLMs, embeddings, and vector stores, and now includes seamless MCP integration to connec

RAG-Anything v1.2.10 🏛️ Flagship ⭐16,761

"RAG-Anything: All-in-One RAG Framework"

openlit openlit-1.18.1 🌿 Growing ⭐2,358

Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers,

GhostDesk v7.1.0 🌱 Seedling ⭐39

Give any AI agent a full desktop: it sees the screen, clicks, types, and runs apps like a human. Automate anything with a UI: browsers, legacy software, internal tools. No API needed. One Docker comm

RAGElo 0.4.0 🌿 Growing ⭐128

RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker

JRVS 0.0.0 🌿 Growing ⭐236

JRVS AI Agent with JARCORE autonomous coding engine - RAG knowledge base, web scraping, calendar, code generation. Powered by whatever local AI you choose.

aura main@2026-04-21 🌱 Seedling ⭐47

A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering (CAA), Global Workspace Theory, active inference, and 72 consciousness modules, running loca

vllm v0.19.1 🌿 Growing ⭐76,155

A high-throughput and memory-efficient inference and serving engine for LLMs

monocle v0.7.8 🌿 Growing ⭐72

Monocle is a framework for tracing GenAI app code. This repo contains implementation of Monocle for GenAI apps written in Python.

Agentic-RAG-R1 0.0.0 🌿 Growing ⭐412

Agentic RAG R1 Framework via Reinforcement Learning

vllm-mlx v0.2.8 🌿 Growing ⭐798

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac

arthur-engine 2.1.529 🌿 Growing ⭐75

Make AI work for Everyone - Monitoring and governing for your AI/ML

Gito v4.0.3 🌿 Growing ⭐210

An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.

DeepCode v1.2.0 🏛️ Flagship ⭐15,244

"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"

mcp 2026.04.20260414152327 🌿 Growing ⭐8,740

Official MCP Servers for AWS

Anthropic-Cybersecurity-Skills v1.2.0 🌿 Growing ⭐5,443

754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Cop

RIGEL 0.0.0 🌱 Seedling ⭐26

A Multi-Agentic AI Assistant/Builder

server-nexe v1.0.0-beta 🌱 Seedling ⭐9

Local AI server with persistent memory, RAG, and multi-backend inference (MLX / llama.cpp / Ollama). Runs entirely on your machine, with zero data sent to external services.

LLM-Agent-Paper-daily main@2026-04-21 🌱 Seedling ⭐20

Automatically Update LLM-Agent Papers Daily using GitHub Actions (Update Every 12 Hours)

awesome-code-agents main@2026-04-20 🌿 Growing ⭐94

A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding: they're redefining how software changes the world.

orbit v2.6.6 🌿 Growing ⭐250

One API for 20+ LLM providers, your databases, and your files: a self-hosted, open-source AI gateway with RAG, voice, and guardrails.

AGI-Alpha-Agent-v0 main@2026-04-18 🌿 Growing ⭐283

META‑AGENTIC α‑AGI 👁️✨: Mission 🎯 End‑to‑end: Identify 🔍 → Out‑Learn 📚 → Out‑Think 🧠 → Out‑Design 🎨 → Out‑Strategise ♟️ → Out‑Execute ⚑

OmicsClaw main@2026-04-18 🌿 Growing ⭐116

Conversational & memory-enabled AI research partner for multi-omics analysis. From biological idea to full research paper.

ag2 v0.12.0 🌿 Growing ⭐4,383

AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x

sdk-python v1.36.0 🌿 Growing ⭐5,602

A model-driven approach to building AI agents in just a few lines of code.

AReaL v1.0.3 🌿 Growing ⭐5,017

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

llmware v0.4.6 🌿 Growing ⭐14,857

Unified framework for building enterprise RAG pipelines with small, specialized models

agenticSeek main@2026-04-11 🌿 Growing ⭐25,891

Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin993

EvoScientist v0.0.7 🌿 Growing ⭐2,731

🔬 Harness Vibe Research with Self-evolving AI Scientists

UltraRAG v0.3.0.2 🌿 Growing ⭐5,480

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

qwe-qwe v0.17.6 🌱 Seedling ⭐35

⚑ Lightweight offline AI agent for local models. No cloud, no API keys, just your GPU.

swing-trading-agent 0.0.0 🌱 Seedling ⭐7

Multi-agent swing trading system: automated screening, research, and execution with backtesting and live trading

0xClaw 0.0.0 🌱 Seedling ⭐10

🦀 The first autonomous hackathon agent: stop assisting and start competing (🏆 Hackathon Champion Project).

zettelforge v2.4.0 🌱 Seedling ⭐25

Agentic memory for CTI in Python: STIX knowledge graphs, threat-actor alias resolution, offline-first RAG, MCP server for Claude Code and LangChain agents

My_AI v7.2.0 🌱 Seedling ⭐7

Local-first AI assistant: 9 specialized agents (code, web, debug, security…), 10M token vector memory, mobile relay via secure tunnel, real-time web search and document processing. Runs 100% on your

Ultimate-Agent-Directory 0.0.0 🌱 Seedling ⭐51

🤖 The most comprehensive directory of AI agent frameworks, platforms, tools, and resources - hundreds of curated entries covering open-source, no-code, enterprise, and autonomous solutions. NEW Boil

llm_context_benchmarks 0.0.0 🌱 Seedling ⭐59

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz

apiclaw v2.0.0 🌱 Seedling ⭐7

The API layer for AI agents. Dashboard + 22K APIs + 18 Direct Call providers. MCP native.

Open-Sable v1.7.0 🌱 Seedling ⭐18

Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int

codexlens-search v0.8.0 🌱 Seedling ⭐44

Lightweight semantic code search engine: 2-stage vector + FTS + RRF fusion + MCP server for Claude Code

vllm-cli v0.2.5 💀 Dormant ⭐491

A command-line interface tool for serving LLMs using vLLM.

Somi Mineralization 🌱 Seedling ⭐21

Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features, fully self-hosted via Ollama.

daiv v2.0.0 🌱 Seedling ⭐18

Your AI-powered SWE teammate, built into your git workflow

reina v1.0.0 🌱 Seedling ⭐35

Autonomous AI agent for Crustocean, powered by Hermes Agent from Nous Research

LettuceDetect 0.1.8 💀 Dormant ⭐565

Lightweight hallucination detection framework for RAG applications

robots v0.3.8 🌱 Seedling ⭐44

Control robots and physical hardware with natural language through Strands Agents.

cloneme 0.0.0 💀 Dormant ⭐38

CloneMe is an advanced AI platform that builds your digital twin, an AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th

Compiler v2 🌱 Seedling ⭐20

A tool that compiles messy natural language prompts into a structured intermediate representation (IR) and optionally sends them to LLMs like ChatGPT for cleaner, more reliable responses.

uniAI 0.0.0 🌱 Seedling ⭐1

Syllabus-aware RAG study assistant for university students. Answers strictly from your own notes & PDFs, unit-scoped retrieval, cross-encoder reranking, and a hallucination gate, built to help studen

evo-agents master@2026-04-19 🌱 Seedling ⭐3

Complete Workspace Template for OpenClaw - Full agent lifecycle with unified memory system (Markdown + SQLite), self-evolution, RAG. Not for SubAgent/Skill use.

KAG v0.8.0 💀 Dormant ⭐8,688

KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge base

multi-agent-orchestration-framework v0.1.0 🌱 Seedling ⭐26

Modular multi-agent orchestration framework powered by LangGraph and FastAPI.

Grinta-Agent main@2026-04-20 🌱 Seedling ⭐1

Local-first autonomous coding agent that plans, executes, validates, and finishes software tasks end-to-end.

Government-Citizen-Services-Voice-Agent main@2026-04-15 🌱 Seedling ⭐1

Autonomous, multilingual AI voice agent using ElevenLabs, LangGraph, and RAG for government services

flashinfer-python 0.6.8.post1 🌱 Seedling

FlashInfer: Kernel Library for LLM Serving

torchao 0.17.0 🌱 Seedling

Package for applying ao techniques to GPU models

azure-ai-inference 1.0.0b9 🌱 Seedling

Microsoft Azure AI Inference Client Library for Python

roboflow 1.3.3 🌱 Seedling

Official Python package for working with the Roboflow API

xgrammar 0.1.33 🌱 Seedling

Efficient, Flexible and Portable Structured Generation

ctranslate2 4.7.1 🌱 Seedling

Fast inference engine for Transformer models

cmdstanpy 1.3.0 🌱 Seedling

Python interface to CmdStan

genai-prices 0.0.57 🌱 Seedling

Calculate prices for calling LLM inference APIs.

timm 1.0.26 🌱 Seedling

PyTorch Image Models

pinecone 8.1.2 🌱 Seedling

Pinecone client and SDK

qdrant-client 1.17.1 🌱 Seedling

Client library for the Qdrant vector search engine

tritonclient 2.67.0 🌱 Seedling

Python client library and utilities for communicating with Triton Inference Server

keras 3.14.0 🌱 Seedling

Multi-backend Keras

blis 1.3.3 🌱 Seedling

The Blis BLAS-like linear algebra library, as a self-contained C-extension.

cohere 6.1.0 🌱 Seedling

No description

sagemaker 3.8.0 🌱 Seedling

Open source library for training and deploying models on Amazon SageMaker.

sglang 0.5.10.post1 🌱 Seedling

SGLang is a fast serving framework for large language models and vision language models.

astroid 4.1.2 🌱 Seedling

An abstract syntax tree for Python with inference support.

transformers 5.5.4 🌱 Seedling

Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

mypy 1.20.2 🌱 Seedling

Optional static typing for Python

setuptools-scm 10.0.5 🌱 Seedling

the blessed package to manage your versions by scm tags

google-cloud-aiplatform 1.148.1 🌱 Seedling

Vertex AI API client library

medicalAI v1.2.9-rc ⚰️ Archived ⭐21

Medical-AI is an AI framework specifically for Medical Applications https://aibharata.github.io/medicalAI/