# jdatamunch-mcp

> Token-efficient MCP server for tabular data retrieval. Index CSV/Excel files, query rows, aggregate — 99%+ token savings vs raw file reads.

- **URL**: https://www.freshcrate.ai/projects/jdatamunch-mcp
- **Author**: jgravelle
- **Category**: MCP Servers
- **Latest version**: `v1.13.0` (2026-05-14)
- **License**: NOASSERTION
- **Source**: https://github.com/jgravelle/jdatamunch-mcp
- **Language**: Python
- **GitHub**: 36 stars, 11 forks
- **Registry**: github
- **Tags**: `python`

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v1.13.0` | 2026-05-14 | High | Reported by @AlexJ-StL on jcm#297: Google Antigravity caps MCP servers at 50 tools, but the full munch suite ships 81 + 60 + 35 = 176. Sibling-parity port of the same knobs jdoc gained in v1.64.0. **New config knobs:** `JDATAMUNCH_TOOL_PROFILE=core\|standard\|full` (default `full`): `core` (10 tools) is index + describe + row-retrieval essentials; `standard` (~30 tools) is core + analysis tools; `full` (35 tools) is everything, current behavior. `JDATAMUNCH_DISABLED_TOOLS=tool1,tool2,... |
| `v1.5.0` | 2026-05-12 | High | **Cell-level redaction on tool output.** Tabular tools now scrub PII and credentials from cells before returning them to MCP clients. CSV / Excel / Parquet / JSONL data routinely carry emails, SSNs, credit-card numbers, API keys, and PEM bodies in raw columns; those cells would otherwise flow straight into LLM context where they may be cached, logged, or reflected to a tool downstream. The default policy is **ON**; callers opt out per call with `redact=False`. **New:** **`src/jdatamu |
| `v1.4.0` | 2026-04-28 | High | Closes the full Phase C list in todo.md. **317 tests passing.** Fully backward-compatible. With 1.4.0, the entire Phase A + B + C roadmap is shipped. **Aggregation:** **`aggregate(approximate=True)`** (C1), an approximate-mode path: `count_distinct` → HyperLogLog (~2% standard error); `median` → t-digest (~1% accuracy); `sum` / `avg` → sampled estimator with 95% confidence-interval half-width. Whole-dataset only (no group_by/having/order_by). For very large joined datasets where exact |
| `v0.8.4` | 2026-04-15 | High | **Documentation:** Added "Works with" section to README with Hermes Agent integration config. Submitted optional skill PR to [NousResearch/hermes-agent#10413](https://github.com/NousResearch/hermes-agent/pull/10413). |
| `v0.8.3` | 2026-04-09 | High | **New features:** **`meta_fields` support**: control which `_meta` fields appear in tool responses via the `JDATAMUNCH_META_FIELDS` env var. Matches jcodemunch-mcp's `meta_fields` affordance. Unset / `[]` = strip `_meta` entirely (default, maximum token savings); `null` / `all` / `*` = include all fields; comma-separated list = include only those fields (e.g. `timing_ms,powered_by`). **Tests:** 11 new tests (228 total). `pip install --upgrade jdatamunch-mcp` |
| `v0.8.2` | 2026-04-08 | Medium | **Documentation:** **README.md rewrite**: all 18 tools organized by category (indexing, exploration, querying, analysis, management), file format table, semantic search, cross-dataset joins, correlations, NL summaries, data quality tools, built-in guardrails, full configuration reference. **QUICKSTART.md**: new beginner-friendly guide (install, connect, index, query in three steps). **USER-MANUAL.md**: comprehensive manual for non-developer users (analysts, finance, ops) covering all tools |
| `v0.8.1` | 2026-04-08 | Medium | **New features:** **`list_repos()` tool**: list GitHub repositories indexed via `index_repo`. Shows repo name, HEAD SHA, dataset count, total rows, total size, and dataset names. **Tests:** 8 new tests (217 total, 10 skipped for optional deps). **Full Changelog**: https://github.com/jgravelle/jdatamunch-mcp/compare/v0.8.0...v0.8.1 |
| `v0.8.0` | 2026-04-08 | Medium | **New features:** **Semantic / embedding search**: `search_data` now supports `semantic=true` for embedding-based column search. Queries like "where did the crime happen" match `AREA NAME` even without keyword overlap. Three new parameters: `semantic`, `semantic_weight`, `semantic_only`. **`embed_dataset(dataset)` tool**: precompute column embeddings. Optional warm-up so first semantic query returns immediately. **Three embedding providers** (first configured wins): sentence-transform |
| `v0.7.1` | 2026-04-08 | Medium | **New features:** **`delete_dataset(dataset)` tool**: remove an indexed dataset and its SQLite store, freeing disk space. Returns rows/columns removed and bytes freed. **Bug fixes:** Fixed unclosed SQLite connections in `create_table` and `create_indexes` that caused `PermissionError` on Windows when deleting datasets (WAL file locks). **Tests:** 26 new tests (177 total, 10 skipped for optional deps). **Install:** `pip install --upgrade jdatamunch-mcp` |
| `v0.7.0` | 2026-04-08 | Medium | **New features:** **`join_datasets(dataset_a, dataset_b, join_column_a, join_column_b)` tool**: SQL JOIN across two indexed datasets via SQLite `ATTACH DATABASE`. Supports `inner`, `left`, `right`, and `cross` join types. Column projection (`columns_a`/`columns_b`), per-side filters (`filters_a`/`filters_b`), ordering, and pagination. Handles column-name collisions with `__b` suffix. Row limit capped at 500, 30 columns per side. **Tests:** 20 new tests (171 total, 10 skipped for optional dep |
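
The `JDATAMUNCH_META_FIELDS` rules listed for `v0.8.3` can be sketched roughly as below. This is an illustration of the three documented cases only, not the project's code: `parse_meta_fields` and `apply_meta_fields` are hypothetical helper names, and the real server may parse the variable differently (for example, it may accept JSON list syntax beyond `[]`).

```python
import os


def parse_meta_fields(raw):
    """Interpret a JDATAMUNCH_META_FIELDS value (hypothetical helper).

    Returns None to keep every `_meta` field, an empty set to strip
    `_meta` entirely, or a set of field names to keep.
    """
    if raw is None:
        return set()  # unset: strip `_meta` (the documented default)
    value = raw.strip()
    if value in ("", "[]"):
        return set()  # `[]`: strip `_meta`; empty string treated the same (assumption)
    if value.lower() in ("null", "all", "*"):
        return None  # include all fields
    # Comma-separated list: include only the named fields.
    return {name.strip() for name in value.split(",") if name.strip()}


def apply_meta_fields(response, raw):
    """Return a copy of `response` with `_meta` filtered per `raw`."""
    keep = parse_meta_fields(raw)
    result = dict(response)  # shallow copy; the caller's dict is untouched
    meta = result.get("_meta")
    if meta is None or keep is None:
        return result
    if not keep:
        result.pop("_meta")
        return result
    result["_meta"] = {k: v for k, v in meta.items() if k in keep}
    return result


# Example response shape; only `timing_ms` and `powered_by` are field
# names taken from the release notes, the rest is illustrative.
response = {
    "rows": [[1, "a"]],
    "_meta": {"timing_ms": 12, "powered_by": "jdatamunch-mcp"},
}
filtered = apply_meta_fields(response, os.environ.get("JDATAMUNCH_META_FIELDS"))
```

With the variable unset, `filtered` carries no `_meta` key, which matches the "maximum token savings" default described in the release notes.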

## Citation

- HTML: https://www.freshcrate.ai/projects/jdatamunch-mcp
- Markdown: https://www.freshcrate.ai/projects/jdatamunch-mcp.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/jdatamunch-mcp/deps

_Generated by freshcrate.ai. Indexes GitHub releases for AI-agent ecosystem packages._
