
jdatamunch-mcp

Token-efficient MCP server for tabular data retrieval. Index CSV/Excel files, query rows, and aggregate, with 99%+ token savings vs raw file reads.

Quickstart - https://github.com/jgravelle/jdatamunch-mcp/blob/main/QUICKSTART.md

FREE FOR PERSONAL USE

Use it to make money, and Uncle J. gets a taste. Fair enough? details


Documentation

Doc What it covers
QUICKSTART.md Zero-to-indexed in three steps
USER-MANUAL.md Full guide for analysts, ops, and non-developers

Cut spreadsheet token usage by 99.997%

Most AI agents explore tabular data the expensive way:

dump the whole file into the prompt → skim a million irrelevant rows → repeat.

That is not "a little inefficient." That is a token incinerator.

A 255 MB CSV file with 1 million rows costs 111 million tokens if you paste it raw. A single describe_dataset call answers the same orientation question in 3,849 tokens.

That is a 25,333× reduction: measured, not estimated, on a real 1M-row public dataset.
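The arithmetic behind these figures is easy to reproduce. The sketch below uses only the two token counts quoted above; the function names are illustrative. Dividing the raw count by the describe_dataset count gives roughly 28,846×, and the quoted 25,333× corresponds to a response of about 4,383 tokens. The text does not say which response size that factor was measured against, so treat the exact multiplier as approximate. The 99.997% savings figure follows from the same two numbers.

```python
# Token counts quoted in the benchmark text.
RAW_TOKENS = 111_028_360   # raw 255 MB CSV pasted into the prompt
DESCRIBE_TOKENS = 3_849    # one describe_dataset response


def reduction_factor(raw: int, retrieved: int) -> float:
    """How many times fewer tokens the retrieval path uses."""
    return raw / retrieved


def savings_percent(raw: int, retrieved: int) -> float:
    """Share of the raw token cost that is avoided, in percent."""
    return (1 - retrieved / raw) * 100


factor = reduction_factor(RAW_TOKENS, DESCRIBE_TOKENS)
percent = savings_percent(RAW_TOKENS, DESCRIBE_TOKENS)
print(f"{factor:,.0f}x fewer tokens, {percent:.3f}% saved")
```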

jDataMunch indexes the file once and lets agents retrieve only the exact data they need: column profiles, filtered rows, server-side aggregations, cross-dataset joins, and semantic search, all with SQL precision.

Benchmark: LAPD crime records (1,004,894 rows, 28 columns, 255 MB)
Baseline (raw file): 111,028,360 tokens | jDataMunch: ~3,849 tokens | 25,333× reduction
Methodology & harness · Full results

Task | Traditional approach | With jDataMunch
Understand a dataset | Paste entire CSV | describe_dataset → column names, types, cardinality, samples
Find relevant columns | Read every row | search_data → column-level results with IDs
Answer a filtered question | Load millions of rows | get_rows with structured filters → only matching rows
Compute a group-by | Return all data | aggregate → server-side SQL, one result set
Compare two datasets | Load both entirely | join_datasets → SQL JOIN across indexed stores
Find column relationships | Export to spreadsheet | get_correlations → pairwise Pearson correlations

Index once. Query cheaply. Keep moving. Precision retrieval beats brute-force context.


jDataMunch MCP

Structured tabular data retrieval for AI agents


Commercial licenses

jDataMunch-MCP is free for non-commercial use.

Commercial use requires a paid license.

jDataMunch-only licenses

Want the full jMunch suite?

Stop paying your model to read the whole damn spreadsheet.

jDataMunch turns tabular data exploration into structured retrieval.

Instead of forcing an agent to load an entire CSV, scan millions of rows, and burn through context just to find the right column name, jDataMunch lets it navigate by what the data is and retrieve only what matters.

That means:

  • 25,333× lower data-reading token usage on a 1M-row CSV (measured)
  • less irrelevant context polluting the prompt
  • faster dataset orientation: one call tells you everything about the schema
  • accurate filtered queries: the agent asks for Hollywood assaults, it gets Hollywood assaults
  • server-side aggregations: GROUP BY runs in SQLite, not inside the context window
  • cross-dataset joins: combine two indexed files in a single SQL query
  • semantic search: find columns by meaning, not just keyword match
  • natural-language summaries: auto-generated descriptions of every column and dataset

It indexes your files once using a streaming parser and SQLite, stores column profiles and row data with proper type affinity, and retrieves exactly what the agent asked for instead of re-loading the entire file on every question.


Supported file formats

Format | Extensions | Install extra
CSV / TSV | .csv, .tsv | (built-in)
Excel | .xlsx, .xls | pip install "jdatamunch-mcp[excel]"
Parquet | .parquet | pip install "jdatamunch-mcp[parquet]"
JSONL / NDJSON | .jsonl, .ndjson | (built-in)

Why agents need this

Most agents still handle spreadsheets like someone who prints the entire internet before reading one article:

  • paste the whole CSV to answer a narrow question
  • re-load the same file repeatedly across tool calls
  • consume column headers, empty cells, malformed rows, and irrelevant records
  • burn context window on data that was never part of the question

jDataMunch fixes that by giving them a structured way to:

  • describe a dataset's schema before touching any row data
  • search for the specific column that holds the answer, by keyword or meaning
  • retrieve only the rows that match the filter
  • run aggregations server-side and get back a single result set
  • join two datasets without loading either into the prompt
  • orient themselves with samples before committing to a full query
  • detect data-quality issues and column correlations automatically

Agents do not need bigger context windows.

They need better aim.


What you get

Column-level retrieval

Understand a dataset's full schema (types, cardinality, null rates, value distributions, samples, and natural-language summaries) in a single sub-10ms call. No rows loaded.

Filtered row retrieval

Structured filters with 10 operators (eq, neq, gt, gte, lt, lte, contains, in, is_null, between). All parameterized SQL, so there is no injection surface. Hard cap of 500 rows per call to protect context budgets.

Server-side aggregations

GROUP BY with count, sum, avg, min, max, count_distinct, median. The computation stays in SQLite. One compact result set comes back instead of the data the model would aggregate itself.
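To make the idea concrete, here is a minimal sketch of a server-side group-by using Python's built-in sqlite3 module. It is not jDataMunch's implementation: the table name, sample rows, and the count_by helper are hypothetical. The point is that only the small grouped result leaves the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data ("AREA NAME" TEXT, "Vict Age" INTEGER)')
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("Hollywood", 31), ("Hollywood", 45), ("Central", 27),
     ("Central", 52), ("Central", 19)],
)


def count_by(conn: sqlite3.Connection, column: str) -> list:
    """Return (value, count) pairs for one column, largest group first."""
    # Identifiers cannot be bound as parameters, so quote them explicitly.
    quoted = '"' + column.replace('"', '""') + '"'
    sql = (
        f"SELECT {quoted}, COUNT(*) AS n FROM data "
        f"GROUP BY {quoted} ORDER BY n DESC"
    )
    return conn.execute(sql).fetchall()


result = count_by(conn, "AREA NAME")
print(result)
```

Five rows go in and two come out; on a million-row table the result stays just as small.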

Smart column search

search_data searches column names, value indexes, and AI summaries simultaneously. Ask for "weapon type" and get Weapon Used Cd back. Ask for "Hollywood" and get the column whose values contain it.

Semantic search (v0.8+): Enable semantic=true for embedding-based search. Queries like "where did the crime happen" match AREA NAME even without keyword overlap. Supports local embeddings (sentence-transformers), Gemini, or OpenAI as providers.

Cross-dataset joins

join_datasets combines two indexed datasets via SQL ATTACH DATABASE: inner, left, right, or cross joins. Column projection, per-side filters, ordering, and pagination. No data leaves SQLite.
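The ATTACH DATABASE mechanism can be illustrated with two throwaway SQLite files. This is a sketch under assumptions: the file names, table layout, schema alias `other`, and the make_store helper are invented for the example and say nothing about how jDataMunch lays out its stores.

```python
import os
import sqlite3
import tempfile


def make_store(path: str, ddl: str, rows: list) -> None:
    """Create a small SQLite file holding one table named data."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    marks = ", ".join("?" for _ in rows[0])
    conn.executemany(f"INSERT INTO data VALUES ({marks})", rows)
    conn.commit()
    conn.close()


tmp_dir = tempfile.mkdtemp()
store_a = os.path.join(tmp_dir, "crimes.sqlite")
store_b = os.path.join(tmp_dir, "areas.sqlite")
make_store(store_a, "CREATE TABLE data (area_id INTEGER, crime TEXT)",
           [(1, "ASSAULT"), (2, "THEFT"), (3, "FRAUD")])
make_store(store_b, "CREATE TABLE data (area_id INTEGER, area_name TEXT)",
           [(1, "Central"), (2, "Hollywood")])

# Open the first store, attach the second, and join across both files.
conn = sqlite3.connect(store_a)
conn.execute("ATTACH DATABASE ? AS other", (store_b,))
joined = conn.execute(
    "SELECT l.crime, r.area_name "
    "FROM main.data AS l "
    "INNER JOIN other.data AS r ON l.area_id = r.area_id "
    "ORDER BY l.area_id"
).fetchall()
conn.close()
print(joined)
```

The row with area_id 3 has no match in the second store, so the inner join drops it.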

Correlation discovery

get_correlations computes pairwise Pearson correlations between all numeric columns. Discover hidden relationships without manual exploration.
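For readers who want to see what a pairwise Pearson pass involves, here is a pure-Python sketch. The sample columns and helper names are hypothetical. The 0.3 default threshold and the ordering by |r| follow the v0.6.0 release notes, but the code is not the tool's own implementation, which the notes describe as running inside SQLite.

```python
import math
from itertools import combinations


def pearson(xs: list, ys: list):
    """Pearson correlation coefficient, or None if either column is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("columns must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def ranked_correlations(columns: dict, min_abs_correlation: float = 0.3) -> list:
    """All column pairs whose |r| meets the threshold, strongest first."""
    pairs = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if r is not None and abs(r) >= min_abs_correlation:
            pairs.append((name_a, name_b, r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


columns = {
    "age": [20, 30, 40, 50],
    "premium": [200, 310, 390, 500],
    "noise": [5, 1, 4, 2],
}
ranked = ranked_correlations(columns)
for name_a, name_b, r in ranked:
    print(f"{name_a} vs {name_b}: r = {r:+.3f}")
```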

Natural-language summaries

Every indexed dataset gets auto-generated summaries describing data shape, column types, ranges, cardinality, quality issues, and temporal spans, with no external API calls needed.

Data quality triage

get_data_hotspots ranks columns by composite risk: null rate, cardinality anomalies, and numeric outlier spread. get_schema_drift compares schema between two dataset versions and classifies changes as identical, additive, or breaking.
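The identical / additive / breaking classification can be sketched in a few lines. The rule used here comes from the v0.3.0 release notes (additions only are additive; removals or type changes are breaking). The schema dictionaries and the classify_drift name are hypothetical, and the sketch ignores the null-rate shifts that the real tool also reports.

```python
def classify_drift(schema_a: dict, schema_b: dict) -> dict:
    """Compare two {column: type} mappings and label the change."""
    added = sorted(set(schema_b) - set(schema_a))
    removed = sorted(set(schema_a) - set(schema_b))
    type_changes = sorted(
        col for col in set(schema_a) & set(schema_b)
        if schema_a[col] != schema_b[col]
    )
    if removed or type_changes:
        assessment = "breaking"
    elif added:
        assessment = "additive"
    else:
        assessment = "identical"
    return {
        "assessment": assessment,
        "added": added,
        "removed": removed,
        "type_changes": type_changes,
    }


v1 = {"DR_NO": "integer", "AREA NAME": "string"}
v2 = {"DR_NO": "integer", "AREA NAME": "string", "Vict Age": "integer"}
v3 = {"DR_NO": "string", "AREA NAME": "string"}

print(classify_drift(v1, v1)["assessment"])
print(classify_drift(v1, v2)["assessment"])
print(classify_drift(v1, v3)["assessment"])
```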

Token savings telemetry

Every call reports tokens_saved and cost_avoided estimates. get_session_stats shows your cumulative savings across the session, with per-model cost breakdowns. Lifetime stats persist across sessions.

GitHub repository indexing

index_repo discovers and indexes data files directly from a GitHub repository: CSV, Excel, Parquet, and JSONL. Incremental by HEAD SHA. Supports private repos via GITHUB_TOKEN.

Local-first speed

Indexes are stored at ~/.data-index/ by default. No cloud. No API keys required for core functionality.

Built-in guardrails

  • Token budget enforcement: every response is capped at a configurable token limit (default 8,000)
  • Anti-loop detection: warns when an agent is paginating row-by-row in a tight loop
  • Wide-table pagination: describe_dataset auto-paginates at 60 columns
  • Hard caps on all parameters to prevent runaway queries
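A token budget like the one in the first bullet could be enforced along these lines. This is a sketch only: the four-characters-per-token estimate is a rough assumption made for illustration, and the function names and response shape are hypothetical rather than taken from jDataMunch.

```python
import json


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(rows: list, max_tokens: int = 8000) -> dict:
    """Keep rows in order until the next one would exceed the budget."""
    kept = []
    used = 0
    for row in rows:
        cost = estimate_tokens(json.dumps(row))
        if used + cost > max_tokens:
            break
        kept.append(row)
        used += cost
    return {
        "rows": kept,
        "truncated": len(kept) < len(rows),
        "estimated_tokens": used,
    }


rows = [{"id": i, "note": "x" * 36} for i in range(100)]
response = fit_to_budget(rows, max_tokens=50)
print(len(response["rows"]), response["truncated"], response["estimated_tokens"])
```

Flagging the truncation lets the agent ask for a narrower query instead of silently working from partial data.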

How it works

jDataMunch parses local CSV, Excel, Parquet, and JSONL files using a streaming, single-pass pipeline:

CSV/Excel/Parquet/JSONL file
  → Streaming parser (never loads full file into memory)
  → Column profiler (type inference, cardinality, min/max/mean/median, value indexes)
  → Natural-language summary generator (dataset + per-column descriptions)
  → SQLite writer (10,000-row batches, WAL mode, indexes on low-cardinality columns)
  → index.json (column profiles, stats, summaries, file hash for incremental detection)

When an agent queries:

describe_dataset  →  reads index.json in memory (< 10ms)
get_rows          →  parameterized SQL on data.sqlite (< 100ms on indexed columns)
aggregate         →  GROUP BY SQL on data.sqlite (< 200ms for simple group-by)
search_data       →  scans column profiles in memory (< 50ms)
join_datasets     →  ATTACH DATABASE + cross-store SQL (< 300ms)

No raw file is ever re-read after the initial index. The SQLite database serves all row-level queries.

For a 255 MB, 1,004,894-row CSV (measured on real data):

  • Index time: ~43 seconds (one-time)
  • describe_dataset: 35 ms, 3,849 tokens vs 111,028,360 tokens raw (25,333×)
  • describe_column (single column deep-dive): 22–33 ms, ~600 tokens
  • get_rows (indexed filter): < 100 ms
  • Peak indexing memory: < 500 MB
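The streaming, batched load described in this section can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the project's code: it handles CSV only, skips type inference and profiling, stores every value as text, and uses a hypothetical index_csv helper. It does show why memory stays flat: only one batch of rows is held at a time.

```python
import csv
import io
import sqlite3
from itertools import islice

BATCH_SIZE = 10_000  # batch size quoted in the pipeline description


def quote_ident(name: str) -> str:
    """Quote a column or table name for use as a SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def index_csv(handle, conn: sqlite3.Connection, table: str = "data") -> int:
    """Stream rows from an open CSV handle into SQLite, one batch at a time."""
    reader = csv.reader(handle)
    header = next(reader)
    columns = ", ".join(quote_ident(h) for h in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    insert_sql = f"INSERT INTO {quote_ident(table)} VALUES ({marks})"

    total = 0
    while True:
        batch = list(islice(reader, BATCH_SIZE))  # at most one batch in memory
        if not batch:
            break
        with conn:  # commit each batch as its own transaction
            conn.executemany(insert_sql, batch)
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
# WAL only takes effect on file-backed databases; shown here for completeness.
conn.execute("PRAGMA journal_mode=WAL")
sample = io.StringIO("AREA NAME,Vict Age\nHollywood,31\nCentral,27\n")
rows_loaded = index_csv(sample, conn)
print(rows_loaded)
```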

Start fast

1. Install it

pip install jdatamunch-mcp

For additional format support:

pip install "jdatamunch-mcp[excel]"       # Excel (.xlsx, .xls)
pip install "jdatamunch-mcp[parquet]"     # Parquet
pip install "jdatamunch-mcp[semantic]"    # Semantic search (local embeddings)
pip install "jdatamunch-mcp[all]"         # Everything

2. Add it to your MCP client

Claude Code (one command)

claude mcp add jdatamunch uvx jdatamunch-mcp

Restart Claude Code. Confirm with /mcp.

Claude Desktop

Add to your config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "jdatamunch": {
      "command": "uvx",
      "args": ["jdatamunch-mcp"]
    }
  }
}

OpenClaw

Option A (CLI):

openclaw mcp set jdatamunch '{"command":"uvx","args":["jdatamunch-mcp"]}'

Option B (edit ~/.openclaw/openclaw.json):

{
  "mcpServers": {
    "jdatamunch": {
      "command": "uvx",
      "args": ["jdatamunch-mcp"],
      "transport": "stdio"
    }
  }
}

Restart the gateway: openclaw gateway restart. Verify: openclaw mcp list.

Other clients (Cursor, Windsurf, Roo, etc.)

Any MCP-compatible client accepts the same JSON block in its MCP config file.

3. Index a file and start querying

index_local(path="/path/to/data.csv", name="my-dataset")
describe_dataset(dataset="my-dataset")
get_rows(dataset="my-dataset", filters=[{"column": "City", "op": "eq", "value": "Los Angeles"}], limit=10)
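The calls above are written in shorthand. On the wire, an MCP client sends each one as a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for the get_rows example; the tool_call helper is hypothetical, and the argument names are simply those shown above.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call(
    1,
    "get_rows",
    {
        "dataset": "my-dataset",
        "filters": [{"column": "City", "op": "eq", "value": "Los Angeles"}],
        "limit": 10,
    },
)
print(json.dumps(request, indent=2))
```

In normal use the agent's MCP client builds this payload for you; it is shown only to clarify what the shorthand stands for.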

4. Tell your agent to actually use it

Installing jDataMunch makes the tools available. It does not guarantee the agent will stop pasting entire CSVs into prompts unless you tell it to use structured retrieval first.

Claude Code / Claude Desktop

Add this to your CLAUDE.md (global or project-level):

## Data Exploration Policy
Use jdatamunch-mcp for tabular data whenever available.
Always call describe_dataset first to understand the schema.
Use get_rows with filters rather than loading raw files.
Use aggregate for any group-by or summary questions.

OpenClaw

Add the same policy to your agent's system prompt file (e.g. ~/.openclaw/agents/analyst.md), then reference it in ~/.openclaw/openclaw.json:

{
  "agents": {
    "named": {
      "analyst": {
        "systemPromptFile": "~/.openclaw/agents/analyst.md"
      }
    }
  }
}

Check your token savings

Ask your agent: "How many tokens has jDataMunch saved me?"

The agent will call get_session_stats, which returns session and lifetime token savings with per-model cost breakdowns. Lifetime stats persist to ~/.data-index/session_stats.json across sessions.


Tools

Indexing

Tool What it does
index_local Index a local CSV, Excel, Parquet, or JSONL file. Profiles columns, generates NL summaries, loads rows into SQLite. Incremental by default (skips if file unchanged).
index_repo Index data files from a GitHub repository. Discovers CSV, Excel, Parquet, and JSONL files via the Trees API and indexes each. Incremental by HEAD SHA. Max 50 MB/file, 20 files/repo.
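Skipping an unchanged file comes down to comparing a stored file hash with a fresh one. Here is a minimal sketch of that check. The README only says "file hash", so the choice of SHA-256, the chunk size, and the function names are assumptions.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never load fully."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path: str, stored_hash) -> bool:
    """True when there is no stored hash or the file content has changed."""
    return stored_hash is None or stored_hash != file_sha256(path)


with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n")
    path = tmp.name

stored = file_sha256(path)               # hash recorded at index time
unchanged = needs_reindex(path, stored)  # same content, so skip

with open(path, "a") as fh:              # simulate an edit to the source file
    fh.write("3,4\n")
changed = needs_reindex(path, stored)    # content differs, so re-index

os.remove(path)
print(unchanged, changed)
```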

Exploration

Tool What it does
list_datasets List all indexed datasets with row counts, column counts, and file sizes.
list_repos List GitHub repositories indexed via index_repo. Shows repo name, HEAD SHA, dataset count, total rows.
describe_dataset Full schema profile: every column's name, type, cardinality, null%, sample values, and NL summary. Primary orientation tool. Auto-paginates at 60 columns.
describe_column Deep profile of one column: full value distribution, histogram bins, temporal range, NL summary.
search_data Search column names and values by keyword or semantically. Returns column IDs: it tells the agent where to look, not the data. Supports hybrid keyword + embedding search.
sample_rows Head, tail, or random sample. Good for first-look at an unfamiliar dataset.

Querying

Tool What it does
get_rows Filtered row retrieval with 10 operators. Parameterized SQL. 500-row hard cap. Column projection to reduce tokens.
aggregate Server-side GROUP BY: count, sum, avg, min, max, count_distinct, median. Pre-filter support. 1,000-group cap.
join_datasets SQL JOIN across two indexed datasets. Supports inner, left, right, cross. Per-side filters and column projection.

Analysis

Tool What it does
get_correlations Pairwise Pearson correlations between numeric columns. Sorted by strength, with labels and pair counts.
get_schema_drift Compare schema between two datasets. Detects added/removed columns, type changes, null-rate shifts.
get_data_hotspots Rank columns by data-quality risk: null rate, cardinality anomalies, numeric outlier spread.

Management

Tool What it does
summarize_dataset Regenerate NL summaries for an already-indexed dataset without re-parsing the source file.
embed_dataset Precompute column embeddings for semantic search. Optional warm-up to eliminate first-query latency.
delete_dataset Remove an indexed dataset and its SQLite store. Irreversible.
get_session_stats Cumulative token savings and cost avoided across the session. Lifetime stats persist across sessions.

Filter operators

get_rows, aggregate, and join_datasets accept structured filters:

{"column": "AREA NAME",    "op": "eq",      "value": "Hollywood"}
{"column": "Vict Age",     "op": "between", "value": [25, 35]}
{"column": "Crm Cd Desc",  "op": "contains","value": "ASSAULT"}
{"column": "Weapon Used Cd","op": "is_null","value": true}
{"column": "AREA",         "op": "in",      "value": [1, 2, 7]}
Operator Meaning
eq equals
neq not equals
gt, gte greater than (or equal)
lt, lte less than (or equal)
contains case-insensitive substring
in value in list
is_null null / not null check
between inclusive range [min, max]

Multiple filters are ANDed. No raw SQL is accepted, so there is no injection surface.
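One way to turn structured filters into parameterized SQL is sketched below. It is an illustration, not jDataMunch's code: build_where and quote_ident are hypothetical names, and the case-insensitive contains is implemented with SQLite's instr and lower functions, which only fold ASCII case. Values always travel as bound parameters. A real implementation would presumably also check each column name against the indexed schema.

```python
import sqlite3

COMPARISONS = {"eq": "=", "neq": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def quote_ident(name: str) -> str:
    """Quote a column name so it can only ever be read as an identifier."""
    return '"' + name.replace('"', '""') + '"'


def build_where(filters: list) -> tuple:
    """Translate filter dicts into a WHERE clause plus bound parameters."""
    clauses, params = [], []
    for f in filters:
        col, op, value = quote_ident(f["column"]), f["op"], f.get("value")
        if op in COMPARISONS:
            clauses.append(f"{col} {COMPARISONS[op]} ?")
            params.append(value)
        elif op == "contains":
            clauses.append(f"instr(lower({col}), lower(?)) > 0")
            params.append(value)
        elif op == "in":
            marks = ", ".join("?" for _ in value)
            clauses.append(f"{col} IN ({marks})")
            params.extend(value)
        elif op == "is_null":
            clauses.append(f"{col} IS NULL" if value else f"{col} IS NOT NULL")
        elif op == "between":
            clauses.append(f"{col} BETWEEN ? AND ?")
            params.extend(value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return " AND ".join(clauses) or "1", params


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data ("AREA NAME" TEXT, "Vict Age" INTEGER, "Crm Cd Desc" TEXT)')
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    ("Hollywood", 30, "SIMPLE ASSAULT"),
    ("Hollywood", 52, "BURGLARY"),
    ("Central", 28, "Assault with deadly weapon"),
    ("Hollywood", 26, "assault, aggravated"),
])

where, params = build_where([
    {"column": "AREA NAME", "op": "eq", "value": "Hollywood"},
    {"column": "Vict Age", "op": "between", "value": [25, 35]},
    {"column": "Crm Cd Desc", "op": "contains", "value": "ASSAULT"},
])
ages = conn.execute(
    f'SELECT "Vict Age" FROM data WHERE {where} ORDER BY "Vict Age"', params
).fetchall()
print(where)
print(ages)
```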


Configuration

Variable | Default | Purpose
DATA_INDEX_PATH | ~/.data-index/ | Index storage location
JDATAMUNCH_MAX_ROWS | 5,000,000 | Row cap for indexing
JDATAMUNCH_MAX_RESPONSE_TOKENS | 8,000 | Token budget cap per response
JDATAMUNCH_SHARE_SAVINGS | 1 | Set 0 to disable anonymous token savings telemetry
ANTHROPIC_API_KEY | (none) | AI column summaries via Claude
GOOGLE_API_KEY | (none) | AI column summaries via Gemini
GITHUB_TOKEN | (none) | Private repo access for index_repo
JDATAMUNCH_EMBED_MODEL | (none) | Local sentence-transformers model for semantic search
GOOGLE_EMBED_MODEL | (none) | Gemini embedding model for semantic search
OPENAI_API_KEY | (none) | OpenAI embeddings for semantic search
OPENAI_EMBED_MODEL | (none) | OpenAI embedding model for semantic search

When does it help?

Scenario | Without jDataMunch | With jDataMunch | Measured savings
Orient on a 255 MB CSV | Paste raw file → 111M tokens | describe_dataset → 3,849 tokens | 25,333×
Schema + column deep-dive | Same 111M tokens | describe_dataset + describe_column → ~4,400 tokens | ~25,000×
Find the crime-type column | Scan headers manually | search_data("crime type") → column ID | structural
Find column by meaning | No way to search semantically | search_data("where did it happen", semantic=true) → AREA NAME | structural
Get Hollywood assault rows | Load all 1M rows | get_rows with 2 filters → matching rows only | ~99%+
Crime count by area | Return all rows, aggregate in LLM | aggregate(group_by=["AREA NAME"]) → 21 rows | ~99.9%
Understand weapon nulls | Load column, count manually | describe_column("Weapon Used Cd") → null_pct: 64.2% | ~99.9%
Compare two dataset versions | Load both files | get_schema_drift(a, b) → breaking/additive assessment | structural
Find correlated columns | Export, pivot, eyeball | get_correlations → ranked pairs with strength labels | structural
Combine two datasets | Load both into prompt | join_datasets → SQL JOIN, only matching rows | ~99%+
Re-query an unchanged file | Re-load file every time | Hash check → instant skip if unchanged | 100% of re-read cost

The case where it doesn't help: you genuinely need every row for ML training or full exports. For that, read the file directly. For everything else (exploration, filtering, aggregation, orientation), structured retrieval wins every time.


ID scheme

Every column and row gets a stable ID:

{dataset}::{column_name}#column     →  "lapd-crime::AREA NAME#column"
{dataset}::row_{rowid}#row          →  "lapd-crime::row_4421#row"
{dataset}::{pk_col}={value}#row     →  "lapd-crime::DR_NO=211507896#row"

Pass column IDs directly to describe_column. Row IDs are returned in get_rows results.
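If you need to build or take apart these IDs in your own tooling, the three patterns above are simple to handle. The helpers below are hypothetical and rest on one assumption: that dataset names never contain "::". Splitting on the last "#" keeps column names that themselves contain "#" intact.

```python
def column_id(dataset: str, column: str) -> str:
    """Build a column ID in the {dataset}::{column_name}#column form."""
    return f"{dataset}::{column}#column"


def row_id(dataset: str, rowid: int) -> str:
    """Build a row ID in the {dataset}::row_{rowid}#row form."""
    return f"{dataset}::row_{rowid}#row"


def parse_id(identifier: str) -> dict:
    """Split an ID into dataset, locator, and kind ('column' or 'row')."""
    body, _, kind = identifier.rpartition("#")
    dataset, sep, locator = body.partition("::")
    if not sep or not dataset or not locator or kind not in {"column", "row"}:
        raise ValueError(f"not a valid ID: {identifier!r}")
    return {"dataset": dataset, "locator": locator, "kind": kind}


print(column_id("lapd-crime", "AREA NAME"))
print(row_id("lapd-crime", 4421))
print(parse_id("lapd-crime::DR_NO=211507896#row"))
```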


Part of the jMunch family

Product | Domain | Unit of retrieval | PyPI
jcodemunch-mcp | Source code | Symbols (functions, classes) | jcodemunch-mcp
jdocmunch-mcp | Documentation | Sections (headings) | jdocmunch-mcp
jdatamunch-mcp | Tabular data | Columns, row slices, aggregations | jdatamunch-mcp

All three implement jMRI, the open retrieval interface spec. Same response envelope, same token tracking, same telemetry pattern.


Best for

  • analysts, finance, ops, and consultants working with large spreadsheets
  • AI agents that answer questions about CSV, Excel, Parquet, or JSONL data
  • anyone paying token costs to load files they query repeatedly
  • teams that want structured, auditable data access instead of raw file dumps
  • developers building data-aware agents who need a drop-in retrieval layer

New here?

Start with the QuickStart guide: zero to indexed in three steps.

Or if you prefer learning by doing: index a file, run describe_dataset, and look at what comes back.

That single call (35 milliseconds, 3,849 tokens) tells you everything that would have cost you 111 million tokens to read raw.

That's the whole idea...

Release History

Version | Changes | Urgency | Date
v1.13.0 | Reported by @AlexJ-StL on jcm#297: Google Antigravity caps MCP servers at 50 tools, but the full munch suite ships 81 + 60 + 35 = 176. Sibling-parity port of the same knobs jdoc gained in v1.64.0. ## New config knobs - `JDATAMUNCH_TOOL_PROFILE=core|standard|full` (default `full`). - `core` (10 tools): index + describe + row-retrieval essentials. - `standard` (~30 tools): core + analysis tools. - `full` (35 tools): everything, current behavior. - `JDATAMUNCH_DISABLED_TOOLS=tool1,tool2,... | High | 5/14/2026
v1.5.0 | ## v1.5.0 - Cell-level redaction on tool output Tabular tools now scrub PII and credentials from cells before returning them to MCP clients. CSV / Excel / Parquet / JSONL data routinely carry emails, SSNs, credit-card numbers, API keys, and PEM bodies in raw columns; those cells would otherwise flow straight into LLM context where they may be cached, logged, or reflected to a tool downstream. The default policy is **ON**; callers opt out per call with `redact=False`. ### New - **`src/jdatamu | High | 5/12/2026
v1.4.0 | Closes the full Phase C list in todo.md. **317 tests passing.** Fully backward-compatible. With 1.4.0, the entire Phase A + B + C roadmap is shipped. ## Aggregation - **`aggregate(approximate=True)`** (C1) - approximate-mode path: - `count_distinct` → HyperLogLog (~2% standard error) - `median` → t-digest (~1% accuracy) - `sum` / `avg` → sampled estimator with 95% confidence-interval half-width Whole-dataset only (no group_by/having/order_by). For very large joined datasets where exact | High | 4/28/2026
v0.8.4 | ### Documentation - Added "Works with" section to README with Hermes Agent integration config - Submitted optional skill PR to [NousResearch/hermes-agent#10413](https://github.com/NousResearch/hermes-agent/pull/10413) | High | 4/15/2026
v0.8.3 | ### New features - **`meta_fields` support** - control which `_meta` fields appear in tool responses via `JDATAMUNCH_META_FIELDS` env var. Matches jcodemunch-mcp's `meta_fields` affordance. - Unset / `[]` = strip `_meta` entirely (default, maximum token savings) - `null` / `all` / `*` = include all fields - Comma-separated list = include only those fields (e.g. `timing_ms,powered_by`) ### Tests - 11 new tests (228 total) `pip install --upgrade jdatamunch-mcp` | High | 4/9/2026
v0.8.2 | ## Documentation - **README.md rewrite** - all 18 tools organized by category (indexing, exploration, querying, analysis, management), file format table, semantic search, cross-dataset joins, correlations, NL summaries, data quality tools, built-in guardrails, full configuration reference - **QUICKSTART.md** - new beginner-friendly guide: install, connect, index, query in three steps - **USER-MANUAL.md** - comprehensive manual for non-developer users (analysts, finance, ops) covering all tools | Medium | 4/8/2026
v0.8.1 | ## New features - **`list_repos()` tool** - list GitHub repositories indexed via `index_repo`. Shows repo name, HEAD SHA, dataset count, total rows, total size, and dataset names. ## Tests - 8 new tests (217 total, 10 skipped for optional deps) **Full Changelog**: https://github.com/jgravelle/jdatamunch-mcp/compare/v0.8.0...v0.8.1 | Medium | 4/8/2026
v0.8.0 | ## New features - **Semantic / embedding search** - `search_data` now supports `semantic=true` for embedding-based column search. Queries like "where did the crime happen" match `AREA NAME` even without keyword overlap. Three new parameters: `semantic`, `semantic_weight`, `semantic_only`. - **`embed_dataset(dataset)` tool** - precompute column embeddings. Optional warm-up so first semantic query returns immediately. - **Three embedding providers** (first configured wins): - sentence-transform | Medium | 4/8/2026
v0.7.1 | ## New features - **`delete_dataset(dataset)` tool** - remove an indexed dataset and its SQLite store, freeing disk space. Returns rows/columns removed and bytes freed. ## Bug fixes - Fixed unclosed SQLite connections in `create_table` and `create_indexes` that caused `PermissionError` on Windows when deleting datasets (WAL file locks) ## Tests - 26 new tests (177 total, 10 skipped for optional deps) ## Install ``` pip install --upgrade jdatamunch-mcp ``` | Medium | 4/8/2026
v0.7.0 | ## New features - **`join_datasets(dataset_a, dataset_b, join_column_a, join_column_b)` tool** - SQL JOIN across two indexed datasets via SQLite `ATTACH DATABASE`. Supports `inner`, `left`, `right`, and `cross` join types. Column projection (`columns_a`/`columns_b`), per-side filters (`filters_a`/`filters_b`), ordering, and pagination. Handles column-name collisions with `__b` suffix. Row limit capped at 500, 30 columns per side. ## Tests - 20 new tests (171 total, 10 skipped for optional dep | Medium | 4/8/2026
v0.6.0 | ## What's new - **`get_correlations` tool** - compute pairwise Pearson correlations between all numeric columns. Returns pairs sorted by |r| descending with human-readable strength labels, direction, and sample counts. - Configurable threshold (`min_abs_correlation`, default 0.3) - Optional column filter to restrict analysis - Caps at 50 numeric columns to avoid O(n^2) blowup - Pure SQLite computation, no external deps - 13 new tests (151 total) **Full changelog:** https://github.com/ | Medium | 4/8/2026
v0.5.0 | ## What's new - **`index_repo` tool** - index data files directly from a GitHub repository. Discovers CSV, Excel, Parquet, and JSONL files, downloads them, and indexes each via the existing pipeline. - Incremental HEAD SHA caching: skips entirely when repo unchanged - 50 MB per file, 20 files per repo - Concurrent downloads (5 parallel) - `GITHUB_TOKEN` support for private repos - 18 new tests (138 total) All three jMunch tools now have `index_repo` parity. **Full changelog:** https: | Medium | 4/8/2026
v0.4.0 | ## What's new - **Natural-language summaries** - every `index_local` call now auto-generates a dataset-level summary and per-column summaries from profiled statistics. No external API calls needed. - **`summarize_dataset` tool** - regenerate summaries for already-indexed datasets without re-parsing source files. - Column summaries include cardinality labels, null-rate warnings, and value previews. - 18 new tests (120 total). **Full changelog:** https://github.com/jgravelle/jdatamunch-mcp/blob/ | Medium | 4/8/2026
v0.3.0 | ## New tools **get_schema_drift(dataset_a, dataset_b)** Compare schema metadata between two indexed datasets. Detects: - Added/removed columns - Type changes (e.g. integer -> string) - Null-rate shifts (>= 1% delta) Assessment field: 'identical' | 'additive' (only additions) | 'breaking' (removals or type changes). Pure in-memory comparison, no re-reading source files. **get_data_hotspots(dataset, top_n=10)** Rank columns by composite data-quality risk combining: - Null rate - Cardinality an | Medium | 4/2/2026
v0.2.1 | ## Housekeeping - Added `LICENSE` file (dual-use: free for non-commercial, paid for commercial) **Full changelog:** https://github.com/jgravelle/jdatamunch-mcp/blob/master/CHANGELOG.md | Medium | 3/31/2026
v0.1.1 | ## What's new Full Excel support. Both legacy (.xls) and modern (.xlsx) files can now be indexed and queried through all jDataMunch tools. ### Install pip install "jdatamunch-mcp[excel]" ### Details - .xlsx parsed via openpyxl (read_only mode for memory efficiency) - .xls parsed via xlrd 2.x (the binary Excel 97-2003 format) - Cell type dispatch: integers serialize without .0, dates convert to ISO strings, blank/error cells become empty string - sheet= parameter selects a named sheet (defau | Medium | 3/26/2026
v0.1.0 | ## jdatamunch-mcp v0.1.0 Token-efficient MCP server for tabular data exploration. Third leg of the [jMunch suite](https://github.com/jgravelle/) alongside jcodemunch-mcp and jdocmunch-mcp. ### What it does Indexes CSV and Excel files into a compact SQLite + JSON profile, then answers analytical questions in ~4,000 tokens instead of requiring the full file. **Benchmark (LAPD crime records: 1,004,894 rows, 28 cols, 255 MB):** - Baseline (raw file): 111,028,360 tokens - jDataMunch (describe_da | Medium | 3/26/2026
