Description
MCP Server for Apache Spark History Server. The bridge between Agentic AI and Apache Spark.
README
Kubeflow Spark History MCP Server
Connect AI agents to Apache Spark History Server for intelligent job analysis and performance monitoring
Transform your Spark infrastructure monitoring with AI! This Model Context Protocol (MCP) server enables AI agents to analyze job performance, identify bottlenecks, and provide intelligent insights from your Spark History Server data.
What is This?
Spark History Server MCP bridges AI agents with your existing Apache Spark infrastructure, enabling:
Query job details through natural language
Analyze performance metrics across applications
Compare multiple jobs to identify regressions
Investigate failures with detailed error analysis
Generate insights from historical execution data
graph TB
    A[AI Agent/LLM] --> F[MCP Client]
    B[LlamaIndex Agent] --> F
    C[LangGraph] --> F
    D[Claude Desktop] --> F
    E[Amazon Q CLI] --> F
    F --> G[Spark History MCP Server]
    G --> H[Prod Spark History Server]
    G --> I[Staging Spark History Server]
    G --> J[Dev Spark History Server]
    H --> K[Prod Event Logs]
    I --> L[Staging Event Logs]
    J --> M[Dev Event Logs]
Components:
Spark History Server: Your existing infrastructure serving Spark event data
MCP Server: This project - provides MCP tools for querying Spark data
AI Agents: LangChain, custom agents, or any MCP-compatible client
git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server
# Install Task (if not already installed)
brew install go-task # macOS, see https://taskfile.dev/installation/ for others
# Setup and start testing
task start-spark-bg # Start Spark History Server with sample data (default Spark 3.5.5)
# Or specify a different Spark version:
# task start-spark-bg spark_version=3.5.2
task start-mcp-bg # Start MCP Server
# Optional: Opens MCP Inspector on http://localhost:6274 for interactive testing
# Requires Node.js: 22.7.5+ (Check https://github.com/modelcontextprotocol/inspector for latest requirements)
task start-inspector-bg # Start MCP Inspector
# When done, run `task stop-all`
If you just want to run the MCP server without cloning the repository:
# Run with uv without installing the module
uvx --from mcp-apache-spark-history-server spark-mcp
# OR run with pip and python. Use of venv is highly encouraged.
python3 -m venv spark-mcp && source spark-mcp/bin/activate
pip install mcp-apache-spark-history-server
python3 -m spark_history_mcp.core.main
# Deactivate venv
deactivate
Server Configuration
Edit config.yaml for your Spark History Server:
Config File Options:
Command line: --config /path/to/config.yaml or -c /path/to/config.yaml
Note: These tools are subject to change as we scale and improve the performance of the MCP server.
The MCP server provides 18 specialized tools organized by analysis patterns. LLMs can intelligently select and combine these tools based on user queries:
Application Information
Basic application metadata and overview
list_applications - Get a list of all applications available on the Spark History Server with optional filtering by status, date ranges, and limits
get_application - Get detailed information about a specific Spark application including status, resource usage, duration, and attempt details
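The filters that list_applications describes line up with query parameters of the Spark History Server REST API, whose `/api/v1/applications` endpoint accepts `status`, `minDate`, `maxDate`, and `limit`. The sketch below only builds such a URL. The helper name `applications_url`, the base URL `http://localhost:18080`, and the idea that the MCP tool forwards its filters this way are assumptions made for illustration, not taken from this project's code.

```python
from urllib.parse import urlencode


def applications_url(base_url, status=None, min_date=None, max_date=None, limit=None):
    """Build a Spark History Server REST URL for listing applications.

    Hypothetical helper: only the endpoint path and parameter names come
    from the Spark REST API; the function itself is illustrative.
    """
    params = {}
    if status is not None:
        params["status"] = status      # "completed" or "running"
    if min_date is not None:
        params["minDate"] = min_date   # e.g. "2025-06-27T00:00:00.000GMT"
    if max_date is not None:
        params["maxDate"] = max_date
    if limit is not None:
        params["limit"] = str(limit)
    url = base_url.rstrip("/") + "/api/v1/applications"
    # Only append a query string when at least one filter was given.
    return url + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    # Assumed local History Server address; adjust to your deployment.
    print(applications_url(
        "http://localhost:18080",
        status="completed",
        min_date="2025-06-27T00:00:00.000GMT",
        max_date="2025-06-27T01:00:00.000GMT",
        limit=20,
    ))
```

Building the query with `urlencode` keeps the timestamp's colons correctly percent-encoded, which is easy to get wrong when concatenating strings by hand.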
Job Analysis
Job-level performance analysis and identification
list_jobs - Get a list of all jobs for a Spark application with optional status filtering
list_slowest_jobs - Get the N slowest jobs for a Spark application (excludes running jobs by default)
Stage Analysis
Stage-level performance deep dive and task metrics
list_stages - Get a list of all stages for a Spark application with optional status filtering and summaries
list_slowest_stages - Get the N slowest stages for a Spark application (excludes running stages by default)
get_stage - Get information about a specific stage with optional attempt ID and summary metrics
get_stage_task_summary - Get statistical distributions of task metrics for a specific stage (execution times, memory usage, I/O metrics)
Executor & Resource Analysis
Resource utilization, executor performance, and allocation tracking
list_executors - Get executor information with optional inactive executor inclusion
get_executor - Get information about a specific executor including resource allocation, task statistics, and performance metrics
get_executor_summary - Aggregates metrics across all executors (memory usage, disk usage, task counts, performance metrics)
get_resource_usage_timeline - Get chronological view of resource allocation and usage patterns including executor additions/removals
Configuration & Environment
Spark configuration, environment variables, and runtime settings
get_environment - Get comprehensive Spark runtime configuration including JVM info, Spark properties, system properties, and classpath
SQL & Query Analysis
SQL performance analysis and execution plan comparison
list_slowest_sql_queries - Get the top N slowest SQL queries for an application with detailed execution metrics and optional plan descriptions
compare_sql_execution_plans - Compare SQL execution plans between two Spark jobs, analyzing logical/physical plans and execution metrics
Performance & Bottleneck Analysis
Intelligent bottleneck identification and performance recommendations
get_job_bottlenecks - Identify performance bottlenecks by analyzing stages, tasks, and executors with actionable recommendations
Comparative Analysis
Cross-application comparison for regression detection and optimization
compare_job_environments - Compare Spark environment configurations between two jobs to identify differences in properties and settings
compare_job_performance - Compare performance metrics between two Spark jobs including execution times, resource usage, and task distribution
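MCP clients invoke tools such as these with a JSON-RPC 2.0 `tools/call` request. As a rough sketch, the payload for one call might be assembled as follows. The argument names `app_id` and `n` and the application ID are hypothetical placeholders; the tool schema the server returns from `tools/list` is the authority on the real parameter names.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Return an MCP `tools/call` request as a JSON-RPC 2.0 dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # `app_id` and `n` are assumed argument names, not confirmed by this README.
    request = build_tool_call(
        "list_slowest_jobs",
        {"app_id": "app-0000-example", "n": 3},
    )
    print(json.dumps(request, indent=2))
```

In practice an MCP client library builds and sends this message for you; seeing the raw shape is mainly useful when debugging with a tool like MCP Inspector.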
How LLMs Use These Tools
Query Pattern Examples:
"Show me all applications between 12 AM and 1 AM on 2025-06-27" → list_applications
"Why is my job slow?" → get_job_bottlenecks + list_slowest_stages + get_executor_summary
"Compare today vs yesterday" → compare_job_performance + compare_job_environments
"What's wrong with stage 5?" → get_stage + get_stage_task_summary
"Show me resource usage over time" → get_resource_usage_timeline + get_executor_summary
"Find my slowest SQL queries" → list_slowest_sql_queries + compare_sql_execution_plans
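The pairings above can be written down as a small lookup table. This is only an illustration of the mapping: a real agent lets the LLM choose tools from their descriptions, and the keyword matcher `suggest_tools` and its trigger words are hypothetical.

```python
# Tool combinations copied from the query pattern examples above.
# The trigger keywords are invented for this sketch; rules are checked in
# order, so more specific keywords come before the generic "slow".
RULES = [
    ("sql", ["list_slowest_sql_queries", "compare_sql_execution_plans"]),
    ("compare", ["compare_job_performance", "compare_job_environments"]),
    ("stage", ["get_stage", "get_stage_task_summary"]),
    ("resource", ["get_resource_usage_timeline", "get_executor_summary"]),
    ("slow", ["get_job_bottlenecks", "list_slowest_stages", "get_executor_summary"]),
    ("applications", ["list_applications"]),
]


def suggest_tools(query):
    """Return the tool names for the first rule whose keyword appears in the query."""
    text = query.lower()
    for keyword, tools in RULES:
        if keyword in text:
            return list(tools)
    return []  # no rule matched


if __name__ == "__main__":
    for q in ("Why is my job slow?", "Find my slowest SQL queries"):
        print(q, "->", " + ".join(suggest_tools(q)))
```

A fixed table like this breaks down quickly on free-form questions, which is the reason the selection is left to the model in the first place.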
AWS Integration Guides
If you are an existing AWS user looking to analyze your Spark applications, we provide detailed setup guides.
Environment variables:
SHS_MCP_PORT - Port for MCP server (default: 18888)
SHS_MCP_DEBUG - Enable debug mode (default: false)
SHS_MCP_ADDRESS - Address for MCP server (default: localhost)
SHS_MCP_TRANSPORT - MCP transport mode (default: streamable-http)
SHS_SERVERS_*_URL - URL for a specific server
SHS_SERVERS_*_AUTH_USERNAME - Username for a specific server
SHS_SERVERS_*_AUTH_PASSWORD - Password for a specific server
SHS_SERVERS_*_AUTH_TOKEN - Token for a specific server
SHS_SERVERS_*_VERIFY_SSL - Whether to verify SSL for a specific server (true/false)
SHS_SERVERS_*_TIMEOUT - HTTP request timeout in seconds for a specific server (default: 30)
SHS_SERVERS_*_EMR_CLUSTER_ARN - EMR cluster ARN for a specific server
SHS_SERVERS_*_INCLUDE_PLAN_DESCRIPTION - Whether to include SQL execution plans by default for a specific server (true/false, default: false)
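To make the documented defaults concrete, here is a minimal sketch that resolves the four `SHS_MCP_*` variables from an environment mapping. The function `read_mcp_settings` is hypothetical and is not the server's own parsing code; in particular, treating only the string `true` (case-insensitive) as enabling debug mode is an assumption.

```python
import os


def read_mcp_settings(env=None):
    """Resolve the SHS_MCP_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("SHS_MCP_PORT", "18888")),
        # Assumption: only "true" (any case) turns debug mode on.
        "debug": env.get("SHS_MCP_DEBUG", "false").strip().lower() == "true",
        "address": env.get("SHS_MCP_ADDRESS", "localhost"),
        "transport": env.get("SHS_MCP_TRANSPORT", "streamable-http"),
    }


if __name__ == "__main__":
    # With nothing set, this prints the defaults listed above.
    print(read_mcp_settings({}))
    # Overriding a single variable leaves the other defaults in place.
    print(read_mcp_settings({"SHS_MCP_PORT": "9000"}))
```

Passing the environment as an argument rather than reading `os.environ` directly keeps the sketch easy to try without changing your shell.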
## What's Changed
63f3ea4 Update docs (#189)
0b037df feat(cli): add threaddump subcommand (#188)
847cafc docs(cli): document logs subcommand in README (#187)
7a1092c CLI: add `logs` subcommand for executor/task log URLs (#177)
f43cfd6 fix(cli): Use platform-prefixed flags for troubleshoot command (#185)
High
5/29/2026
cli/v0.2.0
## What's Changed
11ba3f4 fix(cli): Remove stale config reference from README (#182)
ca1520a feat(cli): Add shs troubleshoot subcommand (#176)
High
5/13/2026
cli/v0.1.0
## What's Changed
7cff895 docs: add SHS CLI launch banner and fix README accuracy for v1.0.0 (#174)
9762755 CLI: add config command with default config values (#173)
9fc6b84 add GitHub release for CLI (#170)
7bc07d2 CLI: Add utility commands and flags (#167)
8feb232 docs: Add SHS CLI announcement and update skills/cli README (#169)
ae72a6d CLI: add cli compare example (#168)
7d1a6f2 Add SHS CLI (#165)
High
4/28/2026
v0.1.5
## What's Changed
* sync gh and pypi releases by @nabuskey in https://github.com/kubeflow/mcp-apache-spark-history-server/pull/114
* update q cli doc by @nabuskey in https://github.com/kubeflow/mcp-apache-spark-history-server/pull/115
* Make include_plan_description configurable by @zemin-piao in https://github.com/kubeflow/mcp-apache-spark-history-server/pull/118
* Mark executorMetricsDistributions.peakMemoryMetrics.quantiles Optional by @zemin-piao in https://github.com/kubeflow/mcp-apache-spa
Low
1/13/2026
v0.1.0
# MCP Apache Spark History Server v0.1.0 Release
## Initial Release Highlights
We're excited to announce the first official release of the MCP Apache Spark History Server! This groundbreaking tool enables AI-powered debugging and optimization of Apache Spark jobs through natural language interactions.
### AI Integration
• Claude Desktop Integration - Seamless AI-powered Spark job analysis
• Amazon Q CLI Support - Native AWS AI assistant integration
• LangGraph Example with O
Low
7/30/2025