docs: Update project documentation to reflect Orchestrator-Workers architecture

Yunxiao Xu
2026-02-23 16:17:57 -08:00
parent 4d92c9aedb
commit 88a27f5a8d
2 changed files with 23 additions and 16 deletions

@@ -4,6 +4,7 @@ A stateful, graph-based chatbot for election data analysis, built with LangGraph
 ## 🚀 Features
+- **Multi-Agent Orchestration**: Decomposes complex queries and delegates them to specialized sub-agents (Data Analyst, Researcher) using a robust feedback loop.
 - **Intelligent Query Analysis**: Automatically determines if a query needs data analysis, web research, or clarification.
 - **Automated Data Analysis**: Generates and executes Python code to analyze election datasets and produce visualizations.
 - **Web Research**: Integrates web search capabilities for general election-related questions.

@@ -1,12 +1,13 @@
 # Election Analytics Chatbot - Backend Guide
 ## Overview
-The backend is a Python-based FastAPI application that leverages **LangGraph** to provide a stateful, agentic workflow for election data analysis. It handles complex queries by decomposing them into tasks such as data analysis, web research, or user clarification.
+The backend is a Python-based FastAPI application that leverages **LangGraph** to provide a stateful, hierarchical multi-agent workflow for election data analysis. It handles complex queries using an Orchestrator-Workers pattern, decomposing tasks and delegating them to specialized subgraphs (Data Analyst, Researcher) with built-in reflection and error recovery.
 ## 1. Architecture Overview
-- **Framework**: LangGraph for workflow orchestration and state management.
+- **Framework**: LangGraph for hierarchical workflow orchestration and state management.
 - **API**: FastAPI for providing REST and streaming (SSE) endpoints.
-- **State Management**: Persistent state using LangGraph's `StateGraph` with a PostgreSQL checkpointer.
+- **State Management**: Persistent state using LangGraph's `StateGraph` with a PostgreSQL checkpointer. Maintains global state (`AgentState`) and isolated worker states (`WorkerState`).
+- **Virtual File System (VFS)**: An in-memory abstraction passed between nodes to manage intermediate artifacts (scripts, CSVs, charts) without bloating the context window.
 - **Database**: PostgreSQL.
   - Application data: Uses `users` table for local and OIDC users (String IDs).
   - History: Persists chat history and artifacts.
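The VFS bullet above describes an in-memory store that keeps artifacts out of the prompt. A minimal sketch of that idea follows, assuming a plain path-to-bytes mapping. The class name `VirtualFileSystem`, its methods, and the file paths are hypothetical and not taken from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualFileSystem:
    """Hypothetical in-memory VFS: maps artifact paths to raw bytes."""

    files: dict[str, bytes] = field(default_factory=dict)

    def write(self, path: str, content: bytes) -> None:
        # Store the full artifact in memory, keyed by its path.
        self.files[path] = content

    def read(self, path: str) -> bytes:
        # Return the full artifact; raises KeyError if the path is unknown.
        return self.files[path]

    def manifest(self) -> str:
        # Compact listing (path and size only). A prompt can include this
        # instead of the artifact contents, which keeps the context small.
        return "\n".join(
            f"{path} ({len(data)} bytes)"
            for path, data in sorted(self.files.items())
        )


vfs = VirtualFileSystem()
vfs.write("analysis/turnout.py", b"print('turnout by district')")
vfs.write("outputs/turnout.csv", b"district,turnout\nA,0.61\nB,0.57\n")
print(vfs.manifest())
```

Only the manifest would need to reach the model in this sketch; a node that needs the actual bytes calls `read` with the path.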
@@ -14,23 +15,28 @@ The backend is a Python-based FastAPI application that leverages **LangGraph** t
 ## 2. Core Components
-### 2.1. The Graph State (`src/ea_chatbot/graph/state.py`)
-The state tracks the conversation context, plan, generated code, execution results, and artifacts.
+### 2.1. State Management (`src/ea_chatbot/graph/state.py` & `workers/*/state.py`)
+- **Global State**: Tracks the conversation context, the high-level task `checklist`, execution progress (`current_step`), and the VFS.
+- **Worker State**: Isolated snapshot for specialized subgraphs, tracking internal retry loops (`iterations`), worker-specific prompts, and raw results.
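To make the split between the two states concrete, here is a rough sketch of how they might be typed. Only `AgentState`, `WorkerState`, `checklist`, `current_step`, and `iterations` appear in the documentation. Every other field name, and the `ChecklistItem` type, is an assumption for illustration.

```python
from typing import Optional, TypedDict


class ChecklistItem(TypedDict):
    task: str    # hypothetical: description of the sub-task
    worker: str  # hypothetical: "data_analyst" or "researcher"


class AgentState(TypedDict):
    messages: list[str]             # conversation context (assumed shape)
    checklist: list[ChecklistItem]  # high-level task checklist
    current_step: int               # index of the sub-task in progress
    vfs: dict[str, bytes]           # artifacts, assumed here to be path -> bytes


class WorkerState(TypedDict):
    task: str               # the single sub-task this worker handles
    prompt: str             # worker-specific prompt
    iterations: int         # internal retry counter
    result: Optional[str]   # raw result, None until the worker finishes


initial_state: AgentState = {
    "messages": ["Compare turnout in 2019 and 2021."],
    "checklist": [
        {"task": "Compute turnout per year", "worker": "data_analyst"},
        {"task": "Find context for turnout changes", "worker": "researcher"},
    ],
    "current_step": 0,
    "vfs": {},
}
print(initial_state["checklist"][initial_state["current_step"]]["worker"])
```

Keeping `iterations` and the raw result out of `AgentState` means a worker's retries never leak into the global conversation state.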
-### 2.2. Nodes (The Actors)
+### 2.2. The Orchestrator
 Located in `src/ea_chatbot/graph/nodes/`:
-- **`query_analyzer`**: Analyzes the user query to determine the intent and required data.
-- **`planner`**: Creates a step-by-step plan for data analysis.
-- **`coder`**: Generates Python code based on the plan and dataset metadata.
-- **`executor`**: Safely executes the generated code and captures outputs (dataframes, plots).
-- **`error_corrector`**: Fixes code if execution fails.
-- **`researcher`**: Performs web searches for general election information.
-- **`summarizer`**: Generates a natural language response based on the analysis results.
-- **`clarification`**: Asks the user for more information if the query is ambiguous.
+- **`query_analyzer`**: Analyzes the user query to determine the intent and required data. If ambiguous, routes to `clarification`.
+- **`planner`**: Decomposes the user request into a strategic `checklist` of sub-tasks assigned to specific workers.
+- **`delegate`**: The traffic controller. Routes the current task to the appropriate worker and enforces a strict retry budget to prevent infinite loops.
+- **`reflector`**: The quality control node. Evaluates a worker's summary against the sub-task requirements. Can trigger a retry if unsatisfied.
+- **`synthesizer`**: Aggregates all worker results into a final, cohesive response for the user.
+- **`clarification`**: Asks the user for more information if the query is critically ambiguous.
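The interplay between `delegate` and `reflector` can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code. The `retries` field, the `MAX_RETRIES` value, the `satisfied` flag, and the choice to skip a sub-task once its budget is spent are all hypothetical.

```python
MAX_RETRIES = 2  # hypothetical budget; the real value is not documented here


def reflector(state: dict, satisfied: bool) -> dict:
    """Advance to the next sub-task on success, otherwise count a retry."""
    if satisfied:
        return {**state, "current_step": state["current_step"] + 1, "retries": 0}
    return {**state, "retries": state["retries"] + 1}


def delegate(state: dict) -> tuple[dict, str]:
    """Pick the next node to run, enforcing the retry budget."""
    if state["retries"] > MAX_RETRIES:
        # Budget exhausted: give up on this sub-task and move on
        # (assumed policy) so the loop cannot run forever.
        state = {**state, "current_step": state["current_step"] + 1, "retries": 0}
    if state["current_step"] >= len(state["checklist"]):
        return state, "synthesizer"
    return state, state["checklist"][state["current_step"]]["worker"]


state = {
    "checklist": [
        {"task": "Compute turnout per year", "worker": "data_analyst"},
        {"task": "Find context for turnout changes", "worker": "researcher"},
    ],
    "current_step": 0,
    "retries": 0,
}
trace = []
while True:
    state, target = delegate(state)
    trace.append(target)
    if target == "synthesizer":
        break
    # Simulated review: the data analyst never passes, the researcher does.
    state = reflector(state, satisfied=(target == "researcher"))
print(trace)
```

In this simulation the failing sub-task is attempted once plus `MAX_RETRIES` times before the orchestrator moves on, so the loop always terminates.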
-### 2.3. The Workflow (Graph)
-The graph connects these nodes with conditional edges, allowing for iterative refinement and error correction.
+### 2.3. Specialized Workers (Sub-Graphs)
+Located in `src/ea_chatbot/graph/workers/`:
+- **`data_analyst`**: Generates Python/SQL code, executes it securely, and captures dataframes/plots. Contains an internal retry loop (`coder` -> `executor` -> error check -> `coder`).
+- **`researcher`**: Performs web searches for general election information and synthesizes factual findings.
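The `data_analyst` retry loop (`coder` -> `executor` -> error check -> `coder`) might look roughly like the sketch below. The `coder` here is a stub standing in for an LLM call, `MAX_ITERATIONS` is a hypothetical limit, and plain `exec` is used only for illustration; it is not a sandbox and says nothing about how the project executes code securely.

```python
import contextlib
import io
from typing import Optional

MAX_ITERATIONS = 3  # hypothetical limit on the internal loop


def coder(task: str, last_error: Optional[str]) -> str:
    """Stub for the code-generating step: the first draft has a bug,
    and the draft produced after seeing an error fixes it."""
    if last_error is None:
        return "print(total_votes / districts)"  # raises NameError
    return "total_votes, districts = 1200, 4\nprint(total_votes / districts)"


def executor(code: str) -> tuple[str, Optional[str]]:
    """Run the code and return (captured stdout, error message or None)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # illustration only: NOT a secure sandbox
    except Exception as exc:
        return buffer.getvalue(), f"{type(exc).__name__}: {exc}"
    return buffer.getvalue(), None


def run_data_analyst(task: str) -> dict:
    """coder -> executor -> error check, repeated until success or limit."""
    error: Optional[str] = None
    for iteration in range(1, MAX_ITERATIONS + 1):
        code = coder(task, error)
        output, error = executor(code)
        if error is None:
            return {"iterations": iteration, "result": output, "error": None}
    return {"iterations": MAX_ITERATIONS, "result": None, "error": error}


outcome = run_data_analyst("Average votes per district")
print(outcome)
```

Feeding the previous error message back into `coder` is what lets the second draft differ from the first; the iteration cap keeps a persistently failing task from looping indefinitely.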
+### 2.4. The Workflow
+The global graph connects the Orchestrator nodes, wrapping the Worker subgraphs as self-contained nodes with mapped inputs and outputs.
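One way to picture "self-contained nodes with mapped inputs and outputs" is a wrapper that builds a fresh worker state from the global state, runs the worker, and returns only the keys the global state should receive. The sketch below uses plain dictionaries and callables rather than LangGraph itself. `as_node`, the `results` key, and the stub worker are hypothetical names.

```python
from typing import Callable


def as_node(worker: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Wrap a worker so it can be used as a node over the global state."""

    def node(state: dict) -> dict:
        item = state["checklist"][state["current_step"]]
        # Input mapping: the worker sees only its own sub-task.
        worker_state = {"task": item["task"], "iterations": 0, "result": None}
        finished = worker(worker_state)
        # Output mapping: only the result flows back; worker internals
        # such as `iterations` stay inside the worker.
        results = {**state.get("results", {}), item["task"]: finished["result"]}
        return {"results": results}

    return node


def researcher_stub(worker_state: dict) -> dict:
    """Stand-in for the researcher subgraph."""
    return {
        **worker_state,
        "iterations": 1,
        "result": f"findings for: {worker_state['task']}",
    }


global_state = {
    "checklist": [{"task": "Find context for turnout changes", "worker": "researcher"}],
    "current_step": 0,
}
update = as_node(researcher_stub)(global_state)
print(update)
```

The returned dictionary is a partial update that the caller merges into the global state, so the wrapper never mutates the orchestrator's data directly.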
 ## 3. Key Modules