docs: update project documentation and verification strategies

- Update GEMINI.md with verification steps and remove ignored docs reference
- Update README.md to remove reference to local langchain-docs
- Update backend/GEMINI.md with correct database schema (users table) and architecture details
- Update frontend/GEMINI.md with latest project structure
2026-02-20 17:14:16 -08:00
parent cc927e2a90
commit b4f79ee052
4 changed files with 144 additions and 147 deletions
--- a/backend/GEMINI.md
+++ b/backend/GEMINI.md
@@ -1,162 +1,63 @@
 # Election Analytics Chatbot - Backend Guide

 ## Overview
-This document serves as a guide for the backend implementation of the Election Analytics Chatbot, specifically focusing on the transition from the "BambooAI" based system to a modern, stateful, and graph-based architecture using **LangGraph**.
+The backend is a Python-based FastAPI application that leverages **LangGraph** to provide a stateful, agentic workflow for election data analysis. It handles complex queries by decomposing them into tasks such as data analysis, web research, or user clarification.

-## 1. Migration Goals
- **Framework Switch**: Move from the custom linear `ChatBot` class (in `src/ea_chatbot/bambooai/core/chatbot.py`) to `LangGraph`.
- **State Management**: explicit state management using LangGraph's `StateGraph`.
- **Modularity**: Break down monolithic methods (`pd_agent_converse`, `execute_code`) into distinct Nodes.
- **Observability**: Easier debugging of the decision process (Routing -> Planning -> Coding -> Executing).
+## 1. Architecture Overview
+- **Framework**: LangGraph for workflow orchestration and state management.
+- **API**: FastAPI for providing REST and streaming (SSE) endpoints.
+- **State Management**: Persistent state using LangGraph's `StateGraph` with a PostgreSQL checkpointer.
+- **Database**: PostgreSQL.
+  - Application data: Uses `users` table for local and OIDC users (String IDs).
+  - History: Persists chat history and artifacts.
+  - Election Data: Structured datasets for analysis.

-## 2. Architecture Proposal
+## 2. Core Components

-### 2.1. The Graph State
-The state will track the conversation and execution context.
-
-```python
-from typing import TypedDict, Annotated, List, Dict, Any, Optional
-from langchain_core.messages import BaseMessage
-import operator
-
-class AgentState(TypedDict):
-    # Conversation history
-    messages: Annotated[List[BaseMessage], operator.add]
-    
-    # Task context
-    question: str
-    
-    # Query Analysis (Decomposition results)
-    analysis: Optional[Dict[str, Any]] 
-    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
-    
-    # Step-by-step reasoning
-    plan: Optional[str]
-    
-    # Code execution context
-    code: Optional[str]
-    code_output: Optional[str]
-    error: Optional[str]
-    
-    # Artifacts (for UI display)
-    plots: List[Figure] # Matplotlib figures
-    dfs: Dict[str, DataFrame] # Pandas DataFrames
-    
-    # Control flow
-    iterations: int
-    next_action: str # Routing hint: "clarify", "plan", "research", "end"
-```
+### 2.1. The Graph State (`src/ea_chatbot/graph/state.py`)
+The state tracks the conversation context, plan, generated code, execution results, and artifacts.

 ### 2.2. Nodes (The Actors)
-We will map existing logic to these nodes:
+Located in `src/ea_chatbot/graph/nodes/`:

-1.  **`query_analyzer_node`** (Router & Refiner):
-    *   **Logic**: Replaces `Expert Selector` and `Analyst Selector`.
-    *   **Function**: 
-        1. Decomposes the user's query into key elements (Data, Unknowns, Conditions).
-        2. Determines if the query is ambiguous or missing critical information.
-    *   **Output**: Updates `messages`. Returns routing decision:
-        *   `clarification_node` (if ambiguous).
-        *   `planner_node` (if clear data task).
-        *   `researcher_node` (if general/web task).
-
-2.  **`clarification_node`** (Human-in-the-loop):
-    *   **Logic**: Replaces `Theorist-Clarification`.
-    *   **Function**: Formulates a specific question to ask the user for missing details.
-    *   **Output**: Returns a message to the user and **interrupts** the graph execution to await user input.
-
-3.  **`researcher_node`** (Theorist):
-    *   **Logic**: Handles general queries or web searches.
-    *   **Function**: Uses `GoogleSearch` tool if necessary.
-    *   **Output**: Final answer.
-
-4.  **`planner_node`**:
-    *   **Logic**: Replaces `Planner`.
-    *   **Function**: Generates a step-by-step plan based on the decomposed query elements and dataframe ontology.
-    *   **Output**: Updates `plan`.
-
-5.  **`coder_node`**:
-    *   **Logic**: Replaces `Code Generator` & `Error Corrector`.
-    *   **Function**: Generates Python code. If `error` exists in state, it attempts to fix it.
-    *   **Output**: Updates `code`.
-
-6.  **`executor_node`**:
-    *   **Logic**: Replaces `Code Executor`.
-    *   **Function**: Executes the Python code in a safe(r) environment. It needs access to the `DBClient`.
-    *   **Output**: Updates `code_output`, `plots`, `dfs`. If exception, updates `error`.
-
-7.  **`summarizer_node`**:
-    *   **Logic**: Replaces `Solution Summarizer`.
-    *   **Function**: Interprets the code output and generates a natural language response.
-    *   **Output**: Final response message.
+- **`query_analyzer`**: Analyzes the user query to determine the intent and required data.
+- **`planner`**: Creates a step-by-step plan for data analysis.
+- **`coder`**: Generates Python code based on the plan and dataset metadata.
+- **`executor`**: Safely executes the generated code and captures outputs (dataframes, plots).
+- **`error_corrector`**: Fixes code if execution fails.
+- **`researcher`**: Performs web searches for general election information.
+- **`summarizer`**: Generates a natural language response based on the analysis results.
+- **`clarification`**: Asks the user for more information if the query is ambiguous.

 ### 2.3. The Workflow (Graph)
+The graph connects these nodes with conditional edges, allowing for iterative refinement and error correction.

-```mermaid
-graph TD
-    Start --> QueryAnalyzer
-    QueryAnalyzer -->|Ambiguous| Clarification
-    Clarification -->|User Input| QueryAnalyzer
-    QueryAnalyzer -->|General/Web| Researcher
-    QueryAnalyzer -->|Data Analysis| Planner
-    Planner --> Coder
-    Coder --> Executor
-    Executor -->|Success| Summarizer
-    Executor -->|Error| Coder
-    Researcher --> End
-    Summarizer --> End
+## 3. Key Modules
+
+- **`src/ea_chatbot/api/`**: Contains FastAPI routers for authentication, conversation management, and the agent streaming endpoint.
+- **`src/ea_chatbot/graph/`**: Core LangGraph logic, including state definitions, node implementations, and the workflow graph.
+- **`src/ea_chatbot/history/`**: Manages persistent chat history and message mapping between application models and LangGraph state.
+- **`src/ea_chatbot/utils/`**: Utility functions for database inspection, LLM factory, and logging.
+
+## 4. Development & Execution
+
+### Entry Point
+The main entry point for the API is `src/ea_chatbot/api/main.py`.
+
+### Running the API
+```bash
+cd backend
+uv run python -m ea_chatbot.api.main
 ```

-## 3. Implementation Steps
-
-### Step 1: Dependencies
-Add the following packages to `pyproject.toml`:
-*   `langgraph`
-*   `langchain`
-*   `langchain-openai`
-*   `langchain-google-genai`
-*   `langchain-community`
-
-### Step 2: Directory Structure
-Create a new package for the graph logic to keep it separate from the old one during migration.
-
-```
-src/ea_chatbot/
-├── graph/
-│   ├── __init__.py
-│   ├── state.py       # State definition
-│   ├── nodes/         # Individual node implementations
-│   │   ├── __init__.py
-│   │   ├── router.py
-│   │   ├── planner.py
-│   │   ├── coder.py
-│   │   ├── executor.py
-│   │   └── ...
-│   ├── workflow.py    # Graph construction
-│   └── tools/         # DB and Search tools wrapped for LangChain
-└── ...
+### Database Migrations
+Handled by Alembic.
+```bash
+uv run alembic upgrade head
 ```

-### Step 3: Tool Wrapping
-Wrap the existing `DBClient` (from `src/ea_chatbot/bambooai/utils/db_client.py`) into a structure accessible by the `executor_node`. The `executor_node` will likely keep the existing `exec()` based approach initially for compatibility with the generated code, but structured as a graph node.
-
-### Step 4: Prompt Migration
-Port the prompts from `data/PROMPT_TEMPLATES.json` or `src/ea_chatbot/bambooai/prompts/strings.py` into the respective nodes. Use LangChain's `ChatPromptTemplate` for better management.
-
-### Step 5: Integration
-Update `src/ea_chatbot/app.py` to use the new `workflow.compile()` runnable.
-*   Instead of `chatbot.pd_agent_converse(...)`, use `app.stream(...)` (LangGraph app).
-*   Handle the streaming output to update the UI progressively.
-
-## 4. Key Considerations for Refactoring
-
-*   **Database Connection**: Ensure `DBClient` is initialized once and passed to the `Executor` node efficiently (e.g., via `configurable` parameters or closure).
-*   **Prompt Templating**: The current system uses simple `format` strings. Switching to LangChain templates allows for easier model switching and partial formatting.
-*   **Token Management**: LangGraph provides built-in tracing (if LangSmith is enabled), but we should ensure the `OutputManager` logic (printing costs/tokens) is preserved or adapted if still needed for the CLI/Logs.
-*   **Vector DB**: The current system has `PineconeWrapper` for RAG. This should be integrated into the `Planner` or `Coder` node to fetch few-shot examples or context.
-
-## 5. Next Actions
-1.  **Initialize**: Create the folder structure.
-2.  **Define State**: Create `src/ea_chatbot/graph/state.py`.
-3.  **Implement Router**: Create the first node to replicate `Expert Selector` logic.
-4.  **Implement Executor**: Port the `exec()` logic to a node.
+### Testing
+Tests are located in the `tests/` directory and use `pytest`.
+```bash
+uv run pytest
+```
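
The conditional edges described in section 2.3 of the new `backend/GEMINI.md` (routing after query analysis, and the retry loop between `executor` and `error_corrector`) can be pictured with a small plain-Python sketch. It deliberately does not use LangGraph, and it is not the code in `src/ea_chatbot/graph/`. The trimmed `AgentState`, the routing functions, the stub node bodies, the exact edges, and the retry budget of 3 are all assumptions made for illustration; only the node names come from the documentation.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class AgentState(TypedDict):
    """Hypothetical, trimmed-down state; the real one lives in graph/state.py."""
    question: str
    next_action: str       # routing hint written by query_analyzer
    code: Optional[str]
    error: Optional[str]
    iterations: int        # number of executor runs so far
    visited: List[str]     # trace of node names, for illustration only


def route_after_analysis(state: AgentState) -> str:
    # Conditional edge: choose a branch from the analyzer's routing hint.
    branches = {"clarify": "clarification", "research": "researcher"}
    return branches.get(state["next_action"], "planner")


def route_after_execution(state: AgentState, max_iterations: int = 3) -> str:
    # Conditional edge: retry via error_corrector until the budget is spent.
    if state["error"] and state["iterations"] < max_iterations:
        return "error_corrector"
    return "summarizer"


def _analyze(state: AgentState) -> None:
    # Stub heuristic (assumption): empty -> clarify, "news" -> research.
    q = state["question"].strip().lower()
    state["next_action"] = (
        "clarify" if not q else "research" if "news" in q else "plan"
    )


def _code(state: AgentState) -> None:
    state["code"] = "result = 1 / 0"  # deliberately broken first draft


def _execute(state: AgentState) -> None:
    state["iterations"] += 1
    try:
        exec(state["code"] or "", {})  # fixed stub strings only
        state["error"] = None
    except Exception as exc:
        state["error"] = repr(exc)


def _correct(state: AgentState) -> None:
    state["code"] = "result = 1"  # stub "fix" for the failing draft


def _noop(state: AgentState) -> None:
    pass


NODES: Dict[str, Callable[[AgentState], None]] = {
    "query_analyzer": _analyze, "planner": _noop, "coder": _code,
    "executor": _execute, "error_corrector": _correct,
    "researcher": _noop, "summarizer": _noop, "clarification": _noop,
}

# Nodes without an entry here are terminal in this sketch.
EDGES: Dict[str, Callable[[AgentState], str]] = {
    "query_analyzer": route_after_analysis,
    "planner": lambda s: "coder",
    "coder": lambda s: "executor",
    "executor": route_after_execution,
    "error_corrector": lambda s: "executor",
}


def run(question: str) -> AgentState:
    state: AgentState = {
        "question": question, "next_action": "", "code": None,
        "error": None, "iterations": 0, "visited": [],
    }
    node: Optional[str] = "query_analyzer"
    while node is not None:
        state["visited"].append(node)
        NODES[node](state)
        router = EDGES.get(node)
        node = router(state) if router else None
    return state


if __name__ == "__main__":
    print(run("turnout by county")["visited"])
```

In this sketch a data question passes through `executor` twice, because the first stub draft fails and `error_corrector` rewrites it before the second run. An empty question stops at `clarification`, and a question containing "news" stops at `researcher`.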