# Election Analytics Chatbot - Project Guide

## Overview

This document is a guide for rewriting the current BambooAI-based chatbot system into a stateful, graph-based architecture using **LangGraph**. The goal is to improve the maintainability, observability, and flexibility of the agentic workflows.

## 1. Migration Goals

- **Framework Switch**: Move from the custom linear `ChatBot` class (in `src/ea_chatbot/bambooai/core/chatbot.py`) to `LangGraph`.
- **State Management**: Use explicit state management via LangGraph's `StateGraph`.
- **Modularity**: Break down monolithic methods (`pd_agent_converse`, `execute_code`) into distinct nodes.
- **Observability**: Make the decision process (Routing -> Planning -> Coding -> Executing) easier to debug.

## 2. Architecture Proposal

### 2.1. The Graph State

The state will track the conversation and execution context.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict

from langchain_core.messages import BaseMessage
from matplotlib.figure import Figure
from pandas import DataFrame


class AgentState(TypedDict):
    # Conversation history (new messages are appended via operator.add)
    messages: Annotated[List[BaseMessage], operator.add]

    # Task context
    question: str

    # Query analysis (decomposition results)
    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
    analysis: Optional[Dict[str, Any]]

    # Step-by-step reasoning
    plan: Optional[str]

    # Code execution context
    code: Optional[str]
    code_output: Optional[str]
    error: Optional[str]

    # Artifacts (for UI display)
    plots: List[Figure]  # Matplotlib figures
    dfs: Dict[str, DataFrame]  # Pandas DataFrames

    # Control flow
    iterations: int
    next_action: str  # Routing hint: "clarify", "plan", "research", "end"
```

### 2.2. Nodes (The Actors)

We will map existing logic to these nodes:

1. **`query_analyzer_node`** (Router & Refiner):
    * **Logic**: Replaces `Expert Selector` and `Analyst Selector`.
    * **Function**:
        1. Decomposes the user's query into key elements (Data, Unknowns, Conditions).
        2. Determines if the query is ambiguous or missing critical information.
    * **Output**: Updates `messages`. Returns a routing decision:
        * `clarification_node` (if ambiguous).
        * `planner_node` (if clear data task).
        * `researcher_node` (if general/web task).

2. **`clarification_node`** (Human-in-the-loop):
    * **Logic**: Replaces `Theorist-Clarification`.
    * **Function**: Formulates a specific question to ask the user for missing details.
    * **Output**: Returns a message to the user and **interrupts** the graph execution to await user input.

3. **`researcher_node`** (Theorist):
    * **Logic**: Handles general queries or web searches.
    * **Function**: Uses the `GoogleSearch` tool if necessary.
    * **Output**: Final answer.

4. **`planner_node`**:
    * **Logic**: Replaces `Planner`.
    * **Function**: Generates a step-by-step plan based on the decomposed query elements and the dataframe ontology.
    * **Output**: Updates `plan`.

5. **`coder_node`**:
    * **Logic**: Replaces `Code Generator` & `Error Corrector`.
    * **Function**: Generates Python code. If `error` exists in the state, it attempts to fix the previous code.
    * **Output**: Updates `code`.

6. **`executor_node`**:
    * **Logic**: Replaces `Code Executor`.
    * **Function**: Executes the Python code in a safe(r) environment. It needs access to the `DBClient`.
    * **Output**: Updates `code_output`, `plots`, `dfs`. If an exception is raised, updates `error`.

7. **`summarizer_node`**:
    * **Logic**: Replaces `Solution Summarizer`.
    * **Function**: Interprets the code output and generates a natural language response.
    * **Output**: Final response message.

### 2.3. The Workflow (Graph)

```mermaid
graph TD
    Start --> QueryAnalyzer
    QueryAnalyzer -->|Ambiguous| Clarification
    Clarification -->|User Input| QueryAnalyzer
    QueryAnalyzer -->|General/Web| Researcher
    QueryAnalyzer -->|Data Analysis| Planner
    Planner --> Coder
    Coder --> Executor
    Executor -->|Success| Summarizer
    Executor -->|Error| Coder
    Researcher --> End
    Summarizer --> End
```

## 3. Implementation Steps

### Step 1: Dependencies

Add the following packages to `pyproject.toml`:

* `langgraph`
* `langchain`
* `langchain-openai`
* `langchain-google-genai`
* `langchain-community`

### Step 2: Directory Structure

Create a new package for the graph logic to keep it separate from the old code during migration.

```
src/ea_chatbot/
├── graph/
│   ├── __init__.py
│   ├── state.py          # State definition
│   ├── nodes/            # Individual node implementations
│   │   ├── __init__.py
│   │   ├── router.py
│   │   ├── planner.py
│   │   ├── coder.py
│   │   ├── executor.py
│   │   └── ...
│   ├── workflow.py       # Graph construction
│   └── tools/            # DB and Search tools wrapped for LangChain
└── ...
```

### Step 3: Tool Wrapping

Wrap the existing `DBClient` (from `src/ea_chatbot/bambooai/utils/db_client.py`) in a structure accessible to the `executor_node`. The `executor_node` will likely keep the existing `exec()`-based approach initially, for compatibility with the generated code, but restructured as a graph node.

### Step 4: Prompt Migration

Port the prompts from `data/PROMPT_TEMPLATES.json` or `src/ea_chatbot/bambooai/prompts/strings.py` into the respective nodes. Use LangChain's `ChatPromptTemplate` for better management.

### Step 5: Streamlit Integration

Update `src/ea_chatbot/app.py` to use the new `workflow.compile()` runnable.

* Instead of `chatbot.pd_agent_converse(...)`, use `app.stream(...)` (where `app` is the compiled LangGraph app).
* Handle the streaming output to update the UI progressively.

## 4. Key Considerations for Refactoring

* **Database Connection**: Ensure `DBClient` is initialized once and passed to the `executor_node` efficiently (e.g., via `configurable` parameters or a closure).
* **Prompt Templating**: The current system uses simple `format` strings. Switching to LangChain templates allows for easier model switching and partial formatting.
* **Token Management**: LangGraph provides tracing when LangSmith is enabled, but we should ensure the `OutputManager` logic (printing costs/tokens) is preserved or adapted if it is still needed for the CLI/logs.
* **Vector DB**: The current system has `PineconeWrapper` for RAG. This should be integrated into the `planner_node` or `coder_node` to fetch few-shot examples or context.

## 5. Next Actions

1. **Initialize**: Create the folder structure.
2. **Define State**: Create `src/ea_chatbot/graph/state.py`.
3. **Implement Router**: Create the first node to replicate the `Expert Selector` logic.
4. **Implement Executor**: Port the `exec()` logic to a node.

## 6. Git Operations

- Branches should be used for specific features or bug fixes.
- New branches should be created from the `main` branch or the `conductor` branch.
- The conductor should always use the `conductor` branch and branches derived from it.
- When a feature or fix is complete, rebase to keep the commit history clean before merging.
- Conductor-related changes should never be merged into the `main` branch.
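
## 7. Appendix: Executor and Routing Sketch

To make the `executor_node` (Section 2.2, Step 3) and the branching in Section 2.3 more concrete, here is a minimal, standard-library-only sketch. It is an illustration under stated assumptions, not the final implementation: the retry cap `MAX_ITERATIONS`, the `db` name exposed to generated code, and the node-name strings returned by the routing functions are all hypothetical. The routing functions take the state and return a node name, which is the shape that could later be handed to LangGraph's `add_conditional_edges`.

```python
import contextlib
import io
import traceback
from typing import Any, Dict, Optional

MAX_ITERATIONS = 3  # assumed retry cap; the guide does not specify one


def executor_node(state: Dict[str, Any], db_client: Optional[Any] = None) -> Dict[str, Any]:
    """Run state["code"] with exec() and return a partial state update."""
    buffer = io.StringIO()
    # Names visible to the generated code; "db" is a hypothetical binding.
    namespace: Dict[str, Any] = {"db": db_client}
    try:
        # Capture anything the generated code prints.
        with contextlib.redirect_stdout(buffer):
            exec(state["code"], namespace)
        error = None
    except Exception:
        # Keep the full traceback so the coder node can attempt a fix.
        error = traceback.format_exc()
    return {
        "code_output": buffer.getvalue(),
        "error": error,
        "iterations": state.get("iterations", 0) + 1,
    }


def route_after_analysis(state: Dict[str, Any]) -> str:
    """Map the analyzer's routing hint to the next node name."""
    routes = {"clarify": "clarification", "plan": "planner", "research": "researcher"}
    return routes.get(state.get("next_action", ""), "end")


def route_after_execution(state: Dict[str, Any]) -> str:
    """Retry through the coder on error, up to the assumed cap."""
    if state.get("error") and state.get("iterations", 0) < MAX_ITERATIONS:
        return "coder"
    return "summarizer"


if __name__ == "__main__":
    state = {"code": "print(1 + 1)", "iterations": 0}
    state.update(executor_node(state))
    print(route_after_execution(state))  # summarizer
```

Returning only the changed keys from `executor_node` mirrors how graph nodes typically report partial state updates. Capping retries with `iterations` is one way to keep the `Executor -->|Error| Coder` loop from running indefinitely. Note that plain `exec()` offers no real isolation, so any sandboxing would still need to be added separately.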