Election Analytics Chatbot - Project Guide

Overview

This guide describes how to migrate the current BambooAI-based chatbot to a stateful, graph-based architecture built on LangGraph. The goal is to improve the maintainability, observability, and flexibility of the agentic workflows.

1. Migration Goals

Framework Switch: Move from the custom linear ChatBot class (in src/ea_chatbot/bambooai/core/chatbot.py) to LangGraph.
State Management: Introduce explicit state management using LangGraph's StateGraph.
Modularity: Break down monolithic methods (pd_agent_converse, execute_code) into distinct Nodes.
Observability: Make the decision process (Routing -> Planning -> Coding -> Executing) easier to debug.

2. Architecture Proposal

2.1. The Graph State

The state will track the conversation and execution context.

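import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict

from langchain_core.messages import BaseMessage
from matplotlib.figure import Figure  # type of the entries in "plots"
from pandas import DataFrame  # type of the values in "dfs"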

class AgentState(TypedDict):
    # Conversation history
    messages: Annotated[List[BaseMessage], operator.add]
    
    # Task context
    question: str
    
    # Query Analysis (Decomposition results)
    analysis: Optional[Dict[str, Any]] 
    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
    
    # Step-by-step reasoning
    plan: Optional[str]
    
    # Code execution context
    code: Optional[str]
    code_output: Optional[str]
    error: Optional[str]
    
    # Artifacts (for UI display)
    plots: List[Figure] # Matplotlib figures
    dfs: Dict[str, DataFrame] # Pandas DataFrames
    
    # Control flow
    iterations: int
    next_action: str # Routing hint: "clarify", "plan", "research", "end"
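
The `Annotated[..., operator.add]` annotation on `messages` marks that field as accumulating: a node returns only the new messages, and they are appended to the history instead of replacing it. The sketch below imitates that merge rule in plain Python so it runs without LangGraph installed. `MiniState` and `apply_update` are hypothetical names used only for illustration, plain strings stand in for `BaseMessage` objects, and the code is not LangGraph's actual implementation.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict, get_type_hints


class MiniState(TypedDict):
    # Plain strings stand in for BaseMessage objects in this sketch.
    messages: Annotated[List[str], operator.add]
    plan: Optional[str]
    iterations: int


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state.

    Keys annotated with a reducer are combined with it; all other keys
    are simply overwritten.
    """
    hints = get_type_hints(MiniState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types carry their extra arguments in __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: Who won in 2020?"], "plan": None, "iterations": 0}
state = apply_update(state, {"messages": ["ai: Planning..."], "plan": "1. Load results"})
print(state["messages"])  # both messages, in order
print(state["plan"])      # overwritten by the update
```

Fields without a reducer, such as `plan` or `code`, keep only the latest value, which is the behaviour the error-correction loop relies on.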

2.2. Nodes (The Actors)

We will map existing logic to these nodes:

query_analyzer_node (Router & Refiner):
- Logic: Replaces Expert Selector and Analyst Selector.
- Function:
  1. Decomposes the user's query into key elements (Data, Unknowns, Conditions).
  2. Determines if the query is ambiguous or missing critical information.
- Output: Updates messages and analysis, then routes to one of:
  - clarification_node (if ambiguous).
  - planner_node (if clear data task).
  - researcher_node (if general/web task).
clarification_node (Human-in-the-loop):
- Logic: Replaces Theorist-Clarification.
- Function: Formulates a specific question to ask the user for missing details.
- Output: Returns a message to the user and interrupts the graph execution to await user input.
researcher_node (Theorist):
- Logic: Handles general queries or web searches.
- Function: Uses GoogleSearch tool if necessary.
- Output: Final answer.
planner_node:
- Logic: Replaces Planner.
- Function: Generates a step-by-step plan based on the decomposed query elements and dataframe ontology.
- Output: Updates plan.
coder_node:
- Logic: Replaces Code Generator & Error Corrector.
- Function: Generates Python code. If the state contains an error, it attempts to fix the failing code.
- Output: Updates code.
executor_node:
- Logic: Replaces Code Executor.
- Function: Executes the Python code in a safe(r) environment. It needs access to the DBClient.
- Output: Updates code_output, plots, dfs. If an exception is raised, updates error.
summarizer_node:
- Logic: Replaces Solution Summarizer.
- Function: Interprets the code output and generates a natural language response.
- Output: Final response message.
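
Each node above follows the same contract: it receives the current state and returns only the keys it changes. A minimal sketch of the coder_node under that contract is shown below. The `make_coder_node` factory, the `generate` callable (a stand-in for the real LLM call), the prompt wording, and the choice to increment `iterations` inside this node are all assumptions made for illustration, not the project's implementation.

```python
from typing import Any, Callable, Dict

# Hypothetical signature for whatever wraps the LLM call: prompt in, code out.
GenerateFn = Callable[[str], str]


def make_coder_node(generate: GenerateFn) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a coder node around an injected code-generation callable."""

    def coder_node(state: Dict[str, Any]) -> Dict[str, Any]:
        if state.get("error"):
            # Repair path: show the model the failing code and the error.
            prompt = (
                f"Fix this code:\n{state['code']}\n"
                f"It failed with:\n{state['error']}"
            )
        else:
            # First attempt: generate code from the plan.
            prompt = f"Write Python code for this plan:\n{state['plan']}"
        # Return only the keys this node changes and clear the stale error.
        return {
            "code": generate(prompt),
            "error": None,
            "iterations": state.get("iterations", 0) + 1,
        }

    return coder_node


def fake_llm(prompt: str) -> str:
    """Stub generator so the sketch runs without an LLM."""
    return "print('fixed')" if prompt.startswith("Fix") else "print(1/0)"


coder = make_coder_node(fake_llm)
first = coder({"plan": "1. Divide", "iterations": 0})
second = coder(
    {"plan": "1. Divide", "code": first["code"], "error": "ZeroDivisionError", "iterations": 1}
)
```

Injecting the generator keeps the node testable without network access and leaves the choice of model provider outside the node.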

2.3. The Workflow (Graph)

graph TD
    Start --> QueryAnalyzer
    QueryAnalyzer -->|Ambiguous| Clarification
    Clarification -->|User Input| QueryAnalyzer
    QueryAnalyzer -->|General/Web| Researcher
    QueryAnalyzer -->|Data Analysis| Planner
    Planner --> Coder
    Coder --> Executor
    Executor -->|Success| Summarizer
    Executor -->|Error| Coder
    Researcher --> End
    Summarizer --> End
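
The two branching points in the diagram (after QueryAnalyzer and after Executor) can be expressed as plain functions that inspect the state and return the name of the next step, which is the kind of callable LangGraph accepts for conditional edges. The sketch below is a hedged illustration: the function names, the `MAX_ITERATIONS` cap, and the decision to hand over to the Summarizer once the cap is reached are assumptions, since the guide does not say how the Executor -> Coder loop terminates.

```python
from typing import Any, Dict

MAX_ITERATIONS = 3  # Assumed retry cap; the guide does not specify one.


def route_after_analysis(state: Dict[str, Any]) -> str:
    """Map the next_action routing hint to a step name from the diagram."""
    routes = {
        "clarify": "Clarification",
        "plan": "Planner",
        "research": "Researcher",
    }
    # Unknown or missing hints fall through to the end of the graph.
    return routes.get(state.get("next_action"), "End")


def route_after_execution(state: Dict[str, Any]) -> str:
    """Decide whether to retry code generation or summarize the result."""
    if state.get("error") is None:
        return "Summarizer"
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        # Assumption: stop retrying and let the Summarizer report the failure.
        return "Summarizer"
    return "Coder"


print(route_after_analysis({"next_action": "plan"}))
print(route_after_execution({"error": "NameError: x", "iterations": 1}))
print(route_after_execution({"error": "NameError: x", "iterations": 3}))
```

Without some cap on `iterations`, a query whose generated code always fails would cycle between Coder and Executor indefinitely.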

3. Implementation Steps

Step 1: Dependencies

Add the following packages to pyproject.toml:

langgraph
langchain
langchain-openai
langchain-google-genai
langchain-community

Step 2: Directory Structure

Create a new package for the graph logic to keep it separate from the old one during migration.

src/ea_chatbot/
├── graph/
│   ├── __init__.py
│   ├── state.py       # State definition
│   ├── nodes/         # Individual node implementations
│   │   ├── __init__.py
│   │   ├── router.py
│   │   ├── planner.py
│   │   ├── coder.py
│   │   ├── executor.py
│   │   └── ...
│   ├── workflow.py    # Graph construction
│   └── tools/         # DB and Search tools wrapped for LangChain
└── ...

Step 3: Tool Wrapping

Wrap the existing DBClient (from src/ea_chatbot/bambooai/utils/db_client.py) in a structure accessible to the executor_node. Initially, the executor_node will likely keep the existing exec()-based approach for compatibility with the generated code, restructured as a graph node.
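
One way to give the executor access to the client is a closure: a factory receives the client once and returns the node function. The sketch below assumes that approach. `make_executor_node`, the `db` variable name exposed to generated code, and `FakeDBClient` with its `query` method are hypothetical; the real DBClient API is not shown in this guide. Note that a bare exec() is not a sandbox.

```python
import contextlib
import io
import traceback
from typing import Any, Callable, Dict


def make_executor_node(db_client: Any) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build an executor node that closes over one shared DB client."""

    def executor_node(state: Dict[str, Any]) -> Dict[str, Any]:
        # Names visible to the generated code. "db" is an assumed variable name.
        namespace: Dict[str, Any] = {"db": db_client}
        buffer = io.StringIO()
        try:
            # Capture anything the generated code prints.
            with contextlib.redirect_stdout(buffer):
                exec(state["code"], namespace)  # runs with full privileges
        except Exception:
            # Keep partial output and hand the traceback back to the coder.
            return {"code_output": buffer.getvalue(), "error": traceback.format_exc()}
        return {"code_output": buffer.getvalue(), "error": None}

    return executor_node


class FakeDBClient:
    """Stand-in for the project's DBClient; its real API is not shown here."""

    def query(self, sql: str) -> list:
        return [("A", 10), ("B", 7)]


executor = make_executor_node(FakeDBClient())
ok = executor({"code": "rows = db.query('SELECT 1')\nprint(len(rows))"})
bad = executor({"code": "print(1 / 0)"})
print(ok["code_output"])
print(bad["error"].splitlines()[-1])
```

Because the client is captured at construction time, it is created once per application rather than once per graph step.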

Step 4: Prompt Migration

Port the prompts from data/PROMPT_TEMPLATES.json or src/ea_chatbot/bambooai/prompts/strings.py into the respective nodes. Use LangChain's ChatPromptTemplate for better management.

Step 5: Streamlit Integration

Update src/ea_chatbot/app.py to use the new workflow.compile() runnable.

Instead of chatbot.pd_agent_converse(...), use app.stream(...) (LangGraph app).
Handle the streaming output to update the UI progressively.
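
A minimal sketch of the consuming side is shown below. It assumes the compiled graph is streamed in a mode where each event is a dict mapping a node name to that node's partial state update (LangGraph's "updates" stream mode has this shape). `fake_stream` replaces the real app.stream(...) call so the example runs standalone, and `render_stream` and its `write` callback are hypothetical names; in the Streamlit app the callback could be something like st.write.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


def fake_stream() -> Iterator[Dict[str, Dict[str, Any]]]:
    """Stand-in for app.stream(...): one event per finished node."""
    yield {"query_analyzer": {"next_action": "plan"}}
    yield {"planner": {"plan": "1. Load results\n2. Aggregate by party"}}
    yield {"coder": {"code": "print(42)"}}
    yield {"executor": {"code_output": "42\n", "error": None}}
    yield {"summarizer": {"messages": ["The answer is 42."]}}


def render_stream(
    events: Iterable[Dict[str, Dict[str, Any]]],
    write: Callable[[str], None],
) -> Dict[str, Any]:
    """Report progress per node and collect the latest value of each key."""
    latest: Dict[str, Any] = {}
    for event in events:
        for node_name, update in event.items():
            write(f"{node_name} finished")
            latest.update(update or {})
    return latest


progress: List[str] = []
result = render_stream(fake_stream(), progress.append)
print(progress)
print(result["messages"][-1])
```

Collecting the latest value per key is enough for display purposes here; accumulating fields such as messages would need the same append rule the graph state uses.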

4. Key Considerations for Refactoring

Database Connection: Ensure DBClient is initialized once and passed to the Executor node efficiently (e.g., via configurable parameters or closure).
Prompt Templating: The current system uses simple format strings. Switching to LangChain templates allows for easier model switching and partial formatting.
Token Management: LangGraph provides built-in tracing (if LangSmith is enabled), but we should ensure the OutputManager logic (printing costs/tokens) is preserved or adapted if still needed for the CLI/Logs.
Vector DB: The current system uses a PineconeWrapper for RAG. This should be integrated into the Planner or Coder node to fetch few-shot examples or context.

5. Next Actions

Initialize: Create the folder structure.
Define State: Create src/ea_chatbot/graph/state.py.
Implement Router: Create the first node to replicate Expert Selector logic.
Implement Executor: Port the exec() logic to a node.

6. Git Operations

Branches should be used for specific features or bug fixes.
New branches should be created from either the main branch or the conductor branch.
The conductor should always use the conductor branch and branches derived from it.
When a feature or fix is complete, use rebase to keep the commit history clean before merging.
Conductor-related changes should never be merged into the main branch.
