Election Analytics Chatbot - Project Guide

Overview

This document is a guide for rewriting the current BambooAI-based chatbot system as a modern, stateful, graph-based architecture using LangGraph. The goal is to improve the maintainability, observability, and flexibility of the agentic workflows.

1. Migration Goals

  • Framework Switch: Move from the custom linear ChatBot class (in src/ea_chatbot/bambooai/core/chatbot.py) to LangGraph.
  • State Management: Make state explicit by using LangGraph's StateGraph.
  • Modularity: Break down monolithic methods (pd_agent_converse, execute_code) into distinct Nodes.
  • Observability: Easier debugging of the decision process (Routing -> Planning -> Coding -> Executing).

2. Architecture Proposal

2.1. The Graph State

The state will track the conversation and execution context.

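from typing import TypedDict, Annotated, List, Dict, Any, Optional
from langchain_core.messages import BaseMessage
from matplotlib.figure import Figure  # type of the entries in `plots`
from pandas import DataFrame  # type of the values in `dfs`
import operator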

class AgentState(TypedDict):
    # Conversation history
    messages: Annotated[List[BaseMessage], operator.add]
    
    # Task context
    question: str
    
    # Query Analysis (Decomposition results)
    analysis: Optional[Dict[str, Any]] 
    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
    
    # Step-by-step reasoning
    plan: Optional[str]
    
    # Code execution context
    code: Optional[str]
    code_output: Optional[str]
    error: Optional[str]
    
    # Artifacts (for UI display)
    plots: List[Figure] # Matplotlib figures
    dfs: Dict[str, DataFrame] # Pandas DataFrames
    
    # Control flow
    iterations: int
    next_action: str # Routing hint: "clarify", "plan", "research", "end"
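The `Annotated[..., operator.add]` hint on `messages` tells LangGraph to append a node's returned messages to the existing list; fields without a reducer are overwritten by the returned value. The sketch below imitates that merge rule in plain Python so the behaviour can be checked without running a graph. `MiniState` and `apply_update` are illustrative names, not part of LangGraph, and messages are plain strings here for brevity.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict, get_type_hints


class MiniState(TypedDict):
    # Reduced field: updates are concatenated onto the current list.
    messages: Annotated[List[str], operator.add]
    # Plain fields: updates replace the current value.
    plan: Optional[str]
    iterations: int


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state (illustrative helper)."""
    hints = get_type_hints(MiniState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] hints expose their extra arguments as __metadata__.
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: turnout in 2020?"], "plan": None, "iterations": 0}
state = apply_update(
    state, {"messages": ["assistant: planning"], "plan": "1. load data"}
)
```

The practical consequence is that a node should return only the new messages, not the whole history, or the history would be duplicated on every step.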

2.2. Nodes (The Actors)

We will map existing logic to these nodes:

  1. query_analyzer_node (Router & Refiner):

    • Logic: Replaces Expert Selector and Analyst Selector.
    • Function:
      1. Decomposes the user's query into key elements (Data, Unknowns, Conditions).
      2. Determines if the query is ambiguous or missing critical information.
    • Output: Updates messages and analysis, and sets next_action to route to one of:
      • clarification_node (if ambiguous).
      • planner_node (if clear data task).
      • researcher_node (if general/web task).
  2. clarification_node (Human-in-the-loop):

    • Logic: Replaces Theorist-Clarification.
    • Function: Formulates a specific question to ask the user for missing details.
    • Output: Returns a message to the user and interrupts the graph execution to await user input.
  3. researcher_node (Theorist):

    • Logic: Handles general queries or web searches.
    • Function: Uses GoogleSearch tool if necessary.
    • Output: Final answer.
  4. planner_node:

    • Logic: Replaces Planner.
    • Function: Generates a step-by-step plan based on the decomposed query elements and dataframe ontology.
    • Output: Updates plan.
  5. coder_node:

    • Logic: Replaces Code Generator & Error Corrector.
    • Function: Generates Python code. If an error exists in the state, it attempts to fix the previous code instead.
    • Output: Updates code.
  6. executor_node:

    • Logic: Replaces Code Executor.
    • Function: Executes the Python code in a safe(r) environment. It needs access to the DBClient.
    • Output: Updates code_output, plots, dfs. If an exception is raised, updates error.
  7. summarizer_node:

    • Logic: Replaces Solution Summarizer.
    • Function: Interprets the code output and generates a natural language response.
    • Output: Final response message.
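Each node above can be written as a plain function that receives the current state and returns only the keys it changes. As a minimal sketch, here is how coder_node might switch between generating and fixing code. The `generate` callable is a hypothetical stand-in for the LLM call, the prompt wording is invented for illustration, and incrementing `iterations` inside this node is an assumption rather than something this guide prescribes.

```python
from typing import Any, Callable, Dict


def make_coder_node(
    generate: Callable[[str], str],
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a coder node around `generate` (hypothetical prompt -> code callable)."""

    def coder_node(state: Dict[str, Any]) -> Dict[str, Any]:
        error = state.get("error")
        if error:
            # Repair path: show the model the failing code and its error.
            prompt = (
                "Fix the following code.\n"
                f"Code:\n{state.get('code') or ''}\n"
                f"Error:\n{error}"
            )
        else:
            # First attempt: generate code from the plan.
            prompt = f"Write Python code for this plan:\n{state.get('plan') or ''}"
        # Return a partial update: new code, cleared error, one more attempt.
        return {
            "code": generate(prompt),
            "error": None,
            "iterations": state.get("iterations", 0) + 1,
        }

    return coder_node


def fake_generate(prompt: str) -> str:
    """Deterministic stand-in for an LLM, used only to exercise the node."""
    return "print('fixed')" if prompt.startswith("Fix") else "print(1/0)"


coder = make_coder_node(fake_generate)
first = coder({"plan": "divide", "iterations": 0})
second = coder(
    {"plan": "divide", "code": first["code"], "error": "ZeroDivisionError", "iterations": 1}
)
```

Injecting the model call keeps the node testable without network access, which is one way to get the easier debugging listed under the migration goals.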

2.3. The Workflow (Graph)

graph TD
    Start --> QueryAnalyzer
    QueryAnalyzer -->|Ambiguous| Clarification
    Clarification -->|User Input| QueryAnalyzer
    QueryAnalyzer -->|General/Web| Researcher
    QueryAnalyzer -->|Data Analysis| Planner
    Planner --> Coder
    Coder --> Executor
    Executor -->|Success| Summarizer
    Executor -->|Error| Coder
    Researcher --> End
    Summarizer --> End
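The two branching points in the diagram (after QueryAnalyzer and after Executor) can be expressed as small routing functions of the kind that `StateGraph.add_conditional_edges` accepts. The sketch below is one possible shape. The node-name strings and `MAX_ITERATIONS` are assumptions. The diagram itself shows no retry cap, so sending an exhausted retry loop to the summarizer is a design suggestion, not part of the proposal.

```python
from typing import Any, Dict

MAX_ITERATIONS = 3  # assumed retry cap; not specified in this guide


def route_after_analysis(state: Dict[str, Any]) -> str:
    """Map the analyzer's routing hint to the name of the next node."""
    targets = {
        "clarify": "clarification",
        "plan": "planner",
        "research": "researcher",
    }
    # Unknown or missing hints fall through to "end".
    return targets.get(state.get("next_action", "end"), "end")


def route_after_execution(state: Dict[str, Any]) -> str:
    """Retry the coder on error until the cap is reached, then summarize."""
    if state.get("error") and state.get("iterations", 0) < MAX_ITERATIONS:
        return "coder"
    return "summarizer"
```

Without some cap on the Executor -> Coder edge, a persistently failing snippet would loop indefinitely; the `iterations` field in the state is a natural place to track attempts.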

3. Implementation Steps

Step 1: Dependencies

Add the following packages to pyproject.toml:

  • langgraph
  • langchain
  • langchain-openai
  • langchain-google-genai
  • langchain-community

Step 2: Directory Structure

Create a new package for the graph logic to keep it separate from the old one during migration.

src/ea_chatbot/
├── graph/
│   ├── __init__.py
│   ├── state.py       # State definition
│   ├── nodes/         # Individual node implementations
│   │   ├── __init__.py
│   │   ├── router.py
│   │   ├── planner.py
│   │   ├── coder.py
│   │   ├── executor.py
│   │   └── ...
│   ├── workflow.py    # Graph construction
│   └── tools/         # DB and Search tools wrapped for LangChain
└── ...

Step 3: Tool Wrapping

Wrap the existing DBClient (from src/ea_chatbot/bambooai/utils/db_client.py) so that executor_node can access it. Initially, executor_node will likely keep the existing exec()-based approach for compatibility with the generated code, restructured as a graph node.
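A minimal sketch of that idea follows: a factory closes over a single client instance and returns the node function, which runs the generated code with `exec()`, captures stdout, and reports exceptions through the `error` key. The variable name `db` exposed to the generated code is an assumption, and `FakeDB` with its `query` method is a hypothetical stand-in, not the real DBClient API. Note that `exec()` is not a sandbox; this only mirrors the existing approach.

```python
import contextlib
import io
import traceback
from typing import Any, Callable, Dict


def make_executor_node(db_client: Any) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Return an executor node that shares one client across all invocations."""

    def executor_node(state: Dict[str, Any]) -> Dict[str, Any]:
        # Fresh namespace per run; `db` is the (assumed) name the code expects.
        namespace: Dict[str, Any] = {"db": db_client}
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                exec(state.get("code") or "", namespace)
        except Exception:
            # Keep partial output and hand the traceback back to the coder.
            return {"code_output": stdout.getvalue(), "error": traceback.format_exc()}
        return {"code_output": stdout.getvalue(), "error": None}

    return executor_node


class FakeDB:
    """Hypothetical stand-in for DBClient, used only to exercise the node."""

    def query(self, sql: str):
        return [("AL", 10)]


run = make_executor_node(FakeDB())
ok = run({"code": "print(len(db.query('SELECT 1')))"})
bad = run({"code": "1/0"})
```

Collecting plots and dataframes could follow the same pattern, by reading agreed-upon names back out of `namespace` after execution.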

Step 4: Prompt Migration

Port the prompts from data/PROMPT_TEMPLATES.json or src/ea_chatbot/bambooai/prompts/strings.py into the respective nodes. Use LangChain's ChatPromptTemplate for better management.

Step 5: Streamlit Integration

Update src/ea_chatbot/app.py to use the new workflow.compile() runnable.

  • Instead of chatbot.pd_agent_converse(...), call app.stream(...) on the compiled LangGraph app.
  • Handle the streaming output to update the UI progressively.
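As a sketch of the progressive update, assume the compiled graph is streamed with `stream_mode="updates"`, in which case each chunk is a dict mapping a node name to that node's state update. The helper below turns such chunks into (label, update) pairs that the UI can render one at a time. `STATUS_LABELS`, `progress_events`, and the node names are illustrative, and the chunks are faked here so the example runs without a graph.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Assumed node names mapped to user-facing status text.
STATUS_LABELS = {
    "query_analyzer": "Analyzing question",
    "planner": "Planning",
    "coder": "Writing code",
    "executor": "Running code",
    "summarizer": "Summarizing",
}


def progress_events(
    chunks: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield a (status label, state update) pair for every node that ran."""
    for chunk in chunks:
        for node_name, update in chunk.items():
            # Fall back to the raw node name for unmapped nodes.
            yield STATUS_LABELS.get(node_name, node_name), update or {}


# Stand-in for the chunks a streamed graph run would produce.
fake_chunks = [
    {"planner": {"plan": "1. load data"}},
    {"coder": {"code": "print(1)"}},
    {"executor": {"code_output": "1\n", "error": None}},
]
events = list(progress_events(fake_chunks))
```

In app.py, the loop body would write each label to a Streamlit placeholder and render any `plots` or `dfs` found in the update.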

4. Key Considerations for Refactoring

  • Database Connection: Ensure DBClient is initialized once and passed to the Executor node efficiently (e.g., via configurable parameters or closure).
  • Prompt Templating: The current system uses simple format strings. Switching to LangChain templates allows for easier model switching and partial formatting.
  • Token Management: LangGraph runs can be traced through LangSmith when it is enabled, but we should ensure the OutputManager logic (printing costs/tokens) is preserved or adapted if it is still needed for the CLI/logs.
  • Vector DB: The current system has a PineconeWrapper for RAG. This should be integrated into the Planner or Coder node to fetch few-shot examples or context.

5. Next Actions

  1. Initialize: Create the folder structure.
  2. Define State: Create src/ea_chatbot/graph/state.py.
  3. Implement Router: Create the first node to replicate Expert Selector logic.
  4. Implement Executor: Port the exec() logic to a node.

6. Git Operations

  • Branches should be used for specific features or bug fixes.
  • New branches should be created from either the main branch or the conductor branch.
  • The conductor should always work on the conductor branch or on branches derived from it.
  • When a feature or fix is complete, rebase to keep the commit history clean before merging.
  • Conductor-related changes should never be merged into the main branch.