Election Analytics Chatbot - Project Guide
Overview
This document is a guide for rewriting the current BambooAI-based chatbot system as a modern, stateful, graph-based architecture using LangGraph. The goal is to improve the maintainability, observability, and flexibility of the agentic workflows.
1. Migration Goals
- Framework Switch: Move from the custom linear ChatBot class (in src/ea_chatbot/bambooai/core/chatbot.py) to LangGraph.
- State Management: Explicit state management using LangGraph's StateGraph.
- Modularity: Break down monolithic methods (pd_agent_converse, execute_code) into distinct Nodes.
- Observability: Easier debugging of the decision process (Routing -> Planning -> Coding -> Executing).
2. Architecture Proposal
2.1. The Graph State
The state will track the conversation and execution context.
from typing import TypedDict, Annotated, List, Dict, Any, Optional
from langchain_core.messages import BaseMessage
from matplotlib.figure import Figure
from pandas import DataFrame
import operator

class AgentState(TypedDict):
    # Conversation history (operator.add appends new messages instead of overwriting)
    messages: Annotated[List[BaseMessage], operator.add]
    # Task context
    question: str
    # Query Analysis (Decomposition results)
    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
    analysis: Optional[Dict[str, Any]]
    # Step-by-step reasoning
    plan: Optional[str]
    # Code execution context
    code: Optional[str]
    code_output: Optional[str]
    error: Optional[str]
    # Artifacts (for UI display)
    plots: List[Figure]  # Matplotlib figures
    dfs: Dict[str, DataFrame]  # Pandas DataFrames
    # Control flow
    iterations: int
    next_action: str  # Routing hint: "clarify", "plan", "research", "end"
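The Annotated[..., operator.add] annotation on messages tells LangGraph to append a node's returned messages to the existing list, while keys without a reducer are replaced by the latest value. The sketch below imitates that merge rule in plain Python so the behaviour can be checked without LangGraph installed. MiniState and apply_update are hypothetical names introduced for illustration, and plain strings stand in for BaseMessage objects.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict, get_type_hints


class MiniState(TypedDict):
    # Reducer-annotated key: updates are combined with operator.add (list concatenation).
    messages: Annotated[List[str], operator.add]
    # Plain key: the latest update replaces the previous value.
    plan: Optional[str]


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state (hypothetical helper)."""
    hints = get_type_hints(MiniState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types expose their extra arguments via __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: Who won the 2020 election?"], "plan": None}
state = apply_update(state, {"messages": ["ai: analyzing query"], "plan": "1. Load results"})
print(state["messages"])
print(state["plan"])
```

A practical consequence is that nodes should return only the new messages they produce, not the full history, otherwise the history would be duplicated on every step.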
2.2. Nodes (The Actors)
We will map existing logic to these nodes:
- query_analyzer_node (Router & Refiner):
  - Logic: Replaces Expert Selector and Analyst Selector.
  - Function:
    - Decomposes the user's query into key elements (Data, Unknowns, Conditions).
    - Determines if the query is ambiguous or missing critical information.
  - Output: Updates messages. Returns routing decision:
    - clarification_node (if ambiguous).
    - planner_node (if clear data task).
    - researcher_node (if general/web task).
- clarification_node (Human-in-the-loop):
  - Logic: Replaces Theorist-Clarification.
  - Function: Formulates a specific question to ask the user for missing details.
  - Output: Returns a message to the user and interrupts the graph execution to await user input.
- researcher_node (Theorist):
  - Logic: Handles general queries or web searches.
  - Function: Uses GoogleSearch tool if necessary.
  - Output: Final answer.
- planner_node:
  - Logic: Replaces Planner.
  - Function: Generates a step-by-step plan based on the decomposed query elements and dataframe ontology.
  - Output: Updates plan.
- coder_node:
  - Logic: Replaces Code Generator & Error Corrector.
  - Function: Generates Python code. If error exists in state, it attempts to fix it.
  - Output: Updates code.
- executor_node:
  - Logic: Replaces Code Executor.
  - Function: Executes the Python code in a safe(r) environment. It needs access to the DBClient.
  - Output: Updates code_output, plots, dfs. If exception, updates error.
- summarizer_node:
  - Logic: Replaces Solution Summarizer.
  - Function: Interprets the code output and generates a natural language response.
  - Output: Final response message.
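The routing decision made by query_analyzer_node can be expressed as a small pure function over the state. The sketch below maps the next_action hints from the state definition to the node names listed above; a function of this shape could be registered with LangGraph's add_conditional_edges. The name route_after_analysis and the "end" fallback for unknown hints are assumptions, not part of the existing code.

```python
from typing import Any, Dict

# Routing hints from AgentState.next_action mapped to the node names in this guide.
ROUTES = {
    "clarify": "clarification_node",
    "plan": "planner_node",
    "research": "researcher_node",
}


def route_after_analysis(state: Dict[str, Any]) -> str:
    """Return the next node name, or "end" when no route applies (assumed fallback)."""
    return ROUTES.get(state.get("next_action", ""), "end")


print(route_after_analysis({"next_action": "plan"}))
print(route_after_analysis({"next_action": "clarify"}))
```

Keeping the router free of LLM calls and side effects makes it easy to unit test separately from the node that sets next_action.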
2.3. The Workflow (Graph)
graph TD
Start --> QueryAnalyzer
QueryAnalyzer -->|Ambiguous| Clarification
Clarification -->|User Input| QueryAnalyzer
QueryAnalyzer -->|General/Web| Researcher
QueryAnalyzer -->|Data Analysis| Planner
Planner --> Coder
Coder --> Executor
Executor -->|Success| Summarizer
Executor -->|Error| Coder
Researcher --> End
Summarizer --> End
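The Executor -> Coder edge forms a loop, so it is worth deciding how the loop terminates when the generated code keeps failing. The following plain-Python walk-through (not LangGraph itself) shows one possible approach that uses the iterations counter from the state. MAX_ITERATIONS, the stub coder_node, and route_after_execution are assumptions made for illustration; the guide does not specify a retry cap.

```python
from typing import Any, Dict

MAX_ITERATIONS = 3  # assumed cap; the guide does not specify one


def coder_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Stub coder: emits buggy code first, then a fix once an error is in the state."""
    code = "result = 42" if state.get("error") else "result = 1 / 0"
    return {"code": code, "iterations": state["iterations"] + 1}


def executor_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Run the code and record either its result or the exception."""
    namespace: Dict[str, Any] = {}
    try:
        exec(state["code"], namespace)
        return {"code_output": str(namespace["result"]), "error": None}
    except Exception as exc:
        return {"code_output": None, "error": repr(exc)}


def route_after_execution(state: Dict[str, Any]) -> str:
    """Retry via the coder on error until the cap is hit; otherwise summarize."""
    if state.get("error") and state["iterations"] < MAX_ITERATIONS:
        return "coder"
    return "summarizer"


state: Dict[str, Any] = {"code": None, "code_output": None, "error": None, "iterations": 0}
next_node = "coder"
while next_node == "coder":
    state.update(coder_node(state))
    state.update(executor_node(state))
    next_node = route_after_execution(state)

print(state["iterations"], state["code_output"], next_node)  # prints: 2 42 summarizer
```

In this sketch, hitting the cap with an error still present also routes to the summarizer, which would then need to report the failure to the user.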
3. Implementation Steps
Step 1: Dependencies
Add the following packages to pyproject.toml:
- langgraph
- langchain
- langchain-openai
- langchain-google-genai
- langchain-community
Step 2: Directory Structure
Create a new package for the graph logic to keep it separate from the old one during migration.
src/ea_chatbot/
├── graph/
│ ├── __init__.py
│ ├── state.py # State definition
│ ├── nodes/ # Individual node implementations
│ │ ├── __init__.py
│ │ ├── router.py
│ │ ├── planner.py
│ │ ├── coder.py
│ │ ├── executor.py
│ │ └── ...
│ ├── workflow.py # Graph construction
│ └── tools/ # DB and Search tools wrapped for LangChain
└── ...
Step 3: Tool Wrapping
Wrap the existing DBClient (from src/ea_chatbot/bambooai/utils/db_client.py) into a structure accessible by the executor_node. The executor_node will likely keep the existing exec() based approach initially for compatibility with the generated code, but structured as a graph node.
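One way to give the executor access to the database client is a factory that closes over a single client instance and returns the node function. The sketch below keeps the exec()-based approach and captures printed output and tracebacks into the state keys defined earlier. FakeDBClient, its query method, and the db name exposed to the generated code are hypothetical; the real DBClient interface may differ.

```python
import contextlib
import io
import traceback
from typing import Any, Callable, Dict


class FakeDBClient:
    """Hypothetical stand-in for DBClient; the real interface may differ."""

    def query(self, sql: str) -> list:
        return [("row_a", 1), ("row_b", 2)]


def make_executor_node(db_client: Any) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build an executor node that closes over one shared client instance."""

    def executor_node(state: Dict[str, Any]) -> Dict[str, Any]:
        namespace = {"db": db_client}  # names visible to the generated code
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                exec(state["code"], namespace)
        except Exception:
            # Keep partial output and hand the traceback to the coder for a fix.
            return {"code_output": stdout.getvalue(), "error": traceback.format_exc()}
        return {"code_output": stdout.getvalue(), "error": None}

    return executor_node


executor_node = make_executor_node(FakeDBClient())
ok = executor_node({"code": "rows = db.query('SELECT 1')\nprint(len(rows))"})
bad = executor_node({"code": "print(undefined_name)"})
print(ok)
print(bad["error"].splitlines()[-1])
```

Note that exec() provides no isolation by itself, so this only structures the existing behaviour as a node; any sandboxing would have to be added separately.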
Step 4: Prompt Migration
Port the prompts from data/PROMPT_TEMPLATES.json or src/ea_chatbot/bambooai/prompts/strings.py into the respective nodes. Use LangChain's ChatPromptTemplate for better management.
Step 5: Streamlit Integration
Update src/ea_chatbot/app.py to use the new workflow.compile() runnable.
- Instead of chatbot.pd_agent_converse(...), use app.stream(...) (LangGraph app).
- Handle the streaming output to update the UI progressively.
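The progressive UI update can be sketched as a loop over streamed chunks. The example below assumes each chunk is a dict mapping a node name to that node's state update, which is the shape LangGraph's "updates" stream mode produces. fake_stream is a stand-in generator so the snippet runs without LangGraph or Streamlit, and render_progress is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Iterator, List


def fake_stream(inputs: Dict[str, Any]) -> Iterator[Dict[str, Dict[str, Any]]]:
    """Stand-in for app.stream(inputs): yields one {node_name: update} chunk per step."""
    yield {"query_analyzer_node": {"next_action": "plan"}}
    yield {"planner_node": {"plan": "1. Load results\n2. Aggregate"}}
    yield {"summarizer_node": {"messages": ["(final answer text)"]}}


def render_progress(chunks: Iterable[Dict[str, Dict[str, Any]]]) -> List[str]:
    """Turn streamed chunks into one status line per node update."""
    lines = []
    for chunk in chunks:
        for node_name, update in chunk.items():
            changed = ", ".join(sorted(update))
            lines.append(f"{node_name}: updated {changed}")
    return lines


for line in render_progress(fake_stream({"question": "..."})):
    print(line)
```

In the Streamlit app, each status line could be written to the page as it arrives (for example with st.write) instead of being collected into a list.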
4. Key Considerations for Refactoring
- Database Connection: Ensure DBClient is initialized once and passed to the Executor node efficiently (e.g., via configurable parameters or closure).
- Prompt Templating: The current system uses simple format strings. Switching to LangChain templates allows for easier model switching and partial formatting.
- Token Management: LangGraph provides built-in tracing (if LangSmith is enabled), but we should ensure the OutputManager logic (printing costs/tokens) is preserved or adapted if still needed for the CLI/Logs.
- Vector DB: The current system has PineconeWrapper for RAG. This should be integrated into the Planner or Coder node to fetch few-shot examples or context.
5. Next Actions
- Initialize: Create the folder structure.
- Define State: Create src/ea_chatbot/graph/state.py.
- Implement Router: Create the first node to replicate Expert Selector logic.
- Implement Executor: Port the exec() logic to a node.
6. Git Operations
- Branches should be used for specific features or bug fixes.
- New branches should be created from either the main branch or the conductor branch.
- The conductor should always use the conductor branch and branches derived from it.
- When a feature or fix is complete, use rebase to keep the commit history clean before merging.
- Conductor-related changes should never be merged into the main branch.