feat(backend): Implement /api/v1 prefix and HttpOnly cookie-based auth

2026-02-11 21:57:29 -08:00
parent 7a69133e26
commit 49a9da7c0c
5 changed files with 318 additions and 16 deletions
--- a/backend/GEMINI.md
+++ b/backend/GEMINI.md
@@ -0,0 +1,162 @@
+# Election Analytics Chatbot - Backend Guide
+
+## Overview
+This document guides the backend implementation of the Election Analytics Chatbot, focusing on the transition from the BambooAI-based system to a stateful, graph-based architecture using **LangGraph**.
+
+## 1. Migration Goals
+- **Framework Switch**: Move from the custom linear `ChatBot` class (in `src/ea_chatbot/bambooai/core/chatbot.py`) to `LangGraph`.
+- **State Management**: Manage state explicitly using LangGraph's `StateGraph`.
+- **Modularity**: Break down monolithic methods (`pd_agent_converse`, `execute_code`) into distinct Nodes.
+- **Observability**: Make the decision process (Routing -> Planning -> Coding -> Executing) easier to debug.
+
+## 2. Architecture Proposal
+
+### 2.1. The Graph State
+The state will track the conversation and execution context.
+
+```python
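+from typing import TypedDict, Annotated, List, Dict, Any, Optional
+from langchain_core.messages import BaseMessage
+from matplotlib.figure import Figure  # Type of the entries in `plots`
+from pandas import DataFrame  # Type of the values in `dfs`
+import operator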
+
+class AgentState(TypedDict):
+    # Conversation history
+    messages: Annotated[List[BaseMessage], operator.add]
+    
+    # Task context
+    question: str
+    
+    # Query Analysis (Decomposition results)
+    analysis: Optional[Dict[str, Any]] 
+    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
+    
+    # Step-by-step reasoning
+    plan: Optional[str]
+    
+    # Code execution context
+    code: Optional[str]
+    code_output: Optional[str]
+    error: Optional[str]
+    
+    # Artifacts (for UI display)
+    plots: List[Figure] # Matplotlib figures
+    dfs: Dict[str, DataFrame] # Pandas DataFrames
+    
+    # Control flow
+    iterations: int
+    next_action: str # Routing hint: "clarify", "plan", "research", "end"
+```
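+
+A note on the `Annotated[..., operator.add]` annotation: LangGraph treats the second argument as a reducer, so a `messages` list returned by a node is appended to the existing history rather than replacing it, while keys without a reducer are overwritten. The snippet below is a plain-Python illustration of that merge rule only, not LangGraph's implementation; `REDUCERS` and `apply_update` are hypothetical names, and messages are plain strings for brevity.
+
+```python
+import operator
+from typing import Any, Callable, Dict
+
+# Hypothetical reducer table mirroring the annotation on `messages`.
+REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}
+
+
+def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
+    """Merge a node's partial update into a copy of the state."""
+    merged = dict(state)
+    for key, value in update.items():
+        reducer = REDUCERS.get(key)
+        if reducer is not None and key in merged:
+            # Keys with a reducer are combined (lists are concatenated).
+            merged[key] = reducer(merged[key], value)
+        else:
+            # Keys without a reducer are simply overwritten.
+            merged[key] = value
+    return merged
+
+
+state = {"messages": ["user: Who won in 2020?"], "plan": None, "iterations": 0}
+state = apply_update(
+    state,
+    {"messages": ["ai: Drafting a plan."], "plan": "1. Load results"},
+)
+```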
+
+### 2.2. Nodes (The Actors)
+We will map existing logic to these nodes:
+
+1.  **`query_analyzer_node`** (Router & Refiner):
+    *   **Logic**: Replaces `Expert Selector` and `Analyst Selector`.
+    *   **Function**: 
+        1. Decomposes the user's query into key elements (Data, Unknowns, Conditions).
+        2. Determines if the query is ambiguous or missing critical information.
+    *   **Output**: Updates `messages`. Returns routing decision:
+        *   `clarification_node` (if ambiguous).
+        *   `planner_node` (if clear data task).
+        *   `researcher_node` (if general/web task).
+
+2.  **`clarification_node`** (Human-in-the-loop):
+    *   **Logic**: Replaces `Theorist-Clarification`.
+    *   **Function**: Formulates a specific question to ask the user for missing details.
+    *   **Output**: Returns a message to the user and **interrupts** the graph execution to await user input.
+
+3.  **`researcher_node`** (Theorist):
+    *   **Logic**: Handles general queries or web searches.
+    *   **Function**: Uses `GoogleSearch` tool if necessary.
+    *   **Output**: Final answer.
+
+4.  **`planner_node`**:
+    *   **Logic**: Replaces `Planner`.
+    *   **Function**: Generates a step-by-step plan based on the decomposed query elements and dataframe ontology.
+    *   **Output**: Updates `plan`.
+
+5.  **`coder_node`**:
+    *   **Logic**: Replaces `Code Generator` & `Error Corrector`.
+    *   **Function**: Generates Python code. If `error` exists in state, it attempts to fix it.
+    *   **Output**: Updates `code`.
+
+6.  **`executor_node`**:
+    *   **Logic**: Replaces `Code Executor`.
+    *   **Function**: Executes the Python code in a safe(r) environment. It needs access to the `DBClient`.
+    *   **Output**: Updates `code_output`, `plots`, `dfs`. If exception, updates `error`.
+
+7.  **`summarizer_node`**:
+    *   **Logic**: Replaces `Solution Summarizer`.
+    *   **Function**: Interprets the code output and generates a natural language response.
+    *   **Output**: Final response message.
+
+### 2.3. The Workflow (Graph)
+
+```mermaid
+graph TD
+    Start --> QueryAnalyzer
+    QueryAnalyzer -->|Ambiguous| Clarification
+    Clarification -->|User Input| QueryAnalyzer
+    QueryAnalyzer -->|General/Web| Researcher
+    QueryAnalyzer -->|Data Analysis| Planner
+    Planner --> Coder
+    Coder --> Executor
+    Executor -->|Success| Summarizer
+    Executor -->|Error| Coder
+    Researcher --> End
+    Summarizer --> End
+```
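+
+The conditional edges in the diagram reduce to two small routing functions. The sketch below is plain Python with no LangGraph dependency. The function names, the `MAX_ITERATIONS` cap, the fallback for unknown hints, and the hand-off to the summarizer once the cap is reached are all assumptions: the diagram itself does not bound the Executor -> Coder retry loop.
+
+```python
+from typing import Any, Dict
+
+MAX_ITERATIONS = 3  # Assumed retry cap; not specified in this guide.
+
+
+def route_after_analysis(state: Dict[str, Any]) -> str:
+    """Map the analyzer's `next_action` hint to the next node name."""
+    targets = {
+        "clarify": "clarification_node",
+        "plan": "planner_node",
+        "research": "researcher_node",
+        "end": "end",  # Placeholder label for the graph's end node.
+    }
+    # Unknown or missing hints fall back to asking the user (assumption).
+    return targets.get(state.get("next_action", ""), "clarification_node")
+
+
+def route_after_execution(state: Dict[str, Any]) -> str:
+    """Retry the coder on error until the assumed iteration cap is hit."""
+    if state.get("error") and state.get("iterations", 0) < MAX_ITERATIONS:
+        return "coder_node"
+    return "summarizer_node"
+```
+
+With LangGraph, functions of this shape would typically be passed to `StateGraph.add_conditional_edges`.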
+
+## 3. Implementation Steps
+
+### Step 1: Dependencies
+Add the following packages to `pyproject.toml`:
+*   `langgraph`
+*   `langchain`
+*   `langchain-openai`
+*   `langchain-google-genai`
+*   `langchain-community`
+
+### Step 2: Directory Structure
+Create a new package for the graph logic so that it stays separate from the existing `bambooai` package during the migration.
+
+```
+src/ea_chatbot/
+├── graph/
+│   ├── __init__.py
+│   ├── state.py       # State definition
+│   ├── nodes/         # Individual node implementations
+│   │   ├── __init__.py
+│   │   ├── router.py
+│   │   ├── planner.py
+│   │   ├── coder.py
+│   │   ├── executor.py
+│   │   └── ...
+│   ├── workflow.py    # Graph construction
+│   └── tools/         # DB and Search tools wrapped for LangChain
+└── ...
+```
+
+### Step 3: Tool Wrapping
+Wrap the existing `DBClient` (from `src/ea_chatbot/bambooai/utils/db_client.py`) so that the `executor_node` can access it. Initially, the `executor_node` will likely keep the existing `exec()`-based approach for compatibility with the generated code, restructured as a graph node.
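+
+A minimal sketch of such a node, assuming the generated code only needs a `db` name in its namespace and reports results via `print`. `make_executor_node` and the `db` variable name are hypothetical, capture of plots and DataFrames is omitted, and `exec()` provides no sandboxing.
+
+```python
+import contextlib
+import io
+import traceback
+from typing import Any, Callable, Dict
+
+
+def make_executor_node(db_client: Any) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
+    """Build an executor node that closes over a single DB client instance."""
+
+    def executor_node(state: Dict[str, Any]) -> Dict[str, Any]:
+        namespace = {"db": db_client}  # Name exposed to generated code (assumed).
+        buffer = io.StringIO()
+        try:
+            # Capture anything the generated code prints.
+            with contextlib.redirect_stdout(buffer):
+                exec(state["code"], namespace)
+        except Exception:
+            # Record the traceback so coder_node can attempt a fix.
+            return {
+                "code_output": buffer.getvalue(),
+                "error": traceback.format_exc(),
+                "iterations": state.get("iterations", 0) + 1,
+            }
+        return {"code_output": buffer.getvalue(), "error": None}
+
+    return executor_node
+
+
+# Usage with a stand-in client object.
+node = make_executor_node(db_client=object())
+ok = node({"code": "print(1 + 1)", "iterations": 0})
+failed = node({"code": "undefined_name", "iterations": 0})
+```
+
+Building the node through a factory keeps the client out of the graph state, so it is created once and reused across runs.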
+
+### Step 4: Prompt Migration
+Port the prompts from `data/PROMPT_TEMPLATES.json` or `src/ea_chatbot/bambooai/prompts/strings.py` into the respective nodes. Use LangChain's `ChatPromptTemplate` for better management.
+
+### Step 5: Integration
+Update `src/ea_chatbot/app.py` to use the runnable returned by `workflow.compile()`.
+*   Instead of `chatbot.pd_agent_converse(...)`, call `app.stream(...)` on the compiled LangGraph app.
+*   Handle the streaming output to update the UI progressively.
+
+## 4. Key Considerations for Refactoring
+
+*   **Database Connection**: Ensure `DBClient` is initialized once and passed to the `Executor` node efficiently (e.g., via `configurable` parameters or closure).
+*   **Prompt Templating**: The current system uses simple `format` strings. Switching to LangChain templates allows for easier model switching and partial formatting.
+*   **Token Management**: LangGraph runs can be traced through LangSmith when it is enabled, but the `OutputManager` logic (printing costs/tokens) should be preserved or adapted if it is still needed for the CLI or logs.
+*   **Vector DB**: The current system has `PineconeWrapper` for RAG. This should be integrated into the `Planner` or `Coder` node to fetch few-shot examples or context.
+
+## 5. Next Actions
+1.  **Initialize**: Create the folder structure.
+2.  **Define State**: Create `src/ea_chatbot/graph/state.py`.
+3.  **Implement Router**: Create the first node to replicate `Expert Selector` logic.
+4.  **Implement Executor**: Port the `exec()` logic to a node.