docs: update project documentation and verification strategies
- Update GEMINI.md with verification steps and remove ignored docs reference
- Update README.md to remove reference to local langchain-docs
- Update backend/GEMINI.md with correct database schema (users table) and architecture details
- Update frontend/GEMINI.md with latest project structure
15
GEMINI.md
@@ -43,9 +43,22 @@ The frontend is a modern SPA (Single Page Application) designed for data-heavy i
 - **Real-time Visualization**: Supports streaming text responses and immediate rendering of base64-encoded or binary-retrieved analysis plots.
 
 ## Documentation
+- **[README](./README.md)**: Main project documentation and setup guide.
 - **[Backend Guide](./backend/GEMINI.md)**: Detailed information about the backend architecture, migration goals, and implementation steps.
 - **[Frontend Guide](./frontend/GEMINI.md)**: Frontend development guide and technology stack.
-- **LangChain Docs**: See the `langchain-docs/` folder for local LangChain and LangGraph documentation.
+
+## Verification Strategy
+When making changes, always verify using the following commands:
+
+### Backend
+- **Test**: `cd backend && uv run pytest`
+- **Lint/Format**: `cd backend && uv run ruff check .`
+- **Type Check**: `cd backend && uv run mypy .` (if configured)
+
+### Frontend
+- **Test**: `cd frontend && npm run test`
+- **Lint**: `cd frontend && npm run lint`
+- **Build**: `cd frontend && npm run build` (to ensure no compilation errors)
 
 ## Git Operations
 - All new feature and bug-fix branches must be created from the `develop` branch except hot-fix.
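The backend and frontend commands in the Verification Strategy section of this hunk could be collected into one small runner. The following is a minimal sketch and not part of the commit: the names `VERIFICATION_STEPS`, `plan`, and `run_all` are hypothetical, and it assumes the script is started from the repository root with `uv` and `npm` available on the PATH.

```python
import subprocess
from typing import Dict, List, Tuple

# Commands copied from the "Verification Strategy" section; the dictionary
# layout and all names here are assumptions made for this sketch.
VERIFICATION_STEPS: Dict[str, List[List[str]]] = {
    "backend": [
        ["uv", "run", "pytest"],
        ["uv", "run", "ruff", "check", "."],
        ["uv", "run", "mypy", "."],  # only meaningful if mypy is configured
    ],
    "frontend": [
        ["npm", "run", "test"],
        ["npm", "run", "lint"],
        ["npm", "run", "build"],
    ],
}


def plan(areas: List[str]) -> List[Tuple[str, List[str]]]:
    """Return (working_directory, argv) pairs in the order they would run."""
    return [(area, argv) for area in areas for argv in VERIFICATION_STEPS[area]]


def run_all(areas: List[str]) -> bool:
    """Run each step inside its directory and stop at the first failure."""
    for cwd, argv in plan(areas):
        print(f"[{cwd}] {' '.join(argv)}")
        if subprocess.run(argv, cwd=cwd).returncode != 0:
            return False
    return True
```

Calling `run_all(["backend", "frontend"])` would execute the six commands in the listed order. Passing `cwd` to `subprocess.run` stands in for the `cd backend &&` and `cd frontend &&` prefixes.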
77
README.md
Normal file
@@ -0,0 +1,77 @@
+# Election Analytics Chatbot
+
+A stateful, graph-based chatbot for election data analysis, built with LangGraph, FastAPI, and React.
+
+## 🚀 Features
+
+- **Intelligent Query Analysis**: Automatically determines if a query needs data analysis, web research, or clarification.
+- **Automated Data Analysis**: Generates and executes Python code to analyze election datasets and produce visualizations.
+- **Web Research**: Integrates web search capabilities for general election-related questions.
+- **Stateful Conversations**: Maintains context across multiple turns using LangGraph's persistent checkpointing.
+- **Real-time Streaming**: Streams reasoning steps, code execution outputs, and plots to the UI.
+- **Secure Authentication**: Traditional login and OIDC/SSO support with HttpOnly cookies.
+- **History Management**: Persistent storage and management of chat history and generated artifacts.
+
+## 🏗️ Project Structure
+
+- `backend/`: Python FastAPI application using LangGraph.
+- `frontend/`: React SPA built with TypeScript, Vite, and Tailwind CSS.
+
+## 🛠️ Prerequisites
+
+- Python 3.11+
+- Node.js 18+
+- PostgreSQL
+- Docker (optional, for Postgres/PgAdmin)
+- API Keys: OpenAI/Google Gemini, Google Search (if using research tools).
+
+## 📥 Getting Started
+
+### Backend Setup
+
+1. Navigate to the backend directory:
+   ```bash
+   cd backend
+   ```
+2. Install dependencies:
+   ```bash
+   uv sync
+   ```
+3. Set up environment variables:
+   ```bash
+   cp .env.example .env
+   # Edit .env with your configuration and API keys
+   ```
+4. Run database migrations:
+   ```bash
+   uv run alembic upgrade head
+   ```
+5. Start the server:
+   ```bash
+   uv run python -m ea_chatbot.api.main
+   ```
+
+### Frontend Setup
+
+1. Navigate to the frontend directory:
+   ```bash
+   cd frontend
+   ```
+2. Install dependencies:
+   ```bash
+   npm install
+   ```
+3. Start the development server:
+   ```bash
+   npm run dev
+   ```
+
+## 📖 Documentation
+
+- **[Top-level GEMINI.md](./GEMINI.md)**: General project overview.
+- **[Backend Guide](./backend/GEMINI.md)**: Detailed backend architecture and implementation details.
+- **[Frontend Guide](./frontend/GEMINI.md)**: Frontend development guide and technology stack.
+
+## 📜 License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
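The Prerequisites list in the new README names minimum versions (Python 3.11+, Node.js 18+), which could be checked before running the setup steps. The sketch below is illustrative only: `parse_version`, `meets_minimum`, `python_ok`, and `missing_tools` are hypothetical helpers, and treating `uv` as a required tool is an assumption drawn from the `uv sync` step rather than from the Prerequisites list itself.

```python
import re
import shutil
import sys
from typing import List, Optional, Sequence, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract a dotted version such as 'v18.17.0' or '3.11.4' as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Compare only as many components as the minimum specifies."""
    return found[: len(minimum)] >= minimum


def python_ok() -> bool:
    """Check the running interpreter against the README's Python 3.11+ requirement."""
    return meets_minimum(tuple(sys.version_info[:3]), (3, 11))


def missing_tools(names: Sequence[str] = ("uv", "node", "npm")) -> List[str]:
    """Return the tools from `names` that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]
```

The output of `node --version` (for example `v18.17.0`) can be passed through `parse_version` and then compared with `meets_minimum(version, (18,))`.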
@@ -1,162 +1,63 @@
 # Election Analytics Chatbot - Backend Guide
 
 ## Overview
-This document serves as a guide for the backend implementation of the Election Analytics Chatbot, specifically focusing on the transition from the "BambooAI" based system to a modern, stateful, and graph-based architecture using **LangGraph**.
+The backend is a Python-based FastAPI application that leverages **LangGraph** to provide a stateful, agentic workflow for election data analysis. It handles complex queries by decomposing them into tasks such as data analysis, web research, or user clarification.
 
-## 1. Migration Goals
-- **Framework Switch**: Move from the custom linear `ChatBot` class (in `src/ea_chatbot/bambooai/core/chatbot.py`) to `LangGraph`.
-- **State Management**: explicit state management using LangGraph's `StateGraph`.
-- **Modularity**: Break down monolithic methods (`pd_agent_converse`, `execute_code`) into distinct Nodes.
-- **Observability**: Easier debugging of the decision process (Routing -> Planning -> Coding -> Executing).
+## 1. Architecture Overview
+- **Framework**: LangGraph for workflow orchestration and state management.
+- **API**: FastAPI for providing REST and streaming (SSE) endpoints.
+- **State Management**: Persistent state using LangGraph's `StateGraph` with a PostgreSQL checkpointer.
+- **Database**: PostgreSQL.
+  - Application data: Uses `users` table for local and OIDC users (String IDs).
+  - History: Persists chat history and artifacts.
+  - Election Data: Structured datasets for analysis.
 
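The Architecture Overview mentions streaming (SSE) endpoints. As a rough illustration of what travels over such an endpoint, the sketch below serializes and parses events in the `text/event-stream` format using only the standard library. The event names (`token`, `plot`) and the JSON payload shape are assumptions for this example; the commit does not specify the project's actual event schema.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def format_sse(event: str, payload: Dict[str, Any]) -> str:
    """Serialize one event; a blank line terminates each event on the wire."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def parse_sse(stream: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event, payload) pairs from a complete SSE text buffer."""
    for block in stream.split("\n\n"):
        event = "message"  # default event type when no "event:" field is present
        data_lines: List[str] = []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # A single leading space after the colon is not part of the value.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            yield event, json.loads("\n".join(data_lines))
```

A client consuming the stream would typically parse incrementally as chunks arrive; this sketch parses a whole buffer at once to keep the framing rules visible.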
-## 2. Architecture Proposal
+## 2. Core Components
 
-### 2.1. The Graph State
-The state will track the conversation and execution context.
+### 2.1. The Graph State (`src/ea_chatbot/graph/state.py`)
+The state tracks the conversation context, plan, generated code, execution results, and artifacts.
 
-```python
-from typing import TypedDict, Annotated, List, Dict, Any, Optional
-from langchain_core.messages import BaseMessage
-import operator
-
-class AgentState(TypedDict):
-    # Conversation history
-    messages: Annotated[List[BaseMessage], operator.add]
-
-    # Task context
-    question: str
-
-    # Query Analysis (Decomposition results)
-    analysis: Optional[Dict[str, Any]]
-    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
-
-    # Step-by-step reasoning
-    plan: Optional[str]
-
-    # Code execution context
-    code: Optional[str]
-    code_output: Optional[str]
-    error: Optional[str]
-
-    # Artifacts (for UI display)
-    plots: List[Figure] # Matplotlib figures
-    dfs: Dict[str, DataFrame] # Pandas DataFrames
-
-    # Control flow
-    iterations: int
-    next_action: str # Routing hint: "clarify", "plan", "research", "end"
-```
-
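The removed `AgentState` annotates `messages` with `operator.add`, which marks that field as accumulated rather than overwritten when a node returns an update. The sketch below imitates that idea in plain Python to show the effect. It is not LangGraph's implementation: `SketchState` and `apply_update` are hypothetical names, and messages are plain strings here instead of `BaseMessage` objects.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict, get_type_hints


class SketchState(TypedDict, total=False):
    messages: Annotated[List[str], operator.add]  # combined across updates
    plan: Optional[str]                           # replaced by each update
    iterations: int                               # replaced by each update


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state, honoring reducer annotations."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types expose their extra arguments as __metadata__.
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)
        else:
            merged[key] = value
    return merged
```

With this merge rule, a node that returns `{"messages": ["new"]}` extends the history, while a node that returns `{"plan": "..."}` replaces the previous plan.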
 ### 2.2. Nodes (The Actors)
-We will map existing logic to these nodes:
+Located in `src/ea_chatbot/graph/nodes/`:
 
-1. **`query_analyzer_node`** (Router & Refiner):
-    * **Logic**: Replaces `Expert Selector` and `Analyst Selector`.
-    * **Function**:
-        1. Decomposes the user's query into key elements (Data, Unknowns, Conditions).
-        2. Determines if the query is ambiguous or missing critical information.
-    * **Output**: Updates `messages`. Returns routing decision:
-        * `clarification_node` (if ambiguous).
-        * `planner_node` (if clear data task).
-        * `researcher_node` (if general/web task).
-
-2. **`clarification_node`** (Human-in-the-loop):
-    * **Logic**: Replaces `Theorist-Clarification`.
-    * **Function**: Formulates a specific question to ask the user for missing details.
-    * **Output**: Returns a message to the user and **interrupts** the graph execution to await user input.
-
-3. **`researcher_node`** (Theorist):
-    * **Logic**: Handles general queries or web searches.
-    * **Function**: Uses `GoogleSearch` tool if necessary.
-    * **Output**: Final answer.
-
-4. **`planner_node`**:
-    * **Logic**: Replaces `Planner`.
-    * **Function**: Generates a step-by-step plan based on the decomposed query elements and dataframe ontology.
-    * **Output**: Updates `plan`.
-
-5. **`coder_node`**:
-    * **Logic**: Replaces `Code Generator` & `Error Corrector`.
-    * **Function**: Generates Python code. If `error` exists in state, it attempts to fix it.
-    * **Output**: Updates `code`.
-
-6. **`executor_node`**:
-    * **Logic**: Replaces `Code Executor`.
-    * **Function**: Executes the Python code in a safe(r) environment. It needs access to the `DBClient`.
-    * **Output**: Updates `code_output`, `plots`, `dfs`. If exception, updates `error`.
-
-7. **`summarizer_node`**:
-    * **Logic**: Replaces `Solution Summarizer`.
-    * **Function**: Interprets the code output and generates a natural language response.
-    * **Output**: Final response message.
+- **`query_analyzer`**: Analyzes the user query to determine the intent and required data.
+- **`planner`**: Creates a step-by-step plan for data analysis.
+- **`coder`**: Generates Python code based on the plan and dataset metadata.
+- **`executor`**: Safely executes the generated code and captures outputs (dataframes, plots).
+- **`error_corrector`**: Fixes code if execution fails.
+- **`researcher`**: Performs web searches for general election information.
+- **`summarizer`**: Generates a natural language response based on the analysis results.
+- **`clarification`**: Asks the user for more information if the query is ambiguous.
 
 ### 2.3. The Workflow (Graph)
+The graph connects these nodes with conditional edges, allowing for iterative refinement and error correction.
 
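The conditional edges described for the workflow can be pictured as small routing functions that inspect the state and return the name of the next node. The sketch below is one possible shape, not the project's code: the function names and `MAX_ITERATIONS` are hypothetical, the routing hints (`"clarify"`, `"plan"`, `"research"`, `"end"`) are taken from the `next_action` field of the earlier state definition, and sending a query to the summarizer once the retry budget is exhausted is an assumption.

```python
from typing import Any, Dict

MAX_ITERATIONS = 3  # assumed retry budget; the source does not state one


def route_after_analysis(state: Dict[str, Any]) -> str:
    """Map the analyzer's routing hint to the name of the next node."""
    targets = {
        "clarify": "clarification",
        "plan": "planner",
        "research": "researcher",
    }
    return targets.get(state.get("next_action", "end"), "end")


def route_after_execution(state: Dict[str, Any]) -> str:
    """Retry via error correction while an error remains and budget allows."""
    if state.get("error") and state.get("iterations", 0) < MAX_ITERATIONS:
        return "error_corrector"
    return "summarizer"
```

Functions of this shape are what LangGraph's `StateGraph.add_conditional_edges` accepts as the path callable, so a graph built from the nodes listed above could reuse them after the `query_analyzer` and `executor` nodes respectively.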
-```mermaid
-graph TD
-    Start --> QueryAnalyzer
-    QueryAnalyzer -->|Ambiguous| Clarification
-    Clarification -->|User Input| QueryAnalyzer
-    QueryAnalyzer -->|General/Web| Researcher
-    QueryAnalyzer -->|Data Analysis| Planner
-    Planner --> Coder
-    Coder --> Executor
-    Executor -->|Success| Summarizer
-    Executor -->|Error| Coder
-    Researcher --> End
-    Summarizer --> End
+## 3. Key Modules
+
+- **`src/ea_chatbot/api/`**: Contains FastAPI routers for authentication, conversation management, and the agent streaming endpoint.
+- **`src/ea_chatbot/graph/`**: Core LangGraph logic, including state definitions, node implementations, and the workflow graph.
+- **`src/ea_chatbot/history/`**: Manages persistent chat history and message mapping between application models and LangGraph state.
+- **`src/ea_chatbot/utils/`**: Utility functions for database inspection, LLM factory, and logging.
+
+## 4. Development & Execution
+
+### Entry Point
+The main entry point for the API is `src/ea_chatbot/api/main.py`.
+
+### Running the API
+```bash
+cd backend
+uv run python -m ea_chatbot.api.main
 ```
 
-## 3. Implementation Steps
-
-### Step 1: Dependencies
-Add the following packages to `pyproject.toml`:
-* `langgraph`
-* `langchain`
-* `langchain-openai`
-* `langchain-google-genai`
-* `langchain-community`
-
-### Step 2: Directory Structure
-Create a new package for the graph logic to keep it separate from the old one during migration.
-
-```
-src/ea_chatbot/
-├── graph/
-│   ├── __init__.py
-│   ├── state.py # State definition
-│   ├── nodes/ # Individual node implementations
-│   │   ├── __init__.py
-│   │   ├── router.py
-│   │   ├── planner.py
-│   │   ├── coder.py
-│   │   ├── executor.py
-│   │   └── ...
-│   ├── workflow.py # Graph construction
-│   └── tools/ # DB and Search tools wrapped for LangChain
-└── ...
+### Database Migrations
+Handled by Alembic.
+```bash
+uv run alembic upgrade head
 ```
 
-### Step 3: Tool Wrapping
-Wrap the existing `DBClient` (from `src/ea_chatbot/bambooai/utils/db_client.py`) into a structure accessible by the `executor_node`. The `executor_node` will likely keep the existing `exec()` based approach initially for compatibility with the generated code, but structured as a graph node.
-
-### Step 4: Prompt Migration
-Port the prompts from `data/PROMPT_TEMPLATES.json` or `src/ea_chatbot/bambooai/prompts/strings.py` into the respective nodes. Use LangChain's `ChatPromptTemplate` for better management.
-
-### Step 5: Integration
-Update `src/ea_chatbot/app.py` to use the new `workflow.compile()` runnable.
-* Instead of `chatbot.pd_agent_converse(...)`, use `app.stream(...)` (LangGraph app).
-* Handle the streaming output to update the UI progressively.
-
-## 4. Key Considerations for Refactoring
-
-* **Database Connection**: Ensure `DBClient` is initialized once and passed to the `Executor` node efficiently (e.g., via `configurable` parameters or closure).
-* **Prompt Templating**: The current system uses simple `format` strings. Switching to LangChain templates allows for easier model switching and partial formatting.
-* **Token Management**: LangGraph provides built-in tracing (if LangSmith is enabled), but we should ensure the `OutputManager` logic (printing costs/tokens) is preserved or adapted if still needed for the CLI/Logs.
-* **Vector DB**: The current system has `PineconeWrapper` for RAG. This should be integrated into the `Planner` or `Coder` node to fetch few-shot examples or context.
-
-## 5. Next Actions
-1. **Initialize**: Create the folder structure.
-2. **Define State**: Create `src/ea_chatbot/graph/state.py`.
-3. **Implement Router**: Create the first node to replicate `Expert Selector` logic.
-4. **Implement Executor**: Port the `exec()` logic to a node.
+### Testing
+Tests are located in the `tests/` directory and use `pytest`.
+```bash
+uv run pytest
+```
@@ -15,11 +15,13 @@ This document serves as a guide for the frontend implementation of the Election
 ## Project Structure
 - `src/components/`:
   - `auth/`: Login, Registration, and OIDC callback forms/pages.
+  - `chat/`: Core chat interface components, including message list and plot rendering.
   - `layout/`: Main application layout including the sidebar navigation.
   - `ui/`: Reusable primitive components (buttons, cards, inputs, etc.) via Shadcn.
 - `src/services/`:
   - `api.ts`: Axios instance configuration with `/api/v1` base URL and interceptors.
   - `auth.ts`: Authentication logic (Login, Logout, OIDC, User Profile).
+  - `chat.ts`: Service for interacting with the agent streaming endpoint.
 - `src/lib/`:
   - `validations/`: Zod schemas for form validation.
   - `utils.ts`: Core utility functions.
@@ -42,3 +44,7 @@ The frontend communicates with the backend's `/api/v1` endpoints:
 - `npm run dev`: Start development server.
 - `npm run build`: Build for production.
 - `npm run test`: Run Vitest unit tests.
+
+## Documentation
+- **[README](../README.md)**: Main project documentation and setup guide.
+- **[Backend Guide](../backend/GEMINI.md)**: Backend implementation details.