From b4f79ee052a8b074a25bac3c0b47ca817c2d5a18 Mon Sep 17 00:00:00 2001
From: Yunxiao Xu
Date: Fri, 20 Feb 2026 17:14:16 -0800
Subject: [PATCH] docs: update project documentation and verification strategies

- Update GEMINI.md with verification steps and remove ignored docs reference
- Update README.md to remove reference to local langchain-docs
- Update backend/GEMINI.md with correct database schema (users table) and architecture details
- Update frontend/GEMINI.md with latest project structure
---
 GEMINI.md          |  15 +++-
 README.md          |  77 ++++++++++++++++++
 backend/GEMINI.md  | 193 +++++++++++----------------------------------
 frontend/GEMINI.md |   6 ++
 4 files changed, 144 insertions(+), 147 deletions(-)
 create mode 100644 README.md

diff --git a/GEMINI.md b/GEMINI.md
index 97e9f57..2e6e2d0 100644
--- a/GEMINI.md
+++ b/GEMINI.md
@@ -43,9 +43,22 @@ The frontend is a modern SPA (Single Page Application) designed for data-heavy i
 - **Real-time Visualization**: Supports streaming text responses and immediate rendering of base64-encoded or binary-retrieved analysis plots.
 
 ## Documentation
+- **[README](./README.md)**: Main project documentation and setup guide.
 - **[Backend Guide](./backend/GEMINI.md)**: Detailed information about the backend architecture, migration goals, and implementation steps.
 - **[Frontend Guide](./frontend/GEMINI.md)**: Frontend development guide and technology stack.
-- **LangChain Docs**: See the `langchain-docs/` folder for local LangChain and LangGraph documentation.
+
+## Verification Strategy
+After making changes, verify them with the following commands:
+
+### Backend
+- **Test**: `cd backend && uv run pytest`
+- **Lint**: `cd backend && uv run ruff check .`
+- **Type Check**: `cd backend && uv run mypy .` (if mypy is configured)
+
+### Frontend
+- **Test**: `cd frontend && npm run test`
+- **Lint**: `cd frontend && npm run lint`
+- **Build**: `cd frontend && npm run build` (confirms the app compiles without errors)
 
 ## Git Operations
 - All new feature and bug-fix branches must be created from the `develop` branch except hot-fix.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..07bbe00
--- /dev/null
+++ b/README.md
@@ -0,0 +1,77 @@
+# Election Analytics Chatbot
+
+A stateful, graph-based chatbot for election data analysis, built with LangGraph, FastAPI, and React.
+
+## 🚀 Features
+
+- **Intelligent Query Analysis**: Automatically determines whether a query needs data analysis, web research, or clarification.
+- **Automated Data Analysis**: Generates and executes Python code to analyze election datasets and produce visualizations.
+- **Web Research**: Uses web search to answer general election-related questions.
+- **Stateful Conversations**: Maintains context across multiple turns using LangGraph's persistent checkpointing.
+- **Real-time Streaming**: Streams reasoning steps, code execution outputs, and plots to the UI.
+- **Secure Authentication**: Local account login and OIDC/SSO support, using HttpOnly cookies.
+- **History Management**: Persistent storage and management of chat history and generated artifacts.
+
+## 🏗️ Project Structure
+
+- `backend/`: Python FastAPI application using LangGraph.
+- `frontend/`: React SPA built with TypeScript, Vite, and Tailwind CSS.
+
+## 🛠️ Prerequisites
+
+- Python 3.11+ and [uv](https://docs.astral.sh/uv/)
+- Node.js 18+
+- PostgreSQL
+- Docker (optional, for running PostgreSQL and pgAdmin)
+- API keys: OpenAI and/or Google Gemini, plus Google Search (if using the research tools).
+
+## 📥 Getting Started
+
+### Backend Setup
+
+1. Navigate to the backend directory:
+   ```bash
+   cd backend
+   ```
+2. Install dependencies:
+   ```bash
+   uv sync
+   ```
+3. Set up environment variables:
+   ```bash
+   cp .env.example .env
+   # Edit .env with your configuration and API keys
+   ```
+4. Run database migrations:
+   ```bash
+   uv run alembic upgrade head
+   ```
+5. Start the API server:
+   ```bash
+   uv run python -m ea_chatbot.api.main
+   ```
+
+### Frontend Setup
+
+1. Navigate to the frontend directory:
+   ```bash
+   cd frontend
+   ```
+2. Install dependencies:
+   ```bash
+   npm install
+   ```
+3. Start the development server:
+   ```bash
+   npm run dev
+   ```
+
+## 📖 Documentation
+
+- **[Top-level GEMINI.md](./GEMINI.md)**: General project overview.
+- **[Backend Guide](./backend/GEMINI.md)**: Detailed backend architecture and implementation details.
+- **[Frontend Guide](./frontend/GEMINI.md)**: Frontend development guide and technology stack.
+
+## 📜 License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
diff --git a/backend/GEMINI.md b/backend/GEMINI.md
index 103a117..59e5640 100644
--- a/backend/GEMINI.md
+++ b/backend/GEMINI.md
@@ -1,162 +1,63 @@
 # Election Analytics Chatbot - Backend Guide
 
 ## Overview
-This document serves as a guide for the backend implementation of the Election Analytics Chatbot, specifically focusing on the transition from the "BambooAI" based system to a modern, stateful, and graph-based architecture using **LangGraph**.
+The backend is a Python FastAPI application that uses **LangGraph** to run a stateful, agentic workflow for election data analysis. It handles complex queries by breaking them down into tasks such as data analysis, web research, or user clarification.
 
-## 1. Migration Goals
-- **Framework Switch**: Move from the custom linear `ChatBot` class (in `src/ea_chatbot/bambooai/core/chatbot.py`) to `LangGraph`.
-- **State Management**: explicit state management using LangGraph's `StateGraph`.
-- **Modularity**: Break down monolithic methods (`pd_agent_converse`, `execute_code`) into distinct Nodes.
-- **Observability**: Easier debugging of the decision process (Routing -> Planning -> Coding -> Executing).
+## 1. Architecture Overview
+- **Framework**: LangGraph for workflow orchestration and state management.
+- **API**: FastAPI, serving REST and streaming (SSE) endpoints.
+- **State Management**: Persistent state using LangGraph's `StateGraph` with a PostgreSQL checkpointer.
+- **Database**: PostgreSQL.
+  - Application data: a `users` table stores both local and OIDC users (string IDs).
+  - History: persisted chat history and generated artifacts.
+  - Election data: structured datasets used for analysis.
 
-## 2. Architecture Proposal
+## 2. Core Components
 
-### 2.1. The Graph State
-The state will track the conversation and execution context.
-
-```python
-from typing import TypedDict, Annotated, List, Dict, Any, Optional
-from langchain_core.messages import BaseMessage
-import operator
-
-class AgentState(TypedDict):
-    # Conversation history
-    messages: Annotated[List[BaseMessage], operator.add]
-
-    # Task context
-    question: str
-
-    # Query Analysis (Decomposition results)
-    analysis: Optional[Dict[str, Any]]
-    # Expected keys: "requires_dataset", "expert", "data", "unknown", "condition"
-
-    # Step-by-step reasoning
-    plan: Optional[str]
-
-    # Code execution context
-    code: Optional[str]
-    code_output: Optional[str]
-    error: Optional[str]
-
-    # Artifacts (for UI display)
-    plots: List[Figure]  # Matplotlib figures
-    dfs: Dict[str, DataFrame]  # Pandas DataFrames
-
-    # Control flow
-    iterations: int
-    next_action: str  # Routing hint: "clarify", "plan", "research", "end"
-```
+### 2.1. The Graph State (`src/ea_chatbot/graph/state.py`)
+The state tracks the conversation context, plan, generated code, execution results, and artifacts.
 
 ### 2.2. Nodes (The Actors)
-We will map existing logic to these nodes:
+Located in `src/ea_chatbot/graph/nodes/`:
 
-1. **`query_analyzer_node`** (Router & Refiner):
-    * **Logic**: Replaces `Expert Selector` and `Analyst Selector`.
-    * **Function**:
-        1. Decomposes the user's query into key elements (Data, Unknowns, Conditions).
-        2. Determines if the query is ambiguous or missing critical information.
-    * **Output**: Updates `messages`. Returns routing decision:
-        * `clarification_node` (if ambiguous).
-        * `planner_node` (if clear data task).
-        * `researcher_node` (if general/web task).
-
-2. **`clarification_node`** (Human-in-the-loop):
-    * **Logic**: Replaces `Theorist-Clarification`.
-    * **Function**: Formulates a specific question to ask the user for missing details.
-    * **Output**: Returns a message to the user and **interrupts** the graph execution to await user input.
-
-3. **`researcher_node`** (Theorist):
-    * **Logic**: Handles general queries or web searches.
-    * **Function**: Uses `GoogleSearch` tool if necessary.
-    * **Output**: Final answer.
-
-4. **`planner_node`**:
-    * **Logic**: Replaces `Planner`.
-    * **Function**: Generates a step-by-step plan based on the decomposed query elements and dataframe ontology.
-    * **Output**: Updates `plan`.
-
-5. **`coder_node`**:
-    * **Logic**: Replaces `Code Generator` & `Error Corrector`.
-    * **Function**: Generates Python code. If `error` exists in state, it attempts to fix it.
-    * **Output**: Updates `code`.
-
-6. **`executor_node`**:
-    * **Logic**: Replaces `Code Executor`.
-    * **Function**: Executes the Python code in a safe(r) environment. It needs access to the `DBClient`.
-    * **Output**: Updates `code_output`, `plots`, `dfs`. If exception, updates `error`.
-
-7. **`summarizer_node`**:
-    * **Logic**: Replaces `Solution Summarizer`.
-    * **Function**: Interprets the code output and generates a natural language response.
-    * **Output**: Final response message.
+- **`query_analyzer`**: Analyzes the user query to determine the intent and required data.
+- **`planner`**: Creates a step-by-step plan for data analysis.
+- **`coder`**: Generates Python code based on the plan and dataset metadata.
+- **`executor`**: Executes the generated code and captures its outputs (dataframes, plots).
+- **`error_corrector`**: Attempts to fix the code when execution fails.
+- **`researcher`**: Performs web searches for general election information.
+- **`summarizer`**: Generates a natural-language response based on the analysis results.
+- **`clarification`**: Asks the user for more information if the query is ambiguous.
 
 ### 2.3. The Workflow (Graph)
+The graph connects these nodes with conditional edges, which enables iterative refinement and error correction (for example, routing a failed execution to `error_corrector`).
 
-```mermaid
-graph TD
-    Start --> QueryAnalyzer
-    QueryAnalyzer -->|Ambiguous| Clarification
-    Clarification -->|User Input| QueryAnalyzer
-    QueryAnalyzer -->|General/Web| Researcher
-    QueryAnalyzer -->|Data Analysis| Planner
-    Planner --> Coder
-    Coder --> Executor
-    Executor -->|Success| Summarizer
-    Executor -->|Error| Coder
-    Researcher --> End
-    Summarizer --> End
+## 3. Key Modules
+
+- **`src/ea_chatbot/api/`**: Contains FastAPI routers for authentication, conversation management, and the agent streaming endpoint.
+- **`src/ea_chatbot/graph/`**: Core LangGraph logic, including state definitions, node implementations, and the workflow graph.
+- **`src/ea_chatbot/history/`**: Manages persistent chat history and message mapping between application models and LangGraph state.
+- **`src/ea_chatbot/utils/`**: Utilities for database inspection, the LLM factory, and logging.
+
+## 4. Development & Execution
+
+### Entry Point
+The main entry point for the API is `src/ea_chatbot/api/main.py`.
+
+### Running the API
+```bash
+cd backend
+uv run python -m ea_chatbot.api.main
 ```
 
-## 3. Implementation Steps
-
-### Step 1: Dependencies
-Add the following packages to `pyproject.toml`:
-* `langgraph`
-* `langchain`
-* `langchain-openai`
-* `langchain-google-genai`
-* `langchain-community`
-
-### Step 2: Directory Structure
-Create a new package for the graph logic to keep it separate from the old one during migration.
-
-```
-src/ea_chatbot/
-├── graph/
-│   ├── __init__.py
-│   ├── state.py          # State definition
-│   ├── nodes/            # Individual node implementations
-│   │   ├── __init__.py
-│   │   ├── router.py
-│   │   ├── planner.py
-│   │   ├── coder.py
-│   │   ├── executor.py
-│   │   └── ...
-│   ├── workflow.py       # Graph construction
-│   └── tools/            # DB and Search tools wrapped for LangChain
-└── ...
+### Database Migrations
+Migrations are managed with Alembic.
+```bash
+uv run alembic upgrade head
 ```
-
-### Step 3: Tool Wrapping
-Wrap the existing `DBClient` (from `src/ea_chatbot/bambooai/utils/db_client.py`) into a structure accessible by the `executor_node`. The `executor_node` will likely keep the existing `exec()` based approach initially for compatibility with the generated code, but structured as a graph node.
-
-### Step 4: Prompt Migration
-Port the prompts from `data/PROMPT_TEMPLATES.json` or `src/ea_chatbot/bambooai/prompts/strings.py` into the respective nodes. Use LangChain's `ChatPromptTemplate` for better management.
-
-### Step 5: Integration
-Update `src/ea_chatbot/app.py` to use the new `workflow.compile()` runnable.
-* Instead of `chatbot.pd_agent_converse(...)`, use `app.stream(...)` (LangGraph app).
-* Handle the streaming output to update the UI progressively.
-
-## 4. Key Considerations for Refactoring
-
-* **Database Connection**: Ensure `DBClient` is initialized once and passed to the `Executor` node efficiently (e.g., via `configurable` parameters or closure).
-* **Prompt Templating**: The current system uses simple `format` strings. Switching to LangChain templates allows for easier model switching and partial formatting.
-* **Token Management**: LangGraph provides built-in tracing (if LangSmith is enabled), but we should ensure the `OutputManager` logic (printing costs/tokens) is preserved or adapted if still needed for the CLI/Logs.
-* **Vector DB**: The current system has `PineconeWrapper` for RAG. This should be integrated into the `Planner` or `Coder` node to fetch few-shot examples or context.
-
-## 5. Next Actions
-1. **Initialize**: Create the folder structure.
-2. **Define State**: Create `src/ea_chatbot/graph/state.py`.
-3. **Implement Router**: Create the first node to replicate `Expert Selector` logic.
-4. **Implement Executor**: Port the `exec()` logic to a node.
+
+### Testing
+Tests live in the `tests/` directory and run with `pytest`.
+```bash
+uv run pytest
+```
diff --git a/frontend/GEMINI.md b/frontend/GEMINI.md
index fd9d421..e0eff68 100644
--- a/frontend/GEMINI.md
+++ b/frontend/GEMINI.md
@@ -15,11 +15,13 @@ This document serves as a guide for the frontend implementation of the Election
 ## Project Structure
 - `src/components/`:
   - `auth/`: Login, Registration, and OIDC callback forms/pages.
+  - `chat/`: Core chat interface components, including the message list and plot rendering.
   - `layout/`: Main application layout including the sidebar navigation.
   - `ui/`: Reusable primitive components (buttons, cards, inputs, etc.) via Shadcn.
 - `src/services/`:
   - `api.ts`: Axios instance configuration with `/api/v1` base URL and interceptors.
   - `auth.ts`: Authentication logic (Login, Logout, OIDC, User Profile).
+  - `chat.ts`: Service for interacting with the agent streaming endpoint.
 - `src/lib/`:
   - `validations/`: Zod schemas for form validation.
   - `utils.ts`: Core utility functions.
@@ -42,3 +44,7 @@ The frontend communicates with the backend's `/api/v1` endpoints:
 - `npm run dev`: Start development server.
 - `npm run build`: Build for production.
 - `npm run test`: Run Vitest unit tests.
+
+## Documentation
+- **[README](../README.md)**: Main project documentation and setup guide.
+- **[Backend Guide](../backend/GEMINI.md)**: Backend implementation details.
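
The Verification Strategy section added to GEMINI.md lists the backend and frontend checks as separate commands. As an illustration only, they could be run in one pass with a small script. The sketch below assumes a hypothetical `verify.py` at the repository root (it is not part of this patch); the `Check`, `CHECKS`, and `run_checks` names are likewise invented here. It uses only the Python standard library, reuses the documented commands verbatim, and leaves out the optional mypy step.

```python
# verify.py (hypothetical helper, not part of this patch)
import subprocess
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    cwd: str              # directory, relative to the repository root
    cmd: tuple[str, ...]  # argv handed to the runner; no shell is involved


# Commands taken from the Verification Strategy list; the optional mypy step is omitted.
CHECKS = (
    Check("backend tests", "backend", ("uv", "run", "pytest")),
    Check("backend lint", "backend", ("uv", "run", "ruff", "check", ".")),
    Check("frontend tests", "frontend", ("npm", "run", "test")),
    Check("frontend lint", "frontend", ("npm", "run", "lint")),
    Check("frontend build", "frontend", ("npm", "run", "build")),
)


def run_checks(
    checks: Sequence[Check],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run every check without stopping early; return the names of the failed ones."""
    failed: list[str] = []
    for check in checks:
        try:
            result = runner(check.cmd, cwd=check.cwd)
        except FileNotFoundError:
            # The tool (uv or npm) is not on PATH, or the directory does not exist.
            failed.append(check.name)
            continue
        if result.returncode != 0:
            failed.append(check.name)
    return failed
```

Called from the repository root, `run_checks(CHECKS)` returns an empty list when every check passes; the `runner` parameter exists so the loop can be exercised without `uv` or `npm` installed. Running every check instead of stopping at the first failure reports all broken areas in one pass. If the frontend `test` script starts Vitest in watch mode in an interactive terminal, that step will not exit on its own; `npm run test -- --run` asks Vitest for a single run.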