An intelligent Retrieval-Augmented Generation system that autonomously routes user queries across websites, PDFs, YouTube video transcripts, and documents — powered by OpenAI Agents SDK and Gemini 2.0 Flash.
This project presents a complete Agentic AI Retrieval-Augmented Generation (RAG) System built to answer user questions intelligently from multiple knowledge sources. Unlike traditional single-source RAG systems, this project integrates four distinct data types into one unified intelligent interface.
Modern businesses store knowledge across multiple formats — internal policy documents, product documentation, training videos, and reports. Traditional chatbots and Q&A systems fail because they are single-source and cannot route intelligently between knowledge bases.
Traditional systems are trained or indexed on only one type of data — making cross-format queries impossible.
Systems cannot autonomously decide which knowledge base to query based on the user's question intent.
Standard keyword-based retrieval misses contextually relevant results — semantic understanding is required.
Hours of training videos and recorded content remain unsearchable — critical knowledge is trapped in audio form.
The Challenge: Build a unified intelligent system that reads from all these sources, stores them semantically, and answers any question from the correct source — automatically.
The Agentic AI RAG System unifies all knowledge into one conversational interface. An agentic decision layer powered by the OpenAI Agents SDK autonomously selects the correct knowledge base per query and uses Gemini 2.0 Flash to generate accurate, context-aware responses.
The system is built in two phases: a one-time setup phase for data ingestion and embedding, and a real-time query phase where the agent routes and retrieves.
┌─────────────────────────────────────────────────────────────┐
│                  USER QUERY (Streamlit UI)                  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              AGENTIC LAYER (OpenAI Agents SDK)              │
│                                                             │
│   ┌─────────────────┐       ┌──────────────────────────┐    │
│   │  answer_query   │       │    answer_from_video     │    │
│   │    (Tool 1)     │       │         (Tool 2)         │    │
│   └────────┬────────┘       └────────────┬─────────────┘    │
└────────────┼─────────────────────────────┼──────────────────┘
             │                             │
             ▼                             ▼
   ┌────────────────────┐    ┌───────────────────────────┐
   │    FAISS Index     │    │   ChromaDB Vector Store   │
   │  (Website Docs +   │    │     (Video Transcript     │
   │   PDF/Documents)   │    │        Embeddings)        │
   └────────────────────┘    └───────────────────────────┘
             │                             │
             └──────────────┬──────────────┘
                            │
                            ▼
             ┌─────────────────────────────┐
             │    Gemini 2.0 Flash LLM     │
             │ (via OpenAI-compatible API) │
             └─────────────────────────────┘
                            │
                            ▼
             ┌─────────────────────────────┐
             │        FINAL ANSWER         │
             │   (Streamed to Streamlit)   │
             └─────────────────────────────┘
Each knowledge source has a dedicated ingestion pipeline that processes raw content into searchable vector embeddings.
BeautifulSoup4 recursively scrapes all pages of the OpenAI Agents SDK documentation. It extracts text from p, li, code, pre, and h1–h3 tags, follows all internal links, and saves the scraped pages as scraped_pages.pkl.
Text is chunked into 500-character segments with a 50-character overlap to prevent context loss at boundaries. The chunks are embedded using all-MiniLM-L6-v2 (HuggingFace) and stored in a FAISS IndexFlatL2 index for exact L2-distance retrieval.
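The 500/50 chunking strategy can be sketched in a few lines of plain Python. This is a minimal illustration rather than the project's actual code: the function name `chunk_text` and its parameter names are assumptions, and it splits on raw character offsets without regard to sentence boundaries.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap their neighbours.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # each chunk starts 450 characters after the previous one
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With the default settings, the last 50 characters of each chunk are repeated at the start of the next one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.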
YouTube videos are downloaded with yt-dlp, then transcribed locally using faster-whisper (Whisper base, int8 quantized — CPU friendly). Transcript chunks are embedded and stored separately in ChromaDB for clean domain separation.
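As a rough sketch of the transcription step, the snippet below loads the Whisper base model with int8 quantization on CPU through faster-whisper and joins the resulting segments into one transcript string. The function names `transcribe` and `join_segments` are hypothetical, the audio file is assumed to have been downloaded already, and the import is deferred so the helper can be read and tested without the library installed.

```python
def join_segments(segments):
    """Concatenate the text of transcript segments, skipping empty ones."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe(audio_path, model_size="base"):
    """Transcribe a local audio/video file on CPU (hypothetical helper)."""
    # Imported lazily so this module loads even when faster-whisper is absent.
    from faster_whisper import WhisperModel

    # int8 quantization keeps memory use and CPU inference time low.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus metadata.
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)
```

The returned transcript can then go through the same chunking and embedding steps as the other sources before being written to ChromaDB.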
Business PDFs (HR policies, financial reports, internal SOPs) are processed via PyPDF2 or LangChain's PyPDFLoader. Extracted text follows the same 500/50-char chunking strategy and is embedded into the FAISS index alongside website data — making all document types queryable through the same agent interface.
The core of the system is an agent built with the OpenAI Agents SDK that reasons about which tool to call based on query intent — rather than using a simple retrieval loop.
answer_query(query) performs semantic search across the FAISS index of website documentation and PDF content. Retrieved chunks are passed as context to Gemini Flash for answer generation.
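To make the retrieval step concrete, here is a pure-Python stand-in for what a flat L2 index computes: it compares the query vector against every stored vector and returns the closest ones. It is only a sketch of the idea, not the FAISS API. The names `l2_search` and `build_prompt`, the default `k=5`, and the prompt wording are all assumptions made for illustration.

```python
def l2_search(query_vec, vectors, k=5):
    """Exhaustive nearest-neighbour search by squared L2 distance.

    Returns (distance, index) pairs, closest first. A flat L2 index does the
    same comparison against every stored vector, just far more efficiently.
    """
    scored = sorted(
        (sum((q - v) ** 2 for q, v in zip(query_vec, vec)), i)
        for i, vec in enumerate(vectors)
    )
    return scored[:k]


def build_prompt(question, chunks):
    """Assemble retrieved chunks into a context block for the LLM (illustrative wording)."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In the real pipeline the query would first be embedded with the same all-MiniLM-L6-v2 model used at ingestion time, so that query and chunk vectors live in the same 384-dimensional space.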
answer_from_video(query) embeds the query, performs semantic search in ChromaDB, retrieves top-5 transcript chunks, and generates a response using Gemini Flash — keeping video knowledge fully accessible.
The agent reads the user's question and selects the right tool based on intent — documentation questions go to FAISS, video questions go to ChromaDB. For cross-source questions, the agent calls both tools and Gemini synthesizes a unified answer.
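The routing setup might look roughly like the following. This is a sketch under several assumptions: the two tools are stubs that return placeholder strings instead of querying FAISS or ChromaDB, the agent name, instruction wording, and the `GEMINI_API_KEY` environment variable are invented for illustration, and the SDK imports are deferred into `build_agent` so the module loads without the packages installed.

```python
import os

# Gemini's OpenAI-compatible endpoint.
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

# Illustrative wording; explicit per-tool guidance is what steers the routing.
ROUTING_INSTRUCTIONS = (
    "You answer questions from two knowledge bases. "
    "Use answer_query for questions about the documentation website or PDF documents. "
    "Use answer_from_video for video-related questions. "
    "If a question spans both sources, call both tools and combine the results."
)


def answer_query(query: str) -> str:
    """Search website and PDF content (stub: the real tool would query FAISS)."""
    return f"[docs context for: {query}]"


def answer_from_video(query: str) -> str:
    """Search video transcripts (stub: the real tool would query ChromaDB)."""
    return f"[video context for: {query}]"


def build_agent():
    """Wire the tools and a Gemini-backed model into one agent."""
    # Lazy imports: requires the openai-agents and openai packages.
    from agents import Agent, OpenAIChatCompletionsModel, function_tool, set_tracing_disabled
    from openai import AsyncOpenAI

    # Tracing uploads need an OpenAI key, which this setup does not have.
    set_tracing_disabled(True)

    client = AsyncOpenAI(
        base_url=GEMINI_BASE_URL,
        api_key=os.environ["GEMINI_API_KEY"],  # assumed variable name
    )
    model = OpenAIChatCompletionsModel(model="gemini-2.0-flash", openai_client=client)
    return Agent(
        name="RAG Router",
        instructions=ROUTING_INSTRUCTIONS,
        tools=[function_tool(answer_query), function_tool(answer_from_video)],
        model=model,
    )
```

The SDK derives each tool's schema and description from the function signature and docstring, so clear docstrings give the model additional routing hints beyond the agent instructions.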
The system handles a wide range of real-world queries by autonomously routing to the right knowledge source.
@function_tool decorator with examples

| Layer | Technology | Purpose |
|---|---|---|
| Web Scraping | BeautifulSoup4, Requests | Extract text from documentation website |
| Video Download | yt-dlp | Download YouTube videos as MP4 |
| Audio Transcription | faster-whisper (base, int8) | Convert video speech to text on CPU |
| PDF Processing | PyPDF2, LangChain | Extract text from PDF documents |
| Text Chunking | Custom Python (500 chars, 50 overlap) | Split text for vector retrieval |
| Embedding Model | all-MiniLM-L6-v2 (HuggingFace) | Generate 384-dim semantic embeddings |
| Vector DB 1 | FAISS (IndexFlatL2) | Store & search website & document embeddings |
| Vector DB 2 | ChromaDB (PersistentClient) | Store & search video transcript embeddings |
| Agent Framework | OpenAI Agents SDK | Intelligent tool routing and orchestration |
| LLM | Gemini 2.0 Flash (OpenAI-compatible API) | Natural language response generation |
| Frontend | Streamlit | Interactive web UI with streaming responses |
| Environment | python-dotenv, uv | Secure config and dependency management |
answer_from_video for video-related questions." This dramatically improved routing accuracy.
Runner.run_streamed() with an async event loop inside Streamlit. Each ResponseTextDeltaEvent updates the placeholder in real-time, creating a word-by-word streaming effect.
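A minimal sketch of that streaming loop follows. The helper names `accumulate_text` and `render_streamed_answer` are hypothetical. To keep the core loop runnable without the SDK installed, it identifies text deltas by their `type` field (`"response.output_text.delta"`, the discriminator carried by `ResponseTextDeltaEvent`) instead of an `isinstance` check, which is a substitution made for this example.

```python
import asyncio


async def accumulate_text(events, on_update):
    """Consume stream events and push the growing answer text to a callback."""
    text = ""
    async for event in events:
        # Only raw model events carry token deltas; skip tool and agent events.
        if getattr(event, "type", None) != "raw_response_event":
            continue
        data = event.data
        if getattr(data, "type", None) == "response.output_text.delta":
            text += data.delta
            on_update(text)  # e.g. a Streamlit placeholder's markdown method
    return text


def render_streamed_answer(agent, question):
    """Run the agent and stream its answer into a Streamlit placeholder."""
    # Lazy imports: requires streamlit and the openai-agents package.
    import streamlit as st
    from agents import Runner

    placeholder = st.empty()

    async def _run():
        # run_streamed must be called while an event loop is running.
        result = Runner.run_streamed(agent, input=question)
        return await accumulate_text(result.stream_events(), placeholder.markdown)

    return asyncio.run(_run())
```

Rewriting the whole placeholder with the accumulated text on every delta is what produces the incremental typing effect in the UI.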
AsyncOpenAI with Gemini's base URL and API key, the agent code required zero changes — a true drop-in replacement.
The system solves a real business problem: organizations store knowledge in multiple formats, and employees waste time searching across systems. This RAG agent unifies all knowledge into one conversational interface — reducing information retrieval from hours to seconds.
Fully activate the RagPDF module with PyPDF2 and LangChain loaders for batch document ingestion.
Whisper supports 99 languages — extend transcript processing for non-English video content.
Add cross-encoder re-ranking after vector retrieval to improve precision on complex queries.
Combine BM25 keyword search with semantic vector search for improved recall on exact-match queries.
Allow users to upload PDFs directly through Streamlit — no code changes needed to add new content.
Background processing for new documents so the system remains online while ingesting new knowledge.
We build custom Agentic AI and RAG systems that unify your knowledge base and make information retrieval instant.