Building an AI-Powered Search Engine Using LangChain and Agents
The rise of Generative AI has transformed how we interact with information. Large Language Models (LLMs) like Llama 3 (used here via Groq's `Llama3-8b-8192` endpoint) can process complex queries, but their knowledge is frozen at training time and they lack real-time search capabilities. To address this, I built an AI-powered search engine using LangChain Agents, integrating Wikipedia, Arxiv, and DuckDuckGo search tools to provide real-time, context-aware responses.
This project not only improves search accuracy but also showcases the power of Retrieval-Augmented Generation (RAG)—a technique that enhances LLMs by fetching live information before generating responses.
💡 How This Search Engine Works
🔹 Core Components Used
This project integrates multiple AI and search technologies:
| Component | Purpose |
|---|---|
| LangChain | Framework to build AI applications with LLMs |
| Streamlit | Web interface for interactive AI search |
| DuckDuckGo API | Fetches real-time web results |
| Wikipedia API | Retrieves encyclopedic knowledge |
| Arxiv API | Fetches academic research papers |
| FAISS | Stores vectorized text for efficient retrieval |
| Hugging Face Transformers | Embedding models for RAG |
| MySQL + SQLAlchemy | Stores search logs for analytics |
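The table above mentions storing search logs for analytics with MySQL + SQLAlchemy. The sketch below only illustrates the shape such a log table might take, using the standard-library `sqlite3` as a stand-in so it runs anywhere; the table and column names are my own assumptions, not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in for the MySQL + SQLAlchemy setup: an in-memory SQLite log table.
# Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE search_logs (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           query TEXT NOT NULL,
           tool_used TEXT,
           created_at TEXT
       )"""
)

def log_search(query: str, tool_used: str) -> None:
    """Record one search event for later analytics."""
    conn.execute(
        "INSERT INTO search_logs (query, tool_used, created_at) VALUES (?, ?, ?)",
        (query, tool_used, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

log_search("attention is all you need", "arxiv")
log_search("capital of France", "wikipedia")

count = conn.execute("SELECT COUNT(*) FROM search_logs").fetchone()[0]
print(count)  # 2
```

Swapping `sqlite3.connect(...)` for a SQLAlchemy engine pointed at MySQL keeps the same logical schema.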
🔹 Tech Stack & Dependencies
To build this project, I used the following Python libraries:
```
langchain, langchain-community, langchain-openai, langchain-groq, langchain_huggingface
streamlit, python-dotenv, pypdf, arxiv, wikipedia, sentence_transformers, faiss-cpu
chromadb, duckdb, pandas, mysql-connector-python, SQLAlchemy, validators, pytube
```
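The whole set can be installed in one go (package names taken directly from the list above; pin versions in a `requirements.txt` for reproducibility):

```shell
pip install langchain langchain-community langchain-openai langchain-groq langchain_huggingface \
    streamlit python-dotenv pypdf arxiv wikipedia sentence_transformers faiss-cpu \
    chromadb duckdb pandas mysql-connector-python SQLAlchemy validators pytube
```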
These tools help integrate LLMs, search APIs, document retrieval, vector storage, and database management.
🚀 Project Overview: How It Works
Step 1: Setting Up Search Tools (Wikipedia, Arxiv, and DuckDuckGo)
To retrieve live data, we use Wikipedia API, Arxiv API, and DuckDuckGo search:
```python
from langchain_community.utilities import ArxivAPIWrapper, WikipediaAPIWrapper
from langchain_community.tools import ArxivQueryRun, WikipediaQueryRun, DuckDuckGoSearchRun

# Wikipedia tool: returns a short summary of the top matching article
wiki_api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=200)
wiki = WikipediaQueryRun(api_wrapper=wiki_api_wrapper)

# Arxiv tool: returns metadata for the top matching paper
arxiv_wrapper = ArxivAPIWrapper(top_k_results=1, doc_content_chars_max=200)
arxiv = ArxivQueryRun(api_wrapper=arxiv_wrapper)

# DuckDuckGo tool: live web search for fresh content
search = DuckDuckGoSearchRun(name="Search")
```
🔹 What This Does:
- Wikipedia Tool → Fetches summarized encyclopedic content.
- Arxiv Tool → Finds academic research papers.
- DuckDuckGo Tool → Searches the live web for fresh content.
Step 2: Creating a Search Agent (The AI Brain)
We need an Agent that can think step by step and decide which tool to use.
```python
import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain.agents import initialize_agent, AgentType

# Load the Groq API key from a .env file
load_dotenv()
api_key = os.getenv("GROQ_API_KEY")

# Initialize the LLM (Llama 3 8B with an 8192-token context, served by Groq)
llm = ChatGroq(groq_api_key=api_key, model_name="Llama3-8b-8192", streaming=True)

# The agent can choose between web search, Arxiv, and Wikipedia
tools = [search, arxiv, wiki]

# ZERO_SHOT_REACT_DESCRIPTION selects a tool from its description at each step
search_agent = initialize_agent(
    tools, llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
)
```
🔹 What This Does:
- Loads the AI model (Llama3-8b-8192) to process search queries.
- Combines all search tools into one system.
- Uses an Agent to decide whether to search Wikipedia, Arxiv, or the Web.
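To make the "agent decides which tool to use" idea concrete: a zero-shot ReAct agent picks tools by reasoning over each tool's description. LangChain delegates that decision to the LLM; the dependency-free toy below fakes it with keyword overlap purely to illustrate the routing pattern (all names and descriptions here are made up, not LangChain's internals):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[str], str]

# Stub tools standing in for Wikipedia, Arxiv, and DuckDuckGo
tools = [
    Tool("wikipedia", "encyclopedic facts about people places concepts",
         lambda q: f"[wiki summary for {q!r}]"),
    Tool("arxiv", "academic research papers and preprints",
         lambda q: f"[arxiv abstract for {q!r}]"),
    Tool("web-search", "fresh news and general web results",
         lambda q: f"[web results for {q!r}]"),
]

def route(query: str) -> Tool:
    """Crude stand-in for the LLM's tool choice: score each tool by
    description-word overlap with the query; fall back to web search."""
    words = set(query.lower().split())
    best = max(tools, key=lambda t: len(words & set(t.description.split())))
    if not words & set(best.description.split()):
        return tools[-1]  # nothing matched: fall back to live web search
    return best

tool = route("find research papers on diffusion models")
print(tool.name)  # arxiv
```

The real agent is far more capable (it reads tool descriptions with the LLM and can chain multiple tool calls), but the selection-by-description principle is the same.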
Step 3: Building an Interactive UI with Streamlit
To allow users to interact with the AI Search Engine, we built a Streamlit UI:
```python
import streamlit as st
from langchain.callbacks import StreamlitCallbackHandler

# Streamlit UI setup
st.title("🔎 AI-Powered Search Engine")
st.sidebar.title("Settings")
api_key = st.sidebar.text_input("Enter your Groq API Key:", type="password")

# Initialize chat history on first load
if "messages" not in st.session_state:
    st.session_state["messages"] = [
        {"role": "assistant", "content": "Hi! I can search the web. Ask me anything."}
    ]

# Replay the conversation so far
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("What do you want to search?"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    # Invoke the AI search agent, streaming its reasoning into the chat bubble
    with st.chat_message("assistant"):
        st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
        response = search_agent.run(st.session_state.messages, callbacks=[st_cb])
        st.session_state.messages.append({"role": "assistant", "content": response})
        st.write(response)
```
🔹 What This Does:
- Displays chat history between user and AI.
- Allows real-time user input.
- AI processes the query and returns results.
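Assuming the code above is saved as `app.py` (the filename is my assumption, not stated in the post), the app is launched with:

```shell
streamlit run app.py
```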
🎯 Key Features & Benefits
✅ Retrieval-Augmented Generation (RAG) → Combines AI with live search results.
✅ Multi-Source Search → Wikipedia, Arxiv, and DuckDuckGo provide accurate, up-to-date answers.
✅ AI Decision-Making → The Agent chooses the right tool for each query.
✅ Fast & Scalable → Uses FAISS vector storage and LangChain agents.
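FAISS appears in the stack for vector retrieval. Its core operation is nearest-neighbor search over embeddings; the dependency-free sketch below shows that idea with cosine similarity over toy 3-D vectors. In the real pipeline the vectors come from a sentence-transformers embedding model and live in a FAISS index; the document names and vectors here are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "index": document -> embedding (real embeddings are e.g. 384-dim)
index = {
    "LangChain agents docs": [0.9, 0.1, 0.0],
    "FAISS tutorial":        [0.2, 0.9, 0.1],
    "Streamlit chat guide":  [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda doc: cosine(query_vec, index[doc]), reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.0]))  # ['LangChain agents docs']
```

FAISS does exactly this, but with optimized index structures that stay fast at millions of vectors.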
🔗 Live Demo & GitHub Repository
- GitHub Code: [Insert GitHub Link Here]
- Live Demo: [Insert Deployed Link Here]
📌 Final Thoughts: What I Learned
🔹 AI alone is not enough → Real-time retrieval tools enhance accuracy.
🔹 LangChain makes it easy → Agents, Tools, and Executors simplify AI workflows.
🔹 Deployment matters → Hosting AI-powered search engines can help businesses make data-driven decisions.
🚀 Next Steps:
I plan to enhance this project by:
1️⃣ Adding PDF & YouTube Video Summarization 📑.
2️⃣ Improving the search accuracy with embeddings 🧠.
3️⃣ Deploying a full-scale API 🌍.
What do you think of this project? Feel free to share your feedback! 😊
🚀 Want to Build Your Own AI Search Engine?
If you’re interested in building your own AI-powered search tool, let’s connect on LinkedIn!
👉 [LinkedIn Profile]
👉 [GitHub Repository]