Building a RAG Agent with LangChain: A Practical Guide
Learn how to build a Retrieval-Augmented Generation agent with LangChain that ingests documents, stores them in a vector database, and answers questions grounded in their content
Large language models are impressive, but they have a fundamental limitation: their knowledge is frozen at their training cutoff date, and they can’t access your private data. Retrieval-Augmented Generation (RAG) solves this by giving your AI agent the ability to search through documents and use that information to answer questions accurately.
In this tutorial, we’ll build a complete RAG agent using LangChain that can ingest documents, store them in a vector database, and answer questions based on their content.
By the end of this tutorial, you’ll be able to:
- Split documents into chunks and store them as embeddings in ChromaDB
- Build a retrieval chain that grounds answers in your own documents
- Add conversation memory so the agent can handle follow-up questions
- Inspect which source chunks were used to produce an answer
Before we start, make sure you have:
- Python 3.9 or later installed
- An OpenAI API key
- Basic familiarity with Python and the command line
Estimated time: 30 minutes
Create a new project directory and install the required packages:
mkdir rag-agent
cd rag-agent
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the dependencies:
pip install langchain langchain-openai langchain-community chromadb python-dotenv
We’re using ChromaDB as our vector database because it’s lightweight and requires no external setup. For production, you might consider Pinecone, Weaviate, or PostgreSQL with pgvector.
Create a .env file:
OPENAI_API_KEY=your-api-key-here
Checkpoint: Run pip list | grep langchain to verify the packages installed correctly.
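If you also want to confirm that the API key is being picked up before writing any agent code, a quick throwaway script (not part of the tutorial files, just a sanity check) does the trick:

# check_env.py - optional sanity check that the .env file is being read
import os
from dotenv import load_dotenv

load_dotenv()
if os.getenv("OPENAI_API_KEY"):
    print("OPENAI_API_KEY loaded")
else:
    print("OPENAI_API_KEY missing - check your .env file")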
Before we write code, let’s understand how RAG works:
1. Ingest: documents are split into chunks, each chunk is converted into an embedding (a numeric vector), and the vectors are stored in a vector database.
2. Retrieve: when a question arrives, it is embedded the same way and the database returns the chunks whose vectors are most similar.
3. Generate: the retrieved chunks are passed to the LLM as context, and it writes an answer grounded in them.
The key insight is that embeddings capture semantic meaning. Two sentences about the same topic will have similar embeddings, even if they use different words. This allows the system to find relevant information even when the question doesn’t exactly match the document text.
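If you want to see this for yourself, here is a small sketch that compares cosine similarities between a question and two candidate sentences. It uses the same text-embedding-3-small model we configure below; the example sentences are arbitrary.

import math
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1 = embeddings.embed_query("How do I reset my password?")
v2 = embeddings.embed_query("Steps for recovering account access")
v3 = embeddings.embed_query("The weather in Paris is mild in spring")

print(cosine_similarity(v1, v2))  # higher: same topic, different words
print(cosine_similarity(v1, v3))  # lower: unrelated topic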
Create a file called rag_agent.py:
import os
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
load_dotenv()
# Initialize embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Configure text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""]
)

def ingest_documents(texts: list[str], metadatas: list[dict] = None) -> Chroma:
    """Split texts into chunks and store in vector database."""
    # Create Document objects
    documents = []
    for i, text in enumerate(texts):
        metadata = metadatas[i] if metadatas else {"source": f"doc_{i}"}
        documents.append(Document(page_content=text, metadata=metadata))

    # Split into chunks
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(texts)} documents into {len(chunks)} chunks")

    # Create vector store
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )
    return vectorstore
The RecursiveCharacterTextSplitter is smart about splitting. It tries to split on paragraph boundaries first, then sentences, then words. The chunk_overlap ensures context isn’t lost at chunk boundaries.
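To get a feel for how splitting and overlap behave, you can run the splitter on a short sample with a deliberately small chunk size. The 80/20 values below are just for illustration, not numbers to use in the real pipeline.

from langchain.text_splitter import RecursiveCharacterTextSplitter

demo_splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=20)
sample = (
    "RAG has two phases. First, documents are split into chunks and embedded. "
    "Second, at query time, the most similar chunks are retrieved and passed "
    "to the model as extra context for answering."
)
for i, chunk in enumerate(demo_splitter.split_text(sample)):
    print(f"chunk {i}: {chunk!r}")  # note the overlapping text at the boundaries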
Now let’s build the retrieval component that finds relevant chunks:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
def create_rag_chain(vectorstore: Chroma):
    """Create a RAG chain that retrieves context and generates answers."""
    # Create retriever (returns top 3 most similar chunks)
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 3}
    )

    # Define the prompt template
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a helpful assistant that answers questions based on the provided context.

Use the following context to answer the user's question. If the context doesn't contain relevant information, say so clearly rather than making up an answer.

Context:
{context}"""),
        ("human", "{input}")
    ])

    # Create the chain
    question_answer_chain = create_stuff_documents_chain(llm, prompt)
    rag_chain = create_retrieval_chain(retriever, question_answer_chain)
    return rag_chain
The create_stuff_documents_chain “stuffs” all retrieved documents into the prompt. For longer contexts, you might use map-reduce or refinement strategies, but stuffing works well for most use cases.
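Before adding memory, here is how the basic chain is used on its own. create_retrieval_chain returns the retrieved documents under the "context" key alongside the "answer", which is handy for checking what the model actually saw. The sample text below is arbitrary.

# Example: one-off question answering with the basic chain
vectorstore = ingest_documents(
    ["LangChain is a framework for building LLM-powered applications."]
)
rag_chain = create_rag_chain(vectorstore)

result = rag_chain.invoke({"input": "What is LangChain?"})
print(result["answer"])                # the generated answer
for doc in result["context"]:          # the retrieved chunks that backed it
    print("-", doc.metadata.get("source"), doc.page_content[:60])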
RAG agents are more useful when they remember previous questions. Let’s add conversation history:
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
def create_conversational_rag_chain(vectorstore: Chroma):
    """Create a RAG chain with conversation memory."""
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a helpful assistant that answers questions based on the provided context and conversation history.

Context from documents:
{context}

Use the context to answer questions. Reference the conversation history for follow-up questions. If you cannot answer from the context, say so clearly."""),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{input}")
    ])

    question_answer_chain = create_stuff_documents_chain(llm, prompt)
    rag_chain = create_retrieval_chain(retriever, question_answer_chain)
    return rag_chain
class ConversationalRAGAgent:
    """RAG agent that maintains conversation history."""

    def __init__(self, vectorstore: Chroma):
        self.chain = create_conversational_rag_chain(vectorstore)
        self.chat_history = []

    def ask(self, question: str) -> str:
        """Ask a question and get an answer with context."""
        result = self.chain.invoke({
            "input": question,
            "chat_history": self.chat_history
        })
        # Update history
        self.chat_history.append(HumanMessage(content=question))
        self.chat_history.append(AIMessage(content=result["answer"]))
        return result["answer"]

    def get_sources(self, question: str) -> list[str]:
        """Get the source documents used for a question."""
        result = self.chain.invoke({
            "input": question,
            "chat_history": []
        })
        return [doc.page_content for doc in result["context"]]
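Because we passed persist_directory when creating the store, ChromaDB writes the embeddings to disk. On later runs you can rebuild the vector store from that directory instead of calling ingest_documents again; a minimal sketch, assuming the embeddings object defined earlier:

# Reload a previously persisted store (skips re-embedding on subsequent runs)
existing_store = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
agent = ConversationalRAGAgent(existing_store)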
Here’s a complete example that demonstrates the full RAG workflow:
# Example usage
if __name__ == "__main__":
    # Sample documents (in practice, load from files)
    documents = [
        """AI agents are autonomous systems that can perceive their environment,
        make decisions, and take actions to achieve specific goals. Unlike simple
        chatbots, agents can use tools, maintain state, and execute multi-step plans.""",
        """LangChain is a framework for developing applications powered by language
        models. It provides tools for building chains, agents, and retrieval systems.
        The framework supports multiple LLM providers including OpenAI and Anthropic.""",
        """Vector databases store data as high-dimensional vectors, enabling semantic
        search. Popular options include Pinecone, Weaviate, ChromaDB, and pgvector.
        They are essential for RAG applications because they allow finding similar
        content based on meaning rather than exact keyword matches."""
    ]

    # Ingest documents
    vectorstore = ingest_documents(documents)

    # Create agent
    agent = ConversationalRAGAgent(vectorstore)

    # Ask questions
    print("Q: What is LangChain?")
    print(f"A: {agent.ask('What is LangChain?')}\n")

    print("Q: What providers does it support?")
    print(f"A: {agent.ask('What providers does it support?')}\n")

    print("Q: Why are vector databases important for RAG?")
    print(f"A: {agent.ask('Why are vector databases important for RAG?')}")
Run with python rag_agent.py and watch your agent answer questions using the ingested documents.
Symptom: Poor retrieval quality—either too much irrelevant information or missing context.
Solution: Experiment with chunk_size. Start with 500-1000 characters. Larger chunks provide more context but may include irrelevant content. Smaller chunks are more precise but may lose context.
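One simple way to experiment is to split your corpus at a few candidate sizes and compare the chunk counts (and skim a few sample chunks). In this sketch, documents stands for whatever list of Document objects you are ingesting:

# Compare how many chunks different chunk sizes produce for your corpus
for size in (300, 500, 1000):
    splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=50)
    chunks = splitter.split_documents(documents)
    print(f"chunk_size={size}: {len(chunks)} chunks")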
Symptom: Errors about maximum context length when retrieving many documents.
Solution: Reduce k in retriever search_kwargs, or use a different chain strategy like map-reduce for summarizing many documents.
Symptom: Retrieved chunks aren’t relevant to the question.
Solution: Try a different embedding model. OpenAI’s text-embedding-3-large is more accurate but more expensive. Also ensure your chunks contain complete thoughts—don’t split mid-sentence.
You’ve built a functional RAG agent. Here’s how to extend it:
- Load real documents with LangChain document loaders instead of hard-coded strings
- Swap ChromaDB for a production vector database such as Pinecone, Weaviate, or PostgreSQL with pgvector
- Surface the retrieved sources alongside answers using the get_sources method
- Tune chunk_size, chunk_overlap, and the retriever’s k for your own documents
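For example, to ingest a folder of text files rather than hard-coded strings, a loader-based sketch might look like this (the docs/ folder name is hypothetical; PDF, HTML, and Markdown loaders follow the same pattern):

from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load every .txt file under docs/ as a Document
loader = DirectoryLoader("docs", glob="**/*.txt", loader_cls=TextLoader)
raw_docs = loader.load()

# Reuse the ingest_documents function from earlier, keeping each file's metadata
vectorstore = ingest_documents(
    [d.page_content for d in raw_docs],
    metadatas=[d.metadata for d in raw_docs]
)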
RAG is one of the most practical AI agent patterns because it solves the knowledge freshness problem without fine-tuning. As your document collection grows, your agent automatically becomes more knowledgeable.
Want to learn more about AI agents? Check out our LangGraph tutorial, explore Understanding Agent Memory Systems, or see our Complete Guide to AI Agent Frameworks for framework comparisons.