A small Retrieval-Augmented Generation (RAG) project that:
- loads PDF files
- chunks PDF text
- generates embeddings with `sentence-transformers`
- stores vectors in `chromadb`
- retrieves the most relevant chunks for a query
- uses retrieved context for a simple RAG prompt
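
The chunking, retrieval, and prompt steps listed above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the function names (`chunk_text`, `top_k`, `build_prompt`), the chunk size and overlap, and the prompt wording are all assumptions. In the real project, `sentence-transformers` produces the vectors and `chromadb` does the similarity search; the small `top_k` helper only stands in for that step with hand-written vectors.

```python
import math


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap (assumed sizes)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a simple RAG prompt from retrieved chunks (hypothetical wording)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.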
- `pyproject.toml`: project dependencies and Python settings
- `requirements.txt`: pinned dependency list
- `notebook/document.ipynb`: main pipeline notebook
- `data/pdf/`: source PDF files
- `data/vector_store/`: persistent Chroma vector store
- macOS Intel (`x86_64`)
- Python `3.11.x` (recommended)
- `uv` package manager installed
- A valid Groq API key if you want LLM answers via Groq
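
The requirements above can be checked with a short preflight script before running the notebook. This is a hedged sketch using only the standard library: `check_environment` is a hypothetical helper, and `GROQ_API_KEY` is assumed to be the environment variable holding the key (it is the name the Groq Python client conventionally reads, but this README does not specify one).

```python
import os
import platform
import sys


def check_environment(env=None, version=None, machine=None) -> list[str]:
    """Return a list of human-readable warnings; an empty list means all checks passed.

    Arguments default to the live environment and exist so the checks
    can be exercised with fixed values.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    machine = platform.machine() if machine is None else machine

    problems = []
    if version != (3, 11):
        problems.append(f"Python 3.11.x is recommended, found {version[0]}.{version[1]}")
    if machine != "x86_64":
        problems.append(f"expected x86_64 architecture, found {machine}")
    # Assumption: the key is supplied via GROQ_API_KEY; only needed for Groq answers.
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set; LLM answers via Groq will not work")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

The checks only warn rather than exit, since the Python version is a recommendation and the Groq key is optional.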
cd RAG
uv add -r requirements.txt