Upload any product image → Get specs, find alternatives, compare prices
Powered by vision LLMs, semantic search, and conversational AI — all in one app.
ShopLens AI is a production-ready multimodal RAG (Retrieval-Augmented Generation) system that combines computer vision, vector search, and large language models to build an intelligent e-commerce shopping assistant.
A user uploads a product photo — a pair of sneakers, a laptop, headphones — and the system:
- Understands the image using Llama 4 Scout vision model
- Finds similar products using hybrid semantic + keyword search over a FAISS vector database
- Answers questions in natural language using ChatGroq (LLaMA 3.3 70B)
- Remembers context across a multi-turn conversation
Think of it as ChatGPT + Google Lens + Amazon Search combined into one interface.
User uploads product image
│
▼
┌─────────────────────┐
│ vision.py │ Llama 4 Scout (Groq) → Rich product description
│ Image → Text │ Caches results by image hash
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ retriever.py │ FAISS semantic search
│ Hybrid Search │ + BM25 keyword search
│ (RRF Fusion) │ → Reciprocal Rank Fusion
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ model.py │ ChatGroq LLaMA 3.3 70B
│ RAG Generation │ + Conversation memory
│ │ → Natural language answer
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ app.py │ Streamlit dark UI
│ Web Interface │ Multi-turn chat + product cards
└─────────────────────┘
| Component | Technology | Purpose |
|---|---|---|
| Vision LLM | Llama 4 Scout 17B (Groq) | Image → product description |
| Chat LLM | LLaMA 3.3 70B (Groq) | Answer generation |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
Text → vectors |
| Vector DB | FAISS (CPU) | Semantic similarity search |
| Keyword Search | BM25 (rank-bm25) | Exact brand/model matching |
| Fusion | Reciprocal Rank Fusion | Hybrid retrieval ranking |
| Framework | LangChain 0.3 | LLM orchestration |
| UI | Streamlit | Web interface |
rag_prod/
│
├── app.py → Streamlit web UI (dark premium theme)
├── pipeline.py → Main orchestrator + ChatSession (multi-turn)
│
├── vision.py → Image loading, format conversion, Groq vision call
├── retriever.py → Hybrid FAISS + BM25 search with RRF fusion
├── embeddings.py → HuggingFace model + FAISS index build/load/search
├── model.py → ChatGroq LLM — answer generation + comparison table
│
├── catalog.py → Product data + searchable text builder
├── ingest.py → One-time index builder CLI
├── config.py → All settings loaded from .env
│
├── requirements.txt
├── .env.example
└── README.md
git clone https://github.com/yourusername/shoplens-ai.git
cd shoplens-ai/rag_prodpython -m venv freshenv
# Windows
freshenv\Scripts\activate
# Mac / Linux
source freshenv/bin/activatepip install --upgrade pip
pip install -r requirements.txt
⚠️ NumPy note: If you get a NumPy 2.x compatibility error with FAISS, run:pip install "numpy<2" && pip install faiss-cpu --force-reinstall
cp .env.example .envEdit .env and add your Groq API key:
GROQ_API_KEY=gsk_your_actual_key_hereGet a free key at console.groq.com/keys — no credit card required.
python ingest.pyExpected output:
INFO - Building FAISS index from 8 documents...
INFO - Index saved to faiss_product_index ✅
✅ Done: 8 products indexed and ready.
streamlit run app.pyOpen http://localhost:8501 in your browser.
- Multimodal input — Upload product images (JPG, PNG, WEBP, AVIF auto-converted)
- Vision understanding — Llama 4 Scout extracts category, brand, color, features, use case
- Hybrid retrieval — FAISS semantic search + BM25 keyword search fused with RRF
- Conversational AI — Multi-turn chat with windowed conversation memory
- Price filtering — "under $300" auto-detected from natural language
- Reciprocal Rank Fusion (RRF) — Merges semantic and keyword rankings without score normalization
- Image description caching — MD5 hash-based cache avoids redundant Vision API calls
- Format normalization — AVIF, BMP, TIFF automatically converted to JPEG before processing
- Post-retrieval filtering — Metadata filters applied after embedding search
- Dark premium theme —
#0a0a0fbackground with purple/orange accent gradients - Product cards — Match score, price, category, and direct product links
- Suggestion chips — One-click common queries
- Comparison mode — Auto-detects "compare" intent and generates markdown tables
- Session management — Clear conversation without page reload
| Search Type | Strength | Weakness |
|---|---|---|
| Semantic (FAISS) | Understands meaning — "running shoes" matches "athletic footwear" | Misses exact brand names |
| Keyword (BM25) | Finds exact matches — "Nike Air Max 270" | No semantic understanding |
| Hybrid (RRF) | Best of both worlds | — |
RRF combines two ranked lists without requiring score normalization:
RRF_score(doc) = Σ 1 / (k + rank_i)
where k=60 is the standard constant from the original paper.
Query = image_description + user_question
↓
Retrieve top-K products from vector store
↓
Format as context: [Product name, specs, price, link]
↓
LLM generates answer grounded in retrieved context
All settings are in config.py, overridable via .env:
| Variable | Default | Description |
|---|---|---|
GROQ_API_KEY |
— | Required. Groq API key |
GROQ_MODEL |
llama-3.3-70b-versatile |
Chat LLM |
VISION_MODEL |
meta-llama/llama-4-scout-17b-16e-instruct |
Vision LLM |
EMBEDDING_MODEL |
sentence-transformers/all-MiniLM-L6-v2 |
Embedding model |
TOP_K_RESULTS |
5 |
Products to retrieve |
EMBEDDING_DEVICE |
cpu |
cpu or cuda |
LLM_TEMPERATURE |
0.3 |
Lower = more factual |
From a JSON file:
python ingest.py --source path/to/catalog.jsonJSON format:
[
{
"id": "P001",
"name": "Product Name",
"category": "Category",
"brand": "Brand",
"price": 199,
"specs": "Technical specifications...",
"description": "Product description...",
"url": "https://example.com/product"
}
]Upgrade embedding model for better accuracy:
# In config.py
EMBEDDING_MODEL = "BAAI/bge-large-en-v1.5" # 1024-dim, higher accuracy| Error | Fix |
|---|---|
GROQ_API_KEY not found |
Check .env file is in the rag_prod/ folder |
No FAISS index found |
Run python ingest.py before starting the app |
numpy.core.multiarray failed |
Run pip install "numpy<2" |
fbgemm.dll not found |
Install VC++ Redistributable |
proxies TypeError (groq) |
Run pip install groq==0.9.0 httpx==0.27.0 |
| Vision AVIF error | Run pip install pillow-avif-plugin |
| Port 8501 in use | Run streamlit run app.py --server.port 8502 |
Contributions are welcome! Fork this repository and submit a pull request.
This project is licensed under the MIT License — see the LICENSE file for details.............
