Build RAG pipelines with embeddings + LLM through one key
Embed, retrieve, generate. JJAPI gives you OpenAI / Voyage / Cohere embeddings and any LLM behind a single endpoint.
Recommended models
Best general-purpose embeddings: 3072 dimensions, top-tier accuracy.
Newer model with better multilingual + domain-specific retrieval.
Excellent at synthesizing retrieved chunks with long context.
Click any model to see its current ID and release date in our live catalog.
Embeddings + generation in one bill
Traditional RAG forces you to manage OpenAI for embeddings and a separate vendor for generation. JJAPI handles both: your bill, your audit log, and your rate limits are unified.
Mix embedding models per index
Use text-embedding-3-large for English docs, voyage-3-multilingual for non-English docs, and cohere-embed-multilingual-v3 for cross-lingual retrieval. All three go through the same endpoint; only the model parameter changes.
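One way to keep per-index models straight is a small routing table, sketched below. The index names (docs_en, docs_non_en, docs_cross_lingual) and both helper functions are hypothetical; only the three model IDs come from the text above. The `client` argument is assumed to be an OpenAI-compatible client like the one in the pipeline example on this page. Vectors from different models are not comparable, so a query has to be embedded with the same model that built the index it searches.

```python
from typing import List, Sequence

# Hypothetical index names mapped to the model IDs named above.
INDEX_MODELS = {
    "docs_en": "text-embedding-3-large",
    "docs_non_en": "voyage-3-multilingual",
    "docs_cross_lingual": "cohere-embed-multilingual-v3",
}


def model_for_index(index_name: str) -> str:
    """Return the embedding model configured for an index."""
    try:
        return INDEX_MODELS[index_name]
    except KeyError:
        raise ValueError(
            f"no embedding model configured for index {index_name!r}"
        ) from None


def embed_for_index(client, index_name: str, texts: Sequence[str]) -> List[List[float]]:
    """Embed texts with the model that belongs to the given index.

    `client` is assumed to expose the OpenAI-style embeddings.create() call.
    """
    resp = client.embeddings.create(
        model=model_for_index(index_name),
        input=list(texts),
    )
    return [item.embedding for item in resp.data]
```

Use the same helper at indexing time and at query time, for example embed_for_index(client, "docs_en", [question])[0], so the two sides never drift onto different models.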
Reranking for better recall
After vector search, rerank the top 50 candidates with /v1/rerank using a Cohere or Voyage reranker. This typically improves recall@5 by 15-30%.
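The rerank step could look roughly like the sketch below. The request and response shapes are assumptions modeled on Cohere-style rerank APIs (a body with model, query, documents, and top_n; a response with a results list of index and relevance_score entries), not a confirmed JJAPI schema, so check the live API reference before relying on them. The function names are hypothetical, and the reranker model ID is left as a required argument because the text above does not name one.

```python
import json
import urllib.request

# Base URL taken from the pipeline example on this page, plus the /v1/rerank path.
RERANK_URL = "https://api.jjapi.net/v1/rerank"


def build_rerank_request(api_key, model, query, documents, top_n=5):
    """Build the POST request. The body shape is an assumption (Cohere-style)."""
    body = json.dumps(
        {"model": model, "query": query, "documents": documents, "top_n": top_n}
    ).encode("utf-8")
    return urllib.request.Request(
        RERANK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def order_by_relevance(documents, results, top_n=5):
    """Map assumed {"index", "relevance_score"} results back to document texts."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked[:top_n]]


def rerank(api_key, model, query, documents, top_n=5):
    """Send the candidates to the reranker and return the best top_n texts."""
    req = build_rerank_request(api_key, model, query, documents, top_n)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return order_by_relevance(documents, payload["results"], top_n)
```

In a pipeline, retrieve around 50 candidates from the vector store, pass their texts to rerank(), and hand only the returned top 5 to the generation step.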
RAG pipeline minimal example
from openai import OpenAI

client = OpenAI(base_url="https://api.jjapi.net/v1", api_key="sk-jjapi-...")
question = "What's the refund policy?"

# 1. Embed the user query
q_vec = client.embeddings.create(
    model="text-embedding-3-large",
    input=question,
).data[0].embedding

# 2. Retrieve the top-5 chunks. `vector_db` is a placeholder for your own
#    vector store client; search() is assumed to return a list of text chunks.
chunks = vector_db.search(q_vec, k=5)
context = "\n\n".join(chunks)

# 3. Generate an answer grounded in the retrieved context
resp = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)
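The example leaves vector_db undefined because the vector store is yours to choose. For local experiments, a tiny in-memory stand-in with the same search(q_vec, k) call can be sketched as below. The class name and its add() method are hypothetical, it assumes search() should return chunk texts, and it does a brute-force cosine-similarity scan that is only suitable for small test sets.

```python
import math


class InMemoryVectorStore:
    """Brute-force cosine-similarity store for small experiments."""

    def __init__(self):
        self._items = []  # list of (unit-length vector, chunk text)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot use a zero vector")
        return [x / norm for x in vec]

    def add(self, vec, text):
        """Store a chunk together with its embedding."""
        self._items.append((self._normalize(vec), text))

    def search(self, query_vec, k=5):
        """Return the texts of the k chunks most similar to query_vec."""
        q = self._normalize(query_vec)
        scored = []
        for vec, text in self._items:
            if len(vec) != len(q):
                raise ValueError("query and index dimensions differ")
            # Dot product of unit vectors equals cosine similarity.
            scored.append((sum(a * b for a, b in zip(q, vec)), text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]
```

Create vector_db = InMemoryVectorStore(), call vector_db.add(embedding, chunk_text) for each document chunk, and the search call in step 2 works unchanged.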