📚

Build RAG pipelines with embeddings + LLM through one key

Embed, retrieve, generate. JJAPI gives you OpenAI / Voyage / Cohere embeddings and any LLM behind a single endpoint.

Start building → View docs

Recommended models

text-embedding-3-large →

Best general-purpose embeddings — 3072 dims, top-tier accuracy.

Voyage 3 family

Newer models with stronger multilingual and domain-specific retrieval.

Claude Sonnet (latest) →

Excellent at synthesizing retrieved chunks, with a long context window.

Click any model to see its current ID and release date in our live catalog.

Embeddings + generation in one bill

A traditional RAG stack forces you to manage one vendor (such as OpenAI) for embeddings and a separate vendor for generation. JJAPI handles both, so your bill, your audit log, and your rate limits are unified.

Mix embedding models per index

Use text-embedding-3-large for English docs, voyage-3-multilingual for non-English docs, and cohere-embed-multilingual-v3 for cross-lingual retrieval. All three go through the same endpoint; you only change the model parameter.
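As a rough sketch of per-index model selection, the helper below maps an index's document language to one of the model IDs named above. The function name `pick_embedding_model`, the language codes, and the routing rule are illustrative assumptions, not part of the JJAPI API; check the live catalog for current IDs.

```python
# Hypothetical routing helper. The model IDs come from the text above;
# the language codes and the routing rule are assumptions for illustration.
ENGLISH_MODEL = "text-embedding-3-large"
NON_ENGLISH_MODEL = "voyage-3-multilingual"
CROSS_LINGUAL_MODEL = "cohere-embed-multilingual-v3"


def pick_embedding_model(doc_lang: str, query_langs: set[str]) -> str:
    """Choose one embedding model for an index.

    doc_lang: ISO 639-1 code of the documents stored in the index.
    query_langs: languages you expect queries to arrive in.
    """
    # Queries in a language other than the documents' -> cross-lingual model.
    if query_langs - {doc_lang}:
        return CROSS_LINGUAL_MODEL
    return ENGLISH_MODEL if doc_lang == "en" else NON_ENGLISH_MODEL
```

You would then pass the result as the `model` argument, for example `client.embeddings.create(model=pick_embedding_model("de", {"de"}), input=text)`. Whichever model you pick, embed queries with the same model that built the index, since vectors from different models are not comparable.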

Reranking for better recall

After vector search, rerank the top 50 candidates with /v1/rerank using Cohere or Voyage rerankers. This typically improves recall@5 by 15-30%.
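To make the rerank step concrete, here is a minimal sketch. The request and response shapes are assumptions modelled on Cohere's rerank API (`query`, `documents`, `top_n` in; a `results` list of `index` / `relevance_score` out), and Bearer-token auth is assumed as well. The actual `/v1/rerank` schema may differ, and `rerank-model-id` is a placeholder, not a real model ID.

```python
import json
import urllib.request


def build_rerank_request(query, documents, top_n=5, model="rerank-model-id"):
    # Assumed Cohere-style payload; verify field names against the JJAPI docs.
    return {"model": model, "query": query, "documents": documents, "top_n": top_n}


def apply_rerank(documents, results):
    # Reorder documents by descending relevance score.
    # Each result is assumed to look like {"index": int, "relevance_score": float}.
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ordered]


def rerank(query, documents, api_key, top_n=5):
    # POST the candidates to the rerank endpoint and return the reordered top_n.
    req = urllib.request.Request(
        "https://api.jjapi.net/v1/rerank",
        data=json.dumps(build_rerank_request(query, documents, top_n)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return apply_rerank(documents, body["results"])
```

In a pipeline you would fetch around 50 candidates from the vector store, call `rerank(question, candidates, api_key, top_n=5)`, and pass only the returned chunks to the LLM.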

Minimal RAG pipeline example


from openai import OpenAI

client = OpenAI(base_url="https://api.jjapi.net/v1", api_key="sk-jjapi-...")

question = "What's the refund policy?"

# 1. Embed the user query
q_vec = client.embeddings.create(
    model="text-embedding-3-large",
    input=question,
).data[0].embedding

# 2. Retrieve the top-5 chunks. `vector_db` is a placeholder for your own
#    vector store client; search() is assumed to return a list of strings.
chunks = vector_db.search(q_vec, k=5)
context = "\n\n".join(chunks)

# 3. Generate an answer grounded in the retrieved context
resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)
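
The `vector_db` object in step 2 stands in for whatever vector store you use. For local experiments, a tiny in-memory stand-in might look like the sketch below. The class name `InMemoryVectorDB` and its `add` / `search` methods are hypothetical and simply mirror the `search(q_vec, k=5)` call in the example.

```python
import math


class InMemoryVectorDB:
    """Brute-force cosine-similarity search over (vector, text) pairs."""

    def __init__(self):
        self._items = []  # list of (vector, text) tuples

    def add(self, vector, text):
        self._items.append((list(vector), text))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vec, k=5):
        # Score every stored chunk against the query and return the k best texts.
        scored = sorted(
            self._items,
            key=lambda item: self._cosine(query_vec, item[0]),
            reverse=True,
        )
        return [text for _, text in scored[:k]]
```

Populate it by embedding each chunk with the same model you use for queries and calling `add(embedding, chunk_text)`. Brute-force search is fine for a few thousand chunks, but it is not a substitute for a real vector index at scale.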
