Developer Guide

Getting started with Olive

From zero to your first inference job in minutes. Create an API key, call the Python SDK, and understand how Olive separates the people who own an account from the developers who build on it.

Step 1

Get an API key

An API key is how your code proves who it is. Every key belongs to an Olive account and inherits that account's models, quotas, and billing.

1. Sign in to the customer portal

Open the Olive customer portal and sign in. If this is your organization’s first time, the person who signs up becomes the account Admin; see Admins & developers for what that means.

2. Create a key in the API Keys tab

Go to API Keys → Create API key, give it a label you’ll recognize later (e.g. publishing-pipeline), and copy the value. Keys begin with olv_ and are shown only once. Olive stores a hash, never the key itself, so if you lose a key, revoke it and create a new one.

3. Store it as an environment variable

Treat the key like a password. Keep it out of source control and inject it through the environment or your secrets manager:

shell
# Never hard-code keys. Keep them in the environment.
export OLIVE_API_KEY="olv_your_key_here"
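If you want your application to fail fast when the key is missing or malformed, a small startup check can help. The sketch below is only an illustration: load_api_key is a hypothetical helper, not part of the SDK, and it relies on nothing beyond the OLIVE_API_KEY variable and the olv_ prefix described above.

```python
import os


def load_api_key(environ=None):
    """Return the Olive API key from the environment, or raise ValueError.

    Hypothetical helper for illustration; it is not part of the Olive SDK.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("OLIVE_API_KEY", "").strip()
    if not key:
        raise ValueError("OLIVE_API_KEY is not set")
    if not key.startswith("olv_"):
        raise ValueError("OLIVE_API_KEY does not start with the olv_ prefix")
    return key


# Example with an explicit mapping, so the snippet runs without a real key.
print(load_api_key({"OLIVE_API_KEY": "olv_example"}))  # olv_example
```

Passing the mapping as an argument keeps the check easy to unit-test; in production you would call load_api_key() with no arguments.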

Step 2

Install the SDK

The Olive Python SDK wraps the REST API with typed clients, retries, and helpful errors. It needs Python 3.9 or newer.

shell
pip install olive-compute
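If you are unsure which interpreter your environment uses, you can check it against the 3.9 minimum before installing. This is a plain standard-library sketch; meets_minimum_python is a hypothetical helper name.

```python
import sys


def meets_minimum_python(version_info=None, minimum=(3, 9)):
    """Return True when the interpreter is at least `minimum` (3.9 per this guide)."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= minimum


# Checks the interpreter that is running this snippet.
print(meets_minimum_python())
```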

Step 3

Run your first inference job

One import, one client, one call. Pass the key from the environment variable you set above; the call blocks until the network returns a result.

first_job.py
import os
from olive import OliveClient

client = OliveClient(api_key=os.environ["OLIVE_API_KEY"])

# Run a single inference call against the default chat model.
# The call blocks until the job completes on the network.
reply = client.inference(
    "Summarize the benefits of print-on-demand for a small publisher.",
    max_tokens=256,
)
print(reply)

That’s a complete job: the SDK submits it to the Olive network, a provider device runs the model, and the generated text comes back as a string. No servers to manage, no cloud capacity to provision.

inference() blocks until that string comes back. For longer work, or if you don’t want your process blocked while it waits, submit the job and poll separately; see Going further below.

Step 4

Going further

Pin a model, generate embeddings, run long jobs asynchronously, and handle failures cleanly.

Pin a model & pick a compute tier

Omit model= to use the default, or pin any catalog model. The compute tier controls the hardware your job lands on.

pin_model.py
# Browse the catalog and pin a specific model.
for m in client.list_models(modality="chat"):
    print(m["id"], "·", m["pricing"]["input_per_1m_tokens_usd"], "USD / 1M tokens")

reply = client.inference(
    "Draft a back-cover blurb for a regional cookbook.",
    model="meta/llama-3.2-3b-instruct",
    compute="medium",     # light · medium · heavy
    temperature=0.4,
)
print(reply)

Tier     Resources        Best for
light    1 core · 2 GB    Embeddings, short inputs
medium   2 cores · 4 GB   Standard inference (default)
heavy    4 cores · 8 GB   Long context, large batches
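If you choose tiers programmatically, one option is to encode the tier table as data and pick the smallest tier that fits. The sketch below is an assumption-laden illustration: pick_tier and the TIERS mapping are hypothetical names, not SDK features, and how you estimate a job's core and memory needs is up to you. Only the tier names and resource figures come from the table above.

```python
# Tier specs copied from the table above, ordered smallest to largest.
TIERS = {
    "light": {"cores": 1, "memory_gb": 2},
    "medium": {"cores": 2, "memory_gb": 4},
    "heavy": {"cores": 4, "memory_gb": 8},
}


def pick_tier(min_cores=1, min_memory_gb=0):
    """Return the name of the smallest tier that satisfies both requirements."""
    for name, spec in TIERS.items():  # dicts preserve insertion order
        if spec["cores"] >= min_cores and spec["memory_gb"] >= min_memory_gb:
            return name
    raise ValueError("no tier is large enough for the requested resources")


print(pick_tier(min_memory_gb=3))  # medium
```

The returned string can be passed as the compute= argument shown in pin_model.py.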

Generate embeddings

Turn text into vectors for search, clustering, or deduplication across a catalog.

embeddings.py
# Embeddings for search, clustering, or dedup across a catalog.
vectors = client.embeddings(
    ["The Hudson Valley Baker", "Seasonal Preserves & Pickles"],
    model="baai/bge-small-en-v1.5",
)
print(len(vectors), "vectors ·", len(vectors[0]), "dims")
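Once you have vectors, deduplication usually comes down to comparing them pairwise. The following is a minimal standard-library sketch, assuming each vector is a sequence of floats; the helper names and the 0.95 threshold are illustrative choices, not Olive recommendations, and toy vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicates(vectors, threshold=0.95):
    """Return index pairs (i, j), i < j, whose similarity is at least `threshold`."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy 3-dimensional vectors stand in for real embeddings.
toy = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(near_duplicates(toy))  # [(0, 1)]
```

The pairwise loop is quadratic in the number of vectors, which is fine for a small catalog; larger collections generally call for a vector index instead.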

Run long jobs asynchronously

submit_job() returns a handle immediately so your process keeps moving; call job.wait() when you need the result.

async_job.py
import json

# For long-running work, submit and poll separately so your
# process isn't blocked while the job runs on the network.
# input_data is a JSON-encoded string — same shape inference() builds for you.
job = client.submit_job(
    workload_type="inference",
    input_data=json.dumps({"prompt": "Write a 400-word author bio.", "max_tokens": 600}),
    model="meta/llama-3.2-3b-instruct",
    compute="heavy",
)
print(job.id, job.status)          # e3b2a1c0-... running

result = job.wait(timeout=300)     # blocks until done, or raises JobError on failure/timeout
output = json.loads(result["output_data"])
print(output["text"])              # same "text" field inference() unwraps for you
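To make the submit-and-poll pattern concrete, here is a generic polling loop with a timeout. It is a sketch of the idea, not the SDK's implementation of job.wait(): poll_until_done is a hypothetical helper, the terminal status names "completed" and "failed" are assumptions (only "running" appears in this guide), and it raises the built-in TimeoutError rather than JobError. A fake status source stands in for a real job.

```python
import time


def poll_until_done(get_status, timeout=300.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or `timeout` elapses.

    The terminal status strings below are assumptions for illustration.
    """
    terminal = {"completed", "failed"}
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)


# Fake job that reports "running" twice, then "completed".
states = iter(["running", "running", "completed"])
print(poll_until_done(lambda: next(states), interval=0.0))  # completed
```

Using a monotonic clock for the deadline keeps the timeout correct even if the system clock is adjusted while the job runs.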

Handle errors

The SDK raises typed exceptions so you can separate an auth problem from a rate limit from a failed job. It retries transient network and server errors for you.

errors.py
import os
from olive import OliveClient, AuthError, JobError, RateLimitError

try:
    client = OliveClient(api_key=os.environ["OLIVE_API_KEY"])
    reply = client.inference("Hello, Olive.")
except AuthError:
    # Bad or revoked key — mint a new one in the portal.
    print("Check OLIVE_API_KEY")
except RateLimitError as e:
    print(f"Slow down — retry after {e.retry_after}s")
except JobError as e:
    # The job ran but failed or timed out on the network.
    print(f"Job failed: {e}")
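Instead of only printing the retry_after value, you may prefer to wait and try again. The sketch below shows one way to do that under stated assumptions: it defines a local stand-in RateLimitError so it runs without the SDK, call_with_rate_limit_retry is a hypothetical helper, and this guide does not say whether the SDK already retries rate limits for you. The only detail taken from the example above is the retry_after attribute, assumed to be in seconds.

```python
import time


class RateLimitError(Exception):
    """Local stand-in for the SDK exception so this sketch runs without the SDK."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(fn, max_attempts=3, sleep=time.sleep):
    """Call fn(); on RateLimitError wait e.retry_after seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(e.retry_after)


# Fake call that is rate limited once, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimitError(retry_after=0)
    return "ok"


print(call_with_rate_limit_retry(flaky))  # ok
```

With the real SDK you would import RateLimitError from olive instead of defining it, and pass something like lambda: client.inference("Hello, Olive.") as fn.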

Step 5

Admins & developers

Olive separates two responsibilities on every account. Understanding them now means your setup scales cleanly as your team grows.

Admin

Owns the account. Manages billing, controls which models and compute tiers are enabled, and issues or revokes API keys.

  • Billing & payment method
  • Create / revoke API keys
  • Account-wide settings

Developer

Builds on the account. Uses an API key to run inference, embeddings, and jobs against the enabled models — without touching billing or account settings.

  • Run jobs via the SDK / API
  • Read the model catalog
  • Inspect their own job history

During private beta

Each account has one user login, so the Admin and the Developer are usually the same person — you. The distinction still matters: an API key is a developer credential. Issue one key per application or environment (e.g. staging vs prod), label them clearly, and revoke a key the moment it’s no longer needed. That habit is exactly what the multi-developer model below builds on.
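One way to keep per-environment keys apart in code is to map each environment to its own variable. This is a sketch under assumptions: the variable names OLIVE_API_KEY_STAGING and OLIVE_API_KEY_PROD and the key_for helper are hypothetical conventions, not something Olive defines.

```python
import os

# Hypothetical variable names: one key per environment.
KEY_VARS = {
    "staging": "OLIVE_API_KEY_STAGING",
    "prod": "OLIVE_API_KEY_PROD",
}


def key_for(environment, environ=None):
    """Return the API key for `environment`, or raise if it is unknown or unset."""
    environ = os.environ if environ is None else environ
    try:
        var = KEY_VARS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
    key = environ.get(var)
    if not key:
        raise ValueError(f"{var} is not set")
    return key


# Example with an explicit mapping, so the snippet runs without real keys.
fake_env = {"OLIVE_API_KEY_STAGING": "olv_staging_example"}
print(key_for("staging", fake_env))  # olv_staging_example
```

Because each environment reads a different variable, revoking the staging key never touches production.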

Step 6

What's coming next

These features ship after private beta. They're documented here so you can design your integration around them today.

On the roadmap · Multi-developer support with role-based access

Invite teammates to a single account with named seats. Each developer gets their own login and their own keys, the Admin assigns roles (e.g. Admin, Developer, read-only Viewer), and every job is attributable to the developer who ran it. The Admin/Developer split you use today is the foundation — nothing in your integration needs to change.

On the roadmap · Spend caps at the developer and account level

Set a hard ceiling on spend for the whole account, and per-developer sub-limits within it. A runaway script hits its cap and stops instead of surprising you on the invoice, while other developers keep working. Caps will be configurable in the portal and readable via the API so you can wire them into your own dashboards.
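Until server-side caps ship, a client-side guard can approximate the same behaviour. The sketch below is purely illustrative and says nothing about how Olive will implement caps: SpendGuard and BudgetExceeded are hypothetical names, the estimate counts input tokens only (the one price field this guide shows, input_per_1m_tokens_usd), and the 0.25 USD price is a made-up example value.

```python
class BudgetExceeded(Exception):
    """Raised when the next job would push estimated spend past the cap."""


class SpendGuard:
    """Tracks estimated spend locally and refuses work that would exceed a cap."""

    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens, price_per_1m_tokens_usd):
        """Record the estimated cost of a job, or raise BudgetExceeded."""
        cost = input_tokens * price_per_1m_tokens_usd / 1_000_000
        if self.spent_usd + cost > self.cap_usd:
            raise BudgetExceeded(
                f"cap {self.cap_usd:.2f} USD reached (spent {self.spent_usd:.2f})"
            )
        self.spent_usd += cost
        return cost


guard = SpendGuard(cap_usd=1.00)
print(guard.charge(input_tokens=2_000_000, price_per_1m_tokens_usd=0.25))  # 0.5
```

Call charge() before submitting each job; a runaway loop then stops at the exception instead of continuing to submit work. The tally is an estimate kept in your process, so treat the invoice as the source of truth.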

Building against Olive during private beta and want early access to these? Mention it when you reach out — we’re prioritizing based on what customers actually need.

Ready to run your first job?

Create a key, pip install olive-compute, and make your first call.
