How to tell if you're getting the real model
When you call a large language model through someone else's endpoint — a gateway, an aggregator, a cheap relay — you're trusting that what comes back is the model you asked for, at full strength. Usually you can't see inside the box. And the parties most tempted to cut corners are the ones whose margins depend on it: a relay running on reverse-engineered endpoints and shared accounts has every incentive to quietly route you to something cheaper. The good news is you don't have to take anyone's word for it. Below are concrete, vendor-neutral checks you can run on any provider — including us.
Model degradation (also called dilution) is when the model you actually receive is weaker than the one you asked for: a smaller or quantized model, a shrunken context window, or a version silently swapped underneath you.
Why degradation happens
A model call is opaque by design: you send text, you get text back. Whoever controls the endpoint controls what actually runs behind it. A provider on official channels has no reason to swap anything — it passes your call straight through. But a relay that bought access cheaply, or reverse-engineered an endpoint, profits from the gap between what you pay for and what it actually serves; the cheaper the headline price, the stronger that incentive.
Six checks you can run
None of these require special tooling. Pick two or three, automate them, and run them on a schedule — degradation is rarely a one-time event; it drifts with time and load.
Pin the model and version
Read the model field the API returns, not just the one you sent. Does it match exactly, version and all? Watch for silent fallbacks to an older or smaller variant, especially under load.
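This check is easy to automate. Below is a minimal Python sketch. It assumes the parsed JSON response has a top-level `model` field, as OpenAI- and Anthropic-style responses do, so confirm your provider's actual schema. The model names are hypothetical. Request a dated version rather than a floating alias so an exact match is meaningful, because an alias may legitimately come back as a specific snapshot name.

```python
def check_model_field(requested: str, response: dict) -> list[str]:
    """Compare the model you asked for with the model the response reports.

    Assumes the parsed JSON response has a top-level "model" key.
    Returns a list of problems; an empty list means an exact match.
    """
    problems: list[str] = []
    returned = response.get("model")
    if returned is None:
        problems.append("response has no 'model' field")
    elif returned != requested:
        problems.append(f"requested {requested!r} but response reports {returned!r}")
    return problems


# Usage with hypothetical model names and a hand-written response body:
print(check_model_field("example-model-2024-06-01", {"model": "example-model-mini"}))
```

Log every mismatch together with a timestamp. A fallback that only appears under load will show up as a cluster.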
Keep a capability probe
Hold a small set of tasks only the full model reliably passes — a tricky multi-step reasoning problem, a long instruction-following task, a hard edge case. Re-run them on a schedule; when pass rates slip, something changed.
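Here is one way to automate this in Python. `call_model` is a hypothetical function you supply: it sends a prompt to your provider and returns the answer text. Each probe carries its own grader, because "passes" means something different for a reasoning puzzle than for an instruction-following task. The 0.1 tolerance is an arbitrary starting point, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Probe:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # grader: True if the answer is acceptable


def pass_rate(probes: list[Probe], call_model: Callable[[str], str], runs: int = 3) -> float:
    """Fraction of (probe, run) attempts whose answer the grader accepts."""
    passed = total = 0
    for probe in probes:
        for _ in range(runs):  # repeat each probe to smooth out sampling noise
            total += 1
            if probe.passes(call_model(probe.prompt)):
                passed += 1
    return passed / total if total else 0.0


def has_slipped(current: float, baseline: float, tolerance: float = 0.1) -> bool:
    """True when the pass rate has dropped more than `tolerance` below baseline."""
    return current < baseline - tolerance


# Usage with a stand-in model; replace fake_model with a real API call.
def fake_model(prompt: str) -> str:
    return "The answer is 42."


probes = [Probe("arithmetic", "What is 6 * 7?", lambda answer: "42" in answer)]
baseline = pass_rate(probes, fake_model)
```

Record the pass rate from a known-good run as your baseline, then compare each scheduled run against it.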
Test the full context window
Place a unique marker at the very start of an input sized close to the advertised context limit, and put the question asking for it at the very end. Relays that truncate context to save cost will lose the marker; the full model finds it.
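Below is a Python sketch for building such a probe. The size is a rough character budget. Converting from the advertised token limit requires your model's tokenizer or a rule of thumb; about four characters per token is a common estimate for English text, but it varies. The 128,000-token limit, the prompt wording, and the marker format are all illustrative assumptions. If truncation drops the beginning, the marker is lost. If it drops the end, the question is lost. Either way the answer will not contain the marker.

```python
import secrets


def build_context_probe(approx_chars: int) -> tuple[str, str]:
    """Return (prompt, marker): marker at the start, question at the end.

    approx_chars is a rough character budget for the whole prompt.
    """
    marker = "MARKER-" + secrets.token_hex(8)  # random, so it cannot be guessed
    head = f"Remember this code: {marker}\n\n"
    question = ("\n\nWhat was the code given at the very start of this text? "
                "Reply with the code only.")
    filler = "This sentence is filler and contains no code.\n"
    # Fill the middle with as many whole filler lines as fit in the budget.
    n_lines = max(0, (approx_chars - len(head) - len(question)) // len(filler))
    return head + filler * n_lines + question, marker


def context_intact(answer: str, marker: str) -> bool:
    """True if the model recalled the marker from the start of the prompt."""
    return marker in answer


# Usage: hypothetical 128k-token limit, ~4 chars/token assumed, and 90% of
# the window used to leave room for the answer.
ADVERTISED_TOKENS = 128_000
prompt, marker = build_context_probe(int(ADVERTISED_TOKENS * 4 * 0.9))
```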
Baseline against the official API
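Where you can, send the same prompt at temperature 0 to both your provider and the model's official API. Answers should be close, though not necessarily identical: temperature 0 reduces sampling variation, but most APIs do not guarantee fully deterministic output. Persistent divergence across many prompts is a signal the model underneath isn't the one you think.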
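Here is a rough way to quantify "persistent divergence", sketched in Python with the standard library's `difflib`. Character-level similarity is a crude proxy, since two correct answers can be worded differently. The 0.8 threshold is an illustrative assumption. Collecting the paired answers from each API is left to you.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def divergence_report(
    pairs: list[tuple[str, str, str]], threshold: float = 0.8
) -> tuple[float, list[str]]:
    """pairs holds (prompt, provider_answer, official_answer) triples.

    Returns the fraction of prompts whose answers fall below `threshold`
    similarity, plus the diverging prompts for manual review.
    """
    diverging = [
        prompt for prompt, ours, official in pairs
        if similarity(ours, official) < threshold
    ]
    rate = len(diverging) / len(pairs) if pairs else 0.0
    return rate, diverging
```

Run it over the same prompt set each time. A single low score proves little, but a divergence rate that stays high run after run is the signal this check is looking for.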
Probe at peak hours
Shared-account setups degrade and rate-limit under load. Run your probe at quiet times and at peak: consistent quality when everyone else is busy is a good sign; quality that sags at 3pm is not.
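Below is a Python sketch that splits scheduled probe results into peak and quiet windows. The 09:00 to 17:59 peak window is an assumption. Set it to the hours when your provider's other customers are likely busiest, and keep all timestamps in a single time zone.

```python
from datetime import datetime
from typing import Optional


def pass_rate_by_window(
    results: list[tuple[datetime, bool]],
    peak_hours: range = range(9, 18),  # assumed peak: 09:00-17:59 local time
) -> dict[str, Optional[float]]:
    """Pass rate for peak vs. quiet hours; None if a window has no runs."""
    counts = {"peak": [0, 0], "quiet": [0, 0]}  # window -> [passed, total]
    for when, passed in results:
        window = "peak" if when.hour in peak_hours else "quiet"
        counts[window][0] += int(passed)
        counts[window][1] += 1
    return {w: (p / t if t else None) for w, (p, t) in counts.items()}
```

Suppose the quiet-hours rate holds steady while the peak rate drops. That points to load-dependent degradation rather than a one-off bad run.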
Ask the hard questions
Official channel or reverse-engineered? A dedicated account or a shared pool? Is the model version pinned? Will "never captured, sold, or trained on" go into the contract? A provider serving the real thing welcomes every one of these.
What "real" looks like
Put together, the checks describe the kind of provider that passes them by construction:
- Official channels, not reverse-engineered — your call goes straight to the model, no swaps.
- Pinned versions — you get the exact model you selected, not a silent substitute.
- Independent nodes, not a shared account pool — capacity that doesn't sag when the crowd arrives.
- Data boundaries in writing — never captured, sold, or trained on, in the contract.
- An accountable, incorporated entity — someone real to verify, and to reach when it matters.
Solunar Gateway
We built Solunar Gateway to pass exactly these checks — and we'd rather you verify than trust. It runs on official channels, pins the model versions you select, serves from independent nodes, keeps your data yours, and is operated by Solunar AI Inc., an incorporated company in British Columbia, Canada that you can look up. Run the probes on us. Access is invite-only.