
How to tell if you're getting the real model

Insights · June 8, 2026 · 7 min read

When you call a large language model through someone else's endpoint — a gateway, an aggregator, a cheap relay — you're trusting that what comes back is the model you asked for, at full strength. Usually you can't see inside the box. And the parties most tempted to cut corners are the ones whose margins depend on it: a relay running on reverse-engineered endpoints and shared accounts has every incentive to quietly route you to something cheaper. The good news is you don't have to take anyone's word for it. Below are concrete, vendor-neutral checks you can run on any provider — including us.

First, the term

Model degradation (also called dilution) is when the model you actually receive is weaker than the one you asked for: a smaller or quantized model, a shrunken context window, or a version silently swapped underneath you.

Why degradation happens

A model call is opaque by design: you send text, you get text back. Whoever controls the endpoint controls what actually runs behind it. A provider on official channels has no reason to swap anything — it passes your call straight through. But a relay that bought access cheaply, or reverse-engineered an endpoint, profits from the gap between what you pay for and what it actually serves; the cheaper the headline price, the stronger that incentive.


Six checks you can run

None of these require special tooling. Pick two or three, automate them, and run them on a schedule — degradation is rarely a one-time event; it drifts with time and load.

01

Pin the model and version

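Read the model field the API returns, not just the one you sent. Does it match exactly, version and all? Watch for silent fallbacks to an older or smaller variant, especially under load. Treat this as a first check only: whoever controls the endpoint also controls what that field says, so pair it with the behavioral checks that follow.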
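A minimal Python sketch of this check, assuming an OpenAI-compatible endpoint whose parsed JSON response carries a top-level `model` string. The function name `check_model_field`, its `also_accept` parameter, and the model names in the example are hypothetical placeholders. `also_accept` is there because some APIs report a dated snapshot name when you request an alias; list only names you have confirmed against the official API.

```python
def check_model_field(requested: str, response: dict,
                      also_accept: frozenset[str] = frozenset()) -> list[str]:
    """Compare the requested model with the one the endpoint reports.

    Returns a list of problems; an empty list means the reported name matches.
    Assumes an OpenAI-style response dict with a top-level "model" key.
    """
    problems = []
    reported = response.get("model")
    if reported is None:
        problems.append("response has no 'model' field")
    elif reported != requested and reported not in also_accept:
        # A mismatch may be a silent fallback to another model or version.
        problems.append(f"requested {requested!r} but endpoint reported {reported!r}")
    return problems


# Hypothetical model names, for illustration only.
print(check_model_field("example-model-2026-01-01", {"model": "example-model-2026-01-01"}))  # → []
print(check_model_field("example-model-2026-01-01", {"model": "example-model-mini"}))  # one problem: a swapped variant
```

Run it on every response, or on a sample, and log any non-empty result rather than silently accepting it.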

02

Keep a capability probe

Hold a small set of tasks only the full model reliably passes — a tricky multi-step reasoning problem, a long instruction-following task, a hard edge case. Re-run them on a schedule; when pass rates slip, something changed.
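One way to automate the probe, sketched in Python. `call_model` is a hypothetical stand-in for whatever client function sends a prompt to your provider and returns the reply text. The single arithmetic probe only shows the shape; a real probe set should hold tasks that smaller or quantized models actually fail. The 0.1 tolerance is likewise an arbitrary starting point, not a recommended value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Probe:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # returns True if the reply is acceptable


# Placeholder probe; replace with tasks only the full model reliably passes.
PROBES = [
    Probe(
        name="multiply",
        prompt="What is 17 * 23? Reply with the number only.",
        passes=lambda reply: reply.strip() == "391",
    ),
]


def pass_rate(call_model: Callable[[str], str], probes: list[Probe], runs: int = 3) -> float:
    """Fraction of (probe, run) attempts whose reply passes its checker."""
    attempts = passed = 0
    for probe in probes:
        for _ in range(runs):  # repeat each probe to smooth out sampling noise
            attempts += 1
            if probe.passes(call_model(probe.prompt)):
                passed += 1
    return passed / attempts if attempts else 0.0


def slipped(current: float, baseline: float, tolerance: float = 0.1) -> bool:
    """True if the pass rate fell more than `tolerance` below the recorded baseline."""
    return current < baseline - tolerance
```

Record `pass_rate` once while you are confident you are on the full model, store it as the baseline, and alert whenever `slipped` returns True on a scheduled run.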

03

Test the full context window

Place a unique marker near the start of an input sized close to the advertised context limit, and end the prompt by asking the model to repeat the marker. Relays that truncate context to save cost will lose the marker; the full model finds it.
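A minimal Python sketch of this marker test, often called a needle-in-a-haystack test. `build_marker_prompt` and `found_marker` are hypothetical helpers. Filler length is counted in words as a rough proxy for tokens, so size `target_words` against the provider's advertised context window using that model's own tokenizer if you have it.

```python
import secrets


def build_marker_prompt(target_words: int,
                        filler: str = "The report was filed and the meeting ran long. ") -> tuple[str, str]:
    """Return (prompt, marker): marker at the start, filler in the middle, question at the end."""
    marker = f"MARKER-{secrets.token_hex(4)}"  # random, so it cannot be guessed or cached
    words = filler.split()
    # Repeat the filler until there are at least target_words words, then trim.
    body = (words * (target_words // len(words) + 1))[:target_words]
    prompt = (
        f"Remember this code: {marker}\n\n"
        + " ".join(body)
        + "\n\nWhat was the code you were asked to remember at the start? Reply with the code only."
    )
    return prompt, marker


def found_marker(reply: str, marker: str) -> bool:
    """A full-context model should echo the exact marker; a relay that drops the start cannot."""
    return marker in reply
```

Because the marker is random on every run, a correct answer cannot come from a cached response.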

04

Baseline against the official API

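Where you can, send the same prompts at temperature 0 to both your provider and the model's official API and compare the outputs. Temperature 0 reduces but does not eliminate run-to-run variation, so expect close rather than byte-identical matches and judge across many prompts. Persistent divergence is a signal that the model underneath isn't the one you think.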
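A rough Python sketch of this comparison. `call_provider` and `call_official` are hypothetical wrappers that send a prompt at temperature 0 and return the reply text. Character-level similarity from the standard library's `difflib` is a crude stand-in for "behavior matches", and the 0.8 threshold is an arbitrary assumption; for tasks with a single correct answer, comparing extracted answers may be more telling.

```python
from difflib import SequenceMatcher
from typing import Callable


def compare_to_official(prompts: list[str],
                        call_provider: Callable[[str], str],
                        call_official: Callable[[str], str],
                        threshold: float = 0.8) -> tuple[float, list[str]]:
    """Return (mean similarity, prompts whose outputs diverged below `threshold`)."""
    scores = []
    for prompt in prompts:
        ours = call_provider(prompt)
        theirs = call_official(prompt)
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they differ.
        scores.append((prompt, SequenceMatcher(None, ours, theirs).ratio()))
    divergent = [p for p, s in scores if s < threshold]
    mean = sum(s for _, s in scores) / len(scores) if scores else 0.0
    return mean, divergent
```

A single divergent prompt proves little given sampling noise; a mean that stays low across many prompts and many days is the persistent divergence this check looks for.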

05

Probe at peak hours

Shared-account setups degrade and rate-limit under load. Run your probe at quiet times and at peak: consistent quality when everyone else is busy is a good sign; quality that sags at 3pm is not.
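If each scheduled probe run is logged with a timestamp, a short Python sketch can show whether quality sags with load. The function names, the `(timestamp, passed)` record shape, and which hours count as peak are all assumptions; choose peak hours from when the provider is likely busiest, and keep every timestamp in one timezone.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import datetime


def pass_rate_by_hour(results: Iterable[tuple[datetime, bool]]) -> dict[int, float]:
    """Group (timestamp, passed) probe results by hour of day and return each hour's pass rate."""
    buckets = defaultdict(lambda: [0, 0])  # hour -> [passed, total]
    for timestamp, passed in results:
        bucket = buckets[timestamp.hour]
        bucket[0] += int(passed)
        bucket[1] += 1
    return {hour: p / t for hour, (p, t) in sorted(buckets.items())}


def peak_gap(rates: dict[int, float], peak_hours: list[int], quiet_hours: list[int]) -> float:
    """Mean quiet-hour pass rate minus mean peak-hour pass rate; a large positive gap is a warning sign."""
    def mean(hours: list[int]) -> float:
        values = [rates[h] for h in hours if h in rates]
        return sum(values) / len(values) if values else float("nan")
    return mean(quiet_hours) - mean(peak_hours)
```

With `peak_hours=[15]`, the 3pm slump described above would show up as a positive gap.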

06

Ask the hard questions

Official channel or reverse-engineered? A dedicated account or a shared pool? Is the model version pinned? Will "never captured, sold, or trained on" go into the contract? A provider serving the real thing welcomes every one of these.

What "real" looks like

Put together, the checks describe the kind of provider that passes them by construction:

Solunar Gateway

We built Solunar Gateway to pass exactly these checks, and we'd rather you verify than trust. It runs on official channels, pins the model versions you select, serves from independent nodes, keeps your data yours, and is operated by Solunar AI Inc., an incorporated company in British Columbia, Canada, that you can look up. Run the probes on us. Access is invite-only.

FAQ

Can a relay really swap models without me knowing?
Yes. If a provider controls the endpoint, it controls what actually runs and what the response reports. That's exactly why verification beats trust — the checks above surface a swap that the response text alone would hide.
What's the single most useful check?
A capability probe. Keep a handful of hard tasks only the full model passes and re-run them on a schedule. It's cheap, automatable, and a drop in pass rate is the clearest early warning of degradation.
Does going through a gateway mean I can't verify the model?
No — the opposite, if the gateway is honest. A real gateway uses official channels and pins versions, so your checks pass and you can even baseline against the official API. The problem isn't gateways; it's relays that hide what they serve.
Isn't this overkill?
For mission-critical work, it's just good engineering. A silent downgrade ships wrong answers to your users and you hear about it from them. A periodic probe costs a few calls a day and catches it first.
