Where your data goes when you call an LLM

Insights · June 8, 2026 · 6 min read

Every LLM call ships your text to someone else's machine. The questions that decide whether that's safe are simple to ask and easy to skip: who sees it, where is it kept, for how long, and — the one that matters most — is it ever used to train a model or passed to a third party? For sensitive work, "it depends on the provider" isn't an answer. You want the answers in writing.

First, the term

A data boundary is the set of rules and controls that decide who can see, store, reuse, or train on the text you send to and receive from a model — your prompts, your context, your outputs.

The path your data takes

A single call makes more hops than it looks. Your application sends the text to an access layer (a gateway or a relay), which forwards it to the model provider; the provider runs it and returns a response along the same path. Every hop is a place where text can be logged, retained, or, with the wrong intermediary, copied. The fewer those hops are, and the more accountable each one is, the smaller your exposure. A reverse-engineered relay quietly adds an extra, untrusted hop you didn't choose.
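One way to make the path concrete is to write it down as data and ask which hops can hold your text and which have nobody accountable behind them. The sketch below is illustrative only: the `Hop` type, the `exposure` helper, and the two example paths are hypothetical, and the `accountable` and `may_log` values are assumptions chosen for the example, not statements about any real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str          # who operates this hop
    accountable: bool  # assumption: a named entity and agreement stand behind it
    may_log: bool      # assumption: this hop is able to log or retain the text


def exposure(path):
    """Summarize where text on this path could be kept, and by whom."""
    return {
        "hops": len(path),
        "can_retain": [h.name for h in path if h.may_log],
        "unaccountable": [h.name for h in path if not h.accountable],
    }


# Hypothetical paths, listing the hops after your own application.
official_path = [
    Hop("official-channel gateway", accountable=True, may_log=False),
    Hop("model provider", accountable=True, may_log=True),
]
relay_path = [
    Hop("reverse-engineered relay", accountable=False, may_log=True),
    Hop("model provider", accountable=True, may_log=True),
]

official_summary = exposure(official_path)
relay_summary = exposure(relay_path)
```

In this toy model both paths have two hops, but only the relay path reports an unaccountable one, which is the difference the label "gateway" or "relay" alone would not tell you.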

The four questions to ask

Get clear answers to these four, and a data boundary goes from "probably fine" to "spelled out":

Is it logged or retained?

Ask what's stored, where, and for how long. "We keep request logs indefinitely" is a very different answer from "we retain nothing beyond what's needed to serve the call."

Is it used for training?

The line that matters most. Is your text — prompts, context, outputs — ever used to train or improve any model, the provider's or anyone's? For proprietary or regulated data, the only safe answer is no.

Is it sold or shared?

Is any of it passed to third parties, brokers, or analytics partners? Your prompts can carry trade secrets, customer data, unreleased work — none of which should become someone else's dataset.

Who is accountable?

Is there a real, named entity behind the service, and does the data policy live in a contract — not just a webpage that can change overnight? Accountability is what makes the other three answers worth anything.
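The four questions can be tracked as a small record per vendor, with a note on where each answer actually lives. This is a minimal sketch under stated assumptions: the `Source` enum, the `Answer` type, the `not_yet_in_writing` helper, and the sample answers are all hypothetical and exist only to illustrate the idea that an answer counts once it is in a contract.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CONTRACT = "contract"  # written into a signed agreement
    WEBPAGE = "webpage"    # posted policy that can change overnight
    UNKNOWN = "unknown"    # nobody has answered yet


@dataclass(frozen=True)
class Answer:
    question: str
    answer: str
    source: Source


def not_yet_in_writing(answers):
    """Return the questions whose answers are not backed by a contract."""
    return [a.question for a in answers if a.source is not Source.CONTRACT]


# Hypothetical answers collected from an imaginary vendor.
vendor_answers = [
    Answer("Is it logged or retained?", "Only to serve the call", Source.CONTRACT),
    Answer("Is it used for training?", "No", Source.WEBPAGE),
    Answer("Is it sold or shared?", "No", Source.CONTRACT),
    Answer("Who is accountable?", "", Source.UNKNOWN),
]

open_items = not_yet_in_writing(vendor_answers)
```

Here `open_items` holds the training and accountability questions: one is answered only on a webpage, the other not at all, so neither is something you could point to in an audit.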

There's one more layer: official channels vs reverse relays. Going straight through an official channel keeps the hops few and named. A cheap reverse relay inserts itself between you and the model as an extra party: one more place your data sits, run by someone you can't see.

What a clean data boundary looks like

Nothing retained beyond what's needed to serve the call.
Never used to train any model.
Never sold or shared with third parties.
Official channels — no extra, unaccountable hop.
A real, incorporated entity standing behind it.
The data policy written into the contract, not just posted on a page.
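The six properties above read naturally as a checklist, so a vendor review can be reduced to listing which ones are missing. The following is a sketch, not a compliance tool: the `BoundaryTerms` type, its field names, and the `gaps` helper are hypothetical names introduced for illustration, and the candidate values are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BoundaryTerms:
    retains_only_to_serve: bool   # nothing kept beyond serving the call
    never_trains: bool            # never used to train any model
    never_sold_or_shared: bool    # never passed to third parties
    official_channels_only: bool  # no extra, unaccountable hop
    incorporated_entity: bool     # a real, incorporated entity stands behind it
    policy_in_contract: bool      # terms are in the contract, not just on a page


def gaps(terms):
    """Return the names of the checklist items that are not satisfied."""
    return [f.name for f in fields(terms) if not getattr(terms, f.name)]


# Hypothetical candidate: good promises, but only posted on a webpage.
candidate = BoundaryTerms(
    retains_only_to_serve=True,
    never_trains=True,
    never_sold_or_shared=True,
    official_channels_only=True,
    incorporated_entity=True,
    policy_in_contract=False,
)

missing = gaps(candidate)
```

A boundary is clean in this sense only when `gaps` returns an empty list; the candidate above still has `policy_in_contract` outstanding.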

A note for research & regulated work

For research institutes, labs, and regulated businesses, the technology is only half the question. Traceability — a named entity, a signed agreement, clear residency and retention terms — is what an audit, an ethics board, or a data-protection officer actually asks for. The provider that can point to a contract clause, not a blog promise, is the one that survives that conversation.

Solunar Gateway

With Solunar Gateway, your data stays yours: not retained beyond serving the call, never sold, never used to train any model. Calls go through official channels — no extra reverse-engineered hop — and the service is operated by Solunar AI Inc., an incorporated company in British Columbia, Canada. Those boundaries belong in the contract, not just on a page. Access is invite-only.

FAQ

Does using an LLM mean my data trains the model?

Not necessarily — it depends entirely on the provider's policy. Some retain and train on inputs; others retain nothing beyond serving the call. The only way to know is to ask, and to get the answer in the contract.

Is a gateway safer or riskier than calling the model directly?

It depends on the gateway. An official-channel gateway adds governance and a clear data boundary without inserting an untrusted party. A reverse-engineered relay does the opposite — it becomes an extra hop you can't see. The architecture matters more than the label.

What should be in writing?

Retention (what's kept and for how long), training (used or not), sharing (sold or not), residency (where it's processed), and the accountable entity. If it's only on a webpage, it can change without anyone telling you.

We handle regulated data — what's the bar?

A named entity, a signed agreement, no training on your data, minimal retention, and clear residency. Anything that can't be put in a contract shouldn't be trusted with regulated data.
