Where your data goes when you call an LLM
Every LLM call ships your text to someone else's machine. The questions that decide whether that's safe are simple to ask and easy to skip: who sees it, where is it kept, for how long, and — the one that matters most — is it ever used to train a model or passed to a third party? For sensitive work, "it depends on the provider" isn't an answer. You want the answers in writing.
A data boundary is the set of rules and controls that decide who can see, store, reuse, or train on the text you send to and receive from a model — your prompts, your context, your outputs.
The path your data takes
A single call makes more hops than it looks. Your application sends the text to an access layer (a gateway or a relay), which forwards it to the model provider, which runs it and sends a response back along the same path. Every hop is a place where text can be logged, retained, or, with the wrong intermediary, copied. The fewer those hops, and the more accountable each one is, the smaller your exposure. A reverse-engineered relay quietly adds an extra, untrusted hop you didn't choose.
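One way to make the path concrete is to write it down as data and ask which hops could keep or copy your text. The sketch below is illustrative only: the `Hop` type, its two flags, and the example values are assumptions made for this example, not a description of any real provider or relay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One stop on the path a prompt takes (hypothetical model)."""
    name: str
    accountable: bool   # a named entity with a written policy stands behind it
    retains_text: bool  # keeps prompt/response text beyond serving the call


def exposure_points(path):
    """Return the names of hops where text could be kept or copied."""
    return [hop.name for hop in path if hop.retains_text or not hop.accountable]


# Assumed example values: a path through official channels only.
official = [
    Hop("application", accountable=True, retains_text=False),
    Hop("gateway", accountable=True, retains_text=False),
    Hop("model provider", accountable=True, retains_text=False),
]

# The same path with an extra hop nobody chose, inserted before the provider.
relay = Hop("reverse-engineered relay", accountable=False, retains_text=True)
with_relay = official[:2] + [relay] + official[2:]

print(exposure_points(official))    # []
print(exposure_points(with_relay))  # ['reverse-engineered relay']
```

In practice the flags would be filled in from each party's written terms. If you cannot answer for a hop, treat it as unaccountable.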
The four questions to ask
Get clear answers to these four, and a data boundary goes from "probably fine" to "spelled out":
Is it logged or retained?
Ask what's stored, where, and for how long. "We keep request logs indefinitely" is a very different answer from "we retain nothing beyond what's needed to serve the call."
Is it used for training?
This is the line that matters most. Is your text (prompts, context, outputs) ever used to train or improve any model, the provider's or anyone else's? For proprietary or regulated data, the only safe answer is no.
Is it sold or shared?
Is any of it passed to third parties, brokers, or analytics partners? Your prompts can carry trade secrets, customer data, unreleased work — none of which should become someone else's dataset.
Who is accountable?
Is there a real, named entity behind the service, and does the data policy live in a contract — not just a webpage that can change overnight? Accountability is what makes the other three answers worth anything.
What a clean data boundary looks like
- Nothing retained beyond what's needed to serve the call.
- Never used to train any model.
- Never sold or shared with third parties.
- Official channels — no extra, unaccountable hop.
- A real, incorporated entity standing behind it.
- The data policy written into the contract, not just posted on a page.
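The list above can also be kept as a small, reviewable record per provider. This is a minimal sketch, assuming you fill in each field yourself from the provider's contract. The `DataBoundary` type and its field names are hypothetical and simply mirror the six points.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataBoundary:
    """Answers for one provider, one field per point in the list (hypothetical)."""
    retains_only_to_serve: bool
    never_trains: bool
    never_sold_or_shared: bool
    official_channels_only: bool
    incorporated_entity: bool
    policy_in_contract: bool


def gaps(boundary):
    """Return the names of the points this provider does not satisfy."""
    return [f.name for f in fields(boundary) if not getattr(boundary, f.name)]


# Assumed example: the promises are fine, but they only appear on a webpage.
candidate = DataBoundary(
    retains_only_to_serve=True,
    never_trains=True,
    never_sold_or_shared=True,
    official_channels_only=True,
    incorporated_entity=True,
    policy_in_contract=False,
)

print(gaps(candidate))  # ['policy_in_contract']
```

An empty result means every point is covered. Any name in the list is a question to send back to the provider before sensitive data goes through.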
A note for research & regulated work
For research institutes, labs, and regulated businesses, the technology is only half the question. Traceability — a named entity, a signed agreement, clear residency and retention terms — is what an audit, an ethics board, or a data-protection officer actually asks for. The provider that can point to a contract clause, not a blog promise, is the one that survives that conversation.
Solunar Gateway
With Solunar Gateway, your data stays yours: not retained beyond serving the call, never sold, never used to train any model. Calls go through official channels — no extra reverse-engineered hop — and the service is operated by Solunar AI Inc., an incorporated company in British Columbia, Canada. Those boundaries belong in the contract, not just on a page. Access is invite-only.