There’s a class of bugs that took me a long time to recognize because they don’t look like bugs at all.
Nothing crashes.
Nothing fails tests.
Nothing even looks wrong.
And yet, a critical assumption about the system is no longer true.
These are the bugs that happen when a guardrail is silently removed—intentionally or accidentally—and the system keeps working as if nothing changed.
I’ve started thinking of these as “you don’t know what you don’t know” bugs.
The motivating example (but not the real point)
The cleanest example is multi-tenant scoping.
Imagine a Laravel application where tenant isolation is handled by a global scope:
class Invoice extends Model
{
    protected static function booted()
    {
        // Every query on this model is constrained to the current tenant.
        static::addGlobalScope('tenant', function ($query) {
            $query->where('tenant_id', Tenant::current()->id);
        });
    }
}
Every query is implicitly scoped. Tests assume this. Developers trust this.
Now imagine someone writes:
Invoice::withoutGlobalScopes()
    ->where('status', 'overdue')
    ->get();
Or more subtly:
DB::table('invoices')
    ->where('status', 'overdue')
    ->get();
The feature works.
The tests pass.
The output looks correct.
But the invariant—“invoices are tenant-isolated”—is no longer enforced.
This example is unsettling because the blast radius is obvious, but multi-tenancy isn’t the real issue. It’s just the easiest one to explain.
The real pattern: invisible invariant erosion
What’s actually happening is more general:
- The system relies on an invariant
- That invariant is encoded indirectly (abstractions, defaults, conventions)
- Legitimate work introduces exceptions
- An exception weakens or bypasses the invariant
- Nothing signals that the invariant no longer holds
The system still behaves correctly in the modeled world.
It just no longer behaves correctly in the assumed world.
That’s the bug.
Why tests don’t catch this (even good ones)
Tests validate behavior under the conditions you model.
These bugs live in conditions you didn’t model.
In the tenant example:
- The test database often has a single tenant
- Fixtures are clean and isolated
- Assertions check “did I get the right record,” not “did I exclude the wrong ones”
Removing the constraint doesn’t change the result set, so the test suite has no reason to fail.
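Here's a sketch of what a typical positive test looks like. The factory calls and the assumption that Tenant::current() resolves to the seeded tenant are mine, not from any particular codebase:

public function test_overdue_invoices_are_listed(): void
{
    // Assumes Tenant and Invoice factories, and that Tenant::current()
    // resolves to $tenantA inside the test.
    $tenantA = Tenant::factory()->create();

    $invoice = Invoice::factory()->create([
        'tenant_id' => $tenantA->id,
        'status' => 'overdue',
    ]);

    // Asserts presence only. It passes with or without the tenant scope,
    // because no other tenant's data exists to leak in.
    $this->assertTrue(
        Invoice::where('status', 'overdue')->get()->contains($invoice)
    );
}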
Even when tests could catch this, they require a different shape—negative or adversarial tests:
$this->assertFalse(
    $results->contains(fn ($r) => $r->tenant_id !== $tenantA->id)
);
These tests:
- are harder to think of
- feel redundant
- don't map cleanly to a single feature
- are easy to skip under time pressure
And critically, they rely on everyone remembering to write them everywhere.
This happens far beyond multi-tenancy
Once you start looking for this pattern, it shows up constantly.
Authorization and ownership
Invariant: “Only owners can do this.”
Exceptions:
- admins
- support tools
- migrations
- background jobs
Failure mode:
- a new code path skips the check
- tests pass because the fixture user is always authorized
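A minimal sketch of that failure mode, assuming a standard Laravel policy on the web path; the job class and the touch() call are hypothetical stand-ins:

use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Http\Request;

class InvoiceController extends Controller
{
    public function update(Request $request, Invoice $invoice)
    {
        // The web path enforces ownership through a policy.
        $this->authorize('update', $invoice);

        // ...
    }
}

// A later background job runs as "the system" and never consults the policy.
class RecalculateInvoiceTotals implements ShouldQueue
{
    public function handle(): void
    {
        // No authorize() call here. Tests still pass because the fixture
        // user is always authorized on the paths they exercise.
        Invoice::where('status', 'open')->each(
            fn ($invoice) => $invoice->touch() // stand-in for the real mutation
        );
    }
}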
Soft deletes and archival rules
Invariant: “Deleted records are invisible.”
Exceptions:
- reporting
- audits
- restores
Failure mode:
- deleted records accidentally reappear
- tests don’t include mixed record states
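A sketch of how that tends to happen, assuming Invoice uses the SoftDeletes trait; the helper is hypothetical:

// A reporting helper that legitimately needs soft-deleted rows.
function overdueInvoicesForAudit()
{
    return Invoice::withTrashed()
        ->where('status', 'overdue')
        ->get();
}

// Months later the same helper is reused on a customer-facing screen.
// Deleted invoices quietly reappear, and tests that never seed trashed
// records have no reason to notice.
$invoices = overdueInvoicesForAudit();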
Feature flags and rollouts
Invariant: “This behavior is gated.”
Exceptions:
- internal users
- canary customers
- backfills
Failure mode:
- the flag is removed or assumed always-on
- tests pass because the flag defaults to true
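A sketch with a hypothetical config-based flag and method name; the same shape applies to any flag system:

// Original: the behavior is gated behind a flag.
if (config('features.new_billing')) {
    $invoice->applyNewBillingRules();
}

// During a backfill the guard is dropped "temporarily". The flag defaults
// to true in the test environment, so every test keeps passing.
$invoice->applyNewBillingRules();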
Performance and safety constraints
Invariant: “This query is bounded.”
Exceptions:
- exports
- admin dashboards
- debugging
Failure mode:
- unbounded queries in production
- tests pass because datasets are small
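A sketch of the bounded version next to the one-off that quietly loses the bound, using Laravel's chunkById:

// Original: the guardrail keeps memory and result size bounded.
Invoice::where('status', 'overdue')
    ->chunkById(500, function ($invoices) {
        // process 500 invoices at a time
    });

// A debugging one-off loads everything in a single query. Fine against the
// small test dataset, unbounded against the production table.
$allOverdue = Invoice::where('status', 'overdue')->get();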
Nothing breaks. That’s the problem.
Most of these failures aren’t caused by bad engineers or missing tests.
They’re caused by this:
The system allows critical assumptions to disappear without making noise.
When that happens, developers reasonably trust the system they’re working in.
Why “just write better tests” isn’t enough
Negative tests help. They’re real. They matter.
But they don't solve the underlying issue: enforcement still depends on institutional memory.
Invariants tend to live:
- in abstractions
- in conventions
- in documentation
- in people’s heads
And people constantly bypass guardrails for good reasons.
That’s normal software evolution—not negligence.
So is this actually a major problem?
Not in frequency.
In impact.
These bugs are:
- low-frequency
- high-impact
- silent
- often discovered late, or externally
They don’t page you.
They don’t break builds.
They don’t show up in metrics.
They quietly change what the system allows.
A possible direction (still speculative)
I don’t yet have a strong opinion on what the right solution looks like.
But I keep coming back to this idea: these bugs are about intent, not syntax.
The system doesn’t know that:
- tenant isolation is supposed to hold here
- authorization is supposed to be enforced there
- this query was only ever meant to run in an admin context
Those expectations are implicit.
That’s exactly where traditional tools struggle:
- linters are rigid
- static analysis needs explicit rules
- tests arrive too late
This feels like an area where large language models might help—not by writing code, but by reviewing code against a clear specification.
Imagine:
- a short, explicit description of an invariant
- a few concrete examples
- and a tool that runs before code is committed and asks:
“Given this specification, does this change violate an assumption the system relies on?”
The goal wouldn’t be perfection.
It would be early signal.
A nudge that says:
“This looks like it might bypass a guardrail you probably didn’t mean to remove.”
Whether LLMs are actually the right mechanism here is still an open question.
But the shape of the problem—clear requirements, clear examples, abstract code review—at least aligns with what they’re good at.
Closing thought
I don’t think the answer is simply “write more tests.”
Negative tests help, but they rely on remembering to write them everywhere.
What feels missing is a way for the system itself to notice when one of its assumptions quietly stops being enforced.
Some of the most dangerous bugs aren’t caused by code doing the wrong thing.
They’re caused by code quietly no longer doing something you forgot it was doing.
Those are the “you don’t know what you don’t know” bugs.
And I suspect we don’t have great tools for them yet.