There’s a class of bugs that took me a long time to recognize because they don’t look like bugs at all.
Nothing crashes.
Nothing fails tests.
Nothing even looks wrong.
And yet, a critical assumption about the system is no longer true.
These are the bugs that happen when a guardrail is silently removed—intentionally or accidentally—and the system keeps working as if nothing changed.
I’ve started thinking of these as “you don’t know what you don’t know” bugs.
The motivating example (but not the real point)
The cleanest example is multi-tenant scoping.
Imagine a Laravel application where tenant isolation is handled by a global scope:
class Invoice extends Model
{
    protected static function booted()
    {
        // Every query on this model is constrained to the current tenant.
        static::addGlobalScope('tenant', function ($query) {
            $query->where('tenant_id', Tenant::current()->id);
        });
    }
}
Every query is implicitly scoped. Tests assume this. Developers trust this.
Now imagine someone writes:
Invoice::withoutGlobalScopes()
    ->where('status', 'overdue')
    ->get();
Or more subtly:
DB::table('invoices')
    ->where('status', 'overdue')
    ->get();
The feature works.
The tests pass.
The output looks correct.
But the invariant—“invoices are tenant-isolated”—is no longer enforced.
This example is unsettling because the blast radius is obvious, but multi-tenancy isn’t the real issue. It’s just the easiest one to explain.
The real pattern: invisible invariant erosion
What’s actually happening is more general:
- The system relies on an invariant
- That invariant is encoded indirectly (abstractions, defaults, conventions)
- Legitimate work introduces exceptions
- An exception weakens or bypasses the invariant
- Nothing signals that the invariant no longer holds
The system still behaves correctly in the modeled world.
It just no longer behaves correctly in the assumed world.
That’s the bug.
Why tests don’t catch this (even good ones)
Tests validate behavior under the conditions you model.
These bugs live in conditions you didn’t model.
In the tenant example:
- The test database often has a single tenant
- Fixtures are clean and isolated
- Assertions check “did I get the right record,” not “did I exclude the wrong ones”
Removing the constraint doesn’t change the result set, so the test suite has no reason to fail.
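Here's a sketch of what a typical positive test looks like. The factory calls and the assumption that Tenant::current() resolves to the seeded tenant are mine, not from any particular codebase:

public function test_overdue_invoices_are_listed(): void
{
    // Assumes Tenant and Invoice factories, and that Tenant::current()
    // resolves to $tenantA inside the test.
    $tenantA = Tenant::factory()->create();

    $invoice = Invoice::factory()->create([
        'tenant_id' => $tenantA->id,
        'status' => 'overdue',
    ]);

    // Asserts presence only. It passes with or without the tenant scope,
    // because no other tenant's data exists to leak in.
    $this->assertTrue(
        Invoice::where('status', 'overdue')->get()->contains($invoice)
    );
}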
Even when tests could catch this, they require a different shape—negative or adversarial tests:
$this->assertFalse(
    $results->contains(fn ($r) => $r->tenant_id !== $tenantA->id)
);
These tests:
- are harder to think of
- feel redundant
- don't map cleanly to a single feature
- are easy to skip under time pressure
And critically, they rely on everyone remembering to write them everywhere.
This happens far beyond multi-tenancy
Once you start looking for this pattern, it shows up constantly.
Authorization and ownership
Invariant: “Only owners can do this.”
Exceptions:
- admins
- support tools
- migrations
- background jobs
Failure mode:
- a new code path skips the check
- tests pass because the fixture user is always authorized
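A minimal sketch of that failure mode, assuming a standard Laravel policy on the web path; the job class and the touch() call are hypothetical stand-ins:

use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Http\Request;

class InvoiceController extends Controller
{
    public function update(Request $request, Invoice $invoice)
    {
        // The web path enforces ownership through a policy.
        $this->authorize('update', $invoice);

        // ...
    }
}

// A later background job runs as "the system" and never consults the policy.
class RecalculateInvoiceTotals implements ShouldQueue
{
    public function handle(): void
    {
        // No authorize() call here. Tests still pass because the fixture
        // user is always authorized on the paths they exercise.
        Invoice::where('status', 'open')->each(
            fn ($invoice) => $invoice->touch() // stand-in for the real mutation
        );
    }
}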
Soft deletes and archival rules
Invariant: “Deleted records are invisible.”
Exceptions:
- reporting
- audits
- restores
Failure mode:
- deleted records accidentally reappear
- tests don’t include mixed record states
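A sketch of how that tends to happen, assuming Invoice uses the SoftDeletes trait; the helper is hypothetical:

// A reporting helper that legitimately needs soft-deleted rows.
function overdueInvoicesForAudit()
{
    return Invoice::withTrashed()
        ->where('status', 'overdue')
        ->get();
}

// Months later the same helper is reused on a customer-facing screen.
// Deleted invoices quietly reappear, and tests that never seed trashed
// records have no reason to notice.
$invoices = overdueInvoicesForAudit();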
Feature flags and rollouts
Invariant: “This behavior is gated.”
Exceptions:
- internal users
- canary customers
- backfills
Failure mode:
- the flag is removed or assumed always-on
- tests pass because the flag defaults to true
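A sketch with a hypothetical config-based flag and method name; the same shape applies to any flag system:

// Original: the behavior is gated behind a flag.
if (config('features.new_billing')) {
    $invoice->applyNewBillingRules();
}

// During a backfill the guard is dropped "temporarily". The flag defaults
// to true in the test environment, so every test keeps passing.
$invoice->applyNewBillingRules();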
Performance and safety constraints
Invariant: “This query is bounded.”
Exceptions:
- exports
- admin dashboards
- debugging
Failure mode:
- unbounded queries in production
- tests pass because datasets are small
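A sketch of the bounded version next to the one-off that quietly loses the bound, using Laravel's chunkById:

// Original: the guardrail keeps memory and result size bounded.
Invoice::where('status', 'overdue')
    ->chunkById(500, function ($invoices) {
        // process 500 invoices at a time
    });

// A debugging one-off loads everything in a single query. Fine against the
// small test dataset, unbounded against the production table.
$allOverdue = Invoice::where('status', 'overdue')->get();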
Nothing breaks. That’s the problem.
Most of these failures aren’t caused by bad engineers or missing tests.
They’re caused by this:
The system allows critical assumptions to disappear without making noise.
When that happens, developers reasonably trust the system they’re working in.
Why “just write better tests” isn’t enough
Negative tests help. They’re real. They matter.
But they don't solve the underlying issue: enforcement still depends on institutional memory.
Invariants tend to live:
- in abstractions
- in conventions
- in documentation
- in people’s heads
And people constantly bypass guardrails for good reasons.
That’s normal software evolution—not negligence.
So is this actually a major problem?
Not in frequency.
In impact.
These bugs are:
- low-frequency
- high-impact
- silent
- often discovered late, or externally
They don’t page you.
They don’t break builds.
They don’t show up in metrics.
They quietly change what the system allows.
A possible direction (still speculative)
I don’t yet have a strong opinion on what the right solution looks like.
But I keep coming back to this idea: these bugs are about intent, not syntax.
The system doesn’t know that:
- tenant isolation is supposed to hold here
- authorization is supposed to be enforced there
- this query was only ever meant to run in an admin context
Those expectations are implicit.
That’s exactly where traditional tools struggle:
- linters are rigid
- static analysis needs explicit rules
- tests arrive too late
This feels like an area where large language models might help—not by writing code, but by reviewing code against a clear specification.
Imagine:
- a short, explicit description of an invariant
- a few concrete examples
- and a tool that runs before code is committed and asks:
“Given this specification, does this change violate an assumption the system relies on?”
The goal wouldn’t be perfection.
It would be early signal.
A nudge that says:
“This looks like it might bypass a guardrail you probably didn’t mean to remove.”
Whether LLMs are actually the right mechanism here is still an open question.
But the shape of the problem—clear requirements, clear examples, abstract code review—at least aligns with what they’re good at.
Closing thought
I don’t think the answer is simply “write more tests.”
Negative tests help, but they rely on remembering to write them everywhere.
What feels missing is a way for the system itself to notice when one of its assumptions quietly stops being enforced.
Some of the most dangerous bugs aren’t caused by code doing the wrong thing.
They’re caused by code quietly no longer doing something you forgot it was doing.
Those are the “you don’t know what you don’t know” bugs.
And I suspect we don’t have great tools for them yet.