Laravel on Lockdown
I use Claude to operate a fleet of Laravel applications. Call my entry point Watchtower: a single repo of instructions and shared configuration covering how to organize, test, secure, and run a Laravel app, according to a set of opinions I will get to.
From Watchtower, Claude can inspect, change, or monitor any of the projects (call them the Fleet). The goal of the arrangement is that every one of my Laravel projects converges on one organizational system. Same guard rails, same baseline configuration, same architectural rules, same deterministic constraints (complexity ceilings, a clean PHPStan baseline, and the rest). When all the apps obey the same rules, managing them gets close to easy. When I learn something on one project, I fold it back into Watchtower, then have Claude bring the other apps up to the new standard.
Those are the table stakes. The interesting part is the lockdown.
Locking it down
The release-not-release of Claude Mythos (and Fable 5) stirred a lot of panic among security researchers. I am not going to weigh in on the event itself, but it did push me to start ratcheting up my security in ways I had been putting off. Security is always important, sure, but “good enough” is an easy bar to clear, and I do not mean to be good enough. I want my cluster, my servers, and my applications locked down hard enough that it is obvious I have done the work.
The edge: no open ports
Start at the front door. There is no port forwarding on my router. Instead of opening a port that any IP can scan, or leaning on geo-blocking to limit which bad guys get to scan it, I run cloudflared to open an outbound tunnel to Cloudflare’s edge. Nothing behind my router accepts inbound connections from the internet, because nothing needs to: the tunnel dials out. The world at large, absent the tunnel, cannot hit any open port at my address. There are none.
There is a bonus in routing through Cloudflare. Every request lands on their edge first, which terminates TLS and runs a WAF I do not have to operate myself, before anything reaches the tunnel.
My domains use wildcard DNS records, so the only name that is actually public is the apex. That apex is a statically generated HTML site with no backend to compromise. Every other service lives on a subdomain that is listed nowhere, so an attacker has to guess names to reach an app, and a guessed name I have not wired to a service does not lead anywhere useful: it lands on the ingress controller and gets a 404, not an app. It is security by obscurity, but it is a layer, and it costs me nothing.
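The default-deny behaviour described above can be sketched in a few lines. This is an illustration of the idea, not the ingress controller's actual logic; the hostnames, service names, and the `route` function are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical routing table: hostnames explicitly wired to a service.
# Every name here is a placeholder, not one of the real subdomains.
WIRED_HOSTS = {
    "example.com": "static-site",
    "app-one.example.com": "app-one",
}


def route(host_header: str) -> Tuple[int, Optional[str]]:
    """Return (status, service) for a request, denying unknown hosts."""
    # Host headers are case-insensitive and may carry a port or trailing dot.
    name = host_header.strip().lower().split(":", 1)[0].rstrip(".")
    service = WIRED_HOSTS.get(name)
    if service is None:
        # Wildcard DNS got the request this far, but nothing is wired up.
        return 404, None
    return 200, service
```

A guessed name such as `admin.example.com` resolves through the wildcard record but falls through to the 404 branch, because only names in the table map to a service.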
The host: Talos Linux
So the only way in from outside is the tunnel. What is waiting on the other end?
The cluster runs on Talos Linux, which takes the idea behind my web containers (covered in the next section) and applies it to the operating system itself. Talos is an immutable, API-driven Kubernetes OS. There is no SSH. There is no shell. There is no package manager and no general-purpose userland to log into. You do not administer a Talos node by getting a prompt on it, because there is no prompt to get. The only way to talk to a node is a gRPC API authenticated with mutual TLS, and the machine config is declarative: I hand each node a config file, the node reconciles to it, and the root filesystem stays read-only. The cluster has six nodes on Proxmox: three control-plane nodes behind a floating virtual IP and three workers, with the apps scheduled onto the workers.
That buys two things. The first is the same dead end I built into the edge containers, one layer down: anyone who lands on a node finds no shell and no tools to pivot with, on the host this time and not just in the pod. The second is that the host cannot drift. There is no console to make a quick fix on, so configuration changes the only way anything changes around here: by editing a file and reconciling it. The cluster’s signing keys and secrets live entirely outside the repo, never committed; the repo holds only the non-sensitive per-node network settings. Same rule as the application secrets: the thing that would hurt you if it leaked is the thing that is never in git.
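The "edit a file and reconcile" loop is a general pattern, and a toy version makes the no-drift claim concrete. This is a generic sketch, not how Talos implements reconciliation; the `plan` function and the setting names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel so a desired value of None is still compared


def plan(desired: Dict[str, Any], actual: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Compute what must change for `actual` to match `desired`.

    Returns (to_set, to_remove): settings to write, and settings present on
    the node that the declared config no longer mentions.
    """
    to_set = {k: v for k, v in desired.items() if actual.get(k, _MISSING) != v}
    to_remove = sorted(k for k in actual if k not in desired)
    return to_set, to_remove


def reconcile(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the plan; the result always equals the declared config."""
    to_set, to_remove = plan(desired, actual)
    result = {k: v for k, v in actual.items() if k not in to_remove}
    result.update(to_set)
    return result
```

Because the declared file is the only input, a manual change made on the node shows up in `to_set` or `to_remove` on the next pass and gets reverted.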
The containers: small where it is exposed
The apps run in containers built from a dual-purpose base image, extended according to the nature of the task.
Web tasks, the ones exposed to the internet, are built for the smallest possible attack surface. The app is a statically compiled FrankenPHP binary in an Alpine Linux container with the shells, the package managers, and the usual CLI utilities removed. An adversary who does manage an RCE against the application lands somewhere with no OS tooling to pivot from: nothing to explore the environment with, nothing to pull more tools down with, nothing to move laterally with.
Cron and queue-worker tasks are not exposed to untrusted inbound traffic, so they keep the usual shell and tools. Backend work goes on without being hamstrung by a missing toolchain, and the surface an attacker could actually reach stays bare.
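One way to keep the edge image honest is to check a listing of its files against a deny-list at build time. The sketch below assumes such a listing is available as plain paths; the deny-list is illustrative (the post does not enumerate the removed tools), and `find_pivot_tools` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Illustrative deny-list only: the exact set of removed tools is an assumption.
PIVOT_TOOLS = {"sh", "ash", "bash", "busybox", "apk", "curl", "wget", "nc"}


def find_pivot_tools(paths: Iterable[str]) -> List[str]:
    """Return the sorted paths whose file name is on the deny-list."""
    return sorted(p for p in paths if PurePosixPath(p).name in PIVOT_TOOLS)
```

An empty result is the pass condition for the web image. The cron and queue-worker image would not be held to this check, since it keeps its shell on purpose.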
The code: a machine owns the number
The thread tying all of this together is that I do not want quality to be a matter of opinion. Opinions are inconsistent, they get tired, and they get overruled, and Claude is very good at producing code that looks fine to a tired reviewer. A number does not get tired. Complexity over ten fails. Coverage under eighty fails. A taint flow from user input into a SQL string fails. I do not argue with Claude about whether the code is clean enough; the machine owns the number, and the machine does not care who is in a hurry.
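Reduced to code, "the machine owns the number" is a comparison against fixed constants with no override path. The thresholds below come from the paragraph above; the `Metrics` type and `gate` function are hypothetical names for illustration, and in practice each number is enforced by its own tool in CI.

```python
from dataclasses import dataclass
from typing import List

MAX_COMPLEXITY = 10      # complexity over ten fails
MIN_COVERAGE = 80.0      # coverage under eighty fails
MAX_TAINT_FINDINGS = 0   # any taint flow fails


@dataclass(frozen=True)
class Metrics:
    max_complexity: int
    coverage_percent: float
    taint_findings: int


def gate(m: Metrics) -> List[str]:
    """Return the list of failed checks; an empty list means the build passes."""
    failures = []
    if m.max_complexity > MAX_COMPLEXITY:
        failures.append(f"complexity {m.max_complexity} > {MAX_COMPLEXITY}")
    if m.coverage_percent < MIN_COVERAGE:
        failures.append(f"coverage {m.coverage_percent} < {MIN_COVERAGE}")
    if m.taint_findings > MAX_TAINT_FINDINGS:
        failures.append(f"{m.taint_findings} taint finding(s)")
    return failures
```

The function takes no reviewer, deadline, or severity argument, which is the point: nothing in the inputs can argue with the constants.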
The rest of my approach to AI code quality is layering deterministic tool results and putting gates in CI. On a pull request, my systems automatically run the checks below. The last column is honest about how far each control has rolled out: a “converged” app is one I have brought all the way up to the current Watchtower standard, and not every app is there yet.
| Layer | Tool | Constant it owns | Gate | Attainment across the four apps |
|---|---|---|---|---|
| Static | PHPStan | level 8 + strict-rules | CI static | All four at level 8; strict-rules and zero-baseline on the converged apps, trailing app level 8 only |
| Static | PHPMD | cyclomatic 10, methods 20, public-methods 15 | CI static | Converged apps |
| Static | jscpd | minTokens 50, 10% target | CI static | Per-app values ratcheting toward 10% |
| Static | Pint, ESLint, Prettier, tsc, knip, composer-normalize | format and lint clean | CI static | Converged apps |
| Architecture | Pest arch tiers | universal tiers byte-identical | CI tests + arch-drift | Universal tiers on the converged apps; opt-in tiers where the namespace exists |
| Behavioral | Pest on Postgres | coverage --min=80, type-coverage --min=95 | CI tests | Converged apps |
| Behavioral | Infection, Playwright | mutation and e2e | Out of band | Converged apps |
| Security (SAST) | Psalm taint | 0 findings | CI static | Converged apps |
| Security (SCA) | roave/security-advisories, composer audit, npm audit | no known-vulnerable deps at solve time | CI static | roave/security-advisories on two apps; audit fleet-wide |
| Security (IaC) | Trivy config | floor HIGH | k8s repo CI | Cluster repo, first-party workloads at 0 HIGH |
| Security (image) | Trivy image | CRITICAL/HIGH exit 1 | one app’s deploy | One app only |
| Security (DAST) | ZAP baseline | advisory | Out of band | Local, advisory |
| Runtime | SecurityHeaders middleware | CSP without unsafe-eval, headers set | Application | Two apps |
| Runtime | FrankenPHP image | uid 33, :8080, shell-less edge | Build target | Edge/console split on the launch app |
| Cluster | securityContext | drop ALL, RuntimeDefault, readOnlyRootFilesystem | Pod spec, Trivy-gated | All deployed apps |
| Cluster | Trivy config re-scan | posture cannot drift | k8s repo CI | Cluster repo |
| Deploy | ArgoCD | image tag = git SHA, selfHeal | GitOps reconcile | All deployed apps |
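
The cluster row in the table names three settings: drop ALL capabilities, the RuntimeDefault seccomp profile, and a read-only root filesystem. A minimal sketch of what a check on those fields looks like follows. It is not how Trivy evaluates a manifest; it only uses the standard Kubernetes container `securityContext` field names, and `security_context_violations` is a hypothetical function.

```python
from typing import Any, Dict, List


def security_context_violations(ctx: Dict[str, Any]) -> List[str]:
    """Check a container securityContext (as a dict) against the three constants."""
    problems = []
    drops = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in drops:
        problems.append("capabilities.drop must include ALL")
    if (ctx.get("seccompProfile") or {}).get("type") != "RuntimeDefault":
        problems.append("seccompProfile.type must be RuntimeDefault")
    if ctx.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem must be true")
    return problems
```

A missing field counts as a violation here, so an empty `securityContext` fails all three checks and does not pass by omission.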
The layers are additive and overlapping. I do not claim they catch everything. What they do is specific:
- Enforce structure, through the architecture tests.
- Enforce simplicity, with PHPMD complexity ceilings.
- Reduce duplication (DRY), with jscpd.
- Standardize formatting across all code.
- Enforce a floor for strict typing, and use that floor as a smoke test.
- Run the usual software testing suites.
- Layer on mutation testing, to catch tests that still pass when the code under them is deliberately broken.
- Catch known-vulnerable dependencies before they ship.
- Keep insecure cluster configurations from ever being merged.
- Take most of the OWASP Top 10 off the table by construction.
These deterministic guard rails run across the whole fleet the same way, so I can context-switch between apps without stopping to remember what is different about this one. I get to think about business logic, and let the guard rails keep Claude honest.
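The jscpd row in the table describes per-app values ratcheting toward a 10% target. The post does not say how the ratchet is implemented, so the sketch below is one plausible reading: the limit can only tighten, and it stops tightening at the target. The `ratchet` function and its signature are assumptions.

```python
from typing import Tuple


def ratchet(current_limit: float, measured: float, target: float = 10.0) -> Tuple[bool, float]:
    """Return (passed, new_limit) for a duplication percentage.

    A run fails if it exceeds the app's current limit. A passing run lowers
    the limit to the measured value, but never below the fleet-wide target.
    """
    if measured > current_limit:
        return False, current_limit
    return True, max(target, min(current_limit, measured))
```

Under this reading, an app that starts at an 18% limit and measures 15.5% passes and gets 15.5% as its new limit, so the next change cannot quietly climb back toward 18%.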
What it doesn’t do
Will this produce perfect code? No. Nothing will. What it does is make the easy attacks fail and the hard ones expensive, on a platform built to be adversary-resistant, not adversary-proof. A determined human with time and motivation is a different threat model than a bot sweeping the whole IPv4 range for an open port, and I am explicitly optimizing against the second one. The goal is to be enough of a pain that almost nobody decides I am worth the afternoon.
I know where the soft spots are, because I keep a written, prioritized security roadmap, and it is never empty. That is the design, not a confession. Every time I harden one layer, the next weakest thing becomes the most interesting thing, and onto the list it goes. I am not going to publish the list. The blog you are reading runs on the same cluster a curious reader might be tempted to poke, and a current inventory of the spots I have not reached yet is exactly the document I should not hand out. Its existence is the point. The day it is empty is the day I stopped paying attention.
What I will say is what this kind of layering can and cannot do in general. It contains blast radius: persistence, privilege escalation, lateral movement. It does not shrink what a compromised application is fundamentally allowed to do. And no gate in here has an opinion about intent. A linter will not catch the right query run for the wrong user; static analysis reasons about code, not about authorization. The stack takes whole classes of mistake off the table. It does not turn passing every check into being correct, and a wall of green that lulls a reviewer into skimming is its own quiet risk. The gates exist to spend my attention well, not to let me stop paying it.
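The "right query run for the wrong user" case is easy to show. The sketch below uses Python's built-in `sqlite3` purely to stay self-contained; the table, the data, and both function names are hypothetical, and in a Laravel app the equivalent fix would live in a scoped query or a policy.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two owners' invoices."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, owner_id INTEGER, total INTEGER)")
    db.executemany("INSERT INTO invoices VALUES (?, ?, ?)", [(1, 100, 5000), (2, 200, 7500)])
    return db


def find_invoice_unscoped(db: sqlite3.Connection, invoice_id: int):
    # Parameterised, typed, and free of taint flow, so every gate passes.
    # It still returns the row to any caller who knows or guesses the id.
    return db.execute(
        "SELECT id, owner_id, total FROM invoices WHERE id = ?", (invoice_id,)
    ).fetchone()


def find_invoice_for_user(db: sqlite3.Connection, invoice_id: int, user_id: int):
    # Same query, scoped to the caller: the authorization decision is in the code.
    return db.execute(
        "SELECT id, owner_id, total FROM invoices WHERE id = ? AND owner_id = ?",
        (invoice_id, user_id),
    ).fetchone()
```

Both functions look equally clean to a linter or a taint analyzer. Only a test or a reviewer who knows who is allowed to see invoice 2 can tell them apart.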