
How we update DNS records on your behalf, safely

When you click ‘Fix this for me’ on a DMARC, DKIM, or SPF issue, we don't write to your DNS immediately. We stage the change, email you a paper trail, wait five minutes, then apply — and verify the new value actually took effect. Within 24 hours you can undo the whole thing with one click. Here's why each piece of that flow exists.

Why "auto-fix" is dangerous if you do it wrong

DNS edits to your authoritative records affect delivery of every message sent from your domain. A wrong DMARC policy can quarantine real mail at receivers. A bad SPF record can make your mail look unauthorized. A typo in a DKIM TXT record means signatures fail verification, and every receiver treats your messages as if you hadn't signed them at all.

That's why most DMARC monitoring tools stop at "here's what's wrong" and leave the fix to you. The risk profile is asymmetric: the value of saving you a copy-paste is small, the cost of breaking your mail is huge. So they punt.

We think you should get the convenience without the risk — but only if the safety story is genuinely thought through. "Click to apply" with no guardrails is set-and-pray. We do four things differently.

Layer 1: A five-minute cancel window before anything is written

When you click Fix this for me, we don't make the API call immediately. The change goes into a queue with a scheduled apply time five minutes in the future. A background job wakes up every minute and applies whatever's due. If you cancel before that five-minute mark — from the dashboard or from the email link — the API call never happens.
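To make the mechanics concrete, here is a minimal Python sketch of a staged-change queue. It is an illustration under assumptions, not the production code: the `ChangeQueue` and `StagedChange` names, the in-memory dictionary, and the `publish` callback are all hypothetical stand-ins for the real queue table and DNS provider call.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

CANCEL_WINDOW = timedelta(minutes=5)  # the delay described above


@dataclass
class StagedChange:
    change_id: str
    record_name: str
    new_value: str
    apply_at: datetime
    status: str = "staged"  # staged -> applied | canceled


class ChangeQueue:
    """In-memory stand-in for a queue table (hypothetical)."""

    def __init__(self) -> None:
        self._changes: dict[str, StagedChange] = {}

    def stage(self, change_id: str, record_name: str, new_value: str,
              now: datetime) -> StagedChange:
        # Nothing is written to DNS here; we only record when it may happen.
        change = StagedChange(change_id, record_name, new_value, now + CANCEL_WINDOW)
        self._changes[change_id] = change
        return change

    def cancel(self, change_id: str) -> bool:
        # Only a change that is still waiting can be canceled.
        change = self._changes.get(change_id)
        if change is None or change.status != "staged":
            return False
        change.status = "canceled"
        return True

    def apply_due(self, now: datetime,
                  publish: Callable[[StagedChange], None]) -> list[str]:
        """Run by the once-a-minute job: publish every change that is due."""
        applied = []
        for change in self._changes.values():
            if change.status == "staged" and change.apply_at <= now:
                publish(change)
                change.status = "applied"
                applied.append(change.change_id)
        return applied
```

The point of the design is that a canceled change is simply skipped by the job, so the provider API is never called for it.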

Why five minutes specifically? It's the goldilocks window. Long enough that you'd notice an unexpected change in your inbox and have time to react. Short enough that the happy path still feels instant. We picked it deliberately and we don't shorten it. (You can sit and watch the countdown if you want; the dashboard shows it live.)

Layer 2: A paper-trail email with a one-click cancel

The moment a change is staged, the workspace owner gets an email with the full diff — what record we're touching, what it was, what it'll be, who clicked the button, and exactly when it applies. There's a big Cancel button that works without signing in.

This email is not a permission gate. You already authenticated to click the button; we're not going to make you click again. The email exists for two specific reasons:

  • Paper trail in your inbox. If anyone ever asks "did we really authorize that DNS change?" — there's a receipt with a timestamp, the diff, and your email address as initiator. Auditors love it; you'll probably never need it.
  • Out-of-band cancel for the awkward cases. If you click the button from your phone, walk away, then realize you meant a different domain — the email gives you a way to back out without scrambling to re-authenticate. Same logic if someone else with workspace access clicked something they shouldn't have.

The cancel link uses a one-time token tied to the specific change. It can only cancel that change. It can't be replayed for a different one.
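One common way to get those properties is a random token that is stored only as a hash, keyed to the change it belongs to, and deleted on first use. The Python sketch below shows that pattern; it is an assumption about how such a token could work, not a description of the actual implementation, and `CancelTokens` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


class CancelTokens:
    """Hypothetical in-memory store; a real one would live in a database."""

    def __init__(self) -> None:
        self._hash_by_change: dict[str, str] = {}

    def issue(self, change_id: str) -> str:
        token = secrets.token_urlsafe(32)
        # Keep only a hash so a leaked table does not expose usable links.
        self._hash_by_change[change_id] = hashlib.sha256(token.encode()).hexdigest()
        return token

    def redeem(self, change_id: str, token: str) -> bool:
        expected = self._hash_by_change.get(change_id)
        if expected is None:
            return False  # unknown change, or token already used
        presented = hashlib.sha256(token.encode()).hexdigest()
        if not hmac.compare_digest(expected, presented):
            return False  # token belongs to a different change
        del self._hash_by_change[change_id]  # one-time: a replay fails
        return True
```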

Layer 3: A 30-second read-back verification

Five minutes pass. The job wakes up, decrypts your DNS provider token, and makes the API call to publish the new record. The provider returns success.

That's not the same as your record actually being live and answerable on the public DNS. Cloudflare's edge has caching layers; resolvers around the world cache records for as long as the record's TTL says they can. So we wait 30 seconds and then query your record from Cloudflare's public DNS-over-HTTPS endpoint at 1.1.1.1, the same path a real receiving mail server would take, and compare what we see to what we wrote.

If the read-back matches: we log success and move on. If it doesn't match — DNS still serving the old value, or something completely different — we don't auto-rollback (a mismatch can be benign caching, and rolling back blindly creates a worse mess). Instead we write a readback_mismatch entry in your audit log and email the founder so a human can investigate. You see the warning on your dashboard within seconds.
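A read-back check along these lines can be sketched in Python with only the standard library. This is a hedged illustration, not the production check: it assumes Cloudflare's DoH JSON interface at `https://cloudflare-dns.com/dns-query` (requested with the `application/dns-json` accept header), and the function names are hypothetical. TXT answers come back wrapped in double quotes, and long values may be split into several quoted chunks, so the comparison normalizes them first.

```python
import json
import re
import urllib.parse
import urllib.request

DOH_URL = "https://cloudflare-dns.com/dns-query"
TXT_TYPE = 16  # DNS record type number for TXT


def fetch_txt_answers(name: str) -> list[str]:
    """Ask the public resolver for the TXT records at `name` (network call)."""
    query = urllib.parse.urlencode({"name": name, "type": "TXT"})
    request = urllib.request.Request(
        f"{DOH_URL}?{query}", headers={"accept": "application/dns-json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    return [a["data"] for a in body.get("Answer", []) if a.get("type") == TXT_TYPE]


def normalize_txt(data: str) -> str:
    """Join the quoted chunks of one TXT answer into a single string."""
    data = data.strip()
    if not data.startswith('"'):
        return data
    # A long value arrives as several chunks: "first part" "second part"
    chunks = re.findall(r'"((?:[^"\\]|\\.)*)"', data)
    return "".join(chunks)


def readback_matches(expected: str, answers: list[str]) -> bool:
    """True if any TXT record at the name equals the value that was written."""
    return any(normalize_txt(answer) == expected for answer in answers)
```

In use, the job would sleep for 30 seconds after the provider call, then run something like `readback_matches(new_value, fetch_txt_answers(record_name))` and write the audit entry based on the result. Checking with `any` matters at the apex, where unrelated TXT records sit next to the SPF record.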

Layer 4: A 24-hour undo button

After the change applies, the dashboard surfaces an Undo button for the next 24 hours. Clicking it stages a reversal — same five-minute window, same email, same read-back — that restores the previous value of the record. Audit log captures both directions.

24 hours is generous on purpose. "I noticed yesterday morning that something looked off" is a normal human cadence; we don't want the safety net to snap away while you're still figuring out whether anything actually broke.

One V1 limit worth knowing: if the original change created a record from nothing (the SPF create-from-scratch case, for example), Undo will tell you it can't roll back automatically. Our apply path can write records but doesn't delete them — that's a deliberate guardrail to keep us from ever accidentally erasing something we shouldn't. To remove a created record you delete it via your DNS provider directly.
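The undo rules reduce to two checks: is the change still inside the 24-hour window, and is there a previous value to restore? A minimal Python sketch follows, with hypothetical names (`AppliedChange`, `plan_undo`) and the assumption that a missing previous value is stored as `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

UNDO_WINDOW = timedelta(hours=24)


@dataclass
class AppliedChange:
    record_name: str
    old_value: Optional[str]  # None means the record did not exist before
    new_value: str
    applied_at: datetime


def plan_undo(change: AppliedChange, now: datetime) -> tuple[bool, str]:
    """Return (True, value_to_restore) or (False, reason_undo_is_unavailable)."""
    if now - change.applied_at > UNDO_WINDOW:
        return False, "undo window has expired"
    if change.old_value is None:
        # The apply path writes but never deletes, so a created record
        # cannot be rolled back automatically.
        return False, "record was created from nothing; delete it at your DNS provider"
    return True, change.old_value
```

When the plan succeeds, the restored value would be staged like any other change, with the same five-minute window, email, and read-back.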

The other guardrails you don't see

On top of the four visible layers there's a bunch of defensive plumbing underneath:

  • Defense-in-depth re-parse. Right before the API call, we re-parse the new value as the protocol it's supposed to be. A "DMARC fix" that doesn't start with v=DMARC1 never makes it out the door. A "DKIM publish" with an empty p= tag (the explicit revocation signal) is refused. An "SPF create" that doesn't start with v=spf1 is refused.
  • Apex multi-record awareness. Your domain's apex commonly has unrelated TXT records: Google site verification tokens, MX-related markers, vendor verifications. SPF create uses a primitive that adds the new record alongside them. DMARC and DKIM use a different primitive that treats their record name as single-record-per-name (because that's how the specs require it). The two paths are not interchangeable, and we picked the right one per fix type.
  • Rate limit. A maximum of five staged changes per workspace per 24 hours. Defends against accidental button-mash and gives us a circuit breaker if something in our scanning logic ever loops.
  • Token re-verification at stage time. Every fix re-verifies your DNS provider token before queuing. If you've revoked it at the provider (which is your right and we can't prevent), the change is refused immediately — never staged into a later failure.
  • Audit log on every transition. dns_change.staged, dns_change.canceled, dns_change.applied, dns_change.failed, dns_change.rolled_back, dns_change.readback_mismatch — each carries the change ID, the old/new values, and a timestamp. The trail is complete even if the underlying queue row is later purged.
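The re-parse guardrail from the first bullet can be illustrated with a short Python sketch. It is a simplified assumption of what such a check might look like, not the real parser: the `fix_type` strings and function names are hypothetical, and the sketch also refuses a DKIM record with no p= tag at all, which goes one step beyond the empty-tag case described above.

```python
def parse_tags(value: str) -> dict[str, str]:
    """Split a 'tag=value; tag=value' record (DMARC or DKIM style) into a dict."""
    tags = {}
    for part in value.split(";"):
        if "=" in part:
            # Split on the first '=' only; base64 keys can end in '=' padding.
            key, _, val = part.partition("=")
            tags[key.strip()] = val.strip()
    return tags


def check_before_publish(fix_type: str, value: str) -> None:
    """Raise ValueError if `value` is not a plausible record for `fix_type`."""
    stripped = value.strip()
    if fix_type == "dmarc":
        if not stripped.startswith("v=DMARC1"):
            raise ValueError("DMARC record must start with v=DMARC1")
    elif fix_type == "dkim":
        # An empty p= revokes the key, so publishing one is never a "fix".
        if parse_tags(stripped).get("p", "") == "":
            raise ValueError("DKIM record has an empty or missing p= tag")
    elif fix_type == "spf":
        if not stripped.startswith("v=spf1"):
            raise ValueError("SPF record must start with v=spf1")
    else:
        raise ValueError(f"unknown fix type: {fix_type}")
```

Running the check immediately before the provider call, rather than only when the fix is generated, means a value corrupted anywhere in between is still caught.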

What we don't do, and why

We don't apply silently in the background

Every change has a human in the loop twice: you clicked the button, and you got the email. We don't run a "we noticed your DKIM key is 1024-bit, we rotated it for you" cron. The model is one-click convenience, not autonomous agent.

We don't shorten the delay for "low-stakes" fixes

Every fix runs through the same five-minute window. There's no "expert mode" that skips safety: first, because the wrong call about what's "low-stakes" tends to be the call that costs you a customer; second, because uniform timing means the safety story is something you can describe in one sentence to anyone asking how it works.

We don't auto-rollback on a read-back mismatch

A mismatch can mean "DNS hasn't propagated yet" (benign) or "we wrote to the wrong record" (urgent). Auto-rollback would make benign cases churn — and a churn cycle on your DMARC record is far worse than the original mismatch. We log loudly and let a human decide.

We don't store your provider token in plaintext, ever

When you paste your Cloudflare API token, it's encrypted with AES-256-GCM before it touches our database. The encryption key lives only in our Worker Secrets, not in any commit. Decrypt happens at apply time, in memory, and the plaintext is never logged. You can revoke the token at Cloudflare any time — the next apply attempt will fail cleanly and we'll email you to reconnect.

What this actually feels like in practice

On a healthy domain, the whole flow is a button click, a confirmation in your inbox, and a green "applied" card on the dashboard about five minutes later. You probably won't read the email; you definitely won't need to click the cancel link. The safety layers exist to be there the day you need them.

That day might be when a teammate clicks the button on the wrong domain. Or when you're testing on a staging zone and forget which tab you have open. Or when something downstream broke and you need to roll back fast. The cost of having the layers when you don't need them is approximately zero. The cost of not having them when you do is your whole day.
