Learn how to design an email risk score by combining syntax, domain health, disposable flags, and past outcomes into a simple, usable rubric.

An email risk score is a simple number (or a label like low, medium, high) that estimates how likely an email address is to create problems for your business. “Problems” usually means fake signups, bounced emails, spam complaints, or users you can’t reach again.
It’s not a verdict on whether someone is “good” or “bad.” It’s a quick, consistent summary of signals you already have.
A score helps when pass/fail is too blunt. Many addresses look valid on the surface but still carry risk. An email can have correct syntax and a real domain, yet still be disposable, misconfigured, or tied to patterns that previously led to abuse.
With a risk score, you can make consistent decisions without turning every edge case into a debate. Typical actions look like: allow the signup with no friction, allow it but require verification first, or send it to review or block it outright.
Explainability matters. Non-technical teams should be able to answer, “Why was this high risk?” Aim for plain reasons like “disposable provider,” “domain can’t receive mail,” or “failed domain checks.”
A simple way to keep everyone aligned is to tie each score band to a clear policy. For example, 0-30 means low risk and auto-approve, 31-70 means moderate risk and require verification, and 71-100 means high risk and block or review. The goal isn’t a perfect number. The goal is a decision you can explain, measure, and adjust.
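The band-to-policy mapping above can be sketched as a tiny function. The name `band_for` and the return shape are choices made for this sketch; the 0-30 / 31-70 / 71-100 cutoffs are the example bands from the text, not fixed rules.

```python
def band_for(score: int) -> tuple[str, str]:
    """Return (band, action) for a 0-100 risk score.

    Cutoffs mirror the example policy: 0-30 low, 31-70 moderate,
    71-100 high. Adjust them to your own outcome data.
    """
    if score <= 30:
        return ("low", "auto-approve")
    if score <= 70:
        return ("moderate", "require verification")
    return ("high", "block or review")
```

Keeping the mapping in one small function makes the policy easy to change without touching the scoring logic.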
Start with a small set of signals that are easy to explain and hard to game. You can always add more later.
Begin with strict syntax checks. This is more than “has an @”. RFC-style parsing catches missing parts, illegal characters, double dots, and tricky formats that many systems mishandle. Syntax checks are cheap and stop obvious junk early, but they don’t tell you whether the mailbox can actually receive mail.
Next, check domain health. Two checks do most of the work: does the domain exist (DNS resolves), and does it publish MX records. MX isn’t perfect (some domains accept mail without it), but it’s a strong reachability clue. A brand-new domain with no mail setup is often higher risk at signup.
Disposable provider flags matter most at account creation. Disposable inboxes show up in free-trial abuse, coupon hunting, and fake lead capture. You don’t always need to block them, but you should score them.
You can also include reputation-style signals, carefully. Blocklists and spam-trap indicators can reduce bounces, but false positives happen. Treat these as high-confidence inputs only, and prefer softer actions (like extra verification) over hard blocks.
Finally, add context you already have. Email alone rarely tells the whole story. Useful inputs include where the signup came from, unusual signup velocity, repeated patterns, and what happened with similar signups before.
Example: during a promotion, a syntactically valid address on a domain with no MX plus a disposable flag and high signup velocity is a stronger signal than any one of those checks alone.
A risk score only works if everyone agrees on what “risk” means. Start by writing down the decision it supports. Do you want to block bad signups, reduce bounces, or cut support load? If the score tries to do everything at once, it becomes confusing and hard to tune.
Deliverability risk and fraud risk are related, but not the same.
Deliverability risk is about whether you can reach the address. It maps to outcomes like hard bounces, repeated soft bounces, or damage to sender reputation.
Fraud or abuse risk is about the user behind the address. It maps to outcomes like chargebacks, coupon abuse, fake accounts, spam reports, or unusually low lifetime value.
You can handle this in two simple ways: keep two separate scores, or keep one score but label it clearly (for example, “signup abuse risk”).
Pick one primary outcome to predict, then add secondary ones later. Good starting targets include hard bounce rate, spam complaint rate, and confirmed abuse such as fake accounts or chargebacks.
Then decide what a false positive costs you. If you block a real customer, you lose revenue and trust. If you let a risky signup through, you might pay in chargebacks, support time, or deliverability damage. Your tolerance for those tradeoffs should shape thresholds.
Finally, choose a score range and what it means. A 0-100 scale is easy to communicate, but only if you define bands like 0-24 low risk, 25-59 medium, 60-100 high. Tie each band to an action so the score is more than a number.
A risk score is only useful if people trust it. That means you should be able to answer two questions in plain words: why did this signup get this score, and what should we do next?
A points-based rubric is usually the easiest place to start. It behaves like a checklist with a total score, and it’s simple to write into a policy doc. A weighted average or a small model can be more precise, but it’s harder to explain when someone asks, “Why was this blocked?”
Here’s a simple weighting approach using four core signals: syntax, domain health, disposable flags, and historical outcomes. Keep the math boring and the outcomes clear: give each signal a maximum point value so the four maximums add up to 100. In this setup, a higher score means higher risk.
Missing data will happen, especially with DNS timeouts or temporary lookup failures. Decide upfront whether “unknown” should be neutral, mildly risky, or a reason to retry; a common default is to retry once, then apply a small penalty if the lookup still fails.
Calibrate for your product. A B2B invite flow can be stricter on domain health and MX (company emails are expected). A consumer signup can allow more free email domains but should be tougher on disposable flags.
A good score starts by making messy signals comparable. Don’t try to score every tiny detail. Convert each signal into a few clear buckets that a non-technical teammate can recognize and explain.
Pick 3 to 4 buckets per signal. For example: syntax (valid, questionable, invalid), domain health (healthy, unknown, broken), disposable flag (no, maybe, yes), and historical outcomes (good history, mixed, bad). Keep the bucket names plain.
Then assign points with a simple pattern: good gets low points, unsure gets medium, bad gets high. If disposable emails are a top abuse driver for you, give that signal more weight than minor syntax quirks.
A practical flow: convert each raw signal into its bucket, assign points to each bucket, sum the points into a 0-100 score, then map the score to a band and its action.
The score only matters if it drives a consistent action. Keep the bands few and obvious: low risk is allowed through, medium risk gets verification or other light friction, and high risk is blocked or sent to review.
Logging isn’t optional. Save the input buckets (not just the final score) and the action taken. If you use an email validation API, store the key outputs you relied on so you can compare scores to real outcomes later.
A simple email risk score works best when it’s easy to explain to support, fraud, and marketing. Start with 0 points (lowest risk) and add points when a signal increases the chance the address will bounce, be fake, or lead to abuse.
| Signal | Rule | Points |
|---|---|---|
| Syntax | RFC-valid | +0 |
| Syntax | Invalid or suspicious (extra dots, bad quoting) | +40 |
| Domain exists | Domain resolves | +0 |
| Domain exists | NXDOMAIN / cannot resolve | +30 |
| MX records | MX present | +0 |
| MX records | No MX | +25 |
| Disposable flag | Not disposable | +0 |
| Disposable flag | Disposable / temp provider | +35 |
| Domain health | Normal sender domain reputation | +0 |
| Domain health | Newly seen or high complaint history | +15 |
| Historical outcomes | Past bounces for this domain/user pattern | +20 |
Edge case guidance: if syntax is valid and the domain resolves but there is no MX, treat it as medium risk by default, not an auto-block. Some domains are misconfigured temporarily, and you don’t want false declines.
To keep the rubric stable as you add new signals, cap the impact of any new signal (for example, +10 to +15), and only change thresholds after you have outcome data.
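The rubric table above translates directly into code. This is a minimal sketch: the flag names are assumptions chosen for readability, and the total is capped at 100 so the bands stay meaningful even when several penalties stack.

```python
# Points and reasons copied from the rubric table; flag names are
# illustrative, not from any particular validation API.
RUBRIC = {
    "syntax_invalid":        (40, "invalid or suspicious syntax"),
    "domain_unresolved":     (30, "domain cannot resolve (NXDOMAIN)"),
    "no_mx":                 (25, "no MX records"),
    "disposable":            (35, "disposable / temp provider"),
    "bad_domain_reputation": (15, "newly seen or high complaint history"),
    "past_bounces":          (20, "past bounces for this domain/user pattern"),
}

def score_email(flags: set[str]) -> tuple[int, list[str]]:
    """Sum rubric points for every flag that fired; return score + plain reasons."""
    score, reasons = 0, []
    for flag in sorted(flags):        # sorted so reason order is stable
        points, reason = RUBRIC[flag]
        score += points
        reasons.append(reason)
    return min(score, 100), reasons   # cap at 100 to keep bands meaningful
```

Returning the reasons alongside the number is what keeps the score explainable: the list is exactly the “plain reasons” a support teammate can read back.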
A big promo can change your traffic overnight. You get more real customers, but you also attract bots, coupon hunters, and people using throwaway addresses to grab the offer and disappear. A risk score helps you decide who can sign up smoothly and who should face an extra check.
Assume your scoring range is 0 to 100 (higher = riskier). You run email validation signals (syntax, domain and MX checks, disposable flags, and your own past outcomes) through one pipeline, then assign points.
Here are two signups from the same campaign:
| Email | Signal summary | Points | Total | Decision |
|---|---|---|---|---|
| [email protected] | Clean syntax, domain resolves, MX present, not disposable, domain has low bounce history | +0, +0, +0, +0, +5 | 5 | Allow, no friction |
| [email protected] | Syntax OK, domain resolves, MX present, disposable flagged, domain has high past bounce and complaint rates | +0, +0, +0, +35, +35 | 70 | Add verification |
For the second signup, “add verification” can be lightweight: email OTP, magic link, or requiring the user to verify before redeeming the promo. You’re not blocking everyone. You’re adding speed bumps only where signals say it’s worth it.
Over time, adjust weights using outcomes, not guesses. If disposable-flagged emails still convert and stay active, reduce that penalty. If a specific domain keeps producing bounces, chargebacks, or spam complaints, increase the domain-history points.
The fastest way to lose trust in a risk score is to treat one signal as a verdict. Email signals are messy: networks glitch, domains get misconfigured, and real people sometimes use addresses that look suspicious.
Disposable detection is useful, but it’s not the same as fraud. If you overweight it, you’ll block real users who just want privacy or a quick trial. A safer approach is to score it heavily, then pair it with context like signup velocity, payment intent, or whether the address passes domain and MX checks.
DNS timeouts happen. Don’t score a timeout the same as “domain does not exist.” Keep a separate bucket for “unknown right now,” retry once, and only increase risk slightly unless other signals agree.
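The retry-once guidance can be sketched as a small wrapper. Here `lookup` is any callable you supply that returns True/False or raises `TimeoutError`; the function name and the size of the penalty are assumptions for this sketch.

```python
def checked_lookup(lookup, domain: str, retries: int = 1) -> str:
    """Run a DNS-style lookup with limited retries.

    Returns "ok", "missing", or "unknown" -- never treating a
    timeout the same as "domain does not exist".
    """
    for _ in range(retries + 1):
        try:
            return "ok" if lookup(domain) else "missing"
        except TimeoutError:
            continue               # transient failure: try again
    return "unknown"               # still failing: score it mildly, not as broken

UNKNOWN_PENALTY = 5  # assumed small nudge, far below the +30 for a confirmed NXDOMAIN
```

Only the confirmed "missing" result should carry the full NXDOMAIN weight; "unknown" gets the small penalty unless other signals agree.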
It’s easy to double count the same idea. For example, “no MX record” and “domain verification failed” can be two views of the same problem. If you add both at full weight, you inflate risk without improving accuracy. Pick the clearest version, or reduce weights when signals overlap.
As campaigns change and attackers adapt, yesterday’s weights stop matching reality. Review outcomes (bounces, complaints, chargebacks, activation rates) on a schedule and watch for sudden shifts.
Keep testing and staging traffic out of your scoring data. Seed data like [email protected], QA scripts, and internal domains can poison your “good vs bad” outcomes.
A practical prevention checklist: never let a single signal decide on its own, keep “unknown” separate from “failed,” collapse overlapping signals instead of double counting them, review weights against real outcomes on a schedule, and keep test and internal traffic out of production data.
A score is only useful if it matches what happens later. Treat your risk score like a prediction you can check.
Start by collecting outcomes tied to the signup: delivered vs bounced, complaints, account abuse, chargebacks, support flags, and any confirmed fraud.
Pick a few score bands (for example: Low, Medium, High) and review them on a fixed cadence. Weekly is usually enough at first; daily helps when you’re running big campaigns or changing policies.
Look for separation: high-risk should have noticeably worse outcomes than low-risk. Track a small set of metrics consistently, such as bounce rate, complaint rate, and confirmed abuse or chargeback rate per band.
If the bands look similar, your signals aren’t pulling their weight or your thresholds are off. For example, if “High” risk has the same bounce rate as “Low”, you’re either being too strict on harmless signals (like minor syntax quirks) or too lenient on strong signals (like disposable detection).
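Checking band separation is a small aggregation over your logged outcomes. This sketch assumes records shaped as `(band, bounced)` pairs pulled from your decision logs; the same pattern works for complaints or chargebacks.

```python
from collections import defaultdict

def bounce_rate_by_band(records) -> dict:
    """Compute bounce rate per score band from (band, bounced) pairs.

    bounced is 1 if the signup's email later hard-bounced, else 0.
    """
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for band, bounced in records:
        totals[band] += 1
        bounces[band] += bounced
    return {band: bounces[band] / totals[band] for band in totals}
```

If the rates come back similar across bands, that is the signal that your thresholds or weights need attention.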
Change one thing at a time, and make the change measurable. The safest knob is usually the threshold, not the whole model.
Run an A/B test (or a small holdout) before rolling out new cutoffs. Example: for one week, block only the top 1% highest-risk signups in the test group, while the control uses your current policy. Compare bounce, abuse, and good-user loss.
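For the holdout itself, a deterministic hash-based assignment keeps the same signup in the same group across retries. The salt and the `share` parameter (the fraction of traffic in the test group) are assumptions for this sketch.

```python
import hashlib

def in_test_group(request_id: str, share: float = 0.1, salt: str = "promo-holdout") -> bool:
    """Deterministically assign a signup to the test group.

    Hashing (salt + request_id) means the assignment is stable and
    reproducible without storing a separate assignment table.
    """
    digest = hashlib.sha256(f"{salt}:{request_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF < share
```

Changing the salt reshuffles the groups, which is useful when you start a fresh experiment.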
Most rollout problems come from unclear actions, missing guardrails, or noisy inputs.
Map each score band to a single action. If two people on your team would pick different actions for the same score, the policy isn’t ready.
If you want a fourth band during campaigns, add it explicitly (and keep it operationally simple): very high risk can be throttled or blocked.
A scoring rubric only helps if you can operate it day to day. Treat your risk score like any other decision system: log enough to debug it, explain it to humans, and keep it stable under real traffic.
Start with logging so you can recreate what happened for any signup. Capture the raw inputs (syntax result, domain/MX checks, disposable flag, any blocklist match), the final score, and the action you took (allow, friction, review, block). Store a request ID so support can find the full story fast.
Make the score explainable in plain words. Alongside the numeric score, keep a short reason string that someone can read in five seconds. For example: “Disposable provider + failed MX lookup, score 82, blocked.”
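One possible shape for such a log entry, as a sketch; the field names are assumptions, not a standard schema, and the email is stored hashed rather than raw.

```python
import time
import uuid

def decision_record(email_hash: str, inputs: dict, score: int,
                    action: str, reasons: list) -> dict:
    """Build one decision-log entry that support can read in five seconds."""
    return {
        "request_id": str(uuid.uuid4()),  # lets support find the full story fast
        "ts": int(time.time()),
        "email_hash": email_hash,         # hashed identifier, not the raw address
        "inputs": inputs,                 # e.g. {"syntax": "valid", "mx": "none"}
        "score": score,
        "action": action,
        "reason": " + ".join(reasons),    # the short human-readable string
    }
```

Storing the input buckets alongside the score is what makes it possible to recreate any decision later.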
Email checks can fail for reasons that have nothing to do with the user. Plan for timeouts, limited retries, rate limits, and safe fallbacks. If checks fail, return a conservative “unknown” state instead of making the score jump around.
Email logs are sensitive. Keep only what you need, restrict access, and set a retention window (for example, 30 to 90 days for raw signals). If you need longer-term analytics, store aggregates or hashed identifiers instead of full addresses.
Start small. Your first score should be a clear rubric that a teammate can read and apply without a spreadsheet. If you can’t explain why a signup got a 72 instead of a 28, you won’t trust it when it matters.
Ship a few signals you understand, then tune only after you have real outcomes (bounces, chargebacks, abuse reports, successful activations). Keep actions simple so they’re easy to run: allow, allow with verification, or block and review.
Implementation is easier when core signals arrive in one place. For example, Verimail (verimail.co) provides an email validation API that returns checks like RFC-compliant syntax validation, domain verification, MX lookup, and disposable and blocklist matching in a single response. Use those results as inputs to your rubric, then keep the decision rules in your own policy so they stay easy to explain and change.
Once it’s live, measure it like a product feature. Log the score, the band, the decision you took, and the outcome you observed later. Review a small sample of false positives (blocked but legitimate) and false negatives (allowed but harmful), then adjust one rule at a time. The simplest version you monitor and update beats a complicated model nobody can explain.
An email risk score is a quick summary of how likely an address is to cause problems like bounces, fake signups, or abuse. It helps you make consistent decisions when a simple pass/fail “valid email” check isn’t enough.
Email validation answers “can this email probably receive mail?” while a risk score answers “how risky is it to accept this signup right now?” You can have an address that looks deliverable but still scores as risky because it’s disposable or matches patterns linked to abuse.
Start with a small set that’s easy to explain: RFC-style syntax results, whether the domain resolves, whether MX records are present, and whether the address matches known disposable providers. Add historical outcomes later once you can measure what actually happened after signup.
Use a few clear bands tied to actions, like allow, allow with verification, and block or review. Keep the bands stable at first, then adjust thresholds based on observed outcomes such as bounce rate, complaints, or abuse events.
Store simple reason strings alongside the score, such as “disposable provider” or “domain cannot receive mail.” If a teammate can’t answer “why did we block this?” in one sentence, the model is too hard to operate.
Treat timeouts as “unknown” and retry once before scoring. If it’s still unknown, add a small penalty rather than treating it like a hard failure, because temporary DNS issues shouldn’t permanently label real users as high risk.
Disposable detection is very useful, but it shouldn’t automatically mean “block.” A common default is to increase the score and require email verification or limit high-value actions, then tighten or relax based on your conversion and abuse data.
Deliverability risk is about reachability and bounces, while fraud risk is about harmful user behavior like chargebacks or coupon abuse. If you mix them without labeling, teams will argue about what the number means, so either keep two scores or clearly name the single score’s purpose.
Log the inputs you relied on, the final score, the band, the action taken, and a short reason so you can debug decisions later. Keep retention limited and access restricted because email logs are sensitive, and consider storing only what you need for operations and measurement.
Tune using outcomes, not gut feel, by checking whether high-risk signups actually have worse bounce or abuse rates than low-risk ones. Change one thing at a time—often just the threshold—then compare results with a small holdout or A/B test so you can see the tradeoff in false blocks versus prevented abuse.