We sampled 1,000 residential IPs from 25 of the world's largest residential ASNs through Databay's network and ran each through six free reputation sources plus a Databay-controlled Cloudflare Free origin. The malice-focused databases flagged zero. Spamhaus's ZEN aggregate flagged 80.3% of the IPv4 sample, but the dominant response code was 127.0.0.10: PBL, the residential designation, not a malicious flag. Cloudflare Free's cf.threat_score returned 0 across all 1,000 captures. Free reputation feeds don't measure what scrapers think they measure.
The Block You Can't Explain
We checked 1,000 residential IPs from 25 of the world's largest residential ASNs against the public reputation feeds on May 7, 2026. Zero of those 1,000 IPs were on the Tor exit list. Zero were on Spamhaus DROP. Zero of the 25 ASNs were on Spamhaus's ASN-level drop list. By those three feeds, residential proxies are pristine.
Spamhaus's ZEN aggregate, which bundles the SBL, XBL, and Policy Block List (PBL) zones into one lookup, flagged 501 of 624 IPv4 captures, 80.3%. Inspection of the response codes shows the dominant value is 127.0.0.10, the PBL code. PBL means "don't accept unauthenticated SMTP from this IP," a designation specifically intended to mark consumer broadband. The IP is residential. PBL is doing its job by saying so. Cloudflare doesn't subscribe to ZEN for HTTP scoring.
The mismatch you're seeing isn't a database that disagrees with reality. It's six databases that publish data with one intent and a generation of scrapers reading it with another. Free reputation feeds tell you whether an IP has been reported sending spam, attacking honeypots, or running a Tor exit. They don't tell you what protected sites do when a residential IP shows up at scale. The TLS fingerprint matters first, as the TLS fingerprinting post argued in detail. The IP comes after. And the IP's free-database reputation tells you very little about that step.
What "IP Reputation" Actually Means in 2026
Route-level reputation is maintained by network operators. The Spamhaus Project (founded 1998 in the UK; led for years by Steve Linford) publishes DROP and the IPv6 equivalent dropv6.json, listing CIDR ranges that are wholly bad: hijacked routes, criminal hosting, networks that should not be reachable for any non-research purpose. Spamhaus's separate ASN drop list does the same at the BGP origin (404 ASNs flagged on May 7, 2026). Network operators ingest these at peer and upstream borders and discard the traffic. None of the 25 residential ASNs we tested appear on the ASN drop list. None of the 1,000 IPs appear on DROP. Route-level lists almost never flag consumer broadband; the false-positive cost is too high for a tier-1 to absorb.
Per-IP behavioural reputation is what most readers mean when they say "IP reputation." AbuseIPDB (Marathon Studios, 2010, founded by Marc Schmidt) is the largest free crowd-sourced abuse-report database; webmasters and honeypot operators submit reports and the platform aggregates a 0-100 confidence score per IP. Project Honey Pot (Unspam, 2004) does the same with a comment-spam emphasis. GreyNoise (founded 2017 by Andrew Morris) observes internet-wide scanning and classifies IPs as benign, malicious, or unknown based on what they probe unsolicited. These are activity-based. They tell you if an IP has been reported attacking somewhere recently, not whether it's structurally a proxy. We didn't use any of the three in this run; their free tiers all require API-key registration. The absence is a known methodology gap, disclosed in §10.
Policy-level designation is the third layer and the one most readers conflate with the second. Spamhaus's PBL (Policy Block List), ip2location's is_proxy flag, and similar feeds make claims about what an IP is, structurally, not what it does. Most of consumer broadband sits on PBL by design. PBL doesn't say "this IP is bad." It says "this IP shouldn't be running an open SMTP relay." Email servers ingest it for that exact purpose. HTTP scrapers reading PBL as a bot-detection signal are reading a hammer as a screwdriver.
Anti-bot products run their own internal reputation systems that aren't exposed through free APIs. Cloudflare Bot Management, Akamai Bot Manager, and DataDome all publish their methodologies; none publish their per-IP scores without enterprise contracts. Cloudflare's Free tier exposes a legacy cf.threat_score field that returned 0 for all 1,000 of our captures on May 7, 2026. The reputation engine moved into the paid product. We come back to that in §6.
How Each Free Database Actually Gets Its Data
Spamhaus DROP / dropv6. Spamhaus aggregates BGP-route-level threat intelligence into free downloadable JSONs. DROP listed 1,626 IPv4 CIDRs on capture day; dropv6.json listed 95 IPv6 ranges; the EDROP designation is now merged into DROP. The lists are designed to be ingested at the BGP edge, not as per-IP lookups. Major residential ASNs almost never appear because consumer broadband is too noisy to characterise wholesale; specific compromised customer ranges occasionally do. Across our 1,000 IPs, zero matched either list. Spamhaus's separate ASN drop list (404 ASNs flagged that day) similarly excluded all 25 of ours.
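Because DROP is a list of CIDR ranges rather than a per-IP service, checking a captured address against it comes down to a prefix comparison. The following is a minimal sketch, assuming the CIDR strings have already been pulled out of the downloaded JSON. The function names are illustrative, and the sample ranges used with it should be documentation prefixes, not real DROP entries.

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error(`not an IPv4 address: ${ip}`);
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// True when `ip` falls inside `cidr` (for example "192.0.2.0/24").
function ipv4InCidr(ip, cidr) {
  const [base, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`bad prefix length in: ${cidr}`);
  }
  // A /0 mask is handled separately because JavaScript shift counts wrap at 32.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Return the first listed range that contains `ip`, or null when none does.
function findDropMatch(ip, cidrs) {
  return cidrs.find((cidr) => ipv4InCidr(ip, cidr)) || null;
}
```

A linear scan is adequate for a list of this size (1,626 ranges on capture day); a sorted interval structure would only matter at much larger volumes.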
DroneBL DNSBL. A free DNS-based blocklist focused on infected hosts, IRC drone networks, and SOCKS proxies. Each IPv4 lookup is <reverse-octets>.dnsbl.dronebl.org. NXDOMAIN means not listed; an A record means listed, with the response code carrying the reason. DroneBL is IPv4-only.
Spamhaus ZEN. The aggregate of SBL (Spamhaus Block List), XBL (Exploits Block List), and PBL (Policy Block List), queried as <reverse-octets>.zen.spamhaus.org. Each subzone returns a different 127.0.0.x octet so you can tell which list a hit came from. Codes 127.0.0.2 and 127.0.0.3 are SBL (known spam sources). Codes 127.0.0.4 through 127.0.0.7 are XBL (exploited or open-relay hosts). Codes 127.0.0.10 and 127.0.0.11 are PBL, the residential and policy designation.
Of the 624 IPv4 IPs we queried, 501 returned a hit on ZEN, 80.3%. The dominant response code was 127.0.0.10. Reading PBL as a bot-detection signal is a category error, but it's the category error most home-rolled IP-reputation pipelines actually make.
ip2location LITE PX2. A free monthly CSV from ip2location.com classifying IPv4 ranges as proxy / VPN / Tor / public-proxy / data-centre. Account registration required to download. Not present in our run; this is the second methodology gap.
RIPEstat. Free, no-auth ASN context (holder name, abuse contact, allocation history). Used to sanity-check the carrier name claim per ASN. Not a reputation signal.
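The DNSBL query format and the ZEN response codes described above can be sketched as two small helpers. This is an illustration under assumptions, not the harness used for the capture: the function names are hypothetical, the code ranges are the ones listed in this section, and any answer outside those ranges is deliberately left unclassified instead of being counted as a hit.

```javascript
// Build a DNSBL query name: the IPv4 octets reversed, then the zone.
function dnsblQueryName(ipv4, zone) {
  const octets = ipv4.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) throw new Error(`not an IPv4 address: ${ipv4}`);
  return `${octets.reverse().join(".")}.${zone}`;
}

// Map one ZEN answer to a list name, using the ranges given in the text.
function classifyZenCode(address) {
  const match = /^127\.0\.0\.(\d{1,3})$/.exec(address);
  if (!match) return "unrecognised";
  const last = Number(match[1]);
  if (last === 2 || last === 3) return "SBL";
  if (last >= 4 && last <= 7) return "XBL";
  if (last === 10 || last === 11) return "PBL";
  return "unrecognised";
}

// Summarise all A records returned for one IP.
// An empty array stands for NXDOMAIN, meaning not listed.
function summariseZen(addresses) {
  const lists = [...new Set(addresses.map(classifyZenCode))].sort();
  const recognised = lists.filter((name) => name !== "unrecognised");
  return {
    listed: recognised.length > 0,
    lists,
    // PBL alone marks consumer broadband, not a malicious source.
    pblOnly: lists.length === 1 && lists[0] === "PBL",
  };
}
```

In Node.js the name produced by `dnsblQueryName` could be resolved with `dns.promises.resolve4`. Separating `pblOnly` from `listed` is the point of the sketch: a pipeline that only records "listed" reproduces the category error this section describes.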
The Corpus: 25 ASNs, 1,000 IPs
Each capture was routed through Databay's proxy network to iprep.databay.uk to read back what the network looked like from the destination side.
The candidate list was the world's top 30 residential carriers by global subscriber count, drawn from TeleGeography 2025 and BGP.tools rankings. Each candidate was probed with up to 100 requests through the ASN-filtered proxy until 40 unique exit IPs were collected; the ASN qualified at that point and we moved to the next candidate. Twenty-five qualified.
Seven candidates did not qualify in our pool at this date and are part of the published data, not a footnote. AS701 Verizon: zero unique exits in 100 requests. AS21928 T-Mobile US: 17 unique in 100. AS4713 NTT: insufficient. AS9498 Bharti Airtel (the original Indian Airtel ASN, not AS24560 Bharti Airtel India): insufficient. AS22773 Cox, AS4134 China Telecom, AS4837 China Unicom: each fell short of 40 unique. We added AS5378 Vodafone UK and AS12389 Rostelecom from the fallback list to round out 25.
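The qualification rule (up to 100 requests, stop at 40 unique exits) can be expressed as a short loop. This is a sketch of the rule as described, not the harness itself: `qualifyAsn` and its option names are hypothetical, and the input is assumed to be an iterable that yields one observed exit IP per request.

```javascript
// Consume the exit IPs observed for one ASN, one per request, and decide
// whether the ASN qualifies under the stated rule.
function qualifyAsn(observedExitIps, { target = 40, maxRequests = 100 } = {}) {
  const unique = new Set();
  let requests = 0;
  for (const ip of observedExitIps) {
    if (requests >= maxRequests) break; // request budget exhausted
    requests += 1;
    unique.add(ip);
    if (unique.size >= target) break; // enough unique exits collected
  }
  return {
    qualified: unique.size >= target,
    requests,
    uniqueExits: [...unique],
  };
}
```

A pool that keeps returning the same small set of exits uses the full request budget and fails, which is the pattern behind the candidates listed above.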
The IPv4 / IPv6 split: 624 IPv4 captures and 376 IPv6 (37.6% IPv6). European carriers tilt heavily IPv6. With 40 captures per ASN, the IPv4 counts in the leaderboard imply that Free / Iliad returned IPv6 for 21 captures, Vodafone DE for 20, and Orange for 15, and KDDI returned a large IPv6 share too (26 of 40). IPv6 captures are excluded from the IPv4-only databases (Spamhaus DROP v4, both DNSBLs, ip2location LITE PX2). They appear in the v6 DROP list, the ASN drop list, the Tor list, and the Cloudflare cross-check.
Databay's proxy network is the measurement instrument for this experiment, not the subject. The post is about the ASNs.
The Leaderboard
Twenty-five rows, twenty-five ASNs. Three of the database columns are flat zero across the whole table: Tor exit list, Spamhaus DROP, Spamhaus ASN drop. Those columns stay in the table because the absence of hits is the headline. The DNSBL column (DroneBL plus Spamhaus ZEN combined) is where all the variance appears, and we know from the response codes that the variance is dominated by PBL hits, not malicious-source hits.
| ASN | Carrier | Country | IPv4 sample | Tor | Spamhaus DROP | ASN drop | DNSBL flagged |
|---|---|---|---|---|---|---|---|
| AS7922 | Comcast | US | 26 | 0% | 0% | 0% | 22/26 (84.6%) |
| AS7018 | AT&T | US | 24 | 0% | 0% | 0% | 6/24 (25.0%) |
| AS20115 | Charter / Spectrum | US | 25 | 0% | 0% | 0% | 21/25 (84.0%) |
| AS3320 | Deutsche Telekom | DE | 23 | 0% | 0% | 0% | 23/23 (100.0%) |
| AS2856 | BT | GB | 25 | 0% | 0% | 0% | 17/25 (68.0%) |
| AS3215 | Orange | FR | 25 | 0% | 0% | 0% | 23/25 (92.0%) |
| AS3352 | Telefonica Spain | ES | 39 | 0% | 0% | 0% | 37/39 (94.9%) |
| AS12322 | Free / Iliad | FR | 19 | 0% | 0% | 0% | 8/19 (42.1%) |
| AS3209 | Vodafone DE | DE | 20 | 0% | 0% | 0% | 18/20 (90.0%) |
| AS2516 | KDDI | JP | 14 | 0% | 0% | 0% | 6/14 (42.9%) |
| AS577 | Bell Canada | CA | 37 | 0% | 0% | 0% | 13/37 (35.1%) |
| AS852 | Telus | CA | 9 | 0% | 0% | 0% | 3/9 (33.3%) |
| AS1221 | Telstra | AU | 25 | 0% | 0% | 0% | 22/25 (88.0%) |
| AS5607 | Sky Broadband | GB | 21 | 0% | 0% | 0% | 21/21 (100.0%) |
| AS5089 | Virgin Media | GB | 40 | 0% | 0% | 0% | 38/40 (95.0%) |
| AS55836 | Reliance Jio | IN | 31 | 0% | 0% | 0% | 31/31 (100.0%) |
| AS8151 | Telmex | MX | 12 | 0% | 0% | 0% | 12/12 (100.0%) |
| AS27699 | Telefonica BR | BR | 11 | 0% | 0% | 0% | 11/11 (100.0%) |
| AS1136 | KPN | NL | 30 | 0% | 0% | 0% | 18/30 (60.0%) |
| AS17676 | SoftBank | JP | 26 | 0% | 0% | 0% | 21/26 (80.8%) |
| AS11427 | Spectrum Texas | US | 34 | 0% | 0% | 0% | 32/34 (94.1%) |
| AS24560 | Bharti Airtel India | IN | 8 | 0% | 0% | 0% | 8/8 (100.0%) |
| AS9829 | BSNL | IN | 36 | 0% | 0% | 0% | 30/36 (83.3%) |
| AS5378 | Vodafone UK | GB | 26 | 0% | 0% | 0% | 24/26 (92.3%) |
| AS12389 | Rostelecom | RU | 38 | 0% | 0% | 0% | 36/38 (94.7%) |
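The headline 80.3% is a pooled figure, and it can be reproduced from the table rows. The sketch below copies the flagged and sample counts from the DNSBL column; the helper names are illustrative. It also computes the unweighted mean of the per-ASN rates, which weights an 8-IP sample the same as a 40-IP one and so answers a different question.

```javascript
// [ASN, DNSBL-flagged IPv4 captures, IPv4 sample], copied from the table.
const leaderboard = [
  ["AS7922", 22, 26], ["AS7018", 6, 24], ["AS20115", 21, 25],
  ["AS3320", 23, 23], ["AS2856", 17, 25], ["AS3215", 23, 25],
  ["AS3352", 37, 39], ["AS12322", 8, 19], ["AS3209", 18, 20],
  ["AS2516", 6, 14], ["AS577", 13, 37], ["AS852", 3, 9],
  ["AS1221", 22, 25], ["AS5607", 21, 21], ["AS5089", 38, 40],
  ["AS55836", 31, 31], ["AS8151", 12, 12], ["AS27699", 11, 11],
  ["AS1136", 18, 30], ["AS17676", 21, 26], ["AS11427", 32, 34],
  ["AS24560", 8, 8], ["AS9829", 30, 36], ["AS5378", 24, 26],
  ["AS12389", 36, 38],
];

// Pooled rate: total flagged captures over total IPv4 captures.
function pooledRate(rows) {
  const flagged = rows.reduce((sum, row) => sum + row[1], 0);
  const sample = rows.reduce((sum, row) => sum + row[2], 0);
  return { flagged, sample, percent: (100 * flagged) / sample };
}

// Unweighted mean of the per-ASN percentages.
function meanPerAsnRate(rows) {
  const total = rows.reduce((sum, row) => sum + (100 * row[1]) / row[2], 0);
  return total / rows.length;
}

// ASNs where every IPv4 capture was flagged.
function fullyFlagged(rows) {
  return rows.filter((row) => row[1] === row[2]).map((row) => row[0]);
}

const pooled = pooledRate(leaderboard);
console.log(`${pooled.flagged}/${pooled.sample} = ${pooled.percent.toFixed(1)}%`);
```

Six of the 25 ASNs come back fully flagged, several of them on samples of a dozen IPs or fewer, so the per-ASN percentages deserve wider error bars than the pooled number.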
Cross-Verification: What Cloudflare Saw
The Worker at iprep.databay.uk reflects the request.cf object (Cloudflare's own view of ASN, country, geography, and a handful of other fields) back to the client as JSON. A Transform Rule injects the legacy cf.threat_score and cf.client.bot fields into request headers so the Worker can read them.
Across all 1,000 captures, cf.threat_score returned 0. Every single time. Min 0, max 0, mean 0.00. The legacy field that powered Cloudflare's free anti-spam rules through the 2010s is no longer populated for residential traffic on Free tier in 2026; the reputation logic moved into Bot Management, which is enterprise-only. cf.client.bot returned false for all 1,000. Anyone benchmarking residential IPs against "what Cloudflare thinks" on Free tier in 2026 is benchmarking against zero.
The cross-check on ASN and country tells a more useful story. Cloudflare reports the ASN it observes for each request. We compared CF's view to the ASN Databay's filter claimed.
| Per-ASN agreement (sample) | CF agrees on ASN | CF agrees on country |
|---|---|---|
| AS7922 Comcast / AS3320 Deutsche Telekom / AS2856 BT / 18 others | 100.0% | 100.0% |
| AS3215 Orange (FR) | 100.0% | 95.0% |
| AS8151 Telmex (MX) | 97.5% | 100.0% |
| AS9829 BSNL (IN) | 95.0% | 100.0% |
| AS27699 Telefonica BR | 27.5% | 100.0% |
The outlier is AS27699 (Telefonica BR in the leaderboard): Cloudflare saw AS27699 on only 11 of 40 captures. The other 29 came back as AS18881 (Telefonica Brasil S.A., trading as Vivo). The two are linked carriers, but they're not the same ASN. If you bought "Brazilian Telefonica" residential proxies and pinned routing rules to AS27699 on the assumption that's where the traffic comes from, roughly three out of four of your Brazilian IPs are routing somewhere else. The ASN filter is doing what it can; the underlying network operator is doing what it does. Trust the Cloudflare cross-check, not the proxy claim, when the two disagree.
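The agreement figure is simply the share of captures where the observed ASN equals the requested one. Here is a minimal sketch, assuming each capture is stored as a record with a `claimedAsn` (what the proxy filter was asked for) and an `observedAsn` (what Cloudflare reported). Both field names are hypothetical, and the sample records are rebuilt from the counts reported above, not from raw data.

```javascript
// Percentage of captures where the observed ASN matches the claimed ASN.
function asnAgreementPercent(captures) {
  if (captures.length === 0) return NaN;
  const agree = captures.filter((c) => c.observedAsn === c.claimedAsn).length;
  return (100 * agree) / captures.length;
}

// Count captures per observed ASN, so a mismatch shows where traffic went.
function observedAsnCounts(captures) {
  const counts = {};
  for (const { observedAsn } of captures) {
    counts[observedAsn] = (counts[observedAsn] || 0) + 1;
  }
  return counts;
}

// Rebuilt from the reported counts: 11 captures on AS27699, 29 on AS18881.
const telefonicaBr = [
  ...Array.from({ length: 11 }, () => ({ claimedAsn: 27699, observedAsn: 27699 })),
  ...Array.from({ length: 29 }, () => ({ claimedAsn: 27699, observedAsn: 18881 })),
];

console.log(asnAgreementPercent(telefonicaBr)); // 27.5
console.log(observedAsnCounts(telefonicaBr));
```

Keeping the per-ASN counts alongside the percentage matters here, because the percentage alone does not say that the misses all landed on a single sibling ASN.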
Disagreement, Two Ways
A cross-database correlation matrix cannot be computed from this data, because three of the database columns have zero variance. That isn't a methodology failure. It's the finding. The malice-focused free databases agree, completely, that residential proxies are clean. There is no signal to disagree about. The single source that did vary (DNSBL hit rate, dominated by PBL) varies because of which carriers Spamhaus has chosen to enumerate, not because some carriers' IPs are doing more bad things than others.
We initially expected the data to show a moderate cross-database correlation, say 0.3 to 0.6 mean off-diagonal Pearson, with one or two databases correlating tightly with one or two others. We were wrong. The data shows zero variance on three databases, large variance on one, and an inability to compute the matrix at all. The contrarian claim held up, but for a reason different from the one we expected: the databases didn't disagree because they'd reached different conclusions; they failed to disagree because most of them had no opinion to express.
The second axis of disagreement is the one the data does support. The free databases say residential IPs are clean. The Cloudflare Free tier said residential IPs have a threat score of zero. Production scrapers running through residential proxies face Cloudflare challenges, Akamai bot blocks, and DataDome interstitials at rates that anyone who has built a real scraper can describe in detail. The disagreement isn't between databases. It's between the public reputation feeds and the private bot-management products that actually make the block decisions.
That disagreement is structural. The free feeds are designed to answer one question (has this IP done something abusive recently?) and the bot management products are designed to answer a different one (does this request look like a real human?). Reading the answer to the first question and using it to predict the answer to the second is the category error this whole post is built on.
What Works in 2026
1. Fix the client first. The TLS fingerprint the HTTP client emits matters more than the IP it routes through, and the TLS fingerprinting post publishes the captures that prove it. Match a real browser TLS stack: curl_cffi with a Chrome impersonation profile, tls-client from a Go service, or a real browser via Playwright. The free reputation databases will not save you from a python-requests JA4 hash. Cloudflare's Bot Management, which the reputation databases don't talk to, will catch the JA4 long before it reads the IP. The headless side of the same fight is in the headless browser detection post.
2. Treat the free databases as advisory, not authoritative. If you build an IP-screening pipeline and the screening uses Spamhaus DROP, Tor exit list, and a per-IP DNSBL, you are screening for something real but not something that predicts a Cloudflare block. Use them to filter out actual datacentre and exit-node traffic; do not use them to predict bot-management outcomes for residential pools. Cross-reference at least two sources before acting on any single verdict. The PBL hit rate of 80% you'll see on a residential pool is a residential-IP signature, not a malice signal.
3. Run your own measurement. The data in this post is single-shot. We captured 1,000 IPs across 25 ASNs on one afternoon. Reputation drifts; an IP captured at noon may be PBL-listed by midnight, or vice-versa. If you operate at scale, build a per-IP probe pipeline through your own proxy provider against your own protected origin (a Cloudflare Free Worker with one Transform Rule is the floor; the Worker code is in §10), and measure the actual block / pass / challenge distribution against your real targets. That measurement is worth more than every public reputation feed combined.
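The block / pass / challenge distribution from step 3 can be tallied with a small classifier. The rules below are assumptions for illustration and will need tuning per target: a `cf-mitigated: challenge` response header is treated as a Cloudflare challenge, any 2xx status as a pass, and 403 or 429 as a block. Responses are assumed to be plain objects with a numeric `status` and lower-cased header names; both function names are hypothetical.

```javascript
// Classify one probe response under the assumed rules described above.
function classifyOutcome({ status, headers = {} }) {
  const mitigated = String(headers["cf-mitigated"] || "").toLowerCase();
  if (mitigated === "challenge") return "challenge";
  if (status >= 200 && status < 300) return "pass";
  if (status === 403 || status === 429) return "block";
  return "other";
}

// Tally outcomes and convert the counts to shares of the total.
function outcomeDistribution(responses) {
  const counts = { pass: 0, challenge: 0, block: 0, other: 0 };
  for (const response of responses) {
    counts[classifyOutcome(response)] += 1;
  }
  const total = responses.length;
  const share = {};
  for (const [outcome, count] of Object.entries(counts)) {
    share[outcome] = total > 0 ? count / total : 0;
  }
  return { total, counts, share };
}
```

Other bot-management products signal challenges differently, so the "other" bucket is worth inspecting by hand before trusting the shares.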
The position I'd defend on Hacker News: most scraper failures in 2026 that get described as "bad IP reputation" are TLS-fingerprint failures or behavioural-pattern failures. The IP reputation framing is comfortable because it pushes the blame onto the proxy provider; the data says the proxy provider's IPs look clean by every public standard, which means the failure isn't where the framing claims it is.
When the Database Verdict Still Matters
Datacentre IP screening. Spamhaus DROP catches hijacked routes and known criminal hosting at the BGP edge. Spamhaus's ASN drop list flagged 404 ASNs on capture day; the right ASNs are on it. Project Honey Pot catches comment-spam networks. ip2location LITE classifies datacentre and known proxy ranges. For pre-screening obviously-bad infrastructure before it hits your pipeline, the free databases earn their place in the stack. None of them flagged any of the residential ASNs we tested, which is correct: those carriers shouldn't be on those lists.
Tor leakage sanity-checks. The Tor exit list is the cleanest signal in this set. Zero residential IPs overlapped with Tor in our sample, which is what should happen. If your residential pool starts returning IPs that match the Tor exit list, something has gone wrong upstream: the rotation pulled from a bad batch, the provider misclassified, or worse. It's a tripwire, and a useful one.
Abuse triage on operations you run. If you operate an outbound service and the abuse desk sees an IP repeatedly hitting your endpoint, AbuseIPDB and Project Honey Pot will tell you whether that IP has a history elsewhere. That's their actual job. They're inputs to abuse triage, not predictors of how a third-party site will treat the IP when you knock on its door.
Geo-claim verification. Cloudflare Radar (with an API token), RIPEstat, and ipinfo.io all let you confirm an IP's claimed ASN and country. Our cross-check found 96.8% mean ASN agreement and 99.8% country agreement; the AS27699-vs-AS18881 mismatch on Telefonica BR is exactly the kind of thing you find when you check. Do the check.
The shape of the rule is straightforward. Use the free databases to verify what they were designed to verify. Don't use them to answer questions they weren't designed to answer.
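The Tor tripwire described above is the easiest of these checks to automate. The sketch below assumes the exit list has already been downloaded as text and that exit addresses appear on lines beginning with `ExitAddress`; the function names are illustrative.

```javascript
// Extract exit IPs from the exit-address list text, which carries lines of
// the form "ExitAddress <ip> <date> <time>" among other record lines.
function parseTorExitAddresses(text) {
  const exits = new Set();
  for (const line of text.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields[0] === "ExitAddress" && fields[1]) {
      exits.add(fields[1]);
    }
  }
  return exits;
}

// Return every pool IP that also appears on the exit list.
// Anything other than an empty array means the tripwire has fired.
function torOverlap(poolIps, exitSet) {
  return poolIps.filter((ip) => exitSet.has(ip));
}
```

Running this on every pool refresh costs one download and one set lookup per IP, which is cheap for a check whose expected result is an empty list.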
Methodology
Proxy. Databay residential proxy network, HTTP/HTTPS/SOCKS5, gateway gw.databay.co:8888, per-request ASN filtering via the username component. URL pattern (placeholders only, no real credentials in any committed artifact):

    http://${DATABAY_USER}-zone-residential-asNumber-{ASN}:${DATABAY_PASS}@gw.databay.co:8888

Credentials live in env vars (DATABAY_USER, DATABAY_PASS) at runtime; the harness reads them once and never logs the password.
Cloudflare origin. A Cloudflare Free zone on databay.uk, subdomain iprep.databay.uk, Worker named iprep-collector. The Worker reflects the inbound request.cf object plus the Transform-Rule-injected reputation headers (X-CF-Threat-Score, X-CF-Client-Bot, X-CF-Verified-Bot, X-CF-AS-Num, X-CF-Country, X-CF-Continent) as JSON. Bot Fight Mode is on at the zone level; we observed in pre-flight testing that Bot Fight Mode (Free tier) does not challenge curl traffic from Tor exits at iprep.databay.uk, and we documented the zero cf.threat_score result in §6 as the published finding. The Worker code:

    export default {
      async fetch(request) {
        // Reputation fields injected into request headers by the Transform Rule.
        const cfReputation = {
          threatScore: request.headers.get('x-cf-threat-score'),
          clientBot: request.headers.get('x-cf-client-bot'),
          asNum: request.headers.get('x-cf-as-num'),
          country: request.headers.get('x-cf-country'),
        };
        // Reflect Cloudflare's view of the request back to the client as JSON.
        const body = JSON.stringify({
          cf: request.cf,
          cfReputation,
          cfConnectingIp: request.headers.get('cf-connecting-ip'),
        });
        return new Response(body, { headers: { 'Content-Type': 'application/json' } });
      },
    };

Reputation sources. Tor exit list (check.torproject.org/exit-addresses, plain text); Spamhaus DROP IPv4 JSON (www.spamhaus.org/drop/drop_v4.json); Spamhaus dropv6.json; Spamhaus ASN drop JSON; DroneBL DNSBL (dnsbl.dronebl.org); Spamhaus ZEN DNSBL (zen.spamhaus.org); RIPEstat as-overview. AbuseIPDB, ipinfo.io, GreyNoise Community, ip2location LITE PX2, and Cloudflare Radar were planned but not used in this run; their free tiers require account registration and we ran without keys. Disclosed gap.
Limitations. Single-day capture; reputation drifts. ASN qualification is "Databay can return 40 unique IPs at this date," which excludes ASNs with thinner pools (notably AS701 Verizon at zero unique, AS21928 T-Mobile US at 17, AS4713 NTT, AS9498 Bharti Airtel, AS22773 Cox, AS4134 China Telecom, AS4837 China Unicom). DNSBL queries are subject to caching and per-resolver behaviour; we used the system resolver and accepted that two close-in-time queries from different boxes might disagree on a fresh hit. We did not verify Cloudflare's cf.threat_score result against an Enterprise zone; if Bot Management exposes a meaningfully non-zero value, the Free-tier zero we observed is consistent with "the field migrated" rather than "the field broke."
Disclosure. The post is on Databay's blog. The proxy network is Databay's. The Cloudflare origin is on a Databay-controlled zone. None of the IP addresses captured are committed to the repo at full /32 precision; the published CSV at /data/residential-ip-reputation-2026-05.csv redacts each IPv4 to /24 and each IPv6 to /48. Disclosure is the trust signal; concealment is the red flag.
Run the same captures yourself and check our work. The harness is at tools/iprep-capture/ in the repo.
Frequently Asked Questions
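Are residential proxy IPs on AbuseIPDB?
This run cannot say. AbuseIPDB's free tier requires API-key registration and we ran without keys, so none of the 1,000 IPs was checked against it; the gap is disclosed in §10.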
Does Cloudflare use AbuseIPDB or IPQS to score IPs?
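Anti-bot products, Cloudflare included, run their own internal reputation systems and do not expose them through free APIs. On the Free tier, the cf.threat_score field, which is the legacy public surface for that reputation, returned 0 on every one of our 1,000 captures. Bot Management is enterprise-only and exposes a per-request 1-99 score that's a different signal entirely.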