A transparent walkthrough of the verification pipeline behind Databay's free proxy list — where the proxies come from, how we probe them, what each column actually measures, and why dead proxies disappear instead of sticking around to inflate the count.
Why Methodology Matters for a Free Proxy List
A free proxy list is a public dataset, and a public dataset without methodology is almost worthless. If you're going to pick proxies off this page for scraping, geo-testing, or privacy work, you need to know what 'verified' means, how fresh the data is, and what the signals actually tell you. This article walks through every step of the pipeline — from harvesting candidate IPs to publishing the final list — so you can decide what to trust and when.
Where the Proxies Come From
Public aggregators. A long tail of community-maintained proxy feeds, GitHub repos, and scanning projects publish lists of exposed proxies. We pull the unique candidates from a rotating set of these, dedupe by IP:port, and feed them into verification. The original sources are not authoritative — most of the proxies published by aggregators are dead or misbehaving — so the feed is raw material, not trusted data.
Internet-wide scans for common proxy ports. The second source is broad port scans for the handful of TCP ports commonly used by public HTTP and SOCKS5 proxies (3128, 8080, 8000, 8888, 1080, etc.). Any open port that speaks the protocol handshake correctly becomes a candidate. This is the same technique Shodan and Censys use and is how most freshly deployed public proxies enter the pipeline.
Referral and history. Proxies that have ever passed verification are re-probed indefinitely, even after periods of being offline. Some free proxies come back after a reboot or network change, and continuing to probe known-good candidates catches those re-activations without having to wait for an aggregator to notice.
At any given time the candidate pool is much larger than the final list. The checker runs tens of thousands of probes per minute to narrow the pool down to the few thousand proxies that actually work right now.
The Verification Pipeline
Step 1: TCP reachability. Open a plain TCP connection to the IP:port with a short timeout (5 seconds). If it refuses, times out, or returns RST, the probe fails and the candidate is marked offline for this cycle.
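The reachability gate is nothing more than a timed connect. A minimal Python sketch (the checker's actual internals aren't published, so treat this as illustrative):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Step 1: can we open a plain TCP connection within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeout, and RST mid-handshake.
        return False
```

Everything that fails here is marked offline for the cycle without spending any of the more expensive protocol probes on it.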
Step 2: Protocol handshake. For HTTP candidates, send a minimal HTTP request through the port. For SOCKS5 candidates, perform the SOCKS5 greeting and wait for the method-selection response. A proxy that accepts TCP but doesn't speak the expected protocol is flagged as non-proxy and removed from the candidate pool.
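For the SOCKS5 side, the greeting and its expected reply are fully specified by RFC 1928, so the check reduces to a few bytes. A sketch of the validation logic:

```python
# SOCKS5 greeting per RFC 1928: version 0x05, one auth method offered
# (0x00 = "no authentication required").
SOCKS5_GREETING = b"\x05\x01\x00"

def socks5_handshake_ok(reply: bytes) -> bool:
    """A real SOCKS5 proxy answers with version 0x05 and a selected
    method; 0xFF means it accepted none of the offered methods."""
    return len(reply) == 2 and reply[0] == 0x05 and reply[1] != 0xFF
```

A port that accepts the TCP connection but returns anything else here is exactly the "non-proxy" case described above.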
Step 3: HTTP request via proxy. Send a real HTTP request through the proxy to a control endpoint that reflects the headers it received. This does several things at once: it measures latency, confirms the proxy actually forwards traffic, captures the headers so we can classify anonymity, and records the observed source IP (which must match the proxy's IP — if it doesn't, the candidate is flagged as chained or misbehaving).
Step 4: HTTPS tunnel test. Issue an HTTP CONNECT request through the proxy to an HTTPS destination. If the CONNECT succeeds and we can complete a TLS handshake through the tunnel, the proxy supports HTTPS. If CONNECT succeeds but the TLS handshake fails because the proxy presents its own certificate, the proxy is flagged as loose-SSL (marked with a lock icon in the table).
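The CONNECT step itself is plain HTTP. A sketch of the request and the success check (a 2xx status means the tunnel is open and the TLS handshake is attempted next):

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """HTTP CONNECT request asking the proxy to open a raw tunnel."""
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "\r\n"
    ).encode("ascii")

def connect_succeeded(status_line: bytes) -> bool:
    """Any 2xx status line means the proxy opened the tunnel."""
    parts = status_line.split()
    return len(parts) >= 2 and parts[1].startswith(b"2")
```

The loose-SSL case shows up after this point: `connect_succeeded` returns true, but the TLS handshake inside the tunnel fails certificate verification because the proxy substitutes its own certificate.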
Step 5: Google compatibility probe. Make one request through the proxy to a Google endpoint. If the response is the normal Google page, the proxy is marked Google-passed. If the response is a rate-limit page, a CAPTCHA, or a block, it isn't. This matters because Google aggressively blacklists proxy IPs, and a lot of otherwise-functional free proxies are useless for any Google-related task.
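The block detection boils down to a heuristic over the response. The status codes and body markers below are illustrative assumptions, not the checker's actual rule set:

```python
def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: rate-limit statuses or a CAPTCHA interstitial mean
    the proxy's IP is blacklisted for Google traffic. Markers assumed."""
    if status in (403, 429, 503):
        return True
    lowered = body.lower()
    return "unusual traffic" in lowered or "captcha" in lowered
```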
Step 6: Anonymity classification. Based on the headers captured in step 3, the proxy is classified as Elite, Anonymous, or Transparent. See the proxy anonymity levels explained post for the full classification rules.
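A simplified sketch of the classification logic, using the headers captured in step 3 (the header lists here are common examples, not the full rule set from the linked post):

```python
REVEALING = {"x-forwarded-for", "x-real-ip", "forwarded"}
ANNOUNCING = {"via", "x-proxy-id", "proxy-connection"}

def classify_anonymity(headers: dict[str, str], client_ip: str) -> str:
    """Classify from the headers the control endpoint received."""
    names = {k.lower() for k in headers}
    values = " ".join(headers.values())
    if client_ip in values:
        return "Transparent"   # our real IP leaked through
    if names & (REVEALING | ANNOUNCING):
        return "Anonymous"     # IP hidden, but proxy use announced
    return "Elite"             # indistinguishable from a direct client
```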
Every probe outcome — success, failure, and consecutive-failure streaks — is tracked per proxy. Proxies that accumulate too many consecutive failures are evicted automatically. All metrics are persisted so the per-proxy uptime number reflects long-term behavior rather than a snapshot.
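The bookkeeping described above can be sketched as a small per-proxy tracker. The eviction threshold here is a placeholder; the real cutoff isn't published:

```python
from dataclasses import dataclass

MAX_STREAK = 5  # illustrative threshold, not the actual cutoff

@dataclass
class ProxyStats:
    successes: int = 0
    failures: int = 0
    streak: int = 0  # current run of consecutive failures

    def record(self, ok: bool) -> None:
        if ok:
            self.successes += 1
            self.streak = 0  # any success resets the streak
        else:
            self.failures += 1
            self.streak += 1

    @property
    def should_evict(self) -> bool:
        return self.streak >= MAX_STREAK

    @property
    def uptime(self) -> float:
        """Lifetime success share, as a percentage to one decimal."""
        total = self.successes + self.failures
        return round(100 * self.successes / total, 1) if total else 0.0
```

Because `uptime` is cumulative over every recorded probe, it converges toward long-term behavior rather than reflecting the latest snapshot.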
What Each Column Actually Measures
IP Address / Port. The address clients should connect to. Always IPv4 currently — IPv6 free proxies exist but are rare enough that we haven't surfaced them yet.
Country. Derived from the IP via a maintained GeoIP database (ISO 2-letter code plus human-readable name). The country field reflects the IP's registered location, not where the proxy operator is; for consumer ISPs this is usually reliable, but for datacenter IPs it can sometimes be stale.
Protocol. Reports which of HTTP, HTTPS, and SOCKS5 the proxy handshakes successfully. The column is composite — a proxy might support both HTTP and HTTPS, shown as an HTTP/S badge.
Anonymity. Elite (strips all identifying headers), Anonymous (hides client IP but announces proxy usage), or Transparent (forwards client IP). Classified from observed headers on every probe cycle, so this updates if the proxy's behavior changes.
SSL. A three-state field: full (valid certificate chain, works out of the box), loose (HTTPS tunnels but requires disabling certificate verification — lock icon), or unsupported (HTTP destinations only — cross icon).
Google. Boolean — whether the proxy can reach Google without being blocked. Useful as a quick filter for any Google-related scraping work.
Speed. Median end-to-end latency in milliseconds for the most recent successful probe. Bucketed in the UI as Fast (<500ms), Medium (<1500ms), and Slow (>=1500ms).
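The bucketing is a straightforward threshold function:

```python
def speed_bucket(latency_ms: float) -> str:
    """Bucket median probe latency the way the UI does."""
    if latency_ms < 500:
        return "Fast"
    if latency_ms < 1500:
        return "Medium"
    return "Slow"
```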
Uptime. The share of successful probes across the full lifetime of the IP:port pair, expressed as a percentage rounded to one decimal. This is a cumulative figure, which means a proxy that's been around for weeks has a more stable uptime figure than one added this morning. Proxies with persistent failures are removed entirely, so a visible uptime always reflects a proxy that works at least sometimes.
Updated. Elapsed time since the most recent successful probe. Formatted as 'Xs ago', 'Xm ago', or 'Xh ago' depending on magnitude. Entries whose last check is older than 6 hours are filtered out server-side and never appear on the list.
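Both the display format and the staleness cutoff are simple functions of elapsed seconds. A sketch:

```python
def ago(seconds: int) -> str:
    """Render elapsed time since the last successful probe."""
    if seconds < 60:
        return f"{seconds}s ago"
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    return f"{seconds // 3600}h ago"

STALE_AFTER = 6 * 3600  # entries past 6 hours are filtered server-side

def visible(seconds: int) -> bool:
    return seconds <= STALE_AFTER
```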
What We Actively Remove From the List
Persistently failing proxies. Any candidate with too many consecutive failures is evicted from the active pool. We don't pad the count with dead entries — if a proxy stops responding, it disappears on the next cycle.
Proxies that chain or strip CONNECT. Some misconfigured or malicious proxies accept your request but relay it through additional hops, or refuse CONNECT outright. These fail the HTTPS tunnel test and are marked as HTTP-only or dropped entirely.
Proxies presenting false source IPs. If the IP seen by our control endpoint doesn't match the proxy's listed IP, the proxy is either chained, NAT'd through another relay, or actively rewriting — all of which make it unreliable for IP-targeted work. These get dropped.
Proxies that modify response bodies. Our probe fetches a known-content page and compares the bytes. Proxies that inject ads, scripts, or content are flagged and removed. This won't catch every manipulation (some proxies only modify specific target domains) but it catches the easy cases that free-proxy lists tend to publish without filtering.
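A byte-for-byte comparison is cheapest as a digest check. A sketch, with a placeholder control body standing in for the known-content page:

```python
import hashlib

# Placeholder: the real control page content isn't published.
EXPECTED_SHA256 = hashlib.sha256(b"known-content page body").hexdigest()

def body_untampered(body: bytes) -> bool:
    """Any injected ad, script, or rewrite changes the digest."""
    return hashlib.sha256(body).hexdigest() == EXPECTED_SHA256
```

As the text notes, this only catches proxies that tamper with the control page itself, not ones that target specific domains.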
Proxies older than 6 hours. A 6-hour staleness cutoff is enforced at the page-render layer. If verification has had an outage or the proxy hasn't been re-probed in the current cycle for any reason, the entry won't render on the page. This prevents stale data from leaking into the list during infrastructure hiccups.
Net effect: the list size shifts throughout the day as the churn works itself out. On a typical day you'll see numbers fluctuating between 4,000 and 8,000 proxies. A sudden dip usually reflects an aggregator source going quiet or a batch of formerly-good proxies being blacklisted by something big like a Cloudflare update.
Data Freshness: What 'Updated Every 10 Minutes' Really Means
The re-verification interval. Every proxy is re-probed on a rolling cadence that averages around 10 minutes per proxy. This is the meaningful freshness claim — any proxy on the list has been probed within roughly the last 10 minutes, which is why we surface the interval as 'verified every 10 minutes' on the page.
Page-to-data propagation. Fresh verification batches propagate to the public list within seconds of being produced. For anyone watching the page in real time, the 'Last Updated' counter in the hero reflects the freshest entry's verification time and increments once per second in the browser.
For API consumers. The API endpoint /api/v1/proxy-list caches responses for 10 seconds to absorb burst traffic. If you pull twice in that window, you get the same data. If you want the freshest possible list, pull at ~10-second intervals rather than more aggressively. For most scraping use cases, pulling every 5-10 minutes is more than enough to stay within the verification horizon.
For the Dataset JSON-LD. The dateModified field in the page's structured data reflects the freshest entry's verification time, so it ticks forward every time the freshest proxy on the list gets re-verified. Schema validators and AI crawlers reading the JSON-LD see a constantly-refreshed timestamp, which is the honest representation of what the dataset actually looks like.
What's Not Captured (and Why)
Geographic probe diversity. Our checker runs from a limited set of network locations, which means a proxy that blocks traffic from our probe regions but accepts traffic from the rest of the internet will be marked dead by us and might still work for you. This is rare but not zero — some regionalized proxies have asymmetric reachability. It's a known limitation of any centralized probing approach.
Throughput testing. We measure latency per request, not sustained throughput. A proxy with low latency might still choke on parallel requests or heavy payloads. Real throughput depends too much on what you're trying to push through the proxy to bake into a general figure, and adding it would slow the verification cycle considerably.
IPv6 support. Not tested yet. Most free proxies are IPv4-only and IPv6 proxies are a small enough slice of the public-proxy world that we haven't invested in the separate probe infrastructure. This may change — if you have a use case that specifically needs free IPv6 proxies, let us know.
Per-destination testing. Our Google probe is the closest we come to destination-specific testing. We don't probe Facebook, Instagram, major e-commerce sites, or specific targets because the matrix explodes and false negatives become common (a proxy that Facebook rejects in our region might work fine for you). The Google-passed signal is a decent rough proxy for 'can reach major commercial services' but is not a guarantee.
Operator identity. We don't and can't verify who runs each proxy. Some entries are run as legitimate public services; some are misconfigured; some are honeypots or malicious. Our probes detect behavior, not intent. Treat every free proxy as untrusted infrastructure and structure your usage accordingly (see our safety guide).
Publishing the Data: Page, API, and Downloads
The list page at /free-proxy-list. HTML table with filter, sort, and pagination. The first 50 rows are server-rendered for crawler visibility; the rest load via JavaScript from an inline JSON payload. Protocol and anonymity sub-pages (/http, /https, /socks5, /elite, /anonymous, /transparent) present the same data filtered to the selected slice, and /free-proxy-list/{country} routes filter by country.
The JSON API at /api/v1/proxy-list. Filter via query parameters (protocol, country, anonymity, ssl, google, speed, limit, page). Returns the full record (IP, port, country, ISO, protocol, anonymity, SSL, Google-passed, latency, uptime, last-checked timestamp) with pagination metadata.
CSV download. Same data, comma-separated, at /api/v1/proxy-list?format=csv. Easy to drop into pandas, Excel, or a shell pipeline.
Plain-text download. IP:port lines at /api/v1/proxy-list?format=txt. Useful for tools that expect a simple file list, like ProxyChains or curl --proxy batch scripts.
All four are described in a Dataset schema.org block on the list page, which makes the list eligible for Google Dataset Search and provides AI crawlers (ChatGPT, Claude, Perplexity, Gemini) with a machine-readable manifest of what the list is, what fields it contains, and where to fetch it. If you're building a tool that needs live proxy data and want AI assistants to be able to route users to it, the same schema points them to us.
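The query parameters compose in the obvious way. A small Python helper for building filtered requests (the domain here is a placeholder, since this page doesn't state the absolute URL):

```python
from urllib.parse import urlencode

BASE = "https://databay.example/api/v1/proxy-list"  # domain assumed

def list_url(**filters) -> str:
    """Compose a request URL from the documented query parameters
    (protocol, country, anonymity, ssl, google, speed, limit, page,
    format)."""
    return f"{BASE}?{urlencode(filters)}" if filters else BASE

# e.g. elite SOCKS5 proxies as a plain-text file for ProxyChains:
url = list_url(protocol="socks5", anonymity="elite", format="txt")
```

Remember the 10-second response cache described above: polling this URL more often than every ~10 seconds just returns the same payload.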