Inside Databay's Free Proxy List: How We Discover, Verify, and Score Every Entry

Daniel Okonkwo · 12 min read

A transparent walkthrough of the verification pipeline behind Databay's free proxy list — where the proxies come from, how we probe them, what each column actually measures, and why dead proxies disappear instead of sticking around to inflate the count.

Why Methodology Matters for a Free Proxy List

Most free proxy lists publish numbers without explaining what the numbers mean. A column called 'uptime' might be measured over the last hour, the last day, or the entire lifetime of the entry. A 'latency' figure might be from a single probe, a rolling median, or a smoothed average. A proxy labeled 'Elite' might have been classified once a week ago and never rechecked. None of this is usually disclosed.

We take a different approach because a free proxy list is a public dataset, and a public dataset without methodology is almost worthless. If you're going to pick proxies off this page for scraping, geo-testing, or privacy work, you need to know what 'verified' means, how fresh the data is, and what the signals actually tell you. This article walks through every step of the pipeline — from harvesting candidate IPs to publishing the final list — so you can decide what to trust and when.

Where the Proxies Come From

Every entry in the list starts as a candidate IP:port pair surfaced by our upstream checker. Candidates come from three sources:

Public aggregators. A long tail of community-maintained proxy feeds, GitHub repos, and scanning projects publish lists of exposed proxies. We pull the unique candidates from a rotating set of these, dedupe by IP:port, and feed them into verification. The original sources are not authoritative — most of the proxies published by aggregators are dead or misbehaving — so the feed is raw material, not trusted data.

Internet-wide scans for common proxy ports. The second source is broad port scans for the handful of TCP ports commonly used by public HTTP and SOCKS5 proxies (3128, 8080, 8000, 8888, 1080, etc.). Any open port that speaks the protocol handshake correctly becomes a candidate. This is the same technique Shodan and Censys use and is how most freshly deployed public proxies enter the pipeline.

Referral and history. Proxies that have ever passed verification are re-probed indefinitely, even after periods of being offline. Some free proxies come back after a reboot or network change, and continuing to probe known-good candidates catches those re-activations without having to wait for an aggregator to notice.

At any given time the candidate pool is much larger than the final list. The checker runs tens of thousands of probes per minute to narrow the pool down to the few thousand proxies that actually work right now.

The Verification Pipeline

Every candidate goes through the same verification cycle. The cycle repeats roughly every 10 minutes per proxy, which is what lets us advertise 'verified every 10 minutes' as the freshness claim on the page.

Step 1: TCP reachability. Open a plain TCP connection to the IP:port with a short timeout (5 seconds). If it refuses, times out, or returns RST, the probe fails and the candidate is marked offline for this cycle.
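In code, step 1 is nothing more than a connect attempt with a hard deadline. This is a minimal sketch, not Databay's actual checker; the function name is ours, but the 5-second timeout matches the article:

```python
import socket

def tcp_reachable(ip: str, port: int, timeout: float = 5.0) -> bool:
    """Step 1: can we open a plain TCP connection within the timeout?

    Refusals (RST), timeouts, and unreachable networks all count as a
    failed probe for this cycle.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, TimeoutError, etc.
        return False
```

A closed port fails fast on the RST, so the timeout only bites on hosts that silently drop packets.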

Step 2: Protocol handshake. For HTTP candidates, send a minimal HTTP request through the port. For SOCKS5 candidates, perform the SOCKS5 greeting and wait for the method-selection response. A proxy that accepts TCP but doesn't speak the expected protocol is flagged as non-proxy and removed from the candidate pool.
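For the SOCKS5 side of step 2, the greeting and method-selection exchange is defined by RFC 1928 and takes only a few bytes each way. A sketch under our own naming (the production checker presumably does more, e.g. offering multiple auth methods):

```python
import socket

# RFC 1928 client greeting: version (0x05), method count, method list.
SOCKS5_GREETING = b"\x05\x01\x00"  # version 5, 1 method, 0x00 = no auth

def parse_socks5_reply(reply: bytes) -> bool:
    """A well-behaved SOCKS5 server answers with exactly two bytes:
    version 0x05 and the selected method (0xFF = no acceptable method)."""
    return len(reply) == 2 and reply[0] == 0x05 and reply[1] != 0xFF

def socks5_handshake_ok(ip: str, port: int, timeout: float = 5.0) -> bool:
    """Step 2 for a SOCKS5 candidate: send the greeting, check the reply."""
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(SOCKS5_GREETING)
            return parse_socks5_reply(sock.recv(2))
    except OSError:
        return False
```

A candidate that answers with anything other than `\x05` in the first byte (say, an HTTP error page) is exactly the "accepts TCP but doesn't speak the protocol" case and gets flagged as non-proxy.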

Step 3: HTTP request via proxy. Send a real HTTP request through the proxy to a control endpoint that reflects the headers it received. This does several things at once: it measures latency, confirms the proxy actually forwards traffic, captures the headers so we can classify anonymity, and records the observed source IP (which must match the proxy's IP — if it doesn't, the candidate is flagged as chained or misbehaving).
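The shape of step 3 can be sketched with the standard library alone. The control endpoint URL and its response shape (`{"origin": ..., "headers": {...}}`) are assumptions for illustration; Databay's real endpoint is not public:

```python
import json
import time
import urllib.request

CONTROL_URL = "http://example.invalid/reflect"  # hypothetical echo endpoint

def reflected_ip_matches(reflected: dict, proxy_ip: str) -> bool:
    """The endpoint echoes the source IP it saw; a mismatch means the
    request was relayed through an extra hop (chained or misbehaving)."""
    return reflected.get("origin") == proxy_ip

def probe_http(proxy_ip: str, proxy_port: int, timeout: float = 5.0):
    """Step 3: fetch the control page through the proxy, timing the round trip
    and capturing the headers the endpoint received for later classification."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": f"http://{proxy_ip}:{proxy_port}"})
    )
    start = time.monotonic()
    with opener.open(CONTROL_URL, timeout=timeout) as resp:
        body = json.loads(resp.read())  # assumed: {"origin": ..., "headers": {...}}
    latency_ms = (time.monotonic() - start) * 1000
    return latency_ms, reflected_ip_matches(body, proxy_ip), body.get("headers", {})
```

One request yields all four signals at once: latency, forwarding confirmation, captured headers, and the source-IP match.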

Step 4: HTTPS tunnel test. Issue an HTTP CONNECT request through the proxy to an HTTPS destination. If the CONNECT succeeds and we can complete a TLS handshake through the tunnel, the proxy supports HTTPS. If CONNECT succeeds but the TLS handshake fails because the proxy presents its own certificate, the proxy is flagged as loose-SSL (marked with a lock icon in the table).
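Step 4 maps naturally onto the three-state SSL field described later in the article. This is a simplified sketch under our own naming; a proxy that substitutes its own certificate fails default TLS verification, which is precisely the loose-SSL case:

```python
import socket
import ssl

def build_connect(host: str, port: int) -> bytes:
    """The raw HTTP CONNECT request that opens a tunnel to host:port."""
    return f"CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n".encode()

def connect_succeeded(status_line: bytes) -> bool:
    """Any 2xx status on the CONNECT response means the tunnel is open."""
    parts = status_line.split()
    return len(parts) >= 2 and parts[1].startswith(b"2")

def https_tunnel_state(proxy_ip, proxy_port, dest="example.com", timeout=5.0):
    """Returns 'full', 'loose', or 'unsupported' per the SSL column semantics."""
    try:
        with socket.create_connection((proxy_ip, proxy_port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(build_connect(dest, 443))
            if not connect_succeeded(sock.recv(4096).split(b"\r\n", 1)[0]):
                return "unsupported"
            try:
                ctx = ssl.create_default_context()
                with ctx.wrap_socket(sock, server_hostname=dest):
                    return "full"  # tunnel + valid TLS handshake
            except ssl.SSLError:
                return "loose"  # tunnel opened, but certificate check failed
    except OSError:
        return "unsupported"
```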

Step 5: Google compatibility probe. Make one request through the proxy to a Google endpoint. If the response is the normal Google page, the proxy is marked Google-passed. If the response is a rate-limit page, a CAPTCHA, or a block, it isn't. This matters because Google aggressively blacklists proxy IPs, and a lot of otherwise-functional free proxies are useless for any Google-related task.

Step 6: Anonymity classification. Based on the headers captured in step 3, the proxy is classified as Elite, Anonymous, or Transparent. See the proxy anonymity levels explained post for the full classification rules.
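A simplified version of step 6 can be expressed as a pure function over the headers captured in step 3. The header lists here are common identifying headers, not Databay's exact rule set; see the linked anonymity post for the full rules:

```python
def classify_anonymity(headers: dict, client_ip: str) -> str:
    """Classify from the headers the control endpoint saw.

    Transparent: the client's real IP appears in a forwarding header.
    Anonymous:   client IP hidden, but the proxy announces itself.
    Elite:       no identifying headers at all.
    """
    h = {k.lower(): v for k, v in headers.items()}
    revealing = ("x-forwarded-for", "forwarded", "x-real-ip", "client-ip")
    announcing = ("via", "proxy-connection")
    if any(client_ip in h.get(k, "") for k in revealing):
        return "Transparent"
    if any(k in h for k in revealing + announcing):
        return "Anonymous"
    return "Elite"
```

Because this runs on every probe cycle, a proxy that starts leaking `X-Forwarded-For` after a reconfiguration is downgraded on its next check rather than keeping a stale label.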

Every probe outcome — success, failure, and consecutive-failure streaks — is tracked per proxy. Proxies that accumulate too many consecutive failures are evicted automatically. All metrics are persisted so the per-proxy uptime number reflects long-term behavior rather than a snapshot.

What Each Column Actually Measures

The proxy table on the list page publishes several signals per entry. Here's what each one means precisely:

IP Address / Port. The address clients should connect to. Always IPv4 currently — IPv6 free proxies exist but are rare enough that we haven't surfaced them yet.

Country. Derived from the IP via a maintained GeoIP database (ISO 2-letter code plus human-readable name). The country field reflects the IP's registered location, not where the proxy operator is; for consumer ISPs this is usually reliable, but for datacenter IPs it can sometimes be stale.

Protocol. Which of HTTP, HTTPS, and SOCKS5 the proxy completes a successful handshake for. The column is composite — a proxy might support both HTTP and HTTPS, shown as an HTTP/S badge.

Anonymity. Elite (strips all identifying headers), Anonymous (hides client IP but announces proxy usage), or Transparent (forwards client IP). Classified from observed headers on every probe cycle, so this updates if the proxy's behavior changes.

SSL. A three-state field: full (valid certificate chain, works out of the box), loose (HTTPS tunnels but requires disabling certificate verification — lock icon), or unsupported (HTTP destinations only — cross icon).

Google. Boolean — whether the proxy can reach Google without being blocked. Useful as a quick filter for any Google-related scraping work.

Speed. Median end-to-end latency in milliseconds for the most recent successful probe. Bucketed in the UI as Fast (<500ms), Medium (<1500ms), and Slow (>=1500ms).
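The bucketing is a straightforward threshold check; this sketch just restates the thresholds from the article:

```python
def speed_bucket(latency_ms: float) -> str:
    """UI buckets: Fast < 500 ms, Medium < 1500 ms, Slow >= 1500 ms."""
    if latency_ms < 500:
        return "Fast"
    if latency_ms < 1500:
        return "Medium"
    return "Slow"
```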

Uptime. The share of successful probes across the full lifetime of the IP:port pair, expressed as a percentage rounded to one decimal. This is a cumulative figure, which means a proxy that's been around for weeks has a more stable uptime figure than one added this morning. Proxies with persistent failures are removed entirely, so a visible uptime always reflects a proxy that works at least sometimes.
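The uptime figure reduces to a lifetime ratio, rounded as described (function name is ours):

```python
def uptime_pct(successes: int, total_probes: int) -> float:
    """Lifetime share of successful probes, as a percentage rounded
    to one decimal place."""
    if total_probes == 0:
        return 0.0
    return round(100.0 * successes / total_probes, 1)
```

The cumulative denominator is why a weeks-old entry shows a stable number while a proxy added this morning can swing by whole percentage points per cycle.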

Updated. Elapsed time since the most recent successful probe. Formatted as 'Xs ago', 'Xm ago', or 'Xh ago' depending on magnitude. Entries whose last check is older than 6 hours are filtered out server-side and never appear on the list.
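The display format and the staleness cutoff can both be sketched in a few lines (helper names are ours):

```python
def format_elapsed(seconds: int) -> str:
    """'Xs ago' under a minute, 'Xm ago' under an hour, else 'Xh ago'."""
    if seconds < 60:
        return f"{seconds}s ago"
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    return f"{seconds // 3600}h ago"

STALE_CUTOFF_S = 6 * 3600  # entries older than 6 hours never render

def is_stale(seconds_since_check: int) -> bool:
    return seconds_since_check > STALE_CUTOFF_S
```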

What We Actively Remove From the List

The list only shows proxies we're willing to stand behind under the rules above. Several categories of candidates are removed and never reach the page:

Persistently failing proxies. Any candidate with too many consecutive failures is evicted from the active pool. We don't pad the count with dead entries — if a proxy stops responding, it disappears on the next cycle.

Proxies that chain or strip CONNECT. Some misconfigured or malicious proxies accept your request but relay it through additional hops, or refuse CONNECT outright. These fail the HTTPS tunnel test and are marked as HTTP-only or dropped entirely.

Proxies presenting false source IPs. If the IP seen by our control endpoint doesn't match the proxy's listed IP, the proxy is either chained, NAT'd through another relay, or actively rewriting — all of which make it unreliable for IP-targeted work. These get dropped.

Proxies that modify response bodies. Our probe fetches a known-content page and compares the bytes. Proxies that inject ads, scripts, or content are flagged and removed. This won't catch every manipulation (some proxies only modify specific target domains) but it catches the easy cases that free-proxy lists tend to publish without filtering.
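A byte-for-byte comparison of a known-content page is cheapest as a digest check. The reference content here is a stand-in; the real control page is not public:

```python
import hashlib

# Digest of the known-good control page (stand-in content for illustration).
KNOWN_GOOD_SHA256 = hashlib.sha256(b"<html>known control page</html>").hexdigest()

def body_unmodified(fetched_body: bytes) -> bool:
    """Compare the bytes fetched through the proxy against the known page.
    Any injected ad, script, or banner changes the digest."""
    return hashlib.sha256(fetched_body).hexdigest() == KNOWN_GOOD_SHA256
```

As the article notes, this only catches blanket injection; a proxy that rewrites specific target domains but leaves the control page alone slips through.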

Proxies older than 6 hours. A 6-hour staleness cutoff is enforced at the page-render layer. If verification has had an outage or the proxy hasn't been re-probed in the current cycle for any reason, the entry won't render on the page. This prevents stale data from leaking into the list during infrastructure hiccups.

Net effect: the list size shifts throughout the day as the churn works itself out. On a typical day you'll see numbers fluctuating between 4,000 and 8,000 proxies. A sudden dip usually means an aggregator source went quiet or a batch of formerly-good proxies was blacklisted en masse by something like a Cloudflare rule update.

Data Freshness: What 'Updated Every 10 Minutes' Really Means

Two separate clocks are involved, and the distinction matters for anyone building on top of our API.

The re-verification interval. Every proxy is re-probed on a rolling cadence that averages around 10 minutes per proxy. This is the meaningful freshness claim — any proxy on the list has been probed within roughly the last 10 minutes, which is why we surface the interval as 'verified every 10 minutes' on the page.

Page-to-data propagation. Fresh verification batches propagate to the public list within seconds of being produced. For anyone watching the page in real time, the 'Last Updated' counter in the hero reflects the freshest entry's verification time and increments once per second in the browser.

For API consumers. The API endpoint /api/v1/proxy-list caches responses for 10 seconds to absorb burst traffic. If you pull twice in that window, you get the same data. If you want the freshest possible list, pull at ~10-second intervals rather than more aggressively. For most scraping use cases, pulling every 5-10 minutes is more than enough to stay within the verification horizon.
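A minimal stdlib client for this pattern might look like the following. The hostname is a placeholder (the article only gives the path); the query parameter names come from the API description later in this post:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host; the article specifies only the /api/v1/proxy-list path.
API = "https://databay.example/api/v1/proxy-list"

def build_url(**filters) -> str:
    """Filters map straight onto query parameters
    (protocol, country, anonymity, ssl, google, speed, limit, page)."""
    return f"{API}?{urllib.parse.urlencode(filters)}" if filters else API

def fetch_proxies(**filters):
    """One pull of the list; poll at >= 10 s intervals to respect the cache."""
    with urllib.request.urlopen(build_url(**filters), timeout=10) as resp:
        return json.loads(resp.read())
```

For example, `fetch_proxies(protocol="socks5", country="US", limit=100)` every 5 minutes keeps you comfortably inside the verification horizon without hammering the cache.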

For the Dataset JSON-LD. The dateModified field in the page's structured data reflects the freshest entry's verification time, so it ticks forward every time the freshest proxy on the list gets re-verified. Schema validators and AI crawlers reading the JSON-LD see a constantly-refreshed timestamp, which is the honest representation of what the dataset actually looks like.

What's Not Captured (and Why)

A few things deliberately don't appear in the verification pipeline or the published data:

Geographic probe diversity. Our checker runs from a limited set of network locations, which means a proxy that blocks traffic from our probe regions but accepts traffic from the rest of the internet will be marked dead by us and might still work for you. This is rare but not zero — some regionalized proxies have asymmetric reachability. It's a known limitation of any centralized probing approach.

Throughput testing. We measure latency per request, not sustained throughput. A proxy with low latency might still choke on parallel requests or heavy payloads. Real throughput depends too much on what you're trying to push through the proxy to bake into a general figure, and adding it would slow the verification cycle considerably.

IPv6 support. Not tested yet. Most free proxies are IPv4-only and IPv6 proxies are a small enough slice of the public-proxy world that we haven't invested in the separate probe infrastructure. This may change — if you have a use case that specifically needs free IPv6 proxies, let us know.

Per-destination testing. Our Google probe is the closest we come to destination-specific testing. We don't probe Facebook, Instagram, major e-commerce sites, or specific targets because the matrix explodes and false negatives become common (a proxy that Facebook rejects in our region might work fine for you). The Google-passed signal is a decent rough proxy for 'can reach major commercial services' but is not a guarantee.

Operator identity. We don't and can't verify who runs each proxy. Some entries are run as legitimate public services; some are misconfigured; some are honeypots or malicious. Our probes detect behavior, not intent. Treat every free proxy as untrusted infrastructure and structure your usage accordingly (see our safety guide).

Publishing the Data: Page, API, and Downloads

The data surfaces in four places, all drawing from the same verified source so they stay in sync:

The list page at /free-proxy-list. HTML table with filter, sort, and pagination. The first 50 rows are server-rendered for crawler visibility; the rest load via JavaScript from an inline JSON payload. Protocol and anonymity sub-pages (/http, /https, /socks5, /elite, /anonymous, /transparent) present the same data filtered to the selected slice, and /free-proxy-list/{country} routes filter by country.

The JSON API at /api/v1/proxy-list. Filter via query parameters (protocol, country, anonymity, ssl, google, speed, limit, page). Returns the full record (IP, port, country, ISO, protocol, anonymity, SSL, Google-passed, latency, uptime, last-checked timestamp) with pagination metadata.

CSV download. Same data, comma-separated, at /api/v1/proxy-list?format=csv. Easy to drop into pandas, Excel, or a shell pipeline.

Plain-text download. IP:port lines at /api/v1/proxy-list?format=txt. Useful for tools that expect a simple file list, like ProxyChains or curl --proxy batch scripts.
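Consuming the plain-text format from a script is a one-line parse per entry. A sketch (the splitting logic is ours; the format is just `ip:port` per line as described):

```python
def parse_proxy_lines(text: str):
    """Turn the txt download (one bare ip:port per line) into (ip, port)
    tuples, skipping blank lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        ip, port = line.rsplit(":", 1)
        pairs.append((ip, int(port)))
    return pairs
```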

All four are described in a Dataset schema.org block on the list page, which makes the list eligible for Google Dataset Search and provides AI crawlers (ChatGPT, Claude, Perplexity, Gemini) a machine-readable manifest of what the list is, what fields it contains, and where to fetch it. If you're building a tool that needs live proxy data and want AI assistants to be able to route users to it, the same schema points them to us.

Frequently Asked Questions

How often does the list change?
Entries get re-verified on a roughly 10-minute rolling cadence, our fetcher pulls fresh batches every 10 seconds, and the browser UI auto-refreshes the 'Last Updated' counter every second. In practice, between a quarter and a third of the list rotates out and back over any given 6-hour window — free proxies are an inherently churny dataset.
Why did a proxy I used yesterday disappear?
Most likely it failed enough consecutive probes to be evicted from the active pool. Free proxies are volatile: they get rebooted, reassigned DHCP leases, shut down by annoyed admins, or blacklisted by anti-bot services. Our rule is that a proxy stays on the list only while it's actively passing checks; when it stops responding, we remove it rather than show a stale entry. If it comes back online, it'll reappear on the next verification cycle.
Can I trust the 'Google Passed' flag?
Use it as a directional signal, not a guarantee. The flag means the proxy could reach Google from our probe region at the last check. Google serves slightly different responses to different IPs and regions, and the state can change minute to minute if Google blacklists the proxy's IP block. For short scraping jobs, a Google-passed proxy is likely to still work when you try it; for long-running jobs, expect rechecks. If Google-compatibility is load-bearing for your workflow, residential or mobile proxies are substantially more reliable than any free proxy.
Do you log which IPs fetch the list?
Our web server logs standard request metadata (source IP, user-agent, path, timestamp) for operational purposes — rate limiting, debugging, abuse prevention. Logs are retained for 30 days and are not sold or shared with third parties. The free proxy list itself is public and can be fetched anonymously — no account or API key needed.
Can you add SOCKS4 support to the list?
We don't currently test or publish SOCKS4 proxies. SOCKS4 is superseded by SOCKS5 (which adds authentication, IPv6, and UDP support), and the real-world SOCKS4-only population of free proxies is small enough that we've focused infrastructure on HTTP, HTTPS, and SOCKS5 instead. If you specifically need SOCKS4, most free-proxy aggregator sites still publish it.
How is this different from paid Databay proxies?
Completely different infrastructure. The free proxy list surfaces public third-party proxies we don't control — we verify them but we don't run them, and reliability reflects that. Databay's paid residential, datacenter, and mobile proxies run on infrastructure we own end-to-end, with 99.9%+ uptime, sticky sessions, country/city/ASN targeting, and a support contract. The free list is useful for low-stakes scraping, testing, or geo-checks; the paid network is for production workloads where reliability and targeting matter.


Start Using Rotating Proxies Today

Join 8,000+ users using Databay's rotating proxy infrastructure for web scraping, data collection, and automation. Access 34M+ residential, datacenter, and mobile IPs across 200+ countries with pay-as-you-go pricing from $0.50/GB. No monthly commitment, no connection limits - start collecting data in minutes.