Extract URLs from text

Pull URLs out of plain text · HTML · Markdown · Auto mode

Paste logs, an email, an HTML dump, a CSV, or a code file and get a clean, deduplicated list of URLs. Extraction combines regex matching with HTML-aware parsing (<a href>, src, background, srcset, CSS url()).

How to use

Paste any text. Files up to ~50 MB work fine.
Pick Mode — Plain text (regex), HTML (parses attributes), Markdown (parses link syntax), Auto (tries all three).
The output list is deduplicated by default. Turn off Dedupe if you need the duplicates.
Filter options: Hostname only, Filter by domain, Only http(s), Sort A→Z.
Download as .txt or .csv (with a count column).
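
The options above can be pictured as a small post-processing pass over the extracted list. The following is a minimal Python sketch, not the tool's actual code: the names postprocess and to_csv are hypothetical, and it assumes that Filter by domain also matches subdomains and that the CSV count column holds the number of occurrences of each URL.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def postprocess(urls, dedupe=True, hostname_only=False, http_only=False,
                domain=None, sort=False):
    """Apply the filter options to a list of extracted URLs (hypothetical helper)."""
    out = []
    for url in urls:
        parts = urlsplit(url)
        if http_only and parts.scheme not in ("http", "https"):
            continue
        host = (parts.hostname or "").lower()
        # Assumption: the domain filter accepts the domain itself and any subdomain.
        if domain and not (host == domain or host.endswith("." + domain)):
            continue
        if hostname_only:
            if host:  # entries without a host (e.g. mailto:) are dropped
                out.append(host)
        else:
            out.append(url)
    if dedupe:
        out = list(dict.fromkeys(out))  # removes repeats, keeps first-seen order
    if sort:
        out.sort()
    return out


def to_csv(urls):
    """Render a two-column CSV: each distinct URL with its occurrence count."""
    counts = Counter(urls)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url", "count"])
    for url, n in counts.items():
        writer.writerow([url, n])
    return buf.getvalue()
```

For example, postprocess(urls, http_only=True, sort=True) would keep only http and https entries, drop repeats, and sort the rest alphabetically.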

When to use which mode

Auto — handles most inputs. Slower than pure regex but catches HTML attribute URLs that regex misses.
Plain text — fastest, finds URLs by regex. Misses attribute values in HTML.
HTML — parses with DOM, pulls from href, src, srcset, style="background-image: url(...)".
Markdown — extracts the URL from [text](url) link syntax.
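
To make the difference between the modes concrete, here is a rough Python sketch of the four strategies using only the standard library. It is an illustration under assumptions, not the tool's implementation: the patterns are deliberately simple, the function names are hypothetical, srcset is split naively on commas, and attribute values are returned verbatim (including relative ones), which the real tool may filter differently.

```python
import re
from html.parser import HTMLParser

# Simplified patterns for illustration; the tool's actual expressions are not documented.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
MD_LINK_RE = re.compile(r"\[[^\]]*\]\(\s*([^)\s]+)")
CSS_URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)")
URL_ATTRS = {"href", "src", "background"}


class _AttrCollector(HTMLParser):
    """Collects URL-bearing attribute values from start tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name in URL_ATTRS:
                self.urls.append(value.strip())
            elif name == "srcset":
                # srcset is a comma-separated list of "url descriptor" entries.
                for candidate in value.split(","):
                    fields = candidate.split()
                    if fields:
                        self.urls.append(fields[0])
            elif name == "style":
                self.urls.extend(CSS_URL_RE.findall(value))


def extract_plain(text):
    """Plain text mode: regex only."""
    return URL_RE.findall(text)


def extract_html(text):
    """HTML mode: parse tags and read URL attributes."""
    parser = _AttrCollector()
    parser.feed(text)
    parser.close()
    return parser.urls


def extract_markdown(text):
    """Markdown mode: the target of [text](url) links."""
    return MD_LINK_RE.findall(text)


def extract_auto(text):
    """Auto mode: run all three and merge, keeping first-seen order."""
    merged = extract_html(text) + extract_markdown(text) + extract_plain(text)
    return list(dict.fromkeys(merged))
```

In this sketch, an input such as <img src="/img/a.png"> yields nothing from extract_plain, because the value has no scheme for the regex to anchor on, while extract_auto returns /img/a.png through the HTML pass.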

FAQ

Why does Auto mode return more URLs than Plain?

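Plain mode is regex-only and does not understand HTML quoting: it can match https://example.com">link as one URL with junk on the end, and it misses attribute values in HTML. Auto mode also parses the HTML, so it returns the clean attribute URLs as well.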
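
The quoting problem is easy to reproduce. The short Python sketch below compares a loose regex with a real HTML parse on the same snippet; the pattern and the HrefCollector class are illustrative assumptions, not the expressions the tool uses.

```python
import re
from html.parser import HTMLParser

snippet = '<a href="https://example.com">link</a>'

# A loose pattern that stops only at whitespace or "<" runs past the closing quote.
naive = re.findall(r"https?://[^\s<]+", snippet)


class HrefCollector(HTMLParser):
    """Collects href values; the parser handles the attribute quoting."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        self.hrefs.extend(v for k, v in attrs if k == "href" and v)


collector = HrefCollector()
collector.feed(snippet)
parsed = collector.hrefs

print(naive)   # ['https://example.com">link']
print(parsed)  # ['https://example.com']
```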

Does it follow links to find more?

No — extraction is local, on the text you paste. Crawling needs a server.

How is trailing punctuation (commas, periods) handled?

Stripped automatically. For example, "...visit https://example.com." extracts as https://example.com, without the final period.
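
One common way to get this behavior is to match greedily and then trim punctuation from the right. The Python sketch below assumes a particular set of characters to strip (.,;:!?) and a hypothetical helper name; the tool's exact rules are not documented here. Note that a URL that legitimately ends in one of these characters would lose it under this approach.

```python
import re

LOOSE_URL_RE = re.compile(r"https?://\S+")
TRAILING = ".,;:!?"  # assumed set of sentence punctuation to trim


def extract_trimmed(text):
    """Find URLs, then strip sentence punctuation from the end of each match."""
    return [match.rstrip(TRAILING) for match in LOOSE_URL_RE.findall(text)]


print(extract_trimmed("...visit https://example.com."))
# ['https://example.com']
```

Punctuation inside the URL is left alone, since only the right-hand end of each match is trimmed.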

What about scheme-relative URLs (//cdn.example.com/x)?

Currently skipped — they require a base URL to be meaningful. Add the protocol manually if you need them included.
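
If you want to add the protocol yourself before or after extraction, one option is to resolve each scheme-relative URL against a base. The Python sketch below uses urllib.parse.urljoin; the helper name and the default base https://example.com/ are placeholders, so substitute the address of the page the text came from.

```python
from urllib.parse import urljoin


def resolve_scheme_relative(url, base="https://example.com/"):
    """Give a //host/path URL the scheme of `base`; other URLs pass through."""
    return urljoin(base, url) if url.startswith("//") else url


print(resolve_scheme_relative("//cdn.example.com/x"))
# https://cdn.example.com/x
```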