URL sorter & deduper
Take a messy list of URLs, apply rules (lowercase host, strip tracking, ignore trailing slash, ignore fragment, sort params), deduplicate by the normalized form, and sort.
How to use
- Paste your URL list.
- Toggle the rules you want under Normalize before dedupe: Lowercase host, Strip tracking params, Ignore trailing slash, Ignore fragment, Sort query params.
- Pick Sort order — Original, A→Z, By host, By length.
- Compare the output count with the input count: the difference is the number of duplicates removed.
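The normalization rules listed above can be sketched in Python with the standard `urllib.parse` module. This is a minimal illustration, not the tool's actual code: the function name `normalize_url`, its keyword flags, and the tracking-parameter list (`utm_*`, `gclid`, `fbclid`) are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking parameters; the tool's real list is not documented here.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"gclid", "fbclid"}


def normalize_url(url, *, lowercase_host=True, strip_tracking=True,
                  ignore_trailing_slash=True, ignore_fragment=True,
                  sort_params=True):
    """Return a normalized form of url, suitable as a dedupe key."""
    parts = urlsplit(url)

    # Note: this lowercases the whole netloc, including any user:pass@ part.
    netloc = parts.netloc.lower() if lowercase_host else parts.netloc

    path = parts.path
    if ignore_trailing_slash:
        path = path.rstrip("/")

    query = parts.query
    if strip_tracking or sort_params:
        # Rebuilding the query re-encodes it (for example %20 becomes +).
        params = parse_qsl(query, keep_blank_values=True)
        if strip_tracking:
            params = [
                (k, v) for k, v in params
                if not (k.lower().startswith(TRACKING_PREFIXES)
                        or k.lower() in TRACKING_NAMES)
            ]
        if sort_params:
            params.sort()
        query = urlencode(params)

    fragment = "" if ignore_fragment else parts.fragment
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(normalize_url("https://Example.COM/Page/?utm_source=x&b=2&a=1#top"))
```

With every rule enabled, the host is lowercased while the path keeps its case, since only the host is covered by the Lowercase host rule.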
Common workflows
- Merging campaign reports: strip tracking params first so the same destination URL with different utm_* values counts as one.
- Building a sitemap from logs: strip fragments and tracking params, then sort by host so canonical paths land together.
- Deduplicating backlink exports: lowercase the host and ignore trailing slashes to normalize minor variations.
FAQ
How is this different from the normalizer?
The normalizer just rewrites URLs. This tool normalizes and dedupes by the normalized form — the output is one row per unique URL, not one row per input.
Which URL is kept on dedupe?
The first occurrence in input order. Sort applies after dedupe.
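A rough Python sketch of that order of operations, dedupe first and sort second. The helper names `dedupe_keep_first` and `sort_urls`, the order labels, and the stand-in key function are hypothetical. The sketch keeps the original string of the first occurrence; whether the tool outputs the original or the normalized string is not stated here.

```python
from urllib.parse import urlsplit


def dedupe_keep_first(urls, key):
    """Keep the first URL seen for each normalized key, in input order."""
    seen = {}
    for url in urls:
        # setdefault stores a value only if the key is new: first one wins.
        seen.setdefault(key(url), url)
    return list(seen.values())  # dicts preserve insertion order


def sort_urls(urls, order="original"):
    """Sort after dedupe; the order names here are illustrative labels."""
    if order == "a-z":
        return sorted(urls)
    if order == "host":
        return sorted(urls, key=lambda u: (urlsplit(u).hostname or "", u))
    if order == "length":
        return sorted(urls, key=len)
    return list(urls)


urls = ["https://b.example/x/", "https://a.example/y", "https://b.example/x"]
# Stand-in key: treat a trailing slash as insignificant.
unique = dedupe_keep_first(urls, key=lambda u: u.rstrip("/"))
print(sort_urls(unique, "a-z"))
```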
Does it preserve invalid URLs?
Yes — they pass through unchanged. Use the validator to filter them out first if you need a clean dataset.
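One way to picture the pass-through behaviour, as a Python sketch. What counts as "invalid" is an assumption here (no http/https scheme, no host, or a parse error); the tool and the validator may apply different checks. `looks_valid` and `normalize_or_passthrough` are hypothetical names.

```python
from urllib.parse import urlsplit


def looks_valid(url):
    """Assumed validity check: http(s) scheme plus a non-empty host part."""
    try:
        parts = urlsplit(url)
    except ValueError:  # raised for malformed input such as "http://[bad"
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def normalize_or_passthrough(url, normalize):
    """Apply normalize to valid URLs; return anything else unchanged."""
    return normalize(url) if looks_valid(url) else url


# str.lower stands in for a real normalizer in this example.
print(normalize_or_passthrough("not a url", str.lower))
print(normalize_or_passthrough("https://EXAMPLE.com", str.lower))
```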
Can I get a count of duplicates per URL?
Not directly here — use the domain extractor with "Count per group" set to Full hostname and the same input.
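If you prefer to count offline, a per-hostname tally can be approximated in Python with `collections.Counter`. This is only a sketch of the idea, not the domain extractor's implementation, and `count_per_hostname` is a hypothetical helper. Note that it groups by full hostname, not by individual URL.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_per_hostname(urls):
    """Count how many input URLs share each full hostname."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # already lowercased by urlsplit
        if host:  # skip entries without a recognizable host
            counts[host] += 1
    return counts


counts = count_per_hostname([
    "https://A.example/x",
    "https://a.example/y",
    "https://b.example/",
])
print(counts.most_common())
```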