IDN Encoder / Decoder

Online Free Text Tool — International Domain Name ↔ Punycode Converter

Why Use Our IDN Encoder / Decoder?

50+ Scripts: Arabic, CJK, Cyrillic, Devanagari & more
RFC 5891: IDNA2008 & UTS #46 compliant
Homograph: Detect lookalike attacks
Batch Mode: Process thousands of domains
Private: 100% browser-based
Deep Analysis: Script, codepoint & label info

The Complete Guide to IDN Encoding and Decoding: International Domain Names, Punycode, and Why Every Developer Needs an IDN Converter

The internet was not always as international as it is today. In the early days of the World Wide Web, domain names were restricted to the characters defined by the original DNS standard: letters A through Z, digits 0 through 9, and the hyphen. This limitation made perfect sense in an English-language context but became a significant barrier as the internet expanded globally and users in China, Japan, the Arab world, Russia, India, and hundreds of other linguistic communities needed domain names in their own scripts and languages. Internationalized Domain Names (IDN) were developed to solve this problem, enabling domain names to contain characters from virtually any script — Arabic, Chinese, Cyrillic, Devanagari, Greek, Hebrew, Japanese, Korean, Thai, and many more. Our free online IDN encoder decoder tool provides instant, accurate conversion between Unicode domain names and their Punycode representations, with advanced features including batch processing, homograph attack detection, deep character analysis, RFC compliance validation, and comprehensive domain comparison — all running privately in your browser without any data ever leaving your device.

The technical mechanism that makes internationalized domain names work within the existing ASCII-based DNS infrastructure is called Punycode, defined in RFC 3492. Punycode is an encoding scheme that converts Unicode strings containing non-ASCII characters into ASCII-compatible strings. It encodes a label by first copying any ASCII characters (letters, digits, or hyphens) through unchanged, then appending a hyphen delimiter if at least one ASCII character was copied, and finally encoding the positions and values of all non-ASCII characters using a compact variable-length encoding. Labels with no ASCII characters at all, which are common in scripts such as Chinese or Arabic, consist of the encoded part only. The result is prefixed with "xn--" (the ACE prefix, for ASCII Compatible Encoding) to mark the label as Punycode-encoded rather than a regular ASCII label. For example, the German city name "münchen" becomes "xn--mnchen-3ya" in Punycode: the ASCII characters m, n, c, h, e, n pass through unchanged, and the ü is encoded as the suffix "3ya", which follows the final hyphen (the delimiter between the ASCII part and the encoded part).
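The "münchen" example can be reproduced with Python's standard library, which ships a "punycode" codec. This is a minimal sketch for illustration only; the variable names are arbitrary, and the codec performs the raw RFC 3492 encoding without adding the ACE prefix or applying any IDNA validation.

```python
label = "münchen"

# Raw Punycode (RFC 3492): ASCII characters first, then the delimiter, then the encoded part.
raw = label.encode("punycode").decode("ascii")
print(raw)  # mnchen-3ya

# The ACE prefix is added separately when the result is used as a DNS label.
ace = "xn--" + raw
print(ace)  # xn--mnchen-3ya

# Decoding reverses the process.
decoded = raw.encode("ascii").decode("punycode")
print(decoded)  # münchen
```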

The History and Standards Behind IDN: From IDNA2003 to IDNA2008 and UTS #46

The standardization of internationalized domain names was a lengthy and complex process that produced several competing and complementary specifications. The first widely deployed standard was IDNA2003 (Internationalizing Domain Names in Applications, 2003), defined in RFCs 3490, 3491, and 3492. IDNA2003 defined how applications should process Unicode domain names before querying DNS: each label is first prepared with Nameprep (RFC 3491), a Stringprep profile that maps characters (including case folding), applies NFKC normalization, and rejects prohibited code points, and is then converted to Punycode if it still contains non-ASCII characters. IDNA2003 worked reasonably well for many scripts but had significant issues with certain characters that Nameprep maps or deletes in ways that surprised users from specific linguistic communities. For example, the German ß is mapped to "ss", and the zero-width joiner and non-joiner used in several scripts are removed.
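The IDNA2003 behavior can be observed with Python's built-in "idna" codec, which implements RFC 3490. The sketch below is only an illustration of that codec, not of this tool's internals; `nameprep` is a helper in the standard `encodings.idna` module, and the sample strings are arbitrary.

```python
from encodings import idna  # standard-library module implementing IDNA2003

# Nameprep case-folds and maps characters before any Punycode conversion.
prepared = idna.nameprep("Straße")
print(prepared)  # strasse

# The codec applies Nameprep to each label, then Punycode where non-ASCII remains.
encoded = "MÜNCHEN.de".encode("idna")
print(encoded)  # b'xn--mnchen-3ya.de'

# After mapping, "Straße" is pure ASCII, so no xn-- label is produced at all.
mapped = "straße.de".encode("idna")
print(mapped)  # b'strasse.de'
```

The last line shows the kind of surprise mentioned above: under IDNA2003 the name with ß and the name with "ss" are the same domain.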

The successor standard, IDNA2008 (defined in RFCs 5890-5894, published in 2010), took a fundamentally different approach. Rather than applying character mapping transformations at the application level, IDNA2008 defined precise categories for each Unicode code point: PVALID (valid for use in labels), CONTEXTJ (valid only in specific joining contexts), CONTEXTO (valid only in other specific contexts), and DISALLOWED (not valid in labels). This more restrictive approach eliminated many of the ambiguities of IDNA2003 but created incompatibilities: some domain names that were valid under IDNA2003 became invalid under IDNA2008, and vice versa. The Unicode Consortium responded with UTS #46 (Unicode IDNA Compatibility Processing), which defines a compatibility mapping that bridges the two standards by applying a subset of IDNA2003's character mapping before applying IDNA2008 rules.

Our IDN converter supports all three standards — IDNA2003, IDNA2008, and UTS #46 — through a configurable option, allowing developers and researchers to verify compatibility with different implementations and understand how domain names are processed across different systems. This multi-standard support is essential for anyone working on internationalization infrastructure, domain registrar systems, email routing, or security analysis of international domain names.

Understanding Punycode Encoding: How the Algorithm Works

The Punycode algorithm, specified in RFC 3492, is a specific instance of the general Bootstring encoding algorithm. It encodes a Unicode string into an ASCII-compatible string using only the characters A-Z, a-z, 0-9, and the hyphen. The encoding process works as follows: First, the input string is separated into basic (ASCII) characters and non-basic (non-ASCII) characters. The basic characters are output first in their original order. If there are any basic characters, a hyphen separator is appended. Then, the non-basic characters are encoded using a generalized variable-length integer encoding that compactly represents both the character values (as Unicode code points) and their positions in the original string.
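The steps above can be sketched in a few dozen lines. The following is an illustrative encoder written from the parameters published in RFC 3492 (base 36, tmin 1, tmax 26, skew 38, damp 700, initial bias 72, initial n 128); the function names are hypothetical, overflow checks are omitted, and it is not this tool's implementation.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

def _digit(d):
    # Digit values 0..25 map to a..z, 26..35 map to 0..9.
    return chr(d + 97) if d < 26 else chr(d - 26 + 48)

def _adapt(delta, numpoints, first):
    # Bias adaptation keeps the variable-length integers short.
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)

def punycode_encode(text):
    basic = [c for c in text if ord(c) < 128]
    out = list(basic)              # basic characters come first, in order
    h = b = len(basic)
    if b:
        out.append("-")            # delimiter only if there were basic characters
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(text):
        m = min(ord(c) for c in text if ord(c) >= n)  # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for c in text:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                q, k = delta, BASE
                while True:        # emit delta as a variable-length integer
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)

print(punycode_encode("münchen"))  # mnchen-3ya
```

A convenient way to check a sketch like this is to compare its output with Python's built-in "punycode" codec on a handful of labels.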

The resulting string (basic characters, optional hyphen, and encoded non-basic characters) is then prefixed with "xn--" when used as a DNS label. This "xn--" prefix is the globally recognized signal that the label contains Punycode-encoded content. The DNS itself treats such a label as ordinary ASCII and does not decode it; it is the application, such as a browser or mail client, that recognizes the prefix and decodes the label to recover the Unicode form for display. The conversion is transparent to end users: when you type an internationalized domain name into your browser, the browser converts it to Punycode before the DNS query is sent.

A complete internationalized domain name consists of multiple labels separated by dots, just like regular ASCII domain names. Each label is converted independently. Labels that contain only ASCII characters pass through unchanged and are not prefixed with "xn--". So a domain like "münchen.de" converts to "xn--mnchen-3ya.de" — the first label "münchen" is converted to Punycode, while the second label "de" remains unchanged as pure ASCII. More complex examples like Arabic domain names may have every label converted: "مثال.إختبار" becomes "xn--mgbh0fb.xn--kgbechtv" because both the domain name and the TLD are in Arabic script.
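Label-by-label conversion might look like the sketch below. It assumes the input is already normalized and valid, uses Python's built-in "punycode" codec, and performs no IDNA mapping or validation; the function names are hypothetical.

```python
ACE_PREFIX = "xn--"

def label_to_ascii(label):
    # Pure-ASCII labels pass through unchanged and get no prefix.
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def label_to_unicode(label):
    # Only labels carrying the ACE prefix are decoded.
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

def domain_to_ascii(domain):
    # Each label between the dots is converted independently.
    return ".".join(label_to_ascii(part) for part in domain.split("."))

def domain_to_unicode(domain):
    return ".".join(label_to_unicode(part) for part in domain.split("."))

print(domain_to_ascii("münchen.de"))           # xn--mnchen-3ya.de
print(domain_to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```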

Homograph Attacks: The Security Risk of Internationalized Domain Names

While internationalized domain names are essential for making the internet accessible to non-English speakers, they also introduce a significant security vulnerability known as the IDN homograph attack (or homoglyph attack). The attack exploits the fact that characters from different Unicode scripts can be visually identical or nearly identical. The most famous example is the pair of Latin 'a' (U+0061) and Cyrillic 'а' (U+0430) — these two characters are completely indistinguishable to most users viewing text in common web fonts. An attacker who registers the domain "pаypal.com" using a Cyrillic 'а' has a domain that looks identical to "paypal.com" to most users but is in fact a completely different domain name.

Our tool's homograph detection feature analyzes each character in a domain name and checks for potential lookalike substitutions — characters that are visually similar to ASCII characters but belong to different Unicode scripts. The detector identifies suspicious mixing of scripts within a single domain label (legitimate internationalized domains typically use characters from only one script per label), flags specific high-risk character pairs (Cyrillic vs Latin, Greek vs Latin, etc.), and provides a risk assessment for each domain. This functionality is invaluable for security researchers, domain monitoring systems, phishing detection tools, and anyone who needs to verify that a domain name is what it appears to be.
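A very rough version of the mixed-script check can be sketched with the standard library. Python's `unicodedata` module does not expose the Unicode Script property, so this sketch assumes that the first word of a character's Unicode name (for example "LATIN" or "CYRILLIC") is a usable stand-in. That is a simplification, the function names are hypothetical, and the tool's own detector is not shown here.

```python
import unicodedata

def scripts_in_label(label):
    """Return the set of script-like name prefixes used in a label."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared across scripts
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_mixed_script(label):
    return len(scripts_in_label(label)) > 1

suspicious = "p\u0430ypal"  # second letter is Cyrillic U+0430, not Latin "a"
print(sorted(scripts_in_label(suspicious)))  # ['CYRILLIC', 'LATIN']
print(is_mixed_script(suspicious))           # True
print(is_mixed_script("paypal"))             # False
```

A heuristic like this flags some legitimate labels, such as Japanese names that combine Han, Hiragana, and Katakana, and it misses whole-label spoofs written entirely in one lookalike script, which is why a table of known confusable pairs is also needed.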

Modern browsers have partially addressed this attack by implementing heuristics that display Punycode instead of Unicode for domains containing certain mixed-script or suspicious combinations. However, the rules vary between browsers and are regularly updated, making automated detection tools like ours essential for comprehensive analysis. Our homograph checker implements a comprehensive database of confusable character pairs and provides detailed reports about which characters in a domain are potential substitutes for common ASCII characters.

Practical Applications: Who Uses IDN Encoding and Why

The range of professionals and use cases that benefit from an IDN encoder decoder tool is remarkably broad. Domain registrars and DNS operators need to convert between Unicode and Punycode when registering internationalized domains, managing zone files, and configuring DNS servers. While most modern domain registration interfaces handle this conversion automatically, developers building registration systems or managing large portfolios of internationalized domains need programmatic access to accurate IDN conversion.

Email system developers face IDN challenges in multiple places: sender and recipient addresses may use internationalized domains, MX records may point to internationalized hostnames, and SPF/DKIM/DMARC records may need to handle IDN domains correctly. Understanding the correct Punycode representation of internationalized domain names is essential for building email systems that work correctly across languages and scripts.

Web developers and SEO specialists working on multilingual websites need to understand how internationalized URLs work, how to construct canonical URLs for IDN domains, and how search engines index and rank IDN pages. On the wire, a URL with an IDN uses Punycode in the hostname, while non-ASCII characters in the path, query string, and fragment are percent-encoded as UTF-8, even though browsers usually display both in their Unicode form. This difference requires careful handling to avoid broken links and duplicate content issues.
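The split between hostname and the rest of the URL can be illustrated with a short sketch that produces an all-ASCII URL: Punycode for the host, percent-encoded UTF-8 elsewhere. It assumes a simple URL without user credentials, relies on Python's built-in IDNA2003 "idna" codec for the host, and uses a hypothetical function name and sample URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def url_to_ascii(url):
    parts = urlsplit(url)
    # Hostname: converted label by label to its xn-- form.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Path, query, fragment: percent-encoded UTF-8; "%" is kept so that
    # already-encoded input is not encoded twice.
    return urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))

print(url_to_ascii("https://münchen.de/straße?q=käse"))
# https://xn--mnchen-3ya.de/stra%C3%9Fe?q=k%C3%A4se
```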

Security researchers and penetration testers use IDN tools to analyze potential phishing domains, to test how applications handle internationalized input, and to identify homograph vulnerabilities in authentication systems. A well-built IDN analyzer that provides character-by-character Unicode information (code points, script categories, bidirectionality) is an essential tool for this work.

Tips for Working with International Domain Names

When encoding domain names for DNS use, always work at the label level — encode each part between the dots separately rather than treating the entire domain as a single string. Only labels that contain non-ASCII characters need the "xn--" prefix; pure ASCII labels pass through unchanged. This means that "münchen.de" has only one IDN label (münchen → xn--mnchen-3ya) while "de" remains unchanged, giving "xn--mnchen-3ya.de".

For maximum compatibility, normalize your Unicode input before encoding. Different users or systems may input the same character using different Unicode representations — for example, the letter 'é' can be represented as a single precomposed character (U+00E9) or as a base 'e' (U+0065) followed by a combining acute accent (U+0301). These are canonically equivalent under Unicode normalization (NFC/NFD), and most IDN implementations normalize to NFC before encoding. Our tool applies appropriate normalization to ensure consistent output.
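The effect of normalization on the encoded result can be seen in a small sketch using the standard `unicodedata` module. The helper name `encode_label` is hypothetical, and the sketch applies NFC only, without any other IDNA processing.

```python
import unicodedata

precomposed = "caf\u00e9"   # é as the single code point U+00E9
decomposed = "cafe\u0301"   # e followed by combining acute accent U+0301

def encode_label(label):
    # Normalize to NFC first so equivalent inputs give the same ACE label.
    nfc = unicodedata.normalize("NFC", label)
    if nfc.isascii():
        return nfc
    return "xn--" + nfc.encode("punycode").decode("ascii")

# Without normalization the two spellings encode differently.
print(precomposed.encode("punycode") == decomposed.encode("punycode"))  # False

# With NFC applied first, both produce the same label.
print(encode_label(precomposed))  # xn--caf-dma
print(encode_label(decomposed))   # xn--caf-dma
```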

When validating internationalized domain names, apply the length constraints to the Punycode form, not the Unicode form. Each label, after Punycode encoding and including the "xn--" prefix, must be at most 63 octets, and the total domain name (including dots) must be at most 253 characters in its ASCII/Punycode form. A label that is short in its Unicode form may be considerably longer once encoded: the prefix alone costs four characters, and the number of ASCII characters needed for each non-ASCII character varies with the script and with the character's position in the label.
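A length check along these lines could be sketched as follows. It measures the encoded form produced by Python's "punycode" codec and reports problems as plain strings; the function name and message wording are hypothetical, and no other IDNA rules are checked.

```python
MAX_LABEL = 63    # octets per label, measured on the encoded form
MAX_DOMAIN = 253  # characters for the whole name, dots included

def length_problems(domain):
    problems = []
    encoded_labels = []
    for label in domain.rstrip(".").split("."):
        if label.isascii():
            ace = label
        else:
            ace = "xn--" + label.encode("punycode").decode("ascii")
        encoded_labels.append(ace)
        if len(ace) == 0:
            problems.append("empty label")
        elif len(ace) > MAX_LABEL:
            problems.append(f"label too long ({len(ace)} octets): {ace[:20]}...")
    full = ".".join(encoded_labels)
    if len(full) > MAX_DOMAIN:
        problems.append(f"domain too long ({len(full)} characters)")
    return problems

print(length_problems("münchen.de"))         # []
print(len("ü" * 63))                         # 63 characters in Unicode form
print(bool(length_problems("ü" * 63 + ".de")))  # True: the encoded label exceeds 63
```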

Conclusion: Your Essential IDN Tool for International Domain Work

Our free online IDN encoder decoder provides the most comprehensive internationalized domain name conversion tool available — supporting IDNA2003, IDNA2008, and UTS #46 standards, converting between Unicode and Punycode accurately for all Unicode scripts, detecting homograph attacks and security risks, providing deep character-level analysis with script and Unicode category information, processing batch lists of domains efficiently, and validating domains against DNS and IDNA compliance rules. Everything runs in your browser with complete privacy and no signup required. Whether you're a developer building internationalization support into your application, a security researcher analyzing suspicious domains, a domain manager working with a multilingual domain portfolio, or a student learning about internationalized internet infrastructure, this free IDN conversion tool delivers accurate, professional results instantly. Bookmark this page as your go-to resource for all international domain name encoding and decoding needs.

Frequently Asked Questions

An Internationalized Domain Name (IDN) is a domain name that contains characters from non-Latin scripts or scripts with diacritics — such as Arabic, Chinese, Cyrillic, Devanagari, Greek, Hebrew, Japanese, Korean, Thai, and accented Latin characters. Examples include münchen.de (German), 中文.com (Chinese), مثال.إختبار (Arabic), and пример.рф (Russian). IDNs allow non-English speakers to register and use domain names in their native languages and scripts, making the internet more accessible globally.

Punycode (RFC 3492) is an encoding that converts Unicode strings (including non-ASCII characters) into ASCII-compatible strings. It enables internationalized domain names to be stored and transmitted through the existing ASCII-based DNS infrastructure. The "xn--" prefix is called the ACE prefix (ASCII Compatible Encoding). When a DNS label begins with "xn--", it signals that the remaining characters are Punycode-encoded Unicode. For example, "münchen" → "xn--mnchen-3ya" where "mnchen" are the ASCII characters that pass through, and "3ya" encodes the position and value of the ü character.

A homograph (or homoglyph) attack exploits the fact that characters from different Unicode scripts can look visually identical. For example, Cyrillic 'а' (U+0430) looks the same as Latin 'a' (U+0061). An attacker can register "pаypal.com" with a Cyrillic 'а', which looks identical to "paypal.com" to most users. Our tool detects this by: analyzing each character's Unicode script, flagging mixed-script labels, identifying known confusable character pairs, and calculating a risk score. Use the Homograph tab to check any suspicious domain.

IDNA2003 (RFCs 3490-3492) applies NFKC normalization and character mapping before encoding, which changes some characters during conversion. IDNA2008 (RFCs 5890-5894) is stricter — it categorizes each Unicode code point as PVALID, CONTEXTJ, CONTEXTO, or DISALLOWED and does not apply character mapping. This creates incompatibilities: some domains valid in IDNA2003 are invalid in IDNA2008 and vice versa. UTS #46 bridges both by applying compatibility mapping before IDNA2008 rules. Our tool supports all three for maximum compatibility testing.

DNS labels have a maximum length of 63 octets (bytes). For Punycode-encoded IDN labels, this 63-octet limit applies to the encoded Punycode form (including the "xn--" prefix). A label may look short in Unicode but be noticeably longer after Punycode encoding: the prefix alone adds four characters, and the cost of each non-ASCII character depends on the script and the character's position. The total domain name (including dots) cannot exceed 253 characters in its ASCII/Punycode form. Our validator checks both label length and total domain length for compliance.

IDNs can be used in email addresses, but with important caveats. The domain part of an email address can use an IDN in its Punycode form, which works with existing mail servers because the encoded domain is plain ASCII. The local part (the username before the @) can use Unicode only through the EAI (Email Address Internationalization) extensions (RFCs 6530-6533), and support for these is still limited. Most mail software converts IDN domain names to Punycode for delivery. For maximum compatibility, use the Punycode form of the domain in email addresses, keep the local part ASCII, and check that your mail software handles IDNA correctly.
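Converting only the domain part of an address can be sketched as below. The function name is hypothetical, the local part is deliberately left untouched, and the domain is encoded with Python's built-in "idna" codec, which follows IDNA2003 rather than IDNA2008.

```python
def email_to_ascii(address):
    # Split on the last "@" so the domain part is isolated.
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Only the domain is converted; the local part is passed through as-is.
    return local + "@" + domain.encode("idna").decode("ascii")

print(email_to_ascii("info@münchen.de"))  # info@xn--mnchen-3ya.de
```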

Modern browsers display Punycode (xn--) instead of Unicode for IDN domains when they detect a potential homograph attack or when the domain doesn't meet certain safety criteria. Specifically, if a domain label mixes characters from multiple scripts (e.g., Latin + Cyrillic), or if the TLD doesn't allow IDN labels, the browser shows the raw Punycode. Different browsers have different rules for when to display Punycode vs Unicode. This is a security feature to help users recognize potentially deceptive domain names.

The tool is private: all encoding, decoding, analysis, and validation operations happen entirely in your browser using JavaScript. No domain names or data are ever sent to our servers or any third party. You can verify this by checking the browser's Network tab in developer tools, where no requests appear during conversion. This makes the tool safe for confidential domain portfolios, internal network hostnames, and unreleased domain strategies.

To convert many domains at once, use the Batch tab. Enter your domain names, one per line (or comma-separated). Select the conversion direction (Unicode → Punycode, Punycode → Unicode, or Auto-detect). Click Process. The results appear with status indicators showing success or any errors for each domain. You can copy all results or download them as a CSV file with both original and converted columns. The main input area also accepts multiple lines; each line is processed independently and shown as a separate result.

Our tool supports all Unicode scripts that are valid in domain names under IDNA standards, including: Arabic, Armenian, Bengali, Bopomofo, Cherokee, CJK (Chinese/Japanese/Korean), Cyrillic, Devanagari (Hindi, Sanskrit, etc.), Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Han (Traditional & Simplified Chinese), Hangul (Korean), Hebrew, Hiragana and Katakana (Japanese), Kannada, Khmer, Latin (with all diacritics), Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thai, Tibetan, and many more. Any Unicode character that is PVALID under IDNA2008 can be used in IDN labels.