Why Use Our Punycode Encoder?

Instant: real-time encoding
RFC 3492: standards compliant
Batch Mode: many at once
Breakdown: label-by-label view
Private: 100% browser-based
Free: no signup required

The Complete Guide to Punycode Encoding: Internationalized Domain Names and the Modern Multilingual Web

The internet was built on ASCII, the 128-character set that covers the English alphabet, digits, and a handful of punctuation marks. In the early decades of the internet, domain names were restricted to these characters, so virtually all of the world's other languages and writing systems were excluded from the namespace. Chinese, Arabic, Hindi, Japanese, Korean, Russian, Greek, and many other languages could not be represented in domain names at all. This exclusion was both a practical barrier and a symbolic statement about whose internet it was designed to be. Punycode, defined in RFC 3492 and used as part of the Internationalized Domain Names in Applications (IDNA) framework, solved this problem by translating Unicode domain names into an ASCII-compatible encoding (ACE), making it possible for domain names in any language to coexist with the ASCII-only infrastructure of the Domain Name System. Our free online Punycode encoder implements this conversion and supports all Unicode characters, full URL processing, email address encoding, and batch conversion of domain lists.

Understanding Punycode requires understanding the constraints it works within. The Domain Name System (DNS), the distributed database that maps domain names to IP addresses, was designed in the 1980s, and hostnames in it are limited to a subset of ASCII characters: letters (a-z, case-insensitive), digits (0-9), and hyphens (-). Periods separate labels (the components of a domain name), and the surrounding infrastructure, including DNS servers, certificate authorities, web browsers, email servers, and countless other systems, relies on this ASCII-only convention. Changing the DNS infrastructure to support Unicode directly would require updating billions of devices and software systems simultaneously, an impossible task. Punycode provides a bridge: a way to represent any Unicode label as an ASCII string that conforms to all DNS constraints while remaining entirely unambiguous and reversible.

How Punycode Encoding Works: The Algorithm in Detail

The Punycode encoding algorithm, designed by Adam Costello and published as RFC 3492 in 2003, converts a Unicode label (one dot-separated component of a domain name) into an ASCII string. Combined with the ACE prefix, the result has the form xn--[ascii-chars]-[encoded-non-ascii]. The xn-- prefix is not produced by the Punycode algorithm itself; it is added by IDNA and signals to IDNA-aware software that the label is a Punycode-encoded internationalized label. What follows the prefix encodes both the ASCII characters already in the label and the positions and code points of all non-ASCII characters.

The encoding process begins by separating the input label into its basic (ASCII) and non-basic (non-ASCII) characters. The ASCII characters are copied directly to the output in their original order, followed by a hyphen delimiter if there is at least one of them. The algorithm then processes the non-ASCII characters in order of increasing code point and encodes, for each one, its code point and the position where it must be inserted. Each step is expressed as a delta, the difference from the previously encoded state, and written as a generalized variable-length integer in base 36 using the digits a-z and 0-9. Punycode is an instance of the more general Bootstring algorithm described in the same RFC. Because the deltas are small when characters have nearby code points, the representation is very compact for typical natural-language text.
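The steps above can be sketched in Python. This is a minimal, illustrative transcription of the encoding procedure in RFC 3492, not the code our tool runs; the function names punycode_encode, _adapt, and _digit are chosen here for illustration, and the sketch omits the overflow checks and mixed-case annotation that the RFC also describes.

```python
# Bootstring parameters that RFC 3492 fixes for Punycode.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _digit(d):
    """Map a digit value 0..35 to 'a'..'z' (0-25) or '0'..'9' (26-35)."""
    return chr(d + 97) if d < 26 else chr(d - 26 + 48)


def _adapt(delta, numpoints, first):
    """Bias adaptation: keeps the variable-length integers short."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_encode(label):
    """Encode one label; the result does NOT include the xn-- prefix."""
    cps = [ord(c) for c in label]
    # Step 1: copy the basic (ASCII) code points, then a delimiter if any exist.
    out = [chr(c) for c in cps if c < 128]
    b = h = len(out)
    if b:
        out.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # Step 2: handle non-basic code points in increasing code point order.
    while h < len(cps):
        m = min(c for c in cps if c >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in cps:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)


print(punycode_encode("münchen"))  # mnchen-3ya
```

For münchen, the six ASCII letters are copied first, and the single ü is then encoded as the three digits 3ya after the delimiter.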

For a complete domain name with multiple labels, each label is encoded independently, and the results are joined with periods. Only labels containing non-ASCII characters receive the xn-- prefix—purely ASCII labels like com, net, or org pass through unchanged. This means that münchen.de encodes to xn--mnchen-3ya.de—only the first label (münchen) gets Punycode treatment, while .de remains as-is. Our online Punycode encoding tool handles this correctly, encoding each label independently and preserving ASCII labels unchanged.
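As a rough illustration of per-label encoding, the sketch below uses the punycode codec that ships with Python's standard library. The function name encode_domain is hypothetical, and the sketch deliberately skips the normalization and validation steps that a full IDNA implementation performs.

```python
def encode_domain(domain):
    """Encode each label independently; pure-ASCII labels pass through unchanged."""
    encoded = []
    for label in domain.split("."):
        if label.isascii():
            encoded.append(label)
        else:
            # The codec returns the bare Punycode; the ACE prefix is added here.
            encoded.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(encoded)


print(encode_domain("münchen.de"))   # xn--mnchen-3ya.de
print(encode_domain("example.com"))  # example.com
```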

The IDNA Framework: From Punycode to Working Domain Names

Punycode is the encoding algorithm at the heart of the Internationalized Domain Names in Applications (IDNA) framework, but IDNA includes additional processing steps beyond simple character encoding. The IDNA standard specifies how user interfaces (browsers, email clients, operating systems) should handle domain names that contain non-ASCII characters, defining rules for character validation, Unicode normalization, case folding, and bidirectional text handling that go beyond what Punycode itself specifies.

The first major version, IDNA2003 (defined in RFCs 3490-3492), included a step called Nameprep that applied Unicode normalization form KC (NFKC), case-folded the input, and removed or prohibited certain characters. The updated IDNA2008 standard (RFCs 5890-5894) refined the rules significantly: it expects labels to be in NFC (Canonical Decomposition followed by Canonical Composition) and categorizes Unicode code points more carefully to better support the full range of international scripts. Our Punycode encoder applies NFC normalization (which can be enabled or disabled) as part of the encoding pipeline, so that equivalent inputs produce the same output.
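The effect of the extra IDNA processing can be seen with Python's built-in idna codec, which implements the older IDNA2003 rules including Nameprep. The wrapper name to_ascii_idna2003 is made up for this sketch. IDNA2008 treats some characters differently (ß is a well-known case), so this shows IDNA2003 behavior only; IDNA2008 processing in Python generally relies on a third-party package.

```python
def to_ascii_idna2003(domain):
    """Convert a domain with the standard library's IDNA2003 codec."""
    return domain.encode("idna").decode("ascii")


# Nameprep case-folds non-ASCII labels before Punycode is applied.
print(to_ascii_idna2003("München.de"))  # xn--mnchen-3ya.de

# IDNA2003 maps the sharp s to "ss", leaving a pure-ASCII label.
print(to_ascii_idna2003("straße.de"))   # strasse.de
```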

The practical implications of IDNA become most visible in web browsers, which transparently convert between Unicode display and Punycode transmission. When you type münchen.de in your browser's address bar, the browser converts it to xn--mnchen-3ya.de for the DNS lookup, receives the response, establishes a TLS connection, and displays the original Unicode domain name in the address bar—all without the user ever seeing the Punycode form. The Punycode encoding happens silently behind the scenes, invisible to the end user but essential for the domain name system to function correctly. Understanding this process is valuable for developers implementing internationalization in applications, for domain registrars managing multilingual domain portfolios, and for security researchers investigating homograph attacks.

Security Considerations: Homograph Attacks and IDN Spoofing

One of the most important security implications of internationalized domain names is the potential for homograph attacks—a type of phishing attack where malicious domains use characters from different scripts that visually resemble ASCII characters. For example, the Cyrillic letter а (U+0430) looks virtually identical to the Latin letter a (U+0061), the Greek lowercase omicron ο (U+03BF) is indistinguishable from Latin o (U+006F), and there are dozens of other such lookalike character pairs across different Unicode scripts. A malicious actor could register a domain like аpple.com (with Cyrillic а) and display it in a browser's address bar in a way that looks exactly like apple.com to most users.

Web browsers have implemented various defenses against homograph attacks. Chrome, Firefox, and Safari each apply their own rules, but in general they display the Punycode form (xn--...) for domains that mix scripts in suspicious ways or that use characters known to be easily confused with ASCII characters. For example, some browsers show the Punycode form rather than the misleading Unicode form when a domain consists entirely of Cyrillic characters that happen to look like Latin characters. Understanding the Punycode encoding of suspicious domains is a key skill for security researchers and incident responders identifying phishing campaigns that use IDN spoofing. In decode mode, our tool can quickly reveal the underlying Unicode characters in any Punycode domain, exposing visually deceptive lookalike characters.
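A small sketch of this kind of inspection, using only Python's standard library. The helper names inspect_label and scripts_in are hypothetical, and taking the first word of each Unicode character name is a crude stand-in for real script detection, which browsers do with proper Unicode script data.

```python
import unicodedata


def inspect_label(label):
    """Decode an xn-- label if needed and list each character with its name."""
    if label.lower().startswith("xn--"):
        label = label[4:].encode("ascii").decode("punycode")
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in label]


def scripts_in(label):
    """Heuristic: approximate each letter's script by the first word of its name."""
    return {name.split()[0] for ch, _, name in inspect_label(label) if ch.isalpha()}


for ch, code_point, name in inspect_label("xn--pple-43d"):
    print(ch, code_point, name)
# The first character is reported as U+0430 CYRILLIC SMALL LETTER A.

print(scripts_in("xn--pple-43d"))  # contains both CYRILLIC and LATIN
```

A label that reports more than one script in this way is a candidate for closer review, although many legitimate names also mix scripts.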

Real-World Use Cases for Punycode Encoding

Domain registrars and web hosting providers need Punycode encoding tools to process internationalized domain name registrations. When a customer wants to register a domain name in their native language—Arabic, Chinese, Thai, Hindi, or any other script—the registrar must convert the desired domain name to its Punycode representation for storage in the domain registry database and DNS zone files. Our batch encoding capability handles large lists of domain names efficiently, making it suitable for registrar workflows that process many IDN registrations simultaneously.

Email system administrators encounter Punycode in the context of internationalized email addresses (EAI), which allow non-ASCII characters in both the local part and the domain of an address. When configuring email servers, certificate authorities, and spam filtering systems to handle these addresses, the ability to quickly encode and decode the domain portion is essential for testing and troubleshooting. The email address encoding mode in our tool handles the local part and the domain separately, applying Punycode only to the domain component as required by the relevant standards.

Web application developers implementing URL handling for internationalized content need Punycode encoding to correctly process and store domain names from multilingual user input. A web application that accepts domain names from users around the world must be able to normalize these inputs to their Punycode forms for consistent storage, comparison, and DNS resolution. The programmatic encoding that our tool demonstrates (based on the standard Punycode algorithm) can be implemented in any programming language using the same underlying algorithm.

SEO professionals and digital marketers working with websites targeting international audiences use Punycode tools to verify that their internationalized domain names are correctly encoded and to understand how search engines see and index these domains. Google and other major search engines fully support IDN domains and index their content appropriately, but understanding the Punycode representation is important for technical SEO analysis, hreflang implementation, and canonical URL configuration.

Punycode Encoding Best Practices

Always apply Unicode NFC normalization to the input before encoding to ensure consistent, canonical output. Different applications may provide the same text in different Unicode normalization forms—for example, the letter ü can be represented as a single precomposed character (U+00FC) or as the letter u (U+0075) followed by a combining diaeresis (U+0308). Both representations look identical to users but produce different Punycode encodings. NFC normalization ensures that the precomposed form is always used, producing consistent Punycode output regardless of how the input was originally composed.
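The difference is easy to reproduce with Python's unicodedata module. In this sketch the function name encode_label and its normalize flag are illustrative choices, not part of any standard API.

```python
import unicodedata


def encode_label(label, normalize=True):
    """Optionally apply NFC, then Punycode-encode a single label."""
    if normalize:
        label = unicodedata.normalize("NFC", label)
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


precomposed = "m\u00fcnchen"   # ü as one code point (U+00FC)
decomposed = "mu\u0308nchen"   # u followed by combining diaeresis (U+0308)

# Without normalization the two spellings encode differently.
print(encode_label(precomposed, normalize=False))
print(encode_label(decomposed, normalize=False))

# With NFC both produce the same label.
print(encode_label(precomposed))  # xn--mnchen-3ya
print(encode_label(decomposed))   # xn--mnchen-3ya
```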

When encoding domain names for production use, always validate the resulting Punycode label against the DNS label length restrictions: each label (component between dots) must be no longer than 63 characters, and the complete domain name must be no longer than 253 characters (including the separating dots). Labels that are too long after Punycode encoding are invalid and cannot be registered or resolved. Our validation feature checks these constraints and flags any violations with clear error messages.
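A length check along these lines might look like the following sketch. The function name length_problems and the message wording are invented for illustration; the limits of 63 and 253 are the ones stated above, applied to the already encoded (ASCII) form.

```python
MAX_LABEL_LENGTH = 63
MAX_DOMAIN_LENGTH = 253


def length_problems(ace_domain):
    """Return a list of length violations for an already-encoded domain."""
    problems = []
    # A single trailing dot (the DNS root) is not counted here.
    name = ace_domain[:-1] if ace_domain.endswith(".") else ace_domain
    if len(name) > MAX_DOMAIN_LENGTH:
        problems.append(f"domain is {len(name)} characters (max {MAX_DOMAIN_LENGTH})")
    for label in name.split("."):
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL_LENGTH:
            problems.append(f"label '{label[:16]}...' is {len(label)} characters (max {MAX_LABEL_LENGTH})")
    return problems


print(length_problems("xn--mnchen-3ya.de"))  # []
print(length_problems("a" * 64 + ".com"))    # one label-length problem
```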

For email addresses with internationalized domain names, apply Punycode encoding only to the domain portion (after the @ symbol), not to the local part (before the @). While the SMTP protocol extensions for internationalized email (SMTPUTF8) can handle Unicode in the local part as well, not all email systems support these extensions, and the Punycode encoding applies specifically to the domain name component within the DNS system.
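A minimal sketch of that rule in Python, assuming the address contains at least one @ and splitting on the last one. The names encode_email and _encode_domain are hypothetical helpers for this example, and no validation of the local part is attempted.

```python
def _encode_domain(domain):
    """Punycode-encode only the labels that contain non-ASCII characters."""
    return ".".join(
        label if label.isascii() else "xn--" + label.encode("punycode").decode("ascii")
        for label in domain.split(".")
    )


def encode_email(address):
    """Encode the domain part of an address and leave the local part untouched."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local + "@" + _encode_domain(domain)


print(encode_email("user@münchen.de"))  # user@xn--mnchen-3ya.de
```

A non-ASCII local part is passed through unchanged, so whether the result is deliverable still depends on SMTPUTF8 support along the path.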

Conclusion: The Essential Tool for Multilingual Domain Management

Our free online Punycode encoder offers five input modes (domain, label, email, URL, plain text), bidirectional conversion (encode and decode), validation, label-by-label breakdown, batch processing, and DNS lookup integration, all running privately in your browser. It serves use cases ranging from quick one-off domain conversions to bulk processing of large domain portfolios. Whether you need to convert internationalized domain names for DNS configuration, verify Punycode encoding for security analysis, or process domain lists for a registrar workflow, the tool delivers RFC 3492-compliant results instantly and for free.

Frequently Asked Questions

What is Punycode?

Punycode is an ASCII-compatible encoding (ACE) system defined in RFC 3492 that represents Unicode domain name labels as ASCII strings. DNS hostnames only support a subset of ASCII (A-Z, 0-9, hyphen), so domain names in non-Latin scripts like Chinese, Arabic, or Cyrillic must be converted to ASCII before the DNS can process them. The "xn--" prefix (the ACE prefix) signals to IDNA-aware software that what follows is a Punycode-encoded internationalized label rather than a standard ASCII label.

How is Punycode different from URL encoding?

URL encoding (percent-encoding) encodes bytes as %XX hex sequences and is used for the path, query string, and fragment portions of URLs, where any byte value can appear. Punycode applies specifically to domain name labels and produces a compact ASCII representation that encodes the Unicode code points as variable-length integers rather than as simple hex. URL encoding of "münchen" gives "m%C3%BCnchen" (expanding each byte of the UTF-8 encoding), while Punycode gives "mnchen-3ya" (more compact, with the ASCII characters of the label kept readable).
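The comparison above can be reproduced with Python's standard library, here using urllib.parse.quote for percent-encoding and the built-in punycode codec.

```python
from urllib.parse import quote

label = "münchen"

# Percent-encoding expands each UTF-8 byte of ü into a %XX sequence.
percent_encoded = quote(label)

# Punycode keeps the ASCII letters and appends a compact suffix.
punycode = label.encode("punycode").decode("ascii")

print(percent_encoded)  # m%C3%BCnchen
print(punycode)         # mnchen-3ya
```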

Does Punycode work for all languages and scripts?

Yes, Punycode works for any Unicode code point, which covers all of the world's major writing systems including Latin, Cyrillic, Greek, Arabic, Hebrew, Devanagari (Hindi), Thai, Chinese (Han), Japanese (Hiragana, Katakana, Kanji), Korean (Hangul), and many more. The algorithm encodes any Unicode character regardless of its code point value. However, the IDNA standard, which defines which characters are allowed in internationalized domain names, imposes additional restrictions: not every valid Unicode character is permitted in a domain name label, even if Punycode can technically encode it.

Why do browsers sometimes show the xn-- form instead of the Unicode name?

Browsers display the Punycode form (xn--...) rather than the Unicode form as a security measure against homograph attacks, which are phishing attempts that use lookalike characters from different scripts to impersonate legitimate domains. For example, a domain using Cyrillic а (which looks like Latin a) alongside Latin characters would be displayed as Punycode to alert users that something unusual is present. When a domain contains characters from a single script, or uses a recognized TLD with matching script characters, browsers typically display the Unicode form.

Can I encode email addresses with this tool?

Yes, for the domain part of email addresses (after the @ symbol). Select "Email Address" in the Input Mode dropdown. The local part (before @) is left unchanged, while the domain portion is Punycode-encoded. For example, user@münchen.de becomes user@xn--mnchen-3ya.de. Note that while SMTP extensions (SMTPUTF8) can handle Unicode in the local part of email addresses, traditional email systems only support ASCII in the local part, and Punycode encoding only applies to the domain component within the DNS.

What is the maximum length of a Punycode-encoded domain name?

Each DNS label (component between dots) must be no longer than 63 characters after Punycode encoding, including the prefix. Since the xn-- prefix uses 4 characters and the encoding adds further overhead, internationalized labels can typically contain fewer characters than purely ASCII labels. The complete domain name (all labels plus separating dots) must be no longer than 253 characters. Our tool validates these constraints and shows warnings when encoded results exceed the limits.

Can Punycode be decoded back to Unicode?

Yes. Punycode encoding is completely reversible: every valid Punycode string decodes back to exactly the Unicode string it was encoded from. Click the "Punycode → Unicode" direction button in our tool to switch to decode mode, paste any xn-- Punycode domain, and instantly recover the original Unicode domain name. This bidirectionality is fundamental to how internationalized names work with the DNS: browsers encode Unicode to Punycode for DNS queries and decode the ASCII form back to Unicode for display.
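Decoding can be sketched the same way as encoding, again with Python's built-in punycode codec. The function name decode_domain is hypothetical, and labels that carry the prefix but contain invalid Punycode would raise an error that this sketch does not handle.

```python
def decode_domain(ace_domain):
    """Decode every xn-- label back to Unicode; other labels pass through."""
    decoded = []
    for label in ace_domain.split("."):
        if label.lower().startswith("xn--"):
            decoded.append(label[4:].encode("ascii").decode("punycode"))
        else:
            decoded.append(label)
    return ".".join(decoded)


print(decode_domain("xn--mnchen-3ya.de"))  # münchen.de
```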

What is NFC normalization and why does it matter?

NFC (Canonical Decomposition followed by Canonical Composition) is a Unicode normalization form that ensures the same character is always represented the same way. Some characters can be encoded either as a single precomposed code point (e.g., ü as U+00FC) or as a base character plus combining characters (e.g., u, U+0075, followed by a combining diaeresis, U+0308). Without normalization, these equivalent representations would produce different Punycode outputs for what users see as the same domain. The IDNA standard requires labels to be in NFC before encoding to ensure consistent, canonical Punycode representations.
