The Complete Guide to Punycode Decoding: Revealing the True Identity of Internationalized Domain Names

In the landscape of internet security, domain name analysis, and web development, the ability to decode Punycode (the cryptic xn-- prefix you sometimes see in browser address bars, email headers, and server logs) is an essential skill. Punycode is the encoding that IDNA uses to produce the ASCII-compatible encoding (ACE) form of internationalized domain names (IDNs). It lets names containing non-ASCII characters such as Chinese, Arabic, Cyrillic, Japanese, or accented European letters exist in the Domain Name System, whose hostname rules allow only ASCII letters, digits, and hyphens. While web browsers typically display these domains in their human-readable Unicode form, the underlying Punycode representation appears constantly in technical contexts: server logs, DNS query records, SSL/TLS certificate details, HTTP headers, email routing tables, and security analysis reports. Our free online Punycode decoder converts these opaque xn-- strings back to their original Unicode characters instantly and adds analysis features that go well beyond simple text conversion.

Understanding what a Punycode-encoded domain actually represents is particularly critical from a security perspective. The homograph attack—one of the most sophisticated and visually deceptive phishing techniques in modern cybercrime—relies on Punycode to register domain names that look identical or nearly identical to legitimate domains when displayed in Unicode. A malicious actor can register xn--pple-43d.com (which decodes to аpple.com using Cyrillic а rather than Latin a) and use it in phishing emails where the Unicode display of the domain looks exactly like apple.com to most users. Decoding Punycode and analyzing the resulting Unicode characters is the fundamental first step in identifying these attacks, and our decoder's integrated security analysis feature automatically flags suspicious lookalike characters, mixed-script combinations, and other patterns associated with homograph phishing attempts.

The Mechanics of Punycode Decoding: Understanding the Algorithm

Punycode decoding reverses the encoding algorithm defined in RFC 3492. The encoded label (the portion after the xn-- prefix) has two parts: the basic (ASCII) characters of the original label come first, followed by a hyphen delimiter, followed by a sequence of variable-length encoded integers that together specify the code points and insertion positions of all non-ASCII characters. If the original label contains no ASCII characters at all, the delimiter is omitted, as in xn--p1ai. The decoder reads these integers using RFC 3492's generalized variable-length integer scheme and reconstructs the original Unicode string by inserting each non-ASCII character at its correct position among the ASCII characters.
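
To make the two-part structure concrete, here is a minimal from-scratch sketch of the RFC 3492 decoding loop in Python. It is an illustration, not the tool's actual implementation: it omits the overflow checks the RFC specifies (Python integers do not overflow) and performs no IDNA validation. The names punycode_decode, adapt, and digit_value are our own.

```python
# Parameter values from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def digit_value(ch: str) -> int:
    """Map a-z / A-Z to 0-25 and 0-9 to 26-35."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit: {ch!r}")


def punycode_decode(label: str) -> str:
    """Decode one Punycode label given WITHOUT its xn-- prefix."""
    # Basic code points are everything before the LAST hyphen (if any).
    pos = label.rfind("-")
    output = list(label[:pos]) if pos > 0 else []
    rest = label[pos + 1:] if pos > 0 else label

    n, i, bias, idx = INITIAL_N, 0, INITIAL_BIAS, 0
    while idx < len(rest):
        # Read one generalized variable-length integer into i.
        old_i, w, k = i, 1, BASE
        while True:
            if idx >= len(rest):
                raise ValueError("truncated Punycode input")
            digit = digit_value(rest[idx])
            idx += 1
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        # i now packs both the code point increase and the insert position.
        length = len(output) + 1
        bias = adapt(i - old_i, length, old_i == 0)
        n += i // length
        i %= length
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


print(punycode_decode("mnchen-3ya"))  # münchen
```

Python also ships a built-in "punycode" codec, so b"mnchen-3ya".decode("punycode") gives the same result; the hand-written loop is only there to expose the algorithm's steps.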

For a complete domain name like xn--mnchen-3ya.de, the decoding process works on a label-by-label basis. The first label, xn--mnchen-3ya, is identified as a Punycode label by its xn-- prefix. The algorithm finds the last hyphen in the remaining string (mnchen-3ya), taking mnchen as the basic characters and 3ya as the encoded non-ASCII portion. Decoding 3ya yields a single integer, 869, which equals 124 × 7 + 1: adding 124 to the starting value 128 gives code point 252, the character ü (Unicode U+00FC, Latin small letter u with diaeresis), and the remainder 1 is its insertion index (counting from zero) among the 7 possible positions in the 6-letter string, which places it between the m and the n and produces münchen. The second label, de, has no xn-- prefix and passes through unchanged, giving the final result münchen.de.
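
The label-by-label process can be sketched with Python's standard library, whose built-in "punycode" codec implements RFC 3492 (but none of the surrounding IDNA validation rules). The function name decode_domain is illustrative; the sketch assumes dot-separated input and treats the xn-- prefix case-insensitively.

```python
def decode_domain(domain: str) -> str:
    """Decode every xn-- label of a domain and pass other labels through."""
    decoded = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            # Strip the 4-character ACE prefix, then run plain RFC 3492 decoding.
            decoded.append(label[4:].encode("ascii").decode("punycode"))
        else:
            decoded.append(label)  # ordinary ASCII label, e.g. "de"
    return ".".join(decoded)


print(decode_domain("xn--mnchen-3ya.de"))  # münchen.de
```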

When multiple non-ASCII characters are present, the encoded portion contains one integer (delta) per non-ASCII character. The encoder inserts those characters in ascending code point order rather than left to right, and each delta advances a combined (code point, insertion position) state from the previous insertion, so characters that sit close together in Unicode produce small deltas and therefore short digit strings. A bias value that adapts after each delta further tunes how many digits each integer needs. The result is compact: the two-character Chinese label 中国 encodes to just fiqs8s.
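
To see the combined (code point, position) state in action, the sketch below reimplements only the delta-generation loop of the RFC 3492 encoder, stopping before the base-36 digit conversion and bias adaptation. It is an illustrative reconstruction, not the tool's code, and punycode_deltas is a hypothetical name.

```python
def punycode_deltas(label: str) -> list[int]:
    """Return the raw integers an RFC 3492 encoder would emit for label."""
    codepoints = [ord(c) for c in label]
    n, delta = 128, 0
    h = sum(1 for c in codepoints if c < 128)  # basic code points already placed
    deltas = []
    while h < len(codepoints):
        # Next non-ASCII code point to insert, in ascending code point order.
        m = min(c for c in codepoints if c >= n)
        # Skip every (code point, position) state between n and m.
        delta += (m - n) * (h + 1)
        n = m
        for c in codepoints:
            if c < n:
                delta += 1  # one more position to step over
            elif c == n:
                deltas.append(delta)  # the encoder would base-36 encode this
                delta = 0
                h += 1
        delta += 1
        n += 1
    return deltas


print(punycode_deltas("中国"))  # [19885, 4512]
```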

Security Analysis: Detecting Homograph and IDN Spoofing Attacks

The security analysis tab in our decoder applies a set of heuristics for identifying potentially deceptive internationalized domain names, examining the decoded Unicode characters for several categories of risk. Script mixing detection identifies labels that contain characters from multiple Unicode scripts. For example, a label with both Latin and Cyrillic characters is suspicious because legitimate internationalized domains typically use only one script per label, matching the native language of their target audience (Japanese, which routinely combines Han, Hiragana, and Katakana, is a well-known exception). A label that mixes Latin and Cyrillic letters rarely serves a legitimate multilingual purpose but can easily be designed to deceive visually.
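
Python's standard library does not expose the Unicode Script property, so the following sketch approximates it with the first word of each letter's official character name (LATIN, CYRILLIC, GREEK, CJK, and so on). Treat it as a rough heuristic under that assumption: a production scanner would use the real Script property (Unicode Standard Annex #24) through a dedicated library, and like any one-script rule this sketch would also flag legitimate Japanese labels that mix Han and kana. The function names are hypothetical.

```python
import unicodedata


def rough_scripts(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in label."""
    scripts = set()
    for ch in label:
        # Letters only: digits and hyphens are shared across scripts.
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    return len(rough_scripts(label)) > 1


# "\u0430" is CYRILLIC SMALL LETTER A, visually identical to Latin "a".
print(is_mixed_script("\u0430pple"))  # True
```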

Lookalike character detection compares each character in the decoded domain against a database of characters known to be visually similar to ASCII characters. The Cyrillic alphabet contains several letters that are nearly indistinguishable from Latin letters: Cyrillic а (U+0430) matches Latin a, Cyrillic е (U+0435) matches Latin e, Cyrillic о (U+043E) matches Latin o, Cyrillic р (U+0440) matches Latin p, Cyrillic с (U+0441) matches Latin c, and Cyrillic х (U+0445) matches Latin x. Greek, Armenian, and other scripts have similar lookalike characters. A domain constructed entirely from Cyrillic lookalikes to spell a Latin-alphabet brand name is a classic homograph attack, and our security scanner identifies each suspect character with its Unicode code point for detailed investigation.
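
A minimal sketch of the lookup, limited to the six Cyrillic letters listed above. The table and function name are illustrative assumptions; a real scanner would draw on the full confusables data published with Unicode Technical Standard #39. The Cyrillic letters are written as \u escapes because they are visually indistinguishable from their Latin counterparts.

```python
# Tiny illustrative confusables table: Cyrillic letter -> Latin lookalike.
CYRILLIC_LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def find_lookalikes(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', Latin lookalike) for each suspect character."""
    return [
        (i, f"U+{ord(ch):04X}", CYRILLIC_LOOKALIKES[ch])
        for i, ch in enumerate(text)
        if ch in CYRILLIC_LOOKALIKES
    ]


print(find_lookalikes("\u0430pple.com"))  # [(0, 'U+0430', 'a')]
```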

The security scanner also checks for zero-width characters such as the zero-width non-joiner (U+200C) and zero-width joiner (U+200D), which are invisible in many fonts and rendering environments yet still make a domain name technically distinct. It also checks for combining characters, which modify adjacent characters and can render differently across platforms and fonts. Attackers can use these invisible and combining characters to build domain names that look identical to a legitimate domain on screen while resolving to an entirely different DNS entry.
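
A minimal sketch of these two checks using only Python's unicodedata module: it flags U+200C and U+200D explicitly and treats any code point in the general categories Mn, Mc, or Me as a combining mark. The function name is hypothetical, and real scanners typically layer further rules on top (IDNA 2008, for instance, permits the two joiners only in specific contextual positions).

```python
import unicodedata


def flag_invisible_and_combining(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', Unicode name) for joiners and combining marks."""
    findings = []
    for i, ch in enumerate(text):
        is_joiner = ch in ("\u200c", "\u200d")  # ZWNJ, ZWJ: invisible in most fonts
        # Mn / Mc / Me = nonspacing, spacing, and enclosing marks.
        is_mark = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        if is_joiner or is_mark:
            findings.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return findings


print(flag_invisible_and_combining("pay\u200cpal"))  # [(3, 'U+200C', 'ZERO WIDTH NON-JOINER')]
```

Normalizing to NFC first with unicodedata.normalize("NFC", text) composes a sequence such as u followed by U+0308 into the precomposed ü, so after normalization the check reports only marks that NFC cannot compose.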

Character Information: Understanding Every Unicode Code Point

The Character Info tab provides a forensic-level analysis of every character in the decoded domain. For each character, the tool displays the character itself, its Unicode code point in U+XXXX notation, the official Unicode character name, and the Unicode script to which the character belongs. This information is invaluable for security researchers investigating potentially malicious domains, for developers building domain validation systems, and for anyone who needs to understand exactly what characters appear in an internationalized domain.

Unicode character names are the official names defined by the Unicode Consortium that uniquely identify each code point. Knowing that a character is "CYRILLIC SMALL LETTER O" rather than "LATIN SMALL LETTER O" immediately reveals whether a domain that looks like "example.com" actually uses the correct Latin characters or Cyrillic lookalikes. The Unicode script property, which indicates which writing system a character belongs to (Latin, Cyrillic, Greek, Arabic, Han, Katakana, etc.), is the primary tool for script mixing analysis and is displayed prominently for each character in the analysis.
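
A per-character report like this can be approximated with Python's unicodedata module. The sketch lists only non-ASCII characters, as the FAQ below describes for the Character Info tab, and uses the first word of the character name as a rough script label because the standard library does not expose the Script property. The name char_info is hypothetical.

```python
import unicodedata


def char_info(domain: str) -> list[tuple[str, str, str, str]]:
    """List (char, U+XXXX, Unicode name, rough script) for non-ASCII characters."""
    rows = []
    for ch in domain:
        if ord(ch) < 128:
            continue  # skip plain ASCII letters, digits, dots, and hyphens
        name = unicodedata.name(ch, "<unnamed>")
        rough_script = name.split()[0]  # crude stand-in for the Script property
        rows.append((ch, f"U+{ord(ch):04X}", name, rough_script))
    return rows


for row in char_info("\u0430pple.com"):
    print(*row, sep="\t")
```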

Practical Applications of Punycode Decoding

Security professionals and incident responders use Punycode decoding as a standard part of phishing investigation workflows. When analyzing suspicious emails, the first step is examining all domain names in email headers, links, and attachment URLs. Punycode domains often appear in these contexts, and quickly decoding them reveals whether they are legitimate internationalized domains or homograph spoofs of known brands. Our batch processing capability is particularly valuable here—an incident response analyst can paste an entire list of suspicious domains extracted from a phishing kit and decode all of them simultaneously, getting the Unicode forms and security analysis for the complete set in a single operation.
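
A minimal sketch of that batch step, assuming one domain per line and using Python's built-in "idna" codec (an IDNA 2003 implementation that raises UnicodeError on malformed labels). It trims whitespace, skips blank lines, and records an error per line instead of stopping; the function name and the input/output/error keys are hypothetical, not the tool's export schema.

```python
def decode_batch(text: str) -> list[dict]:
    """Decode one ACE domain per line, capturing errors per line."""
    results = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        try:
            # Non-ASCII input also raises UnicodeError here and is recorded.
            decoded = line.encode("ascii").decode("idna")
            results.append({"input": line, "output": decoded, "error": None})
        except UnicodeError as exc:
            results.append({"input": line, "output": None, "error": str(exc)})
    return results


for result in decode_batch("xn--mnchen-3ya.de\n\n  xn--pple-43d.com \n"):
    print(result)
```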

Web developers working on email filtering, spam detection, and domain reputation systems implement Punycode decoding as part of their processing pipelines. Email servers encounter Punycode in Message-ID and Received headers, From addresses, and link URLs in message bodies, so correct decoding is essential for accurate domain extraction and reputation lookup. The email address mode in our decoder separates the local part from the domain and applies Punycode decoding only to the domain, because IDNA covers domain names only; internationalized local parts are carried as UTF-8 under the SMTPUTF8 extension (RFC 6531), not as Punycode.
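
A minimal sketch of domain-only decoding for an email address, assuming the domain follows the last @ (quoted local parts may themselves contain @) and using Python's built-in IDNA 2003 codec. The function name decode_email is hypothetical.

```python
def decode_email(address: str) -> str:
    """Decode the domain of an email address and leave the local part untouched."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"no @ in {address!r}")
    return f"{local}@{domain.encode('ascii').decode('idna')}"


print(decode_email("user@xn--mnchen-3ya.de"))  # user@münchen.de
```

Note that a local part which merely looks like ACE (for example xn--test@example.com) is left exactly as written.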

Domain registrars, brand protection services, and trademark attorneys use Punycode decoding to monitor for registration of internationalized domain names that may infringe on Latin-script trademarks through homograph attacks. The batch processing capability enables monitoring services to continuously decode and analyze large lists of newly registered Punycode domains, flagging those that decode to lookalike versions of protected brand names. Understanding which Unicode characters are used in potentially infringing registrations is essential for building legal cases and for requesting suspension or transfer of malicious domains through dispute resolution processes.
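
One way such monitoring can flag lookalikes is a simplified version of the "skeleton" comparison from Unicode Technical Standard #39: map each lookalike character to its Latin twin and check the result against a watch list. The sketch below assumes a tiny hand-written Cyrillic map and a hypothetical brand list; real services use the full confusables data and far larger lists.

```python
from typing import Optional

# Tiny illustrative lookalike map (Cyrillic -> Latin); \u escapes are used
# because the Cyrillic letters look identical to their Latin twins.
LOOKALIKE_TO_LATIN = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0445": "x",
})
PROTECTED_BRANDS = {"apple", "paypal"}  # hypothetical watch list


def impersonated_brand(ace_domain: str) -> Optional[str]:
    """Return the protected brand an ACE domain imitates, or None."""
    unicode_domain = ace_domain.encode("ascii").decode("idna")
    for label in unicode_domain.split("."):
        skeleton = label.translate(LOOKALIKE_TO_LATIN)
        # Report only labels that changed, i.e. ones that used lookalikes.
        if skeleton != label and skeleton in PROTECTED_BRANDS:
            return skeleton
    return None


print(impersonated_brand("xn--pple-43d.com"))  # apple
```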

Working with Different Types of Punycode Content

Not all Punycode appears in the same context, and our online Punycode decoding tool supports five different input modes to handle each correctly. The Domain mode processes complete domain names with multiple labels separated by dots, decoding any xn-- labels while passing ASCII labels unchanged. This is the most common use case for decoding domain names from any source. The Label mode specifically handles single Punycode labels (after the xn-- prefix has been stripped), which is useful when working directly with DNS zone file records or when the xn-- label has been extracted from a larger context.

The Email mode handles complete email addresses, identifying the domain portion after the @ symbol and applying Punycode decoding there while preserving the local part (before @) unchanged. This correctly handles addresses like user@xn--mnchen-3ya.de, which would be decoded to user@münchen.de. The URL mode processes complete URLs, including the scheme, path, query string, and fragment, locates the hostname within the URL's authority component, and applies domain decoding only there. The Text mode treats the entire input as a Punycode label to decode, which is useful for educational purposes or when working with the raw Punycode algorithm output directly.
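
A sketch of URL-mode behavior using urllib.parse from Python's standard library: only the host is decoded, while userinfo, port, path, query, and fragment carry over unchanged. It assumes an ACE (ASCII) hostname, leaves IPv6 literals alone, and decode_url_host is a hypothetical name.

```python
from urllib.parse import urlsplit


def decode_url_host(url: str) -> str:
    """Convert only the hostname of a URL from ACE to Unicode."""
    parts = urlsplit(url)
    host = parts.hostname  # lowercased, without userinfo or port
    if not host or ":" in host:  # no host, or an IPv6 literal: leave as-is
        return url
    netloc = host.encode("ascii").decode("idna")
    userinfo, sep, _ = parts.netloc.rpartition("@")
    if sep:
        netloc = f"{userinfo}@{netloc}"
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return parts._replace(netloc=netloc).geturl()


print(decode_url_host("https://xn--mnchen-3ya.de/path?q=1#top"))  # https://münchen.de/path?q=1#top
```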

Tips for Effective Punycode Decoding

When analyzing domains from server logs, note that logs typically record the DNS name used in the HTTP Host header or SNI field, which may be either the Punycode form or the Unicode form depending on the client implementation. Modern browsers send the Punycode form in DNS queries and HTTP Host headers, while the address bar displays the Unicode form. Understanding this distinction helps correctly interpret log records when correlating HTTP traffic with DNS queries—the Punycode form in DNS logs corresponds to the Unicode form in browser display.
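
Because the same host can appear as ACE in DNS logs and as Unicode elsewhere, one practical approach is to normalize every hostname to lowercase ACE form before joining records. The sketch below assumes Python's built-in "idna" codec, which implements IDNA 2003; newer IDNA 2008 based processing treats some characters differently (ß, for example, becomes ss under IDNA 2003 but is kept as a distinct letter under IDNA 2008). to_ace is a hypothetical helper.

```python
def to_ace(hostname: str) -> str:
    """Normalize a hostname to lowercase ACE (xn--) form for log correlation."""
    # Drop the trailing root dot that DNS logs often include.
    return hostname.rstrip(".").encode("idna").decode("ascii").lower()


dns_names = ["xn--mnchen-3ya.de."]
host_headers = ["münchen.de"]
print({to_ace(h) for h in dns_names} == {to_ace(h) for h in host_headers})  # True
```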

For security analysis, always decode the complete domain including the TLD (top-level domain). Some internationalized TLDs, such as .рф (the Russian TLD, encoded as xn--p1ai) and .中国 (the Chinese TLD, encoded as xn--fiqs8s), are legitimate IDN TLDs, while lookalike variants of common TLDs like .com or .org in different scripts are almost certainly malicious. The Reference tab provides examples of common legitimate IDN TLDs to help distinguish them from suspicious Punycode TLDs.
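
A hedged sketch of a TLD check: it decodes the last label and compares its ACE form against an allowlist. The two-entry set is a hypothetical placeholder; a real check would load the full TLD list that IANA publishes for the root zone.

```python
# Hypothetical, deliberately tiny allowlist of ACE-form IDN TLDs.
KNOWN_IDN_TLDS = {"xn--p1ai", "xn--fiqs8s"}  # .рф and .中国


def describe_tld(ace_domain: str) -> tuple[str, str]:
    """Return (TLD in Unicode, status) for the last label of an ACE domain."""
    tld = ace_domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if not tld.startswith("xn--"):
        return tld, "ASCII TLD"
    unicode_tld = tld[4:].encode("ascii").decode("punycode")
    if tld in KNOWN_IDN_TLDS:
        return unicode_tld, "known IDN TLD"
    return unicode_tld, "unrecognized IDN TLD: investigate"


print(describe_tld("example.xn--p1ai"))  # ('рф', 'known IDN TLD')
```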

When working with the batch decoder to process large domain lists, use the CSV export for integration with spreadsheet tools or the JSON export for programmatic processing in scripts and automation workflows. The JSON output includes the original Punycode input, the decoded Unicode output, the error status, and the security risk assessment for each domain, providing all the information needed for automated analysis pipelines.
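
A sketch of producing both export formats with Python's csv and json modules. The field names (input, output, error, risk) and the risk values are assumptions modeled on the description above, not the tool's documented schema. json.dumps is called with ensure_ascii=False so decoded Unicode stays readable instead of turning into \u escapes.

```python
import csv
import io
import json


def export_results(results: list[dict]) -> tuple[str, str]:
    """Serialize batch results to (CSV text, JSON text)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "output", "error", "risk"])
    writer.writeheader()
    writer.writerows(results)  # None values are written as empty cells
    json_text = json.dumps(results, ensure_ascii=False, indent=2)
    return buffer.getvalue(), json_text


rows = [{"input": "xn--pple-43d.com", "output": "\u0430pple.com", "error": None, "risk": "high"}]
csv_text, json_text = export_results(rows)
print(csv_text)
```

When writing the CSV to a file instead of a string, open the file with newline="" as the csv module documentation recommends.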

Conclusion: The Most Comprehensive Free Punycode Decoder Available

Our free online Punycode decoder combines instant bidirectional conversion (decode and encode), five input modes for different content types, security analysis with homograph detection, character-level Unicode information, label-by-label domain breakdown, batch processing with CSV/JSON export, and domain information lookup, all running privately in your browser without any server uploads. Whether you need to decode Punycode for security investigation, web development, domain analysis, or academic study, the tool provides accurate, instant results and professional-grade analysis features, completely free and with no account or signup required.

Frequently Asked Questions

Punycode decoding converts xn-- encoded domain names back to their original Unicode (human-readable) form. DNS hostnames are restricted to ASCII letters, digits, and hyphens, so internationalized domain names containing non-ASCII characters (Chinese, Arabic, Cyrillic, accented Latin, etc.) are stored and transmitted as Punycode. You need decoding when analyzing server logs, investigating suspicious domains, debugging email issues, examining SSL certificates, or in any situation where you encounter xn-- encoded domains and need to see what they actually represent.

A homograph attack (also called IDN spoofing) registers a domain that looks identical to a legitimate domain by using visually similar characters from different scripts. For example, using Cyrillic "а" (which looks like Latin "a") to create "аpple.com" that appears identical to "apple.com" in browsers. Decoding the Punycode reveals the actual characters used, and our security analysis then flags visually similar characters from different scripts. This is critical for identifying phishing domains before users fall for the visual deception.

Browsers display the Punycode form (xn--...) as a security protection against homograph attacks. When a domain uses characters that could be confused with ASCII characters from a different script, the browser shows the Punycode form to alert you that something unusual is present. For example, a domain using Cyrillic lookalikes to spell an English brand name would be displayed as xn-- in Chrome, Firefox, and Safari because showing the Unicode form could visually deceive users. This is actually the browser protecting you.

Yes! Select "Email Address" from the Input Mode dropdown. The tool correctly identifies the @ separator, leaves the local part (before @) unchanged, and applies Punycode decoding to the domain portion (after @). For example, user@xn--mnchen-3ya.de decodes to user@münchen.de. This is the correct behavior because Punycode encoding applies specifically to domain names within the DNS system, not to the local part of email addresses (which uses different encoding standards).

Simply paste multiple Punycode domains into the input text area, one per line. The tool automatically processes each line and displays all results simultaneously in both the output text area and the Batch tab. The Batch tab shows each input paired with its decoded output, along with copy buttons for individual results. Use the CSV or JSON export buttons in the Batch tab to download all results in a structured format suitable for further analysis or integration with other tools.

The Character Info tab shows detailed Unicode information for every non-ASCII character in the decoded domain, including: the character itself, its Unicode code point (e.g., U+0430), the official Unicode character name (e.g., "CYRILLIC SMALL LETTER A"), and the Unicode script the character belongs to (e.g., Cyrillic, Arabic, Han). This information is essential for security analysis—it immediately reveals whether a domain that looks like it uses Latin characters actually contains Cyrillic, Greek, or other script lookalikes used in phishing attacks.

Yes, completely private. The Punycode Decoder runs 100% in your web browser. Your domain names and text are never sent to any server and never stored anywhere outside your browser session. All decoding, encoding, security analysis, and character information lookup happens locally on your device using JavaScript. This makes it safe for analyzing sensitive domains from ongoing security investigations, proprietary DNS data, or any confidential domain lists.

Yes! Click the "Unicode → Punycode" mode button to switch to encoding mode, where you can convert international domain names to their Punycode (xn--) representations. You can also use the Swap button to move the decoded Unicode output back to the input and switch to encoding mode—this allows quick round-trip verification that encoding the decoded domain produces the same Punycode you started with. The Swap function works in both directions.
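
Round-trip verification can be sketched in a few lines with Python's built-in "idna" codec (IDNA 2003). The function name is hypothetical, and the comparison is case-insensitive because DNS names are case-insensitive.

```python
def round_trips(ace_domain: str) -> bool:
    """Decode an ACE domain, re-encode the result, and compare with the input."""
    unicode_domain = ace_domain.encode("ascii").decode("idna")
    re_encoded = unicode_domain.encode("idna").decode("ascii")
    # DNS names are case-insensitive, so compare lowercased forms.
    return re_encoded.lower() == ace_domain.lower()


print(round_trips("xn--mnchen-3ya.de"))  # True
```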

Mixed script means a domain label contains characters from more than one Unicode script—for example, combining Latin characters with Cyrillic characters in the same label. Legitimate internationalized domains almost always use only one script per label (matching their target language/region). A domain mixing Latin "appl" with Cyrillic "е" has no legitimate purpose but creates a convincing visual forgery of "apple". Mixed-script labels are a strong indicator of potential homograph attacks and are flagged with a warning in the security analysis.
