Punycode Encoder
Encode Domain Names to Punycode Online
Convert internationalized domain names to their Punycode-encoded form instantly with this free online tool. Punycode encoding transforms Unicode domain labels containing non-ASCII characters into ASCII-compatible encoding (ACE) strings prefixed with xn--, enabling internationalized domain name (IDN) resolution across the global DNS infrastructure. Whether you are registering an IDN, debugging DNS lookups, or testing internationalized domain configurations, you get accurate Punycode output in real time.
What is Punycode Encoding
Punycode is a specialized encoding system defined in RFC 3492 that converts Unicode strings into a limited ASCII character set compatible with the Domain Name System. The DNS was originally designed to handle only ASCII characters (letters a-z, digits 0-9, and hyphens), which excluded domain names in non-Latin scripts like Chinese, Arabic, Cyrillic, and many others. Punycode bridges this gap by providing a reversible transformation from Unicode to ASCII.
When a domain label contains non-ASCII characters, Punycode encoding produces an ASCII string prefixed with "xn--". For example, the German label "münchen" becomes "xn--mnchen-3ya" after encoding. The "xn--" prefix is called the ACE (ASCII Compatible Encoding) prefix. To the DNS itself the result is an ordinary ASCII label; the prefix signals to IDN-aware applications such as browsers that the label is Punycode encoded and can be decoded back to Unicode for display.
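As a minimal sketch of the prefix and the reversible transformation described above, the following uses Python's standard "punycode" codec. The helper names to_ace_label and from_ace_label are illustrative only, and the sketch assumes the label has already been validated and normalized, since it performs no IDNA processing of its own.

```python
ACE_PREFIX = "xn--"

def to_ace_label(label: str) -> str:
    """Return the ACE form of a single label (no IDNA mapping or validation)."""
    if label.isascii():
        return label  # pure-ASCII labels are left unchanged
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace_label(label: str) -> str:
    """Reverse to_ace_label for labels that carry the ACE prefix."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

ace = to_ace_label("m\u00fcnchen")   # "münchen"
print(ace)                           # xn--mnchen-3ya
print(from_ace_label(ace) == "m\u00fcnchen")  # True
```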
Punycode is part of the broader Internationalized Domain Names in Applications (IDNA) framework defined in RFC 5891. IDNA specifies how applications should process internationalized domain names, including normalization, validation, and the Punycode encoding step. This framework enables billions of internet users to access websites using domain names written in their native scripts.
How Punycode Encoding Works
Punycode is an instance of a more general algorithm called Bootstring, which represents Unicode code points using only basic ASCII characters. The process begins by separating the input into two groups: basic ASCII characters that can be used directly, and non-ASCII characters that need encoding. The basic characters are copied to the output first. If there is at least one of them, a hyphen delimiter follows, and then come the encoded non-ASCII characters. A label with no ASCII characters at all has no delimiter.
The non-ASCII characters are encoded using a variable-length integer representation that records each character's code point and its position in the string. The algorithm uses an adaptive bias mechanism to optimize the encoding for common patterns, keeping the output as short as possible. This makes Punycode significantly more compact than simpler approaches like percent encoding for domain names.
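The steps above can be sketched in a few dozen lines. This is a simplified reading of the encoding procedure in RFC 3492, not a production implementation: it omits overflow checks and any IDNA validation, and the names punycode_encode and adapt are chosen here for illustration. The numeric parameters are the ones RFC 3492 specifies for Punycode.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"  # digit values 0..35

def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation: tune digit thresholds to the size of recent deltas."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)

def punycode_encode(label: str) -> str:
    """Encode one label to raw Punycode (without the "xn--" prefix)."""
    code_points = [ord(ch) for ch in label]
    basic = [cp for cp in code_points if cp < 128]
    output = [chr(cp) for cp in basic]
    if basic:
        output.append("-")          # delimiter only when basic chars exist
    handled = len(basic)
    n, delta, bias, first = INITIAL_N, 0, INITIAL_BIAS, True
    while handled < len(code_points):
        m = min(cp for cp in code_points if cp >= n)  # next code point to insert
        delta += (m - n) * (handled + 1)
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = min(max(k - bias, TMIN), TMAX)
                    if q < t:
                        break
                    output.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(DIGITS[q])
                bias = adapt(delta, handled + 1, first)
                first = False
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(output)

print(punycode_encode("m\u00fcnchen"))  # mnchen-3ya
```

Comparing the result with Python's built-in codec, for example "m\u00fcnchen".encode("punycode"), is a quick way to check a sketch like this against a maintained implementation.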
After the Punycode string is generated, the "xn--" prefix is prepended to create the final ACE label. Each label in a domain name is encoded independently, so in "münchen.de", where only the second-level label contains a non-ASCII character, only that label is converted and "de" is left unchanged. The reverse process, converting Punycode back to Unicode, is handled by our Punycode decoder tool. For percent-encoding Unicode characters in URL paths and query strings rather than domain labels, the UTF-8 percent encoding tool is the appropriate choice. When you need to encode arbitrary binary data into an ASCII-safe format for transport, Base64 encoding for data transfer serves a related but distinct purpose.
Syntax Comparison
Here is how to perform Punycode encoding in popular programming languages:
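JavaScript (Node.js): The built-in punycode module provides punycode.toASCII("münchen.de") for full domain conversion and punycode.encode("münchen") for individual labels; note that encode() returns the raw Punycode string without the "xn--" prefix. The bundled module is deprecated in current Node.js releases, so new code generally uses url.domainToASCII() or the userland punycode package instead. In browsers and in Node.js, the URL constructor converts the hostname to Punycode automatically when it parses a URL.
Python: The built-in "idna" codec handles full domain names: "münchen.de".encode("idna") returns the bytes b"xn--mnchen-3ya.de". This codec follows the older IDNA 2003 rules, while the separate "punycode" codec performs only the raw RFC 3492 transformation without the prefix. For more control, the third-party idna library provides idna.encode() with IDNA 2008 compliance and detailed error handling.
Java: The java.net.IDN class provides IDN.toASCII("münchen.de"), which performs IDNA processing including Punycode encoding. It implements the IDNA 2003 rules of RFC 3490, including the Nameprep normalization step.
Go: The golang.org/x/net/idna package provides idna.Lookup.ToASCII() for converting internationalized domain names to their Punycode representation. The package implements IDNA 2008 with the compatibility processing defined by UTS #46.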
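For the Python case, a short runnable sketch using only the standard library might look like the following. It relies on the built-in "idna" and "punycode" codecs; as noted above, the built-in "idna" codec follows IDNA 2003, so results for some characters can differ from IDNA 2008 tools.

```python
domain = "m\u00fcnchen.de"  # "münchen.de"

# Full-domain conversion: each label is processed and prefixed as needed.
ace = domain.encode("idna")
print(ace)                    # b'xn--mnchen-3ya.de'

# The same codec reverses the conversion.
print(ace.decode("idna") == domain)  # True

# The "punycode" codec works on a single label and adds no prefix.
raw = "m\u00fcnchen".encode("punycode")
print(raw)                    # b'mnchen-3ya'
```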
Common Use Cases
Domain Registration: When registering an internationalized domain name with a registrar, the domain must be converted to its Punycode form for submission to the registry. Registrars typically handle this conversion automatically, but administrators often need to verify the Punycode output to ensure the correct domain is being registered and to avoid homograph attacks.
DNS Configuration: DNS zone files and management interfaces require domain labels in ASCII format. When configuring DNS records for internationalized domains, administrators must use the Punycode-encoded form of each label. This includes A records, CNAME records, MX records, and all other DNS record types.
Email Routing: Email addresses with internationalized domain parts are usually converted to Punycode for SMTP routing. Under the EAI (Email Address Internationalization) standards, servers that support the SMTPUTF8 extension can carry UTF-8 in the address, including the local part, but the Punycode form of the domain is still needed for DNS lookups and for compatibility with mail infrastructure that lacks EAI support.
SSL Certificate Verification: SSL/TLS certificates for internationalized domains typically list the Punycode form of the domain in the Subject Alternative Name field. Understanding the Punycode representation is essential for verifying that certificates match the intended domain and for diagnosing certificate mismatch errors.
Security Analysis: Security researchers use Punycode encoding to detect and analyze IDN homograph attacks, where visually similar characters from different scripts are used to create deceptive domain names that mimic legitimate websites.
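To illustrate the email routing case from the list above, here is a minimal sketch that converts only the domain part of an address and leaves the local part untouched. The function name ascii_email_domain is hypothetical, the sketch uses Python's built-in "idna" codec (IDNA 2003 rules), and it does no validation of the local part.

```python
def ascii_email_domain(address: str) -> str:
    """Return the address with its domain part in ACE (Punycode) form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: missing '@'")
    # Only the domain is converted; the local part is passed through as-is.
    return local + "@" + domain.encode("idna").decode("ascii")

print(ascii_email_domain("info@m\u00fcnchen.de"))  # info@xn--mnchen-3ya.de
```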
Punycode Encoding Examples
Here are practical examples of Punycode encoding applied to internationalized domain labels:
Example 1 - German Umlaut: The domain label "münchen" encodes to "xn--mnchen-3ya". The ASCII characters m, n, c, h, e, n appear first, followed by a hyphen and then the encoded umlaut character and its position. This is one of the most common European IDN patterns.
Example 2 - Chinese Domain: A domain label consisting of two Chinese characters meaning "example" encodes to a string like "xn--fsq228c". Since there are no ASCII characters in the input, the output after "xn--" contains only the encoded Unicode code points. CJK domains are among the most widely registered internationalized domain names.
Example 3 - Arabic Domain: An Arabic domain label encodes to a Punycode string where all characters are non-ASCII, producing output like "xn--mgbh0fb". Arabic text is written right-to-left, but the Punycode output is always left-to-right ASCII, which can cause confusion when comparing the visual representation to the encoded form.
Example 4 - Mixed ASCII and Non-ASCII: The domain label "café", with an accented final e, encodes to "xn--caf-dma". The ASCII portion "caf" appears first, then a hyphen separator, followed by the encoded accented character. Labels that combine ASCII and non-ASCII characters of the same script are common in European languages.
Frequently Asked Questions
What does the xn-- prefix mean in Punycode?
The "xn--" prefix is the ACE (ASCII Compatible Encoding) prefix that identifies a domain label as Punycode-encoded. It was chosen because "xn--" is extremely unlikely to appear at the start of a legitimate non-encoded domain label. When IDN-aware applications encounter a label starting with "xn--", they know to apply Punycode decoding to reveal the original Unicode characters; DNS servers and resolvers simply treat it as an ASCII label. The prefix was standardized in RFC 3490, is retained by IDNA 2008, and is used universally across implementations.
What is the difference between IDNA 2003 and IDNA 2008?
IDNA 2003 and IDNA 2008 are two versions of the Internationalized Domain Names standard that differ in how they handle certain characters. IDNA 2003 maps some characters to others during processing, such as converting the German sharp s to "ss". IDNA 2008 treats these characters differently, either allowing them as distinct characters or rejecting them entirely. The choice of IDNA version affects which Punycode output is produced for certain inputs, so it is important to know which version your tools and registrar use.
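The sharp s case can be observed with the Python standard library alone. In this sketch the built-in "idna" codec stands in for IDNA 2003 behavior, and the raw "punycode" codec plus a manually added prefix shows the label that results when the character is kept as-is, which is what IDNA 2008 tools are expected to produce for this input. It is an illustration of the difference, not an IDNA 2008 implementation.

```python
label = "stra\u00dfe"  # "straße"

# IDNA 2003 style: the sharp s is mapped to "ss", so no encoding is needed.
mapped = label.encode("idna").decode("ascii")
print(mapped)      # strasse

# Keeping the sharp s as a distinct character yields a different ACE label.
unmapped = "xn--" + label.encode("punycode").decode("ascii")
print(unmapped)    # xn--strae-oqa
```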
Can I encode an entire domain name with Punycode at once?
Punycode encoding is applied to individual labels within a domain name, not to the entire domain string. A domain like "example.com" consists of two labels separated by a dot. Each label is encoded independently, and the dots are preserved as literal separators. If only one label contains non-ASCII characters, only that label receives the "xn--" prefix while the others remain unchanged.
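A label-by-label conversion can be sketched as follows. The function name encode_labels is illustrative, and the sketch assumes the input is already lowercase and normalized, because it skips the mapping and validation that a full IDNA library performs.

```python
def encode_labels(domain: str) -> list[tuple[str, str]]:
    """Return (original, encoded) pairs for each dot-separated label."""
    pairs = []
    for label in domain.split("."):
        if label.isascii():
            pairs.append((label, label))  # ASCII labels stay unchanged
        else:
            ace = "xn--" + label.encode("punycode").decode("ascii")
            pairs.append((label, ace))
    return pairs

pairs = encode_labels("www.m\u00fcnchen.de")
print(".".join(ace for _, ace in pairs))  # www.xn--mnchen-3ya.de
```

Returning pairs rather than a joined string makes it easy to see which labels were actually changed.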
What is an IDN homograph attack?
An IDN homograph attack exploits the visual similarity between characters from different Unicode scripts to create deceptive domain names. For example, the Cyrillic letter "а" (U+0430) looks identical to the Latin letter "a" (U+0061) but has a different Unicode code point, producing a different Punycode encoding. Attackers register domains that visually mimic legitimate sites to trick users into visiting phishing pages. Modern browsers mitigate this by displaying the Punycode form instead of the Unicode form when a domain mixes scripts suspiciously.
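The following sketch shows how two look-alike labels diverge once encoded, along with a rough mixed-script check. The scripts_in helper is a hypothetical heuristic that guesses a script from the first word of each character's Unicode name; real browsers use considerably more elaborate rules, so treat this only as an illustration.

```python
import unicodedata

def scripts_in(label: str) -> set[str]:
    """Rough script guess: first word of each letter's Unicode character name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

latin = "apple"
spoof = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A

print(latin == spoof)                              # False
print(spoof.encode("punycode").decode("ascii"))    # pple-43d
print(sorted(scripts_in(spoof)))                   # ['CYRILLIC', 'LATIN']
```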
Does Punycode encoding change the length of a domain label?
Yes, Punycode encoding typically increases the length of a domain label because it must represent non-ASCII characters using only ASCII characters plus the "xn--" prefix. DNS labels have a maximum length of 63 characters, and this limit applies to the Punycode-encoded form. This means that very long internationalized domain labels may exceed the limit after encoding, even if the original Unicode label is within bounds.
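A simple length check against the 63-character limit might look like this. The helper names ace_length and fits_in_dns are illustrative, and the sketch measures the raw Punycode form plus prefix without performing IDNA validation.

```python
MAX_LABEL_LENGTH = 63  # DNS limit, applied to the encoded form

def ace_length(label: str) -> int:
    """Length of the label as it would appear in the DNS."""
    if label.isascii():
        return len(label)
    return len("xn--") + len(label.encode("punycode"))

def fits_in_dns(label: str) -> bool:
    return ace_length(label) <= MAX_LABEL_LENGTH

print(ace_length("m\u00fcnchen"))    # 14 (the Unicode label has 7 characters)
print(fits_in_dns("\u00fc" * 63))    # False: 63 characters, but too long once encoded
```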
Is Punycode the same as percent encoding for URLs?
No, Punycode and percent encoding serve different purposes. Punycode is specifically designed for domain name labels and produces compact ASCII strings prefixed with "xn--". Percent encoding is used for URL paths, query parameters, and other URI components, representing each byte as a percent sign followed by two hex digits. Domain names in URLs use Punycode, while the path and query portions use percent encoding. Both are necessary for fully internationalized URLs.
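The difference is easy to see by applying both encodings to the same text. This sketch uses urllib.parse.quote for percent encoding and the built-in "idna" codec for the host form; the sample string is only an example.

```python
from urllib.parse import quote

text = "caf\u00e9"  # "café"

# As a host label: compact ACE form.
host_form = text.encode("idna").decode("ascii")
print(host_form)    # xn--caf-dma

# As a path or query component: each UTF-8 byte becomes %XX.
path_form = quote(text)
print(path_form)    # caf%C3%A9
```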
Do all browsers support internationalized domain names?
All modern browsers support IDN and Punycode. When you type an internationalized domain name in the address bar, the browser automatically converts it to Punycode for DNS resolution and may display either the Unicode or Punycode form depending on its security policies. Some browsers show the Unicode form for domains using a single consistent script but switch to displaying the Punycode form when mixed scripts are detected, as a defense against homograph attacks.