HTML Decode

HTML Decode Text Online

Need to decode HTML entity references back to readable characters? Our free tool unescapes HTML entities instantly, converting encoded strings like &amp;amp; and &amp;lt; back to their original symbols (&amp; and &lt;). Whether you are extracting content from HTML source code, processing web scraping results, or cleaning up data exports that contain encoded characters, this converter handles named entities, decimal references, and hexadecimal references. Paste your encoded text and get clean, readable output in one click.

What Is HTML Decoding

HTML decoding is the reverse of HTML encoding. It converts HTML character entity references back to their original characters. When text is encoded for safe inclusion in HTML markup, special characters are replaced with entity references: the ampersand becomes &amp;amp;, the less-than sign becomes &amp;lt;, the greater-than sign becomes &amp;gt;, the double quotation mark becomes &amp;quot;, and the single quote becomes &amp;#39;. HTML decoding reverses these substitutions, restoring the original characters so the text can be read, processed, or used in non-HTML contexts.

HTML entities come in three forms, and a proper decoder must handle all of them. Named entities use a descriptive word between an ampersand and a semicolon, such as &amp;amp; for the ampersand, &amp;copy; for the copyright symbol, and &amp;nbsp; for the non-breaking space. The HTML5 specification defines over 2,200 named character references covering letters, symbols, mathematical operators, and technical characters. Decimal numeric references use the format &amp;#nnnn; where nnnn is the Unicode code point in base 10. Hexadecimal numeric references use &amp;#xhhhh; where hhhh is the code point in base 16.
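As a quick illustration of the three forms, the sketch below uses Python's standard-library html.unescape() as a convenient reference decoder. That choice is an assumption made for the example and says nothing about how this tool is implemented.

```python
import html

# The same character, the copyright sign (U+00A9), written in all three forms.
named = "&copy;"
decimal = "&#169;"       # 169 is the code point in base 10
hexadecimal = "&#xA9;"   # A9 is the same code point in base 16

decoded = [html.unescape(s) for s in (named, decimal, hexadecimal)]
print(decoded)

# A numeric reference is simply a code point written out as text.
assert chr(169) == chr(0xA9) == "\u00a9"
```

All three inputs decode to the same single character, which is why a decoder can treat the forms interchangeably.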

The need to unescape HTML arises frequently in data processing pipelines. Web scraping tools capture HTML source code that contains encoded entities. Database exports from content management systems may include encoded text. API responses sometimes return HTML-encoded strings. Email content extracted from HTML messages contains encoded special characters. In all these cases, decoding the entities produces clean, human-readable text that can be further processed, analyzed, or displayed in non-HTML environments.

How HTML Decoding Works

The HTML decoding algorithm scans the input string looking for entity reference patterns. When it encounters an ampersand character, it reads ahead to find the matching semicolon and interprets the content between them. For named entities, the decoder looks up the name in a reference table that maps entity names to their corresponding Unicode characters. For decimal numeric references, it converts the decimal number to a Unicode code point. For hexadecimal references, it converts the hex value to a code point. Characters that are not part of an entity reference pass through unchanged.
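The scan described above can be sketched in a few lines of Python. This is a minimal illustration and not the tool's actual code: the names ENTITY and decode_entities are hypothetical, the sketch only accepts references that end in a semicolon, and it keeps anything it cannot resolve as literal text.

```python
import re
from html.entities import html5  # maps names such as "amp;" to characters

# An ampersand, then a decimal number, a hex number, or a name, then ";".
ENTITY = re.compile(r"&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));")

def decode_entities(text: str) -> str:
    """Decode well-formed references; leave everything else untouched."""
    def replace(match: re.Match) -> str:
        dec, hexa, name = match.groups()
        if name is not None:
            # Named entity: table lookup; unknown names stay as literal text.
            return html5.get(name + ";", match.group(0))
        code_point = int(dec) if dec is not None else int(hexa, 16)
        # Surrogates and values past U+10FFFF are not valid characters,
        # so the reference is preserved as written.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        # Note: control characters are not special-cased in this sketch.
        return chr(code_point)
    return ENTITY.sub(replace, text)

print(decode_entities("Tom &amp; Jerry say &#72;&#x69;"))
```

Using a single regular expression with a replacement callback keeps the pass-through rule implicit: any text the pattern does not match is copied to the output unchanged.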

Edge cases require careful handling in a robust decoder. Some legacy HTML content uses named entities without trailing semicolons, which browsers tolerate but strict parsers may reject. Numeric references can point to code points that are unassigned, reserved, or represent control characters. The decoder must decide how to handle invalid references, either preserving them as literal text, replacing them with a replacement character, or raising an error. Our tool takes the pragmatic approach of decoding valid references and preserving invalid ones as-is.
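Different decoders settle these edge cases differently. As one point of comparison, the snippet below prints what Python's html.unescape() does with a few awkward inputs. It follows browser-style rules, so it accepts some legacy names without a semicolon and substitutes the replacement character U+FFFD for out-of-range code points, which is not the same policy as the preserve-as-is approach described above.

```python
import html

samples = {
    "missing semicolon": "&copy 2024",
    "unknown name": "&fakename;",
    "surrogate code point": "&#xD800;",
    "beyond U+10FFFF": "&#x110000;",
}

results = {label: html.unescape(text) for label, text in samples.items()}
for label, text in samples.items():
    # repr() makes invisible or unusual characters easy to spot.
    print(f"{label}: {text!r} -> {results[label]!r}")
```

When moving data between systems, it is worth checking which policy each decoder follows, because the same malformed input can come out differently.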

If you need to perform the reverse operation and encode special characters for safe HTML display, our HTML entity encoding tool handles that direction. For decoding percent-encoded URL strings, the URL decode converter tool processes web address encoding. You might also want to convert decoded HTML content into other formats using the HTML to Markdown converter for documentation workflows.

Syntax Comparison

Placing encoded text next to its decoded form shows exactly what HTML decoding targets. Here is sample content in its encoded and decoded forms, one pair for each entity format:

HTML encoded: &lt;h1&gt;Caf&eacute; &amp; Bistro&lt;/h1&gt;

HTML decoded: <h1>Cafe & Bistro</h1> (with accented e)

Named entity: &copy; 2024 &mdash; All rights reserved

Decoded result: (copyright symbol) 2024 (em dash) All rights reserved

Decimal reference: &#72;&#101;&#108;&#108;&#111;

Decoded result: Hello

Hex reference: &#x48;&#x65;&#x6C;&#x6C;&#x6F;

Decoded result: Hello

Notice that HTML decoding handles all three entity formats uniformly. Named entities like &eacute; are resolved using the HTML5 entity table, while numeric references are converted directly from their code point values. The decoder produces the same output regardless of which entity format was used to encode the original character.

Common Use Cases

HTML decoding is essential in many data processing and development workflows:

Web Scraping and Data Extraction: When scraping content from websites, the extracted HTML source contains encoded entities throughout the text. A product name like "Ben &amp; Jerry&#39;s" needs to be decoded to "Ben & Jerry's" before it can be stored in a database or displayed in a different application. Web scraping libraries often provide automatic decoding, but when working with raw HTML strings or processing partial content, manual decoding is necessary to produce clean, usable text.

Content Migration Between Platforms: Moving content from one CMS to another frequently involves dealing with HTML-encoded text. WordPress, Drupal, Joomla, and other platforms encode content differently. During migration, exported content may contain double-encoded entities or mixed encoding styles. Decoding the content to its raw form before re-importing ensures that special characters display correctly in the new platform without residual entity references appearing as visible text.

API Response Processing: Many web APIs return HTML-encoded strings in their JSON or XML responses. Social media APIs, news aggregation services, and content delivery platforms often encode special characters in titles, descriptions, and body text. Before displaying this content in a mobile app, desktop application, or different web context, the HTML entities must be decoded to their original characters.

Email Content Extraction: HTML emails contain encoded entities for special characters, typographic quotes, em dashes, and non-breaking spaces. When extracting plain text from HTML emails for indexing, search, or archival purposes, decoding these entities produces readable text. Without decoding, the extracted content would be littered with entity references that make it difficult to read and search.

Database Cleanup and Normalization: Over time, databases can accumulate text with inconsistent encoding. Some records may contain raw characters while others contain HTML entities for the same characters. Decoding all entity references and storing the raw text normalizes the data, making searches and comparisons reliable. This is particularly important for fields used in search indexes, where encoded and unencoded versions of the same text would not match.
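For the scraping and email extraction cases above, a parser can strip tags and decode entities in one step. The sketch below uses Python's standard html.parser module; the class name TextExtractor and the helper extract_text are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content; character references arrive already decoded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # the default, made explicit
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def extract_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

snippet = '<li class="product">Ben &amp; Jerry&#39;s</li>'
print(extract_text(snippet))
```

This approach suits whole fragments of markup. For a bare string with no tags, a plain decode call is simpler.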

HTML Decode Examples

Here are practical examples showing how HTML-encoded text is decoded back to its original form:

Example 1 - Basic named entities:

Input: Tom &amp; Jerry

Output: Tom & Jerry

Example 2 - Encoded HTML tags:

Input: &lt;p&gt;Hello World&lt;/p&gt;

Output: <p>Hello World</p>

Example 3 - Quotation marks and apostrophes:

Input: She said &quot;it&#39;s fine&quot;

Output: She said "it's fine"

Example 4 - Numeric decimal references:

Input: &#169; 2024 &#8212; All rights reserved

Output: (copyright symbol) 2024 (em dash) All rights reserved

Example 5 - Hexadecimal references:

Input: &#x20AC;100 &#x2014; Special &#x2764; Offer

Output: (euro sign)100 (em dash) Special (heart) Offer

Example 6 - Mixed entity types in real content:

Input: Caf&eacute; &amp; Cr&egrave;me &mdash; &#36;9.99

Output: Cafe & Creme (with accents) (em dash) $9.99

In programming languages, HTML decoding is available through built-in functions and libraries. In JavaScript, you can use the DOMParser API or create a temporary textarea element to leverage the browser's built-in decoder. In Python, the html.unescape() function handles all entity types. In PHP, html_entity_decode() and htmlspecialchars_decode() provide decoding with different levels of coverage. In Java, Apache Commons Text provides StringEscapeUtils.unescapeHtml4(). In C#, System.Net.WebUtility.HtmlDecode() is the standard approach. Our online tool performs the same operation instantly without writing any code.
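As a check on the Python route mentioned above, the six examples can be run through html.unescape(). The expected strings use Unicode escapes so the snippet stays readable in any editor; the dictionary name examples is just a label for this sketch.

```python
import html

# Encoded input mapped to the decoded text expected from Examples 1 to 6.
examples = {
    "Tom &amp; Jerry": "Tom & Jerry",
    "&lt;p&gt;Hello World&lt;/p&gt;": "<p>Hello World</p>",
    "She said &quot;it&#39;s fine&quot;": 'She said "it\'s fine"',
    "&#169; 2024 &#8212; All rights reserved": "\u00a9 2024 \u2014 All rights reserved",
    "&#x20AC;100 &#x2014; Special &#x2764; Offer": "\u20ac100 \u2014 Special \u2764 Offer",
    "Caf&eacute; &amp; Cr&egrave;me &mdash; &#36;9.99": "Caf\u00e9 & Cr\u00e8me \u2014 $9.99",
}

for encoded, expected in examples.items():
    assert html.unescape(encoded) == expected
print("all six examples decode as expected")
```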

Frequently Asked Questions

What is the difference between HTML decode and unescape HTML?

HTML decode and unescape HTML are two names for the same operation. Both refer to converting HTML character entity references back to their original characters. The term "decode" is more commonly used in the context of data processing and programming APIs, while "unescape" is often used in JavaScript contexts and web development discussions. Regardless of which term you use, the process is identical: entity references such as &amp;, &lt;, and &#169; are converted back to their corresponding characters. Our tool handles both named entities and numeric references in a single pass.

What types of HTML entities does this decoder support?

Our decoder supports all three types of HTML entity references. Named entities use descriptive words like &amp; for ampersand, &copy; for copyright, and &nbsp; for non-breaking space. The full HTML5 named character reference table with over 2,200 entries is supported. Decimal numeric references use the format &#nnnn; where the number represents a Unicode code point in base 10. Hexadecimal numeric references use &#xhhhh; where the value is a Unicode code point in base 16. All three formats are decoded in a single operation, and you can mix them freely in your input text.

Can HTML decoding produce dangerous output?

Yes, HTML decoding can produce output that contains active HTML markup, including script tags and event handlers. If encoded content like &lt;script&gt;alert(1)&lt;/script&gt; is decoded, the result is a functional script tag. This is why decoded content should never be inserted directly into an HTML page without re-encoding or sanitization. The decode operation itself is safe, but how you use the decoded output matters. If the decoded text will be displayed in an HTML context, it must be re-encoded or passed through a sanitization library to prevent cross-site scripting vulnerabilities.
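The round trip can be shown with Python's standard html module. This is a minimal sketch of the re-encoding step only. Content that must keep some markup needs a dedicated sanitization library instead, which is outside the scope of this example.

```python
import html

stored = "&lt;script&gt;alert(1)&lt;/script&gt;"

decoded = html.unescape(stored)       # now a literal <script> element
safe_for_html = html.escape(decoded)  # re-encode before writing into a page

print(decoded)
print(safe_for_html)
```

The decoded string is fine to store, search, or show in a plain-text context. Only the step that writes it into HTML needs the escaped form.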

How do I handle double-encoded HTML entities?

Double encoding happens when already-encoded text is encoded again. The entity &amp; becomes &amp;amp; after a second encoding pass. When you decode double-encoded text once, you get &amp; instead of the expected ampersand character. To fully decode double-encoded content, run the decoder twice. If you are unsure how many encoding layers exist, decode repeatedly until the output stops changing. To prevent double encoding in your applications, always encode at the output stage and store raw text in your database rather than pre-encoded text.
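The decode-until-stable approach can be sketched as a small loop. The function name decode_fully and the max_passes cap are assumptions made for this example; the cap simply guards against unexpectedly deep nesting.

```python
import html

def decode_fully(text: str, max_passes: int = 5) -> tuple[str, int]:
    """Decode repeatedly until stable; return the text and the passes that changed it."""
    passes = 0
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
        passes += 1
    return text, passes

print(decode_fully("Tom &amp;amp; Jerry"))
```

One caution: repeated decoding cannot tell accidental double encoding apart from text that deliberately shows an entity, such as a tutorial that displays &amp;lt; to readers. Use it on fields known to be double-encoded and not as a blanket cleanup.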

Is HTML decoding the same as URL decoding?

No, HTML decoding and URL decoding are different operations that reverse different encoding schemes. HTML decoding converts entity references like &amp; and &lt; back to their original characters. URL decoding converts percent-encoded sequences like %20 and %26 back to their original characters. A string might contain both types of encoding if it was URL-encoded and then embedded in HTML. In that case, you would need to apply HTML decoding first to resolve the entity references, then URL decoding to resolve the percent-encoded sequences. For URL decoding, use our URL percent decode tool.
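The ordering can be illustrated with a link as it might appear inside an href attribute. The URL below is made up for the example, and the sketch relies only on Python's standard html and urllib.parse modules.

```python
import html
from urllib.parse import parse_qs, urlsplit

# As written in HTML source: the parameter separator is encoded as &amp;
href = "search?q=Tom%20%26%20Jerry&amp;page=2"

url = html.unescape(href)               # step 1: &amp; becomes the & separator
params = parse_qs(urlsplit(url).query)  # step 2: percent-decode each parameter

print(url)
print(params)
```

Reversing the order would turn %26 into a literal ampersand before the query string is split, so the value "Tom & Jerry" could no longer be told apart from a parameter boundary.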

What happens with invalid or unknown entity references?

When the decoder encounters an entity reference it cannot resolve, it preserves the original text unchanged. For example, if the input contains &fakename;, which is not a valid HTML entity name, the decoder leaves it as-is in the output. Numeric references pointing to invalid Unicode code points, such as code points in the surrogate range or beyond the maximum Unicode value (U+10FFFF), are likewise left in place instead of raising an error. This behavior ensures that partially encoded or malformed input does not cause errors and that any unrecognized sequences remain visible for manual inspection.

How do I decode HTML entities in JavaScript?

JavaScript does not have a single built-in function for HTML decoding, but there are several effective approaches. The most common method is to create a temporary textarea element, set its innerHTML to the encoded string, and read the value property to get the decoded text. This leverages the browser's built-in HTML parser. For server-side JavaScript in Node.js, you can use libraries like he or html-entities that provide dedicated decode functions. The DOMParser API is another browser-based option that parses the encoded string as an HTML document and extracts the decoded text content. Each approach handles named entities, decimal references, and hexadecimal references correctly.

Can I decode only specific HTML entities and leave others encoded?

Selective decoding is possible but requires a custom implementation rather than a standard decode function. Standard decoders process all entity references in the input. If you need to decode only certain entities, such as converting &amp; back to ampersand while leaving &lt; and &gt; encoded, you would need to use targeted string replacement for the specific entities you want to decode. This is an uncommon requirement, but it can arise when you want to partially process HTML content while preserving certain safety encodings. For most use cases, decoding all entities at once and then re-encoding as needed for the target context is the cleaner approach.
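A targeted replacement of this kind might look like the following sketch. The allow-list ALLOWED and the function decode_selected are hypothetical; adjust the list to whichever references you want decoded.

```python
import re

# Hypothetical allow-list: only these references are decoded.
ALLOWED = {"&amp;": "&", "&quot;": '"', "&#39;": "'"}
PATTERN = re.compile("|".join(re.escape(ref) for ref in ALLOWED))

def decode_selected(text: str) -> str:
    """Replace allow-listed references in one pass; leave all others encoded."""
    return PATTERN.sub(lambda m: ALLOWED[m.group(0)], text)

print(decode_selected("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"))
```

Doing the substitution in a single regular-expression pass matters: chained replace calls could decode &amp;lt; to &lt; and then, depending on the order, decode it again.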

