The hierarchy in one sentence

Unicode is the standard that assigns a number to every character. UTF-8 is an encoding scheme that represents those numbers as bytes. ASCII is an older standard that covers only 128 characters; those characters are the first 128 Unicode code points, and UTF-8 encodes them with exactly the same bytes as ASCII.

ASCII: the original standard

ASCII (American Standard Code for Information Interchange) was defined in 1963. It maps 128 characters to numbers 0–127 — the English alphabet, digits, punctuation, and control characters. Every "A" is the number 65. Every "a" is 97.

The limitation: 128 characters is only enough for English. No accented letters (é, ü, ñ), no Chinese, Arabic, Hebrew, or any other writing system. No emoji.
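As a small illustration of that limit, here is a Python sketch. The helper name ascii_code is made up for this example; ord() and str.encode() are standard Python.

```python
def ascii_code(ch: str) -> int:
    """Return the ASCII code of a single character, or raise ValueError."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside ASCII (code point {code})")
    return code


print(ascii_code("A"))  # 65
print(ascii_code("a"))  # 97

# Anything beyond the 128 ASCII characters cannot be encoded as ASCII at all.
try:
    "é".encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode as ASCII:", exc.reason)
```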

Unicode: the universal standard

Unicode was created in 1991 to assign a unique number, called a code point, to every character in every writing system. Version 15.1 of the standard (2023) defines 149,813 characters across 161 scripts, including emoji. Later versions keep adding more.

Examples:

A → U+0041
é → U+00E9
中 → U+4E2D
😀 → U+1F600

Unicode defines the characters — it does not define how to store them as bytes in a file or transmit them over a network. That is what encoding schemes like UTF-8 are for.
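The code points listed above can be checked in Python, where ord() returns a character's code point as an integer. The helper code_point below is a name chosen for this sketch, not a standard function.

```python
def code_point(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ["A", "é", "中", "😀"]:
    print(ch, code_point(ch))
```

Note that nothing here involves bytes yet: a code point is just a number until an encoding turns it into bytes.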

UTF-8: the dominant encoding

UTF-8 is a variable-width encoding for Unicode. It represents each character using 1 to 4 bytes:

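ASCII characters (U+0000 to U+007F): 1 byte, identical to the ASCII representation
U+0080 to U+07FF (accented Latin letters, Greek, Cyrillic, Hebrew, Arabic): 2 bytes
U+0800 to U+FFFF (the rest of the Basic Multilingual Plane, including most East Asian characters): 3 bytes
U+10000 to U+10FFFF (most emoji, rare and historic characters): 4 bytes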
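To see the variable width in practice, the following Python sketch encodes one character from each group and prints the resulting bytes. The helper name utf8_bytes is only for illustration; bytes.hex() with a separator argument assumes Python 3.8 or newer.

```python
def utf8_bytes(ch: str) -> bytes:
    """Encode a single character as UTF-8."""
    return ch.encode("utf-8")


for ch in ["A", "é", "中", "😀"]:
    encoded = utf8_bytes(ch)
    # Show the code point, the byte count, and the bytes in hex.
    print(f"{ch}  U+{ord(ch):04X}  {len(encoded)} byte(s)  {encoded.hex(' ')}")
```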

The genius of UTF-8: it is fully backward-compatible with ASCII. Any file that only contains ASCII characters is a valid UTF-8 file. This is why UTF-8 became the universal encoding for the web — it works for every language while adding no overhead for English text.
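The backward compatibility is easy to verify. In this short Python sketch (the sample string is arbitrary), ASCII-only text produces the same bytes under both encodings, one byte per character.

```python
text = "Hello, world!"

ascii_bytes = text.encode("ascii")
utf8_encoded = text.encode("utf-8")

# For ASCII-only text, the two encodings yield identical bytes.
print(ascii_bytes == utf8_encoded)

# Bytes written as ASCII can be read back as UTF-8 unchanged.
print(ascii_bytes.decode("utf-8"))

# No overhead: exactly one byte per character.
print(len(utf8_encoded) == len(text))
```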

UTF-8 is the encoding for virtually all modern web content: HTML pages, JSON, CSS, most source code files, and most databases. If in doubt, use UTF-8.

UTF-16 and UTF-32

Other Unicode encodings exist:

UTF-16: uses 2 or 4 bytes per code point. Used internally by Windows, Java, and JavaScript strings. More compact than UTF-8 for East Asian text (2 bytes instead of 3), wasteful for ASCII (2 bytes instead of 1).
UTF-32: always 4 bytes per code point. Simple to index but wastes space. Rarely used for files or network traffic.
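A rough way to compare the three encodings is to measure the same text in each. The Python sketch below uses the little-endian codec names ("utf-16-le", "utf-32-le") so that no byte order mark is added to the counts; the function name encoded_sizes is an assumption of this example.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text under each Unicode encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


for sample in ["hello", "日本語", "😀"]:
    print(sample, encoded_sizes(sample))
```

For the five ASCII letters, UTF-8 needs 5 bytes where UTF-16 needs 10 and UTF-32 needs 20. For the three Japanese characters, UTF-16 is the smallest at 6 bytes against 9 for UTF-8.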

Why encoding errors happen: the mismatch

An encoding error occurs when text written in one encoding is read in another. Common scenarios:

CSV file saved as Latin-1 (ISO-8859-1) opened in a tool expecting UTF-8: an accented character like é is the single byte E9 in Latin-1, which is not a valid UTF-8 sequence on its own, so it shows up as the replacement character � or triggers a decoding error
MySQL table set to latin1 but receiving UTF-8 data: characters outside ASCII are stored incorrectly
Email with no Content-Type charset declaration: mail clients guess the encoding and sometimes guess wrong
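The Latin-1 scenario can be reproduced in a few lines of Python. This is a sketch using the sample word "café", not a recipe for any particular tool: a strict UTF-8 decode fails, and a lenient one substitutes the replacement character.

```python
# "café" saved as Latin-1: é becomes the single byte 0xE9.
latin1_bytes = "café".encode("latin-1")
print(latin1_bytes)  # b'caf\xe9'

# A strict UTF-8 decoder rejects the lone 0xE9 byte.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# A lenient decoder swaps in U+FFFD, the replacement character.
lenient = latin1_bytes.decode("utf-8", errors="replace")
print(lenient)
```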

Mojibake: the encoding error you have seen

"Mojibake" (文字化け) is the Japanese term for garbled text caused by an encoding mismatch. The classic example: é encoded as UTF-8 (0xC3 0xA9) and read as Windows-1252 produces Ã©, because each of the two bytes is displayed as its own character.

If you see Ã© where é should be, or Â where nothing should be, the text was encoded as UTF-8 but decoded as Latin-1 or Windows-1252.
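Both symptoms can be reproduced, and in simple cases reversed, with Python's built-in codecs ("cp1252" is Python's name for Windows-1252). Treat the repair step as a sketch: it assumes the text was garbled exactly once and that every garbled character maps back to a Windows-1252 byte, which is not always true.

```python
# Encode as UTF-8, then decode with the wrong codec.
garbled = "é".encode("utf-8").decode("cp1252")
print(garbled)  # Ã©

# A non-breaking space (U+00A0) is 0xC2 0xA0 in UTF-8, so a stray Â
# appears in front of it when the bytes are read as Windows-1252.
garbled_nbsp = "\u00a0".encode("utf-8").decode("cp1252")
print(repr(garbled_nbsp))

# Reversing the same two steps recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # é
```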

How to fix encoding problems

In HTML: always include <meta charset="UTF-8"> in the <head>
In MySQL: set the database, table, and column charset to utf8mb4 (not utf8, which is MySQL's broken 3-byte implementation that can't store emoji)
For CSV files: specify UTF-8 when opening in Excel (Import → File Origin → UTF-8)
In Python: use open(file, encoding='utf-8') explicitly
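For the Python case, a minimal sketch looks like the following. The file name sample.txt and the helper roundtrip are made up for the example. Without the encoding argument, open() may fall back to a platform-dependent default, which is how the same script can behave differently on different machines.

```python
import tempfile
from pathlib import Path


def roundtrip(text: str) -> str:
    """Write text to a temporary file as UTF-8 and read it back as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"  # hypothetical file name
        # Name the encoding on both the write and the read.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()


print(roundtrip("café 中 😀"))
```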

Summary

ASCII: 128 characters, English only, the original standard
Unicode: 149,000+ characters from all writing systems — assigns numbers, not bytes
UTF-8: the dominant encoding for Unicode — variable width, backward-compatible with ASCII, use this by default
Encoding errors are caused by reading text with the wrong encoding — the fix is declaring the correct encoding explicitly everywhere

Need to encode text for safe transport? The Base64 encoder and URL encoder handle the most common encoding scenarios.

UTF-8, ASCII, Unicode: What's the Difference and Why Does It Matter?
