Romanization

In linguistics, romanization (also spelled romanisation) is the conversion of text from another writing system into the Roman (Latin) script. It helps people read languages written with letters or symbols not found in the Roman alphabet.

There are several ways to romanize text. One common method is transliteration, which represents the written symbols as closely as possible. Another is transcription, which represents how words sound when spoken. Transcription can be phonemic, recording only the sound differences that change meaning, or phonetic, recording every small sound very precisely.

Romanization is important because it makes it easier for people who only know the Roman alphabet to read and study other languages. It is used in many books, maps, and computer systems around the world.

Methods

There are many ways to convert text from another writing system into the Roman (Latin) alphabet. The choice depends on the goal, such as making text easy to read or preserving the original sounds. Points to consider include:

Source language: Some methods are designed for a single language and keep its special sounds. Others are meant to work for many languages.
Target language: Most methods are designed for readers of one particular language, so the spellings follow that language's letter values.
Simplicity: The basic Latin alphabet has fewer letters than many other writing systems, so letter pairs (digraphs), accent marks (diacritics), or special characters are needed to represent every symbol or sound.
Reversibility: Some methods let you recover the original text exactly, while others do not.
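
The reversibility point can be checked mechanically. The sketch below is only an illustration: it takes four letters from the Persian consonant table in this article, with the DMG (1969) and UN (2012) columns copied by hand, and tests whether two source letters ever share one romanization. The function names `is_reversible` and `collisions` are hypothetical, and a collision-free letter table is a necessary condition for reversing a romanization, not a full proof.

```python
# Four Persian letters with their romanizations from the article's table.
# DMG (1969) uses diacritics; UN (2012) uses plain letters.
DMG = {"ث": "s\u0331", "ص": "\u1e63", "ذ": "z\u0331", "ظ": "\u1e93"}
UN_2012 = {"ث": "s", "ص": "s", "ذ": "z", "ظ": "z"}


def is_reversible(table):
    """Return True if no two source letters share the same romanization."""
    romanized = list(table.values())
    return len(set(romanized)) == len(romanized)


def collisions(table):
    """Group source letters by romanization and keep only ambiguous groups."""
    groups = {}
    for letter, latin in table.items():
        groups.setdefault(latin, []).append(letter)
    return {latin: letters for latin, letters in groups.items() if len(letters) > 1}


print(is_reversible(DMG))      # True
print(is_reversible(UN_2012))  # False
print(collisions(UN_2012))     # the letters that collapse to "s" and to "z"
```

A reader who sees "s" in the UN (2012) spelling cannot tell which of the two original letters was written, which is the sense in which that method is not reversible.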

Transliteration

Main article: Transliteration

Transliteration converts each symbol of the original script into Latin letters. It follows the written symbols, not how they sound. For example, the Nihon-shiki romanization of Japanese lets an informed reader reconstruct the original kana exactly, but the reader needs extra knowledge to pronounce the words correctly.
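
A minimal sketch of symbol-by-symbol transliteration, assuming a small hand-picked subset of hiragana with their Nihon-shiki spellings (Nihon-shiki is one romanization system for Japanese). The function names `transliterate` and `restore` are hypothetical. Because each kana has its own spelling, the mapping can be inverted to get the kana back.

```python
# A few hiragana with their Nihon-shiki spellings (illustrative subset only).
NIHON_SHIKI = {
    "す": "su", "し": "si", "つ": "tu", "な": "na", "み": "mi",
    "ち": "ti", "ふ": "hu", "じ": "zi", "ぢ": "di",
}
# Inverse table: Latin spelling back to kana.
TO_KANA = {latin: kana for kana, latin in NIHON_SHIKI.items()}


def transliterate(text):
    """Replace every kana with its Latin spelling, one symbol at a time."""
    return "".join(NIHON_SHIKI[ch] for ch in text)


def restore(latin):
    """Invert the mapping; every spelling in this subset is two letters long."""
    return "".join(TO_KANA[latin[i:i + 2]] for i in range(0, len(latin), 2))


print(transliterate("つなみ"))           # tunami
print(restore("tunami"))                 # つなみ
print(restore(transliterate("ちぢみ")))  # ちぢみ
```

The spelling "tunami" looks odd to an English reader, who expects "tsunami", but it keeps a one-to-one link with the written kana.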

Transcription

Main article: Transcription (linguistics)

Phonemic

Phonetic

Compromise

For most languages, a good romanization is a compromise between transliteration and transcription. Pure transcription is usually not possible because the source language has sounds and distinctions that the target language lacks. Most romanizations today aim to help readers pronounce words reasonably well, rather than to reproduce every symbol. For example, the Japanese word 柔術 is written zyûzyutu in the Nihon-shiki system, but most English readers find the Hepburn spelling jūjutsu easier to say.
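
The 柔術 example can be reproduced with a short sketch. It assumes the kana spelling じゅうじゅつ, already split into syllables by hand, and two tiny tables that cover only this word. The function name `romanize` is hypothetical, and the long-vowel rule (a doubled u becomes one marked vowel) is a simplification that happens to be enough here.

```python
# Kana spelling of 柔術 split into syllables: じゅ + う + じゅ + つ.
SYLLABLES = ["じゅ", "う", "じゅ", "つ"]

# Spellings for just these syllables in two systems.
NIHON_SHIKI = {"じゅ": "zyu", "う": "u", "つ": "tu"}
HEPBURN = {"じゅ": "ju", "う": "u", "つ": "tsu"}


def romanize(syllables, table, long_u):
    """Spell each syllable, then write a doubled u as one long vowel."""
    plain = "".join(table[s] for s in syllables)
    return plain.replace("uu", long_u)


print(romanize(SYLLABLES, NIHON_SHIKI, "\u00fb"))  # zyûzyutu
print(romanize(SYLLABLES, HEPBURN, "\u016b"))      # jūjutsu
```

The first spelling follows the kana more closely. The second gives English readers a better guide to the pronunciation.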

Romanization of specific writing systems

Arabic

The Arabic script is used to write Arabic, Persian, Urdu, Pashto, Sindhi, and other languages. Romanization standards include:

Arabic

Deutsche Morgenländische Gesellschaft (1936)
BS 4280 (1968)
SATTS (1970s)
UNGEGN (1972)
DIN 31635 (1982)
ISO 233 (1984)
Qalam (1985)
ISO 233-2 (1993)
Buckwalter transliteration (1990s)
ALA-LC (1997)
Arabic chat alphabet

Persian

Armenian

Georgian


Greek

There are romanization systems for both Modern and Ancient Greek.

ALA-LC
Beta Code
Greeklish
ISO 843 (1997)

Hebrew

The Hebrew alphabet is romanized using several standards:

ANSI Z39.25 (1975)
UNGEGN (1977)
ISO 259 (1984)
ISO 259-2 (1994)
ISO/DIS 259-3
ALA-LC

Indic (Brahmic) scripts

The Brahmic family of abugidas is used for languages of India and south-east Asia. Various transliteration conventions have been used for Indic scripts.

ISO 15919 (2001)
The National Library at Kolkata romanization
Harvard-Kyoto
ITRANS
ISCII (1988)

Devanagari–nastaʿlīq (Hindustani)

Hindustani is an Indo-Aryan language. Its two standardized registers, Standard Hindi and Standard Urdu, are recognized as official languages in India and Pakistan respectively.

The Hamari Boli Initiative aims to help Hindustani through romanization.

Chinese

Romanization of the Sinitic languages, particularly Mandarin, has been difficult. Many romanization tables include Chinese characters plus one or more romanizations.

Mandarin

China

Hanyu Pinyin (1958)

Taiwan

Main article: Chinese language romanization in Taiwan

Gwoyeu Romatzyh (GR, 1928–1986)
Mandarin Phonetic Symbols II (MPS II, 1986–2002)
Tongyong Pinyin (2002–2008)
Hanyu Pinyin (since January 1, 2009)

Singapore

Main article: Chinese language romanisation in Singapore

Cantonese

Wu

Min Nan or Hokkien

Teochew

Guangdong (1960)

Min Dong

Foochow Romanized

Min Bei

Kienning Colloquial Romanized

Japanese

Romanization is called "rōmaji" in Japanese. The most common systems are:

Hepburn (1867)
Nihon-shiki (1885)
Kunrei-shiki (1937)
JSL (1987)
ALA-LC
Wāpuro

Korean

The following systems are widely used:

Thai

Thai is written with its own script.

Royal Thai General System of Transcription
ISO 11940 (1998), transliteration
ISO 11940-2 (2007), transcription
ALA-LC

Nuosu

The Nuosu language is written with the Yi script. The only romanization system is YYPY.

Tibetan

The Tibetan script has two official romanization systems: Tibetan Pinyin (for Lhasa Tibetan) and Roman Dzongkha (for Dzongkha).

Cyrillic

In English-language library catalogues and bibliographies, the Library of Congress transliteration method is used.

In linguistics, scientific transliteration is used.

Belarusian

Bulgarian

A system based on scientific transliteration had been official since the 1970s. Bulgarian authorities later switched to the Streamlined System, which became mandatory for public use under a law passed in 2009.

Kyrgyz

Macedonian

Russian

There is no single accepted system of writing Russian using the Latin script. Systems include:

BGN/PCGN (1947)
GOST 16876-71 (1971)
United Nations romanization system (1987)
ISO 9 (1995)
ALA-LC (1997)
"Volapuk" encoding (1990s)
Streamlined System
Comparative transliteration

Syriac

Main article: Syriac alphabet § Latin alphabet and romanization

Ukrainian

See also: Ukrainian Latin alphabet

ALA-LC
ISO 9
Ukrainian National transliteration

Persian consonants
Unicode	Persian letter	IPA	DMG (1969)	ALA-LC (1997)	BGN/PCGN (1958)	EI (1960)	EI (2012)	UN (1967)	UN (2012)	Pronunciation
U+0627	ا	ʔ, ∅	ʾ, —	ʼ, —				ʾ		- as in uh-oh
U+0628	ب	b	b							B as in Bob
U+067E	پ	p	p							P as in pet
U+062A	ت	t	t							T as in tall
U+062B	ث	s	s̱	s̱	s̄	t͟h	ṯ	s̄	s	S as in sand
U+062C	ج	dʒ	ǧ	j	j	d͟j	j	j		J as in jam
U+0686	چ	tʃ	č	ch	ch	č		ch	č	Ch as in Charlie
U+062D	ح	h	ḥ	ḥ	ḩ/ḥ	ḥ		ḩ	h	H as in holiday
U+062E	خ	x	ḫ	kh	kh	k͟h	ḵ	kh	x	somewhat resembling German Ch
U+062F	د	d	d							D as in Dave
U+0630	ذ	z	ẕ	ẕ	z̄	d͟h	ḏ	z̄	z	Z as in zero
U+0631	ر	r	r							R as in rabbit
U+0632	ز	z	z							Z as in zero
U+0698	ژ	ʒ	ž	zh	zh	z͟h	ž	zh	ž	S as in television or G as in genre
U+0633	س	s	s							S as in Sam
U+0634	ش	ʃ	š	sh	sh	s͟h	š	sh	š	Sh as in sheep
U+0635	ص	s	ṣ	ṣ	ş/ṣ	ṣ		ş	s	S as in Sam
U+0636	ض	z	ż	z̤	ẕ	ḍ	ż	ẕ	z	Z as in zero
U+0637	ط	t	ṭ	ṭ	ţ/ṭ	ṭ		ţ	t	t as in tank
U+0638	ظ	z	ẓ	ẓ	z̧/ẓ	ẓ	ẓ	z̧	z	Z as in zero
U+0639	ع	ʕ	ʿ	ʻ	ʼ	ʻ	ʻ	ʿ	ʿ	_____
U+063A	غ	ɢ~ɣ	ġ	gh	gh	g͟h	ḡ	gh	q	somewhat resembling French R
U+0641	ف	f	f							F as in Fred
U+0642	ق	ɢ~ɣ	q			ḳ		q		somewhat resembling French R
U+06A9	ک	k	k							C as in card
U+06AF	گ	ɡ	g							G as in go
U+0644	ل	l	l							L as in lamp
U+0645	م	m	m							M as in Michael
U+0646	ن	n	n							N as in name
U+0648	و	v~w	v				v, w	v		V as in vision
U+0647	ه	h	h	h	h	h		h	h	H as in hot
U+0629	ة	∅, t	—	h	—	t	h	—	—
U+06CC	ی	j	y							Y as in Yale
U+0621	ء	ʔ, ∅	ʾ	ʼ				ʾ
U+0623	أ	ʔ, ∅	ʾ	ʼ				ʾ
U+0624	ؤ	ʔ, ∅	ʾ	ʼ				ʾ
U+0626	ئ	ʔ, ∅	ʾ	ʼ				ʾ

Persian vowels
Unicode	Final	Medial	Initial	Isolated	IPA	DMG (1969)	ALA-LC (1997)	BGN/PCGN (1958)	EI (2012)	UN (1967)	UN (2012)	Pronunciation
U+064E	ـَ	ـَ	اَ	اَ	æ	a	a	a	a	a	a	A as in cat
U+064F	ـُ	ـُ	اُ	اُ	o	o	o	o	u	o	o	O as in go
U+0648 U+064F	ـو	ـو	—	—	o	o	o	o	u	o	o	O as in go
U+0650	ـِ	ـِ	اِ	اِ	e	e	i	e	e	e	e	E as in ten
U+064E U+0627	ـَا	ـَا	آ	آ	ɑː~ɒː	ā	ā	ā	ā	ā	ā	O as in hot
U+0622	ـآ	ـآ	آ	آ	ɑː~ɒː	ā, ʾā	ā, ʼā	ā	ā	ā	ā	O as in hot
U+064E U+06CC	ـَی	—	—	—	ɑː~ɒː	ā	á	á	ā	á	ā	O as in hot
U+06CC U+0670	ـیٰ	—	—	—	ɑː~ɒː	ā	á	á	ā	ā	ā	O as in hot
U+064F U+0648	ـُو	ـُو	اُو	اُو	uː, oː	ū	ū	ū	u, ō	ū	u	U as in actual
U+0650 U+06CC	ـی	ـیـ	ایـ	ای	iː, eː	ī	ī	ī	i, ē	ī	i	Y as in happy
U+064E U+0648	ـَو	ـَو	اَو	اَو	ow~aw	au	aw	ow	ow, aw	ow	ow	O as in go
U+064E U+06CC	ـَی	ـَیـ	اَیـ	اَی	ej~aj	ai	ay	ey	ey, ay	ey	ey	Ay as in play
U+064E U+06CC	ـیِ	—	—	—	–e, –je	–e, –ye	–i, –yi	–e, –ye	–e, –ye	–e, –ye	–e, –ye	Ye as in yes
U+06C0	ـهٔ	—	—	—	–je	–ye	–ʼi	–ye	–ye	–ye	–ye	Ye as in yes

Georgian
Georgian letter	IPA	National system (2002)	BGN/PCGN (1981–2009)	ISO 9984 (1996)	ALA-LC (1997)	Unofficial system	Kartvelo translit	NGR2
ა	/ɑ/	a	a	a	a	a	a	a
ბ	/b/	b	b	b	b	b	b	b
გ	/ɡ/	g	g	g	g	g	g	g
დ	/d/	d	d	d	d	d	d	d
ე	/ɛ/	e	e	e	e	e	e	e
ვ	/v/	v	v	v	v	v	v	v
ზ	/z/	z	z	z	z	z	z	z
ჱ	/eɪ/		ey	ē	ē	é	ej	ẽ
თ	/tʰ/	t	tʼ	t̕	tʻ	T or t	t	t / t̊
ი	/i/	i	i	i	i	i	i	i
კ	/kʼ/	kʼ	k	k	k	k	ǩ	k̉
ლ	/l/	l	l	l	l	l	l	l
მ	/m/	m	m	m	m	m	m	m
ნ	/n/	n	n	n	n	n	n	n
ჲ	/i/, /j/		j	y	y		j	ĩ
ო	/ɔ/	o	o	o	o	o	o	o
პ	/pʼ/	pʼ	p	p	p	p	p̌	p̉
ჟ	/ʒ/	zh	zh	ž	ž	J, zh or j	ž	g̃
რ	/r/	r	r	r	r	r	r	r
ს	/s/	s	s	s	s	s	s	s
ტ	/tʼ/	tʼ	t	t	t	t	t̆	t̉
ჳ	/w/			w	w		ŭ	f̃
უ	/u/	u	u	u	u	u	u	u
ფ	/pʰ/	p	pʼ	p̕	pʻ	p or f	p	p / p̊
ქ	/kʰ/	k	kʼ	k̕	kʻ	q or k	q or k	k / k̊
ღ	/ʁ/	gh	gh	ḡ	ġ	g, gh or R	g, gh or R	q̃
ყ	/qʼ/	qʼ	q	q	q	y	q	q
შ	/ʃ/	sh	sh	š	š	sh or S	š	x
ჩ	/t͡ʃ(ʰ)/	ch	chʼ	č̕	čʻ	ch or C	č	c̃
ც	/t͡s(ʰ)/	ts	tsʼ	c̕	cʻ	c or ts	c	c
ძ	/d͡z/	dz	dz	j	ż	dz or Z	ʒ	d̃
წ	/t͡sʼ/	tsʼ	ts	c	c	w, c or ts	ʃ	c̉
ჭ	/t͡ʃʼ/	chʼ	ch	č	č	W, ch or tch	ʃ̌	j̉
ხ	/χ/	kh	kh	x	x	x or kh (rarely)	x	k̃
ჴ	/q/, /qʰ/		qʼ	ẖ	x̣		q̌	q̊
ჯ	/d͡ʒ/	j	j	ǰ	j	j	-	j
ჰ	/h/	h	h	h	h	h	h	h
ჵ	/oː/			ō	ō		ȯ	h̃
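
The columns of the Georgian table can be read as lookup tables. The sketch below is an illustration only: it copies the rows needed for one word, ქართული ("Georgian"), from the National system (2002) and BGN/PCGN columns. The function name `romanize` is hypothetical. The two systems differ in which consonants get the apostrophe: the National system leaves the aspirated ქ and თ plain, while BGN/PCGN marks them.

```python
APOS = "\u02bc"  # modifier letter apostrophe, the mark used in the table

# Rows copied from the table for the letters of one word.
NATIONAL = {"ქ": "k", "ა": "a", "რ": "r", "თ": "t", "უ": "u", "ლ": "l", "ი": "i"}
# BGN/PCGN agrees on these letters except for the two aspirated consonants.
BGN_PCGN = {**NATIONAL, "ქ": "k" + APOS, "თ": "t" + APOS}


def romanize(word, table):
    """Look up each Georgian letter in the chosen column."""
    return "".join(table[ch] for ch in word)


print(romanize("ქართული", NATIONAL))  # kartuli
print(romanize("ქართული", BGN_PCGN))  # kʼartʼuli
```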

Overview and summary

The chart below shows the most common phonemic romanizations for several writing systems. It is enough for many casual uses, but each script has other romanization systems, and there are many exceptions. For more information, see the sections for each language above. (Hangul characters are broken down into jamo pieces.)

Romanized	IPA	Greek	Cyrillic	Amazigh	Hebrew	Arabic	Persian	Katakana	Hangul	Bopomofo
A	a	A	А	ⴰ	ַ, ֲ, ָ	َ, ا	ا, آ	ア	ㅏ	ㄚ
AE	ai̯/ɛ	ΑΙ							ㅐ
AI	ai				י ַ					ㄞ
B	b	ΜΠ, Β	Б	ⴱ	בּ	ﺏ ﺑ ﺒ ﺐ	ﺏ ﺑ		ㅂ	ㄅ
C	k/s	Ξ								ㄘ
CH	ʧ	TΣ̈	Ч		צ׳		چ		ㅊ	ㄔ
CHI	ʨi							チ
D	d	ΝΤ, Δ	Д	ⴷ, ⴹ	ד	ﺩ — ﺪ, ﺽ ﺿ ﻀ ﺾ	د		ㄷ	ㄉ
DH	ð	Δ			דֿ	ﺫ — ﺬ
DZ	ʣ	ΤΖ	Ѕ
E	e/ɛ	Ε, ΑΙ	Э	ⴻ	, ֱ, י ֵֶ, ֵ, י ֶ			エ	ㅔ	ㄟ
EO	ʌ								ㅓ
EU	ɯ								ㅡ
F	f	Φ	Ф	ⴼ	פ (or its final form ף )	ﻑ ﻓ ﻔ ﻒ	ﻑ			ㄈ
FU	ɸɯ							フ
G	ɡ	ΓΓ, ΓΚ, Γ	Г	ⴳ, ⴳⵯ	ג		گ		ㄱ	ㄍ
GH	ɣ	Γ	Ғ	ⵖ	גֿ, עֿ	ﻍ ﻏ ﻐ ﻎ	ق غ
H	h	Η	Һ	ⵀ, ⵃ	ח, ה	ﻩ ﻫ ﻬ ﻪ, ﺡ ﺣ ﺤ ﺢ	ه ح ﻫ		ㅎ	ㄏ
HA	ha							ハ
HE	he							ヘ
HI	hi							ヒ
HO	ho							ホ
I	i/ɪ	Η, Ι, Υ, ΕΙ, ΟΙ	И, І	ⵉ	ִ, י ִ	دِ		イ	ㅣ	ㄧ
IY	ij					دِي
J	ʤ	TZ̈	ДЖ, Џ	ⵊ	ג׳	ﺝ ﺟ ﺠ ﺞ	ج		ㅈ	ㄐ
JJ	ʦ͈/ʨ͈								ㅉ
K	k	Κ	К	ⴽ, ⴽⵯ	כּ	ﻙ ﻛ ﻜ ﻚ	ک		ㅋ	ㄎ
KA	ka							カ
KE	ke							ケ
KH	x	X	Х	ⵅ	כ, חֿ (or its final form ך )	ﺥ ﺧ ﺨ ﺦ	خ
KI	ki							キ
KK	k͈								ㄲ
KO	ko							コ
KU	kɯ							ク
L	l	Λ	Л	ⵍ	ל	ﻝ ﻟ ﻠ ﻞ	ل		ㄹ	ㄌ
M	m	Μ	М	ⵎ	מ (or its final form ם )	ﻡ ﻣ ﻤ ﻢ	م		ㅁ	ㄇ
MA	ma							マ
ME	me							メ
MI	mi							ミ
MO	mo							モ
MU	mɯ							ム
N	n	Ν	Н	ⵏ	נ (or its final form ן )	ﻥ ﻧ ﻨ ﻦ	ن	ン	ㄴ	ㄋ
NA	na							ナ
NE	ne							ネ
NG	ŋ								ㅇ
NI	ɲi							ニ
NO	no							ノ
NU	nɯ							ヌ
O	o	Ο, Ω	О		, ֳ, וֹֹ		ُا	オ	ㅗ
OE	ø								ㅚ
P	p	Π	П		פּ		پ		ㅍ	ㄆ
PP	p͈								ㅃ
PS	ps	Ψ
Q	q	Θ		ⵇ	ק	ﻕ ﻗ ﻘ ﻖ	غ ق			ㄑ
R	r	Ρ	Р	ⵔ, ⵕ	ר	ﺭ — ﺮ	ر		ㄹ	ㄖ
RA	ɾa							ラ
RE	ɾe							レ
RI	ɾi							リ
RO	ɾo							ロ
RU	ɾɯ							ル
S	s	Σ	С	ⵙ, ⵚ	ס, שׂ	ﺱ ﺳ ﺴ ﺲ, ﺹ ﺻ ﺼ ﺺ	س ث ص		ㅅ	ㄙ
SA	sa							サ
SE	se							セ
SH	ʃ	Σ̈	Ш	ⵛ	שׁ	ﺵ ﺷ ﺸ ﺶ	ش			ㄕ
SHCH	ʃʧ		Щ
SHI	ɕi							シ
SO	so							ソ
SS	s͈								ㅆ
SU	sɯ							ス
T	t	Τ	Т	ⵜ, ⵟ	ט, תּ, ת	ﺕ ﺗ ﺘ ﺖ, ﻁ ﻃ ﻄ ﻂ	ت ط		ㅌ	ㄊ
TA	ta							タ
TE	te							テ
TH	θ	Θ			תֿ	ﺙ ﺛ ﺜ ﺚ
TO	to							ト
TS	ʦ	ΤΣ	Ц		צ (or its final form ץ )
TSU	ʦɯ							ツ
TT	t͈								ㄸ
U	u	ΟΥ, Υ	У	ⵓ	, וֻּ	دُ		ウ	ㅜ	ㄩ
UI	ɰi								ㅢ
UW	uw					دُو
V	v	B	В		ב		و
W	w	Ω		ⵡ	ו, וו	ﻭ — ﻮ
WA	wa							ワ	ㅘ
WAE	wɛ								ㅙ
WE	we							ヱ	ㅞ
WI	y/ɥi							ヰ	ㅟ
WO	wo							ヲ	ㅝ
X	x/ks	Ξ, Χ								ㄒ
Y	j	Υ, Ι, ΓΙ	Й, Ы, Ј	ⵢ	י	ﻱ ﻳ ﻴ ﻲ	ی
YA	ja		Я					ヤ	ㅑ
YAE	jɛ								ㅒ
YE	je		Е, Є						ㅖ
YEO	jʌ								ㅕ
YI	ji		Ї
YO	jo		Ё					ヨ	ㅛ
YU	ju		Ю					ユ	ㅠ
Z	z	Ζ	З	ⵣ, ⵥ	ז	ﺯ — ﺰ, ﻅ ﻇ ﻈ ﻆ	ز ظ ذ ض			ㄗ
ZH	ʐ/ʒ	Ζ̈	Ж		ז׳		ژ			ㄓ
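
As a rough illustration of how the chart can be used, the sketch below copies a few rows for two scripts and looks up each character in turn. The function name `romanize` is hypothetical, and the tables cover only the characters in the two sample words, so this is not a complete romanizer for either script.

```python
# Rows copied from the chart: katakana syllables and Cyrillic capital letters.
KATAKANA = {"カ": "KA", "ラ": "RA", "オ": "O", "ケ": "KE"}
CYRILLIC = {"М": "M", "О": "O", "С": "S", "К": "K", "В": "V", "А": "A"}


def romanize(word, table):
    """Look up each character, using capitals because the chart lists capitals."""
    return "".join(table[ch] for ch in word.upper())


print(romanize("カラオケ", KATAKANA))  # KARAOKE
print(romanize("Москва", CYRILLIC))    # MOSKVA
```

A chart like this gives one Latin spelling per character. Real systems add rules for long vowels, soft signs, and letters whose value depends on their position.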
