'Hiragana <-> katakana transliteration in 4 lines of Python'
Posted on Thu 26 May 2016 in Notes
This is a quick script that does reliable hiragana <-> katakana conversion in just 4 lines of Python.
This code will make it easy to convert かたかな to カタカナ and ヒラガナ to ひらがな without any dependencies. It even handles mixed script correctly.
If you don't need romaji transliteration and want to reduce your script's dependencies, you can forgo pip installing some surprisingly large libraries just to convert from hiragana to katakana. Simply copy and paste the 4 lines below (and preferably a link to my homepage or github) and you are good to go.
Tested in Python 3.x. It doesn't seem to work in Python 2.7.
Download it off my github here
How it works
I use the built-in string method translate, which converts characters to their corresponding characters in a translation table. The table is easily created with another string method, maketrans. See documentation here
We simply create our hiragana and katakana translation tables and use the str.translate() function to do the heavy lifting.
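To make the mechanism concrete, here is a minimal sketch of what maketrans and translate do with a tiny two-character table. The variable name `table` and the sample strings are just for illustration and are not part of the script.

```python
# str.maketrans pairs up the characters of two equal-length strings
# and returns a dict that maps code point -> code point.
table = str.maketrans("あい", "アイ")
print(table)  # {12354: 12450, 12356: 12452}

# str.translate replaces every character found in the table
# and leaves everything else (here: う) untouched.
print("あいう".translate(table))  # アイう
```

Because characters missing from the table pass through unchanged, anything that is not listed in the charts is simply left as it is. Note that maketrans raises a ValueError if the two strings differ in length, so the charts have to line up one to one.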
I've used Mark Rogoyski's list of hiragana and katakana Unicode code points and removed characters I don't want transliterated. For example, I want to be able to convert コーヒ to hiragana and back. If I had naively used the table, then ー would be converted into ゜, which wouldn't make any sense.
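The problem can be reproduced with a small sketch. It assumes the "naive" approach amounts to shifting every character in the katakana block by the fixed offset 0x60 that separates the two Unicode blocks. The function `naive_kat2hir` is a hypothetical helper for illustration only, not part of the script.

```python
def naive_kat2hir(text: str) -> str:
    """Shift every character in the katakana block (U+30A1..U+30FF) down by 0x60."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30ff" else ch
        for ch in text
    )

text = "\u30b3\u30fc\u30d2\u30fc"  # コーヒー, where ー is U+30FC
result = naive_kat2hir(text)
print(result)  # こ゜ひ゜
print([hex(ord(ch)) for ch in result])  # ['0x3053', '0x309c', '0x3072', '0x309c']
```

The long vowel mark ー (U+30FC) lands on U+309C, the semi-voiced sound mark ゜, which is why it is left out of the charts.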
The magic happens in these 4 lines of code:
katakana_chart = "ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヽヾ"
hiragana_chart = "ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖゝゞ"
hir2kat = str.maketrans(hiragana_chart, katakana_chart)
kat2hir = str.maketrans(katakana_chart, hiragana_chart)
And it is used like so:
mixed = 'きゃりーぱみゅぱみゅは日本の歌手です。'
print(mixed.translate(hir2kat))
# out: キャリーパミュパミュハ日本ノ歌手デス。
# transliterate back and forth
print(mixed.translate(hir2kat).translate(kat2hir))
# out: きゃりーぱみゅぱみゅは日本の歌手です。
Notice how kanji and special characters are left alone.
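If you would rather not paste the long chart strings, one possible variation is to build them from code point ranges and wrap the tables in two small functions. This is only a sketch: the names `HIRAGANA`, `KATAKANA`, `to_katakana` and `to_hiragana` are made up for this example, and it assumes that U+3041..U+3096 plus the iteration marks U+309D and U+309E cover the same characters as the hand-written hiragana chart.

```python
# Assumption: this range plus the two iteration marks (ゝゞ) matches
# the hand-written hiragana chart; the katakana chart is the same
# set of characters shifted up by 0x60.
HIRAGANA = "".join(chr(c) for c in range(0x3041, 0x3097)) + "\u309d\u309e"
KATAKANA = "".join(chr(ord(ch) + 0x60) for ch in HIRAGANA)

HIR2KAT = str.maketrans(HIRAGANA, KATAKANA)
KAT2HIR = str.maketrans(KATAKANA, HIRAGANA)


def to_katakana(text: str) -> str:
    """Convert hiragana to katakana; everything else is left alone."""
    return text.translate(HIR2KAT)


def to_hiragana(text: str) -> str:
    """Convert katakana to hiragana; everything else is left alone."""
    return text.translate(KAT2HIR)


print(to_katakana("ひらがな"))  # ヒラガナ
print(to_hiragana("コーヒー"))  # こーひー
```

Since ー (U+30FC) is not part of either chart, it survives the conversion in both directions.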