Hiragana <-> katakana transliteration in 4 lines of Python

Posted on Thu 26 May 2016 in Notes

This is a quick script for clean hiragana <-> katakana conversion in just 4 lines of Python.

This code will make it easy to convert かたかな to カタカナ and ヒラガナ to ひらがな without any dependencies. It even handles mixed script correctly.

If you don't need romaji transliteration and want to keep your script's dependencies down, you can skip pip-installing some surprisingly large libraries just to convert hiragana to katakana. Simply copy and paste the 4 lines below (preferably along with a link to my homepage or GitHub) and you are good to go.

Tested in Python 3.x. It won't work in Python 2.7, which has no str.maketrans.

You can download it from my GitHub.

How it works

I use the built-in string method str.translate, which replaces each character with its counterpart in a translation table. That table is easily created with another string method, str.maketrans. See the Python documentation for both.
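A minimal illustration of the two methods on a toy two-character table (the table is just an example, not part of the script). It shows that str.maketrans returns a plain dict of code points, and that characters without an entry pass through str.translate unchanged:

```python
# str.maketrans pairs the i-th character of the first string with the
# i-th character of the second; both strings must have the same length.
table = str.maketrans("かな", "カナ")

# The result is a plain dict mapping code point -> code point.
print(table)  # {12363: 12459, 12394: 12490}

# Characters that have no entry in the table (here "!") are left as-is.
print("かなかな!".translate(table))  # カナカナ!
```

This pass-through behaviour is what lets the script leave kanji and punctuation untouched.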

We simply create our hiragana and katakana translation tables and use the str.translate() function to do the heavy lifting.

I've used Mark Rogoyski's list of hiragana and katakana Unicode code points and removed the characters I don't want transliterated. For example, I want to be able to convert コーヒ to hiragana and back. If I had naively used the full table, the long vowel mark ー would have been converted into an unrelated character, which wouldn't make any sense.
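To see why filtering matters, here is a sketch of what a naive table could do. It assumes "naive" means pairing the full Unicode Hiragana block (U+3040 to U+309F) with the full Katakana block (U+30A0 to U+30FF) position by position; that is an illustrative assumption, not necessarily how Rogoyski's list is laid out. The names naive_hiragana, naive_katakana and naive_kat2hir are hypothetical.

```python
# Hypothetical "naive" tables: every code point of the Hiragana block paired
# with the code point at the same position in the Katakana block (96 each).
naive_hiragana = "".join(chr(cp) for cp in range(0x3040, 0x30A0))
naive_katakana = "".join(chr(cp) for cp in range(0x30A0, 0x3100))
naive_kat2hir = str.maketrans(naive_katakana, naive_hiragana)

# The long vowel mark ー (U+30FC) lands on the semi-voiced sound mark ゜ (U+309C).
print("コーヒ".translate(naive_kat2hir))  # こ゜ひ
```

The filtered charts leave ー out entirely, so it simply passes through and コーヒ becomes こーひ.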

The magic happens in these 4 lines of code:

```python
katakana_chart = "ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヽヾ"
hiragana_chart = "ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖゝゞ"
hir2kat = str.maketrans(hiragana_chart, katakana_chart)
kat2hir = str.maketrans(katakana_chart, hiragana_chart)
```
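The two charts run in parallel: each katakana in them sits exactly 0x60 code points above its hiragana counterpart (ぁ is U+3041, ァ is U+30A1, and so on up to ゞ/ヾ). A sketch that builds the same 88-character charts from code point ranges instead of literal strings, which can double as a sanity check on the literals; hiragana_cps is a hypothetical helper name:

```python
# Hiragana ぁ (U+3041) through ゖ (U+3096), plus the iteration marks
# ゝ (U+309D) and ゞ (U+309E): the same characters as hiragana_chart above.
hiragana_cps = list(range(0x3041, 0x3097)) + [0x309D, 0x309E]

# Each katakana counterpart is exactly 0x60 code points higher.
hiragana_chart = "".join(chr(cp) for cp in hiragana_cps)
katakana_chart = "".join(chr(cp + 0x60) for cp in hiragana_cps)

hir2kat = str.maketrans(hiragana_chart, katakana_chart)
kat2hir = str.maketrans(katakana_chart, hiragana_chart)

print(len(hiragana_chart), len(katakana_chart))  # 88 88
print("ひらがな".translate(hir2kat))  # ヒラガナ
```

The literal strings are easier to eyeball; the range version makes the excluded characters (ー, ・, ゛, ゜ and friends) explicit by their absence.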

And it is used like so:

```python
mixed = 'きゃりーぱみゅぱみゅは日本の歌手です。'
print(mixed.translate(hir2kat))
# out: キャリーパミュパミュハ日本ノ歌手デス。

# transliterate back and forth
print(mixed.translate(hir2kat).translate(kat2hir))
# out: きゃりーぱみゅぱみゅは日本の歌手です。
```

Notice how kanji and characters outside the two charts, such as ー and 。, are left alone.
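If you prefer named helpers over calling translate directly, a small wrapper sketch follows; to_katakana and to_hiragana are hypothetical names, not part of the original 4 lines. It also shows how mixed-script input behaves:

```python
katakana_chart = "ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヽヾ"
hiragana_chart = "ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖゝゞ"
hir2kat = str.maketrans(hiragana_chart, katakana_chart)
kat2hir = str.maketrans(katakana_chart, hiragana_chart)


def to_katakana(text: str) -> str:
    """Convert charted hiragana to katakana; everything else is unchanged."""
    return text.translate(hir2kat)


def to_hiragana(text: str) -> str:
    """Convert charted katakana to hiragana; everything else is unchanged."""
    return text.translate(kat2hir)


sample = "カタカナとひらがな"
print(to_hiragana(sample))  # かたかなとひらがな
print(to_katakana(sample))  # カタカナトヒラガナ
```

Note that with mixed input the conversion is one-way: once both scripts have been folded into one, running the other function does not bring back the original mix.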