Showing Japanese characters in Matplotlib on Ubuntu

Posted on Mon 27 October 2014 in Notes

TL;DR: Install Japanese language support and insert the following in your Python script:

import matplotlib
matplotlib.rc('font', family='TakaoPGothic')

If you are working with any kind of NLP in Python that involves Japanese, it is paramount to be able to view summary statistics as graphs that in one way or another include Japanese characters.

Below is a graph showing Zipf's Law for the distribution of characters used in [Tetsuko Kuroyanagi's](http://en.wikipedia.org/wiki/Tetsuko_Kuroyanagi) ‘Totto Channel’, the sequel to her famous “Totto Chan: The little girl by the window”.


Character Distribution of 100 most used characters - but which ones?


On most systems, Matplotlib cannot display Japanese characters out of the box. This is a big problem, as the graph above is completely useless for even the most basic investigation.

I've tested on OS X, Windows 8 and Ubuntu, and only OS X works out of the box, despite my Windows installation being Japanese!

Most advice online will tell you to change the font used by Matplotlib, but if you are on Ubuntu it might not be obvious which font you need to use! Moreover, there are many ways to change the font.
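To illustrate two of those ways, here is a minimal sketch. It assumes the TakaoPGothic font (the one recommended in this post) is installed, and the title text is only an example.

```python
import matplotlib
import matplotlib.pyplot as plt

# Option 1: set the default font for every figure via the rcParams dictionary.
matplotlib.rcParams['font.family'] = 'TakaoPGothic'

# Option 2: override the font for a single text element only.
# The 'fontname' keyword is accepted by Matplotlib's text functions
# (titles, axis labels, plt.text, ...).
fig, ax = plt.subplots()
title = ax.set_title('日本語のタイトル', fontname='TakaoPGothic')
```

Option 1 changes the font globally, while option 2 is handy when only a few labels contain Japanese.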

I've found that the simplest way to change fonts is to use matplotlib.rc:

import matplotlib
matplotlib.rc('font', family='Monospace')

The family argument accepts either the name of a generic font family (as in the example above) or the name of a specific font, which is what you want in this case. But which one? I wrote the following script to help check which fonts will work.

# -*- coding: utf-8 -*-
"""
Matplotlib font checker
Prints a figure displaying a variety of system fonts and their ability to produce Japanese text

@author: Mads Olsgaard, 2014

Released under BSD License.
"""

import matplotlib
import matplotlib.pyplot as plt
from matplotlib import font_manager

fonts = ['Droid Sans', 'Vera', 'TakaoGothic', 'TakaoPGothic', 'Liberation Sans', 'ubuntu', 'FreeSans', 'Droid Sans Japanese', 'DejaVu Sans']
#fonts = ['Arial', 'Times New Roman', 'Helvetica'] #uncomment this line on Windows and see if it helps!
english = 'The quick ...'
japanese = '日本語'
x = 0.1
y = 1

# Build headline
plt.text(x+0.5,y, 'english')
plt.text(x+0.7, y, 'japanese')
plt.text(x,y, 'Font name')
plt.text(0,y-0.05, '-'*100)
y -=0.1

for f in fonts:
    matplotlib.rc('font', family='DejaVu Sans')
    plt.text(x,y, f+':')
    matplotlib.rc('font', family=f)
    plt.text(x+0.5,y, english)
    plt.text(x+0.7, y, japanese)
    y -= 0.1
    # Sanity check: print the location of the font file. If the font is not found,
    # a warning is printed and the location of the fallback font is shown instead.
    print(f, font_manager.findfont(f))

plt.show()

On Ubuntu the output should be the following:

Droid Sans /usr/share/fonts/truetype/droid/DroidSans.ttf
Vera /home/supermads/anaconda3/lib/python3.4/site-packages/matplotlib/mpl-data/fonts/ttf/Vera.ttf
TakaoGothic /usr/share/fonts/truetype/takao-gothic/TakaoGothic.ttf
TakaoPGothic /usr/share/fonts/truetype/takao-gothic/TakaoPGothic.ttf
Liberation Sans /usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf
ubuntu /usr/share/fonts/truetype/ubuntu-font-family/Ubuntu-R.ttf
FreeSans /usr/share/fonts/truetype/freefont/FreeSans.ttf
Droid Sans Japanese /usr/share/fonts/truetype/droid/DroidSansJapanese.ttf
DejaVu Sans /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf

As you can see, I'm running Anaconda Python 3. If a font can't be found, Matplotlib falls back to the Vera font stored in its own folder inside the Anaconda installation.
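Because of this silent fallback, a printed path alone does not prove that the requested font was found. A minimal sketch of a stricter check follows; the helper name `is_font_installed` is made up for this example, and it relies on the `fallback_to_default` argument of `findfont`, which raises a `ValueError` instead of falling back.

```python
from matplotlib import font_manager


def is_font_installed(family):
    """Return True if Matplotlib can locate `family` without falling back."""
    try:
        # With fallback_to_default=False a missing font raises ValueError
        # instead of silently returning the default font.
        font_manager.findfont(family, fallback_to_default=False)
    except ValueError:
        return False
    return True


for name in ['TakaoPGothic', 'DejaVu Sans']:
    print(name, is_font_installed(name))
```

If a freshly installed font still reports as missing, Matplotlib's font cache may be out of date.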

Font check

Surprisingly, Droid does support Japanese; it just stores the Japanese characters in a separate font file, which makes it useless for this purpose. The Takao font family, however, does work for our purpose.
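The same question can also be answered without drawing a figure, by looking at the character map of a font file. The following is only a sketch: the helper name `font_covers` is made up for this example, and it uses `FT2Font.get_charmap()` from Matplotlib's FreeType wrapper to see which code points have a glyph.

```python
from matplotlib import font_manager
from matplotlib.ft2font import FT2Font


def font_covers(path, text):
    """Return True if the font file at `path` has a glyph for every character in `text`."""
    # get_charmap() maps Unicode code points to glyph indices.
    charmap = FT2Font(path).get_charmap()
    return all(ord(ch) in charmap for ch in text)


# DejaVu Sans ships with recent Matplotlib versions, so it is used here as the example.
path = font_manager.findfont('DejaVu Sans')
print(path)
print('Latin:', font_covers(path, 'abc'))
print('Japanese:', font_covers(path, '日本語'))
```

Pointing `path` at a Takao or Droid font file instead shows whether that particular file contains the Japanese characters.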

Takao fonts should be installed by default if you set your location somewhere in Japan during the installation of Ubuntu, or if you have installed support for the Japanese language in System Settings > Language Support (just hit the super key and search for language). I recommend this, since it will also install the Japanese input method, Anthy.

You can also use apt-get, like this from the command line (not tested):

sudo apt-get install fonts-takao-mincho fonts-takao-gothic fonts-takao-pgothic

And now we can finally see which characters Kuroyanagi used the most for her sequel:

Character Distribution of 100 most used characters

And apparently, that's the Japanese comma, also called 読点.
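For reference, a chart like this could be produced roughly as follows. This is a minimal sketch and not the script used for the graphs in this post: the function names and the sample sentence are made up for illustration, and it assumes the TakaoPGothic font is installed.

```python
from collections import Counter

import matplotlib
import matplotlib.pyplot as plt

# Assumes the Takao fonts are installed; otherwise Matplotlib uses its fallback font.
matplotlib.rc('font', family='TakaoPGothic')


def top_characters(text, n=100):
    """Return the n most common non-whitespace characters as (char, count) pairs."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return counts.most_common(n)


def plot_character_distribution(text, n=100):
    """Draw a bar chart of the n most common characters, labelled with the characters."""
    chars, counts = zip(*top_characters(text, n))
    positions = list(range(len(chars)))
    fig, ax = plt.subplots()
    ax.bar(positions, counts)
    ax.set_xticks(positions)
    ax.set_xticklabels(chars)  # these tick labels are what need a Japanese font
    ax.set_ylabel('Occurrences')
    return fig


# Made-up sample sentence; replace it with the text you want to analyse.
sample = 'トットちゃんは、窓ぎわで、先生の話を聞いていました。'
fig = plot_character_distribution(sample, n=10)
```

Call `plt.show()` afterwards to display the figure.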