Convert encoding for multiple files recursively
Posted on Thu 14 January 2016 in Notes
If you have a large corpus of text files in, say, EUC-JP encoding, they can be quite difficult to work with, since most command-line tools on modern systems expect UTF-8 files.
iconv can be used to convert files from one known encoding to another. One problem on OS X is that the -o option doesn't work, so you have to use the redirect operator > instead. Moreover, you can't redirect onto the file you are reading from, because the shell truncates it before iconv gets to read it. So if you have a large, complex directory structure that you need to traverse recursively to change the encoding of each file, it becomes problematic.
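For a single file, the workaround is to write to a temporary file and then move it over the original. A minimal sketch, assuming the input really is EUC-JP; the function name convert_in_place and the ".utf8" suffix for the temporary file are illustrative choices, not anything iconv requires:

```shell
# Hypothetical helper: convert one file from EUC-JP to UTF-8 "in place".
# iconv writes to a temporary sibling file first, because redirecting
# straight onto the input would truncate it before iconv reads it.
convert_in_place() {
    src=$1
    tmp="$src.utf8"
    if iconv -f EUC-JP -t UTF-8 "$src" > "$tmp"; then
        mv "$tmp" "$src"    # replace the original only if iconv succeeded
    else
        rm -f "$tmp"        # on failure, leave the original untouched
        return 1
    fi
}
```

It would be called as convert_in_place notes.txt, where notes.txt stands in for any EUC-JP file.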
I've found the following to work very well:
find . -type f -exec sh -c "iconv -f eucjp -t UTF-8 {} > {}.utf8" \; -exec mv "{}".utf8 "{}" \;
find
finds all files and directories recursively.
.
denotes the starting directory: in this case the current directory, and thus everything below it as well.
-type f
limits the search to files only (so no directories are returned).
-exec
executes a command for each search result.
sh -c
starts a shell (sh) and executes the string following -c.
iconv -f eucjp -t UTF-8
converts the encoding from (-f) EUC-JP to (-t) UTF-8.
{}
denotes the search result (the filename).
>
the redirect operator. We run this line via the shell to get this to work: redirection is a shell feature, and -exec runs its command directly, without a shell (what a mess!).
{}.utf8
saves the output to a file with "utf8" appended as the extension.
" \;
closes the shell command string and terminates the first -exec command.
-exec
runs another command with the search result.
mv "{}".utf8 "{}"
moves the new file to the old filename, thus overwriting the original file.
\;
terminates the second -exec command.
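One caveat with the one-liner above: because {} is substituted inside the string given to sh -c, the shell re-parses the filename, so names containing spaces or quotes can break it. A variant of the same idea, offered as a sketch rather than a drop-in replacement, passes the filename to the shell as a positional parameter ("$1") instead. The function name convert_tree is hypothetical, and the sketch assumes every file under the starting directory is EUC-JP text:

```shell
# Hypothetical wrapper: convert every regular file under a directory
# (default: the current directory) from EUC-JP to UTF-8.
convert_tree() {
    find "${1:-.}" -type f -exec sh -c '
        # "$1" is the filename handed over by find; quoting it keeps
        # spaces and quotes in the name intact.
        if iconv -f EUC-JP -t UTF-8 "$1" > "$1.utf8"; then
            mv "$1.utf8" "$1"      # overwrite the original on success
        else
            rm -f "$1.utf8"        # discard the partial output
            echo "conversion failed, left unchanged: $1" >&2
        fi
    ' sh {} \;
}
```

Running convert_tree with no argument starts from the current directory, like the original command. Since mv only runs when iconv succeeds, a file that is not valid EUC-JP is left as it was instead of being replaced by partial output.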