Convert encoding for multiple files recursively
Posted on Thu 14 January 2016 in Notes
If you have a large corpus of text files in, say, EUC-JP encoding, they can be quite difficult to work with, since most command-line tools on modern systems expect UTF-8 files.
iconv can convert files from one known encoding to another. One problem on OS X is that the -o option doesn't work, so you have to use the redirect operator > instead. Moreover, you can't redirect onto the file being read, because the shell truncates that file before iconv gets a chance to read it. So if you have a large, complex directory structure that you need to traverse recursively to change the encoding of each file, it becomes problematic.
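For a single file, the workaround can be sketched roughly as follows: write the converted text to a temporary file, then move it over the original only if iconv succeeded. The function name convert_in_place and the .utf8 suffix are assumptions made for illustration; neither is part of iconv.

```shell
#!/bin/sh
# Sketch: convert one EUC-JP file to UTF-8 "in place" via a temporary file.
# convert_in_place is a hypothetical helper name, not an iconv feature.
convert_in_place() {
    src=$1
    tmp="$src.utf8"
    # Writing directly to "$src" would empty it before iconv reads it,
    # so convert into "$tmp" and replace the original only on success.
    if iconv -f EUC-JP -t UTF-8 "$src" > "$tmp"; then
        mv "$tmp" "$src"
    else
        # Conversion failed: drop the partial output, keep the original.
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as convert_in_place notes.txt, where notes.txt stands in for any EUC-JP file.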
I've found the following to work very well:
find . -type f -exec sh -c "iconv -f eucjp -t UTF-8 {} > {}.utf8" \; -exec mv "{}".utf8 "{}" \;
- find: finds all files and directories recursively
- .: denotes the starting directory. In this case, the current directory and thus everything below it as well.
- -type f: limits the search to files only (so no directories will be returned)
- -exec: executes a command for each search result
- sh -c: starts a shell and executes the string following -c
- iconv -f eucjp -t UTF-8: converts the encoding -f(rom) euc-jp -t(o) utf-8
- {}: denotes the search result (filename)
- >: the redirect operator. We run this line via the shell to get this to work, since it doesn't work if run directly via the -exec command (what a mess!)
- {}.utf8: save to a file with "utf8" as the extension
- " \;: close the shell command string and close the first -exec command
- -exec: run another command with the search result
- mv "{}".utf8 "{}": move the new file to the old filename, thus overwriting the original file
- \;: close the second -exec command
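One caveat with substituting {} directly into the sh -c string is that filenames containing spaces or shell metacharacters may be misread by the shell. A possible variant, offered as a sketch rather than a replacement for the command above, passes the filenames as positional parameters instead. The function name convert_tree is hypothetical, and the choice to skip files that iconv rejects is an assumption about the desired behaviour.

```shell
#!/bin/sh
# Sketch: recursively convert EUC-JP files under a directory to UTF-8.
# convert_tree is a hypothetical helper name used for illustration only.
convert_tree() {
    # {} + hands batches of filenames to one sh process as "$@",
    # so names with spaces or quotes are never re-parsed by the shell.
    find "$1" -type f -exec sh -c '
        for f do
            if iconv -f EUC-JP -t UTF-8 "$f" > "$f.utf8"; then
                mv "$f.utf8" "$f"          # overwrite the original
            else
                rm -f "$f.utf8"            # leave the original untouched
                echo "skipped: $f" >&2
            fi
        done
    ' sh {} +
}
```

Running convert_tree . from the top of the corpus would mirror the original one-liner. Since files are overwritten, trying it on a copy of the data first seems prudent.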