I recently started studying Japanese, and so I wanted my computer to work in Japanese too - if nothing else so I could use it to prepare study aids. I use a home-brewed configuration of Slackware Linux (effectively "Linux From Scratch," though I didn't actually follow that project's how-to documents) and I wanted the Japanese stuff to work nicely with the rest of my configuration, including the application software and tools I already use for Canadian English. And I wanted to typeset in Japanese with LaTeX. That meant it wasn't as simple as just choosing "Japanese" during installation of one of the more entry-level distributions. Here are some notes on what I had to do, which may be helpful for others in similar situations.

My tutor often tells me, "Well, that way of saying it is not incorrect, but you should say it this other way instead." I think maybe there's a cultural thing about never saying in an unqualified way that the student is wrong. Much of the documentation you will find on the Net is written in the same spirit. There are dozens of ways to accomplish each of the sub-goals involved in the general goal of "I want my computer to work in both Japanese and English," and very few of them actually work. Which one is correct, especially for you? 分かりません (I don't know). This page documents what seemed to work for me.

konsole and joe

This part turned out to be easy. I set the LANG environment variable to the value en_US.UTF-8 (which means "American English, encoded in UTF-8") and the konsole terminal emulator and joe text editor just magically became able to understand UTF-8 encoding. This article on linux.com helped me find the appropriate setting.

The joe documentation isn't very clear; joe supports a lot of complicated options for detecting when files are or are not in UTF-8. But leaving it strictly on the default seemed to produce the results I wanted once LANG=en_US.UTF-8 was in effect. Similarly, konsole has an "encoding" option in its menus, but leaving it on the default seemed to produce the right results once the environment variable was set (which had to be done before starting X so everything would inherit it).

I think I already had appropriate fonts installed somewhere that konsole (or the X server, or whatever libraries they're using) could find them. If you didn't, you would need to do that. There may be something wrong with my system-wide fonts still, because Firefox sometimes crashes when I visit Web sites that contain kanji, or (even more often) actual Chinese as opposed to Japanese text. At some point I'll have to figure out where the system-wide font substitution configuration is, because I think that's where the problem lies, and make sure there's nothing screwed up in it. ETA: This issue has now (August 2009) been fixed by installing a new version of FreeType 2. The solution was found by locating the error message Firefox produces, searching the Web for "libxul.so: undefined symbol: FT_GlyphSlot_Embolden", and digging through the resulting storm of Web-forum messages.

One gotcha: I tried to put the environment variable setting in the system-wide shell config file /etc/profile, and that didn't work because the settings there were overridden by ones in a file called /etc/profile.d/lang.sh (and in /etc/profile.d/lang.csh - I don't use csh, but it's wise to keep the two files parallel in case I or somebody else on my system tries to use csh some day). This one took a long time to debug.
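
For concreteness, here's a minimal sketch of what ends up in my /etc/profile.d/lang.sh (yours will differ; the SCIM-related variables described later on this page eventually went into the same file):

# /etc/profile.d/lang.sh -- settings here override /etc/profile,
# so on my system this is where the locale actually has to be set
export LANG=en_US.UTF-8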

Another gotcha: on my firewall box there seems to be another variable called LC_ALL, whose value defaults to POSIX; when that's set, it overrides any setting of LANG. It might be better to put my configuration in LC_ALL instead of LANG, but I held off on doing that because I don't want to start getting currency amounts in yen, U before E in collation order, and so on. Instead I just removed LC_ALL.

Note: some Japanese-language text files you will find on the Net are not written in UTF-8 but in some other encoding (EUC-JP, I think), and with this configuration you can't view those and have them come out right. For the moment, though, I think UTF-8 is definitely the one I want. I'm more interested in being able to create my own documents than in reading Japanese-language text files. Web pages will be going through a browser that does character set conversion anyway.
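
If you do need to read one of those files, iconv can convert it to UTF-8; a quick sketch, where the filename is just a made-up example:

# convert an EUC-JP text file to UTF-8 (guessing the source encoding is up to you)
iconv -f EUC-JP -t UTF-8 japanese-text.txt > japanese-text-utf8.txt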

Just because the terminal emulator and text editor can handle UTF-8 doesn't mean you can type in Japanese script; your keyboard doesn't generate UTF-8 codes. With just this change I could edit UTF-8 files, but to enter non-ASCII characters I had to find them on the Web and cut and paste them one at a time with the mouse.

Input: SCIM and Anthy

Typing in Japanese is almost as bad as transcoding video, for the same reasons. There are several components you have to put together, you have several choices for which implementation to use of each component, many of them are meta-components that have their own stables of pluggable modules, and they all have compatibility modules that supposedly allow them to emulate or work with one another, but of course each one wants to be on top. And they're all engineered on the assumption that most users will want to use all available features, all at once, all the time. Mix English, Arabic, and Japanese, in every sentence, typesetting them respectively left-to-right, right-to-left, and vertically, with spell checking on all three? You bet!

Here are the components I know I needed to download and install:

  • SCIM, the "Smart Common Input Method," which provides a general interface for complicated methods of typing.
  • Anthy, which is a library for doing the AI task of translating typing on an ASCII keyboard (in roomaji, like "korehanihongodesu.") into Japanese text (like 「これは日本語です。」), guessing which Japanese text you mean, and providing a sensible list of alternatives in case the first guess isn't right. This task requires nontrivial AI because of the ambiguity inherent in roomaji - a computer needs to do machine learning stuff to do it at all, and the human needs to be able to override it when the computer guesses wrong. Anthy doesn't actually handle doing the GUI stuff of talking to the keyboard and applications. This package is sort of documented in English, but not really, and I wasn't able to figure out whether or not the name refers to the Rose Bride.
  • scim-anthy: glue code to put SCIM and Anthy together. The linked page is in Japanese; the download links are listed under 「ダウンロード」 ("download").

I also ended up downloading and trying to install, but in retrospect I think I probably didn't actually need, the following packages:

  • UIM: another general interface similar in nature to SCIM, but it ends up running as a plug-in inside SCIM. (?) May possibly be required by something else.
  • skim: a KDE-specific version of the little icon that appears in your system tray to let you control SCIM. Seems to duplicate the similar applet SCIM provides (which is technically GNOME-oriented, but seems to work fine in my KDE installation) and uses a wacky python-based build system instead of make. Also, see below.
  • skim-scim-anthy: glue code specifically for using skim with the combination of SCIM and Anthy. Doesn't compile, so I couldn't use skim. One problem was that the configure script carefully overrides the PKG_CONFIG_PATH environment variable in a way that prevents pkg-config from finding skim, and then complains that skim seems not to be installed; but even with that fixed it won't compile.
  • scim-qtimm: Somehow related to combining SCIM with "immodule," a KDE-specific thing that is somehow similar in nature to SCIM. This package claims to provide better system stability in the event that something like Anthy crashes, but it requires either KDE 4 or a patched version of KDE 3, and it seems not to be absolutely necessary, so I didn't attempt to go ahead with it.

Compiling all these and putting the binaries in the appropriate places was basically the straightforward configure/make/make install process we know so well. It pays to read the output of configure --help and turn on any options that look appropriate (in particular, KDE- and Qt-related things, because GNOME seems to be the default in most places), to the extent the scripts will allow.
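
For each package that meant roughly the following, where the version number is a placeholder and the options are whatever that particular package's configure --help suggested:

tar xzf scim-x.y.z.tar.gz     # likewise for anthy and scim-anthy
cd scim-x.y.z
./configure --help            # check for KDE- and Qt-related options
./configure                   # plus whatever options looked appropriate
make
su -c "make install"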

Having installed the binaries, you must make sure the appropriate software actually runs. Yukiko Bando's SCIM-Anthy HOWTO helped here; although she says that section is Mandriva-specific, the only relevant difference for me was that I had to put the environment variables in a different config file. I used /etc/profile.d/lang.sh, which might not be the really Right place to put the settings, but it works and keeps them together with the related LANG setting. Note that the "If...without scim-qtimm" warning in the HOWTO applies to my case, so the actual settings I added were:

export GTK_IM_MODULE=scim
export QT_IM_MODULE=xim
export XIM_PROGRAM="scim -d"
export XMODIFIERS=@im=SCIM

With that in place, KDE starts the tray-icon thing for SCIM automatically on startup. Other sources on the Net describe adding it to one of the various config files that tell X "start these applications every time the server starts" but that didn't appear to actually be necessary.

With this in place, typing in Japanese appears to work in all applications that handle UTF-8 - such as bash, joe, and most GUI applications. I don't know the details of how it works, but it seems to involve sending your partially-typed text to the application and then sending backspaces to revise it as you switch it into Japanese script. As a result, it does NOT work with cat, pico, and other byte-oriented programs: the input method will send a single multi-byte character, then a backspace, and expect that to erase the whole character, but an application without wide-character support will only erase the last byte of the character, leaving you with garbage. I'm still not sure what I'll do about the pico incompatibility, since I like pine but will eventually want to send and receive Japanese-language email messages. Maybe switching pine to use joe as its editor will be enough.

The user-interface mechanics of typing in Japanese are discussed a bit in the SCIM-Anthy HOWTO. It boils down to this: <Ctrl-Space> toggles Japanese input mode on or off; you enter your text in roomaji and it appears in hiragana; to convert a chunk into something other than hiragana, you type <Space> and select what you want with the mouse or cursor keys; and <Enter> accepts. There is more to it, including typing a long block and then selecting different parts of it to modify out of sequence, and it's supposed to learn the user's favourite words and phrases so you can re-enact the scene from Lucky Star in which Hiiragi tries to use Konata's computer, but that's the basic idea.

Output: CJK-LaTeX

If you want to typeset Japanese text with LaTeX, you run into the problem that TeX assumes eight bits per character throughout. You can switch to some shiny new post-TeX thing (Omega, XeTeX, or LuaTeX, maybe - I don't really know any of them), or you can use the CJK package. I went for the CJK package. CJK works by splitting a thousands-of-characters font into lots of little virtual fonts and doing encoding tricks to switch between the virtual fonts appropriately.

Get the CJK package from its home page and follow the instructions in its doc/INSTALL file. The checklist at the start seems to be completely useless because it's written in the form of questions like "Have you frobbed the wotzit? Have you shizzled the nizzle?" with absolutely no information on what any of the terms means nor what you actually should do when, as is usual, the answer to each question is "No." Fortunately there's something better resembling proper instructions right below the checklist. The "Unix (web2c and teTeX)" section was the one applicable to me, though most of the individual instructions in it didn't apply.

Move the texinput tree from inside the CJK distribution into the latex subdirectory of your texmf, renaming it to CJK along the way. In my case that meant it became a directory called /usr/share/texmf/tex/latex/CJK/. I think this violates some kind of standard for where things are supposed to go within texmf, but I wasn't about to mess with it.

Don't touch TEXINPUTS, despite the instructions apparently saying you should - there is small print saying it's not necessary for "recent" installations, and yours probably is. Don't touch the FD files either (they will be handled later). I installed hbf2gf but suspect it's not actually needed. Skip over the stuff about Chinese TrueType fonts, Fontforge, and special.map, none of which apply (I hope to use vector fonts exclusively, see below). I compiled and installed the *conv programs from the utilities directory but suspect they are not needed. I skipped the man pages, EMACS, AucTeX, and Thai-language-related stuff. Normally I am a stickler for always installing man pages, but in this case they would be man pages for the Dark Arts of bitmap font conversion and I'd rather not bother.

Run mktexlsr so TeX can find the new files.
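
Boiled down to commands, the part of the CJK installation that mattered on my system was roughly this (the path to the unpacked distribution is a placeholder, and /usr/share/texmf is simply where my texmf tree happens to live):

mv /path/to/cjk-unpacked/texinput /usr/share/texmf/tex/latex/CJK
mktexlsr    # rebuild the filename database so TeX can find the new files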

That seems to be all that's necessary for CJK, but see below.

Fonts for CJK-LaTeX: watanabe-DNP, wadalab

NOTE: This section describes installing the Postscript wadalab fonts. You might not actually want those. The next section describes installing the TrueType IPA fonts, which may be better; you probably don't need both.

The CJK package would allow LaTeX to use a Japanese font if you had one, had it installed in LaTeX, and had all the CJK-specific glue in place to make the font, CJK, and LaTeX talk to each other; but it doesn't actually include any of those other things. So to actually get any use from CJK you have to first find one or more fonts, then get them installed, and then get the installation to work with CJK, each of which is a significant project.

If you got this far you probably already know how territorial people are about ASCII-character-set fonts, which only have about a hundred characters each. Consider how much work goes into making a multi-thousand-character Japanese font, and how the designers are going to feel about sharing that work with smelly white free software people; then compound that with the fact that any documentation you'll find from and about the Japanese publishing industry will certainly not be written in English; and consider the level of onmyoudou required to install even ASCII-character-set fonts in TeX, even when they have good English documentation.

The first-line recommendation of Japanese-language fonts for use with CJK seems to be a font called "kanji48" which is supplied only in an unfamiliar bitmap format that I have seen nowhere else, with utilities to convert it to TeX's bitmap format. I am not keen on bitmap fonts. I've had bad luck with trying to install them in TeX, and they have horrible consequences for PDF viewers. Also, I think the "48" means each character is 48 pixels high, which works out to about 300dpi for 12-point type. My printer can do a lot better than that, and it'd be worse if I set type at larger point sizes (for instance, to make flashcards, which were the first thing I wanted to typeset). So I decided I wanted vector fonts.

According to a file I found in the CJK distribution (doc/japanese/japanese.txt), there's a pretty good set of fonts made by Dai Nippon Printing. They are commercial and you can't have them. But there is a free emulation called watanabe-dnp, available from ftp.math.s.chiba-u.ac.jp. The japanese.txt file is dated 1996. When I connected to ftp.math.s.chiba-u.ac.jp, I managed to find the watanabe-dnp subdirectory (in "tex-old" instead of "tex" where the document from 1996 had said it would be) but the server wouldn't let me enter it, giving "permission denied," and I couldn't find these fonts elsewhere on the Net. It appears that in the 13 years since that file was written, Chiba University has stopped being willing to distribute these fonts to the general public, and they're probably obsolete anyway.

Instead I found something called "wadalab," which appears to be a set of free fonts that emulate the commercial DNP fonts much as the old watanabe-dnp fonts used to; and wadalab is current enough that it appears in Debian and support for it (but not the fonts themselves) is included in the CJK distribution as the only thing in the "contrib" subdirectory. The wadalab fonts are available from CTAN.

Note that wadalab is not the last word on the subject. Jan Poland's page about Japanese LaTeX claims the wadalab fonts look bad. He provides a link to a GeoCities page which makes the interesting claim that a primary reason more free Japanese fonts aren't available is that font designers don't want their fonts to be used on porn sites. There are links to a bunch of other free fonts from the GeoCities page, which might be worth checking into, though they mostly seem to be pseudo-handwriting fonts rather than printed-text fonts. I have not attempted to install any of these yet. TrueType seems to be the standard format for such fonts, and I'm not thrilled with TrueType, nor especially with trying to make TrueType fonts work from LaTeX. That page also notes that most people who do word processing in Japanese use certain Microsoft fonts that come with Windows, which are no doubt readily available to anyone who wants to snag a copy, but which are not legally supposed to be used other than with Windows.

Note that there is a file in circulation called "cyberbit.ttf," which purports to be a font for pretty much all of Unicode. I think I have it somewhere on my system as a fallback for characters not covered by other fonts. (It may even be related to my outstanding kanji-browser-segfault issue.) There seems to be a popular sentiment that because of something called "Han unification," you should not use this font: there are some Unicode character numbers (code points) that need to be displayed differently depending on whether you are displaying Simplified Chinese, Traditional Chinese, or Japanese script; any one font can be right for at most one of those three scripts; and cyberbit.ttf (it is claimed) is really right for none of them.

The wadalab fonts come in several .tar.gz files, each containing a texmf directory that obeys the same file tree layout standard I think CJK breaks; so you can just unpack the files and overlay their texmf directory structure onto your existing one to install them.
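
So the installation amounts to unpacking each archive on top of the existing tree, roughly like this (the archive name is an example, not the exact CTAN filename):

cd /usr/share                              # the directory containing my texmf tree
tar xzf /path/to/wadalab-archive.tar.gz    # repeat for each of the .tar.gz files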

Then you need to install some FD files from contrib/wadalab in the CJK distribution to "somewhere TeX can find them." I copied all the c70*.fd files from there into texmf/tex/latex/CJK/UTF8. There are other FD files that seem not to be relevant; it looks to me like the number at the start of the name refers to the font encoding TeX will use, and the other numbers I saw were 42 and 52, which (unlike 70 for Unicode) didn't seem to match anything I already had; so I guessed 70 was closest to appropriate for my installation and it seemed to work. These files are tersely documented in contrib/wadalab/wadalab.txt in the CJK distribution.

The contrib/wadalab directory also contains a wadalab.map file, and you should know what to do with it if you've ever installed LaTeX fonts before. On my installation (this WILL likely be different for you) I ended up putting it in texmf/fonts/map/dvips/wadalab and editing texmf/web2c/updmap.cfg to add the line "Map wadalab.map" before running mktexlsr and updmap-sys as root and updmap as myself.
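
As commands, the FD-file and map-file steps from the last two paragraphs look roughly like this on my installation (run from the top of the unpacked CJK distribution; again, your paths and your choice of updmap incantations will likely differ):

cp contrib/wadalab/c70*.fd /usr/share/texmf/tex/latex/CJK/UTF8/
mkdir -p /usr/share/texmf/fonts/map/dvips/wadalab
cp contrib/wadalab/wadalab.map /usr/share/texmf/fonts/map/dvips/wadalab/
# add the line "Map wadalab.map" to /usr/share/texmf/web2c/updmap.cfg, then:
mktexlsr
updmap-sys     # as root
updmap         # as my ordinary user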

Note that notwithstanding implications in wadalab.txt, no font families with names like "udgj" (for "Unicode version of DNP Gothic, Japanese") seem to be created; I can only find names like "dgj" in the map files. It seems to work with Unicode anyway, though. I think there may be virtual fonts involved.

Don't try to run the makeuniwada.pl script. It is unnecessary and will only lead to tears.

At this point, if you're lucky, you should be able to edit and typeset documents like the following:

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage{CJK}

\begin{document}

\begin{CJK}{UTF8}{min}
   俺は、おおきい尻を好きです。嘘じゃない。
\end{CJK}

\end{document}

Font-family values to use with the CJK environment are min (shown above), for Wadalab Mincho 12; goth, for Wadalab Gothic 13; and maru, for Wadalab Maru 8. There is also something called Wadalab Mincho 8, which seems to be a lighter version, but it appears not to be available in Unicode and I couldn't get it to work. These are all demonstrated in the "Wadalab Sampler" document and its source code. I think the numbers refer to weight, as in some of the older Western commercial text font families, and not to optical size.

More fonts: the IPA TrueType fonts

As mentioned above, Jan Poland pooh-poohs the Wadalab fonts, saying they look bad in subtle ways inaccessible to non-Japanese readers. I'm not sure whether I buy into that, but I decided to attempt installing the IPA TrueType fonts according to his instructions. Having done that, I think I do like their appearance better, but it's not an overwhelming difference. I made a small demo document showing both, for comparison.

IMPORTANT: These fonts, in my installation, only seem to work with pdflatex, not with latex+dvips. I generally prefer pdflatex anyway, but it's something to bear in mind.

The IPA fonts are made freely available by the Information-technology Promotion Agency, Japan, which seems to be an agency of the Japanese government and unrelated to the International Phonetic Alphabet. The current version is TrueType-flavoured OpenType and I don't know how to make it work with CJK. I'm using the older plain TrueType version, which I got by downloading the Common Open Printing System package according to Jan Poland's link, digging out the *.ttf files, and throwing out the rest of that 24-megabyte archive. I also grabbed the ready-made .tfm and .fd archives he provides.

The Common Open Printing System included a font file called ipagui.ttf which doesn't seem to be mentioned anywhere else. It's probably not necessary. I put the other .ttf files in texmf/fonts/truetype/IPA. I also put all the ipa*-uni??.tfm files in texmf/fonts/tfm/IPA; the .tfm files that don't include -uni in their names are for non-Unicode encodings and I didn't want them. I put all the c70*.fd files in texmf/tex/latex/IPA/UTF8.

Hilarity ensued when I tried to build the maps. Jan Poland writes about a file called ttfonts.map, which doesn't exist on my system, and the lines he says to add to it are not actually correct for my configuration, and there is another file needed that he doesn't mention at all. Last things first: the missing file is Unicode.sfd, and I found it at http://delloye.free.fr/Unicode.sfd though it also seems to be in wide distribution elsewhere. This file defines a mapping from the high bytes of Unicode 16-bit codes to the subfonts CJK uses for splitting up large fonts into byte-size chunks. It appears to actually be part of CJK, so I don't know why I didn't get it when I installed CJK. I put this file in texmf/fonts/sfd.

Something needs to find Unicode.sfd when you typeset a document that uses this technology, and for reasons I don't understand, it can't be found through TeX's normal file-finding mechanism. In order to get it to work I had to put the full path to Unicode.sfd into the lines in my map file; where Jan Poland writes @Unicode@ I had to write @/usr/share/texmf/fonts/sfd/Unicode@. I created a new map file called texmf/fonts/map/dvips/IPA/IPA.map with this content:

ipag-uni@/usr/share/texmf/fonts/sfd/Unicode@ <ipag.ttf
ipagp-uni@/usr/share/texmf/fonts/sfd/Unicode@ <ipagp.ttf
ipam-uni@/usr/share/texmf/fonts/sfd/Unicode@ <ipam.ttf
ipamp-uni@/usr/share/texmf/fonts/sfd/Unicode@ <ipamp.ttf

Then I added Map IPA.map to the end of texmf/web2c/updmap.cfg, and ran mktexlsr and updmap (locally and globally).
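
For reference, here is the whole IPA installation gathered into one place, as a sketch (paths are my installation's; the .tfm and .fd files are the ready-made ones from Jan Poland's page, and the map file is the one shown above):

mkdir -p /usr/share/texmf/fonts/truetype/IPA
cp ipag.ttf ipagp.ttf ipam.ttf ipamp.ttf /usr/share/texmf/fonts/truetype/IPA/
mkdir -p /usr/share/texmf/fonts/tfm/IPA
cp ipa*-uni??.tfm /usr/share/texmf/fonts/tfm/IPA/
mkdir -p /usr/share/texmf/tex/latex/IPA/UTF8
cp c70*.fd /usr/share/texmf/tex/latex/IPA/UTF8/
mkdir -p /usr/share/texmf/fonts/sfd
cp Unicode.sfd /usr/share/texmf/fonts/sfd/
mkdir -p /usr/share/texmf/fonts/map/dvips/IPA
# put the IPA.map lines shown above into .../map/dvips/IPA/IPA.map,
# add "Map IPA.map" to /usr/share/texmf/web2c/updmap.cfg, then:
mktexlsr
updmap-sys
updmap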

The result is that I can typeset documents using the IPA fonts. Please note that these are supposed to be vector fonts. If your installation attempts to burn them into bitmap fonts, something is wrong. Even if your installation is capable of burning the vectors into bitmaps successfully, and mine isn't, the result is probably not what you actually want. That was how I discovered the Unicode.sfd problem.

You can examine what kind of fonts are in any given PDF file using the pdffonts utility, which comes with xpdf. It lists the IPA fonts, in documents typeset by my installation, as "TrueType." Properly-applied PostScript fonts will usually list as "Type 1." If TeX is allowed to create bitmap fonts they will usually be embedded as "Type 3," and may look bad when documents are viewed on the screen at lower than their final resolution. "Type 1C" fonts also occur sometimes and I'm not sure of their significance.
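
A quick check, assuming your typeset output is called test.pdf:

# list the fonts embedded in the output; the IPA fonts should appear
# as "TrueType", not as "Type 3"
pdffonts test.pdf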

Here's a test document for the IPA font installation:

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage{CJK}

\begin{document}

\begin{CJK}{UTF8}{ipag}
   昨日は近くまで希望の船が来たけど僕らを迎えに来たんじゃない
\end{CJK}

\end{document}

Font-family values for these fonts are ipag (shown above), for IPA Gothic, and ipam, for IPA Mincho. They are monospace fonts, which seems to be the usual expectation for this script, but there are also proportional versions under the names ipagp and ipamp.

Yet more fonts: Anzu

Once you start down this road, there's no end to it. I decided that what with all those handwriting fonts on the GeoCities page Jan Poland linked to, I ought to have a handwriting font installed as well. And then I found a reason to typeset a few Korean words, and realized I'd need a Korean font for that. Google searching on Korean fonts led me to WAZU JAPAN'S Gallery of Korean Unicode Fonts. I notice they also have pages for both Japanese and Canadian, which look interesting, but I already have fonts for Japanese and can't read Canadian, so those aren't immediately useful to me.

I looked at the samples on the GeoCities page, and that page's author's interpretations of the licensing, and decided I wanted to install a font called Anzu. It has the cute handwritten look but seems more readable and visually balanced than some of the more stylized ones. I'd be okay with having my own handwriting in Japanese look like that (okay, maybe a little less girly in an ideal world) and wouldn't say the same for some of the others. I also liked the near-collision with my Web site's name.

The GeoCities page says to go to the designer's site, click on "Font" for a page about free fonts, click on 「◆フォントダウンロードページへ◆」 for a page about downloading the fonts, and then click on 「◆あんずもじをダウンロード◆」 to download the actual font file. That link text means "Download Anzu-script." I noticed three other similar download links and had to investigate them. They seem to point to other, slightly modified versions of the same font. All of these are LZH archives which unpack to contain TrueType font files and a few miscellaneous images and license files.

  • 「◆あんずもじをダウンロード◆」 is the basic or standard version of the font. The file inside is called APJapanesefont.ttf.
  • 「◆あんずもじ等幅をダウンロード◆」 is a monospace version; the extra characters in the link mean "monospace." The file inside is called APJapanesefontT.ttf. This version, unlike the others, lacks vertical metrics (for typesetting vertical text).
  • 「◆あんずもじ奏をダウンロード◆」 is slightly modified: some Unicode box-drawing characters (which are dotted lines in the other versions) are replaced here by musical notation symbols. One or two dingbat-type characters are also in modified forms. The extra character in the link means "music" and the filename is APJapanesefontK.ttf.
  • 「◆あんずもじ湛をダウンロード◆」 (the added character means something like "smiling") contains an alternate version of the Latin (ASCII) character set, decorated with stars, in code points FF00-FFEF. I think that code range is supposed to be alternate forms associated with CJK - full-width Latin, half-width katakana, and vertical forms of the punctuation marks that change when written vertically. So presumably if you installed this font you could get Latin with stars by entering it as full-width letters. The filename of this font is APJapanesefontF.ttf.

I chose the basic standard version. To install this font, I approximately followed the instructions on Jan Poland's page, but doing manually the task for which (with IPA, above) I had used his ready-made files. Once I had unpacked the LZH archive, I renamed the font file to anzu.ttf because I wanted "anzu" to be the family name in LaTeX and it seems like things work best if the filename matches. I didn't have ttf2tfm installed, and CTAN only seems to carry it in the form of a DOS executable, but I found that I had an old distribution of FreeType 1.3.1 lying around, and I found ttf2tfm sources in the contrib/ttf2pk subdirectory of that. I also found Unicode.sfd there, so that's probably the authoritative source for that. You can get FreeType 1.x from the FreeType site.

I ran ttf2tfm anzu.ttf anzu-uni@/usr/share/texmf/fonts/sfd/Unicode@ (see comments in previous section about giving the path to Unicode.sfd) and it generated a bunch of anzu-uni??.tfm files. I put them in texmf/fonts/tfm/anzu. I put anzu.ttf (the former APJapanesefont.ttf) in texmf/fonts/truetype/anzu.
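
As shell commands, that step was roughly the following (again, /usr/share/texmf is just where my tree happens to live):

mv APJapanesefont.ttf anzu.ttf
ttf2tfm anzu.ttf anzu-uni@/usr/share/texmf/fonts/sfd/Unicode@
mkdir -p /usr/share/texmf/fonts/tfm/anzu /usr/share/texmf/fonts/truetype/anzu
mv anzu-uni*.tfm /usr/share/texmf/fonts/tfm/anzu/
cp anzu.ttf /usr/share/texmf/fonts/truetype/anzu/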

The .fd file required a bit of tweaking. I first wrote one following Poland's example closely, but having done that, I found there were problems with the sizes. This font makes the kana a lot smaller than the kanji, almost like lower- and upper-case letters in English, and even the kanji looked too small compared to kanji from other fonts at the same font size. So I had to hack the .fd file (which in turn required learning a bit about what such files actually mean) and I ended up with the following contents, which went into texmf/tex/latex/anzu/UTF8/c70anzu.fd:

\DeclareFontFamily{C70}{anzu}{\hyphenchar \font\m@ne}
\DeclareFontShape{C70}{anzu}{m}{n}{<-> sCJK *[1.2] anzu-uni}{}
\DeclareFontShape{C70}{anzu}{bx}{n}{<-> sCJKb *[1.2] anzu-uni}{\CJKbold}

That says: Okay, we're going to declare a font family. It'll have encoding C70, which is Unicode, and it'll be named "anzu". Its hyphen character is set to -1 (\m@ne), which effectively disables hyphenation. This family has a shape, which we'll call "m" for medium weight. In this shape, when we ask for any type size, generate the font for that size by calling the "sCJK" font-generating function, which generates lots of little sub-fonts according to the high byte of the Unicode code, and does not generate a warning to the user terminal. The parameters for that function will be "scale by a factor of 1.2" and "use 'anzu-uni' as the base name for the subfonts." And then do the whole shape thing again to define "bx" (for "bold, but fake") using the font-generating function "sCJKb", which does the Poor Man's Bold thing; and when that is in effect, set a flag telling the CJK package we are in bold mode. You can compare this with Jan Poland's example code: the critical difference is that I've added the 1.2 scale factor and switched to the non-warning versions of the font-generating functions. The scale factor of 1.2 was determined by trial and error with an eye to making the font at least sort of blend with my other fonts.

Finally, I had to update the font maps. I created a map file called anzu.map containing the line anzu-uni@/usr/share/texmf/fonts/sfd/Unicode@ <anzu.ttf and put it in texmf/fonts/map/dvips/anzu, then added Map anzu.map to updmap.cfg. I ran mktexlsr and updmap (both locally and globally) and it worked. The family name is "anzu"; and here's a sample document:

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage{CJK}

\begin{document}

\begin{CJK}{UTF8}{anzu}
   通りすがりの「bunnygirl」です。
\end{CJK}

\end{document}

Yet more fonts: Baekmuk

My first thought on a Korean font was to just grab the first one on the Gallery page, which was Adobe Myungjo. I have a theory that Korean and Japanese are actually the same language spoken with different accents - certainly when I watch Korean movies I always have the impression that I'm listening to heavily accented Japanese (unlike, for instance, Cantonese, which sounds completely different), and I'm sure many people from both countries would be happy to agree. Anyway, "Myungjo" sounds like it could be what you'd get if you said "Mincho" in Korean, and the font looks like Japanese Mincho fonts, so I figured it would blend well, and Adobe has a generally good reputation for making decent-quality fonts so it wouldn't be one of these fly-by-night free software things. So off I went to download the Korean language pack for Acrobat Reader.

One problem was licensing: you're only supposed to use that font with Acrobat Reader, not for things like typesetting your own documents in LaTeX. Not that anybody would be able to stop me, but it'd be preferable not to be breaking the rules. A second problem was that the font is in OpenType format, and using it wasn't worth enough to me to justify figuring out how to make an OpenType font work. I worked my way down the list, and even though I was consciously trying not to favour entries at the top of the alphabetical list, I ended up picking the Baekmuk fonts. There are four, in this tarball. There is some documentation in both English and Korean in the parent directory; the links I was given for the creator's home page and so on do not work.

The fonts in the tarball are Baekmuk Batang, which looks kind of like Mincho; Baekmuk Gulim, which looks kind of like Gothic; Baekmuk Dotum, which seems to be in between; and Baekmuk Headline, which is a heavy-bold font and only includes hangul (the alphabet), not also hanja (the Korean equivalent of Japanese kanji) like the other three.

Installation was relatively straightforward, following the same routine as the Anzu font, described in the previous section. I ran ttf2tfm on the .ttf files, then put the resulting .tfm files in texmf/fonts/tfm/baekmuk and the .ttf files in texmf/fonts/truetype/baekmuk.
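
As a sketch, that step can be done with a loop over the four fonts (assuming the .ttf files carry the same names used in the map file below):

for f in batang dotum gulim hline; do
    ttf2tfm $f.ttf $f-uni@/usr/share/texmf/fonts/sfd/Unicode@
done
mkdir -p /usr/share/texmf/fonts/tfm/baekmuk /usr/share/texmf/fonts/truetype/baekmuk
mv {batang,dotum,gulim,hline}-uni*.tfm /usr/share/texmf/fonts/tfm/baekmuk/
cp batang.ttf dotum.ttf gulim.ttf hline.ttf /usr/share/texmf/fonts/truetype/baekmuk/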

I created .fd files named c70{batang,gulim,dotum,hline}.fd in texmf/tex/latex/baekmuk/UTF8 just like with the IPA fonts (same content except with the font names changed). I created a new map file with this content, in texmf/fonts/map/dvips/baekmuk/baekmuk.map:

batang-uni@/usr/share/texmf/fonts/sfd/Unicode@ <batang.ttf
dotum-uni@/usr/share/texmf/fonts/sfd/Unicode@ <dotum.ttf
gulim-uni@/usr/share/texmf/fonts/sfd/Unicode@ <gulim.ttf
hline-uni@/usr/share/texmf/fonts/sfd/Unicode@ <hline.ttf

Then I added it to updmap.cfg and ran mktexlsr and updmap, locally and globally. Here's a sample document:

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage{CJK}

\begin{document}

\begin{CJK}{UTF8}{batang}
   거기에 손 대지 마세요!
\end{CJK}

\end{document}

It actually worked on the first try (previous sections represent the expurgated version, just the things I did that worked), so I was quite pleased. Family names for these fonts are "batang" (shown); "dotum"; "gulim"; and "hline."

I have not attempted to configure Korean input; I don't expect to need it often enough to make that worthwhile, especially since there'd be the added cognitive and configuration load of having to switch between Japanese and Korean as my "alternate" input mode.

I note that these fonts also include Japanese kana, and a lot of hanja, so it is probably technically possible to typeset Japanese text with them. They also have Latin, so I could use them for English too and throw out all my other fonts. That is probably not a good idea. The complaints I hear about cyberbit.ttf and other "all of Unicode" fonts seem to apply here. If you type Korean-style hanja in place of Japanese-style kanji, people may be able to read it, but it won't be really right, and I want my installation's output to be really right.
