Localization and Internationalization of R
Hints for R users in non-English-speaking countries

Page Contents (Last modified 10, August, 2003 by S. Mase)

This site is based on PukiWiki, a Japanese variant of the Wiki (Internet collaboration) system. It was established in June 2003 by courtesy of M. Okada (Tsukuba University, Japan). Its aim is the exchange and accumulation of Japanese-language information (documents and tips) on R. It has been quite successful so far, winning the support of isolated and hidden Japanese R users. Other pages are written solely in Japanese, and your browser may not display them correctly.

Feedback and reports are welcome; inquiries and complaints are not. Since I am poor both at this kind of wizardry and at technical English, please do not expect follow-ups or replies from me. Please do not send mail personally to E. Nakama (he prefers the C language to the E(nglish) language) or M. Okada. If you (from abroad) want to make a comment on this page, please put it on the companion comment page CommentsOnI18nOfR. However, I cannot guarantee the replies you hope for. For comments in Japanese, please use the page JapaneseCommentOnI18nOfR.

(1) Introduction

This page is for the "L10N" (localization) and "i18n" (internationalization) patches for R by E. Nakama and M. Okada. They have succeeded in "making R speak Japanese". Although the patches are still incomplete, we hope they will give useful hints to many R users in multi-byte character countries like Japan. If you are interested, please study the patches carefully yourself.

It is probably necessary to explain to R users in single-byte character countries the specific difficulties we face in using R in Japan (and in other multi-byte character countries such as China, Korea, etc.). In Japan, we use several completely different sets of characters simultaneously and interchangeably:

Already complex enough, isn't it? But the story does not end there. Over the period in which computer technology was adopted in Japan, several incompatible character coding systems, each assigning byte codes to the Japanese characters above, were proposed and are still in use in parallel. The three main coding systems in use now are:

We also have to add relatively new international codes such as Unicode. One Japanese character is represented by 1 to 3 bytes (EUC-TW seems to have some 4-byte codes). It may further be necessary to note that Japanese PCs have only alphabetic keyboards, which can also be used as Hira-Kana keyboards. Japanese phrases are first typed as alphabetic letters (or Hira-Kana) that correspond to the phrases phonetically; then special software called an FEP (Front End Processor) translates them into the final Japanese phrases. Since most alphabetic (phonetic) representations correspond to several Kanji words, it is usually necessary to choose the correct ones from the candidates the FEP suggests.
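
As a rough illustration of the byte counts mentioned above, the following Python sketch encodes a few sample characters under EUC-JP, SJIS and UTF-8 and reports how many bytes each one takes. The use of Python, the helper name byte_lengths and the sample characters are assumptions made only for illustration; none of them is part of the patches described on this page.

```python
# Sample characters: an ASCII letter, a Hiragana, a half-width Katakana, a Kanji.
SAMPLES = ["A", "あ", "ｱ", "漢"]

# Python codec names for EUC-JP, SJIS and one Unicode encoding.
ENCODINGS = ["euc_jp", "shift_jis", "utf-8"]


def byte_lengths(ch):
    """Return a dict mapping each encoding name to the encoded length of ch."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch in SAMPLES:
    print(ch, byte_lengths(ch))
```

The same character can have a different length in each coding system, so any program that assumes "one character is one byte" sooner or later cuts a Japanese character in the middle.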

(2) Japanese and R

Regrettably, R is not yet as popular in Japan as it deserves to be. One of the main reasons, I think, is that R cannot handle Japanese. The use of Japanese in the present R is confined to:

But, as explained above, we want and need to use about three thousand characters in object names, in file I/O, and, in particular, in graphical objects such as titles. This limitation keeps Japanese end users from using R. Why not use English, you might ask. The reason is simple: ordinary Japanese are, in general, poor at English.

(3) Localization and Internationalization of R

Localization (often abbreviated as L10N) means adapting R to a particular language such as Japanese, while internationalization (often abbreviated as i18n) means making R capable of handling many (if not all) languages simultaneously. Of course, the former is much easier. Recently, much software, such as the X Window System, has been i18n'ed. But fully adapting software such as R to this mechanism is by no means easy. The following patches are for L10N and i18n of R. Several relevant remarks are:

(4) Prerequisites

In order to use L10N'ed or i18n'ed R, you have to note the following:

(5) Patches of Nakama and Okada for EUC and SJIS environments

Warnings: The following patches and the resulting binaries may potentially cause trouble for your OS. They are offered with no warranty. Please note that, although they seem to work fine (still with several restrictions) in Japan so far, we do not guarantee that they will also work in other multi-byte code countries. You had better consider them hints toward the L10N and/or i18n of R needed in your country.

(5-1) Japanese strings

In an EUC environment, there is no problem even now (at least in Japan). In an SJIS environment (e.g. Japanese MS Windows), however, characters having 0x5c as their second byte cannot be handled correctly. Applying the following Japanized patch makes it possible to handle them.
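
To make the 0x5c problem concrete, here is a small Python sketch that looks at the raw bytes of a few characters. 0x5c is the ASCII code of the backslash, so code that scans SJIS text byte by byte can mistake the second byte of such a character for an escape character. The helper name has_backslash_trail_byte and the sample characters are chosen only for illustration and are not taken from the patches.

```python
def has_backslash_trail_byte(ch, encoding="shift_jis"):
    """Return True if ch encodes to two bytes and the second byte is 0x5c."""
    raw = ch.encode(encoding)
    return len(raw) == 2 and raw[1] == 0x5C


for ch in ["表", "ソ", "漢"]:
    sjis = ch.encode("shift_jis")
    euc = ch.encode("euc_jp")
    # Show the bytes in both encodings and whether the SJIS form is affected.
    print(ch, "SJIS:", sjis.hex(), "EUC-JP:", euc.hex(),
          "0x5c trail byte:", has_backslash_trail_byte(ch))
```

In EUC-JP both bytes of a two-byte character have the high bit set, so neither can collide with an ASCII byte, which is consistent with the remark that the EUC environment has no such problem.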

(5-2) Japanese object names and device outputs for Unix-like and EUC-* environments

First download the R source file, and Nakama and Okada's patches:

  R-1.7.1.tgz
  http://r.nakama.ne.jp/R-1.7.1/patchs/
  R.l10n.YYYYMMDD.patch
  R.l10n.PSXFIG.YYYYMMDD.patch
  R.i18n.x11_mb.YYYYMMDD.patch

The first two are integrated patches applicable to both the EUC-JP and the SJIS cases. The third one is for the postscript (L10N) and xfig (i18n) devices; for "i18n"ed xfig, see http://wwwusr.obspm.fr/departement/demirm/xfig/japanese/i18n.html . Following Nakama's instructions, issue the commands below in a suitable working directory where R's source directory resides. Please note that they are for Unix-like OSes with an EUC environment. The command rm -f src/main/gram.c is mandatory (gram.y will be used instead).

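# Unpack the R source archive and enter the source directory
gzip -d -c R-1.7.1.tgz | tar xvf -
cd R-1.7.1
# Apply the patches (the commands expect them in your home directory)
patch -p1 < ~/R.l10n.YYYYMMDD.patch
patch -p1 < ~/R.l10n.PSXFIG.YYYYMMDD.patch
patch -p1 < ~/R.i18n.x11_mb.YYYYMMDD.patch
# Remove the pre-generated parser so that gram.y is used instead
rm -f src/main/gram.c
MAIN_CFLAGS="-DL10N_JP" R_BROWSER="/usr/bin/mozilla" ./configure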

For SJIS, use the flag MAIN_CFLAGS="-DL10N_JP -DL10N_SJIS_JP" instead of MAIN_CFLAGS="-DL10N_JP". (The SJIS case has not been checked yet.)

Now follow the R installation instructions for the rest. Because i18n of xfig has been done only for Japan (ja_JP) and Korea (ko_KR) at present, the third patch may be unnecessary.

Nakama made it possible to specify the fonts used by R flexibly via an X resource file. The following is an example that uses the free Japanese TrueType fonts kochi-mincho and kochi-gothic. You should change them as appropriate. <R_HOME> is the full path to R's home directory, which is the value of the environment variable R_HOME if it is already set. <locale> is your current locale, which is the value of the environment variable LANG if it is already (and correctly) set.

 <R_HOME>/etc/R_X11.<locale>

For example, it is /usr/lib/R/etc/R_X11.ja_JP.eucJP on my Debian GNU/Linux system. The contents of this file may be as follows. You can list as many available fonts (ones that X programs can use) as you like.

 *fontSet0: -kochi-kochi gothic-medium-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                 -kochi-kochi gothic-medium-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                 -kochi-kochi gothic-medium-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
 *fontSet1: -kochi-kochi gothic-bold-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                 -kochi-kochi gothic-bold-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                 -kochi-kochi gothic-bold-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
 *fontSet2: -kochi-kochi mincho-medium-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                 -kochi-kochi mincho-medium-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                 -kochi-kochi mincho-medium-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
 *fontSet3: -kochi-kochi mincho-bold-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                 -kochi-kochi mincho-bold-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                 -kochi-kochi mincho-bold-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
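
If you want to adapt the listing above to other fonts, typing twelve long font names by hand is error-prone. The following Python sketch generates the resource file name and the fontSet entries in the same layout. It is only a text generator written for illustration: the function names resource_path and fontset_entry are hypothetical, it does not check whether the fonts are actually installed, and it is not part of Nakama's patches.

```python
import os

# Character set suffixes used in each font set of the listing above.
CHARSETS = ["iso8859-1", "jisx0201.1976-0", "jisx0208.1983-0"]


def resource_path(r_home=None, locale=None):
    """Build <R_HOME>/etc/R_X11.<locale>, falling back to R_HOME and LANG."""
    r_home = r_home or os.environ.get("R_HOME", "")
    locale = locale or os.environ.get("LANG", "")
    return f"{r_home}/etc/R_X11.{locale}"


def fontset_entry(index, family, weight, foundry="kochi"):
    """Return one *fontSetN entry with one font name per character set."""
    names = [
        f"-{foundry}-{family}-{weight}-r-*-*-%d-*-*-*-*-*-{cs}"
        for cs in CHARSETS
    ]
    # Continuation lines end with ", \" and are indented by 17 spaces.
    return f" *fontSet{index}: " + ", \\\n                 ".join(names)


print(resource_path("/usr/lib/R", "ja_JP.eucJP"))

styles = [("kochi gothic", "medium"), ("kochi gothic", "bold"),
          ("kochi mincho", "medium"), ("kochi mincho", "bold")]
for i, (family, weight) in enumerate(styles):
    print(fontset_entry(i, family, weight))
```

For another language you would replace the foundry, the family names and the entries of CHARSETS with the values appropriate to the fonts available on your X server.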

Nakama believes that these recipes, with appropriate changes, will also work for Chinese (including Taiwanese) and Korean (maybe even European languages), and will enable your R to display local fonts both on the console and on graphical devices (with some restrictions). However, so far we have had no chance to test this.

Remarks:

Please note that he cannot understand Korean.

(6) Binaries

Nakama and Okada kindly offer patched binaries of R on their web pages. Since they are primarily for Japanese users, they may be of little interest to R users abroad.

(6-1) RPM binary (by E. Nakama)

RPM packages: R-1.7.1-1vl20.nosrc.rpm and R-1.7.1-1vl20.i386.rpm. These can refer to the X resource file above.

(6-2) Debian packages (woody and sid, by M. Okada)

Debian binaries. At present, these binaries cannot refer to the X resource file above and use default fonts.

(6-3) Japanese MS Windows binary (by M. Okada)

This binary is still at the testing stage, and you should install it at your own risk.

(7) Present progress status summary by Nakama (for R-1.7.1)

Subjects \ OS             *nix                              MS Windows   Classic MacOS   MacOS X
Parser                    L10N 100%                         L10N 100%    ?               ?
Regexp                    ?                                 ?            ?               ?
Graphical devices         i18n 80%                          ?            ?               ?
Dataentry (Spreadsheet)   ?                                 ?            ?               ?
POSTSCRIPT                ?                                 ?            ?               ?
xfig                      i18n 100% (needs i18n'ed xfig)    ---          ---             ---
PDF                       ?                                 ?            ?               ?

(8) Misc: R-KNOPPIX

KNOPPIX is a Linux distribution on a single CD-ROM. Since all files are compressed, it actually contains 1.6GB of files, enough for an almost full Linux environment. It is based on Debian GNU/Linux with the KDE desktop. The most remarkable feature of KNOPPIX is that it is bootable. It also recognizes the hardware of your PC marvellously. Since it does not reside on the hard disk, it leaves nothing behind after shutdown. KNOPPIX is of German origin and was Japanized by A. Suzaki.

S. Tanimura (Nagasaki University, Japan) rebuilt KNOPPIX-jp (based on knoppix_20030606-20030625) to include a partially Japanized R. It can be downloaded from his web page or its mirror. R-KNOPPIX lets you try R on a home MS Windows PC quite easily and safely.


*1 If you are curious enough, please visit the Mojikyo Institute. It offers about 110,000 free TrueType fonts of Kanji and related characters, present, past and ancient, as well as their numerous variants in shape.

Last-modified: 2023-03-25 (Sat) 11:19:16