COLOR(red){SIZE(25){Localization and Internationalization of R}}~
SIZE(20){Hints to R users in non-English speaking countries}



SIZE(20){COLOR(blue){Page Contents}}  (Last modified 10 August 2003 by S. Mase)
#contents

This site is based on PukiWiki, a Japanese variant of the Wiki
(Internet collaboration) system. It was established in June 2003 by
courtesy of M. Okada (Tsukuba University, Japan). Its aim is the
exchange and accumulation of Japanese-language information (documents and
tips) on R.  It has been quite successful so far, gaining the support of
isolated and hidden Japanese R users. The other pages are written solely in
Japanese, and your browser may not display them correctly.


Feedback and reports are welcome, COLOR(blue){inquiries and/or complaints are not}.
Since I am poor both at these wizardry skills and at technical English,
please do not expect follow-up feedback or replies from me. Please do not send
mail personally to E. Nakama (he prefers the C language to the E(nglish) language)
or M. Okada. If you (from abroad) want to make a comment on
this page, please put it in the companion comment page [[CommentsOnI18nOfR]]. However, I cannot guarantee the replies you hope for. For Japanese comments, please use the page [[JapaneseCommentOnI18nOfR]].

* (1) Introduction

This page is for the "COLOR(red){L10N}" (localization) and "COLOR(red){i18n}"
(internationalization) patches for R by E. Nakama and M. Okada. They succeeded
in "COLOR(blue){making R speak Japanese}".  Although the patches are still incomplete,
we hope they will give useful hints to many R users in multi-byte character
countries like Japan. Please study the patches carefully yourself if you
are interested.

It is probably necessary to explain to R users in single-byte character countries
the specific difficulties we face in using R in Japan (as well as in other multi-byte
character countries such as China, Korea, etc.). In Japan, we use several
completely different sets of characters simultaneously and interchangeably:
- "COLOR(blue){Roman alphabet}" characters. Single-byte characters as used in English-speaking countries.
- "COLOR(blue){Kata-Kana}" characters.  A series of phonograms. About 70 kinds.
- "COLOR(blue){Hira-Kana}"  characters. Another kind of phonograms. About 70 kinds. Apart from their shapes, they are essentially equivalent to Kata-Kana characters. Why are there two kinds, you may ask. Originally, Kata-Kana was for men, and Hira-Kana for women :-)
- "COLOR(blue){Kanji}" characters. Ideograms. They were originally Chinese characters. Many of them are still the same as those used in China now, but there are also many of Japanese origin. About 3,000 kinds are commonly used now (in the long history of China, Japan, and neighboring areas, at least 50,000 kinds ((If you are curious enough, please visit [[Mojikyo Institute:http://www.mojikyo.org/html/abroad/abroad_top.html]]. It offers about 110,000 free truetype fonts of Kanji and related characters, present, past and ancient ones, as well as their numerous variants in shape.)) were once used!).

Already complex enough, isn't it?  But the story does not end there. Throughout
the period in which computer technologies were adopted in Japan, several incompatible
character coding systems assigning byte codes to the above Japanese
characters were proposed, and they are still in use in parallel.  The three main coding
systems in use now are:
- COLOR(blue){EUC-JP} codes. EUC (Extended Unix Code) is now the main coding system for Unix-like OSes. Similar codes are also used in several other countries (e.g. EUC-CN, EUC-TW and EUC-KR).
- COLOR(blue){JIS} codes. The Japanese Industrial Standards code.
- COLOR(blue){SJIS} (COLOR(blue){Shift-JIS}) codes. It is now mainly used on Microsoft Windows machines.
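
To make the incompatibility of the three codings concrete, here is a minimal Python sketch. Python is not part of the patches; it is used only because it ships with codecs for all three systems. The sketch assumes that "JIS" corresponds to Python's iso2022_jp codec and SJIS to shift_jis, and the sample word and variable names are illustrative.

```python
# Encode the same word in the three codings discussed above.
# Assumption: "JIS" is taken to mean ISO-2022-JP (Python codec "iso2022_jp").
word = "日本語"  # "Japanese language", three Kanji characters

encoded = {name: word.encode(name) for name in ("euc_jp", "iso2022_jp", "shift_jis")}

for name, data in encoded.items():
    print(f"{name:11s} {len(data):2d} bytes  {data.hex()}")

# JIS is a 7-bit code that switches character sets with escape sequences,
# whereas EUC-JP sets the eighth bit of every byte of a Japanese character.
jis_is_7bit = all(b < 0x80 for b in encoded["iso2022_jp"])
euc_is_8bit = all(b >= 0x80 for b in encoded["euc_jp"])
print("JIS is 7-bit:", jis_is_7bit, " EUC-JP bytes all have the high bit:", euc_is_8bit)

# Reading bytes with the wrong codec either fails or yields garbage.
try:
    misread = encoded["euc_jp"].decode("shift_jis")
except UnicodeDecodeError:
    misread = None
print("EUC-JP bytes read as SJIS:", repr(misread))
```

The three byte sequences differ even though the text is identical, which is why a program has to know in advance which coding its input uses.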

We also have to add relatively new international codes such as COLOR(blue){UNICODE}.
One Japanese character is represented by 1 to 3 bytes (EUC-TW seems to have 4-byte codes in part). Further, it may be necessary to note that Japanese PCs have only
alphabetic keyboards, which can also be used as Hira-Kana keyboards. Japanese
phrases are first typed as alphabetic characters (or Hira-Kana) corresponding to
them phonetically; then special software called an FEP (Front
End Processor) translates them into the final Japanese phrases. Since most
alphabetic (phonetic) representations correspond to several Kanji words, it
is usually necessary to choose the correct ones from the candidates the FEP suggests.
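
The "1 to 3 bytes" remark can be checked with a short Python sketch. The sample characters, the choice of codecs (UTF-8 stands in for Unicode here) and the helper name byte_lengths are illustrative assumptions, not something taken from the patches.

```python
# Count how many bytes one character occupies in several codings.
# The sample characters are illustrative choices, not taken from the patches.
samples = {
    "A": "Roman alphabet",
    "ｱ": "half-width Kata-Kana",
    "あ": "Hira-Kana",
    "漢": "Kanji",
}
codecs_to_try = ("euc_jp", "shift_jis", "utf-8")


def byte_lengths(ch):
    """Return {codec name: number of bytes} for a single character."""
    return {name: len(ch.encode(name)) for name in codecs_to_try}


for ch, kind in samples.items():
    print(ch, kind, byte_lengths(ch))
```

The point of the sketch is that the number of bytes depends on both the character and the coding, so code that assumes one byte per character cannot count or split Japanese text correctly.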


*(2) Japanese and R

Regrettably, R is not as popular in Japan now as it deserves to be.
One of the main reasons, I think, is that R cannot handle Japanese. The use of
Japanese in the present R is confined to the following:
- It can handle Japanese character strings if terminals can display Japanese characters and have Japanese fonts installed (though I cannot understand why this is possible even now).
- Paul Murrell kindly made it possible to use hundreds of Japanese characters, as well as other characters, as graphical symbols in graphic devices (Hershey vector fonts).

But, as explained above, we want and need to use about three thousand
characters in object names, in file IO, and, in particular, in graphical objects
such as titles. This prevents Japanese end-users from using R. Why not use
English, you might ask. The reason is simple: ordinary Japanese are in general
poor at English.


*(3) Localization and Internationalization of R


Localization (often abbreviated as COLOR(blue){L10N}) means adapting R to a
particular language such as Japanese, while internationalization (often
abbreviated as COLOR(blue){i18n}) means making R capable of handling many
(if not all) languages simultaneously. Of course, the former is
much easier. Recently, much software, such as the X Window System, has been i18n'ed.
But a full adaptation of software such as R to this mechanism is by no means
easy. The following patches are for L10N and i18n of R. Several relevant remarks
are:
-Terminals need L10N and i18n (for Japanese object names as well as file IO with Japanese).
-Graphical devices such as X11, postscript, png, etc., have to be considered separately. It is a prerequisite that companion software such as GS and LaTeX is already L10N'ed or i18n'ed.
-Different codings and OSes have to be treated separately. The case of Unix-like OSes with the EUC-JP code seems the simplest. The Microsoft Windows case is more difficult. Okada has started an experiment on Mac OS, but has no reportable success yet.

*(4) Prerequisites

In order to use L10N'ed or i18n'ed R, you have to note the following:
- Your OS should already be capable of handling your language.
- Companion software such as terminals, GS and LaTeX should already be L10N'ed or i18n'ed. In particular, it should respect the eighth bit of each character code byte (that is, be 8-bit clean).
- Local font files which the companion software can use, of course.
- Some (or a lot of) patience and knowledge about compiling and installing software.
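
Why the eighth bit matters can be seen in a small Python sketch. It only simulates a program that is not 8-bit clean; the helper name strip_eighth_bit and the sample word are hypothetical.

```python
# EUC-JP marks every byte of a Japanese character by setting its eighth bit.
# A tool that is not 8-bit clean and clears that bit silently destroys the text.
text = "日本語"
euc = text.encode("euc_jp")


def strip_eighth_bit(data):
    """Simulate a 7-bit-only program by clearing the high bit of each byte."""
    return bytes(b & 0x7F for b in data)


damaged = strip_eighth_bit(euc)
print(euc.hex())  # every byte is 0x80 or above
print(damaged)    # plain ASCII garbage; the Japanese text is gone
```

Once the high bits are lost, the bytes are indistinguishable from ordinary ASCII punctuation and letters, so the damage cannot be undone downstream.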

*(5) Patches of Nakama and Okada for EUC and SJIS environments

COLOR(red){Warnings}: The following patches and resulting binaries may
potentially cause trouble for your OS. They are offered with no warranty. Please note
that, although they seem to work fine (still with several restrictions) in
Japan so far, we never guarantee that they also work in other multi-byte
code countries. You had better consider them as hints for the L10N and/or i18n of R
needed in your country.

**(5-1) Japanese strings

In an EUC environment, there is no problem even now (at least in Japan). In an
SJIS environment (e.g. Japanese MS Windows), however, characters having 0x5c as
the second byte cannot be handled correctly. If one applies the following Japanized
patch, this becomes possible.
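
The 0x5c problem can be illustrated with a short Python sketch. It is not part of the patches; the helper name has_backslash_trail and the sample characters are illustrative assumptions.

```python
# In Shift-JIS some two-byte characters end in 0x5c, the ASCII backslash.
# Byte-oriented code that treats 0x5c as an escape character misreads them.
BACKSLASH = 0x5C
SAMPLES = "表ソ能あ"


def has_backslash_trail(ch, codec="shift_jis"):
    """True if the character is two bytes long and its second byte is 0x5c."""
    data = ch.encode(codec)
    return len(data) == 2 and data[1] == BACKSLASH


for ch in SAMPLES:
    print(ch, ch.encode("shift_jis").hex(), has_backslash_trail(ch))

# The same characters are harmless in EUC-JP, where both bytes are >= 0x80.
print(any(has_backslash_trail(ch, "euc_jp") for ch in SAMPLES))
```

A parser that scans a string byte by byte sees the second byte of such a character as the start of an escape sequence, which is presumably why a string like this breaks in an unpatched SJIS environment.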

**(5-2) Japanese object names and device outputs for Unix-like and EUC-* environments

First download the R source file, and Nakama and Okada's patches:

   R-1.7.1.tgz
   http://www.nakama.ne.jp/memo/cran-R/l10n/R.l10n.20030708.patch
   http://www.nakama.ne.jp/memo/cran-R/l10n/R.i18n.x11_mb.20030715.patch
   http://www.nakama.ne.jp/memo/cran-R/l10n/R.l10n.PSXFIG.20030808.patch
   http://r.nakama.ne.jp/R-1.7.1/patchs/
   R.l10n.YYYYMMDD.patch
   R.l10n.PSXFIG.YYYYMMDD.patch
   R.i18n.x11_mb.YYYYMMDD.patch

The first two are integrated patches applicable both to the EUC-JP case and to
the SJIS case. The third one is for the postscript (L10N) and xfig (i18n) devices (as to "i18n"ed xfig, see http://wwwusr.obspm.fr/departement/demirm/xfig/japanese/i18n.html ).
According to Nakama's instructions, issue the following commands in an
appropriate working directory where R's source directory resides.
Please note they are for Unix-like OSes with an EUC environment.
The command COLOR(red){rm -f src/main/gram.c} is mandatory (COLOR(red){gram.y}
will be used instead).

//  gzip -d -c R-1.7.1.tgz | tar xvf -
//  cd R-1.7.1
//  patch -p1 < ~/R.l10n.20030708.patch
//  patch -p1 < ~/R.i18n.x11_mb.20030715.patch
//  patch -p1  < ~/R.i18n.xfig.20030725.patch
//  rm -f src/main/gram.c
//  X_CFLAGS="-DI18N_MB" MAIN_CFLAGS="-DL10N_JP" //R_BROWSER="/usr/bin/mozilla" ./configure

 gzip -d -c R-1.7.1.tgz | tar xvf -
 cd R-1.7.1
 patch -p1 < ~/R.l10n.20030808.patch
 patch -p1 < ~/R.i18n.x11_mb.20030715.patch       
 patch -p1 < ~/R.l10n.PSXFIG.20030808.patch
 patch -p1 < ~/R.l10n.YYYYMMDD.patch
 patch -p1 < ~/R.l10n.PSXFIG.YYYYMMDD.patch
 patch -p1 < ~/R.i18n.x11_mb.YYYYMMDD.patch       
 rm -f src/main/gram.c
 MAIN_CFLAGS="-DL10N_JP" R_BROWSER="/usr/bin/mozilla" ./configure
        
For SJIS, use the flag MAIN_CFLAGS="-DL10N_JP -DL10N_SJIS_JP" instead of
MAIN_CFLAGS="-DL10N_JP". (For SJIS, no check has been done yet.)

Now you should follow the R installation instructions for the rest.
Because i18n of xfig has only been done for Japan (ja_JP) and Korea (ko_KR) at present,
the third patch may be unnecessary.

Nakama made it possible to specify the fonts available to R flexibly via an
X resource file. The following is an example that uses the
free Japanese truetype fonts called COLOR(blue){kochi-mincho} and
COLOR(blue){kochi-gothic}. You should change them appropriately.
COLOR(blue){<R_HOME>} is the full path to R's home directory, which is the value
of the environment variable COLOR(blue){R_HOME} if it is already set.
COLOR(blue){<locale>} is your present locale, which is the value of the environment
variable COLOR(blue){LANG} if it is already (and correctly) set.

  <R_HOME>/etc/R_X11.<locale>

For example, it is COLOR(blue){/usr/lib/R/etc/R_X11.ja_JP.eucJP} on my Debian GNU/Linux.
The contents of this file may be as follows. You can list as many available fonts
(which X programs can use) as you like.

  *fontSet0: -kochi-kochi gothic-medium-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                  -kochi-kochi gothic-medium-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                  -kochi-kochi gothic-medium-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
  *fontSet1: -kochi-kochi gothic-bold-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                  -kochi-kochi gothic-bold-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                  -kochi-kochi gothic-bold-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
  *fontSet2: -kochi-kochi mincho-medium-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                  -kochi-kochi mincho-medium-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                  -kochi-kochi mincho-medium-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
  *fontSet3: -kochi-kochi mincho-bold-r-*-*-%d-*-*-*-*-*-iso8859-1, \
                  -kochi-kochi mincho-bold-r-*-*-%d-*-*-*-*-*-jisx0201.1976-0, \
                  -kochi-kochi mincho-bold-r-*-*-%d-*-*-*-*-*-jisx0208.1983-0
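
As a rough aid for adapting the file above to other fonts, the following Python sketch builds the resource file name and one font set entry in the same layout. The helper names x11_resource_path and font_set_entry are hypothetical, and how the patched R actually reads the file is not shown here; the sketch only reproduces the text format of the example.

```python
import os

# Character sets listed for each font set in the example above.
CHARSETS = ("iso8859-1", "jisx0201.1976-0", "jisx0208.1983-0")


def x11_resource_path(r_home=None, locale=None):
    """Build <R_HOME>/etc/R_X11.<locale>, falling back to the environment."""
    r_home = r_home or os.environ.get("R_HOME", "")
    locale = locale or os.environ.get("LANG", "")
    return f"{r_home}/etc/R_X11.{locale}"


def font_set_entry(index, family, weight, foundry="kochi"):
    """Format one *fontSetN resource with one font pattern per character set."""
    patterns = [
        f"-{foundry}-{family}-{weight}-r-*-*-%d-*-*-*-*-*-{charset}"
        for charset in CHARSETS
    ]
    # Continuation lines end with a backslash, as in the example file.
    return f"*fontSet{index}: " + ", \\\n    ".join(patterns)


print(x11_resource_path("/usr/lib/R", "ja_JP.eucJP"))
print(font_set_entry(0, "kochi gothic", "medium"))
```

For another language, the tuple CHARSETS and the font names would have to be replaced by the character sets and fonts that your X server actually provides.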

Nakama believes these recipes will also work for Chinese (including Taiwanese) and Korean (maybe even European languages) with appropriate changes, enabling
your R to display local fonts both on the console and on graphical devices
(with some restrictions). However, so far we have had no chance to test this.

Remarks:
- The postscript device cannot properly show strings containing both single- and multi-byte characters. Nakama commented that a full i18n of the postscript device will be extremely difficult. If you cannot get satisfactory postscript output, try the png device. You can convert the result into a postscript file afterwards, if necessary, using an appropriate tool such as ImageMagick.  It works fine at least for me.
- The pictex device is only L10N'ed.
-COLOR(blue){Korean R users} can get hints from [[Nakama's web page:http://www.nakama.ne.jp/memo/cran-R/i18n/ko_KR/]]. Please note that he cannot understand Korean.

//-COLOR(blue){Russian R users} can get hints from [[Nakama's web //page:http://www.nakama.ne.jp/memo/cran-R/i18n/ru_RU.KOI8-R/]]. 
-The following troubles were reported as side effects of the patches:
--The X fonts fixed-bold-r, fixed-medium, and fixed-bold-o for plotting cannot be used. Please add the corresponding fonts to the above resource file.
--Symbol fonts (the Adobe symbol fonts used in plotmath) cannot be used. The newest patch by Nakama can show these symbol fonts.

*(6) Binaries

Nakama and Okada kindly offer patched binaries of R on their web pages.
Since they are primarily for Japanese users, they may be of little interest to
R users abroad.

**(6-1) RPM binary (by E. Nakama)

RPM packages. These can refer to the X resource file above.
[[R-1.7.1-1vl12.nosrc.rpm:http://www.nakama.ne.jp/memo/cran-R/l10n/R-1.7.1-1vl12.nosrc.rpm]]
and [[R-1.7.1-1vl12.i386.rpm:http://www.nakama.ne.jp/memo/cran-R/l10n/R-1.7.1-1vl12.i386.rpm]]
[[R-1.7.1-1vl20.nosrc.rpm and R-1.7.1-1vl20.i386.rpm:http://r.nakama.ne.jp/R-1.7.1/binary/Vine/]]

**(6-2) Debian packages (woody and sid, by M. Okada)

[[Debian binaries:http://www.okada.jp.org/~mokada/R/]]. At present,
these binaries cannot refer to the X resource file above and have default fonts.

**(6-3) Japanese MS Windows binary (by M. Okada)

- Integrated patch [[R-1.7.1-windows-Japanese.patch:http://www.okada.jp.org/RWiki/index.php?plugin=attach&openfile=R-1.7.1-windows-Japanese.patch&refer=%5B%5BWindows%C8%C7%C6%FC%CB%DC%B8%EC%A5%B0%A5%E9%A5%D5%A5%D1%A5%C3%A5%C1%5D%5D]] for Japanese MS Windows including all presently available patches.
- Japanized R binary [[Rdll-1.7.1-jVar-jPIC-jPS-jGraph.lzh:http://www.okada.jp.org/~mokada/R/Rdll-1.7.1-jVar-jPIC-jPS-jGraph.lzh]] for Japanese MS Windows, including all presently available patches. It is compressed with the COLOR(blue){lha} program, a commonly used free archiver in Japan. You can get a Windows binary of lha from the Internet (search Google with the keywords lha or LHarc). After extracting this lzh file, you get the executable binary COLOR(red){R.dll}. Replace the original R.dll in the COLOR(blue){bin} directory of R's home directory on your PC with it.
- You should have already installed the official R-1.7.1. The version of this R.dll (that is, 1.7.1) should be the same as that of your preinstalled R.dll.
- If your Japanized R cannot display Japanese correctly, replace the file COLOR(red){etc\Rdevga} in R's home directory with [[Rdevga-JapaneseFont.txt:http://www.okada.jp.org/RWiki/index.php?plugin=attach&openfile=Rdevga-JapaneseFont.txt&refer=%5B%5BWindows%C8%C7%C6%FC%CB%DC%B8%EC%A5%B0%A5%E9%A5%D5%A5%D1%A5%C3%A5%C1%5D%5D]].

This binary is still at a testing stage and you should install it at your
own risk.

*(7) Present progress status summary by Nakama (for R-1.7.1)

|Subjects \ OS | *nix | MS Windows | Classic MacOS | MacOS X |
|Parser               |L10N 100%  |L10N 100%  |?            |?      |
|Regexp              |?          |?          |?            |?      |
|Graphical devices        |i18n 80%   |?          |?            |?      |
|Dataentry (Spreadsheet)     |?          |?                |?             |?             |
|POSTSCRIPT    |?           |?                |?            |?             |
|xfig          |i18n 100%(needs i18n'ed xfig)| --- | ---      | ---      |
|PDF          |?               |?               |?            |?      |


Remark: E. Nakama will suspend his voluntary work until the release of R 1.8.0.


*(8) Misc: R-KNOPPIX



KNOPPIX is a Linux distribution
on a single CD-ROM. Since all files are compressed, it actually contains
1.6GB of files, enough for an almost full Linux environment. It is based on
Debian GNU/Linux with the KDE desktop. The most remarkable feature of KNOPPIX
is that it is bootable. Also, it recognizes the hardware of your PC marvellously.
Since it does not reside on the HD, it leaves nothing behind after shutdown.
[[KNOPPIX:http://www.knopper.net/knoppix/]] is of German origin and has been
Japanized by [[A. Suzaki:http://unit.aist.go.jp/it/knoppix/]].

S. Tanimura (Nagasaki University, Japan) rebuilt COLOR(blue){KNOPPIX-jp}
(based on knoppix_20030606-20030625) to include a partially Japanized R.  It can be downloaded from
[[his web page:http://shakan2.tm.nagasaki-u.ac.jp/~umusus/R-KNOPPIX/]] or its
[[mirror:http://epidemiology.md.tsukuba.ac.jp/~mokada/R/]].
R-KNOPPIX lets you try R on home MS Windows PCs quite easily and safely.
