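New features and changes in R 1.9.0
A new version of R has been released. The changes appear to be fairly extensive, so I would like to translate the change summary posted to r-help into Japanese. As usual, volunteers are welcome to pitch in. It is fairly hard to translate, so this page is for people who find even a slightly inaccurate translation preferable to the English. (2004.4.12)
Changes specific to the MS Windows version of R have also been added from the r-help post. (2004.4.13)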
onUnload of packages/namespaces: see ?setHook.
"save.defaults", which is also used by save.image() if option "save.image.defaults" is not present.
strptime() did not validate dates correctly, so we have added extra code to do so. However, this cannot correct scanning errors in the OS's strptime (although we have been able to work around these on Windows). Some examples are now tested for during configuration.
split="" is optimized.
levels. The default is still to keep them.
component.
can be used to check that the variables supplied for prediction are of the same type as those used for fitting. (It is currently used by predict() methods for classes "lm", "mlm", "glm" and "ppr", as well as methods in packages MASS, rpart and tree.)
pointer protection stack to be set higher than the previous limit of 10000.
can be specified by a new argument `fonts' defaulting to the value of a new option "X11fonts".
installFoundDepends. These provide functionality for assessing dependencies and the availability of them (either locally or from on-line repositories).
installation and if available used to speed loading the package, so packages with namespaces should be reinstalled.
in the ... of graphics functions without a warning. It now works as expected in contour().
of R into other programs. As a result, R.app is now relocatable.
(efficiently).
plotted via stats:::plotNode called from plot.dendrogram). The diamond frames around edge labels are more nicely scaled horizontally.
default expressions for arguments. If these arguments are missing in the call, the defaults in the selected method will override a default in the generic. See ?setMethod.
--- Renamed push/pop.viewport() to push/popViewport().
--- Added upViewport(), downViewport(), and seekViewport() to allow creation and navigation of viewport tree (rather than just viewport stack).
--- Added id and id.lengths arguments to grid.polygon() to allow multiple polygons within single grid.polygon() call.
--- Added vpList(), vpStack(), vpTree(), and current.vpTree() to allow creation of viewport "bundles" that may be pushed at once (lists are pushed in parallel, stacks in series).
current.vpTree() returns the current viewport tree.
--- Added vpPath() to allow specification of viewport path in downViewport() and seekViewport().
See ?viewports for an example of its use.
NOTE: it is also possible to specify a path directly, e.g., something like "vp1::vp2", but this is only advised for interactive use (in case I decide to change the separator :: in later versions).
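The viewport tree navigation described above can be sketched as follows. This is a minimal illustration, assuming a current version of grid; the viewport names "A" and "B" and the null PDF device are chosen here only for the example.

```r
library(grid)

pdf(NULL)        # null PDF device, so nothing is written to disk
grid.newpage()

# Build a small tree: B is pushed inside A.
pushViewport(viewport(width = 0.5, height = 0.5, name = "A"))
pushViewport(viewport(width = 0.5, height = 0.5, name = "B"))
upViewport(2)    # navigate back up; A and B remain in the tree

# Navigate down by path, record where we are, then go back up.
depth <- downViewport(vpPath("A", "B"))
name_at_b <- current.viewport()$name
upViewport(depth)

# seekViewport() searches the whole tree for a name.
seekViewport("A")
name_at_a <- current.viewport()$name
upViewport(0)    # 0 navigates to the top of the tree

invisible(dev.off())
```

Unlike popViewport(), upViewport() leaves the viewports in place, which is what makes the later downViewport() and seekViewport() calls possible.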
--- Added "just" argument to grid.layout() to allow justification of layout relative to parent viewport *IF* the layout is not the same size as the viewport. There's an example in help(grid.layout).
--- Allowed the "vp" slot in a grob to be a viewport name or a vpPath. The interpretation of these new alternatives is to call downViewport() with the name or vpPath before drawing the grob and upViewport() the appropriate amount after drawing the grob. Here's an example of the possible usage:
    pushViewport(viewport(w=.5, h=.5, name="A"))
    grid.rect()
    pushViewport(viewport(w=.5, h=.5, name="B"))
    grid.rect(gp=gpar(col="grey"))
    upViewport(2)
    grid.rect(vp="A", gp=gpar(fill="red"))
    grid.rect(vp=vpPath("A", "B"), gp=gpar(fill="blue"))
--- Added engine.display.list() function. This allows the user to tell grid NOT to use the graphics engine display list and to handle ALL redraws using its own display list (including redraws after device resizes and copies).
This provides a way to avoid some of the problems with resizing a device when you have used grid.convert(), or the gridBase package, or even base functions such as legend().
There is a document discussing the use of display lists in grid on the grid web site (http://www.stat.auckland.ac.nz/~paul/grid/grid.html)
--- Changed the implementation of grob objects. They are no longer implemented as external references. They are now regular R objects which copy-by-value. This means that they can be saved/loaded like normal R objects. In order to retain some existing grob behaviour, the following changes were necessary:
+ grobs all now have a "name" slot. The grob name is used to uniquely identify a "drawn" grob (i.e., a grob on the display list).
+ grid.edit() and grid.pack() now take a grob name as the first argument instead of a grob. (Actually, they take a gPath - see below)
+ the "grobwidth" and "grobheight" units take either a grob OR a grob name (actually a gPath - see below). Only in the latter case will the unit be updated if the grob "pointed to" is modified.
In addition, the following features are now possible with grobs:
+ grobs now save()/load() like any normal R object.
+ many grid.*() functions now have a *Grob() counterpart. The grid.*() version is used for its side-effect of drawing something or modifying something which has been drawn; the *Grob() version is used for its return value, which is a grob. This makes it more convenient to just work with grob objects without producing any graphical output (by using the *Grob() functions).
+ there is a gTree object (derived from grob), which is a grob that can have children. A gTree also has a "childrenvp" slot which is a viewport which is pushed and then "up"ed before the children are drawn; this allows the children of a gTree to place themselves somewhere in the viewports specified in the childrenvp by having a vpPath in their vp slot.
+ there is a gPath object, which is essentially a concatenation of grob names. This is used to specify the child of (a child of ...) a gTree.
+ there is a new API for creating/accessing/modifying grob objects: grid.add(), grid.remove(), grid.edit(), grid.get() (and their *Grob() counterparts) can be used to add, remove, edit, or extract a grob or the child of a gTree. NOTE: the new grid.edit() API is incompatible with the previous version.
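As a rough sketch of the *Grob() side of this API, assuming a current version of grid: the grob names "r", "lab" and "parent" are invented for the example, and nothing is drawn.

```r
library(grid)

# Build grobs without drawing anything.
r   <- rectGrob(name = "r")
lab <- textGrob("label", name = "lab")

# A gTree is a grob that has children.
gt <- gTree(children = gList(r, lab), name = "parent")
child_names <- childNames(gt)

# Edit a child via a gPath; grobs copy by value, so gt is unchanged.
edited  <- editGrob(gt, gPath("r"), gp = gpar(col = "red"))
new_col <- getGrob(edited, "r")$gp$col
old_col <- getGrob(gt, "r")$gp$col
```

Because grobs are ordinary R objects, editGrob() returns a modified copy, whereas grid.edit() would modify a grob that has already been drawn.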
--- Added stringWidth(), stringHeight(), grobWidth(), and grobHeight() convenience functions (they produce "strwidth", "strheight", "grobwidth", and "grobheight" unit objects, respectively).
--- Allowed viewports to turn off clipping altogether. Possible settings for viewport clip arg are now:
    "on" = clip to the viewport (was TRUE)
    "inherit" = clip to whatever parent says (was FALSE)
    "off" = turn off clipping
Still accept logical values (and NA maps to "off")
(uniform & normal) and set.seed(1). example(*, setRNG = TRUE) does the same.
NULL' which produces a warning whenever the default values of function arguments differ between documentation and code. Note that this affects "R CMD check" as well.
check) now restores the search path after every help file.
in the loaded namespaces/packages listed in the Depends fields of the package's DESCRIPTION file when testing an installed package.
field for packages that are only used in examples or vignettes.
'Suggests' levels of dependencies.
vignetteDepends.
added.
well as CVS directories.
is a function pointer that provides access to character strings (such as the names vector) rather than assuming these are passed in.
and there is a new equivalent subroutine rchkusr for calling from FORTRAN code.
and S_realloc, since current S versions use these forms.
int, to allow for a future change.
R_DT_val() and R_DT_Cval(), a new R_D_LExp() and improved R_DT_log() and R_DT_Clog(); this improves accuracy in several [dpq]-functions {for "extreme" arguments}.
recognized as synonyms for --min-nsize and --min-vsize (which replaced them in 1.2.0).
removed: they were each identical to the default method.
equivalent to psigamma(, deriv=2) and psigamma(, deriv=3).
deprecated in 1.2.0.
--without-bzlib --without-pcre.
PCRE >= 4.0 is now required if --with-pcre is used.
The included zlib sources have been updated to 1.2.1, and this is now required if --with-zlib is used.
capabilities' as all builds of R have had them since 1.7.0.
o When lm.{w}fit() disregarded arguments in ... they reported the values and not the names.
o lm(singular.ok = FALSE) was looking for 0 rank, not rank < p.
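A small sketch of the rank < p case, using made-up data in which x2 is an exact multiple of x1. The variable names are hypothetical.

```r
# x2 is perfectly collinear with x1, so the design matrix has rank < p.
d <- data.frame(y = c(1, 2, 3, 5), x1 = 1:4)
d$x2 <- 2 * d$x1

# With singular.ok = FALSE the rank-deficient fit is rejected.
strict <- tryCatch(
  lm(y ~ x1 + x2, data = d, singular.ok = FALSE),
  error = function(e) "error"
)

# With the default singular.ok = TRUE the aliased coefficient is NA.
fit <- lm(y ~ x1 + x2, data = d)
aliased <- is.na(coef(fit)[["x2"]])
```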
o The substitution code for strptime in the sources no longer follows glibc in silently `correcting' invalid inputs.
In 1.8.1:
> x <- 1:5
> y <- c(1,3,2,NA,5)
> cor(x, y, method="kendall")
[1] 0.6    (bug)
> cor(c(1,2,3,5), c(1,3,2,5), method="kendall")
[1] 0.6666667    (correct value)
In 1.9.0 this becomes:
> x <- 1:5
> y <- c(1,3,2,NA,5)
> cor(x, y, method="kendall")
Error in cor(x, y, method = "kendall") : missing observations in cov/cor
> cor(x, y, method="kendall", use="complete.obs")
[1] 0.6666667
> choose(10,-1)
[1] 0
> choose(10,20)
[1] 0
o find(simple.words=TRUE) (the default) was still using regular expressions for e.g. "+" and "*". Also, it checked the mode only of the first object matching a regular expression found in a package.
o Memory leaks in [dpq]wilcox and [dqr]signrank have been plugged. These only occurred when multiple values of m or n > 50 were used in a single call. (PR#5314, plus another potential leak.)
o Non-finite input values to eigen(), La.eigen(), svd() and La.svd() are now errors: they often caused infinite looping. (PR#5406, PR#4366, PR#3723: the fix for 3723/4366 returned a vector of NAs, not a matrix, for the eigenvectors.)
o stepfun(x,y) now gives an error when `x' has length 0 instead of an invalid result (that could lead to a segmentation fault).
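A minimal check of this behaviour, assuming a current R version. The knots and heights below are arbitrary.

```r
# A zero-length 'x' is rejected with an error.
empty <- tryCatch(stepfun(numeric(0), 1), error = function(e) "error")

# A valid step function: y has one more element than x.
sf <- stepfun(1:2, c(0, 1, 2))
mid_value <- sf(1.5)   # value on the interval between the two knots
```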
o buildVignettes() uses file.remove() instead of unlink() to remove temporary files.
o methods(class = "lqs") does not produce extraneous entries anymore.
o Directly calling a method that uses NextMethod() no longer produces the erroneous error message 'function is not a closure'.
o cutree(a, h=h) silently gave wrong results when 'a' was an agnes object; now gives an error and reminds of as.hclust().
o postscript() could crash if given a font value outside the valid range 1...5.
o qchisq(1-e, .., ncp=.) did not terminate for small e. (PR#6421 (PR#875))
o contrasts() turns a logical variable into a factor. This now always has levels c("FALSE", "TRUE") even if only one (or none) of these occur in the variable.
o model.frame()'s lm and glm methods had 'data' and 'na.action' arguments which they ignored and have been removed.
o The defaults data=list() in lm() and glm() could never be used and have been removed. glm had na.action=na.fail, again never used.
o tools:::.getInternalS3generics() was omitting all the members of the S3 group generics, which also accept methods for members.
o Some BLASes were returning NA %*% 0 as 0 and some as NA. Now slower but more careful code is used if NAs are present. (PR#4582)
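The case in question can be reproduced with 1 x 1 matrices. This is only a sketch; the result assumes the default matrix-product settings of the R build in use.

```r
a <- matrix(NA_real_, 1, 1)
b <- matrix(0, 1, 1)

# With NA-aware code the product stays missing instead of becoming 0.
prod_na <- a %*% b
```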
o package.skeleton() no longer generates invalid filenames for code and help files. Also, care is taken not to generate filenames that differ only by case.
o pairs() now respects axis graphical parameters such as cex.main, font.main and las.
o Saving images of packages with namespaces (such as mle) was not compressing the image.
o When formula.default() returned a terms object, it returned a result of class c("terms", "formula") with different subsetting rules from an object of class "formula".
o The standalone Rmath library did not build correctly on systems with inaccurate log1p.
o Specifying asp is now respected in calls like plot(1, 10, asp=1) with zero range on both axes.
o outer() called rep() with an argument the generic does not have, and discarded the class of the answer.
o object.size() now returns a real (not integer) answer and so can cope with objects occupying more than 2Gb.
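A quick way to see the storage type of the result (a sketch; the vector length is arbitrary):

```r
sz <- object.size(numeric(1000))

# The size is stored as a double, so it is not limited to the integer range.
storage <- typeof(sz)
```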
o Lookups base:: and ::: were not confining their search to the named package/namespace.
o qbinom() was returning NaN for prob = 0 or 1 or size = 0 even though the result is well-defined. (In part, PR#5900.)
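The well-defined edge cases can be checked directly. The probability 0.5 and size 10 are arbitrary choices for illustration.

```r
q_prob0 <- qbinom(0.5, size = 10, prob = 0)   # all mass at 0
q_prob1 <- qbinom(0.5, size = 10, prob = 1)   # all mass at size
q_size0 <- qbinom(0.5, size = 0,  prob = 0.5) # only possible value is 0
```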
o par(mgp)[2] was being interpreted as relative to par(mgp)[3]. (PR#6045)
o Versioned install was broken both with and without namespaces: no R code was loaded.
o methods(), getS3method() and the registration of S3 methods in namespaces were broken if the S3 generic was converted into an S4 generic by setting an S4 method.
o Title and copyright holder of the reference manual are now in sync with the citation() command.
o The validation code for POSIXlt dates and hence seq(, by="DSTdays") now works for large mday values (not just those in -1000...1000). (PR#6212)
o The print() method for data frames now copes with data frames containing arrays (other than matrices).
o texi2dvi() and buildVignettes() use clean=FALSE as default because the option is not supported on some Solaris machines. For buildVignettes() this makes no difference as it uses an internal cleanup mechanism.
o The biplot() method for "prcomp" was not registered nor exported. (PR#6425)
o Latex conversion of .Rd files was missing newline before \end{Section} etc which occasionally gave problems, as fixed for some other \end{Foo} in 1.8.1. (PR#5645)
o Work around a glibc bug to make the %Z format usable in strftime().
o The glm method for rstandard() was wrongly scaled for cases where summary(model)$dispersion != 1.
o Calling princomp() with a covariance matrix (rather than a list) failed to predict scores rather than predict NA as intended. (PR#6452)
o termplot() is more tolerant of variables not in the data= argument. (PR#6327)
o isoreg() could segfault on monotone input sequences. (PR#6494)
o Rdconv detected \link{\url{}} only very slowly. (PR#6496)
o aov() with Error() term and no intercept incorrectly assigned terms to strata. (PR#6510)
o ftable() incorrectly handled arguments named "x". (PR#6541)
o vector(), matrix(), array() and their internal equivalents report correctly that the number of elements specified was too large (rather than reporting it as negative).
o Minor copy-paste error in example(names). (PR#6594)
o length<-() now works correctly on factors (and is now generic with a method for factors).
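A short sketch of extending a factor with the length replacement function (the levels are arbitrary):

```r
f <- factor(c("a", "b", "a"))

# Extending the factor pads it with NA and keeps class and levels.
length(f) <- 5
```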
o x <- 2^32; x:(x+3) no longer generates an error (but gives a result of type "double").
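For a sequence that starts beyond the integer range, the colon operator falls back to doubles. A minimal check:

```r
x <- 2^32          # larger than .Machine$integer.max
s <- x:(x + 3)

seq_type <- typeof(s)
```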
o pgamma(30, 100, lower=FALSE, log=TRUE) is not quite 0, now. pgamma(x, alph) now only uses a normal approximation for alph > 1e5 instead of alph > 1000. This also improves the accuracy of ppois().
o qgamma() now does one or more final Newton steps, increasing accuracy from around 2e-8 to 3e-16 in some cases. (PR#2214). It allows values p close to 1 not returning Inf, with accuracy for 'lower=FALSE', and values close to 0 not returning 0 for 'log=TRUE'. These also apply to qchisq(), e.g., qchisq(1e-13, 4, lower=FALSE) is now finite and qchisq(1e-101, 1) is positive.
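The two qchisq() cases quoted above can be checked as follows. This sketch spells out lower.tail in full rather than relying on partial matching of 'lower', and makes no claim about the exact values.

```r
q_upper <- qchisq(1e-13, df = 4, lower.tail = FALSE)
q_tiny  <- qchisq(1e-101, df = 1)
```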
o gamma(-n) now gives NaN for all negative integers -n.
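A quick check over the first few negative integers (a sketch; gamma() also issues a warning here, which is suppressed):

```r
g <- suppressWarnings(gamma(-(1:3)))
```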
o The Unix version of browseURL() now protects the URL from the shell, for example allowing & and $ to occur in the URL.
It was incorrectly attempting to use -remote "openURL()" for unknown browsers.
o extractAIC.coxph() works around an inconsistency in the $loglik output from coxph. (PR#6646)
o stem() was running into integer overflows with nearly-constant inputs, and scaling badly for constant ones. (Partly PR#6645)
o system() under Unix was losing the 8095th char if the output was split. (PR#6624)
o plot.lm() gave incorrect results if there were zero weights. (PR#6640)
o Binary operators warned for inconsistent lengths on vector op vector operations, but not on vector op matrix ones. (PR#6633 and more.)
Comparison operators did not warn about inconsistent lengths for real vectors, but did for integer, logical and character vectors.
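The length warnings can be observed with a small helper. The helper name warned() is made up for this sketch.

```r
# Returns TRUE if evaluating 'expr' raises a warning, FALSE otherwise.
warned <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

arith_warns <- warned(c(1, 2, 3) + c(1, 2))           # vector op vector
cmp_warns   <- warned(c(1, 2, 3) == c(1, 2))          # comparison on reals
mat_warns   <- warned(1:3 + matrix(1:4, 2, 2))        # vector op matrix
```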
o spec.pgram(x, ..., pad, fast, ...) computed the periodogram with a bias (downward) whenever 'pad > 0' (non-default) or 'fast = TRUE' (default) and nextn(n) > n where n = length(x); similarly for 'df' (approximate degrees of freedom for chisq).
o dgamma(0, a) now gives Inf for a < 1 (instead of NaN), and so does dchisq(0, 2*a, ncp).
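A minimal illustration with shape a = 0.5, chosen arbitrarily; the noncentrality value is likewise only an example.

```r
a <- 0.5

d_gamma   <- dgamma(0, shape = a)          # density is unbounded at 0 for a < 1
d_chisq   <- dchisq(0, df = 2 * a)         # central chi-squared, df < 2
d_nchisq  <- dchisq(0, df = 2 * a, ncp = 1)
```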
o pcauchy() is now correct in the extreme tails.
o file.copy() did not check that any existing `from' file had been truncated before appending the new contents.
o The QC files now check that their file operations succeeded.
o replicate() worked by making the supplied expression the body of an anonymous function(x), leading to a variable capture issue. Now, function(...) is used instead.
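The capture issue arises when the replicated expression itself uses a variable named x. A sketch of the case that now works as expected:

```r
x <- 5

# The expression refers to the caller's x, not to an argument of an
# internally generated function.
vals <- replicate(3, x + 1)
```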
o chisq.test(simulate.p.value = TRUE) was returning slightly incorrect p values, notably p = 0 when the data gave the most extreme value.
o terms.formula(simplify = TRUE) was losing offset terms. Multiple offset terms were not being removed correctly if two of them appeared first or last in the formula. (PR#6656)
o Rd conversion to latex did not add a new line before \end{Section} in more cases than were corrected in 1.8.1.
o split.default() dropped NA levels in its internal code but returned them as NA in all components in the interpreted code for factors. (PR#6672)
o points.formula() had problems if there was a subset argument and no data argument. (PR#6652)
o as.dist() does a bit more checking of its first argument and now warns when applied to non-square matrices.
o mle() gives a more understandable error message when its 'start' argument is not ok.
o All uses of dir.create() check the return value. download.packages() checks that destdir exists and is a directory.
o Methods dispatch corrects an error that failed to find methods for classes that extend sealed classes (class unions that contain basic classes, e.g.).
o Sweave no longer wraps the output of code chunks with echo=false and results=tex in Schunk environments.
o termplot() handles models with missing data better, especially with na.action=na.exclude.
o 1:2 * 1e-100 now prints with correct number of spaces.
o Negative subscripts that were out of range or NA were not handled correctly. Mixing negative and NA subscripts is now caught as an error: it was not caught on some platforms and segfaulted on others.
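A sketch of both situations, assuming a current R version:

```r
v <- 1:5

# Mixing a negative subscript with NA is an error.
mixed <- tryCatch(v[c(-1, NA)], error = function(e) "error")

# An out-of-range negative subscript simply excludes nothing.
out_of_range <- v[-10]
```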
o gzfile() connections had trouble at EOF when used on uncompressed file.
o The Unix version of dataentry segfaulted if the `Copy' button was used. (PR#6605)
o unlist on lists containing expressions now works (PR#5628)
o D(), deriv() and deriv3() now also can deal with gamma and lgamma.
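A minimal check of the symbolic derivatives, evaluated at x = 2 (an arbitrary point) and compared against digamma():

```r
d_lgamma <- D(quote(lgamma(x)), "x")
d_gamma  <- D(quote(gamma(x)), "x")

val_lgamma <- eval(d_lgamma, list(x = 2))
val_gamma  <- eval(d_gamma,  list(x = 2))
```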
o The X11 module can now be built against XFree86 4.4.0 headers (still with some warnings).
o seq.POSIXt(from, to, by="DSTdays") was shorter than expected for rare times in the UK time zone. (PR#4558)
o c/rbind() did not support vectors/matrices of mode "list". (PR#6702)
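Binding list-mode vectors produces a matrix of mode "list". A short sketch with made-up contents:

```r
m <- rbind(list(1, "a"), list(2, "b"))

m_mode <- mode(m)
m_dim  <- dim(m)
cell   <- m[[2, 2]]   # each cell holds one list element
```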
o summary() methods for POSIX[cl]t and Date classes coerced the number of NAs to a date on printing.
o KalmanSmooth would sometimes return NA values with NA inputs. (PR#6738)
o fligner.test() worked correctly only if data were already sorted by group levels. (PR#6739)
windows() has been fixed so that new windows fit within the MDI client area.
Functions winMenuNames() and winMenuItems() have been added to inspect user menus.
Items for www.r-project.org and CRAN have been added to the Help menu. (Wishlist PR#6492)
Added "R" command to be similar to Unix invocation of scripts, e.g. "R CMD INSTALL" is the same as "Rcmd INSTALL". Rcmd still exists for backwards compatibility (and to avoid conflicts over the name `R'). All of R, R CMD and Rcmd now accept --help.
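Rcmd Rd2dvi can now be used instead of Rcmd Rd2dvi.sh.
Added "Paste commands only" to the Edit menu and popup menu of the Rgui console. When a block of output is copied, only the commands are pasted into the console for re-execution. (Code contributed by Tony Plate.)
Parallel make (e.g. make -j2) can now be used, but it is probably only useful on dual-processor (or possibly hyperthreaded) machines with at least 384MB of memory.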
Installing now sorts in the C locale to ensure that a consistent sort order is used. (Some aspects of sorting used to be done in the locale of the host machine, but Perl and the cygwin-based tools used the ASCII collation order.)
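Support for building Windows .hlp files, which had long been untested, has been dropped.
Support has been added for the fast BLAS by K. Goto. On a 2.6GHz Pentium 4 with 1GB RAM, the following results were obtained for a 1000 x 1000 matrix A.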
             R BLAS   ATLAS   Goto
A %*% A         3.7    0.65   0.56
svd(A)         16.2    7.77   6.83
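Note that fast BLASes are much less efficient on the small matrices that occur frequently in statistical work.
Fast assembler code is now used for the exponential function.
Cross-compiling R itself works again. (It had been broken since 1.8.0.)
R CMD INSTALL/build/check now convert path names containing spaces to MS-DOS 8.3 format.
R CMD INSTALL now supports version-aware installation via --with-package-versions.
When installing (binary) package bundles, MD5 sums are now checked and reported, as for packages.
"* DONE" has been added to the end of the INSTALL log, so the --install option to CHECK now works. (This is a repository maintainer option; see src/scripts/check.in for docs).
The fast bmp/png/jpeg code introduced in R 1.8.0 is now also used on 256-colour displays (as we have now been able to test it on such).
R's internal malloc and related functions have been renamed to Rm_malloc etc., and are now used only for memory allocation for R objects, the Wilcoxon tests and a few other memory-sensitive operations.
Thanks to the improved malloc routines by Doug Lea (at the suggestion of David Teller), large memory areas (in particular more than 2GB, if the OS supports it) can now be handled efficiently. The initially requested memory is no longer reserved, but this is not a problem because this malloc routine can handle non-contiguous memory areas.
The installer now uses LZMA compression, which requires Inno Setup version 4.1.5 or later.
Binary builds now use libpng version 1.2.5.
list.files() has been fixed to handle paths such as "C:" correctly.
unlink() has been fixed to accept an empty list of files, for consistency with Unix.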
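On a current R version the empty case can be checked on any platform. This sketch only confirms that the call succeeds and reports a zero (success) status.

```r
# An empty character vector means there is nothing to delete.
status <- unlink(character(0))
```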
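Fixed the handling of whitespace when processing the DESCRIPTION file in Rd2dvi.sh.
Fixed a syntax error in the processing of the command-line argument "--max-mem-size".
In RGui, ctrl-T could not transpose the first and second characters of a line. (PR#5593)
Fixed garbage characters at the start of pipe(). (PR#5053)
R CMD SHLIB was computing dependencies for all C files, not just those specified on the command line, and was building the DLL from all *.o files in the directory.
The size of metafiles in pixels was often one larger than requested, so the background was sometimes not completely filled. It is now overfilled.
Rproxy.dll could crash when handling large data.
Workaround for Microsoft's odd policy that dates before 1970-01-01 00:00:00 (UTC) are invalid (as.POSIXct("1970-01-01 00:00:00") was giving an error in regions whose clocks are ahead of UTC).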
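To see why such regions are affected, consider a time zone ahead of UTC. In this sketch "Asia/Tokyo" is used as an assumed example zone (UTC+9); local midnight on 1970-01-01 then falls before the epoch.

```r
# Local midnight in a UTC+9 zone is 9 hours before 1970-01-01 00:00:00 UTC.
t_local <- as.POSIXct("1970-01-01 00:00:00", tz = "Asia/Tokyo")
secs    <- as.numeric(t_local)   # seconds since the epoch (negative here)
```

A negative count of seconds is exactly the kind of value the Windows runtime treated as invalid.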
Avoid a possible segmentation fault when browseURL() is passed a URL of 264 or more characters (though it cannot yet be said to be definitely fixed).