COLOR(red){SIZE(25){Paul Johnson 氏の R Tips 集訳 2/2}}

前半 [[P_Johnson_tips_1]]へ移動~

2003/08/09  現在~

注:これは P. Johnson さんが r-help の記事から役にたちそうな箇所を抜き出したものです。オリジナルは Tips 集参照。様々な話題があり、参考になります。とりあえず各見出しだけでも訳して見当を付けやすくしたいと考えています。例によりお手伝いお願いします。お勧めの記事があればその訳も付けたいですね。整形だけでもお願いします。(なお元そのものが r-help 記事の抄約ですから、著作権問題はクリアされていると考えますが?)~
注:あまりに長くて読み込みに時間がかかるので、二つに分割しました。
----

#contents
~
//////////////////////////////////////
*6 通常の統計的雑用 [#yf1eac7e]
//////////////////////////////////////

**6.1 表集計 [#o14131c4]
(12/02/2002 Paul Johnson <pauljohn@ku.edu>)

クロス集計機能 xtabs() は R1.2 で取り入れられた。これについてはヘルプですべてがみられる。また、 table というものもある。
//A crosstab facility xtabs() was introduced in R1.2. Read all about it in help. table is there too.

これらの関数ではカウントは得られるが、パーセンテージは得られない。パーセンテージを得るには特別な関数がある。以下にテーブルの作成と "prop.table" を呼び出して、列についてのパーセンテージを得るコードを示す。
//These give counts, not percentages. There are special functions for that. Here's code that creates a table and then calls "prop.table" to get percents on the columns:
 > x <- c(1,3,1,3,1,3,1,3,4,4) 
 > y <- c(2,4,1,4,2,4,1,4,2,4) 
 > hmm <- table(x,y)
 > prop.table(hmm,2) * 100 

列の合計が欲しければ、それ用に "margin.table()" 関数があるが、以下のように手動で合計するのとまったく同じことである。
//If you want to sum the column, there is a function "margin.table()" for that, but it is just the same as doing a sum manually, as in:

 apply(hmm, 2, sum)

これで列の合計が得られる。
//which gives the counts of the columns.
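訳注: margin.table() と apply() が同じ列合計を返すことを確かめる小さな例(データは 6.1 冒頭のものの再掲)。

```r
x <- c(1,3,1,3,1,3,1,3,4,4)
y <- c(2,4,1,4,2,4,1,4,2,4)
hmm <- table(x, y)
margin.table(hmm, 2)   # 列ごとの合計
apply(hmm, 2, sum)     # 手動の合計:結果は同じ
```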
古い Rを持っていれば、以下の "do it yourself" クロス集計がある。
//If you have an old R, here is a "do it yourself" crosstab

# 訳注:数値列の後の空行は必須
 count <- scan()
 65 130 67 54 76 48 100 111 62 34 141 130 47 116 105 100 191 104

 s <- c("Lo", "Med", "Hi")
 satisfaction <- factor( rep(c(s,s), rep(3,6)),  levels = c("Lo","Med","Hi"))
 contact <- factor( rep(c("Low","High"), rep(9,2)), levels = c("Low","High"))
 r <- c("Tower", "Flat", "House")
 residence <- factor(rep(r,6))
 tapply(count,list(satisfaction,contact,residence),sum)


**6.2 t 検定 [#j715558f]
(18/07/2001 Paul Johnson <pauljohn@ukans.edu>)

以下のようにグループごとにベクトルにインデックスを付ける。
//Index the vector by group:

 t.test(x[sex == 1], x[sex == 2])

これで効果覿面のはずである。次に以下を試してみる。
//should do the trick. You could try:

 summary(aov(x ~ sex))

これは上と同じものであるが、分散が等しいことを仮定している。(Mark Myatt より)

//which is equivalent but assumes equal variances. (from Mark Myatt)

検出力は以下のように計算できる。
//The power of the test can be calculated:

 power.t.test()

help(power.t.test) を参照。
//see help(power.t.test)
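訳注: 引数の値は仮のもの。各群 n = 20、平均差 delta = 1、標準偏差 sd = 2 の二標本 t 検定の検出力を求める例。

```r
# n を与えて検出力を求める
pw <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05)
pw
# 逆に,検出力 0.8 を得るのに必要な各群の n を求めることもできる
power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.8)
```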

**6.3 正規性の検定 [#s45f07a9]
(31/12/2001 Paul Johnson <pauljohn@ukans.edu>)

x が正規分布からのものかどうかの検定には shapiro.test(x) を使う。X はベクトルまたは変数でなくてはならない。
//Use shapiro.test(x) for testing whether x comes from a normal distribution. X must be a vector or variable.
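訳注: 乱数を使った動作確認の例。正規乱数なら p 値は大きく、正規性は棄却されないはずである。

```r
set.seed(1)
x <- rnorm(50)     # 正規分布からの標本
shapiro.test(x)    # p 値は大きいはず
```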

**6.4 データの一部分の解析(サブグループの要約もしくはモデル化) [#ff5fcc43]
(06/04/2001 ? <?>)

質問: サブグループごとの要約情報が欲しい。
//Question: I want summary information by subgroup.

答え: tapply, または tapply の「ラッパー」(簡略化された拡張という意味)である by() 関数でできる。
//Answer: This can be done with tapply, or the by() function, which is a "wrapper" (that means "simplifying enhancement") of tapply.

このように使う。factor A で定義されたグループがあるなら、データフレーム "fred" の変数 x から変数 z までの要約が得られる。
//Use like so. If you have groups defined by a factor A, to get summaries of variables x through z in dataframe "fred",

 > attach(fred)
 > by(fred[,x:z], A, summary)

グループの多次元セットに対して factor のリストを作成することができ、多くの異なる関数が "summary" の位置に入れられる。"by"のドキュメンテーションを参照のこと。
//You can make a list of factors for a multidimensional set of groups, and many different functions can be put in place of "summary". See the by documentation.
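訳注: 上の fred, A, x:z は仮の名前なのでそのままでは動かない。組み込みデータ iris を使った具体例を示す。

```r
data(iris)
# Species ごとに最初の 4 変数を要約
by(iris[, 1:4], iris$Species, summary)
# 1 変数だけなら tapply でも同じことができる
tapply(iris$Sepal.Length, iris$Species, mean)
```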

by 関数ができる以前は以下のようにしていた。なんらかの理由でこちらが必要になることもあるだろう。
//Before by, we did it like this, and you may need it for some reason.

3 つの factor の値の組み合わせによるサブグループごとに,変数 "rt" の平均値を求めるためには,
//Get the average of a variable "rt" for each subgroup created by combinations of values for 3 factors.

 tapply(rt,list(zf, xf, yf), mean, na.rm=T)

代替法としては,
//alternatively,

 ave(rt, zf, xf, yf, FUN=function(x)mean(x,na.rm=T))

(from Peter Dalgaard)


/////////////////////////////////////////////////////////
*7 モデルの当てはめ (回帰タイプの事柄) [#g0e0e271]
/////////////////////////////////////////////////////////

**7.1 回帰モデルの指定の Tips [#y2e9a424]
(12/02/2002 Paul Johnson <pauljohn@ku.edu>)

1. 以下よりも
//1. Rather than

 lm(dataframe$varY~dataframe$varX)
これを実行するように
//, do

 lm(varY~varX,data=dataframe)

2. タイムラグ変数を使うモデルの場合には,普通のベクトル変数に対する組込のラグ関数はないということを覚えておこう。ts パッケージの lag 関数は期待するのとは違うことをする。そういうわけで,以下を検討しよう。
//2. To fit a model with a "lagged" variable, note thereis no built in lag function for ordinary vectors. lag in the ts packagedoes a different thing. So consider:

 # x を d だけ後ろにずらす
 # x の先頭に d 個の NA を付加し,最後の d 個を取り除く
 mylag <- function(x, d = 1) {
    n <- length(x)
    c(rep(NA,d),x)[1:n]
 }
 lm(y ~ mylag(x))

**7.2 要約メソッド。出力オブジェクトから結果を取り出す [#vc2c8ab2]
(17/02/2002 Paul Johnson <pauljohn@ku.edu>)

Brian Ripley: 慣例により、summary メソッドは summary.foo クラスのオブジェクトを新規に作成し、それが print.summary.foo メソッドによって自動的にプリントされる。
//Brian Ripley: By convention summary methods create a new object of class print.summary.foo which is then auto-printed.

基本的に、結果になにがあるのかを探し、それをつかみ出す。あてはめられたモデルが "fm3DNase1"だとすれば:
//Basically, you find out what's in a result, then grab it. Suppose a fitted model is "fm3DNase1":

 > names(summary(fm3DNase1))
 [1] "formula"      "residuals"    "sigma"        "df"           "cov.unscaled"
 [6] "correlation"  "parameters"  
 > summary(fm3DNase1)$sigma
 [1] 0.01919449
 > summary(fm3DNase1)$parameters
 Estimate Std. Error t value Pr(>|t|)
 Asym 2.345179 0.07815378 30.00724 2.164935e-13
 xmid 1.483089 0.08135307 18.23027 1.218523e-10
 scal 1.041454 0.03227078 32.27237 8.504308e-14

標準誤差については以下のようになる
// for the standard errors:

 > summary(fm3DNase1)$parameters[,2]
 Asym xmid scal
 0.07815378 0.08135307 0.03227078

(from Rashid Nassar)

t 値は
//The t-values are

 summary(fm3DNase1)$coefficients[, "t value"]

そして標準誤差は
//and the standard errors are

 summary(fm3DNase1)$coefficients[, "Std. Error"]

また,推定値の共分散行列を得るためには,以下のようにする。
//You can also use

 vcov(fm3DNase1)

//to get the estimated covaraince matrix of the estimates.

**7.3 特定の因子の各水準毎に係数を計算 [#wd0f4128]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

grp がファクターとすると,
//grp is a factor. THen:

 > lm(y ~ grp:x, data=foo)
 Call:
 lm(formula = y ~ grp:x, data = foo)
 Coefficients:
 (Intercept)   grpa.x   grpb.x   grpc.x
    -0.80977  0.01847  0.13124  0.14259   

Martyn

**7.4 回帰モデルの当てはめの比較(係数の一部がゼロかどうかの F 検定) [#f093e3d6]
(14/08/2000 Paul E Johnson <pauljohn@ukans.edu>)

  summary(lm.sml <- lm(y ~ x1))
  summary(lm.big <- lm(y ~ x1 + x2 + x3 + x4))
  anova(lm.sml, lm.big)

さて、 b2 = b3 = b4 = 0 のテストをしよう。
//now will do a test of b2 = b3 = b4 = 0

以下を見てみよう
//Look at

 demo(lm.glm)

ここでは上のやり方でモデル "l1" と "l0" とが比較されている。(from Martin Maechler)
//where models "l1" and "l0" are compared that way. (from Martin Maechler)

Here's another one along the same lines. Bill Venables (Feb.2,00) said the philosophy is "You fit a larger model that contains the given model as a special case and test one within the other." Suppose you have only one predictor with repeated values, say x, and you are testing a simple linear regression model. Then you can do the test using

 inner.mod <- lm(y ~ x, dat)
 outer.mod <- lm(y ~ factor(x), dat)
 anova(inner.mod, outer.mod)

That does a test for lack of fit. If you have several predictors defining the repeated combinations, all you need do is paste them together, for example, and make a factor from that. (from Bill Venables)

Suppose your data frame x has some column, say, ID, which identifies the various cases, and you fitted

 fit1 <- lm(y ~ rhs, data=df)

ここで以下を実行する
//Now do

 fit2 <- lm(y ~ factor(ID), data=df)
 anova(fit1, fit2, test="F")

例を挙げると
//e.g.

 set.seed(123)
 df <- data.frame(x = rnorm(10), ID=1:10)[rep(1:10, 1+rpois(10, 3)), ]
 df$y <- 3*df$x+rnorm(nrow(df))
 fit1 <- lm(y ~ x, data=df)
 fit2 <- lm(y ~ factor(ID), data=df)
 anova(fit1, fit2, test="F")

CRAN にある lmtest というパッケージは線形モデルに関するたくさんの検定をサポートする(Brian Ripley より)

//there is an R package lmtest on CRAN, which is full of tests for linear models. (from Brian Ripley)

The models with factor() as the indep variable are the most general because they individually fit each category, whereas using the variable X assumes linearity (from me!)

Here's another example. "I want to know how much variance of pre/post-changes in relationship satisfaction can be explained by changes in iv1 and iv2 (dv ~ iv1 + iv2), by changes in iv3 and iv4 (dv ~ iv3 + iv4) and I want to know wether a model including all of the iv's explains a significant amount of variance over these two models (dv ~ iv1 + iv2 + iv3 + iv4)."

 model.0  <- lm(dv ~ 1)
 model.A  <- lm(dv ~ iv1 + iv2)
 model.B  <- lm(dv ~ iv3 + iv4)
 model.AB <- lm(dv ~ iv1 + iv2 + iv3 + iv4)
 anova(model.AB, model.A, model.0) 
 anova(model.AB, model.B, model.0)

(from Ragnar Beer)

**7.5 predict() を用いてモデルから予測値を得る [#y2762494]
(31/12/2001 Paul Johnson <pauljohn@ukans.edu>)

 > lot <- c(30,20,60,80,40,50,60,30,70,60)
 > hours <- c(73,50,128,170,87,108,135,69,148,132)
 > z1 <- lm(hours~lot)
 > new <- data.frame(lot=80)
 > predict(z1,new,interval="confidence",level=0.90)
      fit      lwr      upr
 [1,] 170 166.9245 173.0755

注意してもらいたいのは、このコマンドの "confidence" と "prediction" とには違いがあるということである。一方が推定値の信頼区間であり、他方は個々のケースの予測の信頼区間である。私はこう考えている!
//Please note, there is a difference between "confidence" and "prediction" in this command. One is the confidence interval of the point estimate, the other is the confidence interval for a prediction about an individual case. I "think"!
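訳注: 同じデータで二つの区間を並べて違いを確かめる例。個々のケースに対する "prediction" 区間の方が必ず広くなる。

```r
lot <- c(30,20,60,80,40,50,60,30,70,60)
hours <- c(73,50,128,170,87,108,135,69,148,132)
z1 <- lm(hours ~ lot)
new <- data.frame(lot = 80)
ci <- predict(z1, new, interval = "confidence", level = 0.90)  # 平均応答の区間
pi <- predict(z1, new, interval = "prediction", level = 0.90)  # 個々の観測値の区間
ci
pi   # こちらの方が広い
```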

**7.6 多項式回帰 [#m457dc86]
(15/08/2000 Paul Johnson <pauljohn@ukans.edu>)

I() 関数を使って、 ^2 と ^3 が式の一部として評価されないように保護しなければならない。以下に例を示す。 
//Use the I() function to protect the ^2 and ^3 from being evaluated as part of the formula, ie.

 formula = Response ~ Var1 + I(Var1^2) + I(Var1^3)

(from Douglas Bates)

昨日これを演習クラスでおこなった。以下の方法もある。
//I was doing this in a practical class yesterday. There is another way,

 Response ~ poly(Var1, 3)

これは,同じ3次のモデルを直交多項式を使って当てはめる。当てはめ値(予測値)は同じになるが,係数は直交多項式に対するものであり,多項式の項そのものの係数ではない。両方の手法を比較してみるとよい。(from Brian D. Ripley)
//which fits the same cubic model by using orthogonal polynomials. The fitted values will be the same, but the coefficients are those of the orthogonal polynomials, not the terms in the polynomial. You might like to compare the two approaches. (from Brian D. Ripley)
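訳注: 「当てはめ値は一致し係数は異なる」という点を乱数データ(仮のもの)で確かめる例。

```r
set.seed(42)
x <- 1:20
y <- 2 + 0.5*x - 0.1*x^2 + 0.01*x^3 + rnorm(20)
fit.raw  <- lm(y ~ x + I(x^2) + I(x^3))   # 生の多項式
fit.orth <- lm(y ~ poly(x, 3))            # 直交多項式
all.equal(fitted(fit.raw), fitted(fit.orth))   # TRUE: 予測値は同じ
coef(fit.raw)    # 係数は両者で異なる
coef(fit.orth)
```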

**7.7 回帰式から F 統計量の "p" 値を計算 [#z2596868]
(13/08/2000 Paul Johnson <pauljohn@ukans.edu>)

a$fstatistic は実際のFテストと自由度を返すがP値は返さない。線形モデルを得て、出力のFの部分を「掴み」、F分布 (pf)を使用する。
//a$fstatistic returns the actual f-test and df, but not the p-value. Get a linear model, "grab" the f part of the output, use the f distribution (pf).

 foo <- lm(y~x)
 a <- summary(foo)
 f.stat <- a$fstatistic
 p.value <- 1-pf(f.stat["value"],f.stat["numdf"],f.stat["dendf"])

(from Guido Masarotto)

**7.8 段階的回帰/分散分析における当てはめの比較 (F 検定) [#b7a6d3cc]
(11/08/2000 Paul Johnson <pauljohn@ukans.edu>)

まず,以下のようにして始めよう
//Begin with:

 data(stackloss)
 fm <- lm(stack.loss ~ ., data=stackloss)
 anova(fm)
 fm1 <- lm(stack.loss ~ ., data=stackloss)
 anova(fm1)

もう少しモデルを追加する
//Add more models:

 fm2 <- update(fm1, . ~ . - Water.Temp)
 fm3 <- update(fm2, . ~ . - Air.Flow)
 anova(fm2, fm1)
 anova(fm3, fm2)

SSq はすべて同じだが、逐次的な分散分析表ではそれらを一つ大きいモデルとではなく残差 MSq と比較していることに注意。(from Brian D. Ripley)
//Note that the SSqs are all the same, but the sequential table compares them to the residual MSq, not to the next-larger model (from Brian D. Ripley)

**7.9 傾きと切片シフトの有意性検定 (Chow test?) [#vfd03129]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

Mark M. Span asked this: I have a four-parameter, biexponential model. What I want to evaluate is whether the parameter values differ between (groups of) tasks.

 
 myfunc    <-formula(x ~ a*exp(-b*age) + (c*exp(d*age)) )

both x (representing response time) and age are variables in the current scope. At the moment I fit the model for each task separately. But now I cannot test the parameters for equality.

Answer: Use a model parametrizing a,b,c by groups and one without, and use anova to compare the models. A worked example is in V&R3, page 249. (from Brian Ripley)

**7.10 非線形モデルを推定したい? [#v8de9ebf]
(11/08/2000 Paul Johnson <pauljohn@ukans.edu>)

nls ライブラリを使う
//Use the nls library

 library(nls)
 ... get data
 model.fit <- nls(y ~ exp(x))
 plot(model.fit)


**7.11 疑似ファミリーとそれへの引数の渡しかた [#b2718710]
(12/11/2002 Paul Johnson <pauljohn@ku.edu>)

(from Halvorsen) 以下の簡単な関数がある。
//(from Halvorsen) I have the following simple function:

 testcar <- function(pow) {
   ob <- glm(Pound ~ CG + Age + Vage,
             data=car, weights=No, subset=No>0,
             family=quasi(link=power(pow), var=mu^2))
   deviance(ob)
 }

これを実行しようとすると、以下のようになる。
//But trying to run it gives: 
 > testcar(1/2)
 Error in power(pow) : Object "pow" not found

もっと信頼できるバージョン(代替版)は以下のとおり
//A more reliable version (replacement) is

 eval(substitute(glm(Pound~CG+Age+Vage, data=car,
   weights=No, subset=No>0, family=quasi(link=power(pow),var=mu^2)),
   list(pow=pow)))

(from Thomas Lumley)

**7.12 共分散行列の計算 [#ybbbf2dc]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

 var(X)

さもなければ、あてはめられたモデルの共分散行列がほしければ、上で述べたように summary() の結果から取り出せることを忘れないように。
//or, if you want the covariance matrix of a fitted model, don't forget you can take the results from summary() as described above.:

また MASS パッケージはジェネリック関数  vcov をもっており、係数の推定値より分散共分散値を引き出せることに注意すると、以下のようにもできる。
//Also note package MASS has a generic function vcov to pull out variance-covariance estimates of coefficient estimates, so you can do

 > vcov(fm3DNase1)
          Asym      xmid      scal
 Asym 0.006108 0.0062740 0.0022720
 xmid 0.006274 0.0066183 0.0023794
 scal 0.002272 0.0023794 0.0010414

この関数はジェネリックなので、ほとんどのあてはめられたモデルで動く。
//too. As it's generic, it works for most fitted model objects.

**7.13 二項分布変数に対する信頼区間の推定 [#c39e7b53]
(13/08/2000 Paul Johnson <pauljohn@ukans.edu>)

R では以下が利用できる。
//In R, I can use

 library(ctest)
 binom.test(15, 20)

これで信頼区間が得られるが、これは母集団が無限であるという仮定の下である。 (from RINNER Heinrich)
//to get a confidence interval, but under the assumption of an infinite population. (from RINNER Heinrich)

有限母集団の修正をした推定を得るには、phyper 関数を使って少し工夫する必要がある。
//To get an estimate corrected for a finite population, the stubs must be in phyper:

 uniroot(function(x)phyper(15,x,100-x,20)-0.025,,0,100)$root
 uniroot(function(x)phyper(14,x,100-x,20)-0.975,,0,100)$root

(from Peter Dalgaard)

**7.14 推定モデルから標準誤差行列を保存する [#w60ff1e0]
(13/08/2000 Paul Johnson <pauljohn@ukans.edu>)

If you need variance-covariance matrices, try the vcov function in the MASS package. Or, to just save the diagonal "standard errors" of the coefficients, do

 library(MASS) 
 stdErrVector<-sqrt(diag(vcov(lm.D9)))

あるいは、面倒なやり方では
//Or, the hard way,

 > example(lm)
 > summary(lm.D9)
 > names(.Last.value)
 > summary(lm.D9)$coeff
 > summary(lm.D9)$coeff[, "Std. Error"]


**7.15 出力中の有効桁数を制御する [#f2ad184c]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

 > summary(c(123456,1,2,3), digits=7) 
 Min.  1st Qu.   Median       Mean    3rd Qu.        Max. 
 1.00     1.75     2.50   30865.50   30866.25   123456.00

これでほしいものが得られる。たいていの場合、R の summary メソッドが出力する「有効桁数」は 3 桁または 4 桁に切り詰められている。(from Robert Gentleman)
//and you get what you want. In most cases the number of "significant digits" printed out by summary methods in R is truncated to 3 or 4. (from Robert Gentleman)

**7.16 分散分析と "アンバランス" もしくは欠損データ [#l2b9aeb1]
(13/08/2000 Paul Johnson <pauljohn@ukans.edu>)

Question: In a split-plot design if there is a missing subunit summary gives me a table with two rows for the same factor, one in the error within section and one in the section using error between units. With no data missing the table is "normal". How does one interpret the table when data is missing?, or is it that aov cannot cope with missing values in this case?

Answer (from Brian Ripley): aov() does cope, but the conventional analysis is indeed as you describe. Factors do have effects in more than one stratum. For example, consider the oats example in Venables & Ripley (1999). If I omit observation 70, I get

 > summary(oats.aov)
 Error: B 
          Df Sum of Sq  Mean Sq   F Value     Pr(F) 
 Nf         1    841.89  841.890 0.2242563 0.6605042
 Residuals  4  15016.57 3754.142                    
 Error: V %in% B 
           Df Sum of Sq  Mean Sq  F Value     Pr(F) 
 Nf         1  1100.458 1100.458 1.747539 0.2187982 
 V          2  1156.821  578.410 0.918521 0.4335036 
 Residuals  9  5667.471  629.719
 Error: Within
           Df Sum of Sq  Mean Sq  F Value     Pr(F) 
 Nf         3  19921.29 6640.431 37.23252 0.0000000 
 Nf:V       6    408.96   68.160  0.38217 0.8864622 
 Residuals 44   7847.41  178.350

so Nf appears in all three strata, not just the last one. The `recovery of intra-block information is needed'.

If you have an unbalanced layout (e.g. with a missing value) use lme to fit a model. V&R do this example in lme too.

**7.17 多重分散分析 [#med76a34]
(06/09/2000 Paul Johnson <pauljohn@ukans.edu>)

 > aov(cbind(y1,y2,y3)~x)

**7.18 分散の均一性の検定 [#u0b47e22]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

Bartlett の検定が(個人的には)好きだ。SPSS は現在 Levene の検定を使用しているが、ある人に言わせれば、これは不正確で頑健でもない。R は Fligner-Killeen(メディアン)検定を ctest の fligner.test() で提供している。
//Bartlett's test, I like (personally). SPSS now uses Levine's test, some say incorrectly and nonrobustly. R's offers the Fligner-Killeen (median) test, fligner.test() in ctest.
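訳注: 組み込みデータ warpbreaks での実行例。古い R では library(ctest) が必要だが、新しい R では標準で読み込まれている。

```r
data(warpbreaks)
# tension の水準間で breaks の分散が等しいかどうかの検定
ft <- fligner.test(breaks ~ tension, data = warpbreaks)
ft
```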

The test "does an anova on a modified response variable that is the absolute value of the difference between an observation and the median of its group (more robust than Levene's original choice, the mean)". (from Brian Ripley)

Incidentally, if you insist on having Levene's calculation, Brian Ripley showed how: "

 Levene <- function(y, group)
 {
     group <- as.factor(group)  # precautionary
     meds <- tapply(y, group, median)
     resp <- abs(y - meds[group])
     anova(lm(resp ~ group))[1, 4:5]
 }
 > data(warpbreaks) 
 > attach(warpbreaks) 
 > Levene(breaks, tension) 
         F value  Pr(>F)
 group      2.818 0.06905

I could (and probably would) dress it up with a formula interface, but that would obscure the simplicity of the calculation."

**7.19 非線形モデルの推定に nls を使う [#c5be496c]
(14/08/2000 Paul E Johnson <pauljohn@ukans.edu>)

 x <- runif(100)
 w <- runif(100)
 y <- x^2.4*w^3.2+rnorm(100,0,0.01)
 plot(x,y)
 plot(w,y)
 library(nls)
 fit <- nls(y~x^a*w^b,start=list(a=2,b=3))
 l.est <- lm(log(y) ~ log(x)+log(w)-1)
 fit <-   nls(y~x^a*w^b,start=list(a=coef(l.est)[1],b=coef(l.est)[2]))

Error is a fairly common result of starting with parameter values that are far away from the true values, or have very different scales, so that the numeric-derivative part of nls fails. Various solutions are to do an approximate least-squares regression to start things off (also the "self-starting" models in the nls package), or to provide an analytic derivative. (all from Ben Bolker)

**7.20 nls の使用とそのグラフ化 [#fa13de14]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

If you use the nls function in R to fit the model in the parameterization that Guido described, you can graphically examine the profile likelihood with

 pr <- profile(nlsObject)
 plot(pr)

For an example, try

 library(nls)
 example(plot.profile.nls)

(from Douglas Bates)

**7.21 -2 Log(尤度) と仮説検定 [#i939cd6d]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

This is about the polr package, but the general point is that to compare likelihood models, you have to use your head and specify the comparison. Brian Ripley said:

As in the example on the help page

     options(contrasts=c("contr.treatment", "contr.poly"))
     data(housing)
     house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
     house.plr0 <- polr(Sat ~ 1, weights = Freq, data = housing)

Note that -2 LOG L is not actually well-defined as it depends on the grouping of the data assumed, and so is not a statistic. I assume that you want differences of that quantity, as in

 house.plr0$deviance - house.plr$deviance
 169.7283

**7.22 繰り返し測定をもつロジスティック回帰 [#n95a5343]
(02/06/2003 Paul Johnson <pauljohn@ku.edu>)

Hiroto Miyoshi described an experiment with 2 groups and pre/post measurement of a dichotomous dependent variable. Frank Harrell had this excellent response:

"There are several ways to go. GEE is one, random effects models another. One other approach is to install the Hmisc and Design packages (http://hesweb1.med.virginia.edu/biostat/s) and do (assume id is the unique subject identifier):

 f <- lrm(y ~ x1 + x2*x3 + ..., x=T, y=T) # working independence model
 g <- robcov(f, id)         # cluster sandwich variance adjustment
 h <- bootcov(f, id, B=100) # cluster bootstrap adjustment
 summary(g)  # etc.

**7.23 ロジット [#h2c9ce8d]
//Logit   
(06/04/2001 Paul Johnson <pauljohn@ukans.edu>)

ロジスティックモデル glm(..., family=binomial, ...) から、オッズ比とその信頼区間を計算する。
//Compute the Odds Ratio and its confidence intervall, from a logistic model (glm(.......,family=binomial....).

http://www.myatt.demon.co.uk で入手できる入門ノートでこれをおこなう簡単な関数を示している。
//I show a simple function to do this in my introductory notes available from: http://www.myatt.demon.co.uk

基本的にはこうなる。
//Basically it is:

 lreg.or <- function(model)
 {
     lreg.coeffs <- coef(summary(model))  # 訳注: 原文の salex.lreg は引数 model の誤りと思われる
     lci <- exp(lreg.coeffs[ ,1] - 1.96 * lreg.coeffs[ ,2])
     or  <- exp(lreg.coeffs[ ,1])
     uci <- exp(lreg.coeffs[ ,1] + 1.96 * lreg.coeffs[ ,2])
     lreg.or <- cbind(lci, or, uci)
     lreg.or
 }

(from Mark Myatt)

**7.24 時系列: 基礎 [#y0b4b84c]
(31/12/2001 Paul Johnson <pauljohn@ukans.edu>)

ts ライブラリをロードする必要がある。やってみよう。
//You have to load the ts library! try

 library(ts) 
 example(acf)

時系列のコンポーネントを持つ他の関数がある。以下を試してみよう。
//There are other packages with time series components. Try

 help.search("time")

to see whatever you have installed. Under my packages section of this document, I keep a running tally of packages that have time series stuff in them

Many time series procedures want to operate on a "time series" object, not raw numbers or a data frame. You can create such an object for "somedataframe" by:

 myTS <- ts(as.matrix(somedataframe), start=c(1992,1), frequency=12)

Note start and frequency designate time attributes.

**7.25 時系列: 雑多な例 [#ga3113ca]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

これには ts package の filter() が使用される。シンプルな指数加重移動平均手続きは以下のようになる。
//This uses the ts package's filter(): A simple exponential weighted moving average procedure would be

 ewma <- function (x, lambda = 1, init = 0)
 {
   y <- filter(lambda*x, filter=1-lambda, method="recursive", init=init)
   return (y)
 }

予測関数として ewma を使うのは指数平滑化として知られている。以下を試してみよう。
//Using ewma as a forecast function is known as exponential smoothing. Try, e.g.:

 x <- ts(diffinv(rnorm(100)))
 y <- ewma(x,lambda=0.2,init=x[1])
 plot(x)
 lines(y,col="red")

(from Adrian Trapletti)

**7.26 SAS と同値な手順 [#n51ad2f9]
(14/08/2000 Paul E Johnson <pauljohn@ukans.edu>)

もっといいリストがここでは必要だが
//I need a better list here, but:

1. R の glm は SAS の PROC GLM ではない。R では "G" は「一般化 (generalized)」の意味である(SAS では GLIM ないしは GENMOD を参照)。SAS の GLM(一般線形モデル)の機能は、R では lm() とその仲間が担っている。(from Peter Dalgaard)
//1. glm in R is not PROC GLM in SAS. In R the "G" is for "generalized", cf. GLIM or GENMOD in SAS. The SAS GLM (General Linear Model) functionality comes via lm() and friends in R. (from Peter Dalgaard)

**7.27 その他。実際的な glm コード [#cf35771e]
(08/12/2000 Paul Johnson <pauljohn@ukans.edu>)

 x1 <- seq(from=-1,to=2, length=20)
 x2 <- (rep(1:5,4))/4
 x <- cbind(1,x1,x2)
 y <- c(0,1,1,1,1,1,0,0,1,1,1,0,1,0,1,1,1,1,1,1)
 glm.fit(x,y,fam=binomial(link=logit))$coeff

(from Vipul Devas) $coeff を外せば、すべての出力が見られる。
//(from Vipul Devas) Leave off the $coeff to see all output.

/////////////////////////////////////////////////
*8 パッケージ:どこを探したら良いのか [#n36bf58c]
/////////////////////////////////////////////////

**8.1 カテゴリカルデータと多変量モデル [#db4947b7]
//Categorical Data and Multivariate Models   
(11/08/2000 Paul Johnson <pauljohn@ukans.edu>)

Jim Lindsey の "ordinal" パッケージを参照のこと。http://alpha.luc.ac.be/~plindsey/publications.html
//See Jim Lindsey's package "ordinal": http://alpha.luc.ac.be/~plindsey/publications.html

MASS パッケージの比例オッズロジスティック回帰も参照のこと。
//And also Proportional Odds Logistic Regression in the MASS package

**8.2 散布図を通る滑らかな曲線を描く(Lowess 等) [#mf7e7c37]
//Plot a smooth curve through a scatterplot (Lowess, etc)   
(14/08/2001 Paul Johnson <pauljohn@ukans.edu>)

 library(modreg)

ユーザーは S の lowess() と R の loess() との違いについて混乱するという報告がある。Martin Maechler のアドバイス:「loess() はデフォルトでは抵抗性(頑健性)がないが、lowess() にはある。loess(....., family = "sym") の利用を薦める。」
//Users report confustion about the difference between lowess() in S and loess in R. Martin Maechler advised us, "loess() by default is not resistant/robust where as lowess() is. ... I would recommend using loess(....., family = "sym")"
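訳注: 組み込みデータ cars で lowess() と loess(family = "symmetric") を重ね描きして比べる例。新しい R では loess は stats にあるので library(modreg) は不要。

```r
# 古い R では library(modreg) が必要
data(cars)
plot(cars$speed, cars$dist)
lines(lowess(cars$speed, cars$dist), col = "blue")             # lowess
fit <- loess(dist ~ speed, data = cars, family = "symmetric")  # 頑健な loess
lines(cars$speed, fitted(fit), col = "red")
```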

**8.3 階層的/混合線形モデル [#pb969e9b]
//Hierarchical/Mixed linear models.   
(18/06/2001 Paul Johnson <pauljohn@ukans.edu>)

User asked to estimate what we call a "contextual regression," a model of individuals, many of whom are in the same subsets (organizations, groups, etc). This is a natural case for a mixed regression model

Jos? C. Pinheiro and Douglas M. Bates, Mixed Effect Models in S and S-Plus (Springer,2000).

The nlme package for R has a procedure lme that estimates these models.

See also Jim Lindsey's repeated measurements package (available at http://alpha.luc.ac.be/~lucp0753/rcode.html).
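訳注: lme の最小限の実行例。nlme 付属の Orthodont データ(子どもの歯の成長の反復測定)を使い、個体 Subject ごとのランダム切片を入れた仮の指定である。

```r
library(nlme)
data(Orthodont)
# 個体(Subject)をグループとするランダム切片モデル
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
summary(fm)
```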

**8.4 時系列用のツール? [#q10e08bc]
//Time Series tools?   
(06/03/2003 Paul Johnson <pauljohn@ku.edu>)

Some timeseries tools are distributed with R itself, in the ts library. type

 library(ts)
 ?ts
 ?arima0

DSE: Paul Gilbert said "My dse package handles multiple endogenous variables (y's) and multiple input or conditioning variables (which are often called exogenous variables, but sometimes are not truly exogenous). Integrating models are handled but you might need to consider what estimation technique to use. I have not implemented any integration tests (volunteers?). DSE also handles state space models. It does not implement time varying models or non-linear models, although it would provide a valuable framework for anyone who would like to do that." Look on CRAN! Or check at Paul's site, http://www.bank-banque-canada.ca/pgilbert.

For unevenly spaced time series data, take a look at Lindsey's growth and repeated libraries: www.luc.ac.be/~jlindsey/rcode.html (from Jim Lindsey)

In the "nlme" package, there is a continuous AR(1) class, corCAR1, which should be used in the case when the times are unequally spaced and measured on a continuous scale. (from Jose Pinheiro). The usage of both lme and nlme is described in Jose C. Pinheiro and Douglas M. Bates, Mixed-Effects Models in S and S-PLUS, Springer, 2000.

**8.5 Durbin-Watson 検定 [#yeba9fcf]
(02/06/2003 Paul Johnson <pauljohn@ku.edu>)

car および lmtest パッケージは二つとも Durbin-Watson テスト関数をもっている。(John Fox)
//The car and lmtest packages both have functions for Durbin-Watson tests. (John Fox)
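訳注: lmtest の dwtest を使った仮の例(データは作り物)。car パッケージなら durbinWatsonTest が対応する。

```r
library(lmtest)
set.seed(1)
x <- 1:50
e <- as.vector(filter(rnorm(50), 0.8, method = "recursive"))  # AR(1) 誤差
y <- 1 + 0.5 * x + e
dw <- dwtest(lm(y ~ x))
dw   # 正の自己相関があれば DW 統計量は 2 よりかなり小さくなる
```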

**8.6 打ち切り回帰 [#p3a63fea]
//Censored regression   
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

survival5 を使えば、打ち切り回帰ができる。
//You can do a censored regression with the survival5 library

 library(survival5)
 survreg(Surv(y,c), dist="normal")

ここで y は打ち切りのある従属変数で、c は観測されていれば 1、そうでなければ 0 である。(from Dan Powers)
//where y is the censored dep-var and c is 1 if observed and 0 otherwise. (from Dan Powers)

**8.7 微分方程式を推定する [#d122e6fa]
//Estimate differential equation models?   
(08/12/2000 Paul Johnson <pauljohn@ukans.edu>)

-you can write your models using your favourite ode solver library (limited C or Fortran) and then dyn.load() the model, so you can use it inside R. It is not too difficult, and if you are used to a certain library where you have your models, is probably the easiest way.
-there is a preliminary port to R of the nls2 library, which does nonlinear regression from differential equation models. It is a bit difficult to install, and is also not easygoing, but the nonlinear regression works well. In that library there is a way to actually compile a differential equation model and integrate it using the lsode routine from ODEPACK. So if you want to see how to plug an ode solver into R in a more general way you can start by having a look at http://www-bia.inra.fr/J/AB/nls2/welcome.html (from Jesus Maria Frias Celayeta)

**8.8 R から SQL データベースへアクセスするツール [#r36422e3]
//Tools to access SQL databases from R   
(30/01/2001 Paul Johnson <pauljohn@ukans.edu>)

以下を検討してみよう
//Consider

 http://rpgsql.sourceforge.net/

Brian Ripley は「経験上、MySQL については、RMySQL よりも、RODBC の方が動作がよい。」と言っていた。
//Brian Ripley said "RODBC works well with MySQL, in my experience better than RMySQL."

RMySQL というパッケージがある。
//There is a package RMySQL.

CRAN の contrib/Devel には他のパッケージがある。
//and other packages in CRAN's contrib/Devel.

**8.9 判別分析 [#i270cbc2]
//Discriminant Analysis   
(11/08/2000 Paul Johnson <pauljohn@ukans.edu>)

VR のバンドルパッケージには、MASS パッケージに線形および二次 DA(lda, qda) 、 nnet にニューラルネット、および class パッケージに k nearest neighbour を含んだ他の手法がいくつかある。CRAN のcontrib セクションでバンドルパッケージを見つけられる。 
//The VR bundle of packages has linear and quadratic DA (lda, qda) in the MASS package, neural nets in nnet, and several other methods including k nearest neighbour in the class package. You find it in the contrib section of CRAN.

分類木は、tree および rpart パッケージを使って当てはめをおこなうことができる。
//Classification trees can be fitted using both the tree and rpart packages. (Friedrich Leisch)
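訳注: MASS の lda を組み込みデータ iris に適用する簡単な例。

```r
library(MASS)
data(iris)
fit <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)$class
table(pred, iris$Species)      # 再代入による分類表
mean(pred == iris$Species)     # 正答率
```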

**8.10 対数線形モデル用のツール [#h6e92f15]
//Tools for log linear models   
(13/08/2000 Paul Johnson <pauljohn@ukans.edu>)

loglin() が base パッケージに、loglm() が MASS パッケージにある (from Thomas Lumley)
//loglin() in the base package and loglm() in package MASS (from Thomas Lumley)
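訳注: 組み込みデータ HairEyeColor(性別について周辺化)に独立モデルを当てはめる例。loglm と loglin のどちらでも同じ独立性検定ができる。

```r
library(MASS)
tab <- margin.table(HairEyeColor, c(1, 2))  # Hair x Eye の 2 元表
fit <- loglm(~ Hair + Eye, data = tab)      # 交互作用なし=独立モデル
fit     # 尤度比・Pearson カイ二乗検定が表示される
loglin(tab, margin = list(1, 2))            # base の loglin でも同様
```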

**8.11 分布のパラメータを推定するためのツール [#m8ce10c0]
//Tools to estimate parameters of distributions   
(13/08/2000 Paul Johnson <pauljohn@ukans.edu>)

The gnlr and gnlr3 functions in my gnlm library at www.luc.ac.be/~jlindsey/rcode.html will do this for a wide selection of distributions.... The gnlr function in my gnlm library will fit quite a variety of different overdispersed Poisson- and binomial-based distributions (i.e. phi different from one) using their exact likelihoods. (from Jim Lindsey)

Maximize the likelihood directly. There are worked examples in the Venables & Ripley on-line exercises and answers. Now for the gamma there are some shortcuts, as the MLE of the mean is the sample mean, and gamma.shape.glm in package MASS will fit the shape parameter by MLE (from Brian D. Ripley)

The log-likelihood function for the parameters alpha and beta given the data x is sum(dgamma(x, shape = alpha, scale = beta, log = TRUE)) To determine the maximum likelihood estimates, simply put the negative of the log-likelihood into the optim function.

A worked-out example is given as xmp06.12 in the Devore5 library (available in the contrib section at CRAN). Use

 library(Devore5)
 help(xmp06.12)      # shows the description with comments
 example(xmp06.12)   # run the example to see the result

(from Douglas Bates)

**8.12 ブートストラップ用ルーティン [#h706a73a]
(06/04/2001 Paul Johnson <pauljohn@ukans.edu>)

CRAN should have bootstrap and boot, two packages worth looking at. Most people say that "boot" is better, more actively maintained.

**8.13 ロバストな回帰用のツール [#ud308f64]
(14/08/2000 Paul Johnson <pauljohn@ukans.edu>)

ロバストな回帰がSでは利用できる、その名前は rreg である。
//Robust Regression is available in S, its name is rreg.

VR のバンドルパッケージにはもっとよいロバストな回帰がある。
//There's better robust regression in the VR bundle of packages.

 >library(MASS)
 >help(rlm)
 >library(lqs)
 >help(lqs)

(from Thomas Lumley)

You could try function rlm in package MASS (in the VR bundle). That has an MM option, and was originally written to work around problems with the (then) version of lmRobMM in S-PLUS. It uses lqs in package lqs, and that is substantially faster than the ROBETH-based code (as I started with that route). (from Brian D. Ripley)

**8.14 不等分散性の推定 [#l67490d1]
(07/12/2000 Paul Johnson <pauljohn@ukans.edu>)

hmctest() in package lmtest performs the Harrison-McCabe test against heteroskedasticity. (from Torsten Hothorn)

**8.15 不等分散の Garch/時系列 のモデリング用ツール [#a04d4255]
(08/12/2000 Paul Johnson <pauljohn@ukans.edu>)

The contrib package tseries (by Adrian Trapletti) contains Garch tools and also:

 NelPlo                   Nelson-Plosser Macroeconomic Time Series
 garch                    Fit GARCH Models to Time Series
 get.hist.quote           Download Historical Finance Data
 jarque.bera.test         Jarque-Bera Test
 na.remove                NA Handling Routines for Time Series

It can also do White's test, not the general heteroskedasticity test, but a more specific White statistic which tests for neglected nonlinearity in the conditional mean, essentially for a regression equation or linear time series model.

Adrian Trapletti, author of the tseries package, also added that one very simple way to test for conditional heteroskedasticity (in a time series setup) is to use Box.test (package ts) on the squared values of the series, as in

 Box.test(x^2, lag = 1, type="Ljung")


**8.16 Lowess [#dfdf0346]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

R-base は"lowess"をもっている
//R-base has "lowess"

"modreg" パッケージは "loess" を持っている。後者は線形モデルに似たツールで、線形モデルを取り扱う他の R ベースのツールと一緒に利用できる。たとえば、以下の残差式のように
//The package "modreg" has "loess". The latter is a linear modeling-alike tool,and it can be used with other R-base tools for dealing with linear models, like the residuals function:

 > l <- loess(y1~x1)
 > plot(l)
 > r <- residuals(l)

r は残差を持つ変数である。
//r is now a variable holding residuals.

**8.17 曲線/曲面当てはめ用の局所回帰 [#o36fb73b]
(08/12/2000 Paul Johnson <pauljohn@ukans.edu>)

http://cm.bell-labs.com/cm/ms/departments/sia/project/locfit/index.html

**8.18 積分のためのツール [#s48045bd]
(15/08/2000 Paul Johnson <pauljohn@ukans.edu>)

rmutil ライブラリの int 関数で処理できる。www.luc.ac.be/~jlindsey/rcode.html (from Jim Lindsey)
//The int function in my rmutil library will handle this. www.luc.ac.be/~jlindsey/rcode.html (from Jim Lindsey)

**8.19 Box-Cox モデル [#p8070ab8]
(15/08/2000 Paul Johnson <pauljohn@ukans.edu>)

 boxcox() in MASS

**8.20 更新された V&R パッケージ (polynom 等) [#t448d059]
(15/08/2000 Paul Johnson <pauljohn@ukans.edu>)

polynom の最新版がほしければ、V&R の『S Programming』用スクリプト(http://www.stats.ox.ac.uk/pub/MASS3/Sprog/ およびミラーサイト)にある。この本(およびそのスクリプト)で論じられているバージョンは、CRAN のもの(またそのベースとなっている S 版)よりかなり新しいものである。(from Brian D. Ripley)
//If you want the current version of polynom, it is in the scripts for V&R's `S Programming', at http://www.stats.ox.ac.uk/pub/MASS3/Sprog/ and mirrors. The version discussed in that book (and in those scripts) is considerably later than that on CRAN (or the S versions on which that is based). (from Brian D. Ripley

なお、"Sprog.tar.gz" をダウンロードして解凍すると章ごとのディレクトリができ、その中に通常の方法でインストールできる R パッケージが入っている。
//Note, you download "Sprog.tar.gz", unzip to find directories for the chapters, then in there you find R packages you you can install in the regular way.

**8.21 R 用の GIS パッケージ [#i9b7c7da]
(06/09/2000 Paul Johnson <pauljohn@ukans.edu>)

GIS パッケージGRASSをためしてみよう。
//Try the GIS package GRASS:

 http://www.baylor.edu/~grass/statsgrasslist.html

ここから、Computers & Geosciences (2000), 26, pp. 1043-1052 で記述されているパッケージにアクセスできる。メーリングリストもある。さらなる進展については、GRASS 5.0 開発の調整に当たっている Markus Neteler との共同作業が進んでいる。現在のパッケージは GRASS_0.1-4 であり、以下にある。
//from which the package described in Computers & Geosciences (2000), 26, pp. 1043-1052, may be accessed. There is a mailing-list too, and more progress is described in work with Markus Neteler, who is coordinating GRASS 5.0 development. The current package is GRASS_0.1-4, and is at:

 http://reclus.nhh.no/gc00/GRASS/GRASS_0.1-4.tar.gz

これが役に立ちそうなら、連絡してください。パッケージは開発中で、ユーザのフィードバックが必要( Roger Bivand Roger.Bivand@nhh.no より)。
//Please contact me if this may be helpful - the package is under development, and needs user feedback! (from Roger Bivand Roger.Bivand@nhh.no)

以下の Computers & Geosciences の最近の論文を参照のこと。
//See also a recent paper in Computers & Geosciences:

R.S. Bivand, "Using the R statistical data analysis language on GRASS 5.0 GIS data base files", Computers & Geosciences, v.26, no.9/10, pp. 1043-1052, 2000.

Paulo Justiniano は以下のものもあると指摘している。
//Paulo Justantiano pointed out there is also:

- "spatial" パッケージは MASS に含まれている
//- the package "spatial" is included with MASS
- "sgeostats" パッケージは、CRAN ウェブサイトで入手できる(地球統計分析のみ)
//- the package "sgeostats" is available at the CRAN web site (for geostatistical analysis only)
- "splancs" は、ftp://reclus.nhh.no/pub/R/splancs-2.01-1.tar.gz で入手できる(点過程のみ)
//- "splancs" is available at ftp://reclus.nhh.no/pub/R/splancs-2.01-1.tar.gz (for point process only)
- "geoR" は、http://www.maths.lancs.ac.uk/~ribeiro/geoR.html  で入手できる(地球統計分析のみ)
//- "geoR" is available at: http://www.maths.lancs.ac.uk/~ribeiro/geoR.html (for geostatistical analysis only)

Rainer Hurling は次のように付け加えた。以下を参照のこと。
//Rainer Hurling added, have a look at

 http://reclus.nhh.no/gc00/gc009.htm

GIS ソフト 'GRASS' (http://www.geog.uni-hannover.de/grass/index2.html) は データベースシステム 'PostgreSQL' (http://www.postgresql.org) と'R' (http://www.R-project.org) と一緒にちゃんと動作する。GRASS と PostgreSQL との結合については、http://www.geog.uni-hannover.de/grass/projects/postgresqlgrass を、GRASS と R については、ftp://reclus.nhh.no/pub/R を参照。
//The GIS software 'GRASS' (http://www.geog.uni-hannover.de/grass/index2.html) works fine with database system 'PostgreSQL' (http://www.postgresql.org) and 'R' (http://www.R-project.org). For linking between GRASS and PostgreSQL see http://www.geog.uni-hannover.de/grass/projects/postgresqlgrass and between GRASS and R see ftp://reclus.nhh.no/pub/R .

**8.22 多カテゴリ従属変量モデル [#tea2f3ac]
//Multicategory dependent Variable Models   
(08/12/2000 Paul Johnson <pauljohn@ukans.edu>)

Brian Ripley said:

Depends on what you mean (multinomial?). There is what I understand by multinomial logistic in function multinom in package nnet in the VR bundle, and nnet there can do several other variants on this theme.

Most such models are not hard to program and fit by a general-purpose optimizer like optim().

One alternative I've not tried:

David Clayton said:

You can also fit these models using the gee() package. The method is not ML, but there are certain other advantages to this method.

See

 http://www.mrc-bsu.cam.ac.uk/pub/publications/submitted/david_clayton/ordcat2.ps

for details.

**8.23 NLME パッケージ: S+ の lme の役目を果たす [#b75df4ee]
(08/12/2000 Paul Johnson <pauljohn@ukans.edu>)

CRAN で入手可能な "nlme" を参照のこと。
//See "nlme", available at CRAN.

**8.24 区間的打ち切りデータ [#r6085f17]
(08/12/2000 Paul Johnson <pauljohn@ukans.edu>)

If you want parametric models, the gnlr and gnlr3 functions in my gnlm library will do all this: www.luc.ac.be/~jlindsey/rcode.html

**8.25 多次元尺度法 [#q070848e]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

"mva" パッケージの cmdscale、および "MASS" の sammon と isoMDS を参照のこと。multiv にも同様のものがいくつかある。(Brian D. Ripley より)
//See cmdscale in the package "mva" and sammon and isoMDS in "MASS". There are some more in multiv, or at least more of the same. (from Brian D. Ripley)
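訳注:簡単な例として、組み込みの距離行列 eurodist を 2 次元に配置してみる(原文にはない例。新しい R では cmdscale は標準の stats パッケージに含まれる)。

```r
## Classical multidimensional scaling of the built-in eurodist distances.
loc <- cmdscale(eurodist, k = 2)   # 各都市を 2 次元座標に配置する
head(loc)
# plot(loc, type = "n"); text(loc, labels = rownames(loc))  # 地図風に表示
```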

/////////////////////////////////////////////////////
*9 R の主たる提供物中にないウェッブ情報源 [#h469734e]
/////////////////////////////////////////////////////

**9.1 心理学用の R [#b1ba2fcf]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

心理学実験およびアンケートのための R 利用についてのJonathan Baron とYuelin Li によるメモ
//Notes on the use of R for psychology experiments and questionnaires by Jonathan Baron and Yuelin Li at

 http://www.psych.upenn.edu/~baron/rpsych.htm and  
 http://www.psych.upenn.edu/~baron/rpsych.pdf and
 http://www.psych.upenn.edu/~baron/rpsych.tex
 ※日本語訳版追記:移転先 http://www.psych.upenn.edu/~baron/rpsych/rpsych.html

このメモは心理学を研究している学生やその他の人向けである。「心理学用」としているのは、このメモは、心理学者の使う(特に人間研究をする)主要な統計手法のカバーを試みているためである。
//It is intended for students and others who are doing research in psychology. What makes it "for psychology" is that it tries to cover the major statistical methods that psychologists use (particularly those who study humans).

Baron 教授も素晴らしい「参考カード」を作った。
//Prof. Baron made a nice "reference card" as well: 
  http://www.psych.upenn.edu/~baron/refcard.pdf

**9.2 生物学用の R [#of1465bb]
(11/08/2000 ? <?>)

 http://www.stat.Berkeley.EDU/users/terry/zarray/Html/Rintro.html

**9.3 S その他の学習用素材 [#z7b87021]
(14/02/2001 Paul Johnson <pauljohn@ukans.edu>)

"S Poetry" is available online at http://www.seanet.com/%7Epburns/Spoetry/

http://www.insightful.com/resources/doc/default.html Splus guides - quite useful, I think, for general approaches especially (free downloads)

http://www.insightful.com/resources/biblio.html list of Splus books - too many!

http://hesweb1.med.virginia.edu/biostat/s/doc/splus.pdf guide to Splus for SAS users (or anyone, really)

and the other documents in CRAN contributed documents

Also, I'd check here, the R site for contributed docs, periodically: http://cran.r-project.org/other-docs.html

**9.4 Pat Altham の misc. S+ ワークシート [#vc4543b4]
(11/08/2000 Paul Johnson <pauljohn@ukans.edu>)

 http://www.statslab.cam.ac.uk/~pat

**9.5 Keith Worsley の一般化線形モデルコース用の R スクリプト [#n44caeef]
//scripts for his Generalized Linear Models course   
(15/08/2000 Paul Johnson <pauljohn@ukans.edu>)

These are useful!

 http://euclid.math.mcgill.ca/keith/

**9.6 Mark Myatt の R notes [#d38153c1]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

Mark Myatt は、役に立ついくつかのメモと例を提供している: http://www.myatt.demon.co.uk/ 彼はこれらすべてを一つにまとめた大きなファイル、 "Rex.zip" を提供している。
//Mark Myatt offers some notes and example datasets that I found useful: http://www.myatt.demon.co.uk/ He offers everything in one big bundle called "Rex.zip"

**9.7 オンラインの計量経済学素材: [#xc999054]
(15/08/2000 Paul Johnson <pauljohn@ukans.edu>)

Hans Ehrbar のオンラインコース http://www.econ.utah.edu/ehrbar/ecmet.pdf
//Hans Ehrbar's online course http://www.econ.utah.edu/ehrbar/ecmet.pdf

Jim Lindsey による以下の著書用の R コード: Introductory Statistics: A Modelling Approach. Oxford University Press (1995) および Models for Repeated Measurements. Oxford University Press (1999, Second Edition)
//Jim Lindsey's R code for Introductory Statistics: A Modelling Approach. Oxford University Press (1995) and Models for Repeated Measurements. Oxford University Press (1999, Second Edition) 
http://alpha.luc.ac.be/~jlindsey/books.html 教師用マニュアル www.luc.ac.be/~jlindsey/publications.html も参照のこと。
//http://alpha.luc.ac.be/~jlindsey/books.html See also the instructor's manual: www.luc.ac.be/~jlindsey/publications.html

**9.8 R の電子メイルリストアーカイブ [#jd8a7f22]
(06/09/2000 Paul Johnson <pauljohn@ukans.edu>)

OK, you probably find this in CRAN, but I kept forgetting:

 http://www.ens.gu.edu.au/robertk/R/help/00b/subject.html

//////////////////////////////////////
*10 R の作業スペース [#dd6259bc]
//////////////////////////////////////

**10.1 R コードを書く、保存する、実行する [#x7708553]
(31/12/2001 Paul Johnson <pauljohn@ukans.edu>)

The easiest thing is to use some editor, Emacs or WinEdt: write your R code, run it in R, and if it fails, change it in the editor and run it again. Emacs with ESS installed will make this work fine. I've heard of plugins for other editors, but have not tried them.

If you don't like that, you can type commands directly into the R window and they have some handy things.

Up and down arrows cycle through the history of commands.

If you type in a function and want to revise it:

fix(f) or g <- edit(f) should bring up an editor.

I personally don't see any point in this, though, since you should just use an editor.

**10.2 .RData, .RHistory. Help または混乱? [#b861e51a]
(31/12/2001 Paul Johnson <pauljohn@ukans.edu>)

When you quit R, since version 1 or so, it asks if you want to save the workspace. If you say yes, it creates a file .RData. The next time you start R in this working directory, it opens all those objects in .RData again. Type "objects()" to see.

If you want that, OK. If you don't, say NO at quit time, or erase the .RData file before starting R.

**10.3 R オブジェクトのセーブとロード [#qe9d7def]
(31/12/2001 Paul Johnson <pauljohn@ukans.edu>)

以下をやってみて
//If you try

 > a <- c(1,2,3)
 > save(a, file="test.RData")
 > rm(a)
 > load("test.RData")
 > a
 [1] 1 2 3

(from Peter Dalgaard, 12/29/2001)

**10.4 オブジェクトの解析/用法の備忘録 [#w636fcc1]
(11/08/2000 Paul Johnson <pauljohn@ukans.edu>)

 objects() #lists all objects
 attributes(anObject)
 str(anObject) #-- diagnostic info on anObject
 page(anObject) #-- pages through anObject
 rm(anObject) #-- removes an object
 rm(list=ls()) #-- removes all objects

**10.5 名前中のパターンによるオブジェクトの消去 [#baf65da1]
(31/12/2001 ? <?>)

lm で始まるオブジェクト (lm1, lm2, lm3 等) を除去したい場合:
//Suppose you want to remove objects beginning with lm (lm1, lm2, lm3 etc).

 rm(list=objects(pattern="^lm.*"))

(from Kjetil Kjernsmo)

ただし、g01, g02 等の名前のオブジェクトをすべて除去しようとして rm(list=ls(pat="g*")) としてはいけない。パターン "g*" は「g の 0 回以上の繰り返し」を意味し、あらゆる名前にマッチするので、すべてのオブジェクトが除去されてしまう!
//Don't do this to remove all objects with names like g01, g02, etc: rm(list=ls(pat="g*")). That removes everything!

To remove objects whose names have "g" anywhere in the string, use rm(list=ls(pat="g")), or names starting with "g" one can use rm(list=ls(pat="^g")).

The "^" and "$" can be used to delimit the beginning and ending of a string, respectively. (from Rashid Nassar)
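訳注:パターンの動作を確かめる小さな例(原文にはない)。作業スペースを汚さないよう、専用の環境を作って試している。

```r
## Demonstrate pattern-based removal inside a throwaway environment.
e <- new.env()
assign("g01", 1, envir = e)
assign("g02", 2, envir = e)
assign("other", 3, envir = e)
ls(envir = e, pattern = "^g")                        # "g01" "g02" だけにマッチ
rm(list = ls(envir = e, pattern = "^g"), envir = e)  # 名前が g で始まるものを除去
ls(envir = e)                                        # "other" だけが残る
```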

**10.6 毎日の作業日誌を作成・保管する [#x3ffa8d9]
(31/12/2001 ? <?>)

consider sink("filename")

The file .Rhistory contains commands typed on Unix systems.

Emacs users can run R in a shell and save everything in that shell to a file.

Ripley added: But if you are on Windows you can save the whole console record to a file, or cut-and-paste sections. (And that's what I do on Unix, too.)

**10.7 Rprofile をカスタマイズする [#p3015653]
(31/12/2001 ? <?>)

In your HOME directory, the file ~/.Rprofile. For example, if you think the stars on significance tests are stupid, follow Brian Ripley's example:

 options(show.signif.stars=FALSE)
 ps.options(horizontal=FALSE)

There is a system-wide Rprofile you could edit as well in the distribution


///////////////////////////////////////////////////
*11 OS  とのインタフェイス [#c34a12eb]
////////////////////////////////////////////////////

**11.1 システムに対する作業ディレクトリ変更命令 [#o376489f]
(22/11/2000 Paul Johnson <pauljohn@ukans.edu>)

オペレーティングシステムへのコマンド: system("引用符の間にコマンドを入れる").
//Commands to the operating system: system("insert command here between quotes").

For the particular case of changing directories use getwd/setwd [I think system() spawns its own shell, so changing directories with system() isn't persistent] . For example:

 > getwd()   ## start in my home directory
 [1] "/home/ben"
 > system("cd ..")  ## this executes OK but ...
 > getwd()
 [1] "/home/ben"    ## I'm still in the same working directory
 > setwd("..")      ## this is the way to change working directory
 NULL
 > getwd()          ## now I've moved up one
 [1] "/home"

(from Ben Bolker)

**11.2 システム時刻を得る [#t9b23a94]
(30/01/2001 Paul Johnson <pauljohn@ukans.edu>)

 > date()
 [1] "Tue Jan 30 10:22:39 2001"
 > Sys.time()
 [1] "2001-01-30 10:21:29"
 > unclass(Sys.time())
 [1] 980871696

This is the number of seconds since the epoch (Jan 1, 1970). I use it as a "time stamp" on files sometimes.
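As a small sketch (translator's example; the "results-" file name format is made up), such a time stamp can be built into a file name:

```r
## Use the epoch seconds as a time stamp in a file name.
stamp <- unclass(Sys.time())                  # seconds since 1970-01-01, as a number
fname <- sprintf("results-%.0f.txt", stamp)   # e.g. "results-980871696.txt"
fname
```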


**11.3 あるファイルが存在するかどうかチェック [#ie4aebc6]
(11/08/2000 Paul Johnson <pauljohn@ukans.edu>)

file.exists() 関数がこれをする。
//The function file.exists() will do this

**11.4 ファイルを名前もしくは名前の一部で探す(正規表現マッチング) [#v1d26531]
(14/08/2001 Paul Johnson <pauljohn@ukans.edu>)

?list.files() を実行する。
//Do ?list.files().

list.files() gives back a vector of character strings, one for each file matching the pattern. Want file names ending in .dat? Try:

 myDat <- list.files(pattern="\\.dat$")

Then iterate over this myDat object, using the elements myDat[[i]] if you do something to each file.
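A small self-contained sketch (translator's example): create a few files in a temporary directory, match the .dat ones, and iterate over them. The directory name "datdemo" is arbitrary.

```r
## List files matching a regular expression and loop over the matches.
dir <- file.path(tempdir(), "datdemo")   # arbitrary scratch directory
dir.create(dir, showWarnings = FALSE)
file.create(file.path(dir, c("a.dat", "b.dat", "c.txt")))
myDat <- list.files(dir, pattern = "\\.dat$")   # only "a.dat" and "b.dat" match
for (f in myDat) {
  print(f)   # process each matching file here
}
```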

Please note patterns in R are supposed to be regular expressions, not ordinary shell wildcards. Regular expressions are fairly standard in unix, but each version has its wrinkles. To see about the grep command, or about apropos, where regexp is used, do:

 ?grep  
#or
 ?apropos

Martin Maechler spelled out a few of the basics for me:

 "."  means  ``any arbitrary character''
 "*"  means  ``the previous [thing] arbitrary many (0,1,...) times''
              where [thing] is a single character (or something more general
                                                   for advanced usage)


**11.5 R パッケージをインストールする [#s70a7ca0]
(30/01/2001 Paul Johnson <pauljohn@ukans.edu>)

 install.packages()

これはパッケージをダウンロードして、インストールする!
//will download a package and install it!

以下のようにタイプすれば、ネットワークよりパッケージをインストールする
//It will get it from the network where you say, as in

 install.packages("VR",lib="c:/r/library",CRAN="http://lib.stat.cmu.edu/R/CRAN")

R のセッションではなくて、オペレーティング・システムでは、以下のようにしてダウンロードしたパッケージをインストールできる。
//If you are in the Operating System, not in an R session, you can install a package you've downloaded with

 R INSTALL packageName

packageName は圧縮されたままでもよい。たとえば packageName.tar.gz もそのまま引数として受け付けられる。
//This works even if packageName is zipped, i.e., packageName.tar.gz will be accepted as an argument.

"root" でない場合は、CRAN から直接インストールする際にインストール先を R に指定すればよい。
//If you are not "root", tell R where to install the package directly from CRAN:

 install.packages('tree', '~/.Rlibs')

Omitting the 2nd argument causes the routine to issue a warning and try to install in the system package location, which is often what you want, provided you have the required write access.

And, from the operating system shell, there is an equivalent. If you want to install the library in a private collection you can add the -l flag to specify the location of the library. You should then establish an environment variable R_LIBS as a colon-separated list of directories in which to look for libraries.

The call to R INSTALL would look like

 R INSTALL /usr/apps/tree -l ~/.Rlibs

(from Douglas Bates)

**11.6 メモリ [#lf7a061f]
(30/01/2001 Paul Johnson <pauljohn@ukans.edu>)

R-1.2 は、ワークスペースのサイズを自動的に変更する新しいメモリ管理方式を実装している。詳細を読むには次のようにタイプする。
//R-1.2 implements a new memory approach that automatically resizes the workspace. To read more about it, type

 ?Memory

または
//or

 help(Memory)

R FAQ (http://cran.r-project.org/doc/FAQ/R-FAQ.html) のセクション7.1を参照できる。
//You can also look at section 7.1 of the R FAQ (http://cran.r-project.org/doc/FAQ/R-FAQ.html).

手動でメモリを割り当てる方法についての情報はすべて削除した。この方法はもはや R チームが推奨していない。
//I'm deleting all info about how to manually allocate memory. It is no longer recommended by the R team.

**11.7 スクリプト化された R の実行プログラムに環境変数を引数として渡す [#gca5922d]
(30/01/2001 Paul Johnson <pauljohn@ukans.edu>)

Suppose you want an R program to be called in a script, and you absolutely must pass an argument to the R program to tell it what data to use or what variable name to use, etc.

Sys.getenv() を使って、シェルでセットした値を検索できる。
//Use Sys.getenv() to retrieve value you have set in your shell.

C シェルでは
//For a C shell,

 $ setenv R_INFILE foobar

またBASHでは
//or BASH

 $ export R_INFILE=foobar

Then in your R script grab that from the environment

 > Sys.getenv("R_INFILE")
 R_INFILE 
 "foobar" 
 > sink(Sys.getenv("R_INFILE"))

(from Thomas Lumley )

**11.8 minitab の  "como" を覚えている? 代わりにスクリプトを使おう! [#je04396a]
//Remember "como" on minitab? Use script for that!
(13/08/2001 ? <?>)

unix のほとんどシステムは script という名前のプログラムがあり、これでテキストの入出力のスナップショットを保存できる。以下のようにタイプすればよい。
//Most unix systems have a program called script, which can keep a snapshot of text input and output. If you type

 $ script filename
 >R
 ----do whatever R you want
 >quit()
 $exit

ここで $ はシェルプロンプトである。
//The $ are for the shell prompt.

///////////////////////////////////////////////////////////////////////////
*12 とんでもない R のトリック:それ無しでは生きていけない [#ra8ac16c]
//////////////////////////////////////////////////////////////////////////

**12.1 R ファイル中の箇所をコメントアウト [#x361e5da]
(15/08/2000 ? <?>)

Put # at beginning of line, or

"With a good code editor none of these are any problem. There is no reason to write many lines of comments in R code: that's what the .Rd file is for. And if you want to temporarily delete a block, if(0) { ... } will work, as indeed will temporarily deleting in an editor." (from Brian D. Ripley)

Don't forget ESS has the ability to comment-out a highlighted region:

 M-x comment-region

**12.2 R のヘルプページから出力するために example() を使う [#q9e984b2]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

See the "Examples" section at the end of the help output?

Try

 >?ls
 >example(ls)

Graphs go by too fast? do

 par(ask=TRUE)

and try example(ls) again.

**12.3 ヘルプを使う [#m7258be1]
(13/08/2001 Paul Johnson <pauljohn@ukans.edu>)

 help(function)  # same as ?function
 help.start() # fires up Netscape browser of html pages
 help.search() #searches in all available functions from all loaded packages

Don't forget that help.search takes regular expressions (as its help describes)

 > help.search(".*trans.*")
 Help files with name or title matching `.*trans.*':

 boxcox(MASS)           Box-Cox Transformations for Linear Models
 gilgais(MASS)          Line Transect of Soil in Gilgai Territory
 heart(MASS)            The Stanford heart transplant data

(from Thomas Lumley)

There is also an "apropos" function, similar in nature to emacs. If you have a vague idea of what you want, try it. Want to find how to list files? Do:

 apropos("files")

Often you don't know the keyword to look for.

**12.4 もはや必要ないライブラリを取り除く [#u4fbc5ed]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

detach(package:...) を使用する。例を以下に示す。
//Use detach(package:...) , for instance:

 > search()
 [1] ".GlobalEnv"    "package:ctest" "Autoloads"     "package:base" 
 > detach(package:ctest)
 > search()
 [1] ".GlobalEnv"   "Autoloads"    "package:base"

(from Emmanuel Paradis)


////////////////////////////////////////
*13 私が面白いと思ったその他の事柄 [#qa61aa6b]
////////////////////////////////////////////

**13.1 表現式の内部で変数の名前をリストする [#j3c381da]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

 > all.names(expression(sin(x+y)))
 [1] "sin" "+"   "x"   "y"
 > all.names(functions=FALSE,expression(sin(x+y)))
 [1] "x" "y"
 > all.vars(functions=FALSE,expression(sin(x+y)))
 [1] "x" "y"
 > all.vars(expression(sin(x+y)))
 [1] "x" "y"

all.vars() はformula, terms, または expression オブジェクトに対して動作する。
//all.vars() works on a formula, terms, or expression object.

 > all.vars((~j(pmin(3,g(h(x^3*y))))))
 [1] "x" "y"

**13.2 スクリプトの内部での R 環境 [#a3e6aad1]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

This function does not remove objects correctly

 > testLoadSeveralHDF <- function(numFiles)
   {  
       for (i in 0:(numFiles-1)) 
        { filename <- paste("trial",i,".hdf",sep="") 
          print(filename)
          hdf5load(filename) 
          rm(list = ls(pat="^g")) 
       }
   }

Doug Bates said:

Add envir = .GlobalEnv to both the ls and the rm calls. That is

 rm(list = ls(pat = "^g", envir = .GlobalEnv), envir = .GlobalEnv)

**13.3 微分 [#j7f8d1ac]
(10/04/2001 Paul Johnson <pauljohn@ukans.edu>)

first and second derivative of the expression wrt x

 > D(expression(x^2),"x")
 > D(D(expression(x^2),"x"),"x")

The real R FAQ 7.6 explains "How can I get eval() and D() to work?"
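A short sketch (not from the original): D() returns an unevaluated expression, which eval() can evaluate at a chosen point:

```r
## Evaluate a symbolic derivative at x = 3.
d1 <- D(expression(x^2), "x")   # the expression 2 * x
eval(d1, list(x = 3))           # evaluates to 6
```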

前半 [[P_Johnson_tips_1]]へ移動
