Rの統計解析関数Tips
R に関する Tips で欠けているもの、そしておそらく一番困難(統計学本来の知識が不可欠)なものは R の本来の目的である統計解析機能に関する Tips でしょう。ここには気がついた有用な Tips をメモ代わりに紹介します。
> df1 <- data.frame(x=1:6, y=rep(1:3, 2)) > fit <- lm(y~x, df1) > Sum <- summary(fit)
# 当てはめオブジェクト fit の要約(普通ユーザーはこの情報だけで十分) > Sum Call: lm(formula = y ~ x, data = df1) Residuals: 1 2 3 4 5 6 -0.4286 0.3429 1.1143 -1.1143 -0.3429 0.4286 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.2000 0.8177 1.468 0.216 x 0.2286 0.2100 1.089 0.338 Residual standard error: 0.8783 on 4 degrees of freedom Multiple R-Squared: 0.2286, Adjusted R-squared: 0.03571 F-statistic: 1.185 on 1 and 4 DF, p-value: 0.3375
# 当てはめオブジェクトの要約が持つ情報(リスト形式になっている)一覧 > attributes(Sum) $names [1] "call" "terms" "residuals" "coefficients" [5] "sigma" "df" "r.squared" "adj.r.squared" [9] "fstatistic" "cov.unscaled"
# 回帰係数と関連情報を表形式で得る(summary(fit) が表示する情報の一部!) > coefficients(Sum) # coefficients 関数は線形回帰当てはめオブジェクトから回帰係数を取り出す Estimate Std. Error t value Pr(>|t|) (Intercept) 1.2000000 0.8176622 1.467599 0.2161194 x 0.2285714 0.2099563 1.088662 0.3375019 > Sum$coefficients # これでも良い Estimate Std. Error t value Pr(>|t|) (Intercept) 1.2000000 0.8176622 1.467599 0.2161194 x 0.2285714 0.2099563 1.088662 0.3375019
参考までに述べると lm 関数が返す当てはめオブジェクトは実際は次のような大量の情報を所有している。これらから本当に必要な情報を抜き出すために summary 関数等が必要になる。もちろんこれらはユーザーには直接必要なものではないが、当てはめ結果を用いたその後の二次的解析等で用いられる。自分でそうした二次解析用の関数を書くためには、これらの内部情報を適当に取り出し利用することになる。S/R の最大のメリットの一つが、多様な二次的解析を可能にするこうした解析結果の詳細情報の保持であるが、一方でともかく結果だけが欲しいユーザーにとっつきにくい印象を与えるのは致し方ない。
> str(fit) # str 関数で当てはめオブジェクトの内容を見る List of 12 $ coefficients : Named num [1:2] 1.200 0.229 ..- attr(*, "names")= chr [1:2] "(Intercept)" "x" $ residuals : Named num [1:6] -0.429 0.343 1.114 -1.114 -0.343 ... ..- attr(*, "names")= chr [1:6] "1" "2" "3" "4" ... $ effects : Named num [1:6] -4.89898 0.95618 1.25970 -0.87467 -0.00904 ... ..- attr(*, "names")= chr [1:6] "(Intercept)" "x" "" "" ... $ rank : int 2 $ fitted.values: Named num [1:6] 1.43 1.66 1.89 2.11 2.34 ... ..- attr(*, "names")= chr [1:6] "1" "2" "3" "4" ... $ assign : int [1:2] 0 1 $ qr :List of 5 ..$ qr : num [1:6, 1:2] -2.449 0.408 0.408 0.408 0.408 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:6] "1" "2" "3" "4" ... .. .. ..$ : chr [1:2] "(Intercept)" "x" .. ..- attr(*, "assign")= int [1:2] 0 1 ..$ qraux: num [1:2] 1.41 1.19 ..$ pivot: int [1:2] 1 2 ..$ tol : num 1e-07 ..$ rank : int 2 ..- attr(*, "class")= chr "qr" $ df.residual : int 4 $ xlevels : list() $ call : language lm(formula = y ~ x, data = df1) $ terms :Classes 'terms', 'formula' length 3 y ~ x .. ..- attr(*, "variables")= language list(y, x) .. ..- attr(*, "factors")= int [1:2, 1] 0 1 .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:2] "y" "x" .. .. .. ..$ : chr "x" .. ..- attr(*, "term.labels")= chr "x" .. ..- attr(*, "order")= int 1 .. ..- attr(*, "intercept")= int 1 .. ..- attr(*, "response")= int 1 .. ..- attr(*, ".Environment")=length 33 <environment> .. ..- attr(*, "predvars")= language list(y, x) $ model :`data.frame': 6 obs. of 2 variables: ..$ y: int [1:6] 1 2 3 1 2 3 ..$ x: int [1:6] 1 2 3 4 5 6 ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 y ~ x .. .. ..- attr(*, "variables")= language list(y, x) .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1 .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. ..$ : chr [1:2] "y" "x" .. .. .. .. ..$ : chr "x" .. .. ..- attr(*, "term.labels")= chr "x" .. .. ..- attr(*, "order")= int 1 .. .. ..- attr(*, "intercept")= int 1 .. .. ..- attr(*, "response")= int 1 .. .. ..- attr(*, ".Environment")=length 33 <environment> .. .. ..- attr(*, "predvars")= language list(y, x) - attr(*, "class")= chr "lm"
> str(Sum) # Sum の持つ内部情報の詳細を見る List of 11 $ call : language lm(formula = y ~ x, data = df1) $ terms :Classes 'terms', 'formula' length 3 y ~ x .. ..- attr(*, "variables")= language list(y, x) .. ..- attr(*, "factors")= int [1:2, 1] 0 1 .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:2] "y" "x" .. .. .. ..$ : chr "x" .. ..- attr(*, "term.labels")= chr "x" .. ..- attr(*, "order")= int 1 .. ..- attr(*, "intercept")= int 1 .. ..- attr(*, "response")= int 1 .. ..- attr(*, ".Environment")=length 34 <environment> .. ..- attr(*, "predvars")= language list(y, x) $ residuals : Named num [1:6] -0.429 0.343 1.114 -1.114 -0.343 ... ..- attr(*, "names")= chr [1:6] "1" "2" "3" "4" ... $ coefficients : num [1:2, 1:4] 1.200 0.229 0.818 0.210 1.468 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:2] "(Intercept)" "x" .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)" $ aliased : Named logi [1:2] FALSE FALSE ..- attr(*, "names")= chr [1:2] "(Intercept)" "x" $ sigma : num 0.878 $ df : int [1:3] 2 4 2 $ r.squared : num 0.229 $ adj.r.squared: num 0.0357 $ fstatistic : Named num [1:3] 1.19 1.00 4.00 ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf" $ cov.unscaled : num [1:2, 1:2] 0.8667 -0.2000 -0.2000 0.0571 ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:2] "(Intercept)" "x" .. ..$ : chr [1:2] "(Intercept)" "x" - attr(*, "class")= chr "summary.lm"
# 当てはめオブジェクト fit に基づく二次的解析の例(分散分析) > aovfit <- aov(fit) > aovfit Call: aov(formula = fit) Terms: x Residuals Sum of Squares 0.9142857 3.0857143 Deg. of Freedom 1 4 Residual standard error: 0.87831 Estimated effects may be unbalanced > summary(aovfit) Df Sum Sq Mean Sq F value Pr(>F) x 1 0.91429 0.91429 1.1852 0.3375 Residuals 4 3.08571 0.77143 > str(aovfit) List of 12 $ coefficients : Named num [1:2] 1.200 0.229 ..- attr(*, "names")= chr [1:2] "(Intercept)" "x" $ residuals : Named num [1:6] -0.429 0.343 1.114 -1.114 -0.343 ... ..- attr(*, "names")= chr [1:6] "1" "2" "3" "4" ... $ effects : Named num [1:6] -4.89898 0.95618 1.25970 -0.87467 -0.00904 ... ..- attr(*, "names")= chr [1:6] "(Intercept)" "x" "" "" ... $ rank : int 2 $ fitted.values: Named num [1:6] 1.43 1.66 1.89 2.11 2.34 ... ..- attr(*, "names")= chr [1:6] "1" "2" "3" "4" ... $ assign : int [1:2] 0 1 $ qr :List of 5 ..$ qr : num [1:6, 1:2] -2.449 0.408 0.408 0.408 0.408 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:6] "1" "2" "3" "4" ... .. .. ..$ : chr [1:2] "(Intercept)" "x" .. ..- attr(*, "assign")= int [1:2] 0 1 ..$ qraux: num [1:2] 1.41 1.19 ..$ pivot: int [1:2] 1 2 ..$ tol : num 1e-07 ..$ rank : int 2 ..- attr(*, "class")= chr "qr" $ df.residual : int 4 $ xlevels : list() $ call : language aov(formula = fit) $ terms :Classes 'terms', 'formula' length 3 y ~ x .. ..- attr(*, "variables")= language list(y, x) .. ..- attr(*, "factors")= int [1:2, 1] 0 1 .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:2] "y" "x" .. .. .. ..$ : chr "x" .. ..- attr(*, "term.labels")= chr "x" .. ..- attr(*, "order")= int 1 .. ..- attr(*, "intercept")= int 1 .. ..- attr(*, "response")= int 1 .. ..- attr(*, ".Environment")=length 35 <environment> .. ..- attr(*, "predvars")= language list(y, x) $ model :`data.frame': 6 obs. of 2 variables: ..$ y: int [1:6] 1 2 3 1 2 3 ..$ x: int [1:6] 1 2 3 4 5 6 ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 y ~ x .. .. ..- attr(*, "variables")= language list(y, x) .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1 .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. ..$ : chr [1:2] "y" "x" .. .. .. .. ..$ : chr "x" .. .. ..- attr(*, "term.labels")= chr "x" .. .. ..- attr(*, "order")= int 1 .. .. ..- attr(*, "intercept")= int 1 .. .. ..- attr(*, "response")= int 1 .. .. ..- attr(*, ".Environment")=length 35 <environment> .. .. ..- attr(*, "predvars")= language list(y, x) - attr(*, "class")= chr [1:2] "aov" "lm"
lm (線形回帰)、aov (分散分析)、glm (一般化線形回帰) 等の線形モデル当てはめ関数では、目的変数と(複数の)説明変数・因子の組合せからなる当てはめモデル式をモデル公式(model formula)と呼ばれる簡略記法で表現する。基本的な記法は
y ~ ... | 左辺の目的変数を、右辺の説明変数の線形式で線形回帰 |
+ A | 説明変数 A を加える |
- A | 説明変数 A を除く |
+ 0 または - 1 | 切片(定数)項の除外 |
A:B | 二つの説明変数 A, B の交互作用項 (A と B の積からなる説明変数) |
A * B | A + B + A:Bと同じ |
(...)*(...) | 右括弧内の変数と左括弧内の変数の掛け合わせ(交互作用)すべての組合せを表す |
ただし A^2, A^2*B 等はそれぞれ A, A*B と解釈 | |
(A+B)^2 | (A+B)*(A+B) と同じ。A + B + A:B の意味 |
A %in% B | 今一意味がわからない? |
より一般に A:B:C や (A+B+C)^3 といった高次交互作用項も同様に表現される。 y を目的変数、A, B, C 等を説明変数・因子とすると
y ~ A | y を A と切片(定数)項で単回帰 |
y ~ A - 1 | 切片項のない、原点を通る単回帰 |
y ~ A + 0 | 同上 |
0 + y ~ A | 同上 |
y ~ A + B | y を A, B と切片(定数)項で重回帰 |
y ~ A + B - 1 | y を A, B で重回帰。切片(定数)項は含まない |
y ~ A + B + A:B | A,B とそれらの二次交互作用項で重回帰,y ~ A * Bと同じ |
y ~ (A+B)^2 | 同上 |
y ~ (A+B)^2 - 1 | A,B とそれらの二次交互作用項で重回帰。切片(定数)項は含まない |
y ~ (A+B+C)^2 | A,B,C とそれらのすべての二次交互作用項で重回帰 |
y ~ A + B + C + A:B + A:C + B:C | 同上 |
y ~ (A+B+C)^2 - B:C | y ~ A + B + C + A:B + A:C の意味 |
y ~ (A+B+C)^3 | A,B,C とそれらのすべての二次及び三次交互作用項で重回帰 |
y ~ A + A %in% B | y ~ A + A:B の意味 |
モデル式には説明変数の関数を使うことが出来る。
y ~ A + log(B) | y を A と B の対数値で説明 |
log(y) ~ exp(A) + log(B) | log(y) を exp(A) と log(B) で説明 |
y ~ A/B | y を A と B の比で説明 |
モデル式で特別の意味を持つオペレータ(例えば +,-,*,^)を、この意味の説明変数の変形と区別するため特殊記号 I() を用いる。I() の中では +,-,*,^ 等は数値演算子と解釈される。
y ~ A + I(B+C) | y を A と B+C (B と C の和) で説明 |
y ~ A + I(B*C) | y を A と B*C (B と C の積) で説明、A + B:C と同じ |
y ~ A + I(B^2) | y を A と B^2 (B の自乗値) で説明 |
非線形モデルの非線形最小自乗推定専用の関数で、独自の最適化アルゴリズム(二種類)を備える。クラス nls のオブジェクトを返す。
nls(formula, data = parent.frame(), start, control = nls.control(), algorithm = "default", trace = FALSE, subset, weights, na.action)
データフレーム DNase は成分 Run (順序つき因子, 二因子)、conc (数値)、density (数値) を持つ。
data( DNase ) DNase1 <- DNase[ DNase$Run == 1, ] # データの一部を取り出す ## 自動スタートモデルを使う。SSlogis はロジスティックモデル関数を作る R 関数 fm1DNase1 <- nls( density ~ SSlogis( log(conc), Asym, xmid, scal ), DNase1 ) summary( fm1DNase1 ) # 当てはめ結果の要約 Formula: density ~ SSlogis(log(conc), Asym, xmid, scal) # 非線形モデル公式 Parameters: # パラメータ推定値、標準誤差、t 値、p 値 Estimate Std. Error t value Pr(>|t|) Asym 2.34518 0.07815 30.01 2.17e-13 *** xmid 1.48309 0.08135 18.23 1.22e-10 *** scal 1.04146 0.03227 32.27 8.51e-14 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.01919 on 13 degrees of freedom Correlation of Parameter Estimates: 推定パラメータの相関値 Asym xmid xmid 0.9868 scal 0.9008 0.9063
## 条件付き線形性を使う fm2DNase1 <- nls( density ~ 1/(1 + exp(( xmid - log(conc) )/scal ) ), data = DNase1, start = list( xmid = 0, scal = 1 ), alg = "plinear", trace = TRUE ) 0.7139315 : 0.000000 1.000000 1.453853 # 当てはめ途中結果のレポート 0.1445295 : 1.640243 1.390186 2.461754 # 残差平方和と推定値 0.008302151 : 1.620899 1.054228 2.478388 0.004794192 : 1.485226 1.043709 2.347334 0.004789569 : 1.483130 1.041468 2.345218 0.004789569 : 1.483090 1.041455 2.345180 summary( fm2DNase1 ) # 当てはめ結果の要約 Formula: density ~ 1/(1 + exp((xmid - log(conc))/scal)) # 非線形モデル公式 Parameters: # パラメータ推定値、標準誤差、t 値、p 値 Estimate Std. Error t value Pr(>|t|) xmid 1.48309 0.08135 18.23 1.22e-10 *** scal 1.04145 0.03227 32.27 8.51e-14 *** .lin 2.34518 0.07815 30.01 2.17e-13 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.01919 on 13 degrees of freedom Correlation of Parameter Estimates: 推定パラメータの相関値 xmid scal scal 0.9063 .lin 0.9868 0.9008
## 条件付き線形性を使わない fm3DNase1 <- nls( density ~ Asym/(1 + exp(( xmid - log(conc) )/scal ) ), data = DNase1, start = list( Asym = 3, xmid = 0, scal = 1 ), trace = TRUE ) 14.32279 : 3 0 1 # 当てはめ途中結果のレポート 0.4542698 : 2.1152456 0.8410193 1.2000640 # 残差平方和と推定値 0.05869602 : 2.446376 1.747516 1.189515 0.005663523 : 2.294087 1.412198 1.020463 0.004791528 : 2.341429 1.479688 1.040758 0.004789569 : 2.345135 1.483047 1.041439 0.004789569 : 2.345179 1.483089 1.041454 summary(fm3DNase1) # 当てはめ結果の要約 Formula: density ~ Asym/(1 + exp((xmid - log(conc))/scal)) # 非線形モデル公式 Parameters: # パラメータ推定値、標準誤差、t 値、p 値 Estimate Std. Error t value Pr(>|t|) Asym 2.34518 0.07815 30.01 2.17e-13 *** xmid 1.48309 0.08135 18.23 1.22e-10 *** scal 1.04145 0.03227 32.27 8.51e-14 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.01919 on 13 degrees of freedom Correlation of Parameter Estimates: 推定パラメータの相関値 Asym xmid xmid 0.9868 scal 0.9008 0.9063
データ Puromycin は成分 conc (数値), rate (数値), state (因子、二水準) を持つデータフレーム
data(Puromycin) Treated <- Puromycin[Puromycin$state == "treated", ] weighted.MM <- function(resp, conc, Vm, K) # 非線形モデルを与える関数の定義 { ## 目的: 白本の p.451 -- RHS にある Michaelis-Menten モデルに対する nls() の重み付き版 pred <- (Vm * conc)/(K + conc) (resp - pred) / sqrt(pred) # sqrt(pred) で割り重みとする } # Vm, K がモデルパラメータ、rate および conc はデータフレームの成分変数名 Pur.wt <- nls(~weighted.MM(rate, conc, Vm, K), data = Treated, start = list(Vm = 200, K = 0.1), trace = TRUE) 112.5978 : 200.0 0.1 17.33824 : 205.67588840 0.04692873 14.6097 : 206.33087396 0.05387279 14.59694 : 206.79883508 0.05457132 14.59690 : 206.83291286 0.05460917 14.59690 : 206.83468191 0.05461109 > summary(Pur.wt) Formula: 0 ~ weighted.MM(rate, conc, Vm, K) # 関数 weighted.MM の値を目的変数とする Parameters: Estimate Std. Error t value Pr(>|t|) Vm 2.068e+02 9.225e+00 22.421 7.00e-10 *** K 5.461e-02 7.979e-03 6.845 4.49e-05 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 1.208 on 10 degrees of freedom Correlation of Parameter Estimates: Vm K 0.7543
> set.seed(123) # 乱数種 > x <- rgamma(100, shape = 5, rate = 0.1) > fitdistr(x, "gamma") shape rate 6.48592894 0.13649147 (0.89425721) (0.01956555) > fitdistr(x, dgamma, list(shape = 1, rate = 0.1), lower = 0.01) shape rate 2.156761461 0.010000000 (0.277607887) (0.001433778)
> set.seed(123) # 乱数種 > x2 <- rt(250, df = 9) # t 分布に従う乱数を250個発生 > fitdistr(x2, "t", df = 9) # 最尤推定による自由度 9 の t 分布の当てはめ m s # 二つのパラメータの推定値と推定標準偏差 -0.01080467 1.04402833 ( 0.07222206) ( 0.05433544) > fitdistr(x2, "t") # 自由度も未値パラメータとする m s df # 三つのパラメータの推定値と推定標準偏差 -0.009754149 1.006270714 6.632753559 ( 0.071474570) ( 0.077107965) ( 2.716093153)
> mydt <- function(x, m, s, df) dt((x - m)/s, df)/s # 密度関数を独自定義 > fitdistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0)) m s # 二つのパラメータの推定値と推定標準偏差 -0.01069635 1.04409435 ( 0.07222562) ( 0.05434249)
> set.seed(123) # 乱数種 > x3 <- rweibull(100, shape = 4, scale = 100) # ワイブル分布に従う乱数を100個発生 > fitdistr(x3, "weibull") # ワイブル分布の当てはめ shape scale # 二つのパラメータの推定値と推定標準偏差 4.0807404 99.9812022 ( 0.3127017) ( 2.5816389)
> set.seed(123) # 乱数種 > x4 <- rnegbin(500, mu = 5, theta = 4) # 負の二項分布に従う乱数500個発生 > fitdistr(x4, "Negative Binomial") # 負の二項分布を当てはめ size mu # 二つのパラメータの推定値と推定標準偏差 4.2178713 4.9439803 (0.5048242) (0.1465538) Warning messages: # 関連警告 1: NaNs produced in: dgamma(x, shape, scale, log) 2: NaNs produced in: dgamma(x, shape, scale, log) 3: NaNs produced in: dgamma(x, shape, scale, log) 4: bounds can only be used with method L-BFGS-B in: optim(start, mylogfn, x = x, hessian = TRUE, ...) 5: bounds can only be used with method L-BFGS-B in: optim(start, myfn, x = x, hessian = TRUE, ...) 6: NaNs produced in: dweibull(x, shape, scale, log) 7: NaNs produced in: dweibull(x, shape, scale, log)
被験者内要因に関して、Mauchlyの球面性検定、G-G補正、H-F補正済みp値を計算します。 ついでに、Ryan法で多重比較もします。
データ形式はaovに渡すものと同じ。 効果はformulaで、
response~S(subject)+B(a,b)+I(c,d)
とします。 Sのなかには被験者のラベル、Bは被験者間要因、Iは被験者内要因を指定。
rep.aov<-function(formula, data=NULL, comment=NULL,...) { Terms <- if (missing(data)) terms(formula, c("I", "S", "B")) else terms(formula, c("I", "S", "B"), data = data) indI <- attr(Terms, "specials")$I indB <- attr(Terms, "specials")$B indS <- attr(Terms, "specials")$S strI<-if(is.null(indI))NULL else attr(Terms,"variables")[[1+indI]] labelsI<-if(is.null(indI))NULL else sapply(2:length(strI),function(i)deparse(strI[[i]])) strB<-if(is.null(indB))NULL else attr(Terms,"variables")[[1+indB]] labelsB<-if(is.null(indB))NULL else sapply(2:length(strB),function(i)deparse(strB[[i]])) labelsS<-deparse(attr(Terms,"variables")[[1+indS]][[2]]) for(i in rev(labelsI))data<-data[order(data[,i]),] data<-data[order(data[,labelsS]),] for(i in rev(labelsB))data<-data[order(data[,i]),] levelsB<-lapply(labelsB,function(n){data[,n]}) names(levelsB)<-labelsB levelsI<-lapply(labelsI,function(n){data[,n]}) names(levelsI)<-labelsI levelsS<-data[,labelsS] resp<-deparse(formula[[2]]) nc<-prod(sapply(levelsI,function(i){length(levels(i))})) dmat<-matrix(data[,resp],ncol=nc,byrow=T) nr<-nrow(dmat) fm.lm<-NULL;ret.lm<-NULL if(is.null(indB)){ fm.lm<-dmat~1 ret.lm<-lm(fm.lm) }else{ lvB<-lapply(labelsB,function(i)levelsB[[i]][1+0:(nr-1)*nc]) names(lvB)<-labelsB local({ eval(parse(text=paste(names(lvB),lvB,sep="="))); fm.lm<<-as.formula(sprintf("dmat ~ %s", paste(labelsB, collapse=" * "))) ret.lm<<-lm(fm.lm) }) } lvI<-NULL if(!is.null(indI)){ lvI<-as.data.frame(unique(data[,labelsI])) colnames(lvI)<-labelsI;rownames(lvI)<-1:dim(lvI)[1] } ret.anova<-ret.mauchly<-ret.mcomp<-anvX<-anvM<-list() terms.anova<-terms(as.formula(sprintf("~ %s",paste(labelsI,collapse="*")))) lv.ta<-attr(terms.anova,"term.labels") od.ta<-attr(terms.anova,"order") ## execute aov lv.b<-paste(paste(c(labelsS,labelsB),collapse=":")) fm.aov<-as.formula(paste(resp,"~",paste(c(labelsI,labelsB),collapse="*"),"+", "Error(",paste(lv.b,paste(lv.b,lv.ta,sep=":", collapse=" + "),sep=" + "),")")) suppressWarnings(ret.aov<-aov(fm.aov,data)) ## mauchly's spherical test and G-G & H-F eps, p.val. for(i in 1:length(lv.ta)){ j<-lv.ta[i] if(od.ta[i]==1){ anvX[[j]]<-~1 anvM[[j]]<-as.formula(paste("~",j)) }else{ j2<-gsub(":","*",j) anvX[[j]]<-as.formula(paste("~",j2,"-",j)) anvM[[j]]<-as.formula(paste("~",j2)) } ret.anova[[j]]<-r<-anova(ret.lm,X=anvX[[j]],M=anvM[[j]],idata=lvI,test="Spherical") if(od.ta[i]==1){ for(k in which(r$Pr<0.05 && r$num>1)){ tmp.aov<-ret.aov[[paste(c(labelsS,labelsB,j),collapse=":")]] df<-summary(tmp.aov)[[1]]$Df;df<-df[length(df)] mse<-summary(tmp.aov)[[1]]$"Mean Sq";mse<-mse[length(mse)] dat.tmp<-data[order(data[,j]),] dat.tmp<-matrix(dat.tmp[,resp],ncol=length(levels(levelsI[[j]]))) ret.mcomp[[j]]<-pairwise.ryan(dat.tmp,mse,df) attr(ret.mcomp[[j]],"Effect")<-j } } suppressWarnings( ret.mauchly[[j]]<-mauchly.test(ret.lm,X=anvX[[j]],M=anvM[[j]],idata=lvI) ) } for(i in 1:length(ret.anova)){ hd<-attr(ret.anova[[i]],"heading") ## G-G and H-F epsiron attr(ret.anova[[i]],"eps")<-c(as.numeric(sub(".*: +","",hd[8])),as.numeric(sub(".*: +","",hd[9]))) ## name of within factor attr(ret.anova[[i]],"within")<-lv.ta[[i]] } ret<-list("anova"=ret.anova, "mauchly"=ret.mauchly, "mcomp"=ret.mcomp, "aov"=ret.aov) attr(ret,"between.factor")<-labelsB attr(ret,"within.factor")<-labelsI attr(ret,"subject.factor")<-labelsS attr(ret,"comment")<-comment class(ret)<-"rep.anova" return(ret) }
rep.anovaの結果をうまいこと出力します。 普通に、
print(rep.aov(...))
とすればいいです。
print.rep.anova<-function(x){ ## mauchly's spherical test ms<-eps<-NULL;for(m in x[[2]])ms<-rbind(ms,c(m$stat,m$p.val)) ## construct table tms1<-tms2<-c();tbl<-NULL for(r in x[[1]]){ eps<-rbind(eps,attr(r,"eps")) tms1<-c(tms1,attr(r,"within")) tr<-r[-dim(r)[1],] tms2<-c(tms2,sapply(paste(rownames(tr),attr(r,"within"),sep=":"),function(n)sub("\\(Intercept\\):","",n))) tbl<-rbind(tbl,r[-dim(r)[1],-1]) } mtest<-data.frame(cbind(ms,eps)) names(mtest)<-c("Mauchley's W"," p value"," G-G epsilon"," H-F epsilon") rownames(mtest)<-tms1 tbl<-data.frame(tbl) names(tbl)<-c("F stat."," DF^", " DF_", " Pr.(>F)", " G-G Pr.", " H-F Pr.") rownames(tbl)<-tms2 ## rename error term in aov table tbl.aov<-NULL tx<-summary(x[[4]]) for(i in 1:length(tx)){ for(j in 1:length(tx[[i]])){ ttx<-tx[[i]][[j]] if(dim(tx[[i]][[j]])[1]>0){ rownames(ttx)[rownames(ttx)=="Residuals"]<-sub("Error: (.*)","Error(\\1)",names(tx)[i]) tbl.aov<-rbind(tbl.aov,ttx) } } } ## reformat aov table and add G-G and H-F p.val gg<-hf<-c();rt<-sapply(strsplit(tms2,":"),function(s)paste(sort(s),collapse=":")) for(i in 1:nrow(tbl.aov)){ rn<-paste(sort(strsplit(sub(" *$","",rownames(tbl.aov)[i]),":")[[1]]),collapse=":") if(any(rn==rt)){gg<-c(gg,tbl[rn,5]);hf<-c(hf,tbl[rn,6])} else{gg<-c(gg,NA);hf<-c(hf,NA)} } tbl.aov<-cbind(tbl.aov,"G-G Pr."=gg," H-F Pr."=hf) ## digits... for(i in 1:4)mtest[,i]<-sprintf("%.4f",mtest[,i]) for(i in c(2,3))tbl.aov[,i]<-ifelse(is.na(tbl.aov[,i]),NA,sprintf("%.2f",tbl.aov[,i])) for(i in c(4,5,6,7))tbl.aov[,i]<-ifelse(is.na(tbl.aov[,i]),NA,sprintf("%.4f",tbl.aov[,i])) ## output cat("\n\nRepeated measured ANOVA\n\n") if(!is.null(cm<-attr(x,"comment")))cat(cm,"\n\n") cat("Between [",paste(attr(x,"between.factor"),collapse=" , "),"] ") cat("Within [",paste(attr(x,"within.factor"),collapse=" , "),"] ") cat("Subject [",attr(x,"subject.factor"),"]\n\n") cat("Mauchley's test of sphericity\n") cat("----------------------------------------------------------------\n") print(mtest) cat("\n") cat("Analysis of Variance Table\n") cat("----------------------------------------------------------------\n") print(as.matrix(tbl.aov),na.print="",quote=F,right=T) cat("\n") for(r in x[[3]])print(r,comment=paste("Main effect of : ",attr(r,"Effect"),"\n\n")) }
ライアン法による多重比較
pairwise.ryan<-function(x,MSe=NULL,dfe=NULL,alpha=0.05){ mx<-mean(x) mc<-apply(x,2,mean) ns<-dim(x)[1] nc<-dim(x)[2] names<-colnames(x) if(is.null(names))names<-1:nc if(is.null(dfe))dfe<-length(x)-ns-nc+1 if(is.null(MSe))MSe<-(sum((x-mx)^2)-sum(ns*(mc-mx)^2)-sum(nc*(apply(x,1,mean)-mx)^2))/dfe ret<-list() ret$method<-"Ryan's method" ret$data.name<-deparse(substitute(x)) ts<-matrix(nrow=nc-1,ncol=nc-1,dimnames=list(names[2:nc],names[1:(nc-1)])) ps<-matrix(nrow=nc-1,ncol=nc-1,dimnames=list(names[2:nc],names[1:(nc-1)])) aps<-matrix(nrow=nc-1,ncol=nc-1,dimnames=list(names[2:nc],names[1:(nc-1)])) table<-NULL for(i in 1:(nc-1)){ for(j in (i+1):nc){ d<-abs(mc[i]-mc[j]) t0<-d/(sqrt(MSe)*sqrt(2/ns)) r<-length(which(min(mc[i],mc[j])<=mc&mc<=max(mc[i],mc[j]))) ap<-2*alpha/(nc*(r-1)) p0<-2*(1-pt(t0,dfe)) ps[i,j-1]<-p0 aps[i,j-1]<-ap table<-rbind(table,data.frame(paste(names[i],"-",names[j]),r,ap,t0,p0,ifelse(p0<ap,"s.","n.s."))) } } table<-data.frame(table) names(table)<-c("pair","r","nominal level","t","p","sig.") rownames(table)<-1:dim(table)[1] ret$p.value<-ps ret$stat<-ts ret$adjusted.p.value<-aps ret$table<-table ret$means<-mc attr(ret,"class")<-"pairwise.ryan" return(ret) }
pairwise.ryanの結果をうまく出力します。
print.pairwise.ryan<-function(x,comment=NULL){ cat("\n\tPairwise comparisons using", x$method, "\n\n") if(!is.null(comment))cat(comment) cat("data: ", x$data.name, "[",paste(x$means,","),"]\n\n") tbl<-x$table tbl[,3]<-sprintf("%.4f",tbl[,3]) tbl[,4]<-sprintf("%.4f",tbl[,4]) tbl[,5]<-sprintf("%.4f",tbl[,5]) print(tbl) }
言語Rによる分散分析( http://mat.isc.chubu.ac.jp/R/tech.html )の例題を使わせてもらいました。 Anova4 on the web ( http://www.hju.ac.jp/~kiriki/anova4/ )の結果と一致することは確認してあります。 但し、G-G補正、H-F補正済みp値については未確認なので、どなたかSPSSを使って確認してもらえると助かります。
> if(1){ + RB <- data.frame( + a = factor(c(rep(1,8), rep(2,8), rep(3,8), rep(4,8))), + s = factor(rep(1:8, 4)), + result = c(9,7,8,8,12,11,8,13, 6,5,6,3,6,7,10,9, + 10,13,8,13,12,14,14,16, 9,11,13,14,16,12,15,14)) + (rep.aov(result ~ I(a) + S(s), RB)) + } Repeated measured ANOVA Between [ ] Within [ a ] Subject [ s ] Mauchley's test of sphericity ---------------------------------------------------------------- Mauchley's W p value G-G epsilon H-F epsilon a 0.9751 0.9996 0.9832 1.7774 Analysis of Variance Table ---------------------------------------------------------------- Df Sum Sq Mean Sq F value Pr(>F) G-G Pr. H-F Pr. Error(s) 7 77.00 11.00 a 3 217.50 72.50 21.4437 0.0000 0.0000 0.0000 Error(s:a) 21 71.00 3.38 Pairwise comparisons using Ryan's method Main effect of : a data: dat.tmp [ 9.5 , 6.5 , 12.5 , 13 , ] pair r nominal level t p sig. 1 1 - 2 2 0.0250 3.2631 0.0037 s. 2 1 - 3 2 0.0250 3.2631 0.0037 s. 3 1 - 4 3 0.0125 3.8070 0.0010 s. 4 2 - 3 3 0.0125 6.5262 0.0000 s. 5 2 - 4 4 0.0083 7.0701 0.0000 s. 6 3 - 4 2 0.0250 0.5439 0.5923 n.s.
> if(1){ + RBFpq <- data.frame( + a = factor(c(rep(1, 20), rep(2, 20))), + b = factor(rep(c(rep(1, 5), rep(2, 5), rep(3, 5), rep(4, 5)), 2)), + s = factor(rep(1:5, 8)), + result = c(3,3,1,3,5, 4,3,4,5,7, 6,6,6,4,8, 5,7,8,7,9, + 3,5,2,4,6, 2,6,3,6,4, 3,2,3,6,5, 2,3,3,4,6)) + (rep.aov(result ~ I(a,b) + S(s), RBFpq)) + } Repeated measured ANOVA Between [ ] Within [ a , b ] Subject [ s ] Mauchley's test of sphericity ---------------------------------------------------------------- Mauchley's W p value G-G epsilon H-F epsilon a 1.0000 NaN 1.0000 1.0000 b 0.6817 0.9618 0.7934 2.0373 a:b 0.2482 0.6036 0.6692 1.3449 Analysis of Variance Table ---------------------------------------------------------------- Df Sum Sq Mean Sq F value Pr(>F) G-G Pr. H-F Pr. Error(s) 4 38.15 9.54 a 1 16.90 16.90 8.0958 0.0466 0.0466 0.0466 Error(s:a) 4 8.35 2.09 b 3 19.70 6.57 6.0383 0.0095 0.0173 0.0095 Error(s:b) 12 13.05 1.09 a:b 3 30.50 10.17 7.0725 0.0054 0.0169 0.0054 Error(s:a:b) 12 17.25 1.44 Pairwise comparisons using Ryan's method Main effect of : b data: dat.tmp [ 3.5 , 4.4 , 4.9 , 5.4 , ] pair r nominal level t p sig. 1 1 - 2 2 0.0250 1.9298 0.0776 n.s. 2 1 - 3 3 0.0125 3.0019 0.0110 s. 3 1 - 4 4 0.0083 4.0740 0.0015 s. 4 2 - 3 2 0.0250 1.0721 0.3048 n.s. 5 2 - 4 3 0.0125 2.1442 0.0532 n.s. 6 3 - 4 2 0.0250 1.0721 0.3048 n.s.
> if(1){ + SPFp.q <- data.frame( + a = factor(c(rep(1,20), rep(2,20))), + b = factor(rep(c(rep(1,5), rep(2,5), rep(3,5), rep(4,5)),2)), + s = factor(c(rep(1:5, 4), rep(6:10, 4))), + result = c(3,3,1,3,5, 4,3,4,5,7, 6,6,6,4,8, 5,7,8,7,9, + 3,5,2,4,6, 2,6,3,6,4, 3,2,3,6,5, 2,3,3,4,6)) + rep.aov(result ~ I(b) + B(a) + S(s), SPFp.q) + } Repeated measured ANOVA Between [ a ] Within [ b ] Subject [ s ] Mauchley's test of sphericity ---------------------------------------------------------------- Mauchley's W p value G-G epsilon H-F epsilon b 0.8428 0.9502 0.8896 1.3767 Analysis of Variance Table ---------------------------------------------------------------- Df Sum Sq Mean Sq F value Pr(>F) G-G Pr. H-F Pr. a 1 16.90 16.90 2.9075 0.1266 Error(s:a) 8 46.50 5.81 b 3 19.70 6.57 5.2013 0.0066 0.0092 0.0066 b:a 3 30.50 10.17 8.0528 0.0007 0.0012 0.0007 Error(s:a:b) 24 30.30 1.26 Pairwise comparisons using Ryan's method Main effect of : b data: dat.tmp [ 3.5 , 4.4 , 4.9 , 5.4 , ] pair r nominal level t p sig. 1 1 - 2 2 0.0250 1.7911 0.0859 n.s. 2 1 - 3 3 0.0125 2.7861 0.0103 s. 3 1 - 4 4 0.0083 3.7811 0.0009 s. 4 2 - 3 2 0.0250 0.9950 0.3296 n.s. 5 2 - 4 3 0.0125 1.9901 0.0581 n.s. 6 3 - 4 2 0.0250 0.9950 0.3296 n.s.
> if(1){ + SPFpq.r <- data.frame( + a = factor(c(rep(1, 24), rep(2, 24))), + b = factor(rep(c(rep(1, 12), rep(2, 12)), 2)), + c = factor(rep(c(rep(1, 4), rep(2, 4), rep(3, 4)), 4)), + s = factor(c(rep(1:4, 3), rep(5:8, 3), rep(9:12, 3), rep(13:16, 3))), + result = c(2,6,5,7, 5,7,9,9, 9,10,13,14, + 6,6,8,10, 3,6,5,8, 6,7,5,6, + 1,2,5,2, 2,1,4,5, 1,5,3,5, + 5,3,4,6, 5,6,6,7, 5,5,9,7)) + (rep.aov(result ~ S(s)+I(c)+B(a,b), SPFpq.r)) + } Repeated measured ANOVA Between [ a , b ] Within [ c ] Subject [ s ] Mauchley's test of sphericity ---------------------------------------------------------------- Mauchley's W p value G-G epsilon H-F epsilon c 0.8178 0.3309 0.8459 0.9698 Analysis of Variance Table ---------------------------------------------------------------- Df Sum Sq Mean Sq F value Pr(>F) G-G Pr. H-F Pr. a 1 96.33 96.33 15.6216 0.0019 b 1 3.00 3.00 0.4865 0.4988 a:b 1 56.33 56.33 9.1351 0.0106 Error(s:a:b) 12 74.00 6.17 c 2 33.50 16.75 9.5714 0.0009 0.0018 0.0010 c:a 2 6.17 3.08 1.7619 0.1932 0.1995 0.1945 c:b 2 24.50 12.25 7.0000 0.0040 0.0067 0.0044 c:a:b 2 41.17 20.58 11.7619 0.0003 0.0007 0.0003 Error(s:a:b:c) 24 42.00 1.75 Pairwise comparisons using Ryan's method Main effect of : c data: dat.tmp [ 4.875 , 5.5 , 6.875 , ] pair r nominal level t p sig. 1 1 - 2 2 0.0333 1.3363 0.1940 n.s. 2 1 - 3 3 0.0167 4.2762 0.0003 s. 3 2 - 3 2 0.0333 2.9399 0.0072 s.
> if(1){ + SPFp.qr <- data.frame( + a = factor(c(rep(1, 24), rep(2, 24))), + b = factor(rep(c(rep(1, 12), rep(2, 12)), 2)), + c = factor(rep(c(rep(1, 4), rep(2, 4), rep(3, 4)), 4)), + s = factor(c(rep(1:4, 6), rep(5:8, 6))), + result = c(2,6,5,7, 5,7,9,9, 9,10,13,14, + 6,6,8,10, 3,6,5,8, 6,7,5,6, + 1,2,5,2, 2,1,4,5, 1,5,3,5, + 5,3,4,6, 5,6,6,7, 5,5,9,7)) + (r<-rep.aov(result ~ S(s)+I(b,c)+B(a), SPFp.qr)) + } Repeated measured ANOVA Between [ a ] Within [ b , c ] Subject [ s ] Mauchley's test of sphericity ---------------------------------------------------------------- Mauchley's W p value G-G epsilon H-F epsilon b 1.0000 NaN 1.0000 1.0000 c 0.6719 0.3700 0.7529 0.9503 b:c 0.8354 0.6378 0.8586 1.1699 Analysis of Variance Table ---------------------------------------------------------------- Df Sum Sq Mean Sq F value Pr(>F) G-G Pr. H-F Pr. a 1 96.33 96.33 8.7576 0.0253 Error(s:a) 6 66.00 11.00 b 1 3.00 3.00 2.2500 0.1843 0.1843 0.1843 b:a 1 56.33 56.33 42.2500 0.0006 0.0006 0.0006 Error(s:a:b) 6 8.00 1.33 c 2 33.50 16.75 25.1250 0.0001 0.0003 0.0001 c:a 2 6.17 3.08 4.6250 0.0324 0.0489 0.0352 Error(s:a:c) 12 8.00 0.67 b:c 2 24.50 12.25 4.3235 0.0385 0.0477 0.0385 b:c:a 2 41.17 20.58 7.2647 0.0086 0.0128 0.0086 Error(s:a:b:c) 12 34.00 2.83 Pairwise comparisons using Ryan's method Main effect of : c data: dat.tmp [ 4.875 , 5.5 , 6.875 , ] pair r nominal level t p sig. 1 1 - 2 2 0.0333 2.1651 0.0512 n.s. 2 1 - 3 3 0.0167 6.9282 0.0000 s. 3 2 - 3 2 0.0333 4.7631 0.0005 s.
> if(1){ + RBFpqr <- data.frame( + a = factor(c(rep(1, 24), rep(2, 24))), + b = factor(rep(c(rep(1, 12), rep(2, 12)), 2)), + c = factor(rep(c(rep(1, 4), rep(2, 4), rep(3, 4)), 4)), + s = factor(rep(1:4, 12)), + result = c(2,6,5,7, 5,7,9,9, 9,10,13,14, + 6,6,8,10, 3,6,5,8, 6,7,5,6, + 1,2,5,2, 2,1,4,5, 1,5,3,5, + 5,3,4,6, 5,6,6,7, 5,5,9,7)) + (rep.aov(result ~ S(s)+I(a,b,c), RBFpqr)) + } Repeated measured ANOVA Between [ ] Within [ a , b , c ] Subject [ s ] Mauchley's test of sphericity ---------------------------------------------------------------- Mauchley's W p value G-G epsilon H-F epsilon a 1.0000 NaN 1.0000 1.0000 b 1.0000 NaN 1.0000 1.0000 c 0.8700 0.8700 0.8850 2.0650 a:b 1.0000 NaN 1.0000 1.0000 a:c 0.5422 0.5422 0.6860 1.0710 b:c 0.9838 0.9838 0.9840 2.8450 a:b:c 0.4831 0.4831 0.6593 0.9736 Analysis of Variance Table ---------------------------------------------------------------- Df Sum Sq Mean Sq F value Pr(>F) G-G Pr. H-F Pr. Error(s) 3 60.33 20.11 a 1 96.33 96.33 51.0000 0.0057 0.0057 0.0057 Error(s:a) 3 5.67 1.89 b 1 3.00 3.00 1.4211 0.3189 0.3189 0.3189 Error(s:b) 3 6.33 2.11 c 2 33.50 16.75 60.3000 0.0001 0.0002 0.0001 Error(s:c) 6 1.67 0.28 a:b 1 56.33 56.33 101.4000 0.0021 0.0021 0.0021 Error(s:a:b) 3 1.67 0.56 a:c 2 6.17 3.08 2.9211 0.1301 0.1625 0.1301 Error(s:a:c) 6 6.33 1.06 b:c 2 24.50 12.25 5.3780 0.0459 0.0471 0.0459 Error(s:b:c) 6 13.67 2.28 a:b:c 2 41.17 20.58 6.0738 0.0361 0.0670 0.0379 Error(s:a:b:c) 6 20.33 3.39 Pairwise comparisons using Ryan's method Main effect of : c data: dat.tmp [ 4.875 , 5.5 , 6.875 , ] pair r nominal level t p sig. 1 1 - 2 2 0.0333 3.3541 0.0153 s. 2 1 - 3 3 0.0167 10.7331 0.0000 s. 3 2 - 3 2 0.0333 7.3790 0.0003 s.