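*Processing large data in R [#n4d25ed1]
Is it really common knowledge that R cannot handle large data sets, and that such data has to be processed with SAS, SPSS, or some dedicated library?~
This page collects ways to process large data in R.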

*Using an RDBMS [#d38ae707]
-[[surveyNG:http://www.user2007.org/program/presentations/lumley.pdf]]
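*Using parallel processing [#x3a87573]
When a large computation contains work that can run in parallel, the libraries below can be used.~
They are basically built around a "split" → "compute" → "combine" procedure.~
Of course, there is little to gain without a multi-core machine or a cluster.

For now this is only a list; usage notes are to follow.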

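-snow: (short for Simple Network of Workstations) a library for creating clusters for parallel computation.
-foreach: a library for writing parallel loops in a way that does not depend on the backend.
-doSNOW: a snow backend for use inside foreach. You do not need to load snow separately (doSNOW pulls it in as a dependency), but usage differs somewhat from plain snow.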
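
The "split" → "compute" → "combine" flow can be sketched as follows. This is only a minimal sketch under stated assumptions: it uses the parallel package that ships with R (its cluster functions build on snow) instead of snow itself, a local cluster of 2 workers, and a made-up sum-of-squares workload.

```r
library(parallel)  # bundled with R; provides snow-style cluster functions

# Hypothetical workload (an assumption for illustration): sum of squares.
x <- as.numeric(1:100000)

# Start 2 local worker processes (assumed worker count).
cl <- makeCluster(2)

# 1. Split: divide the data into one consecutive chunk per worker.
chunks <- clusterSplit(cl, x)

# 2. Compute: each worker processes its own chunk independently.
partial <- parLapply(cl, chunks, function(v) sum(v^2))

# 3. Combine: merge the partial results on the master process.
total <- sum(unlist(partial))

# Always shut the workers down when finished.
stopCluster(cl)

print(total)
```

snow provides makeCluster, clusterSplit, parLapply and stopCluster under the same names, so the sketch should carry over with library(snow); on a real cluster the argument to makeCluster would describe the remote hosts instead of a local worker count.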

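*Packages for handling large memory [#a7625219]
**General purpose [#uf7bfc22]
-bigmemory: the base library for working with data in the "big.matrix" format.
--biganalytics: basic analysis tools for the "big.matrix" format. Provides apply, biglm (requires the biglm library), and others.
--bigtabulate: basic tabulation tools for the "big.matrix" format. Provides, among others, "bigtable", the "big.matrix" counterpart of table.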

-memisc: data management, simulation, and presentation of estimates
--[[Management and Analysis of Large Survey Data Sets Using the 'memisc' Package :http://www.statistik.uni-dortmund.de/useR-2008//abstracts/Elff.pdf]]

-filehash
--Peng RD: Interacting with data using the filehash package. R News 6(4):19-24, 2006.

-ff: memory-efficient storage of large atomic vectors and arrays on disk and fast access functions
--[[The ff package: Handling Large Data Sets in R with Memory Mapped Pages of Binary Flat Files:http://www.user2007.org/program/presentations/adler.pdf]]
--[[Reading huge files into R:http://www.r-bloggers.com/reading-huge-files-into-r/]] sqldf, ff
--[[If you are into large data and work a lot with package ff:http://www.bnosac.be/index.php/blog/22-if-you-are-into-large-data-and-work-a-lot-package-ff]]
--[[Opening Large CSV Files in R:http://stathack.wordpress.com/2012/12/27/opening-large-csv-files-in-r/]]

-biglm: regression models for large data

-R.huge: access methods for large data

-rindex: Indexing for R

-DatABEL: file-based access to large matrices stored on HDD in binary format

**For specific purposes [#x89f00d8]
-biglm

*[[R and Big Data processing>R と Big Data 処理]] [#y1ae4e76]

*Reference links [#l49d6711]

-[[Quickly reading very large tables as dataframes in R:http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r]] sqldf
