Applying a data.frame-consuming function to groups of rows (R)


For example, suppose I have some data.frame df:

df <- read.table(text = "
P    Q    R
c    1   10
a    1    0
a    2    0
b    2    0
b    1   10
c    2   10
b    1    0
a    2   10
",
stringsAsFactors = FALSE,
header = TRUE)

…then my solutions are:

set.seed(0)
sapply(unique(df$P), function (value) foo(df[df$P == value, ]),
       simplify = FALSE)
## $c
##   P Q  R
## 6 c 2 10
## 
## $a
##   P Q R
## 2 a 1 0
## 
## $b
##   P Q  R
## 5 b 1 10

set.seed(0)
for (value in unique(df$P)) foo(df[df$P == value, ])
## 'data.frame':    1 obs. of  3 variables:
##  $ P: chr "c"
##  $ Q: int 2
##  $ R: int 10
## 'data.frame':    1 obs. of  3 variables:
##  $ P: chr "a"
##  $ Q: int 1
##  $ R: int 0
## 'data.frame':    1 obs. of  3 variables:
##  $ P: chr "b"
##  $ Q: int 1
##  $ R: int 10

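Here foo is assumed to be the first definition below (it returns a row); for the latter case (where foo is called for its side effects), assume foo is the second definition (it prints a row):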

## returns a one-row data.frame corresponding to a random row of
## dataframe
## NB: this is *just an example* for the sake of this question
foo <- function (dataframe) {
    dataframe[sample(nrow(dataframe), 1), ]
}

## prints to stdout a one-row data.frame corresponding to a random
## row of dataframe
## NB: this is *just an example* for the sake of this question
foo <- function (dataframe) {
    cat(str(dataframe[sample(nrow(dataframe), 1), ]))
}
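Both loops above subset df by hand once per group. A base-R alternative, sketched here and not taken from the question, is to let split() build the named list of per-group data frames first, then pass each element to lapply() (to collect return values) or to a for loop (for side effects only). The data are rebuilt with data.frame() so the sketch is self-contained, and pick_row() and print_row() are hypothetical stand-ins for the two versions of foo. Note that split() orders the groups by sorted value of P (a, b, c) rather than by first appearance, so the random rows drawn after set.seed(0) may differ from the output shown above.

```r
# Same data as df in the question
df <- data.frame(
  P = c("c", "a", "a", "b", "b", "c", "b", "a"),
  Q = c(1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L),
  R = c(10L, 0L, 0L, 0L, 10L, 10L, 0L, 10L),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the first foo: returns one random row
pick_row <- function(dataframe) {
  dataframe[sample(nrow(dataframe), 1), ]
}

# Hypothetical stand-in for the second foo: prints one random row
print_row <- function(dataframe) {
  str(dataframe[sample(nrow(dataframe), 1), ])
}

# split() returns a named list with one sub-data.frame per value of P
groups <- split(df, df$P)

# Use case 1: collect the return values in a named list
set.seed(0)
result <- lapply(groups, pick_row)

# Use case 2: call the function only for its side effects
set.seed(0)
for (group in groups) print_row(group)
```

Compared with subsetting inside the loop, split() makes the grouping explicit and computes it once; the trade-off is that the group order follows the sorted values of P.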

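You can implement both use cases with the function by(). However, to make the results reproducible, we change the function to return or print the last row of each group instead of a randomly selected row. This is necessary because by() changes the order in which the groups are processed. In a real use case this ordering should not matter; it matters here only because your results depend on the random number generator picking a row within each group.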
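The ordering difference can be checked directly: unique() keeps the order of first appearance, whereas by() groups through a factor built from P, whose levels are sorted. A small sketch (data rebuilt with data.frame() so it runs on its own):

```r
# Same data as df in the question
df <- data.frame(
  P = c("c", "a", "a", "b", "b", "c", "b", "a"),
  Q = c(1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L),
  R = c(10L, 0L, 0L, 0L, 10L, 10L, 0L, 10L),
  stringsAsFactors = FALSE
)

# sapply(unique(df$P), ...) visits the groups in order of first appearance
unique(df$P)
## [1] "c" "a" "b"

# by() visits the groups in sorted order
dimnames(by(df, df$P, nrow))[[1]]
## [1] "a" "b" "c"

# Iterating over the sorted values makes the sapply() version match by()
sort(unique(df$P))
## [1] "a" "b" "c"
```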
In your first use case:

foo <- function (dataframe) {
    dataframe[nrow(dataframe), ]
}
out1 <- sapply(unique(df$P), function (value) foo(df[df$P == value, ]),
               simplify = FALSE)
We can get the same result with by(), which returns an object of class by; that object is a list:

str(out1)  ## this displays the structure of the out1 object
## List of 3
##  $ c:'data.frame': 1 obs. of  3 variables:
##   ..$ P: chr "c"
##   ..$ Q: int 2
##   ..$ R: int 10
##  $ a:'data.frame': 1 obs. of  3 variables:
##   ..$ P: chr "a"
##   ..$ Q: int 2
##   ..$ R: int 10
##  $ b:'data.frame': 1 obs. of  3 variables:
##   ..$ P: chr "b"
##   ..$ Q: int 1
##   ..$ R: int 0

by.out1 <- with(df, by(df, P, foo))
str(by.out1)
## List of 3
##  $ a:'data.frame': 1 obs. of  3 variables:
##   ..$ P: chr "a"
##   ..$ Q: int 2
##   ..$ R: int 10
##  $ b:'data.frame': 1 obs. of  3 variables:
##   ..$ P: chr "b"
##   ..$ Q: int 1
##   ..$ R: int 0
##  $ c:'data.frame': 1 obs. of  3 variables:
##   ..$ P: chr "c"
##   ..$ Q: int 2
##   ..$ R: int 10
##  - attr(*, "dim")= int 3
##  - attr(*, "dimnames")=List of 1
##   ..$ P: chr [1:3] "a" "b" "c"
##  - attr(*, "call")= language by.data.frame(data = df, INDICES = P, FUN = foo)
##  - attr(*, "class")= chr "by"
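If a single data frame is more convenient than a list-like by object, the one-row results can be stacked with do.call(rbind, ...). A sketch, assuming the deterministic foo from this answer (last row of each group) and the question's data rebuilt with data.frame():

```r
# Same data as df in the question
df <- data.frame(
  P = c("c", "a", "a", "b", "b", "c", "b", "a"),
  Q = c(1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L),
  R = c(10L, 0L, 0L, 0L, 10L, 10L, 0L, 10L),
  stringsAsFactors = FALSE
)

# Deterministic foo from the answer: last row of each group
foo <- function(dataframe) {
  dataframe[nrow(dataframe), ]
}

by.out1 <- with(df, by(df, P, foo))

# Bind the per-group one-row data frames into one data frame (groups a, b, c)
combined <- do.call(rbind, by.out1)
print(combined)
```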
Similarly, for your second use case (with foo now printing the last row of each group), we can get the same result using by():

with(df, by(df, P, foo))
## 'data.frame': 1 obs. of  3 variables:
##  $ P: chr "a"
##  $ Q: int 2
##  $ R: int 10
## 'data.frame': 1 obs. of  3 variables:
##  $ P: chr "b"
##  $ Q: int 1
##  $ R: int 0
## 'data.frame': 1 obs. of  3 variables:
##  $ P: chr "c"
##  $ Q: int 2
##  $ R: int 10

The function by() is part of base R. As Dave2e mentioned, many other packages offer similar data-manipulation functions. Some provide more syntactic sugar for ease of use, others provide better optimization, or both. Some of them are plyr, dplyr and data.table. I'll leave it to you to look into them.

Have you looked at the by() function?

@aichao: by() looks like what I'm looking for. Would you be willing to post your comment as an answer?

I'd also suggest looking at the dplyr package and its group_by() function.
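In the second use case, by() still returns a by object after running foo on each group (here it holds only NULLs, because str() returns NULL). Called at the top level, that object is also auto-printed after the str() output. A sketch, assuming a hypothetical printing foo that calls str() directly, of wrapping the call in invisible() so that only the side effects appear:

```r
# Same data as df in the question
df <- data.frame(
  P = c("c", "a", "a", "b", "b", "c", "b", "a"),
  Q = c(1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L),
  R = c(10L, 0L, 0L, 0L, 10L, 10L, 0L, 10L),
  stringsAsFactors = FALSE
)

# Hypothetical printing foo: shows the last row of each group via str()
foo <- function(dataframe) {
  str(dataframe[nrow(dataframe), ])
}

# invisible() stops R from auto-printing the returned by object
invisible(with(df, by(df, P, foo)))
```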