R Dataframe: aggregate strings within a column, across rows, by group

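I have a solution to a particular problem that seems very inefficient. I have text data that, for various reasons, is broken across the rows of a data frame at random intervals. However, certain subsets are known to belong together, based on unique combinations of other variables in the data frame. See this MWE for the structure and my initial solution: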

# Data
df <- read.table(text="page passage  person index text
1  123   A   1 hello      
1  123   A   2 my
1  123   A   3 name
1  123   A   4 is
1  123   A   5 guy
1  124   B   1 well
1  124   B   2 hello
1  124   B   3 guy", header=TRUE, stringsAsFactors=FALSE)

master<-data.frame()
for (i in 123:max(df$passage)) {
  print(paste0('passage ',i))
  tempset <- df[df$passage==i,]
  concat<-''
  for (j in 1:nrow(tempset)) {
    print(paste0('index ',j))
    concat<-paste(concat, tempset$text[j])
  }
  tempdf<-data.frame(tempset$page[1],tempset$passage[1], tempset$person[1], concat, stringsAsFactors = FALSE)
  master<-rbind(master, tempdf)
  rm(concat, tempset, tempdf)
}
master
#   tempset.page.1. tempset.passage.1. tempset.person.1.                concat
# 1               1                123                 A  hello my name is guy
# 2               1                124                 B        well hello guy
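For comparison with the loop in the question: growing `master` with `rbind()` on every iteration and pasting one word at a time is what makes it slow. Below is a minimal base R sketch that does the same job in one pass. It assumes fragments should be joined in `index` order within each (page, passage, person) group. It splits the text by a grouping key and collapses each piece with `vapply()`. The inline `df` is only a stand-in that reproduces the sample data.

```r
# Sample data from the question, rebuilt inline (stand-in for the read.table call)
df <- data.frame(
  page    = 1,
  passage = c(rep(123, 5), rep(124, 3)),
  person  = c(rep("A", 5), rep("B", 3)),
  index   = c(1:5, 1:3),
  text    = c("hello", "my", "name", "is", "guy", "well", "hello", "guy"),
  stringsAsFactors = FALSE
)

# Put the fragments in reading order within each group
df <- df[order(df$page, df$passage, df$person, df$index), ]

# One factor level per unique (page, passage, person) combination
key <- interaction(df$page, df$passage, df$person, drop = TRUE)

# Collapse every group's fragments in a single pass; no data frame is grown in a loop
concat <- vapply(split(df$text, key), paste, character(1), collapse = " ")

# Keep the identifying columns from the first row of each group,
# then look up that group's string by its factor level name
first <- !duplicated(key)
master <- df[first, c("page", "passage", "person")]
master$concat <- unname(concat[as.character(key[first])])
master
```

Unlike the loop, this does not produce a leading space in `concat`. It also does not assume passage numbers are contiguous, the way `123:max(df$passage)` does.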

Here's one way:
library(data.table)
DT <- data.table(df)

DT[,.(concat=paste0(text,collapse=" ")),by=.(page,passage,person)]
#    page passage person               concat
# 1:    1     123      A hello my name is guy
# 2:    1     124      B       well hello guy
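All of the answers rely on the `collapse` argument of `paste()`/`paste0()`. Here is a small comparison of the question's accumulate-in-a-loop approach and a single collapsed call, using the first passage's words. The loop's leading space comes from starting with `concat <- ''`.

```r
words <- c("hello", "my", "name", "is", "guy")

# Loop-style accumulation, as in the question: paste('', 'hello') is " hello",
# so the result starts with a space
acc <- ''
for (w in words) acc <- paste(acc, w)

# Vectorised: collapse joins the whole vector, with separators only between elements
joined <- paste(words, collapse = " ")

print(acc)     # " hello my name is guy"
print(joined)  # "hello my name is guy"
```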


One base R approach:
df$concat <- with(df, ave(text, page, passage, person, FUN=function(x) paste0(x, collapse=" ")))
unique(df[, c("page", "passage", "person", "concat")])
#   page passage person               concat
# 1    1     123      A hello my name is guy
# 6    1     124      B       well hello guy
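This is why the base R answer needs `unique()`. `ave()` applies `FUN` within each group but returns a vector as long as its input, so every row in a group receives that group's full string. Here is a toy illustration with made-up vectors (not the question's data):

```r
text    <- c("hello", "my", "name", "well", "hello")
passage <- c(123, 123, 123, 124, 124)

# ave() returns one value per input row: each row gets its group's concatenated string
per_row <- ave(text, passage, FUN = function(x) paste(x, collapse = " "))
print(per_row)

# Reducing to one value per group still requires unique() (or !duplicated())
print(unique(per_row))
```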

Here are two ways:
Base R
aggregate(
    text ~ page + passage + person, 
    data=df, 
    FUN=paste, collapse=' '
)
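`aggregate()` also has a data-frame method that takes the grouping columns through `by` instead of a formula. In both forms, arguments after `FUN` (here `collapse`) are forwarded to `FUN`, which is how `collapse=' '` reaches `paste`. Here is a sketch on the sample data, rebuilt inline as a stand-in for `df`:

```r
df <- data.frame(
  page    = 1,
  passage = c(rep(123, 5), rep(124, 3)),
  person  = c(rep("A", 5), rep("B", 3)),
  index   = c(1:5, 1:3),
  text    = c("hello", "my", "name", "is", "guy", "well", "hello", "guy"),
  stringsAsFactors = FALSE
)

# x: the column(s) to summarise; by: the grouping columns (a list, so a data frame works);
# collapse = " " is passed through to paste for each group
res <- aggregate(df["text"], by = df[c("page", "passage", "person")],
                 FUN = paste, collapse = " ")
res
```

Within each group, `aggregate()` keeps the rows in their original order. This sketch therefore assumes `df` is already sorted by `index`, as in the sample.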

dplyr
library(dplyr)
df %>% 
    group_by(page, passage, person) %>%
    summarize(text = paste(text, collapse = ' '))

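There's no need for the select. What are you doing there? Similarly, for aggregate, there's no need to subset df; data=df seems to work.
Thanks, you're right. See vignette('nse', package='dplyr').
I chose this answer because it is slightly simpler and uses base R, although all the other approaches work equally well. Thanks, everyone!
Nice answer, +1. One thing, though: why not give the data.table answer an explicitly labeled column like the other answers have?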