R: Merging text documents by index
I have a data frame that looks like this:
_________________id ________________text______
1 | 7821 | "some text here"
2 | 7821 | "here as well"
3 | 7821 | "and here"
4 | 567 | "etcetera"
5 | 567 | "more text"
6 | 231 | "other text"
I want to group the text by ID so that I can run a clustering algorithm:
________________id___________________text______
1 | 7821 | "some text here here as well and here"
2 | 567 | "etcetera more text"
3 | 231 | "other text"
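Is there a way to do this? I am importing the data from a database table and I have a lot of it, so I can't do this manually.

You are actually looking for aggregate, not merge, and there should be many examples demonstrating the different aggregation options. The most basic and direct approach uses the formula method to specify the columns to aggregate.

Here is your data in a copy-and-pasteable form: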
mydata <- structure(list(id = c(7821L, 7821L, 7821L, 567L, 567L, 231L),
text = structure(c(6L, 3L, 1L, 2L, 4L, 5L), .Label = c("and here",
"etcetera", "here as well", "more text", "other text", "some text here"
), class = "factor")), .Names = c("id", "text"), class = "data.frame",
row.names = c(NA, -6L))
Of course, there is also data.table, which has a very compact syntax (and is also very fast).
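Thanks, it works great! I'll accept your answer as soon as I can (in 4 minutes).

@Arun, totally agree, but when you are faced with a table like this, here is a trick: copy and paste everything except the first line, and use read.table with sep = "|" and strip.white = TRUE.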
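The read.table trick from the comment above might look roughly like this. It is only a sketch: the table is pasted into a string and passed through the text argument, and the column names "row", "id", and "text" are labels chosen here for illustration, not something given in the question.

```r
# The question's table with the header line left out, pasted as one string.
txt <- '1 | 7821 | "some text here"
2 | 7821 | "here as well"
3 | 7821 | "and here"
4 | 567 | "etcetera"
5 | 567 | "more text"
6 | 231 | "other text"'

# sep = "|" splits on the pipes; strip.white = TRUE trims the padding around fields.
# col.names are hypothetical labels; the first column is the printed row number.
parsed <- read.table(text = txt, sep = "|", strip.white = TRUE,
                     col.names = c("row", "id", "text"),
                     stringsAsFactors = FALSE)

# Drop the row-number column so only id and text remain.
parsed$row <- NULL
str(parsed)
```

Calling str() on the result is a quick way to check that id was read as a number and text as a string before aggregating.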
aggregate(text ~ id, mydata, paste, collapse = " ")
# id text
# 1 231 other text
# 2 567 etcetera more text
# 3 7821 some text here here as well and here
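As one of the other aggregation options mentioned in the answer, aggregate also has a non-formula interface that takes the data and a named list of grouping variables. The sketch below is an illustration, not the answerer's code: it rebuilds the example data under the hypothetical name docs, with text stored as plain character strings rather than a factor.

```r
# Example data rebuilt with character text (an assumption; the original uses a factor).
docs <- data.frame(
  id = c(7821L, 7821L, 7821L, 567L, 567L, 231L),
  text = c("some text here", "here as well", "and here",
           "etcetera", "more text", "other text"),
  stringsAsFactors = FALSE
)

# x is the data to aggregate; by is a named list of grouping variables.
# Extra arguments (collapse = " ") are passed on to FUN, here paste.
merged <- aggregate(x = docs["text"], by = list(id = docs$id),
                    FUN = paste, collapse = " ")
print(merged)
```

As with the formula method, the groups come back sorted by id rather than in their original order.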
> library(data.table)
> DT <- data.table(mydata)
> DT[, paste(text, collapse = " "), by = "id"]
id V1
1: 7821 some text here here as well and here
2: 567 etcetera more text
3: 231 other text