R: merge multiple rows into a single row
I have some problems with my data frame in R. My data frame looks like this:
ID TIME DAY URL_NAME VALUE TIME_SPEND
1 12:15 Monday HOME 4 30
1 13:15 Tuesday CUSTOMERS 5 21
1 15:00 Thursday PLANTS 8 8
1 16:21 Friday MANAGEMENT 1 6
....
So I want to write the rows that share the same "ID" into a single row.
It should look like this:
ID TIME DAY URL_NAME VALUE TIME_SPEND TIME1 DAY1 URL_NAME1 VALUE1 TIME_SPEND1 TIME2 DAY2 URL_NAME2 VALUE2 TIME_SPEND2 TIME3 DAY3 URL_NAME3 VALUE3 TIME_SPEND3
1 12:15 Monday HOME 4 30 13:15 Tuesday CUSTOMERS 5 21 15:00 Thursday PLANTS 8 8 16:21 Friday MANAGEMENT 1 6
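My second problem is that there are about 1.500.00 unique IDs, and I want to do this for the whole data frame.
I have not found any solution that fits my problem.
I would be glad about any solution or link that helps me handle it.

I suggest using dcast from the "data.table" package, which lets you reshape multiple measure variables at once.
For example: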
library(data.table)
as.data.table(mydf)[, dcast(.SD, ID ~ rowid(ID), value.var = names(mydf)[-1])]
# ID TIME_1 TIME_2 TIME_3 DAY_1 DAY_2 DAY_3 URL_NAME_1 URL_NAME_2 URL_NAME_3 VALUE_1 VALUE_2
# 1: 1 12:15 13:15 15:00 Monday Tuesday Thursday HOME CUSTOMERS PLANTS 4 5
# 2: 2 14:15 10:19 NA Tuesday Monday NA CUSTOMERS CUSTOMERS NA 2 9
# VALUE_3 TIME_SPEND_1 TIME_SPEND_2 TIME_SPEND_3
# 1: 8 30 19 40
# 2: NA 21 8 NA
Here is the sample data used:
mydf <- data.frame(
ID = c(1, 1, 1, 2, 2),
TIME = c("12:15", "13:15", "15:00", "14:15", "10:19"),
DAY = c("Monday", "Tuesday", "Thursday", "Tuesday", "Monday"),
URL_NAME = c("HOME", "CUSTOMERS", "PLANTS", "CUSTOMERS", "CUSTOMERS"),
VALUE = c(4, 5, 8, 2, 9),
TIME_SPEND = c(30, 19, 40, 21, 8)
)
mydf
# ID TIME DAY URL_NAME VALUE TIME_SPEND
# 1 1 12:15 Monday HOME 4 30
# 2 1 13:15 Tuesday CUSTOMERS 5 19
# 3 1 15:00 Thursday PLANTS 8 40
# 4 2 14:15 Tuesday CUSTOMERS 2 21
# 5 2 10:19 Monday CUSTOMERS 9 8
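If data.table is not available, the same wide layout can probably also be reached in base R. The following is only a sketch under assumptions: the helper column `idx` is a name introduced here for illustration, and the sample data is repeated so the block runs on its own. It numbers the rows within each ID and then calls `reshape()` in wide direction.

```r
# Same sample data as above, repeated so this block is self-contained.
mydf <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  TIME = c("12:15", "13:15", "15:00", "14:15", "10:19"),
  DAY = c("Monday", "Tuesday", "Thursday", "Tuesday", "Monday"),
  URL_NAME = c("HOME", "CUSTOMERS", "PLANTS", "CUSTOMERS", "CUSTOMERS"),
  VALUE = c(4, 5, 8, 2, 9),
  TIME_SPEND = c(30, 19, 40, 21, 8)
)

# Number the rows within each ID (idx is a hypothetical helper column).
mydf$idx <- ave(mydf$ID, mydf$ID, FUN = seq_along)

# One row per ID; every other column is spread out as NAME_1, NAME_2, ...
wide <- reshape(mydf, idvar = "ID", timevar = "idx",
                direction = "wide", sep = "_")
wide
```

Unlike the dcast result, the columns here are grouped by visit (TIME_1, DAY_1, ..., then TIME_2, DAY_2, ...), which is closer to the layout asked for in the question. How well this scales to a very large number of IDs has not been checked here.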
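Try this tidyverse solution, which produces output close to what you want. You can group by ID and then create a sequential id to identify the future columns. After that, reshape to long (pivot_longer()), combine the variable names with the id, and then reshape to wide (pivot_wider()).

Here is the code I used with my own dataset: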
df1 <- data.frame(Components = c(rep("ABC",5),rep("BCD",5)),
Size = c(sample(1:100,5),sample(45:100,5)),
Age = c(sample(1:100,5),sample(45:100,5)))
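The long-then-wide steps described above can be sketched roughly as follows. This is an illustration rather than the exact code from the original answers: the column names `idx`, `variable`, `value` and `name`, the result name `wide`, and the `set.seed()` call are assumptions added here, and `df1` is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df1 <- data.frame(Components = c(rep("ABC", 5), rep("BCD", 5)),
                  Size = c(sample(1:100, 5), sample(45:100, 5)),
                  Age = c(sample(1:100, 5), sample(45:100, 5)))

wide <- df1 %>%
  group_by(Components) %>%
  mutate(idx = row_number()) %>%      # sequential id within each group
  ungroup() %>%
  pivot_longer(cols = c(Size, Age),   # long: one row per variable and id
               names_to = "variable", values_to = "value") %>%
  unite("name", variable, idx, sep = ".") %>%   # e.g. "Size.1", "Age.1"
  pivot_wider(names_from = name, values_from = value)  # wide: one row per group

wide
```

This works directly because Size and Age are both integer columns. With mixed column types, as in the question's data, pivot_longer() would need the values converted to a common type first, or the long step could be skipped by passing several columns to values_from in pivot_wider().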
The output looks like this:
# A tibble: 2 x 11
# Groups: Components [2]
Components Size.1 Age.1 Size.2 Age.2 Size.3 Age.3 Size.4 Age.4 Size.5 Age.5
<fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 ABC 23 94 52 89 15 25 76 38 33 99
2 BCD 59 62 55 81 81 61 80 83 97 68
# A tibble: 2 x 11
# Components Size.1 Age.1 Size.2 Age.2 Size.3 Age.3 Size.4 Age.4 Size.5 Age.5
# <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 ABC 11 16 79 57 70 2 80 6 91 24
#2 BCD 67 81 63 77 48 73 52 100 49 76
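Both solutions are based on ideas from @Duck and @akrun. Many thanks to them.

Your question seems very similar to this answer - are you sure this is the necessary/right approach? What do you want to do with the data afterwards?

In the end, I want to analyze the navigation chain of the visited URLs. I don't know of any solution that can successfully analyze this data in its generic format.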