R: Adding specific rows of a data frame together
I am trying to add specific rows of a data frame together. Other than using grepl to find the rows and then binding the sums to the bottom, I'm not sure whether there is a better way. This is my input:
input = structure(list(
V1 = c("Sales", "Sales", "Sales", "Sales", "Sales","Sales"),
V2 = c("Johnny", "Meg", "Fred", "Johnny", "Meg", "Fred"),
V3 = c("Australia", "Australia", "Australia", "NZ", "NZ","NZ"),
V4 = c(154L, 1898L, 175L, 1235L, 23L, 255L)), row.names = c(NA,6L),
class = "data.frame")
This is my expected output:
structure(list(
V1 = c("Sales", "Sales", "Sales", "Sales", "Sales",
"Sales", "Sales", "Sales", "Sales", "Sales", "Sales", "Sales"),
V2 = c("Johnny", "Meg", "Fred", "Johnny", "Meg", "Fred", "Johnny + Fred",
"Meg + Fred", "Johnny + Meg + Fred", "Johnny + Fred", "Meg + Fred",
"Johnny + Meg + Fred"),
V3 = c("Australia", "Australia", "Australia", "NZ",
"NZ", "NZ", "Australia", "Australia", "Australia", "NZ", "NZ", "NZ"),
V4 = c(154L, 1898L, 175L, 1235L, 23L, 255L, 329L, 2073L, 2227L, 1490L, 278L, 1513L)),
class = "data.frame", row.names = c(NA, -12L)
)
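I think there must be a better way to add these rows than filtering, then summing, then joining, and so on.
Can anyone point me in the right direction for what I should be looking for?
I used combn.
Data input
Solution
Result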
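result(comb2)
# A tibble: 6 x 4
  V3        V1    V2             V4
  <chr>     <chr> <chr>       <int>
1 Australia Sales Johnny+Meg   2052
2 NZ        Sales Johnny+Meg   1258
3 Australia Sales Johnny+Fred   329
4 NZ        Sales Johnny+Fred  1490
5 Australia Sales Meg+Fred     2073
6 NZ        Sales Meg+Fred      278
result(comb3)
# A tibble: 2 x 4
  V3        V1    V2                 V4
  <chr>     <chr> <chr>           <int>
1 Australia Sales Johnny+Meg+Fred  2227
2 NZ        Sales Johnny+Meg+Fred  1513
finalResult = bind_rows(A,B,input) %>%
  select(V1,V2,V3,V4) %>% filter(! V2 %in% c('Johnny+Meg'))
> finalResult
# A tibble: 12 x 4
   V1    V2              V3           V4
   <chr> <chr>           <chr>     <int>
 1 Sales Johnny+Fred     Australia   329
 2 Sales Johnny+Fred     NZ         1490
 3 Sales Meg+Fred        Australia  2073
 4 Sales Meg+Fred        NZ          278
 5 Sales Johnny+Meg+Fred Australia  2227
 6 Sales Johnny+Meg+Fred NZ         1513
 7 Sales Johnny          Australia   154
 8 Sales Meg             Australia  1898
 9 Sales Fred            Australia   175
10 Sales Johnny          NZ         1235
11 Sales Meg             NZ           23
12 Sales Fred            NZ          255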
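Using tidyverse, we can first split the data frame by V3, then build every combination of names and sum the matching values to create a new tibble, and finally bind that tibble to the original data frame.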
library(tidyverse)
input %>%
bind_rows(input %>%
group_split(V3) %>%
map_dfr(function(x) map_dfr(2:nrow(x), ~tibble(
V1 = first(x$V1),
V2 = combn(x$V2, ., paste, collapse = " + "),
V3 = first(x$V3),
V4 = combn(x$V4, .,sum)) %>%
filter(grepl("\\bFred\\b", V2)))))
# V1 V2 V3 V4
#1 Sales Johnny Australia 154
#2 Sales Meg Australia 1898
#3 Sales Fred Australia 175
#4 Sales Johnny NZ 1235
#5 Sales Meg NZ 23
#6 Sales Fred NZ 255
#7 Sales Johnny + Fred Australia 329
#8 Sales Meg + Fred Australia 2073
#9 Sales Johnny + Meg + Fred Australia 2227
#10 Sales Johnny + Fred NZ 1490
#11 Sales Meg + Fred NZ 278
#12 Sales Johnny + Meg + Fred NZ 1513
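To see the building block on its own: combn() takes an optional function that is applied to each combination, so the same call pattern can produce both the joined names and the summed values. The following is only a small illustration on the Australia rows; the variable names (people, sales, pair_names, pair_sums) are made up for this sketch.

```r
# Values for the Australia group, copied from the question's input
people <- c("Johnny", "Meg", "Fred")
sales  <- c(154L, 1898L, 175L)

# Every pair of names, pasted together with " + "
pair_names <- combn(people, 2, paste, collapse = " + ")

# The sum of V4 for the same pairs, in the same order
pair_sums <- combn(sales, 2, sum)

print(data.frame(V2 = pair_names, V4 = pair_sums))
```

Because both calls walk the combinations in the same order, the names and the sums line up row by row, which is what lets the answer put them side by side in one tibble before filtering for Fred.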
Using the same logic but in base R, we can do:
rbind(input, do.call(rbind, lapply(split(input, input$V3), function(x)
do.call(rbind, lapply(2:nrow(x), function(y)
subset(data.frame(V1 = x$V1[1],
V2 = combn(x$V2, y, paste, collapse = " + "),
V3 = x$V3[1],
V4 = combn(x$V4, y, sum)),
grepl("\\bFred\\b", V2)))))))
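Apologies @RonakShah, and thanks for spotting the error!
Nice. You can avoid the double do.call(rbind by using by instead of split.
@jay.sf Ah, yes, you are right. by is an important function that I often forget. It is a combination of split + lapply in one function.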
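A minimal sketch of how the by suggestion might look, assuming the intent was for by() to stand in for the split() + lapply() pair. The helper name combine_group is hypothetical, and this version still uses do.call(rbind, ...) to stack the pieces, so it may not be exactly what the commenter had in mind.

```r
input <- data.frame(
  V1 = rep("Sales", 6),
  V2 = c("Johnny", "Meg", "Fred", "Johnny", "Meg", "Fred"),
  V3 = c("Australia", "Australia", "Australia", "NZ", "NZ", "NZ"),
  V4 = c(154L, 1898L, 175L, 1235L, 23L, 255L)
)

# Hypothetical helper: for one V3 group, build every combination of
# 2..n people and keep only the combinations that include Fred.
combine_group <- function(x) {
  do.call(rbind, lapply(2:nrow(x), function(y) {
    subset(
      data.frame(
        V1 = x$V1[1],
        V2 = combn(x$V2, y, paste, collapse = " + "),
        V3 = x$V3[1],
        V4 = combn(x$V4, y, sum)
      ),
      grepl("\\bFred\\b", V2)
    )
  }))
}

# by() splits `input` on V3 and applies combine_group() to each piece
combined <- by(input, input$V3, combine_group)

# Stack the per-group results under the original rows
result <- rbind(input, do.call(rbind, combined))
rownames(result) <- NULL
print(result)
```

Pulling the per-group work into a named function also makes the nesting easier to read than the one-expression version.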