R: concatenate specific strings in a column
I have a data frame like this:
df <- data.frame("region" = c("Spain", "Barcelona", "Madrid",
                              "France", "Paris", "Lyon",
                              "Belgium", "Bruges", "Brussels"),
                 "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))
I would like to prefix every region with the name of its country, like this:
desired_df <- data.frame("region" = c("Spain_Spain", "Spain_Barcelona", "Spain_Madrid",
                                      "France_France", "France_Paris", "France_Lyon",
                                      "Belgium_Belgium", "Belgium_Bruges", "Belgium_Brussels"),
                         "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))
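We can create a grouping variable based on where the country names occur, then paste the first element of 'region' in each group onto every element of 'region' in that group to update the 'region' column: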
library(dplyr)
library(stringr)
df %>%
  group_by(grp = cumsum(region %in% c("Spain", "France", "Belgium"))) %>%
  mutate(region = str_c(first(region), region, sep = "_")) %>%
  ungroup %>%
  select(-grp)
# A tibble: 9 x 3
#   region           X2010 X2011
#   <chr>            <int> <dbl>
# 1 Spain_Spain          1    NA
# 2 Spain_Barcelona      2     1
# 3 Spain_Madrid         3     2
# 4 France_France        4    NA
# 5 France_Paris         5     3
# 6 France_Lyon          6     4
# 7 Belgium_Belgium      7    NA
# 8 Belgium_Bruges       8     5
# 9 Belgium_Brussels     9     6
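To make the grouping step easier to follow, here is a minimal base R sketch of what the cumsum() call produces for the sample data. The names is_country, grp, country and new_region are made up for this illustration and do not appear in the answer above.

```r
# Same region values as in the question.
region <- c("Spain", "Barcelona", "Madrid",
            "France", "Paris", "Lyon",
            "Belgium", "Bruges", "Brussels")

# TRUE on each country row, FALSE on city rows.
is_country <- region %in% c("Spain", "France", "Belgium")

# cumsum() over the logical vector gives a running group id
# that increases by one at every country row.
grp <- cumsum(is_country)
grp
# [1] 1 1 1 2 2 2 3 3 3

# Look up the country of each row through its group id,
# then paste it in front of the region.
country <- region[is_country][grp]
new_region <- paste(country, region, sep = "_")
```

This relies on each country row coming before its own cities, which is the case in the sample data.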
A general solution using the tidyverse requires filtering the countries out from the rest of the data and then joining them back in:
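# Give every row a group id; a new group starts at each country row (NA in X2011).
df_gr <- df %>%
  mutate(gr = cumsum(is.na(X2011)))
# `countries` was not defined in the original snippet; it is assumed here
# to be the country rows, i.e. those with NA in X2011.
countries <- df_gr %>%
  filter(is.na(X2011))
# Keep the city rows and attach each one's country by group id.
# Note that the country rows themselves are dropped from the result.
df_gr %>%
  filter(!is.na(X2011)) %>%
  left_join(countries %>%
              select(region, gr) %>%
              rename("country" = "region"), by = "gr") %>%
  mutate(new_region = paste(country, region, sep = "_")) %>%
  select(-gr)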
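In the real data, does each country always have 2 cities?
No, it depends on the country in the real data.
You can generalize the cumsum() part with cumsum(is.na(X2011)) instead of listing the countries.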
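As a sketch of that suggestion, the grouped str_c() approach can be written without a hard-coded country list. It assumes that NA in X2011 occurs only on country rows, which holds for the sample data but may not hold for real data. The name result is made up for this illustration, and stringsAsFactors = FALSE is added so that region is a character column on older R versions too.

```r
library(dplyr)
library(stringr)

# Sample data from the question; data.frame() renames "2010" and "2011"
# to X2010 and X2011 because check.names = TRUE by default.
df <- data.frame("region" = c("Spain", "Barcelona", "Madrid",
                              "France", "Paris", "Lyon",
                              "Belgium", "Bruges", "Brussels"),
                 "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6),
                 stringsAsFactors = FALSE)

result <- df %>%
  # Assumption: a row is a country row exactly when X2011 is NA.
  group_by(grp = cumsum(is.na(X2011))) %>%
  # The first region in each group is the country name.
  mutate(region = str_c(first(region), region, sep = "_")) %>%
  ungroup() %>%
  select(-grp)

result
```

If a city can also have a missing value in X2011, this grouping would start a new group at that city, so a different marker for country rows would be needed.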
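Another option is to copy the country name into its own column, fill it downwards, and unite it with 'region':
library(dplyr)
library(tidyr)
df %>%
  # Country rows are the ones with NA in X2011; other rows get NA for now.
  # NA_character_ replaces the NULL of the original snippet, because
  # if_else() needs a character value for the false branch.
  mutate(country = if_else(is.na(X2011), region, NA_character_)) %>%
  # Carry each country name down to the city rows below it.
  fill(country) %>%
  # Combine country and region into one column (default separator "_").
  unite("region", c(country, region))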