dataframe: Reshaping blocks of data separated by asterisks from long to wide format
I am building a data table about streets, and it consists of a single column. Each street occupies a run of rows of variable length: the first row contains the street name, and the following rows contain various details. Each street is separated from the next by a cell containing four asterisks (****). How can I reorganize the data so that each street becomes a single row?
dataset <- c("Rosa street", "London", "From to", "Description : lorem ipsum", "****", "Main street", "Bristol", "From to", "Description : dolor sit amet", "coordinates", "****")
dataset <- as.data.frame(dataset)
Expected output:
  var1        | var2    | var3    | var4                         | var5        |
---------------------------------------------------------------------------------
1 Rosa street | London  | From to | Description : lorem ipsum    | NA          |
2 Main street | Bristol | From to | Description : dolor sit amet | coordinates |
Here is an option using the tidyverse:
Does this work?
library(tidyr)
library(dplyr)
library(stringr)
dataset %>%
  mutate(grp = cumsum(street == '****')) %>%   # block id: increments at each '****' row
  filter(street != '****') %>%                 # drop the separator rows themselves
  mutate(vars = case_when(str_detect(street, 'street') ~ 'var1',
                          str_detect(street, 'From to') ~ 'var3',
                          str_detect(street, 'Description :') ~ 'var4',
                          str_detect(street, 'coordinates') ~ 'var5',
                          TRUE ~ 'var2')) %>%    # anything else (the city) goes to var2
  pivot_wider(id_cols = grp, names_from = vars, values_from = street) %>%
  select(-grp)
# A tibble: 2 x 5
var1 var2 var3 var4 var5
<chr> <chr> <chr> <chr> <chr>
1 Rosa street London From to Description : lorem ipsum NA
2 Main street Bristol From to Description : dolor sit amet coordinates
Data used:
dataset <- data.frame( street = c("Rosa street", "London", "From to", "Description : lorem ipsum", "****", "Main street", "Bristol", "From to", "Description : dolor sit amet", "coordinates", "****"))
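The case_when() mapping above keys on the text of each row, so a street whose name does not contain the word "street", or a block with two detail rows of the same kind, would end up in the wrong column or collide in pivot_wider(). If, as in the expected output, the columns simply follow each row's position within its block, a position-based reshape avoids the pattern matching entirely. A minimal base-R sketch, assuming the single column is named street (as in the data used above) and that "****" is the only separator value:

```r
# Sample data from the question (single column named `street`)
dataset <- data.frame(street = c("Rosa street", "London", "From to",
  "Description : lorem ipsum", "****", "Main street", "Bristol", "From to",
  "Description : dolor sit amet", "coordinates", "****"))

# Block id: increments at each "****" separator row
grp  <- cumsum(dataset$street == "****")
keep <- dataset$street != "****"

# One character vector per street block, separators dropped
blocks <- split(dataset$street[keep], grp[keep])

# Pad each block with NA up to the longest block, then stack blocks as rows
n <- max(lengths(blocks))
wide <- as.data.frame(do.call(rbind, lapply(blocks, function(x) x[seq_len(n)])),
                      stringsAsFactors = FALSE)
names(wide) <- paste0("var", seq_len(n))
rownames(wide) <- NULL
wide
```

Because shorter blocks are padded with NA, Rosa street gets NA in var5, matching the expected output, and no column names need to be known in advance.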