How can I expand hyphenated addresses using R?
I have a data frame with four rows of coded addresses. Some of the addresses use a hyphen, as in "2500-2598 Main St". I need to expand these hyphenated addresses so that my data frame gets new rows for 2500 Main St, 2502 Main St, 2504 Main St, and so on, until I reach the upper bound of 2598 Main St. Here is the code that creates my data frame:
# Create data frame of addresses, two of which need to be split
df <- data.frame(c('314 Wedgewood Ave, Claremont, California, 92054',
'2500-2598 Main St, El Cajon, California, 92020',
'826-838 N Bounty Ave, El Cajon, California, 92020',
'240 E Madison Ave, Chino Hills, California, 91786'))
colnames(df) <- 'address'
# Extract just the numbers and put in a separate column
df$street.num <- trimws(gsub("\\s+", " ", df$address))
df$street.num <- gsub("^(.*?),.*", "\\1", df$street.num) # Get address only
df$street.num <- gsub(" .*$", "", df$street.num) # Get street number only
df$street.lb <- as.numeric(substr(df$street.num, 1, regexpr("-", df$street.num, fixed = TRUE) - 1)) # Get street lower bound if hyphenated
df$street.ub <- as.numeric(substr(df$street.num, regexpr("-", df$street.num, fixed = TRUE) + 1, nchar(df$street.num))) # Get street upper bound if hyphenated
df$street.lb <- ifelse(is.na(df$street.lb), df$street.ub, df$street.lb) # Set lb equal to ub if NA
df$unexpanded <- ifelse(df$street.ub > df$street.lb, 1, 0)
                                            address street.num street.lb street.ub unexpanded
1   314 Wedgewood Ave, Claremont, California, 92054        314       314       314          0
2    2500-2598 Main St, El Cajon, California, 92020  2500-2598      2500      2598          1
3 826-838 N Bounty Ave, El Cajon, California, 92020    826-838       826       838          1
4 240 E Madison Ave, Chino Hills, California, 91786        240       240       240          0
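My idea so far is to create new data frame rows (possibly with a new column for the expanded street number), so that I end up with something like this: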
street.num street.lb street.ub unexpanded expanded.str.num
1 314 314 314 0 314
2 2500-2598 2500 2598 1 2500
3 2500-2598 2500 2598 1 2502
4 2500-2598 2500 2598 1 2504
... ... ... ... ...
51 2500-2598 2500 2598 1 2598
52 826-838 826 838 1 826
53 826-838 826 838 1 828
... ... ... ... ...
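If I can get the expanded street numbers like this, I can append the street name, city, and so on afterwards.

We can split the column on the hyphen, build the range with seq (or :), and then use unnest to give each value its own row: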
library(dplyr)
library(tidyr)
library(purrr)
df %>%
  mutate(expanded.str.num = map(strsplit(street.num, '-'), ~
    if (length(.x) == 2) seq(as.numeric(.x[1]), as.numeric(.x[2]), by = 2) else as.numeric(.x))) %>%
  unnest(c(expanded.str.num))
# A tibble: 59 x 6
# address street.num street.lb street.ub unexpanded expanded.str.num
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 314 Wedgewood Ave, Claremont, California, 92054 314 314 314 0 314
# 2 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2500
# 3 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2502
# 4 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2504
# 5 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2506
# 6 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2508
# 7 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2510
# 8 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2512
# 9 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2514
#10 2500-2598 Main St, El Cajon, California, 92020 2500-2598 2500 2598 1 2516
# … with 49 more rows
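The choice between seq and : matters for the result. As a small sketch, using the bounds from the sample data: seq with by = 2 keeps only every second number, while : returns every integer in the range. Whether stepping by 2 is right depends on the assumption that a hyphenated range covers one side of the street only.

```r
# Bounds taken from the "826-838 N Bounty Ave" row of the sample data
even_side <- seq(826, 838, by = 2)   # 826, 828, ..., 838
every_num <- 826:838                 # 826, 827, ..., 838

length(even_side)
length(every_num)

# Number of rows the "2500-2598 Main St" range expands to with by = 2
length(seq(2500, 2598, by = 2))
```

With by = 2, the two ranges expand to 50 and 7 rows, which together with the two unhyphenated addresses gives the 59 rows reported in the tibble above.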
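Another option is separate_rows, followed by summarise to build the sequence per address:
df %>%
  separate_rows(street.num, convert = TRUE) %>%
  group_by(address) %>%
  summarise(expanded.str.num = list(seq(first(street.num), last(street.num), by = 2))) %>%
  left_join(df, by = "address") %>%
  unnest(c(expanded.str.num))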
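For comparison, here is a minimal base R sketch of the same expansion that also rebuilds the full address for each expanded number, which is the follow-up step mentioned in the question. It is not part of the original answer. The names remainder, bounds, expanded, idx, result and expanded.address are made up for illustration, and the step of 2 is the same assumption as above.

```r
# Same sample addresses as in the question
df <- data.frame(
  address = c('314 Wedgewood Ave, Claremont, California, 92054',
              '2500-2598 Main St, El Cajon, California, 92020',
              '826-838 N Bounty Ave, El Cajon, California, 92020',
              '240 E Madison Ave, Chino Hills, California, 91786'),
  stringsAsFactors = FALSE
)

# First token is the street number (or hyphenated range); the rest is kept as-is
street_num <- sub(" .*$", "", df$address)
remainder  <- sub("^[^ ]+ ", "", df$address)

# "2500-2598" becomes c(2500, 2598); a single number stays a length-1 vector
bounds <- lapply(strsplit(street_num, "-", fixed = TRUE), as.numeric)

# Step by 2 between the bounds (assumption: a range covers one side of the street)
expanded <- lapply(bounds, function(b) {
  if (length(b) == 2) seq(b[1], b[2], by = 2) else b
})

# Repeat each source row once per expanded number, then rebuild the address
idx <- rep(seq_along(expanded), lengths(expanded))
result <- data.frame(
  address          = df$address[idx],
  expanded.str.num = unlist(expanded),
  expanded.address = paste(unlist(expanded), remainder[idx]),
  stringsAsFactors = FALSE
)

nrow(result)
head(result$expanded.address, 3)
```

Keeping the original address column next to expanded.address makes it easy to trace each new row back to the range it came from.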