R: count how many times each US state name is mentioned in a string column, creating 51 new columns for all states
I have a column like this:
df <- data.frame("travel_history" = c("Jane Doe travelled to indiana, stayed there for 1 year, then went to Austin, Texas for a week, then New York and then back to Indiana", "John was in Colombus, Ohio, then went to alabama, then Indiana, and then went to California. he visited Alabama again, but eventually settled in California"))
df$Alabama <- str_count(df$travel_history, "Alabama")
df
We can loop over `state.name` and apply `str_count`:
library(stringr)
df[state.name] <- lapply(tolower(state.name),
    function(x) str_count(tolower(df$travel_history), x))
Or use `map`:
library(purrr)
library(dplyr)
map_dfc(state.name, ~ df %>%
    transmute(!! .x := str_count(travel_history,
        regex(.x, ignore_case = TRUE)))) %>%
  bind_cols(df, .)
travel_history
#1 Jane Doe travelled to indiana, stayed there for 1 year, then went to Austin, Texas for a week, then New York and then back to Indiana
#2 John was in Colombus, Ohio, then went to alabama, then Indiana, and then went to California. he visited Alabama again, but eventually settled in California
# Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky
#1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0
#2 2 0 0 0 2 0 0 0 0 0 0 0 0 1 0 0 0
# Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York
#1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
#2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia
#1 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#2 0 0 1 0 0 0 0 0 0 0 0 0 0 0
# Washington West Virginia Wisconsin Wyoming
#1 0 0 0 0
#2 0 0 0 0
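One caveat with plain substring matching is that a shorter name can be counted inside a longer one, for example "Kansas" inside "Arkansas". The sketch below is not part of the original answer: it swaps in base R's `gregexpr` with word boundaries (`\\b`) instead of `str_count`, and the helper name `count_whole_word` and the sample sentence are hypothetical. Note that word boundaries do not stop "Virginia" from being counted inside "West Virginia"; that case would need extra handling.

```r
# Hypothetical helper (an assumption, not from the original answer):
# count whole-word, case-insensitive occurrences of each name in each string.
count_whole_word <- function(text, names) {
  counts <- lapply(names, function(nm) {
    pattern <- paste0("\\b", nm, "\\b")   # word boundaries around the name
    hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
    lengths(regmatches(text, hits))       # 0 when there is no match
  })
  out <- data.frame(row.names = seq_along(text))
  out[names] <- counts                    # one column per name, as in the answer
  out
}

# Made-up sentence for illustration only
x <- "She moved from Arkansas to kansas, then West Virginia"

# Plain substring matching also finds "kansas" inside "Arkansas"
plain_kansas <- lengths(regmatches(x, gregexpr("Kansas", x, ignore.case = TRUE)))
plain_kansas          # 2

res <- count_whole_word(x, c("Arkansas", "Kansas", "Virginia", "West Virginia"))
res$Kansas            # 1
res$Virginia          # 1, still counted inside "West Virginia"
```

With the built-in vector, the same helper could be called as `count_whole_word(df$travel_history, state.name)`.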
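You might want to add a `tolower` in there, @akrun, for example as:
df[state.name] <- lapply(state.name, function(x) str_count(df$travel_history, regex(x, ignore_case = TRUE)))
@ChuckP `state.name` gives `#[1] "Alabama" ...`. I was thinking the OP wanted the case to be camel case.
You are not catching e.g. Indiana twice in the first row, because one entry is capitalized properly and the other is not.
@ChuckP Thanks, you are right. I was thinking the OP had a typo in the example data.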
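The case-sensitivity point from the comments can be checked with a small sketch. The strings below are shortened, hypothetical stand-ins for the example data; `str_count` and `regex(ignore_case = TRUE)` are the same stringr calls used in the answer.

```r
library(stringr)

# Shortened, hypothetical strings modelled on the example data
x <- c("travelled to indiana, then back to Indiana",
       "went to alabama, then Indiana")

# Case-sensitive: the lowercase "indiana" in the first string is missed
case_sensitive <- str_count(x, "Indiana")
case_sensitive      # 1 1

# Case-insensitive: both spellings in the first string are counted
case_insensitive <- str_count(x, regex("Indiana", ignore_case = TRUE))
case_insensitive    # 2 1
```

Lowercasing both the text and the pattern with `tolower` should give the same counts as `ignore_case = TRUE` for these inputs.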