R: convert the values in a column of comma-separated text tags into integer counts and aggregate them
I have a data frame like the following:
data.frame("id" = 1:2, "tag" = c("a,b,c","a,d"))
id tag
1 a,b,c
2 a,d
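In the tag column, a and b count as LAN, and c and d count as CON. For each row, I want to count how many LAN tags and how many CON tags it contains.
I want to create two columns that aggregate the tags, as follows:
id  tag    lan_count  con_count
1   a,b,c  2          1
2   a,d    1          1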
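Can you tell me how to do this?

The main problem here is that your data isn't tidy. So my solution has two parts: first tidy the data, then summarise it. Once the data is tidy, the summary becomes trivial.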
library(tidyverse)

# Adjust to suit your real data
maxCols <- 10

d <- data.frame(id = 1:2, tag = c("a,b,c","a,d"))

d %>%
  separate(
    tag,
    sep=",",
    into=paste0("Element", 1:maxCols),
    extra="drop",
    fill="right",
    remove=FALSE
  ) %>%
  pivot_longer(
    cols=starts_with("Element"),
    values_to="Value",
    names_prefix="Element"
  ) %>%
  select(-name) %>%
  # Remove unused Values
  filter(!is.na(Value)) %>%
  # At this point the data frame is tidy
  group_by(tag) %>%
  # Translate tags into "categories". Add more if required, or write a function
  mutate(
    lan=Value %in% c("a", "b"),
    con=Value %in% c("c", "d")
  ) %>%
  # Adjust the column specification if more categories are added.
  # Or use a factor instead of binary indicators
  summarise(across(lan:con, sum))

# A tibble: 2 x 3
  tag     lan   con
* <fct> <int> <int>
1 a,b,c     2     1
2 a,d       1     1
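
The comments in the code above suggest replacing the binary indicators with a function or a factor when more categories are needed. One possible way to do that is a named lookup vector. This is only a minimal sketch: the names category_of and counts are hypothetical, and it assumes every tag element appears in the lookup (unknown elements would end up in an NA column).

library(dplyr)
library(tidyr)

# Hypothetical lookup table: tag element -> category.
# Extend this vector when more categories are added.
category_of <- c(a = "lan", b = "lan", c = "con", d = "con")

d <- data.frame(id = 1:2, tag = c("a,b,c", "a,d"), stringsAsFactors = FALSE)

counts <- d %>%
  # One row per tag element
  separate_rows(tag, sep = ",") %>%
  # Map each element to its category via the lookup vector
  mutate(category = unname(category_of[tag])) %>%
  # Count elements per id and category
  count(id, category) %>%
  # One column per category; ids missing a category get 0
  pivot_wider(
    id_cols = id,
    names_from = category,
    values_from = n,
    values_fill = 0L
  )

With this layout, adding a category only means adding entries to category_of; the pipeline itself stays unchanged.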
You can also use the following code:

library(dplyr)
library(tidyr)

df <- data.frame("id" = 1:2, "tag" = c("a,b,c","a,d"))

df %>%
  # One row per tag element
  separate_rows(tag, sep = ",") %>%
  group_by(id) %>%
  add_count(tag) %>%
  # Name id_cols explicitly; recent tidyr versions warn about passing it by position
  pivot_wider(id_cols = id, names_from = tag, values_from = n) %>%
  rowwise() %>%
  # a and b count as LAN, c and d count as CON
  mutate(lan_count = sum(c_across(a:b), na.rm = TRUE),
         con_count = sum(c_across(c:d), na.rm = TRUE)) %>%
  select(-c(a:d))

# A tibble: 2 x 3
# Rowwise:  id
     id lan_count con_count
  <int>     <int>     <int>
1     1         2         1
2     2         1         1
Please provide your data. What you have right now makes no sense to me.
I've edited my question.
Thanks, this works. How can I add id?
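
Regarding the last comment, one way to keep id in the summary might be to group by both id and tag instead of tag alone. The following is a minimal sketch rather than the answerer's own code: it uses separate_rows on a copy of the tag column (the Value and result names are illustrative) and assumes the same a/b = LAN, c/d = CON mapping as in the question.

library(dplyr)
library(tidyr)

d <- data.frame(id = 1:2, tag = c("a,b,c", "a,d"), stringsAsFactors = FALSE)

result <- d %>%
  # Keep the original tag string and split a copy into one row per element
  mutate(Value = tag) %>%
  separate_rows(Value, sep = ",") %>%
  # Grouping by both id and tag keeps both columns in the summary
  group_by(id, tag) %>%
  summarise(
    lan_count = sum(Value %in% c("a", "b")),
    con_count = sum(Value %in% c("c", "d")),
    .groups = "drop"
  )

Here result should contain the columns id, tag, lan_count and con_count, with one row per original row of d.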