R: Convert the values in a column of comma-separated text tags to integer counts and aggregate

I have a data frame that looks like this:

data.frame("id" = 1:2, "tag" = c("a,b,c","a,d"))

 id       tag
  1       a,b,c
  2       a,d

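In the tag column, a and b should be treated as LAN, and c and d as CON. For each row, I want to count how many LAN tags and how many CON tags it contains.

I want to create two columns that aggregate the tags in this way, as shown below: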

id  tag    lan_count  con_count
1   a,b,c  2          1
2   a,d    1          1

Could you tell me how to do this?

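The main problem here is that your data isn't tidy. So my solution has two parts: first tidy the data, then summarise it. Once the data is tidy, the summary becomes trivial.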

library(tidyverse)

# Adjust to suit your real data
maxCols <- 10
d <- data.frame(id = 1:2, tag = c("a,b,c","a,d"))
d %>% 
  separate(
    tag, 
    sep=",", 
    into=paste0("Element", 1:maxCols), 
    extra="drop", 
    fill="right", 
    remove=FALSE
  )  %>% 
  pivot_longer(
    cols=starts_with("Element"), 
    values_to="Value", 
    names_prefix="Element"
  ) %>%
  select(-name) %>%
  # Remove unused Values
  filter(!is.na(Value)) %>%
  # At this point the data frame is tidy
  group_by(tag) %>%
  # Translate tags into "categories". Add more if required, or write a function
  mutate(
    lan=Value %in% c("a", "b"),
    con=Value %in% c("c", "d")
  ) %>%
  # Adjust the column specification if more categories are added.  
  # Or use a factor instead of binary indicators
  summarise(across(lan:con, sum))
# A tibble: 2 x 3
  tag     lan   con
* <fct> <int> <int>
1 a,b,c     2     1
2 a,d       1     1
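
The comment in the code above suggests writing a function once there are more categories. A minimal base R sketch of that idea follows, assuming the tag-to-category mapping can be kept in a named vector. The names `tag_map` and `count_category` are hypothetical and not part of the original answer.

```r
# Hypothetical lookup table: tag -> category (extend as needed)
tag_map <- c(a = "lan", b = "lan", c = "con", d = "con")

d <- data.frame(id = 1:2, tag = c("a,b,c", "a,d"))

# Split each comma-separated string into its individual tags
parts <- strsplit(as.character(d$tag), ",", fixed = TRUE)

# For every row, count the tags that map to the given category.
# Tags missing from tag_map produce NA and are ignored via na.rm.
count_category <- function(category) {
  vapply(
    parts,
    function(p) sum(tag_map[p] == category, na.rm = TRUE),
    integer(1)
  )
}

d$lan_count <- count_category("lan")
d$con_count <- count_category("con")
```

Adding a category then only means adding entries to `tag_map` and one more `count_category()` call. The id column is left untouched because nothing is grouped or reshaped.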

You can also use the following code:

library(dplyr)
library(tidyr)

df <- data.frame("id" = 1:2, "tag" = c("a,b,c","a,d"))

df %>%
  separate_rows(tag, sep = ",") %>%
  group_by(id) %>%
  add_count(tag) %>%
  pivot_wider(id_cols = id, names_from = tag, values_from = n) %>%
  rowwise() %>%
  mutate(lan_count = sum(c_across(a:b), na.rm = TRUE), 
         con_count = sum(c_across(c:d), na.rm = TRUE)) %>%
  select(-c(a:d))

# A tibble: 2 x 3
# Rowwise:  id
     id lan_count con_count
  <int>     <int>     <int>
1     1         2         1
2     2         1         1


Please include your data. What you have now makes no sense to me.

I have revised my question.

Thanks, this works. How can I add id?
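
On the follow-up about keeping id in the result: one possible approach, sketched here and not taken from either answer, is to group by both id and tag before summarising. It assumes every id has a single tag string, as in the example data. The helper column `Value` and the object name `result` are illustrative.

```r
library(dplyr)
library(tidyr)

d <- data.frame(id = 1:2, tag = c("a,b,c", "a,d"))

result <- d %>%
  # Copy the tag string so the original column survives the split
  mutate(Value = tag) %>%
  # One row per individual tag
  separate_rows(Value, sep = ",") %>%
  # Grouping by id as well as tag keeps both columns in the summary
  group_by(id, tag) %>%
  summarise(
    lan_count = sum(Value %in% c("a", "b")),
    con_count = sum(Value %in% c("c", "d")),
    .groups = "drop"
  )
```

`result` then has the columns id, tag, lan_count and con_count, with one row per id.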