Split a dataframe column where the new column values depend on the original data


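I often work with data frames whose columns contain string values that need to be separated. This comes from a "select multiple" option in the data entry program (which, unfortunately, I cannot change). I tried tidyr::separate, but the values are not sorted into the correct columns. For example: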

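library(tidyr)

df <- data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"))

# separate() fills the new columns by position, so "malaria" in row 2
# lands in the diarrhoea column instead of the malaria column
df <- df %>%
  separate(sick, c("diarrhoea", "cough", "malaria"),
           sep = " ", fill = "right", remove = FALSE)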

Any help getting this right would be appreciated.

We can try separate_rows together with dcast:

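library(tidyr)
library(reshape2)
library(dplyr)

# df here is the original two-column data frame (x, sick)
separate_rows(df, sick) %>%
  # fix the column order, including the unused "cough" level
  mutate(sick = factor(sick, levels = c("diarrhoea", "cough", "malaria")), sick1 = sick) %>%
  # one column per level; drop = FALSE keeps levels that never occur
  dcast(., x~sick, value.var = "sick1", drop=FALSE) %>%
  # add the original sick column back
  bind_cols(., df[2]) %>%
  select(x, sick, diarrhoea, cough, malaria)
#  x              sick diarrhoea cough malaria
#1 1              <NA>      <NA>  <NA>    <NA>
#2 2           malaria      <NA>  <NA> malaria
#3 3 diarrhoea malaria diarrhoea  <NA> malaria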
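Alternatively, with cSplit from splitstackshape:

library(splitstackshape)
library(data.table)  # provides the dcast method for data.table used below

# cSplit(..., "long") gives one row per selected value
dcast(cSplit(df, "sick", " ", "long")[, sick := factor(sick, levels =
    c("diarrhoea", "cough", "malaria"))], x~sick, value.var = "sick", drop = FALSE)[,
       sick := df$sick][]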
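
For comparison, the same idea can be sketched in base R without any packages. This is not from the answers above, just a minimal illustration under the assumption that the full set of possible choices is known in advance. The names conditions, tokens, wide and result are made up for this example.

```r
# Same example data as in the question (original two-column data frame)
df <- data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  stringsAsFactors = FALSE
)

# Assumed: the complete list of choices, in the desired column order
conditions <- c("diarrhoea", "cough", "malaria")

# Split each entry on spaces; an NA entry stays a single NA token
tokens <- strsplit(df$sick, " ", fixed = TRUE)

# For each condition, keep its name where it was selected, NA otherwise
wide <- sapply(conditions, function(cond) {
  selected <- vapply(tokens, function(tk) cond %in% tk, logical(1))
  ifelse(selected, cond, NA_character_)
})

# Bind the new columns to the original data
result <- cbind(df, wide, stringsAsFactors = FALSE)
print(result)
```

Because each column is filled by matching on the value rather than by position, a choice that never occurs (cough here) still gets its own all-NA column.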