R: How can I associate multiple factors of one variable with the same entry in a data frame?


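This is my first question to the Stack Overflow community. First of all, many thanks for all the answers I have found here over the past 5 years. You have all been a great help, but this time I could not find an answer.

Here is my situation. In a larger data frame, one variable is giving me trouble: Weather. It consists of factors that describe the weather, for example Rain, Cloudy, Clear and so on. My problem is that some entries are described by more than one factor, for example Rain,Fog. R therefore treats each combination of factors as a new, separate factor level, which is not what I want.

Here is a sample of the data frame: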

df <- read.table(text =
'"Date.Time","Year","Month","Day","Weekday","Hour","Temperature","Rel.humidity","Wind.dir","Wind.dir2","Wind.speed","Atm.pressure","Weather"
2015-04-01 00:00:00,"2015","4","1","Wednesday","00:00",-3.4,44,30,"NW",10,100.83,"Clear"
2015-04-02 23:00:00,"2015","4","2","Thursday","23:00",3.4,94,36,"N",2,99.8,"Rain,Fog"
2015-05-11 12:00:00,"2015","5","11","Monday","12:00",9.5,93,3,"NE",27,101.5,"Mist,Shower,Fog"',
header = TRUE, stringsAsFactors = FALSE, sep = ",")
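My final goal is, for example, to be able to select only the entries tagged Fog, including those that contain both Rain and Fog.

My idea was to split the string and insert the result as a list into the Weather variable, but I have not managed to do that, and maybe there is a simpler and more elegant way. Here is my naive attempt: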

for (i in dim(df)[1]){
  df[i,] <- as.factor(list(strsplit(dda[i,], ",")))
}
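tl;dr: I want to turn one factor (e.g. "A,B,C") into multiple factors (e.g. "A", "B", "C") and keep them as a single element in the same row and column of the data frame.

Thanks in advance for your time, and please do not hesitate to comment on the formatting of my question.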

Fixing the for loop:

df[["Weather_split"]] <- as.list(rep(NA, nrow(df)))
for (i in seq_len(nrow(df))) {
  df[["Weather_split"]][[i]] <- strsplit(df[["Weather"]][[i]], ",")[[1]]
}
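As a side note on why the original loop did not work: `for (i in dim(df)[1])` iterates over a single value (the row count itself), whereas `seq_len(nrow(df))` iterates over every row index. The small sketch below illustrates only that difference; the variable names are made up for the demonstration.

```r
n <- 3L  # stands in for nrow(df) in the example data

# Naive version: the loop body runs once, with i equal to 3.
visited_naive <- integer(0)
for (i in n) visited_naive <- c(visited_naive, i)

# Fixed version: the loop body runs for i = 1, 2, 3.
visited_fixed <- integer(0)
for (i in seq_len(n)) visited_fixed <- c(visited_fixed, i)

print(visited_naive)
print(visited_fixed)
```

Note also that `strsplit()` returns a list even for a single string, which is why the fixed loop extracts the character vector with `[[1]]`.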
The same thing, more simply:

df[["Weather_split"]] <- strsplit(df[["Weather"]], ",")
str(df$Weather)
# chr [1:3] "Clear" "Rain,Fog" "Mist,Shower,Fog"
str(df$Weather_split)
# List of 3
#  $ : chr "Clear"
#  $ : chr [1:2] "Rain" "Fog"
#  $ : chr [1:3] "Mist" "Shower" "Fog"
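With the list column in place, the stated goal of selecting every entry tagged Fog could be approached roughly as follows. This is only a sketch: it uses a reduced stand-in for the example data (the Weather column alone), and `has_fog` and `fog_rows` are names introduced here for illustration.

```r
# Minimal stand-in for the question's data: only the Weather column is kept.
df <- data.frame(
  Weather = c("Clear", "Rain,Fog", "Mist,Shower,Fog"),
  stringsAsFactors = FALSE
)
df[["Weather_split"]] <- strsplit(df[["Weather"]], ",")

# TRUE for every row whose split vector contains "Fog".
has_fog <- vapply(df[["Weather_split"]], function(w) "Fog" %in% w, logical(1))

# Keep only the matching rows.
fog_rows <- df[has_fog, ]
print(fog_rows$Weather)
```

`vapply()` is used instead of `sapply()` so that the result is guaranteed to be a logical vector with one value per row.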
Taking @Stephen Henderson's idea further:

Weather_levels <- unique(unlist(df[["Weather_split"]]))
for (lvl in Weather_levels) {
  df[[lvl]] <- unlist(lapply(df$Weather_split, "%in%", x = lvl))
}

df
#             Date.Time Year Month Day   Weekday  Hour Temperature Rel.humidity Wind.dir Wind.dir2 Wind.speed Atm.pressure         Weather     Weather_split Clear  Rain   Fog  Mist Shower
# 1 2015-04-01 00:00:00 2015     4   1 Wednesday 00:00        -3.4           44       30        NW         10       100.83           Clear             Clear  TRUE FALSE FALSE FALSE  FALSE
# 2 2015-04-02 23:00:00 2015     4   2  Thursday 23:00         3.4           94       36         N          2        99.80        Rain,Fog         Rain, Fog FALSE  TRUE  TRUE FALSE  FALSE
# 3 2015-05-11 12:00:00 2015     5  11    Monday 12:00         9.5           93        3        NE         27       101.50 Mist,Shower,Fog Mist, Shower, Fog FALSE FALSE  TRUE  TRUE   TRUE
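Once each condition has its own TRUE/FALSE column, filtering and counting become ordinary logical indexing. The sketch below rebuilds the indicator columns on a reduced stand-in for the example data (the Weather column alone); `fog`, `rain_and_fog` and `counts` are illustrative names, not part of the original answer.

```r
# Minimal stand-in for the question's data: only the Weather column is kept.
df <- data.frame(
  Weather = c("Clear", "Rain,Fog", "Mist,Shower,Fog"),
  stringsAsFactors = FALSE
)
Weather_split <- strsplit(df$Weather, ",")
Weather_levels <- unique(unlist(Weather_split))

# One logical indicator column per weather condition.
for (lvl in Weather_levels) {
  df[[lvl]] <- vapply(Weather_split, function(w) lvl %in% w, logical(1))
}

fog <- df[df$Fog, ]                     # every row tagged Fog
rain_and_fog <- df[df$Rain & df$Fog, ]  # rows tagged with both Rain and Fog
counts <- colSums(df[, Weather_levels]) # number of rows per condition

print(fog$Weather)
print(counts)
```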
Edit:

As per your question, if you really need factors rather than character vectors, that is entirely possible:

df$Weather_split <- lapply(df$Weather_split, factor, levels = Weather_levels)
df$Weather_split
# [[1]]
# [1] Clear
# Levels: Clear Rain Fog Mist Shower
# 
# [[2]]
# [1] Rain Fog 
# Levels: Clear Rain Fog Mist Shower
# 
# [[3]]
# [1] Mist   Shower Fog   
# Levels: Clear Rain Fog Mist Shower
str(df$Weather_split)
# List of 3
#  $ : Factor w/ 5 levels "Clear","Rain",..: 1
#  $ : Factor w/ 5 levels "Clear","Rain",..: 2 3
#  $ : Factor w/ 5 levels "Clear","Rain",..: 4 5 3

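I don't think you can embed a list of factors in a column of factors, at least not without causing yourself trouble here and in all downstream code. I suggest you simply split the factors out into columns with TRUE/FALSE values. I think, at least in this case, going wide will be simpler in the long run.

Thanks for your comment. Actually, @StephenHenderson's suggestion would be the solution. However, it would make my final data frame too large. So for now I have gone with another, simpler solution: ranking the factors by how influential they are and, when there is more than one, only considering the most influential.

Thank you for fixing my loop and implementing Stephen Henderson's idea! However, the resulting list is not easy to print or to filter by a specific factor. So I will either stay with my current simple solution or use multiple logical variables.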
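The ranking approach mentioned in the comments (keeping only the most influential condition per entry) might look something like the sketch below. The ordering in `priority` is purely hypothetical, since the thread does not say how the conditions were ranked, and `Weather_main` is a column name made up for this example.

```r
# Minimal stand-in for the question's data: only the Weather column is kept.
df <- data.frame(
  Weather = c("Clear", "Rain,Fog", "Mist,Shower,Fog"),
  stringsAsFactors = FALSE
)

# Hypothetical ranking, most influential first; replace with your own order.
priority <- c("Fog", "Rain", "Shower", "Mist", "Clear")

# For each row, keep the condition that appears earliest in `priority`.
# A condition missing from `priority` makes the result NA for that row.
main_weather <- vapply(
  strsplit(df$Weather, ","),
  function(w) priority[min(match(w, priority))],
  character(1)
)

df$Weather_main <- factor(main_weather, levels = priority)
print(df)
```

This keeps a single ordinary factor column, at the cost of discarding the secondary conditions, so rows tagged "Rain,Fog" can no longer be found by searching for Rain.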