R - Remove rows from a data frame based on multiple conditions on a single column
I have the following example data frame in R:
SampleID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C", "D", "D", "E", "E", "E", "E", "F", "F")
Analyte <- c("A1", "A1", "A2", "A2", "B1", "B2", "C1", "C1", "C1", "C2", "C2", "C2", "D1", "D2", "E1", "E1", "E2", "E2", "F1", "F2")
Fraction <- c("Dissolved", "Total", "Dissolved", "Total", "Total", "Total", "Dissolved", "Suspended", "Total", "Dissolved", "Suspended", "Total", "Unknown", "Unknown", "Dissolved", "Suspended", "Dissolved", "Suspended", "Dissolved", "Dissolved")
Concentration <- c(4.2, 5.6, 8.6, 11.2, 2.1, 9.6, 15.6, 28.7, 42.3, 18.3, 23.2, 48.6, 6.4, 28.8, 9.1, 32.5, 36.4, 24.5, 10.7, 3.4)
MyData <- data.frame(SampleID, Analyte, Fraction, Concentration)
I want to do the following:
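1. For each SampleID, if an Analyte has a "Total" Fraction reported, keep only that row for the Analyte and remove the rows where that Analyte has any other Fraction value (i.e., Dissolved, Suspended).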
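2. If an Analyte of a SampleID has both Dissolved and Suspended in the Fraction column (and no other Fraction values), sum the Dissolved and Suspended concentrations and add a row for that Analyte with the Fraction column labeled Total and the Concentration column holding the sum. Remove the original Dissolved and Suspended rows for that Analyte.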
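For SampleID "A", both Analytes have Dissolved and Total, so I want to remove the rows with the Dissolved fraction. For SampleID "C", I want to remove the Dissolved and Suspended fractions for both Analytes and keep only the rows containing Total. Finally, for SampleID "E", the Dissolved and Suspended fractions of each of the two Analytes would be summed, giving one new row per Analyte that holds the sum (relabeled as Total), and the rows for the Dissolved and Suspended fractions would be removed.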
For the data frame MyData above, the output would be:
SampleID Analyte Fraction Concentration
2 A A1 Total 5.6
4 A A2 Total 11.2
5 B B1 Total 2.1
6 B B2 Total 9.6
9 C C1 Total 42.3
12 C C2 Total 48.6
13 D D1 Unknown 6.4
14 D D2 Unknown 28.8
15 E E1 Total 41.6
17 E E2 Total 60.9
19 F F1 Dissolved 10.7
20 F F2 Dissolved 3.4
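Note that this example is only a small part of a much larger dataset with hundreds of SampleIDs, but the Fraction column can only take the values listed in the original data frame above (i.e., Dissolved, Suspended, Total, or Unknown).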
Thanks, everyone!
This can be done as follows:
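library(tidyverse)
MyData %>%
  # one row per SampleID/Analyte, one column per Fraction value
  pivot_wider(id_cols = c(SampleID, Analyte), names_from = Fraction,
              values_from = Concentration) %>%
  # keep an existing Total; otherwise use Dissolved + Suspended (NA if either is missing)
  mutate(Total = coalesce(Total, Dissolved + Suspended),
         # keep Dissolved only when there is neither a Total nor a Suspended value
         Dissolved = ifelse(is.na(Total) & is.na(Suspended), Dissolved, NA),
         # keep Suspended only when there is neither a Total nor a remaining Dissolved value
         Suspended = ifelse(is.na(Total) & is.na(Dissolved), Suspended, NA)) %>%
  # back to long format, dropping the emptied cells
  pivot_longer(-c(SampleID, Analyte), values_drop_na = TRUE)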
# A tibble: 12 x 4
SampleID Analyte name value
<chr> <chr> <chr> <dbl>
1 A A1 Total 5.6
2 A A2 Total 11.2
3 B B1 Total 2.1
4 B B2 Total 9.6
5 C C1 Total 42.3
6 C C2 Total 48.6
7 D D1 Unknown 6.4
8 D D2 Unknown 28.8
9 E E1 Total 41.6
10 E E2 Total 60.9
11 F F1 Dissolved 10.7
12 F F2 Dissolved 3.4
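The key step in this answer is coalesce(Total, Dissolved + Suspended). As a minimal illustration, the snippet below uses the wide-format values of analytes A1, E1 and F1 from MyData (the standalone vectors are just for the demo): an existing Total wins, a missing Total is filled with the Dissolved + Suspended sum, and an analyte with only a Dissolved value stays NA because NA propagates through +.

```r
library(dplyr)

# Wide-format values for analytes A1, E1 and F1 of MyData (one element per analyte)
Total     <- c(5.6, NA, NA)
Dissolved <- c(4.2, 9.1, 10.7)
Suspended <- c(NA, 32.5, NA)

# Use an existing Total, otherwise fall back to Dissolved + Suspended
combined <- coalesce(Total, Dissolved + Suspended)
combined
# 5.6 41.6 NA: A1 keeps its Total, E1 gets the sum, F1 stays NA (so its Dissolved value survives)
```

Because F1's new Total is still NA, the later ifelse() calls keep its Dissolved value, which is why F1 and F2 remain as Dissolved rows in the output.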
You can also use the following solution. It may seem a bit verbose, but it gets the job done:
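library(dplyr)
library(purrr)
MyData %>%
  # one tibble per SampleID/Analyte combination
  group_split(SampleID, Analyte) %>%
  # rule 1: if a Total row exists, keep only that row
  map(~ if ("Total" %in% .x$Fraction) {
    .x %>% filter(Fraction == "Total")
  } else {
    .x
  }) %>%
  # rule 2: if both Dissolved and Suspended are present, append the sum of the
  # group's concentrations as a new Total row
  map(~ if (all(c("Dissolved", "Suspended") %in% .x$Fraction)) {
    add_row(.x, SampleID = .x$SampleID[1], Analyte = .x$Analyte[1],
            Fraction = "Total", Concentration = sum(.x$Concentration))
  } else {
    .x
  }) %>%
  # keep only the Total row wherever one now exists, then row-bind all groups
  map_dfr(~ if ("Total" %in% .x$Fraction) {
    .x %>% filter(Fraction == "Total")
  } else {
    .x
  })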
# A tibble: 12 x 4
SampleID Analyte Fraction Concentration
<chr> <chr> <chr> <dbl>
1 A A1 Total 5.6
2 A A2 Total 11.2
3 B B1 Total 2.1
4 B B2 Total 9.6
5 C C1 Total 42.3
6 C C2 Total 48.6
7 D D1 Unknown 6.4
8 D D2 Unknown 28.8
9 E E1 Total 41.6
10 E E2 Total 60.9
11 F F1 Dissolved 10.7
12 F F2 Dissolved 3.4
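Both answers above apply the rules step by step. As a further option (not taken from either answer), here is a minimal single-pipeline sketch in plain dplyr. It assumes each SampleID/Analyte/Fraction combination occurs at most once, as in the example data; if duplicates can occur, the final summarise() would add them together. It uses setequal() so that the Dissolved + Suspended collapse only happens when those are the only two fractions left, matching the "no other values" condition in rule 2.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE keeps Fraction a character column
MyData <- data.frame(
  SampleID = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C",
               "D", "D", "E", "E", "E", "E", "F", "F"),
  Analyte = c("A1", "A1", "A2", "A2", "B1", "B2", "C1", "C1", "C1", "C2", "C2", "C2",
              "D1", "D2", "E1", "E1", "E2", "E2", "F1", "F2"),
  Fraction = c("Dissolved", "Total", "Dissolved", "Total", "Total", "Total",
               "Dissolved", "Suspended", "Total", "Dissolved", "Suspended", "Total",
               "Unknown", "Unknown", "Dissolved", "Suspended", "Dissolved", "Suspended",
               "Dissolved", "Dissolved"),
  Concentration = c(4.2, 5.6, 8.6, 11.2, 2.1, 9.6, 15.6, 28.7, 42.3, 18.3, 23.2, 48.6,
                    6.4, 28.8, 9.1, 32.5, 36.4, 24.5, 10.7, 3.4),
  stringsAsFactors = FALSE
)

result <- MyData %>%
  group_by(SampleID, Analyte) %>%
  # Rule 1: if the analyte has a Total row, keep only that row
  filter(!("Total" %in% Fraction) | Fraction == "Total") %>%
  # Rule 2: if the remaining fractions are exactly Dissolved and Suspended,
  # relabel both rows as Total (otherwise leave Fraction unchanged)
  mutate(Fraction = if (setequal(Fraction, c("Dissolved", "Suspended"))) "Total" else Fraction) %>%
  # Collapse the relabelled pair into one row by summing; single rows pass through unchanged
  group_by(SampleID, Analyte, Fraction) %>%
  summarise(Concentration = sum(Concentration), .groups = "drop")

result
```

The trade-off compared with group_split() is that the grouping does the bookkeeping for you, at the cost of the at-most-once assumption stated above.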
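Have you tried an if / else if / else inside a single map? I noticed that your post loops with map three times, with an if/else in each one. Or do you mean the steps need to run sequentially, after each filter?
In some cases it does need to be sequential: the check all(c("Dissolved", "Suspended") %in% .x$Fraction) may not hold as intended without the earlier filtering. I haven't run the entire code yet.
It works fine for me. This works really well!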