R: Using a loop to iterate an operation over columns in a data frame
I have a data frame whose column names contain a week and year identifier in the format W1_2019, followed by other text. The full data frame holds 52 weeks of data, with 5 columns per week. The code below does exactly what I want for weeks 1 and 2. My goal is to put it into a loop over x = 1 to 52 so that I don't have to repeat the same half-dozen lines 52 times.
library(dplyr) # provides select() and contains()
eidsr <- dget(file="test1.txt")
mode_xmt <- data.frame(District=eidsr$district) #Initializes dataframe mode_xmt with only 1 column containing District names
wtmp <- select(eidsr, contains("W1_2019"))
wtmp$mode <- "NoRep"
wtmp$mode[wtmp$W1_2019_EIDSR_Total_Malaria_cases>0] <- "Report"
wtmp$mode[wtmp$`W1_2019_EIDSR-Mobile_SMS`==1] <- "Mobile_SMS"
wtmp$mode[wtmp$`W1_2019_EIDSR-Mobile_Internet`==1] <- "Mobile_Internet"
#At this point the dataframe wtmp looks like the example below.
mode_xmt$`2019_W1` <- wtmp$mode #Appends ONLY the W1_2019 column to mode_xmt
rm(wtmp)
wtmp <- select(eidsr, contains("W2_2019"))
wtmp$mode <- "NoRep"
wtmp$mode[wtmp$W2_2019_EIDSR_Total_Malaria_cases>0] <- "Report"
wtmp$mode[wtmp$`W2_2019_EIDSR-Mobile_SMS`==1] <- "Mobile_SMS"
wtmp$mode[wtmp$`W2_2019_EIDSR-Mobile_Internet`==1] <- "Mobile_Internet"
mode_xmt$`2019_W2` <- wtmp$mode
rm(wtmp)
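The outputs below show, in order: wtmp for week 1 (first 10 rows), mode_xmt after the first iteration, and mode_xmt once I have completed the second iteration for W2: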
   `W1_2019_EIDSR-Timely_~ W1_2019_EIDSR_Total_Mala~ W1_2019_EIDSR_Date_R~ `W1_2019_EIDSR-Mobile_~ `W1_2019_EIDSR-Mobi~ mode
                     <dbl>                     <dbl> <chr>                                   <dbl>                <dbl> <chr>
 1                      NA                         0 NA                                         NA                   NA NoRep
 2                      NA                        NA NA                                         NA                   NA NoRep
 3                      NA                        51 NA                                         NA                   NA Repo~
 4                      NA                        NA NA                                         NA                   NA NoRep
 5                      NA                        64 NA                                         NA                   NA Repo~
 6                      NA                        86 NA                                         NA                   NA Repo~
 7                      NA                        92 NA                                         NA                   NA Repo~
 8                      NA                        47 NA                                         NA                   NA Repo~
 9                      NA                        46 NA                                         NA                   NA Repo~
10                      NA                        35 NA                                         NA                   NA Repo~
   District 2019_W01
1        Bo    NoRep
2        Bo    NoRep
3        Bo   Report
4        Bo    NoRep
5        Bo   Report
6        Bo   Report
7        Bo   Report
8        Bo   Report
9        Bo   Report
10       Bo   Report

   District 2019_W01 2019_W02
1        Bo    NoRep   Report
2        Bo    NoRep    NoRep
3        Bo   Report   Report
4        Bo    NoRep    NoRep
5        Bo   Report   Report
6        Bo   Report   Report
7        Bo   Report   Report
8        Bo   Report   Report
9        Bo   Report   Report
10       Bo   Report   Report
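Lather, rinse, repeat. Times 52. As @DS_UNI observed, it would be nice to have week and year in separate columns, but that would defeat the end purpose, which is a time series spanning more than one year. Still, to keep myself from going completely mad, I would be happy if I could just iterate over the 52 weeks of a single year.
As I said, the code above works. I'm just looking for a way to loop it rather than repeating it over and over.
Here is the dput text of the truncated data, saved as test1.txt in the working directory: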
structure(list(`W1_2019_EIDSR-Timely_Report` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), W1_2019_EIDSR_Total_Malaria_cases = c(0, NA, 51, NA, 64, 86, 92, 47, 46, 35, 33, NA, NA, 77, 35, 7, 24, 27, 14, 72), W1_2019_EIDSR_Date_Received = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), `W1_2019_EIDSR-Mobile_Internet` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `W1_2019_EIDSR-Mobile_SMS` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `W2_2019_EIDSR-Timely_Report`
= c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), W2_2019_EIDSR_Total_Malaria_cases = c(55, NA, 44, NA, 38, 26, 29, 40, 59, 18, 48, NA, NA, 37, 34, 51, 34, 38, 13, 56), W2_2019_EIDSR_Date_Received = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_), `W2_2019_EIDSR-Mobile_Internet` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `W2_2019_EIDSR-Mobile_SMS` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), district = c("Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo")), .Names = c("W1_2019_EIDSR-Timely_Report", "W1_2019_EIDSR_Total_Malaria_cases", "W1_2019_EIDSR_Date_Received", "W1_2019_EIDSR-Mobile_Internet", "W1_2019_EIDSR-Mobile_SMS", "W2_2019_EIDSR-Timely_Report", "W2_2019_EIDSR_Total_Malaria_cases", "W2_2019_EIDSR_Date_Received", "W2_2019_EIDSR-Mobile_Internet", "W2_2019_EIDSR-Mobile_SMS", "district"), row.names = c(NA, -20L ), class = c("tbl_df", "tbl", "data.frame"))
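For anyone who wants to reproduce this, the round trip between dput() and dget() can be sketched roughly as follows. The three-row data frame and the temporary file are made up for illustration and stand in for the real data and test1.txt; only dput() and dget() themselves come from the question.

```r
# Hypothetical miniature of the real data (column names follow the question's pattern)
toy <- data.frame(
  W1_2019_EIDSR_Total_Malaria_cases = c(0, NA, 51),
  district = c("Bo", "Bo", "Bo"),
  stringsAsFactors = FALSE
)

# Write the object out as R code, then read it back in
path <- tempfile(fileext = ".txt")   # stand-in for "test1.txt"
dput(toy, file = path)
restored <- dget(file = path)

# The restored object should match the original
all.equal(toy, restored)
```

Posting the dput() text of a small slice like this is usually enough for others to rebuild the data with dget() or by pasting the text directly into R.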
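Your data should look like this. I would also prefer to have one column for the week and one for the year. Most likely there is a way to manipulate the result below to get what you want.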
library(dplyr)
library(reshape2)
eidsr %>%
# values should be in a column (not in headers)
melt(id.var = 'district') %>%
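# extract the new variables; the regex allows one- or two-digit weeks (W1 to W52),
# which a fixed-width substr(variable, 1, 7) would cut off
mutate(week_year = sub("^(W[0-9]+_[0-9]{4}).*", "\\1", variable),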
variable = sub(".*EIDSR[- _]", "", variable)) %>%
# assuming missing values don't have a specific meaning you can just remove them
na.omit()
# district variable value week_year
# 21 Bo Total_Malaria_cases 0 W1_2019
# 23 Bo Total_Malaria_cases 51 W1_2019
# 25 Bo Total_Malaria_cases 64 W1_2019
# 26 Bo Total_Malaria_cases 86 W1_2019
# 27 Bo Total_Malaria_cases 92 W1_2019
# 28 Bo Total_Malaria_cases 47 W1_2019
# 29 Bo Total_Malaria_cases 46 W1_2019
# 30 Bo Total_Malaria_cases 35 W1_2019
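Since the answer mentions wanting separate week and year columns, here is one way that split might be done on a week_year column, using only base R regular expressions. The sample identifiers are made up, and the column names week, year and label are my own choice rather than anything from the thread.

```r
# Sample identifiers in the question's format (made up, including two-digit weeks)
week_year <- c("W1_2019", "W2_2019", "W10_2019", "W52_2020")

# Capture the digits after "W" and the four digits after the underscore
pattern <- "^W([0-9]+)_([0-9]{4})$"
long <- data.frame(
  week_year = week_year,
  week = as.integer(sub(pattern, "\\1", week_year)),
  year = as.integer(sub(pattern, "\\2", week_year)),
  stringsAsFactors = FALSE
)

# A zero-padded label such as "2019_W01" sorts correctly across several years
long$label <- sprintf("%d_W%02d", long$year, long$week)
long
```

Keeping week and year as integers makes it straightforward to order a multi-year time series, while the label column can still be used as a column header if a wide table is needed later.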
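I can see you are losing patience, so if you must loop, you should use one of the apply functions, which exist for cases where a function has to be applied repeatedly over a vector or list: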
wacky_fun <- function(x_chr){
malaria_col <- paste0(x_chr, '_EIDSR_Total_Malaria_cases')
sms_col <- paste0(x_chr, '_EIDSR-Mobile_SMS')
internet_col <- paste0(x_chr, '_EIDSR-Mobile_Internet')
mode_col <- rep("NoRep", nrow(eidsr))
mode_col[eidsr[malaria_col]>0] <- "Report"
mode_col[eidsr[sms_col]==1] <- "Mobile_SMS"
mode_col[eidsr[internet_col]==1] <- "Mobile_Internet"
return(mode_col)
}
We then apply the function to all the weeks in the data:
# get the unique weeks in the headers
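# the "+" lets the pattern match two-digit weeks (W10 to W52) as well
weeks <- names(eidsr)[grepl('^W[[:digit:]]+_[[:digit:]]{4}', names(eidsr))] %>%
sub('^(W[[:digit:]]+_[[:digit:]]{4}).*', '\\1', .) %>%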
unique()
# apply the function on all the weeks, bind them with the district, and convert to data.frame
cbind('district' = eidsr$district, sapply(weeks, wacky_fun)) %>%
as.data.frame()
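If an explicit loop is preferred, as the question originally asked, the same logic could probably be written with for. This is only a sketch on made-up data: the toy eidsr, the seq_len(2) range, the name mode_col and the zero-padded column labels are assumptions of mine. With the full data the range would be 1:52.

```r
# Toy stand-in for eidsr: two weeks, four rows (all values are made up)
eidsr <- data.frame(
  W1_2019_EIDSR_Total_Malaria_cases = c(0, NA, 51, 12),
  `W1_2019_EIDSR-Mobile_SMS`        = c(NA, NA, 1, NA),
  `W1_2019_EIDSR-Mobile_Internet`   = c(NA, NA, NA, 1),
  W2_2019_EIDSR_Total_Malaria_cases = c(55, NA, 44, NA),
  `W2_2019_EIDSR-Mobile_SMS`        = c(NA, NA, NA, NA),
  `W2_2019_EIDSR-Mobile_Internet`   = c(NA, NA, NA, NA),
  district = c("Bo", "Bo", "Bo", "Bo"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Start with the district column only, as in the question
mode_xmt <- data.frame(District = eidsr$district, stringsAsFactors = FALSE)

for (x in seq_len(2)) {                       # use 1:52 for a full year
  prefix   <- paste0("W", x, "_2019")
  cases    <- eidsr[[paste0(prefix, "_EIDSR_Total_Malaria_cases")]]
  sms      <- eidsr[[paste0(prefix, "_EIDSR-Mobile_SMS")]]
  internet <- eidsr[[paste0(prefix, "_EIDSR-Mobile_Internet")]]

  mode_col <- rep("NoRep", nrow(eidsr))
  # which() drops NA comparisons, so rows with missing values keep their earlier label
  mode_col[which(cases > 0)]     <- "Report"
  mode_col[which(sms == 1)]      <- "Mobile_SMS"
  mode_col[which(internet == 1)] <- "Mobile_Internet"

  # One new column per week, e.g. "2019_W01"
  mode_xmt[[sprintf("2019_W%02d", x)]] <- mode_col
}

mode_xmt
```

The loop body is the question's half-dozen lines with the week number pasted into the column names. Using [[ ]] with a constructed name is what replaces the hard-coded wtmp$W1_2019_... references.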
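I suggest you take a look at what [...] is, and at how to reshape your data to make analysis easier when dealing with data like this. To be honest, I wouldn't recommend using a loop to solve this. That said, without a reproducible example it is very challenging to help, so this might give you some ideas on how to provide sample data, a reproducible example, and an expected result.

I was afraid you would say that... The data is so messy that it would take me a long time to create dummy data to reproduce it. I'll try...

Sorry! But you could also look at the question I added in my first comment; the answers there might help.

I know this is a lot of newbie material, but I often run into this problem because I don't know how to create sample data that resembles the imported data I'm working with. Do you have any links on creating dummy data that you could point me to?

No, that's not my goal. It looks like I can't format comments, so I will re-edit the question above. Thanks for your patience.

Adding my belated thanks, @DS_UNI. Your wacky_fun worked perfectly, with far fewer lines of code than my kludge. I have a lot of analysis to do, so learning more about the code will have to wait, but I appreciate your help and fully intend to come back and go through your code until I understand it!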