R: Calculate the median of each month across several years


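I recently started my adventure with R and am trying to solve the following problem. I have a data frame containing arrivals and departures for specific months of the year, and I need to find the median for each month across the years. The result should be saved as a .csv file. Below is only a sample; the full set of observations covers dates up to 2017 (1548 obs in total):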

I decided to do it in several steps. The first thing I tried was getting a proper date format from the string:

library(dplyr)
step_1 = as_tibble(flights)

step_2 = step_1 %>%
  transmute(
    date_format = as.POSIXct(strptime(ReportPeriod, format = "%m/%d/%Y")),
    even_new_date = as.Date(date_format, format = "%Y"),
    Arrival_Departure, 
    FlightOpsCount)
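This is really tricky for me. I don't understand how to do it properly: why are there two ways of getting the date, e.g. 2006-01-01 and 2005-12-31? Which one is correct in this case?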
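A likely explanation for the two different dates, depending on the R version and the session's time zone: strptime() reads the string in the local time zone, while older R versions convert a POSIXct value to a Date through UTC by default. If the local zone is ahead of UTC, local midnight on 1 January is still 31 December in UTC. Note also that the format argument in as.Date(date_format, format = "%Y") is ignored for POSIXct input. Parsing the string directly with as.Date() avoids the clock time and the time zone altogether. A minimal sketch; Europe/Warsaw is only an illustrative assumption standing in for "a zone ahead of UTC":

```r
# Parse the "%m/%d/%Y" string straight into a Date: no clock time, no time zone
d <- as.Date("01/01/2006", format = "%m/%d/%Y")
print(d)  # 2006-01-01

# Reproduce the shift with an explicit example zone (assumption: Europe/Warsaw,
# UTC+1 in January). Local midnight on 1 January 2006 is 23:00 on
# 31 December 2005 in UTC.
p <- as.POSIXct("2006-01-01 00:00:00", tz = "Europe/Warsaw")
print(as.Date(p, tz = "UTC"))            # 2005-12-31
print(as.Date(p, tz = "Europe/Warsaw"))  # 2006-01-01
```

In other words, 2006-01-01 is the date written in the data; 2005-12-31 is the same instant seen from UTC.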

Now, assuming 2006-01-01 is correct, I can wrap as.POSIXct in the months() function to get the month:

step_2 = step_1 %>%
  transmute(
    month = months(as.POSIXct(strptime(ReportPeriod, format = "%m/%d/%Y"))),
    Arrival_Departure,
    FlightOpsCount)
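One caveat about months(): it returns month names in the session's locale language, and character month names sort alphabetically rather than chronologically. A locale-independent alternative is to extract the month number with format(d, "%m") and, if English names are wanted, look them up in the built-in month.name constant. A minimal base-R sketch with hypothetical dates:

```r
# Hypothetical report dates in the question's "%m/%d/%Y" format
d <- as.Date(c("03/01/2006", "01/01/2007", "12/01/2016"), format = "%m/%d/%Y")

# Month number (1-12), independent of the session locale
month_num <- as.integer(format(d, "%m"))

# English month names looked up from the built-in month.name constant
month_lab <- month.name[month_num]

# A factor with calendar-ordered levels sorts chronologically, not alphabetically
month_fac <- factor(month_lab, levels = month.name)
print(sort(month_fac))
```

Grouping on month_num (or on the ordered factor) keeps the later summary in calendar order.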
The next step requires a grouped operation:

step_3 = step_2 %>%
  group_by(month, Arrival_Departure) %>% 
  summarize(median = median(FlightOpsCount))
When I write it to a csv, I get ridiculously small values:

"","month","Arrival_Departure","median"
"1","April","Arrival",102.5
"2","April","Departure",3061
"3","August","Arrival",1412.5
"4","August","Departure",3667.5
"5","December","Arrival",102
"6","December","Departure",1738
"7","February","Arrival",116
"8","February","Departure",116
"9","January","Arrival",284
"10","January","Departure",1708
"11","July","Arrival",95.5
"12","July","Departure",3571
"13","June","Arrival",119
"14","June","Departure",3292
"15","March","Arrival",115
"16","March","Departure",1759
"17","May","Arrival",1609.5
"18","May","Departure",3121
"19","November","Arrival",93.5
"20","November","Departure",93.5
"21","October","Arrival",2359
"22","October","Departure",2756
"23","September","Arrival",1228
"24","September","Departure",3187.5
Can anyone point me in the right direction and tell me the correct way to solve this problem?

Any help would be greatly appreciated.
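Judging from the asker's own working code further down (which sums FlightOpsCount per ReportPeriod and Arrival_Departure before taking the median), the small values most likely come from taking the median over individual rows, of which there are several per report date. A base-R sketch contrasting the two on a small hypothetical data set; the column names follow the question, while the values and the two-rows-per-date layout are invented for illustration:

```r
# Hypothetical data: two rows (e.g. two carriers) per report date, two years
flights <- data.frame(
  ReportPeriod = c("01/01/2006", "01/01/2006", "01/01/2007", "01/01/2007",
                   "02/01/2006", "02/01/2006", "02/01/2007", "02/01/2007"),
  Arrival_Departure = "Arrival",
  FlightOpsCount = c(10, 500, 20, 700, 15, 450, 25, 650),
  stringsAsFactors = FALSE
)
flights$month <- as.integer(format(as.Date(flights$ReportPeriod, "%m/%d/%Y"), "%m"))

# (1) Median over raw rows, as in the question's step_3
raw_median <- aggregate(FlightOpsCount ~ month + Arrival_Departure,
                        data = flights, FUN = median)

# (2) Total per report date first, then the median of those totals per month
totals <- aggregate(FlightOpsCount ~ ReportPeriod + month + Arrival_Departure,
                    data = flights, FUN = sum)
total_median <- aggregate(FlightOpsCount ~ month + Arrival_Departure,
                          data = totals, FUN = median)

print(raw_median)    # month 1: 260, month 2: 237.5
print(total_median)  # month 1: 615, month 2: 570
```

Which of the two is "correct" depends on what a single observation is meant to be; if it is the monthly total, the totals have to be built before the median is taken.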

I'd suggest using dplyr:

# Step 1: Convert dates using as.Date function
flights$ReportPeriod <- as.Date(flights$ReportPeriod, "%m/%d/%Y")

# Step 2: Use dplyr to summarize information
require(dplyr)
flights <- flights %>% 
             group_by(ReportPeriod, Arrival_Departure) %>%
             summarise(FlightOpsCount = median(FlightOpsCount)) %>% 
             as.data.frame() 

# Step 3: Convert date to string for month name
flights <- flights %>%
             mutate(ReportPeriod = months(ReportPeriod)) %>%
             rename(month = ReportPeriod) # Optional: rename the column to "month"


# Alternate Step 3: If you want to add in year as well
require(lubridate)
flights <- flights %>%
             mutate(ReportPeriod = paste(months(ReportPeriod), 
                                         year(ReportPeriod), 
                                         sep = " ")) %>%
             rename(month = ReportPeriod) # Optional: rename the column to "month"

# Step 4: Write to csv
write.csv(flights, "file_name.csv", row.names = FALSE)

Cheers.
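Note that Step 2 of this answer groups by the full ReportPeriod date, so each median is taken within a single report date. If the goal is one median per calendar month pooled across all years, as the question asks, the date has to be reduced to its month before grouping. A minimal base-R sketch with hypothetical values, using tapply() instead of dplyr:

```r
# Hypothetical: the same calendar month (January) in three different years
flights <- data.frame(
  ReportPeriod = as.Date(c("2006-01-01", "2007-01-01", "2008-01-01")),
  FlightOpsCount = c(100, 300, 200)
)

# Grouping by the full date: three groups, so nothing is pooled across years
by_date <- tapply(flights$FlightOpsCount, flights$ReportPeriod, median)

# Grouping by month number: all Januaries fall into one group
by_month <- tapply(flights$FlightOpsCount, format(flights$ReportPeriod, "%m"), median)

print(by_date)   # one value per date: 100, 300, 200
print(by_month)  # one value for "01": 200
```

In the dplyr version, the equivalent change is to create the month column first and use it in group_by() instead of ReportPeriod.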

Here is a data.table approach:

library(data.table)
library(lubridate)
dat <- fread("sample_data.txt", col.names = c("dte", "flight", "typ1","typ2","flt_count"))
dat$dte <- as.POSIXct(strptime(dat$dte, format = "%m/%d/%Y %H:%M:%S"), tz = "GMT")

new_dat <- dat[, sum(flt_count), by = list(month(dte),typ1)]
This seems to be what you're looking for; data.table is very useful for large data sets.

To write the results, use

write.csv(new_dat, "new_file.csv", row.names = FALSE)


Hope this helps.
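Two remarks on the data.table answer. It assumes ReportPeriod carries a time component in "%m/%d/%Y %H:%M:%S" format (the actual sample is not shown), and its j expression uses sum(), which gives totals; swapping in median() would give medians instead. The parsing and month-extraction step can be sketched in base R as follows, with a hypothetical timestamp:

```r
# Hypothetical ReportPeriod value with a time component (assumed format)
x <- "01/31/2006 00:00:00"

# Parse in GMT so that no local-time-zone shift can move the date
dte <- as.POSIXct(strptime(x, format = "%m/%d/%Y %H:%M:%S", tz = "GMT"))

# POSIXlt stores months as 0-11, hence the + 1
mon <- as.POSIXlt(dte)$mon + 1

print(format(dte, "%Y-%m-%d"))  # 2006-01-31
print(mon)                      # 1
```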

I think this is much simpler. Note that the format of months is slightly different from yours:

library(zoo)

months <- as.yearmon(flights$ReportPeriod, "%m/%d/%Y %H:%M:%S")
agg <- aggregate(FlightOpsCount ~ months + Arrival_Departure, flights, median)

For a detailed list of possible formats, see the help page ?strptime

Thanks everyone for the help! I now get the correct values for each month; here is my code:

#summarize Arrival & Departures through the years
step_1 <-  flights %>% 
  group_by(ReportPeriod, Arrival_Departure) %>% 
  summarise(sum = sum(FlightOpsCount)) %>% 
  arrange(ReportPeriod) %>% 
  ungroup()

#modify date format in ReportPeriod column to receive months
step_2 <- step_1 %>% 
  transmute(month = months(as.Date(ReportPeriod,"%m/%d/%Y")),
            Arrival_Departure,
            sum) %>% 
  group_by(month, Arrival_Departure) %>%
  summarise(FlightOpsCount = median(sum)) %>% 
  write.csv(., "flights_output.csv", row.names = FALSE, quote = FALSE)
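One thing worth checking in this code: because the pipeline ends in write.csv(), step_2 is assigned whatever write.csv() returns, not the summarised data frame, so any later step that starts from step_2 has no data to work with. Keeping the data frame and writing it out as a separate step avoids this. A base-R sketch with a hypothetical summary table:

```r
# Hypothetical summary shaped like step_2
summary_df <- data.frame(month = c("January", "February"),
                         Arrival_Departure = "Arrival",
                         FlightOpsCount = c(615, 570))
out_file <- tempfile(fileext = ".csv")

# Assigning the result of write.csv() keeps only its return value,
# not the data frame that was written
result <- write.csv(summary_df, out_file, row.names = FALSE)
print(is.data.frame(result))  # FALSE

# Keep the data frame in step_2, then write it as a separate step
step_2 <- summary_df
write.csv(step_2, out_file, row.names = FALSE)
print(read.csv(out_file))
```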
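However, I get the months in alphabetical rather than chronological order. I found a solution somewhere on this site, but it doesn't work properly: I only get NAs. To be clear, I call it before writing anything to the .csv, and I added ungroup() at the end of step 2: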

step_3 <- step_2 %>% 
  mutate(month = factor(month.name[month], levels = month.name)) %>% 
  arrange(month)
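A likely cause of the NAs: month already holds month names (character strings from months()), and month.name is an unnamed character vector, so month.name[month] tries to look elements up by name and returns NA. Since the values are already names, passing them straight to factor() with levels = month.name should give calendar ordering, assuming the names are English (month.name is always English, whereas months() follows the session locale). A minimal base-R sketch with hypothetical values:

```r
# Hypothetical summary shaped like step_2 (month names as character)
step_2 <- data.frame(
  month = c("March", "January", "February"),
  Arrival_Departure = "Arrival",
  FlightOpsCount = c(115, 284, 116),
  stringsAsFactors = FALSE
)

# month.name has no names, so indexing it with a string gives NA
print(month.name["January"])  # NA

# Use the names themselves as factor values, with levels in calendar order
step_2$month <- factor(step_2$month, levels = month.name)
step_3 <- step_2[order(step_2$month), ]
print(step_3)
```

In the dplyr pipeline the same fix would be mutate(month = factor(month, levels = month.name)).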

Please use dput and create a reproducible example.
To display the yearmon values from the zoo answer as year and month name:

format(as.Date(months), "%Y %B")   # or "%B %Y"