Get the mean of all columns of an R data frame

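I have a data frame made up of many columns. Each column represents one day of the year (I have 365 columns), and each row holds the average temperature for a particular city. I want the mean across all columns, which gives me the average temperature for the whole year. I also want the mean for each month (i.e. the mean for 01 (January), 02 (February), and so on) and the mean for each quarter of the year.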

My data looks like this:

# Column names that start with a digit must be backtick-quoted, and
# check.names = FALSE keeps R from rewriting them (e.g. to X20100101)
data <- data.frame(City = c("London", "Stockholm", "Paris", "Prag", "Berlin", "Copenhagen"), 
                   `20100101` = c(4, 5, 3, 4, 6, 7), `20100102` = c(2, 5, 8, 6, 1, 3), 
                   `20100205` = c(4, 7, 6, 1, 3, 4), `20100305` = c(0, 3, 7, 9, 3, 2), 
                   `20100525` = c(9, 8, 7, 6, 5, 4), `20100719` = c(9, 10, 5, 6, 7, 8), 
                   `20101011` = c(15, 3, 5, 7, 8, 9), `20101112` = c(3, 7, 1, 1, 1, 1), 
                   `20101212` = c(0, 0, 0, 5, 2, 1),
                   check.names = FALSE)

This is much easier to handle if you get the data into long format.

library(dplyr)

long_data <- data %>% 
             tidyr::pivot_longer(cols = -City) %>% 
             mutate(name = as.Date(name, '%Y%m%d'))
Monthly means:

long_data %>%
  group_by(City, month = lubridate::month(name)) %>%
  # For quarterly means, group by quarter instead:
  # group_by(City, quarter = lubridate::quarter(name)) %>%
  summarise(month_mean = mean(value,na.rm = TRUE))
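
The same long-format approach also covers the yearly and quarterly means the question asks for. The following is a minimal sketch rather than part of the original answer: it uses a reduced stand-in data set (two cities, three dates), and the names year_means and quarter_means are chosen here for illustration. It assumes dplyr (1.0.0 or later for the .groups argument), tidyr and lubridate are installed.

```r
library(dplyr)

# Reduced stand-in for the question's data (assumption: same layout)
data <- data.frame(City = c("London", "Paris"),
                   `20100101` = c(4, 3),
                   `20100205` = c(4, 6),
                   `20100525` = c(9, 7),
                   check.names = FALSE)

# Reshape to one row per city and date, then parse the dates
long_data <- data %>%
  tidyr::pivot_longer(cols = -City) %>%
  mutate(name = as.Date(name, "%Y%m%d"))

# Mean over all days, one value per city
year_means <- long_data %>%
  group_by(City) %>%
  summarise(year_mean = mean(value, na.rm = TRUE))

# Mean per city and calendar quarter
quarter_means <- long_data %>%
  group_by(City, quarter = lubridate::quarter(name)) %>%
  summarise(quarter_mean = mean(value, na.rm = TRUE), .groups = "drop")
```

If the data spans several years, adding lubridate::year(name) to the grouping would keep the years separate.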

We can use rowMeans with split.default in base R:

# // convert the date columns to `Date` class
dates <- as.Date(names(data)[-1], "%Y%m%d")
# // get the row wise mean of numeric columns (except the first column)
city_means <- rowMeans(data[-1])
names(city_means) <- data$City
 
# // split the data into list of data.frame based on the month
# // loop over the list with sapply and get the rowMeans
month_means <- sapply(split.default(data[-1], format(dates, "%b")),
      rowMeans, na.rm = TRUE)
row.names(month_means) <- data$City

# // split by year quarters and get the rowMeans for each list element
quarter_means <- sapply(split.default(data[-1], paste(format(dates, "%Y"), 
               quarters(dates))), rowMeans, na.rm = TRUE)
row.names(quarter_means) <- data$City
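
One detail worth noting about the month split: format(dates, "%b") produces abbreviated month names, which depend on the session locale and sort alphabetically rather than by calendar order. A possible variation, sketched here on a reduced stand-in data set and not taken from the original answer, is to split on the numeric month instead (the name month_key is introduced for illustration):

```r
# Reduced stand-in for the question's data (assumption: same layout)
data <- data.frame(City = c("London", "Paris"),
                   `20100101` = c(4, 3),
                   `20100102` = c(2, 8),
                   `20100205` = c(4, 6),
                   `20101011` = c(15, 5),
                   check.names = FALSE)

dates <- as.Date(names(data)[-1], "%Y%m%d")

# Numeric month keys ("01", "02", ...) sort in calendar order and do not
# depend on the locale, unlike month abbreviations from "%b"
month_key <- format(dates, "%m")

# Split the day columns by month, then take row means within each group
month_means <- sapply(split.default(data[-1], month_key),
                      rowMeans, na.rm = TRUE)
row.names(month_means) <- data$City
month_means
```

The result is a matrix with one row per city and one column per month key.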

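Thanks Ronak! I just have a small problem with the first code. When I convert the data to a long table and add the mutate line, I get three columns in total: one with the city name, filled with the same city; one called "name", filled with NA (?); and a last one called "value", filled with the original values. Why is the "name" column all NA? Is it because my columns actually look like 2015_02_15 instead of 20150215?

@paula456 In that case, change the mutate line to mutate(name = as.Date(name, "%Y_%m_%d")). You can also use ymd from lubridate here, like mutate(name = lubridate::ymd(name)).

I figured out what the problem was. I had converted an sf object to an sp object, and all the names of my date columns had an X in front of them. I just used "X%Y_%m_%d" as the format and it works fine :) Thanks a lot!!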
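
The NA symptom from this exchange can be reproduced in a few lines of base R. This is a small sketch; the column names below are hypothetical examples in the style described in the comments, not the commenter's actual data.

```r
# Hypothetical column names with an "X" prefix and underscores
col_names <- c("X2015_02_15", "X2015_03_01")

# A format that does not match the strings yields NA
d_na <- as.Date(col_names, "%Y%m%d")

# Option 1: spell out the literal "X" and the underscores in the format
d1 <- as.Date(col_names, "X%Y_%m_%d")

# Option 2: strip the prefix first, then parse the remainder
d2 <- as.Date(sub("^X", "", col_names), "%Y_%m_%d")
```

Both options give the same Date values. Checking names(data) before reshaping is a quick way to see which format string is needed.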
data <- structure(list(City = c("London", "Stockholm", "Paris", "Prag", 
"Berlin", "Copenhagen"), `20100101` = c(4, 5, 3, 4, 6, 7), `20100102` = c(2, 
5, 8, 6, 1, 3), `20100205` = c(4, 7, 6, 1, 3, 4), `20100305` = c(0, 
3, 7, 9, 3, 2), `20100525` = c(9, 8, 7, 6, 5, 4), `20100719` = c(9, 
10, 5, 6, 7, 8), `20101011` = c(15, 3, 5, 7, 8, 9), `20101112` = c(3, 
7, 1, 1, 1, 1), `20101212` = c(0, 0, 0, 5, 2, 1)), 
class = "data.frame", row.names = c(NA, 
-6L))