R: How to take the median of column values only for non-distinct timestamps

Tags: r, dataframe, dplyr, xts, tidyr

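I am trying to clean some tick data. My data is in long format. When I convert it to wide format, I get
Error: Duplicate identifiers for rows
The Time column holds timestamps spanning several days, and the SYM column holds ticker symbols for many stocks. Here is my sample data: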

dput(jojo)
structure(list(Time = structure(c(1459481850, 1459481850, 1459482302, 
1459482305, 1459482305, 1459482307, 1459482307, 1459482309, 1459482312, 
1459482312, 1459482314, 1459482314, 1459482316, 1459482316, 1459482317, 
1459482317, 1459482318, 1459482319, 1459482319, 1459482320), class = c("POSIXct", 
"POSIXt"), tzone = "Asia/Calcutta"), PRICE = c(1371.25, 1371.25, 
1373.95, 1373, 1373, 1373.95, 1373.95, 1373.9, 1374, 1374, 1374.15, 
1374.15, 1374, 1374, 1373.85, 1373.85, 1372.55, 1374.05, 1374.05, 
1374.15), SIZE = c(39, 58, 5, 4, 7, 20, 5, 10, 21, 179, 10, 100, 
98, 78, 14, 11, 30, 10, 11, 39), SYM = c("A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B")), .Names = c("Time", "PRICE", "SIZE", "SYM"), row.names = c(NA, 
20L), class = "data.frame")
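I need to first find the identical timestamps, then take the median of PRICE and SIZE over those timestamps, and replace the rows that share a timestamp with a single row containing the median PRICE and SIZE. My code, however, summarises the whole column instead of only the rows with identical timestamps for each stock symbol. Here is my attempt: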

#Cleaning duplicate time stamps
tt<- jojo %>%group_by(SYM )%>% summarise(Time = ifelse(n() >= 2, median, mean))
#Making wide form
tt<-spread(tt, SYM, PRICE)

Please suggest a correction. It would be great if I could do the cleaning without using the highfrequency package.

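You need to choose whether to use the dplyr paradigm or the xts paradigm. They do not work well together, mainly because dplyr expects data frames, whereas xts objects are matrices. dplyr also masks the stats::lag generic, which prevents method dispatch (for example, running lag(.xts(1,1)) at top level will not do what you expect).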

To solve this using the xts paradigm, do the following:

library(xts)  # provides xts(), endpoints() and period.apply()

# create a function to convert to xts and take medians of the two columns
unDuplicate <- function(x) {
  # create xts object
  X <- xts(x[,c("PRICE","SIZE")], x[,"Time"])
  # set column names so they will be unique in wide format
  colnames(X) <- paste(colnames(X), x[1,"SYM"], sep = ".")
  # function to take median of each column
  colMedian <- function(obj, ...) {
    apply(obj, 2, median, ...)
  }
  # aggregate by seconds
  period.apply(X, endpoints(X, "seconds"), colMedian)
}
# now you can call the function on each symbol, then merge the results
do.call(merge, lapply(split(jojo, jojo$SYM), unDuplicate))

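What is the expected output for the example you provided?

The "Error: not a vector" occurs because you did not give median or mean a variable to operate on.

I want to take the median of PRICE and SIZE over the duplicated timestamps of each stock.

So jojo %>% group_by(Time, SYM) %>% mutate(PRICE = median(PRICE), SIZE = median(SIZE)) %>% filter(duplicated(Time))?

@Sotos It should first find the identical timestamps, then take the median of PRICE and SIZE over those timestamps, and replace the rows that share a timestamp with a single row containing the median PRICE and SIZE.

Oh, OK, so summarise instead of mutate: jojo %>% group_by(Time, SYM) %>% summarise(PRICE = median(PRICE), SIZE = median(SIZE))

Could you please help me with a similar question?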
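
The summarise suggestion from the comments can be sketched end to end as follows. This is a minimal sketch rather than a confirmed answer from the thread: it assumes dplyr >= 1.0.0 (for the .groups argument) and tidyr >= 1.0.0 (for pivot_wider(), used here in place of spread() so that both PRICE and SIZE can be widened at once). The names tt and wide are illustrative; jojo is rebuilt from the dput() output in the question.

```r
library(dplyr)
library(tidyr)

# Rebuild the sample data from the dput() output in the question
jojo <- data.frame(
  Time = as.POSIXct(
    c(1459481850, 1459481850, 1459482302, 1459482305, 1459482305,
      1459482307, 1459482307, 1459482309, 1459482312, 1459482312,
      1459482314, 1459482314, 1459482316, 1459482316, 1459482317,
      1459482317, 1459482318, 1459482319, 1459482319, 1459482320),
    origin = "1970-01-01", tz = "Asia/Calcutta"),
  PRICE = c(1371.25, 1371.25, 1373.95, 1373, 1373, 1373.95, 1373.95,
            1373.9, 1374, 1374, 1374.15, 1374.15, 1374, 1374, 1373.85,
            1373.85, 1372.55, 1374.05, 1374.05, 1374.15),
  SIZE = c(39, 58, 5, 4, 7, 20, 5, 10, 21, 179, 10, 100,
           98, 78, 14, 11, 30, 10, 11, 39),
  SYM = rep(c("A", "B"), each = 10),
  stringsAsFactors = FALSE
)

# Collapse rows that share a timestamp within a symbol into one row
# holding the median PRICE and median SIZE
tt <- jojo %>%
  group_by(Time, SYM) %>%
  summarise(PRICE = median(PRICE), SIZE = median(SIZE), .groups = "drop")

# Each (Time, SYM) pair is now unique, so widening no longer fails;
# columns come out as PRICE_A, PRICE_B, SIZE_A, SIZE_B
wide <- tt %>%
  pivot_wider(names_from = SYM, values_from = c(PRICE, SIZE))
```

Grouping by both Time and SYM is what keeps the medians local to each stock's duplicated timestamps. The original attempt grouped by SYM alone and passed the function median itself, rather than a call such as median(PRICE), to summarise.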