In R, is there a way to gather a data frame while extracting only part of the column names?
Overview
I would like to tidy my data frame. I have found a solution, but it seems inefficient on large datasets. Currently my code gathers the data frame into long format, calls separate() to split the ticker from the metric, and then spreads the data back out. See the example below.
Data frame
structure(list(date = c("2009-07-01", "2009-07-02", "2009-07-06",
"2009-07-07", "2009-07-08"), PRED.Open = c(0.5, 0.5, 0.7, 0.7,
0.7), PRED.High = c(0.5, 0.6, 0.7, 0.7, 0.7), PRED.Low = c(0.5,
0.5, 0.5, 0.7, 0.7), PRED.Close = c(0.5, 0.6, 0.5, 0.7, 0.7),
PRED.Volume = c(0L, 300L, 200L, 0L, 0L), PRED.Adjusted = c(0.5,
0.6, 0.5, 0.7, 0.7), GDM.Open = c(1041.02002, 1085.109985,
1052.02002, 1011.429993, 1006.630005), GDM.High = c(1097.790039,
1085.109985, 1052.02002, 1029.290039, 1006.630005), GDM.Low = c(1041.02002,
1038.540039, 995.450012, 1005.280029, 948.73999), GDM.Close = c(1085.109985,
1052.02002, 1011.429993, 1006.630005, 966.22998), GDM.Volume = c(0L,
0L, 0L, 0L, 0L), GDM.Adjusted = c(1085.109985, 1052.02002,
1011.429993, 1006.630005, 966.22998), NBL.Open = c(29.885,
29.325001, 27.370001, 27.485001, 26.815001), NBL.High = c(30.35,
29.325001, 27.545, 27.610001, 27.18), NBL.Low = c(29.83,
28.07, 26.825001, 26.605, 25.745001)), row.names = c(NA,
-5L), class = "data.frame")
Current solution
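library(tidyr)
library(dplyr)

# Gather every column except date into (ticker, val) pairs
df <- df %>% gather(-date, key = "ticker", value = "val")
# Split "PRED.Open" into ticker = "PRED" and metric = "Open", then spread the metrics back out
df <- separate(df, col = "ticker", into = c("ticker", "metric"), sep = "\\.") %>%
  spread(key = "metric", value = "val") %>%
  arrange(ticker, date)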
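To make the three steps concrete, here is a minimal sketch on a hypothetical toy frame (`toy`, `AAA` and `BBB` are made up, not from the question). It also shows where the cost comes from: gather() first materialises one row per date per non-date column (nrow × (ncol − 1) rows) before spread() collapses them back down.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy frame: 3 dates x 2 tickers x 2 metrics, in the question's layout
toy <- data.frame(
  date = c("2009-07-01", "2009-07-02", "2009-07-06"),
  AAA.Open = c(1, 2, 3), AAA.Close = c(4, 5, 6),
  BBB.Open = c(7, 8, 9), BBB.Close = c(10, 11, 12)
)

# Step 1: gather -> one row per (date, original column): 3 * 4 = 12 rows
step1 <- gather(toy, -date, key = "ticker", value = "val")

# Step 2: split "AAA.Open" into ticker = "AAA" and metric = "Open"
step2 <- separate(step1, col = "ticker", into = c("ticker", "metric"), sep = "\\.")

# Step 3: spread metrics back into columns -> one row per (date, ticker): 3 * 2 = 6 rows
step3 <- spread(step2, key = "metric", value = "val") %>% arrange(ticker, date)

nrow(step1)  # 12
nrow(step3)  # 6
```

On a frame with thousands of rows and hundreds of ticker columns, that 12-row intermediate becomes the dominant memory cost.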
Desired result
One row per date and ticker, with each metric (Open, High, Low, Close, Volume, Adjusted) in its own column, i.e. what the current solution produces.
Question
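Is there a more efficient way to achieve this?
Answer
If you are on tidyr 1.0.0 or later, use pivot_longer(). Putting the special name ".value" in names_to tells it that the part of each column name after the dot (Open, High, ...) becomes its own output column while the part before the dot goes into ticker, so the gather, separate and spread steps happen in a single call: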
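library(dplyr)  # attaches %>% and arrange()

tidyr::pivot_longer(df,
                    cols = -date,                       # reshape every column except date
                    names_to = c('ticker', '.value'),   # prefix -> ticker, suffix -> its own column
                    names_sep = '\\.') %>%              # split names at the literal dot
  dplyr::arrange(ticker, date)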
# A tibble: 15 x 8
# date ticker Open High Low Close Volume Adjusted
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
# 1 2009-07-01 GDM 1041.0 1097.8 1041.0 1085.1 0 1085.1
# 2 2009-07-02 GDM 1085.1 1085.1 1038.5 1052.0 0 1052.0
# 3 2009-07-06 GDM 1052.0 1052.0 995.45 1011.4 0 1011.4
# 4 2009-07-07 GDM 1011.4 1029.3 1005.3 1006.6 0 1006.6
# 5 2009-07-08 GDM 1006.6 1006.6 948.74 966.23 0 966.23
# 6 2009-07-01 NBL 29.885 30.35 29.83 NA NA NA
# 7 2009-07-02 NBL 29.325 29.325 28.07 NA NA NA
# 8 2009-07-06 NBL 27.370 27.545 26.825 NA NA NA
# 9 2009-07-07 NBL 27.485 27.610 26.605 NA NA NA
#10 2009-07-08 NBL 26.815 27.18 25.745 NA NA NA
#11 2009-07-01 PRED 0.5 0.5 0.5 0.5 0 0.5
#12 2009-07-02 PRED 0.5 0.6 0.5 0.6 300 0.6
#13 2009-07-06 PRED 0.7 0.7 0.5 0.5 200 0.5
#14 2009-07-07 PRED 0.7 0.7 0.7 0.7 0 0.7
#15 2009-07-08 PRED 0.7 0.7 0.7 0.7 0 0.7
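A quick way to convince yourself that the single pivot_longer() call is a drop-in replacement is to run both pipelines on the same input and compare. This is a minimal sketch on a hypothetical toy frame (`toy`, `AAA`, `BBB` are made up); it compares values column by column because the two approaches order the metric columns differently.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy frame in the question's "TICKER.Metric" layout
toy <- data.frame(
  date = c("2009-07-01", "2009-07-02"),
  AAA.Open = c(1, 2), AAA.Close = c(3, 4),
  BBB.Open = c(5, 6), BBB.Close = c(7, 8)
)

# Question's approach: gather -> separate -> spread
old <- toy %>%
  gather(-date, key = "ticker", value = "val") %>%
  separate(ticker, into = c("ticker", "metric"), sep = "\\.") %>%
  spread(key = "metric", value = "val") %>%
  arrange(ticker, date)

# Answer's approach: one pivot_longer() call with the ".value" sentinel
new <- toy %>%
  pivot_longer(cols = -date,
               names_to = c("ticker", ".value"),
               names_sep = "\\.") %>%
  arrange(ticker, date)

# Same rows and values; only the column order differs (spread sorts the metric
# columns alphabetically, pivot_longer keeps them in first-seen order)
stopifnot(
  identical(old$date, new$date),
  identical(old$ticker, new$ticker),
  identical(old$Open, new$Open),
  identical(old$Close, new$Close)
)
```

If a ticker can itself contain a dot (say a hypothetical "BRK.B"), names_sep = "\\." would split "BRK.B.Open" into three pieces rather than two. Passing names_pattern = "(.*)\\.(.*)" instead of names_sep splits on the last dot only, because the greedy first group swallows any earlier dots.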