R: merging multiple data frames into a single column
Hi everyone, I'm back here to ask you a few questions. I have CSV files with two columns, temperature and date (in dmy HM format). These files come from a single digital thermometer that can only store 4 months at a time. I want to read all of these files and put their temp variable into a new df (Union), in just one column. To build the single document, I created a df called "date" with a date-time sequence longer than any of the CSVs, so that I can merge the "n" documents with the "date" column and paste the values wherever the dates match. My input looks like this:
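# First download: hourly readings, 2017-01-01 to 2017-03-01
Date <- seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2017-03-01 00:00:00", tz = "UTC"),
            by = "60 min")
temp <- runif(length(Date), min = 32, max = 100)  # 1417 values
df1 <- data.frame(Date, temp)

# Second download: hourly readings, 2017-03-01 to 2017-06-01
Date <- seq(as.POSIXct("2017-03-01 00:00:00", tz = "UTC"),
            as.POSIXct("2017-06-01 00:00:00", tz = "UTC"),
            by = "60 min")
temp <- runif(length(Date), min = 32, max = 100)  # 2209 values
df2 <- data.frame(Date, temp)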
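This works for only one df. How can I automatically merge many dfs, two at a time, into a single column in Union?
I hope you can help me.
Thanks

I'll generate some sample data to demonstrate how to use Reduce and merge. I'm assuming you want a "wide" format, with one column per location.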
library(data.table)

set.seed(123)
# list of 10 data.tables with columns date and temp
days <- seq(as.Date("2016/01/01"), as.Date("2016/01/31"), by = "day")
ldat <- lapply(1:10, function(x) data.table(date = sample(days, 12),
                                            temp = runif(12, min = 32, max = 100)))
# right now, each table in the list has the same column names
# rename the 'temp' column to the location where it was collected
loc_vect <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California",
              "Colorado", "Connecticut", "Indiana", "Iowa", "Kansas")
# setnames() renames by reference and returns the table, so the list is rebuilt in place
ldat <- lapply(1:10, function(x) setnames(ldat[[x]], "temp", loc_vect[x]))
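The Reduce/merge step itself is not shown at this point, so here is a minimal sketch of how it could look. It is an assumption about the intended call, not the answerer's exact code; `ldat_demo`, `dat_demo` and `merge_by_date` are hypothetical names, and the three small tables stand in for the ten above.

```r
library(data.table)

# Small stand-in for `ldat`: three tables sharing a `date` key,
# each with its own value column (illustrative values only).
ldat_demo <- list(
  data.table(date = as.Date("2016-01-01") + 0:1,     Alabama = c(50.1, 61.2)),
  data.table(date = as.Date("2016-01-01") + 1:2,     Alaska  = c(40.3, 42.4)),
  data.table(date = as.Date("2016-01-01") + c(0, 2), Arizona = c(70.5, 71.6))
)

# Full outer join on `date`; Reduce folds it across the list two tables at a time.
merge_by_date <- function(x, y) merge(x, y, by = "date", all = TRUE)
dat_demo <- Reduce(merge_by_date, ldat_demo)
print(dat_demo)
```

Applied to the list built above, `dat <- Reduce(merge_by_date, ldat)` would give a wide table with one row per date and NA wherever a location has no reading, which is the shape the later `melt(dat, ...)` call expects.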
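If you want it in a "tall" format (just two columns for date and temp, and perhaps a third for location), that would be easier to get by adding a "location" column to each table and calling rbindlist() on the list. From the merged wide table dat, though, you can use melt: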
melt(dat,
id.vars = "date",
variable.name = "Location",
value.name = "Temp")[!is.na(Temp)]
which results in:
date Location Temp
1: 2016-01-02 Alabama 34.86005
2: 2016-01-09 Alabama 78.07480
3: 2016-01-10 Alabama 99.61034
4: 2016-01-11 Alabama 79.11063
5: 2016-01-12 Alabama 38.99888
---
116: 2016-01-19 Kansas 49.73827
117: 2016-01-24 Kansas 74.04788
118: 2016-01-25 Kansas 35.97654
119: 2016-01-26 Kansas 61.13266
120: 2016-01-31 Kansas 42.39633
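The rbindlist route mentioned above can be sketched as follows. This is only an illustration: it assumes the per-location tables still share the column names date and temp, `tabs` and `tall` are hypothetical names, and it uses the `idcol` argument of rbindlist to create the location column from the list names instead of adding it by hand.

```r
library(data.table)

# Hypothetical per-location tables that still share the columns date/temp.
tabs <- list(
  Alabama = data.table(date = as.Date("2016-01-01") + 0:1, temp = c(50.1, 61.2)),
  Alaska  = data.table(date = as.Date("2016-01-01") + 1:2, temp = c(40.3, 42.4))
)

# Stack the tables; idcol turns the list names into a Location column.
tall <- rbindlist(tabs, idcol = "Location")
print(tall)
```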

Without a sample dataset and the desired result, I'm not completely sure what you're asking. You could use stack():
> x <- data.frame(x = seq(1,3), y = seq(11,13), z = seq(21,23))
> x
x y z
1 1 11 21
2 2 12 22
3 3 13 23
> stack(x)
values ind
1 1 x
2 2 x
3 3 x
4 11 y
5 12 y
6 13 y
7 21 z
8 22 z
9 23 z
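I'm assuming x, y and z in your example are different weather stations, and that this comes after you have built the one big table and want a two-column dataset out of that master table.

Wow mate, your answer is amazing and will be useful for my next step, but it isn't what I need. All the files I need to put into one column are observations from a single digital thermometer, which can only store 4 months of observations. So, following your example, I need each data frame to hold a different 4-month sequence whose dates never overlap, but I want a longer date-time sequence that shows NA wherever there is no information. Then all the temperature data should be merged into a single column.

I'm having a hard time picturing what you're trying to do. Could you provide a couple of sample documents, along with the output you'd like to get from combining them?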
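A minimal base R sketch of what the asker describes in the comment above: stack all downloads from the one thermometer, then left-join them onto a longer master date-time sequence so that hours without a reading show NA. The tiny data frames, the `master` and `readings` names, and the hourly spacing are assumptions made for illustration; in practice the list would come from reading the CSV files.

```r
# Two hypothetical downloads from the same thermometer (hourly, non-overlapping).
df1 <- data.frame(
  Date = seq(as.POSIXct("2017-01-01 00:00", tz = "UTC"), by = "hour", length.out = 3),
  temp = c(50, 51, 52)
)
df2 <- data.frame(
  Date = seq(as.POSIXct("2017-01-01 05:00", tz = "UTC"), by = "hour", length.out = 2),
  temp = c(60, 61)
)

# Master hourly sequence covering the whole period of interest.
master <- data.frame(
  Date = seq(as.POSIXct("2017-01-01 00:00", tz = "UTC"),
             as.POSIXct("2017-01-01 07:00", tz = "UTC"), by = "hour")
)

# Stack all downloads, then left-join onto the master sequence;
# hours without a reading come back as NA in the single temp column.
readings <- do.call(rbind, list(df1, df2))
Union <- merge(master, readings, by = "Date", all.x = TRUE)
print(Union)
```

Because the downloads never overlap in time, stacking them first keeps a single temp column; merging them pairwise by date would instead create one column per file.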
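
Wide-format result of the Reduce/merge step from the first answer (one column per location, NA where a location has no reading on that date):
date Alabama Alaska Arizona Arkansas California Colorado Connecticut Indiana Iowa Kansas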
1: 2016-01-01 NA 47.84632 NA 61.57271 NA NA NA NA NA NA
2: 2016-01-02 34.86005 NA 58.10994 NA NA NA NA NA NA 45.44664
3: 2016-01-03 NA NA 86.01528 76.41093 42.00244 61.88134 46.82337 NA NA NA
4: 2016-01-04 NA 60.18915 62.49911 NA NA 98.62789 NA NA 64.77890 NA
5: 2016-01-05 NA NA 87.24249 NA NA NA NA NA NA NA
6: 2016-01-06 NA NA NA 55.35912 NA NA 68.29078 50.02121 46.70533 NA
7: 2016-01-07 NA NA NA 92.72748 NA 76.86901 46.74871 NA NA NA
8: 2016-01-08 NA 41.71040 74.78704 NA NA NA 47.03500 53.24648 NA 68.86146
9: 2016-01-09 78.07480 NA 77.22783 40.88731 96.44543 67.43723 56.17029 94.09680 NA 88.57107
10: 2016-01-10 99.61034 63.68545 NA NA NA 82.12129 NA NA NA NA
11: 2016-01-11 79.11063 NA NA 92.27990 NA NA NA NA 43.60388 79.48179
12: 2016-01-12 38.99888 NA NA NA 69.35136 NA 57.48055 93.32746 NA 86.63246
13: 2016-01-13 92.48867 NA 50.65809 NA 81.00055 NA NA 89.10421 35.24113 94.26648
14: 2016-01-14 54.29861 NA NA 98.97707 95.60039 32.71176 NA NA 51.60027 NA
15: 2016-01-15 NA NA 87.08438 76.65955 52.48357 NA NA 70.39215 NA NA
16: 2016-01-16 NA 53.63631 NA 43.90358 NA 59.84430 NA NA NA NA
17: 2016-01-17 NA 47.75055 61.90855 NA 36.12900 NA NA 50.02223 NA 59.00633
18: 2016-01-18 NA 41.43881 NA NA NA 44.50177 NA NA 43.70768 NA
19: 2016-01-19 NA NA 83.30431 NA NA NA NA 80.16374 91.85676 49.73827
20: 2016-01-20 NA NA NA NA 96.87820 NA 56.06551 64.72771 NA NA
21: 2016-01-21 75.55446 83.57525 NA NA NA 77.76394 NA NA NA NA
22: 2016-01-22 96.90625 46.71574 87.39552 NA 71.81287 NA 76.19899 53.86083 77.85759 NA
23: 2016-01-23 NA NA NA 38.99480 41.67601 NA NA NA NA NA
24: 2016-01-24 70.93907 NA NA NA NA NA NA 72.41534 NA 74.04788
25: 2016-01-25 93.18810 60.13325 NA 53.78538 59.92690 53.19575 NA NA NA 35.97654
26: 2016-01-26 48.73397 NA 38.44916 44.76300 NA 85.46715 NA NA NA 61.13266
27: 2016-01-27 NA NA NA NA NA NA 70.89160 NA 79.65801 NA
28: 2016-01-28 NA NA NA NA NA NA 82.34274 NA 56.75825 NA
29: 2016-01-29 NA 42.36624 NA NA NA NA 66.15637 50.64333 49.20162 NA
30: 2016-01-30 NA 57.08149 NA NA NA 87.88277 62.24422 NA NA NA
31: 2016-01-31 NA NA NA NA 59.50670 NA NA NA 59.37499 42.39633
date Alabama Alaska Arizona Arkansas California Colorado Connecticut Indiana Iowa Kansas