R: merging rows from the same data frame
I would like to know whether different rows of a data frame can be merged when they share a common field. Input:
df = rbind(c("01/01/2016",01:02:30,"100","character(0)","file A"),
c("02/01/2016",9:02:30,"character(0)", 3, "file A"),
c("02/01/2016",8:30:30,"200","character(0)","file B"),
c("03/01/2016",8:25:30,"50","character(0)","file C"),
c("04/01/2016",17:20:30,"character(0)","600","file B"))
Output:
df = rbind(c(01/01/2016,01:02:30,"100",3,"file A"),
c(02/01/2016,8:30:30,"200",600,"file B"),
c(03/01/2016,8:25:30,"50","character(0)","file C"))
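As you can see, we merge the rows based on the last value (file A, file B or file C).
I need to keep the earliest date. For example, for "file A" we have two dates, 01/01/2016 and 02/01/2016, and we want to keep the earliest one.
No more than 2 rows are merged per value.
We want to keep the earliest date.
Based on your comments, you want to find, for each column, the first non-missing value (ordered by one column) within each group defined by a grouping column (the "file A/B/C" column in your case).
First, you have to clean up the data a bit. The data loading step was broken because of some misplaced quotes around the timestamps. Also, I assume you are using the character(0) values to represent missing values. If so, use NAs instead. Here are the data initialization and cleaning steps: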
# prepare your data
df = data.frame(V1 = c("01/01/2016 01:02:30","02/01/2016 9:02:30","02/01/2016 8:30:30",
"03/01/2016 8:25:30","04/01/2016 17:20:30"),
V2 = c("100","character(0)","200","50","character(0)"),
V3 = c("character(0)", "3", "character(0)","character(0)", "600"),
V4 = c("file A", "file A", "file B", "file C", "file B"))
# replace the character(0)s with NAs as they are missing values
df[df == "character(0)"] <- NA
# convert character dates to time
df$V1 <- strptime(as.character(df[ ,1]), format = "%d/%m/%Y %H:%M:%S")
Here is the original data frame:
> df
V1 V2 V3 V4
1 2016-01-01 01:02:30 100 <NA> file A
2 2016-01-02 09:02:30 <NA> 3 file A
3 2016-01-02 08:30:30 200 <NA> file B
4 2016-01-03 08:25:30 50 <NA> file C
5 2016-01-04 17:20:30 <NA> 600 file B
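The definition of custom_row_merge, which is called further down, does not appear in this text. The following is only a minimal base R sketch of what such a function might look like, assuming it takes the data frame, the name of the column to sort by and the name of the grouping column. The body and the helper first_non_na are hypothetical reconstructions, not the answer's original code, and the row names of the result will differ from the output shown below.

```r
# Hypothetical reconstruction: for each group, keep the first non-missing
# value of every column after sorting by sort_col (earliest first).
custom_row_merge <- function(df, sort_col, group_col) {
  # strptime() returns POSIXlt; POSIXct is easier to handle inside a data frame
  if (inherits(df[[sort_col]], "POSIXlt")) {
    df[[sort_col]] <- as.POSIXct(df[[sort_col]])
  }
  # Sort so that "first" means "earliest"
  df <- df[order(df[[sort_col]]), , drop = FALSE]

  # First non-missing value; indexing an empty vector with [1] yields a typed NA
  first_non_na <- function(x) x[!is.na(x)][1]

  # Collapse every group to a single row
  pieces <- lapply(split(df, df[[group_col]], drop = TRUE), function(g) {
    out <- g[1, , drop = FALSE]
    for (col in names(g)) {
      out[[col]] <- first_non_na(g[[col]])
    }
    out
  })

  merged <- do.call(rbind, pieces)
  merged <- merged[order(merged[[sort_col]]), , drop = FALSE]
  rownames(merged) <- NULL  # plain 1..n row names
  merged
}
```

Because the sort column is ordered before grouping, the first non-missing date in each group is also the earliest one, which is the behaviour asked for in the question.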
The result generated by the function matches what you described in your question:
> custom_row_merge(df, "V1", "V4")
V1 V2 V3 V4
1 2016-01-01 01:02:30 100 3 file A
3 2016-01-02 08:30:30 200 600 file B
4 2016-01-03 08:25:30 50 <NA> file C
Of course, you can fill the missing values with character(0) values again if you prefer.
You can use dplyr::group_by or data.table's by= and take the min() value. By the way, your input and output examples throw warnings.
I get the following message: Error in df[, 1]
@ManuelSopenaBallesteros Did you try my df assignment? Your question has a typo: the H:M:S part of the time is separate and not quoted. Please let me know whether this solves your problem.
@manuelsopenaballestor Sorry, there was a problem with the data loading step. See my updated answer.
Sorry, this does not help me. I need to merge both rows, not just take the first one...
@ManuelSopenaBallesteros Oh, OK. Now I understand. But how do you want to handle the dates? You want the first date and the first non-missing value for each column, right?
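The dplyr::group_by suggestion from the comments could look roughly like the sketch below. It assumes the dplyr package (version 1.0 or later, for across and .groups) is installed, that character(0) has already been replaced by NA, and that V1 is stored as POSIXct. The column selection and the anonymous helper are illustrative choices, not code from the thread.

```r
library(dplyr)

# Same example data as in the answer, with NA for missing values
df <- data.frame(
  V1 = as.POSIXct(c("01/01/2016 01:02:30", "02/01/2016 9:02:30",
                    "02/01/2016 8:30:30", "03/01/2016 8:25:30",
                    "04/01/2016 17:20:30"),
                  format = "%d/%m/%Y %H:%M:%S", tz = "UTC"),
  V2 = c("100", NA, "200", "50", NA),
  V3 = c(NA, "3", NA, NA, "600"),
  V4 = c("file A", "file A", "file B", "file C", "file B"),
  stringsAsFactors = FALSE
)

merged <- df %>%
  arrange(V1) %>%                              # earliest rows first
  group_by(V4) %>%                             # one output row per file
  summarise(
    V1 = min(V1),                              # keep the earliest date
    across(c(V2, V3), ~ .x[!is.na(.x)][1]),    # first non-missing value, NA if none
    .groups = "drop"
  )
```

Note that summarise returns a tibble with the grouping column V4 first, so the column order differs from the base R output shown in the answer.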