Adding and merging two data frames in R
I have two data frames:
> df1
Long Short
EURUSD 47295 16057
GBPUSD 17385 6861
USDJPY 7146 9369
USDCHF 2704 5162
USDCAD 4705 11947
AUDUSD 13041 6654
NZDUSD 7184 4000
> df2
Long Short
EURUSD 318 408
GBPUSD 181 276
USDJPY 217 203
USDCHF 97 57
USDCAD 178 121
AUDUSD 142 202
NZDUSD 95 138
I need the final data frame to look like this:
> Final
Long Short
EURUSD 47613 16465
... ... ...
NZDUSD 7279 4138
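The merge/join methods did not work. Thanks for your help.
If the data has no row names (my personal preference, though that is not always under your control), here are three methods.
Your data: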
df1 <- read.table(text = "Symbol Long Short
EURUSD 47295 16057
GBPUSD 17385 6861
USDJPY 7146 9369
USDCHF 2704 5162
USDCAD 4705 11947
AUDUSD 13041 6654
NZDUSD 7184 4000", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Symbol Long Short
EURUSD 318 408
GBPUSD 181 276
USDJPY 217 203
USDCHF 97 57
USDCAD 178 121
AUDUSD 142 202
NZDUSD 95 138", header = TRUE, stringsAsFactors = FALSE)
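When both data frames list the same symbols in the same order, plain element-wise addition of the numeric columns may be all that is needed. The following is a minimal sketch under that assumption, using a three-row subset of the data above; the names `s1`, `s2`, and `final` are illustrative and not from the question.

```r
# Three-row subset of the data above, with Symbol as a regular column
s1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                 Long = c(47295, 17385, 7146),
                 Short = c(16057, 6861, 9369),
                 stringsAsFactors = FALSE)
s2 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                 Long = c(318, 181, 217),
                 Short = c(408, 276, 203),
                 stringsAsFactors = FALSE)

# Assumption: rows are aligned one-to-one; stop early if they are not
stopifnot(identical(s1$Symbol, s2$Symbol))

# Add only the numeric columns and carry the Symbol column over unchanged
num_cols <- c("Long", "Short")
final <- data.frame(Symbol = s1$Symbol,
                    s1[num_cols] + s2[num_cols],
                    stringsAsFactors = FALSE)
final
```

This shortcut silently depends on row order, which is why the join-based approach is safer when the two sources can differ.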
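Method 2: base R merge
This method does not depend on the rows being in the same order, or even on every row being present in both data frames. To demonstrate this, I will remove one row from one of the data frames: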
df2 <- df2[-3,]
And the work, shown here with dplyr's full_join:
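library(dplyr)
# psum is not part of dplyr or base R; this definition is an assumed
# helper that sums its arguments row by row (a "parallel sum")
psum <- function(..., na.rm = FALSE) rowSums(cbind(...), na.rm = na.rm)
full_join(df1, rename(df2, Long2 = Long, Short2 = Short), by = "Symbol") %>%
  mutate(
    Long = psum(Long, Long2, na.rm = TRUE),
    Short = psum(Short, Short2, na.rm = TRUE)
  ) %>%
  select(-Long2, -Short2)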
# Symbol Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566 7137
# 3 USDJPY 7146 9369
# 4 USDCHF 2801 5219
# 5 USDCAD 4883 12068
# 6 AUDUSD 13183 6856
# 7 NZDUSD 7279 4138
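The same join can be sketched in base R with merge(). This is a minimal sketch, not necessarily the code originally posted: it assumes Symbol is a regular column, uses a small subset of the data, and the names `s1`, `s2`, `m`, and `final` are illustrative.

```r
# Small subset of the data; s2 deliberately lacks USDJPY
s1 <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                 Long = c(47295, 17385, 7146),
                 Short = c(16057, 6861, 9369),
                 stringsAsFactors = FALSE)
s2 <- data.frame(Symbol = c("EURUSD", "GBPUSD"),
                 Long = c(318, 181),
                 Short = c(408, 276),
                 stringsAsFactors = FALSE)

# Full outer join on Symbol; clashing column names get .x / .y suffixes
m <- merge(s1, s2, by = "Symbol", all = TRUE, suffixes = c(".x", ".y"))

# Sum each pair of columns, treating a missing side as zero
final <- data.frame(
  Symbol = m$Symbol,
  Long = rowSums(m[c("Long.x", "Long.y")], na.rm = TRUE),
  Short = rowSums(m[c("Short.x", "Short.y")], na.rm = TRUE),
  stringsAsFactors = FALSE
)
final
```

Note that merge() sorts the result by the key column by default, so the row order can differ from the input.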
Edit
The data in your question is not representative. Based on your comments, what you actually have appears to be:
str(df1)
# 'data.frame': 7 obs. of 2 variables:
# $ Long : Factor w/ 7 levels "2704","4705",..: 7 6 3 1 2 5 4
# $ Short: Factor w/ 7 levels "4000","5162",..: 7 4 5 2 6 3 1
For future reference, it helps if you provide your data in an unambiguous, consumable form, such as:
# dput(df1) ... possibly with options(deparse.max.lines=NULL) beforehand
structure(list(
Long = structure(c(7L, 6L, 3L, 1L, 2L, 5L, 4L), .Label = c("2704", "4705", "7146", "7184", "13041", "17385", "47295"), class = "factor"),
Short = structure(c(7L, 4L, 5L, 2L, 6L, 3L, 1L), .Label = c("4000", "5162", "6654", "6861", "9369", "11947", "16057"), class = "factor")),
.Names = c("Long", "Short"),
row.names = c("EURUSD", "GBPUSD", "USDJPY", "USDCHF", "USDCAD", "AUDUSD", "NZDUSD"),
class = "data.frame")
To get from your df1 to what I read in above, just do the following:
# convert from nascent factors to numbers
df1[] <- lapply(df1[], function(a) as.numeric(as.character(a)))
# bring the row names into a column
df1$Symbol <- rownames(df1)
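The detour through as.character() matters because calling as.numeric() directly on a factor returns its internal level codes rather than the numbers shown in the labels. A minimal sketch of the difference, with illustrative variable names:

```r
# A factor whose labels happen to look like numbers
f <- factor(c("47295", "17385", "7146"))

# Direct conversion yields the integer level codes (positions in levels(f))
codes <- as.numeric(f)

# Going through character first recovers the numbers the labels spell out
values <- as.numeric(as.character(f))

codes
values
```

Skipping the as.character() step does not raise an error, so the wrong numbers can go unnoticed until the sums look off.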
Doesn't df1+df2 work? If your first column is a factor variable, simple addition will output NA, as @Vandenman suggested. In that case, use cbind(df1[,1], df1[,2:3]+df2[,2:3]).
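If the symbols are kept as row names and the columns are already numeric, the addition can be aligned by row name instead of by position. A minimal sketch, assuming both frames contain exactly the same set of symbols; `r1`, `r2`, and `final` are illustrative names.

```r
r1 <- data.frame(Long = c(47295, 17385), Short = c(16057, 6861),
                 row.names = c("EURUSD", "GBPUSD"))
# Same symbols, deliberately listed in a different order
r2 <- data.frame(Long = c(181, 318), Short = c(276, 408),
                 row.names = c("GBPUSD", "EURUSD"))

# Assumption: identical sets of row names; stop early otherwise
stopifnot(setequal(rownames(r1), rownames(r2)))

# Reorder r2 to match r1 by row name, then add element-wise
final <- r1 + r2[rownames(r1), ]
final
```

Indexing r2 by rownames(r1) is what protects against the two frames being in different orders; it does not handle symbols missing from one side.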
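How can your first column (the factor) have no column name? It looks like row names, which should not affect df1+df2. If Leo's suggestion does not do it for you, can you make this more reproducible by including the output of dput(head(x)) and what "not working" means (warnings, errors, etc.)?
Yes @r2evans, they are row names; I set them manually because the data was scraped. Would giving the row names a column name help? Leo's solution gives me an error: "Error in `[.data.frame`(df1, 2:3): undefined columns selected".
While they look nice aesthetically, I do not like using row names in general: they can be fragile, and some utilities do not preserve them (so you need to work to keep them in order, and that is not always obvious).
So, Andrew.G, did this solve your problem?
I am trying to get the options to work. I gave +1 for all the effort you put in, but I cannot accept the answer yet because I cannot get it to work. Specifically, my data is scraped from a dynamic web page, so I cannot do the first step, as in "typing in the data". I tried stripping the row names usi... The problem is that the numbers are treated as factors. When I try to convert them using df1[,c(1,2)]...
Reading your comments. (The reason this is not covered in my answer is that nothing in your question originally indicated they were not numbers. If your sample data had been given with something like dput, it would have been clearer.)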