Merging two tables in R by an implicit column
I have two tables:
tab1=structure(list(generated_id = c(482160724447511, 482160724447511
), utc_time = structure(c(1L, 1L), .Label = "30.09.2018 12:46", class = "factor"),
local_time = structure(c(1L, 1L), .Label = "30.09.2018 15:46", class = "factor"),
user_locale = structure(c(1L, 1L), .Label = "en", class = "factor"),
network = structure(c(1L, 1L), .Label = "Facebook Installs", class = "factor"),
campaign = structure(c(1L, 1L), .Label = "(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120752)", class = "factor"),
adgroup = structure(c(1L, 1L), .Label = "(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (23843105743590752)", class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
tab2=
structure(list(date = structure(c(1L, 1L), .Label = "10.10.2018", class = "factor"),
campaign_id = c(2.38431e+16, 2.38431e+16), ad_set_id = c(2.38431e+16,
2.38431e+16), spent = c(1.77, 13.85)), class = "data.frame", row.names = c(NA,
-2L))
tab2$campaign_id=tab1$campaign
tab2$ad_set_id=tab1$adgroup
Usually I just use merge():
merge(tab1, tab2, by = c("campaign", "adgroup"))
But in this case I am having trouble, because the id in tab1$campaign and tab1$adgroup sits at the end of the value, in parentheses:
(GR23)(BGM)(AND)(FB).... (***23843105743590752***)
(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (***23843105743590752***)
where the part marked (***) is the id to merge on.
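How can I merge tab1 and tab2 by campaign and adgroup when the key id in tab1 is inside the trailing parentheses?
If I understand your question correctly, the problem is merging tables on a substring of a column. One way to do this is to extract that substring and add it to tab1 as a new column.
Since the rows in tab1 are identical, and the ids in tab2 do not match any of those in tab1, I used a different data set: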
tab1 <- structure(list(campaign = c("(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120752)",
"(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120753)"),
adgroup = c("(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (23843105743590752)",
"(GR23)(BGM)(AND)(FB)(META)(US)(W35+)(NONE)(APP_EV)(NONE)(PURCHASE)(NONE)(27.09.2018) (23843105743590752)"),
generated_id = c(482160724447511, 482160724447511)),
row.names = c(NA, -2L), class = "data.frame")
tab2 <- structure(list(campaign_id = c("23843105742120752", "23843105742120753"),
ad_set_id = c("23843105743590752", "23843105743590752"),
date = c("10.10.2018", "10.10.2018"), spent = c(1.77, 13.85)),
row.names = c(NA, -2L), class = "data.frame")
# Create a function that extracts the id from the last part
extract_id <- function(x){
s <- strsplit(as.character(x), " ")
s_id <- sapply(s, function(si) si[length(si)])
ids <- gsub("[^[:digit:] ]", "", s_id) # Remove all but digits/numbers
return(ids)
}
# Add the extracted id's to tab1
tab1$campaign_id <- extract_id(tab1$campaign)
tab1$adgroup_id <- extract_id(tab1$adgroup)
# Your result
result <- merge(tab1, tab2,
by.x = c("campaign_id", "adgroup_id"),
by.y = c("campaign_id", "ad_set_id"))
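As a variation on extract_id above, the id can also be pulled out with a single regular expression that only accepts digits inside the final pair of parentheses. This is a minimal sketch, not part of the original answer: the helper name extract_trailing_id and the "no id here" test value are made up for illustration, and it assumes every id is a run of digits in the last "(...)" group.

```r
# Hypothetical helper: return the digits inside the last "(digits)" group
extract_trailing_id <- function(x) {
  x <- as.character(x)  # also handles factor columns
  pattern <- "^.*\\(([0-9]+)\\)[[:space:]]*$"
  id <- sub(pattern, "\\1", x)
  # sub() returns the input unchanged when nothing matches, so mark those as NA
  id[!grepl(pattern, x)] <- NA_character_
  id
}

campaign <- c(
  "(GR23)(BGM)(AND)(FB)(App Events)(US)(W35+)(27.09.2018) (23843105742120752)",
  "no id here"
)
ids <- extract_trailing_id(campaign)
print(ids)
```

Values without a trailing numeric group come back as NA instead of an empty or partial string, which makes unmatched rows easier to spot before calling merge().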
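To merge two data frames on a key you need at least one column name in common, which is not the case here. Set by.x and by.y appropriately.
The tab1 example you gave contains two identical rows. Is that a representative case? Also, the ids campaign_id and ad_set_id in tab2 are in scientific notation, so no ids match between the two data frames.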
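The scientific-notation point can be illustrated with a small sketch. It assumes the numeric value is exactly what the question's dput output shows (2.38431e+16); the variable names and the two one-row data frames are made up for illustration.

```r
# Id as it appears in the question's tab2: a double, printed in scientific notation
id_numeric <- 2.38431e+16
# Id as embedded at the end of tab1$campaign, kept as text
id_text <- "23843105742120752"

# Turning the double back into text does not recover the original digits
id_from_numeric <- format(id_numeric, scientific = FALSE)
ids_match <- identical(id_from_numeric, id_text)  # FALSE

# With both keys stored as character, merge() compares them exactly
left  <- data.frame(campaign_id = id_text, generated_id = 482160724447511,
                    stringsAsFactors = FALSE)
right <- data.frame(campaign_id = id_text, spent = 1.77,
                    stringsAsFactors = FALSE)
merged <- merge(left, right, by = "campaign_id")
print(merged)
```

If the ids come from a file, it is probably safest to read those columns as character from the start, for example via the colClasses argument of read.csv(), so that the digits are never rounded.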