MySQL: update a table where certain fields are NULL, using related values from different columns

mysql, kettle

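I'm writing an ETL in Kettle (Pentaho) to build a table from various sources, including Google Analytics.

Table 1 = all the data from the website, joined to the Google Analytics information. Table 2 = all the duplicates from Table 1 that have Google Analytics information associated with them.

My problem is that some rows in Table 1 are missing their Google Analytics information, while Table 2 shows Google Analytics data for the same reference number.

So what I want to do is look up [reference_number] from Table 1 in Table 2 and fill in Table 1 wherever some of its columns are NULL, using the values from Table 2.

Quick example (edited):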

Table 1 (Main Table)
*This table has a unique index on website_reference_number*
  website_Reference_number   GA_info_1   GA_info_2
  A1                         null        null
  A2                         x           y

Table 2 (Duplicates from Table 1)
  eventlabel   GA_info_1   GA_info_2
  A1           z           z
  A2           x           y

My output should look like this:

Table 1 (Main Table)
  Ref_number   GA_info_1   GA_info_2
  A1           z           z
  A2           x           y

I'm using a MySQL database.

UPDATE mytable
LEFT JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE mytable.GA_info_1 IS NULL
   OR mytable.GA_info_2 IS NULL;
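Put every field that might be NULL into the WHERE clause.

A field that is not NULL is left unchanged, because it is the first argument to COALESCE; a field that is NULL is filled from the matching field in the other table.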
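For anyone who wants to try this out, here is a minimal sketch that rebuilds the question's sample data in two scratch tables and runs the update. The names mytable, table2 and Ref_number follow the answer; the VARCHAR(20) column types are an assumption, since the question doesn't show the real schema.

```sql
-- Scratch tables mirroring the question's sample data (types are assumed).
CREATE TABLE mytable (
    Ref_number VARCHAR(20) NOT NULL PRIMARY KEY,
    GA_info_1  VARCHAR(20) NULL,
    GA_info_2  VARCHAR(20) NULL
);

CREATE TABLE table2 (
    Ref_number VARCHAR(20) NOT NULL,
    GA_info_1  VARCHAR(20) NULL,
    GA_info_2  VARCHAR(20) NULL
);

INSERT INTO mytable VALUES ('A1', NULL, NULL), ('A2', 'x', 'y');
INSERT INTO table2  VALUES ('A1', 'z', 'z'),   ('A2', 'x', 'y');

-- Fill only the NULL columns; non-NULL values win inside COALESCE.
UPDATE mytable
LEFT JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE mytable.GA_info_1 IS NULL
   OR mytable.GA_info_2 IS NULL;

SELECT Ref_number, GA_info_1, GA_info_2
FROM mytable
ORDER BY Ref_number;
-- A1  z  z
-- A2  x  y
```

One thing to keep in mind: if table2 holds several rows for the same Ref_number, MySQL still updates each mytable row only once, and you shouldn't rely on which of the matching rows supplies the values.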

Edit: you could also try it this way:

UPDATE mytable
INNER JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE CONCAT(mytable.GA_info_1, mytable.GA_info_2) IS NULL;
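The CONCAT test works because MySQL's CONCAT() returns NULL as soon as any one of its arguments is NULL, so it selects the same rows as the two IS NULL checks joined with OR. A quick way to convince yourself, using literal values rather than the real columns:

```sql
-- Each expression yields 1 (true) or 0 (false).
SELECT CONCAT('x', 'y')   IS NULL AS both_set,        -- 0
       CONCAT(NULL, 'y')  IS NULL AS first_missing,   -- 1
       CONCAT('x', NULL)  IS NULL AS second_missing,  -- 1
       CONCAT(NULL, NULL) IS NULL AS both_missing;    -- 1
```

Whether this form actually runs faster than the plain IS NULL checks is something to measure on your own data. Wrapping columns in a function usually stops MySQL from using an ordinary index on them.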
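On the performance problem (as mentioned in the comments):

Since the tables are not joined on a primary or foreign key, you need to put an index on the Ref_number column to speed up the join.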
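As a rough sketch of what that could look like: the index name below is made up, and it assumes mytable.Ref_number is already covered by a primary key or unique index, so that only table2 needs a new one. EXPLAIN on an UPDATE statement needs MySQL 5.6.3 or later.

```sql
-- Hypothetical index name; the column name follows the answer's example.
CREATE INDEX idx_table2_ref_number ON table2 (Ref_number);

-- Inspect the plan without running the update. The row for table2
-- should ideally list the new index in the "key" column.
EXPLAIN
UPDATE mytable
INNER JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE mytable.GA_info_1 IS NULL
   OR mytable.GA_info_2 IS NULL;
```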

UPDATE DIM_ENQUIRIES_TEST
LEFT JOIN DIM_ENQUIRIES_TEST AS STAGING_GA
    ON DIM_ENQUIRIES_TEST.website_reference_number = STAGING_GA.eventlabel
SET DIM_ENQUIRIES_TEST.eventlabel         = COALESCE(DIM_ENQUIRIES_TEST.eventlabel, STAGING_GA.eventlabel),
    DIM_ENQUIRIES_TEST.sourcemedium       = COALESCE(DIM_ENQUIRIES_TEST.sourcemedium, STAGING_GA.sourcemedium),
    DIM_ENQUIRIES_TEST.deviceCategory     = COALESCE(DIM_ENQUIRIES_TEST.deviceCategory, STAGING_GA.deviceCategory),
    DIM_ENQUIRIES_TEST.avgSessionDuration = COALESCE(DIM_ENQUIRIES_TEST.avgSessionDuration, STAGING_GA.avgSessionDuration),
    DIM_ENQUIRIES_TEST.timeonpage         = COALESCE(DIM_ENQUIRIES_TEST.timeonpage, STAGING_GA.timeonpage),
    DIM_ENQUIRIES_TEST.avgtimeonpage      = COALESCE(DIM_ENQUIRIES_TEST.avgtimeonpage, STAGING_GA.avgtimeonpage),
    DIM_ENQUIRIES_TEST.bouncerate         = COALESCE(DIM_ENQUIRIES_TEST.bouncerate, STAGING_GA.bouncerate),
    DIM_ENQUIRIES_TEST.profileid          = COALESCE(DIM_ENQUIRIES_TEST.profileid, STAGING_GA.profileid),
    DIM_ENQUIRIES_TEST.webpropertyid      = COALESCE(DIM_ENQUIRIES_TEST.webpropertyid, STAGING_GA.webpropertyid),
    DIM_ENQUIRIES_TEST.accountname        = COALESCE(DIM_ENQUIRIES_TEST.accountname, STAGING_GA.accountname),
    DIM_ENQUIRIES_TEST.tableid            = COALESCE(DIM_ENQUIRIES_TEST.tableid, STAGING_GA.tableid),
    DIM_ENQUIRIES_TEST.tablename          = COALESCE(DIM_ENQUIRIES_TEST.tablename, STAGING_GA.tablename),
    DIM_ENQUIRIES_TEST.keyword            = COALESCE(DIM_ENQUIRIES_TEST.keyword, STAGING_GA.keyword),
    DIM_ENQUIRIES_TEST.country            = COALESCE(DIM_ENQUIRIES_TEST.country, STAGING_GA.country),
    DIM_ENQUIRIES_TEST.campaign           = COALESCE(DIM_ENQUIRIES_TEST.campaign, STAGING_GA.campaign),
    DIM_ENQUIRIES_TEST.sessions           = COALESCE(DIM_ENQUIRIES_TEST.sessions, STAGING_GA.sessions),
    DIM_ENQUIRIES_TEST.sessionduration    = COALESCE(DIM_ENQUIRIES_TEST.sessionduration, STAGING_GA.sessionduration),
    DIM_ENQUIRIES_TEST.bounces            = COALESCE(DIM_ENQUIRIES_TEST.bounces, STAGING_GA.bounces)
WHERE DIM_ENQUIRIES_TEST.EventLabel IS NULL
   OR DIM_ENQUIRIES_TEST.SourceMedium IS NULL;
-- I'm only checking one, because if one of them is null, the rest of the columns that need changing are probably null too


Hey Philipp. This is slow. Is there any way to improve the runtime of this query? My code is below.

Is ID your primary key on the table?

I misread your post. You have to change the join so that it uses Ref_number; then you can put an index on that column.

The WHERE might be faster if you use WHERE CONCAT(mytable.GA_info_1, mytable.GA_info_2) IS NULL.

These are all duplicates, so the ID join doesn't work. I had to write Java to clean the data, and the link is now the website reference number in Table 1 matched against the event label from Table 2.

Can you update the table structure in the question? We need to be talking about the same columns; there is no website reference number in your sample.

The columns I'm joining on are the reference numbers. They just have different names in the different tables: GA uses eventlabel and the query uses website_reference_number.

If one of the checks is null, all of them are probably null.
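Pulling the thread together with the real column names, the index advice might look like the sketch below. Treat it as a guess at the schema: it assumes the Google Analytics rows live in a separate table named STAGING_GA (the posted query aliases DIM_ENQUIRIES_TEST itself under that name, so check which table you actually mean to join), and the index name is invented. Only two of the columns are shown; the rest follow the same pattern.

```sql
-- Assumption: STAGING_GA is a real table holding the GA rows.
-- Hypothetical index name on the join column of the lookup side.
CREATE INDEX idx_staging_ga_eventlabel ON STAGING_GA (eventlabel);

UPDATE DIM_ENQUIRIES_TEST AS d
INNER JOIN STAGING_GA AS s
    ON d.website_reference_number = s.eventlabel
SET d.sourcemedium   = COALESCE(d.sourcemedium, s.sourcemedium),
    d.deviceCategory = COALESCE(d.deviceCategory, s.deviceCategory)
WHERE d.sourcemedium IS NULL;
```

An INNER JOIN is used here because rows without a match in the lookup table have nothing to be filled from, so there is no reason to touch them.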