SQL Server: joining multiple tables in SSMS without double-counting records


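I have several new tables in my data warehouse and need to find the right way to join them. My end goal is to see each customer's full information based on their first program registration.

Apologies in advance, as this post has a lot of background.

I'm working in SSMS for this. There are 7 relevant tables and three program types (Activity, League, DayCamp). Dummy data is below.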

Individuals

personID  firstname  lastname
1         mark       smith
2         mike       boy
Activity

activityID   activityName   createdDate  activityType
100          skating        01-01-2019   january
200          hockey         01-10-2019   february
ActivityRegistration

activityID  activityName  personID  createdDate  paidAmount
100         skating       1         01-06-2019   10
200         hockey        1         01-12-2019   25
100         skating       2         01-13-2019   10
League

leagueID  leagueName    createdDate   leagueType
1         Adult Hockey  01-10-19      West
LeagueRegistration

leagueID  leagueName   personID  createdDate  paidAmount
1         Adult Hockey 1         01-16-19     100
1         Adult Hockey 2         01-12-19     100
There are also DayCamp and DayCampRegistration tables, whose data is set up the same way as the four tables above.

select I.personid, 
       I.firstname, 
       I.lastname,
       'Activity' as Source,
       (isnull(ActivityPay,0) + isnull(LeaguePay,0) + isnull(DCPay,0)) as 'TotalPaid',
       (isnull(TotalActivities,0) + isnull(TotalLeagues,0) + isnull(TotalDCs,0)) as 'TotalRegistrations'
from Individuals I

       left join (
            select PersonID, sum(paidamount) as 'ActivityPay', count(registrationid) as 'TotalActivities'
            from ActivityRegistration
            group by PersonID
                 ) A on I.PersonID = A.PersonID

       left join (
            select personid, sum(PaidAmount) as 'LeaguePay', count(registrationid) as 'TotalLeagues'
            from ro.vw_MaxGalaxy_LeaguePlayerRegistrations
            group by PersonID, ArenaName
                 ) L on I.PersonID = L.PersonID

where I.PersonID in
   (
   select PersonID
   from ActivityRegistration
   where CreatedDate in (
      select
         (
         select min(Event)
         from (values (firstleague), (firstactivity), (firstdaycamp)) as v (Event)
         ) as FirstRegistration
         from
             (
             select i.personid, i.FirstName, i.LastName, min(l.createddate) as 'firstleague', min(a.createddate) as 'firstactivity', min(d.createddate) as 'firstdaycamp'
             from Individuals I
             left join ActivityRegistration A on I.PersonID = A.PersonID
             left join LeaguePlayerRegistration L on I.PersonID = L.PersonID
             left join DayCampRegistration D on I.PersonID = D.PersonID
             group by i.PersonID, i.firstname, i.lastname 
             ) as derived
         )
    )
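This is basically what I came up with. It rests on a false assumption that createdDate can be used as a unique identifier, and it only looks at one program type at a time (note how it pulls only from ActivityRegistration; I union it with the other two program types in my SSMS environment). This lets me see a person and their total programs and total spend, but it doesn't let me see their first program.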

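I've tried pulling it other ways, but I keep getting hung up on pulling min(createdDate) together with the ActivityID. If I group by ActivityID and PersonID, I get the min(createdDate) for each ActivityID rather than one earliest date per person.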
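The grouping behaviour described here can be seen in a minimal sketch against the dummy ActivityRegistration rows shown earlier. It assumes only the column names from those tables, and the results in the comments follow from the three dummy rows.

```sql
-- One row per person: the earliest date, but no ActivityID to go with it.
SELECT PersonID, MIN(CreatedDate) AS FirstDate
FROM ActivityRegistration
GROUP BY PersonID;
-- Dummy data: person 1 -> 01-06-2019, person 2 -> 01-13-2019

-- Adding ActivityID to the GROUP BY returns one row per (person, activity) pair,
-- so person 1 now appears twice and MIN() no longer means "first ever".
SELECT PersonID, ActivityID, MIN(CreatedDate) AS FirstDate
FROM ActivityRegistration
GROUP BY PersonID, ActivityID;
-- Dummy data: (1, 100, 01-06-2019), (1, 200, 01-12-2019), (2, 100, 01-13-2019)
```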

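The ultimate goal is to create a table that ties all of this information back to the customer level (including a simple 'Activity' as Source line).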

Target table

personID firstName lastName firstProgramSource firstProgramID firstProgramName firstProgramType totalPrograms  totalSpend
1        mark      smith    Activity           100            skating          january          3              135 
2        mike      boy      League             1              Adult Hockey     West             3              110  

If I haven't lost you with all of this, is there any way to accomplish my goal?

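You're very close. It looks like you got stuck on the WHERE clause. A simpler strategy is to gather two different kinds of aggregates separately: keep the SUM/COUNT apart from the MIN/MAX. Your query would look more like this: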

select I.personid, 
        I.firstname, 
        I.lastname,
        --'Activity' as Source,
        -- A missing date defaults to a far-future sentinel, so a person with no
        -- registration of one type never "wins" the earliest-date comparison.
        -- (Assumes FirstDate is date/datetime/datetime2, which can hold year 9999.)
        CASE WHEN IsNull(A1.FirstDate,'12/31/9999') < IsNull(L1.FirstDate,'12/31/9999') THEN 'Activity' 
            WHEN IsNull(A1.FirstDate,'12/31/9999') > IsNull(L1.FirstDate,'12/31/9999') THEN 'League'
            ELSE 'Neither'
        END AS FirstProgramSource,
        CASE WHEN IsNull(A1.FirstDate,'12/31/9999') < IsNull(L1.FirstDate,'12/31/9999') THEN A1.ActivityName 
            WHEN IsNull(A1.FirstDate,'12/31/9999') > IsNull(L1.FirstDate,'12/31/9999') THEN L1.LeagueName
            ELSE 'Neither'
        END AS FirstProgramName,
        CASE WHEN IsNull(A1.FirstDate,'12/31/9999') < IsNull(L1.FirstDate,'12/31/9999') THEN A1.ActivityType 
            WHEN IsNull(A1.FirstDate,'12/31/9999') > IsNull(L1.FirstDate,'12/31/9999') THEN L1.LeagueType
            ELSE 'Neither'
        END AS FirstProgramType,       
        -- DCPay and TotalDCs come from a DayCampRegistration join built like the
        -- ActivityRegistration and league joins below (not shown here).
        (isnull(ActivityPay,0) + isnull(LeaguePay,0) + isnull(DCPay,0)) as TotalPaid,
        (isnull(TotalActivities,0) + isnull(TotalLeagues,0) + isnull(TotalDCs,0)) as TotalRegistrations
from Individuals I

        left join (
            select PersonID, sum(paidamount) as ActivityPay, count(registrationid) as TotalActivities
            from ActivityRegistration
            group by PersonID
                    ) A on I.PersonID = A.PersonID

        left join (
            select PersonID, sum(PaidAmount) as LeaguePay, count(registrationid) as TotalLeagues
            from LeagueRegistration
            group by PersonID--, ArenaName
                    ) L on I.PersonID = L.PersonID

--   Get the "First Activity" separately from your other aggregate (sum, count, etc).
        left join ( --TOP 1 will eliminate duplicates, if you have two with the same FirstDate
            select TOP 1 AFirst.PersonID, A.ActivityID, A.ActivityName, A.ActivityType, AFirst.FirstDate 
            from (   -- SELECT PersonID, ActivityID, Min(CreatedDate) FirstDate 
                    SELECT PersonID, Min(CreatedDate) FirstDate 
                FROM ActivityRegistration 
                GROUP BY PersonID --, ActivityID
                ) AFirst
                INNER JOIN ActivityRegistration AR ON AFirst.PersonID = AR.PersonID 
                    AND AFirst.FirstDate = AR.CreatedDate
                INNER JOIN Activity A ON AR.ActivityID = A.ActivityID
            ) A1 on I.PersonID = A1.PersonID

        left join (
            select PersonID, L.LeagueID, LeagueName, LeagueType, FirstDate 
            from (SELECT PersonID, LeagueID, Min(CreatedDate) FirstDate 
                    FROM LeagueRegistration 
                    GROUP BY PersonID, LeagueID
                    ) LR 
                    INNER JOIN League L ON LR.LeagueID = L.LeagueID
            ) L1 on I.PersonID = L1.PersonID

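Thanks, this makes sense conceptually. Where this solution doesn't work, and where I'm getting confused, is in the last two joins, where you need to group by PersonID and ActivityID. With that logic, a record is returned every time a PersonID appears with an ActivityID. With the original data in my post, it returns two rows for Mark Smith's two activities and then a third row for his league. I only want to see one.

OK, I see the problem. I'll amend my answer, commenting out my mistakes and replacing them with a better approach. The new strategy is to grab only the date of a person's first-ever activity and then join the other activity details to it. I haven't shown the matching logic for leagues; if the first query works, I can add it. Oh, one last thing: the reason for adding extra joins rather than subqueries or WHERE clauses is that I've found these kinds of join clauses run faster (for some reason). I also find it is sometimes easier to troubleshoot by isolating pieces (commenting out the other clauses). A divide-and-conquer strategy.

This is so close to working. The only problem I see now is that the TOP 1 command doesn't reference any PersonID grouping, so it just returns the TOP 1 of all records. As a result, FirstProgramSource/Name/Type don't come out right for every record. Is there any way to pull this with my data?

OK. If you don't have any ActivityRegistration records where one person has two identical "first dates", then you can remove the TOP 1 syntax.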
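One possible way around the TOP 1 problem, offered as a sketch rather than as part of the original answer, is to stack all registrations with UNION ALL and number each person's rows with ROW_NUMBER() partitioned by PersonID. This picks the first program per person instead of across the whole table, and the windowed COUNT/SUM give the totals without a join that multiplies rows. The Activity and League names follow the dummy tables in the question. The DayCamp columns (DayCampID, DayCampName, DayCampType) are assumptions, since the question only says those tables are set up the same way.

```sql
-- Sketch only: adjust table and column names to the real schema.
WITH AllRegistrations AS (
    SELECT AR.PersonID, 'Activity' AS ProgramSource, A.ActivityID AS ProgramID,
           A.ActivityName AS ProgramName, A.ActivityType AS ProgramType,
           AR.CreatedDate, AR.PaidAmount
    FROM ActivityRegistration AR
    INNER JOIN Activity A ON A.ActivityID = AR.ActivityID
    UNION ALL
    SELECT LR.PersonID, 'League', L.LeagueID, L.LeagueName, L.LeagueType,
           LR.CreatedDate, LR.PaidAmount
    FROM LeagueRegistration LR
    INNER JOIN League L ON L.LeagueID = LR.LeagueID
    UNION ALL
    -- Hypothetical column names for the day camp tables (not shown in the question).
    SELECT DR.PersonID, 'DayCamp', D.DayCampID, D.DayCampName, D.DayCampType,
           DR.CreatedDate, DR.PaidAmount
    FROM DayCampRegistration DR
    INNER JOIN DayCamp D ON D.DayCampID = DR.DayCampID
),
Ranked AS (
    SELECT *,
           -- rn = 1 marks each person's earliest registration; the extra ORDER BY
           -- columns only make ties on CreatedDate deterministic.
           ROW_NUMBER() OVER (PARTITION BY PersonID
                              ORDER BY CreatedDate, ProgramSource, ProgramID) AS rn,
           COUNT(*)        OVER (PARTITION BY PersonID) AS TotalPrograms,
           SUM(PaidAmount) OVER (PARTITION BY PersonID) AS TotalSpend
    FROM AllRegistrations
)
SELECT I.PersonID, I.FirstName, I.LastName,
       R.ProgramSource AS FirstProgramSource,
       R.ProgramID     AS FirstProgramID,
       R.ProgramName   AS FirstProgramName,
       R.ProgramType   AS FirstProgramType,
       ISNULL(R.TotalPrograms, 0) AS TotalPrograms,
       ISNULL(R.TotalSpend, 0)    AS TotalSpend
FROM Individuals I
LEFT JOIN Ranked R
       ON R.PersonID = I.PersonID
      AND R.rn = 1;   -- keeps exactly one row per person
```

Because the rn = 1 filter sits in the join condition, people with no registrations at all still appear, with NULL first-program columns and zero totals. With the dummy data, and provided no day camp rows predate them, person 1's first program should come out as the skating activity and person 2's as the Adult Hockey league.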