SQL: how to get a value that is not in a table using a cross join (SAS example)


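I have two tables:

The first holds client IDs and shop IDs: each client has several shop IDs that they have visited. The second holds all shop IDs. I need to get a random shop ID from table 1 that the client has visited (it could simply be the minimum shop ID in table 1),

as well as a random shop ID from table 2 that the client has not visited. It seems a cross join could help: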

proc sql;
select a.client_id, min(a.shop_id) as id_1, min(b.shop_id) as id_2
    from table_1 a, table_2 b
where a.shop_id <> b.shop_id
group by 1
;quit;
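But the tables are huge, and this approach takes forever to run. Can you help?

Here is an approach that uses left joins together with the EXCEPT operator. It assumes you also have a clients table, and it subtracts the set of visited shops from the set of all client/shop pairs. If you want to exclude clients who have not visited any shop, or who have visited every shop, just change the two left joins to regular (inner) joins.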

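Here is the performance I measured on my machine with the test script below:

Original query execution time: 32.53 seconds CPU time

Updated query execution time: 0.10 seconds CPU time

The full test script follows.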

%let shop_count = 1000;
%let client_count = 100;
%let visit_count = 50000;

data shops;
    do shop_id = 1 to &shop_count;
        output;
    end;
run;

data clients;
    do client_id = 1 to &client_count;
        output;
    end;
run;

data client_shop_visits;
    do visit_id = 1 to &visit_count;
        client_id = rand("Integer", 1, &client_count);
        shop_id = rand("Integer", 1, &shop_count);
        output;
    end;
run;

proc sql;
    create table unvisited_shops_original as
        select a.client_id, min(a.shop_id) as id_1, min(b.shop_id) as id_2
            from client_shop_visits a, shops b
        where a.shop_id <> b.shop_id
        group by 1
    ;
quit;

proc sql;
    create table unvisited_shops_updated as
        select  c.client_id,
                u1.first_unvisited_shop,
                v1.first_visited_shop
        from clients c
        left join ( /* For each client, get the first shop_id they haven't visited */
            select  u.client_id,
                    MIN(u.shop_id) as first_unvisited_shop
            from (
                select  c.client_id, /* Get list of all client/shop combinations */
                        s.shop_id
                from clients c
                cross join shops s

                except /* Remove client/shop combinations that have been visited */

                select  v.client_id,
                        v.shop_id
                from client_shop_visits v
            ) u
            group by u.client_id
        ) u1
            on u1.client_id = c.client_id
        left join ( /* For each client, get the first shop_id they have visited */
            select  v.client_id,
                    MIN(v.shop_id) as first_visited_shop
            from client_shop_visits v
            group by v.client_id
        ) v1
            on v1.client_id = c.client_id
        order by c.client_id
    ;
quit;
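The earlier remark about switching the two left joins to regular joins can be checked before making that change. The sketch below assumes the test script above has been run and that unvisited_shops_updated exists; it lists the clients who have a missing value on either side, which are the rows an inner join would drop.

```sas
proc sql;
    /* Clients with no unvisited shop (they visited every shop) or
       no visited shop (they never visited any) have a missing value here. */
    select  client_id,
            first_unvisited_shop,
            first_visited_shop
    from unvisited_shops_updated
    where first_unvisited_shop is missing
       or first_visited_shop is missing
    ;
quit;
```

If this returns no rows, the left-join and inner-join versions should produce the same result for that data.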

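Another option is to filter down to the shops the client has not visited and then run monotonic() over the result.

This numbers the shops that the client has never visited. Do the same numbering for the clients, and then simply join the two on that number.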

PROC SQL;
CREATE TABLE WORK.QUERY_FOR_FISH AS 
   SELECT DISTINCT t1.Species, 
          /* birds_monotonic */
            (monotonic()) AS birds_monotonic
      FROM SASHELP.FISH t1;


CREATE TABLE WORK.QUERY_FOR_CARS AS 
   SELECT DISTINCT t1.Make, 
          t1.Model, 
          t1.Type, 
          /* cars_monotonic */
            (monotonic()) AS cars_monotonic
      FROM SASHELP.CARS t1;


CREATE TABLE WORK.QUERY_FOR_FISH_0000 AS 
   SELECT DISTINCT t1.Species, 
          t1.birds_monotonic, 
          t2.Make, 
          t2.Model, 
          t2.Type, 
          t2.cars_monotonic
      FROM WORK.QUERY_FOR_FISH t1
           LEFT JOIN WORK.QUERY_FOR_CARS t2 ON (t1.birds_monotonic = t2.cars_monotonic);
QUIT;
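One caveat, offered tentatively: monotonic() is an undocumented function in PROC SQL, and because the generated number is itself part of the select list, DISTINCT in the queries above may not collapse duplicate Species or Make/Model/Type rows. A minimal sketch of an alternative, assuming a DATA step is acceptable, deduplicates first and then numbers the rows with the automatic variable _N_. The names work.fish_species, work.fish_numbered and row_num are hypothetical and chosen only for illustration.

```sas
/* Deduplicate first so that numbering is applied to unique values only. */
proc sort data=sashelp.fish(keep=Species) out=work.fish_species nodupkey;
    by Species;
run;

/* _N_ counts DATA step iterations, so each output row gets 1, 2, 3, ... */
data work.fish_numbered;
    set work.fish_species;
    row_num = _n_;
run;
```

The same two steps could be repeated for the second table before joining on row_num.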

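What is the approximate ratio of shops to shops visited per client? Are there 10,000 shops with each client visiting only 3 on average? Or 10 shops with each client visiting 8 on average, as opposed to 10,000 shops with each client visiting 10 on average?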
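The ratio asked about here can be measured directly. This is a minimal sketch that assumes the table names from the test script (shops and client_shop_visits); on real data, substitute the actual table names. The column aliases shop_count, n_shops and avg_shops_per_client are hypothetical.

```sas
proc sql;
    /* Total number of shops */
    select count(*) as shop_count
    from shops;

    /* Average number of distinct shops visited per client */
    select mean(n_shops) as avg_shops_per_client
    from (
        select  client_id,
                count(distinct shop_id) as n_shops
        from client_shop_visits
        group by client_id
    );
quit;
```

When the average is small relative to the shop count, almost every client/shop pair is unvisited, so the cross join inside the EXCEPT query produces most of its rows as output.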