
SQL: Filter results based on the most common value in other columns (excluding outlier values)

Tags: sql, sql-server

Consider the following table and pseudo-query:

Distinct Customers
WHERE
Most common PaymentMethod = 'CreditCard'
AND
Most common DeliveryService = '24hr'

Customer    TransID   PaymentMethod   DeliveryService
-----------------------------------------------------
Susan       1         CreditCard        24hr
Susan       2         CreditCard        24hr
Susan       3         Cash              24hr
John        4         CreditCard        48hr
John        5         CreditCard        48hr
Diane       6         CreditCard        24hr
Steve       7         Paypal            24hr
Steve       8         CreditCard        48hr
Steve       9         Paypal            24hr


Should return (2) records:

Customer
---------
Susan
Diane
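
To put it another way, I want to exclude the minority cases. For example, I don't want "Steve" returned: although he used a credit card once, that is not what he usually does, and I only care about each customer's majority behavior across multiple columns.

In practice there are more columns (10) that need the same principle applied, so I'm looking for a technique that scales to searching at least 100k records.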

Try this:

CREATE TABLE #D
(
Customer   VARCHAR(50), TransID INT,  PaymentMethod  VARCHAR(50),  DeliveryService VARCHAR(50)
)
INSERT INTO #D VALUES
('Susan',1,'CreditCard','24hr'),
('Susan',2,'CreditCard','24hr'),
('Susan',3,'Cash','24hr'),
('John ',4,'CreditCard','48hr'),
('John ',5,'CreditCard','48hr'),
('Diane',6,'CreditCard','24hr'),
('Steve',7,'Paypal','24hr'),
('Steve',8,'CreditCard','48hr'),
('Steve',9,'Paypal','24hr')


;WITH cte AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PaymentMethod, Customer ORDER BY TransID) AS RN
    FROM #D
)
SELECT DISTINCT Customer
FROM cte
WHERE PaymentMethod = 'CreditCard'
  AND DeliveryService = '24hr'
  AND RN > 1

Try this:

CREATE TABLE #TEMP (Customer varchar(20),TransID INT, PaymentMethod varchar(20),DeliveryService VARCHAR(10))
    INSERT INTO #TEMP VALUES
    ('Susan',1,'CreditCard','24hr'),
    ('Susan',2,'CreditCard','24hr'),
    ('Susan',3,'Cash','24hr'),
    ('John',4,'CreditCard','48hr'),
    ('John',5,'CreditCard','48hr'),
    ('Diane',6,'CreditCard','24hr'),
    ('Steve',7,'Paypal','24hr'),
    ('Steve',8,'CreditCard','48hr'),
    ('Steve',9,'Paypal','24hr');



SELECT DISTINCT Customer
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY PaymentMethod, Customer ORDER BY Customer) AS RNPaymentMethod,
           ROW_NUMBER() OVER (PARTITION BY DeliveryService, Customer ORDER BY Customer) AS RNDeliveryService,
           Customer, TransID, PaymentMethod, DeliveryService
    FROM #TEMP
) X
WHERE X.PaymentMethod = 'CreditCard'
  AND X.DeliveryService = '24hr'
  AND X.RNPaymentMethod = 1
  AND X.RNDeliveryService = 1

PS: I've also kept an extra row number for DeliveryService, since you mentioned that the majority behavior needs to be checked across multiple columns.

Hope this helps.

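OK, let me try.

Based on the question, you need to know the most common occurrence, so I think you have to declare a function that returns it.

For this sample I used the same values as the temp table, but I created a permanent table, because otherwise I couldn't create and test the function. I'm sure this function can be optimized, but I don't have time to do more.

Using a function lets you modify the formula and adapt it to your criteria: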

create function most_common_payment(@customer varchar(100))
returns varchar(100)
as
begin
    declare @total int, @payment varchar(100), @max_times int

    -- total number of records for this customer
    select @total = count(*) from tempD where Customer = @customer;
    if @total = 0 return '';

    -- payment method with the most occurrences
    select top 1 @payment = PaymentMethod, @max_times = count(*)
    from tempD
    where Customer = @customer
    group by Customer, PaymentMethod
    order by count(*) desc;
    -- a method used only once is not treated as "most common"
    if @max_times <= 1 return '';

    -- percentage: discard the method if it covers less than 50% of the records
    if ((@max_times * 100) / @total) < 50 set @payment = '';

    return @payment;
end
go

This is the result:

Customer   MostCommonPayment    MostCommonDelivery
--------   -----------------    ------------------
Susan      CreditCard           24hr

Without the filter:

Customer   MostCommonPayment    MostCommonDelivery
--------   -----------------    ------------------
Diane
John       CreditCard           48hr
Steve      Paypal               24hr
Susan      CreditCard           24hr

One approach uses window functions and aggregation:

with cp as (
     -- rank each customer's payment methods by how often they occur
     select customerid, paymentmethod, count(*) as cnt,
            rank() over (partition by customerid order by count(*) desc) as seqnum
     from t
     group by customerid, paymentmethod
    ),
    cd as (
     -- rank each customer's delivery services by how often they occur
     select customerid, deliveryservice, count(*) as cnt,
            rank() over (partition by customerid order by count(*) desc) as seqnum
     from t
     group by customerid, deliveryservice
    )
select cp.customerid
from cp join
     cd
     on cp.customerid = cd.customerid
where (cp.seqnum = 1 and cp.PaymentMethod = 'CreditCard') and
      (cd.seqnum = 1 and cd.DeliveryService = '24hr');

Because you need to rank along two different dimensions, I think you need two subqueries (or the equivalent).
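
The question mentions about 10 columns, and one CTE per column gets repetitive. As a rough sketch only (not part of the answer above), the columns can be unpivoted with CROSS APPLY (VALUES ...) and ranked in a single pass. The inline sample data, the CTE names `unpivoted` and `ranked`, and the labels `Attr` and `Val` are assumptions made for illustration; columns of different data types would need a CAST to a common type first.

```sql
-- Sketch under stated assumptions: sample rows are inlined so the query runs on its own.
WITH t AS (
    SELECT *
    FROM (VALUES
        ('Susan', 1, 'CreditCard', '24hr'),
        ('Susan', 2, 'CreditCard', '24hr'),
        ('Susan', 3, 'Cash',       '24hr'),
        ('John',  4, 'CreditCard', '48hr'),
        ('John',  5, 'CreditCard', '48hr'),
        ('Diane', 6, 'CreditCard', '24hr'),
        ('Steve', 7, 'Paypal',     '24hr'),
        ('Steve', 8, 'CreditCard', '48hr'),
        ('Steve', 9, 'Paypal',     '24hr')
    ) AS v (Customer, TransID, PaymentMethod, DeliveryService)
),
unpivoted AS (
    -- One row per (transaction, attribute); extend the VALUES list for more columns.
    SELECT t.Customer, a.Attr, a.Val
    FROM t
    CROSS APPLY (VALUES ('PaymentMethod',   t.PaymentMethod),
                        ('DeliveryService', t.DeliveryService)) AS a (Attr, Val)
),
ranked AS (
    -- Rank each value of each attribute by frequency, per customer.
    SELECT Customer, Attr, Val,
           RANK() OVER (PARTITION BY Customer, Attr ORDER BY COUNT(*) DESC) AS seqnum
    FROM unpivoted
    GROUP BY Customer, Attr, Val
)
SELECT Customer
FROM ranked
WHERE seqnum = 1
  AND (   (Attr = 'PaymentMethod'   AND Val = 'CreditCard')
       OR (Attr = 'DeliveryService' AND Val = '24hr'))
GROUP BY Customer
HAVING COUNT(*) = 2;   -- number of criteria that must all be satisfied
```

On the sample data this returns Susan and Diane. Because RANK() keeps ties, a customer whose top values are tied still qualifies when one of the tied values matches the criterion.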


Comments:

First you say "should return one record: Susan", then you say "I don't want to return 'Susan'". Do you want Susan or not? Please post the output.

@jarlh For the parameters in the pseudo-query above the table, Susan should be returned, i.e. paymentmethod = creditcard and deliveryservice = 24hr. If the payment method were cash, I would not want Susan returned, even though she has a record that includes cash, because that is not her most common payment method.

@chanukya OK, I've revised the example.

So find all customers, ordered by the percentage of the time they used a credit card, descending?

My error; I hadn't spotted in the example that "Diane" should also be returned. This code only returns one record, whereas I want to return all records that qualify.

Diane and Susan, these two should be returned. @gh0st

@gh0st Accept my answer. By comparison, I posted the simple query first.

Given the following data, John appears in the result set when he shouldn't: ('Susan',1,'CreditCard','24hr'),('Susan',2,'CreditCard','24hr'),('Susan',3,'Cash','24hr'),('John',4,'Paypal','48hr'),('John',5,'Paypal','48hr'),('John',6,'CreditCard','24hr'),('Diane',7,'CreditCard','24hr'),('Diane',8,'CreditCard','24hr'),('Steve',9,'CreditCard','24hr'),('Steve',10,'CreditCard','48hr')

Even his query returns only the same result, which I have checked just now. @gh0st

Very useful. For a bonus round: if we change the PaymentMethod for TransID = 7 to 'CreditCard', Steve pops up in the results (which is correct). Is there a way to exclude him if we want to consider the multiple-column criteria together, i.e. he didn't use credit card and 24hr together for the majority of his transactions?

@PK Spoke too soon. With the data insert below, John appears in the result set when he shouldn't: ('Susan',1,'CreditCard','24hr'),('Susan',2,'CreditCard','24hr'),('Susan',3,'Cash','24hr'),('John',4,'Paypal','48hr'),('John',5,'Paypal','48hr'),('John',6,'CreditCard','24hr'),('Diane',7,'CreditCard','24hr'),('Diane',8,'CreditCard','24hr'),('Steve',9,'CreditCard','24hr'),('Steve',10,'CreditCard', …
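
For the "bonus round" raised in the comments, one possible reading is that the combination of values must account for a strict majority of a customer's transactions. The following is only a sketch under that assumption, not a method taken from any of the answers; the sample rows are inlined, with TransID 7 already changed to 'CreditCard' as described in the comment.

```sql
-- Sketch: keep customers whose (CreditCard, 24hr) transactions are a strict majority.
-- Assumption: "majority" means more than half of that customer's rows.
SELECT Customer
FROM (VALUES
        ('Susan', 1, 'CreditCard', '24hr'),
        ('Susan', 2, 'CreditCard', '24hr'),
        ('Susan', 3, 'Cash',       '24hr'),
        ('John',  4, 'CreditCard', '48hr'),
        ('John',  5, 'CreditCard', '48hr'),
        ('Diane', 6, 'CreditCard', '24hr'),
        ('Steve', 7, 'CreditCard', '24hr'),   -- changed from 'Paypal' per the comment
        ('Steve', 8, 'CreditCard', '48hr'),
        ('Steve', 9, 'Paypal',     '24hr')
     ) AS t (Customer, TransID, PaymentMethod, DeliveryService)
GROUP BY Customer
-- matching rows * 2 > total rows  <=>  matching rows are more than 50%
HAVING SUM(CASE WHEN PaymentMethod = 'CreditCard' AND DeliveryService = '24hr'
                THEN 1 ELSE 0 END) * 2 > COUNT(*);
```

With this data the query returns Susan and Diane; Steve is excluded because only one of his three transactions combines CreditCard with 24hr. Note that a strict-majority test is stricter than "most common value": a combination that is the most frequent but covers half or less of the rows does not qualify.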