Python SQL equivalent of update where group by

Despite a lot of searching, I can't find the right way to get this query to work in pandas.

SQL

update product
  set maxrating = (select max(rating)
                   from rating
                   where source = 'customer'
                     and product.sku = rating.sku
                   group by sku)
  where maxrating is null;

Pandas

import pandas as pd

product = pd.DataFrame({'sku':[1,2,3],'maxrating':[0,0,1]})
rating = pd.DataFrame({'sku':[1,1,2,3,3],'rating':[2,5,3,5,4],'source':['retailer','customer','customer','retailer','customer']})
expected_result = pd.DataFrame({'sku':[1,2,3],'maxrating':[5,3,1]})

How can I do this?

You can do the following:

In [127]: df = pd.merge(rating, product, on='sku')

In [128]: df1 = df[df['maxrating'] == 0].groupby('sku').agg({'rating': np.max}).reset_index().rename(columns={'rating': 'maxrating'})

In [129]: df2 = df[df['maxrating'] != 0][['sku', 'maxrating']].drop_duplicates(keep='first')

In [131]: pd.concat([df1, df2])
Out[131]: 
   sku  maxrating
0    1          5
1    2          3
3    3          1

In [132]: expected_result
Out[132]: 
   sku  maxrating
0    1          5
1    2          3
2    3          1
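Basically, I merge the two DataFrames, then pull out the rows that need processing (those without a maxrating) and find their actual maximum rating.

Once that's done, I concatenate the result with the rows I excluded (those that already have a maxrating) and end up with the expected result.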
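
The merge, split, and concat steps described above can also be written as a standalone script. This is only a sketch: it adds a filter on source == 'customer', which is an assumption taken from the SQL in the question rather than from the session above, and the names customer, todo, and result are invented for illustration.

```python
import pandas as pd

product = pd.DataFrame({'sku': [1, 2, 3], 'maxrating': [0, 0, 1]})
rating = pd.DataFrame({
    'sku': [1, 1, 2, 3, 3],
    'rating': [2, 5, 3, 5, 4],
    'source': ['retailer', 'customer', 'customer', 'retailer', 'customer'],
})

# Assumption: mirror the SQL's "source = 'customer'" filter before merging.
customer = rating[rating['source'] == 'customer']
df = pd.merge(customer, product, on='sku')

# Rows whose maxrating is unset (0 in the sample): take the max rating per sku.
todo = df[df['maxrating'] == 0]
df1 = (todo.groupby('sku', as_index=False)['rating'].max()
           .rename(columns={'rating': 'maxrating'}))

# Rows that already have a maxrating: keep one row per sku, unchanged.
df2 = df.loc[df['maxrating'] != 0, ['sku', 'maxrating']].drop_duplicates()

# Stitch both parts back together with a clean 0..n-1 index.
result = (pd.concat([df1, df2], ignore_index=True)
            .sort_values('sku')
            .reset_index(drop=True))
print(result)
```

One caveat with this shape: the default inner merge drops any product that has no matching rating row, so passing how='right' to pd.merge may be closer to the SQL update if such products can occur.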

Putting it all together, step by step. First, let's start with nulls instead of zeros:

product.maxrating = product.maxrating.replace(0, np.nan)
product

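Then identify the missing 'sku' values and use them in a groupby to compute missingmax:

missing = product.loc[product.maxrating.isnull(), 'sku']
# keep only the ratings of the missing skus, then take the max rating per sku
missingmax = rating[rating.sku.isin(missing)].groupby('sku').rating.max().rename('maxrating')

missingmax

Use update:

# update aligns on the index, so index product by sku first
product = product.set_index('sku')
product.update(missingmax)
product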
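
The null-based steps above can be condensed into a single helper. A minimal sketch, assuming missing values are stored as NaN and that only customer ratings should count (as in the SQL from the question); fill_missing_max is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd

def fill_missing_max(product, rating, source='customer'):
    """Return a copy of product with null maxrating filled from rating."""
    # Max rating per sku, restricted to the requested source (assumption).
    best = rating.loc[rating['source'] == source].groupby('sku')['rating'].max()
    out = product.copy()
    # fillna only touches rows where maxrating is NaN; others are kept as-is.
    out['maxrating'] = out['maxrating'].fillna(out['sku'].map(best))
    return out

product = pd.DataFrame({'sku': [1, 2, 3], 'maxrating': [np.nan, np.nan, 1]})
rating = pd.DataFrame({
    'sku': [1, 1, 2, 3, 3],
    'rating': [2, 5, 3, 5, 4],
    'source': ['retailer', 'customer', 'customer', 'retailer', 'customer'],
})

result = fill_missing_max(product, rating)
print(result)
```

Because the column holds NaN at the start, maxrating stays a float column; a sku with no rating for the chosen source simply remains NaN.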
Try this:

In [220]: product.loc[product.maxrating == 0, 'maxrating'] = product.sku.map(rating.groupby('sku')['rating'].max())

In [221]: product
Out[221]:
   maxrating  sku
0          5    1
1          3    2
2          1    3
Or using a common mask:

In [222]: mask = (product.maxrating == 0)

In [223]: product.loc[mask, 'maxrating'] = product.loc[mask, 'sku'].map(rating.groupby('sku')['rating'].max())

In [224]: product
Out[224]:
   maxrating  sku
0          5    1
1          3    2
2          1    3
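
For reference, the mask-based idea can be run as a plain script on current pandas, where .ix is no longer available and .loc takes its place. This sketch also restricts the ratings to source == 'customer', which is an assumption based on the SQL in the question; best is an illustrative name.

```python
import pandas as pd

product = pd.DataFrame({'sku': [1, 2, 3], 'maxrating': [0, 0, 1]})
rating = pd.DataFrame({
    'sku': [1, 1, 2, 3, 3],
    'rating': [2, 5, 3, 5, 4],
    'source': ['retailer', 'customer', 'customer', 'retailer', 'customer'],
})

# Assumption: only customer ratings count, as in the SQL subquery.
best = rating.loc[rating['source'] == 'customer'].groupby('sku')['rating'].max()

# 0 marks "no maxrating yet" in the sample data.
mask = product['maxrating'] == 0

# Map the sku column (not maxrating) through the per-sku maxima.
product.loc[mask, 'maxrating'] = product.loc[mask, 'sku'].map(best)
print(product)
```

If a masked sku had no customer rating at all, map would produce NaN for it, so it may be worth deciding up front whether such rows should stay at 0 or become null.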

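Wow! That is exactly what I was looking for, thank you very much! Now I just need to figure out whether I can use multiple columns instead of a single Series with the map method; otherwise I'll just use a computed column.

@ArthurBurkhardt, you're welcome! I'd suggest you ask a new question with a sample and the desired data set. You did a very good job when asking this question: when we have code that generates the input and the desired data set, it is much easier for the SO community to find an answer.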
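
On the multi-column follow-up, one possible direction is a left merge on several key columns instead of map. This is a rough sketch only: the region column and all the data below are invented for illustration and do not come from the thread.

```python
import pandas as pd

# Hypothetical data: 'region' is an invented second key column.
product = pd.DataFrame({
    'sku': [1, 1, 2],
    'region': ['eu', 'us', 'eu'],
    'maxrating': [0, 4, 0],
})
rating = pd.DataFrame({
    'sku': [1, 1, 1, 2],
    'region': ['eu', 'eu', 'us', 'eu'],
    'rating': [2, 5, 3, 1],
})

keys = ['sku', 'region']

# One row per key combination, holding the max rating for that combination.
best = rating.groupby(keys, as_index=False)['rating'].max()

# Left merge keeps every product row in order; unmatched keys would give NaN.
merged = product.merge(best, on=keys, how='left')

# Only overwrite rows that are unset (0) and actually found a match.
mask = (merged['maxrating'] == 0) & merged['rating'].notna()
merged.loc[mask, 'maxrating'] = merged.loc[mask, 'rating'].astype(int)

result = merged.drop(columns='rating')
print(result)
```

The merge plays the role that map plays for a single key: best has exactly one row per key combination, so the left merge does not change the number of product rows.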