Python: Applying calculations to filtered values in a dataframe

python pandas dataframe

I'm new to pandas.

Consider this my dataframe, df:

Search              Impressions     Clicks      Transactions    ContainsBest       ContainsFree         Country
Best phone          10              5           1               True               False                UK
Best free phone     15              4           2               True               True                 UK
free phone          20              3           4               False              True                 UK
good phone          13              1           5               False              False                US
just a free phone   12              3           4               False              True                 US

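I have the columns ContainsBest and ContainsFree. I would like to sum all Impressions, Clicks and Transactions where ContainsBest is True, then sum Impressions, Clicks and Transactions where ContainsFree is True, and do the same for each unique value in the Country column. So the new dataframe would look like this: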
output_df

Country             Impressions     Clicks      Transactions
UK                  45              12          7
ContainsBest        25              9           3
ContainsFree        35              7           6

US                  25              4           9
ContainsBest        0               0           0
ContainsFree        12              3           4

To do this, I understand that I need to use something like the following:
uk_toal_impressions = df['Impressions'].sum().where(df['Country']=='UK')

uk_best_impressions = df['Impressions'].sum().where(df['Country']=='UK' & df['ContainsBest'])

uk_free_impressions = df['Impressions'].sum().where(df['Country']=='UK' & df['ContainsFree'])

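Then I would apply the same logic to Clicks and Transactions, and redo the same code for the Country US.

The second thing I'm trying to achieve is to add TopCategories columns per Country for Impressions, Clicks and Transactions, so that my final_output_df looks like this: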
final_output_df

Country             Impressions     Clicks      Transactions        TopCategoriesForImpressions     TopCategoriesForClicks          TopCategoriesForTransactions
UK                  45              12          7                   ContainsFree                    ContainsBest                    ContainsFree
ContainsBest        25              9           3                   ContainsBest                    ContainsFree                    ContainsBest
ContainsFree        35              7           6

US                  25              4           9                   ContainsFree                    ContainsFree                    ContainsFree
ContainsBest        0               0           0
ContainsFree        12              3           4

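The logic for the TopCategoriesForxx columns is a simple sort of the ContainsBest and ContainsFree rows under the Country column. So the TopCategoriesForImpressions for the UK country is:

ContainsFree
ContainsBest

while the TopCategoriesForClicks for the UK country is:

ContainsBest
ContainsFree
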
I know I need to use something like this:
TopCategoriesForImpressions = output_df['Impressions'].sort_values(by='Impressions', ascending=False).where(output_df['Country']=='UK')

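I'm just finding it hard to put everything together into my final_output_df. Also, I assume I don't need to create output_df; I only added it to make the steps toward final_output_df easier to follow.

So my questions are:

1. How do I apply calculations based on one condition and on multiple conditions? See the ContainsBest and ContainsFree rows.
2. How do I sort column values based on a condition? See the TopCategoriesForImpressions column.
3. In reality I have 70 countries and 20 Containsxxx columns. Is there a way to achieve this without writing a condition for each of the 70 countries and 20 Containsxxx columns?

Thank you very much for your suggestions.
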
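The first part of the solution should be:

# Drop the unneeded Search column and add a ContainAll column filled with True
# (axis is passed by keyword; the positional form drop('Search', 1) no longer works in pandas 2.x)
df1 = df.drop(columns='Search').assign(ContainAll=True)

# Columns to sum, and boolean flag columns to test
cols1 = ['Impressions','Clicks','Transactions']
cols2 = ['ContainsBest','ContainsFree','ContainAll']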

print (df1[cols2].dtypes)
ContainsBest    bool
ContainsFree    bool
ContainAll      bool
dtype: object

print (df1[cols1].dtypes)
Impressions     int64
Clicks          int64
Transactions    int64
dtype: object
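
The snippet above only prepares df1 and checks the dtypes; the summing step itself does not appear in this text. The following is a minimal sketch of one way to get the per-country sums, assuming the melt-based reshaping that is mentioned in the comments further down. The sample data is copied from the question; the names long_df, sums and full_index are made up for illustration, and the final reindex is an assumption added to reproduce the all-zero US / ContainsBest row from the desired output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Search": ["Best phone", "Best free phone", "free phone",
               "good phone", "just a free phone"],
    "Impressions": [10, 15, 20, 13, 12],
    "Clicks": [5, 4, 3, 1, 3],
    "Transactions": [1, 2, 4, 5, 4],
    "ContainsBest": [True, True, False, False, False],
    "ContainsFree": [False, True, True, False, True],
    "Country": ["UK", "UK", "UK", "US", "US"],
})

cols1 = ["Impressions", "Clicks", "Transactions"]
cols2 = ["ContainsBest", "ContainsFree", "ContainAll"]

# ContainAll is True on every row, so its group gives the country totals
df1 = df.drop(columns="Search").assign(ContainAll=True)

# Reshape to long form: one row per (original row, flag column)
long_df = df1.melt(["Country"] + cols1, var_name="Type", value_name="mask")

# Keep only rows whose flag is True, then sum per country and flag
sums = (
    long_df[long_df["mask"]]
    .drop(columns="mask")
    .groupby(["Country", "Type"])[cols1]
    .sum()
)

# Assumption: add combinations with no True rows back as zeros
full_index = pd.MultiIndex.from_product(
    [df1["Country"].unique(), cols2], names=["Country", "Type"]
)
sums = sums.reindex(full_index, fill_value=0)

print(sums)
```

Because the flag columns are melted into a single Type column, the same code should cover any number of countries and Containsxxx columns without a separate condition for each.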



For the second part, you can use numpy.argsort, filtering per group and checking the sorted rows:
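import pandas as pd

# Note: df1 is assumed here to be the aggregated frame from the first part,
# with the columns Country, Type and the summed cols1 columns.
def f(x):
    # x holds one country's rows, indexed by Type
    i = x.index.to_numpy()
    # for each column, order the Type labels by descending value
    a = i[(-x.to_numpy()).argsort(axis=0)]
    return pd.DataFrame(a, columns=x.columns)


# keep only the ContainsBest/ContainsFree rows that are not all zeros
df2 = (df1[df1['Type'].isin(['ContainsBest','ContainsFree']) &
           ~df1[cols1].eq(0).all(axis=1)]
           .set_index('Type')
           .groupby('Country')[cols1]
           .apply(f)
           .add_prefix('TopCategoriesFor')
           .rename_axis(['Country','Type'])
           .rename({0:'ContainsBest', 1:'ContainsFree'})
)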
print (df2)
                     TopCategoriesForImpressions TopCategoriesForClicks  \
Country Type                                                              
UK      ContainsBest                ContainsFree           ContainsBest   
        ContainsFree                ContainsBest           ContainsFree   
US      ContainsBest                ContainsFree           ContainsFree   

                     TopCategoriesForTransactions  
Country Type                                       
UK      ContainsBest                 ContainsFree  
        ContainsFree                 ContainsBest  
US      ContainsBest                 ContainsFree  
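
As a cross-check of the ranking logic, here is a small self-contained sketch that does the same thing with sort_values instead of argsort. It is an alternative illustration, not the code from the answer: the agg frame is typed in by hand from the sums in the question, and the names agg, rank_types and top are hypothetical.

```python
import pandas as pd

# Aggregated sums per country and category, typed in from the question
agg = pd.DataFrame({
    "Country": ["UK", "UK", "US", "US"],
    "Type": ["ContainsBest", "ContainsFree", "ContainsBest", "ContainsFree"],
    "Impressions": [25, 35, 0, 12],
    "Clicks": [9, 7, 0, 3],
    "Transactions": [3, 6, 0, 4],
})
cols1 = ["Impressions", "Clicks", "Transactions"]


def rank_types(group):
    # For each metric, list the Type labels from largest to smallest value
    return pd.DataFrame({
        "TopCategoriesFor" + col:
            group.sort_values(col, ascending=False)["Type"].to_numpy()
        for col in cols1
    })


# Drop categories whose sums are all zero, as the answer's filter does
nonzero = agg[~agg[cols1].eq(0).all(axis=1)]

# One ranked block per country, stacked under a (Country, Rank) index
top = pd.concat(
    {country: rank_types(group) for country, group in nonzero.groupby("Country")},
    names=["Country", "Rank"],
)
print(top)
```

Here the inner index level is a rank position (0 is the top category) rather than a category label, which may be easier to read than relabelling 0 and 1 as ContainsBest and ContainsFree.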


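For some reason I do get the same structure as you, but all my values are 0. Do you know why that might be? Before using the code you added I am writing df to csv; maybe to_csv makes df empty?
@JonasPalačionis - hmmm, is the data numeric?
Checking. I also got a warning, FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative. Not sure whether this affects the script.
@JonasPalačionis - One idea is print(df1.melt(['Country']+cols1, var_name='Type', value_name='mask').dtypes)?
@JonasPalačionis - added some prints for checking.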
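
On the FutureWarning quoted in the comments: it is raised when a list passed to .loc or [] contains labels that are not in the index. The sketch below shows the .reindex() alternative that the warning text suggests. The series and the label ContainsCheap are invented for illustration and are not taken from the asker's data.

```python
import pandas as pd

# Hypothetical sums for one country, indexed by category
s = pd.Series([25, 35], index=["ContainsBest", "ContainsFree"])

# "ContainsCheap" is a made-up label that is missing from the index
wanted = ["ContainsBest", "ContainsFree", "ContainsCheap"]

# reindex tolerates missing labels and fills them with the given value,
# whereas s.loc[wanted] raises KeyError in recent pandas versions
out = s.reindex(wanted, fill_value=0)
print(out)
```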