Using .filter inside a function (Python, pandas)


I am trying to write a function that builds a pivot table, and I need to filter one column by a string.

import pandas as pd

df = pd.DataFrame({'Breed': ['Sheltie', 'Bernard', 'Husky', 'Husky', 'pig', 'Sheltie','Bernard'], 
            'Metric': ['One month walked', 'two month walked', 'three month walked', 'four month walked', 'one month waiting', 'two month waiting', 'Three month waiting'],
            'Age': [1,2,3,4,5,6,7]})

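I would like a pivot table that sums the ages of all dogs that have a "completed" metric, regardless of which month it is.

It would look something like this: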

                             Age
Breed      Metric            sum
------------------------------------
Husky  one month walked       4
Husky  four month walked      5

The function would filter out any metric that is not "walked" while summing each "completed" metric.

This is what I have tried:

import pandas as pd
import fnmatch

def Dog_Walked_Completed(dfprime):
    return dfprime[dfprime['Breed'] == 'Husky'].groupby(['Breed','Metric']).fnmatch.filter(lambda df : (df['Metric']=='?completion')).any().agg({'Age': ['sum']})

But whenever I try it, I get a "'DataFrameGroupBy' object has no attribute 'fnmatch'" error. Is there a different way to do a wildcard search inside a function?

Assuming you want the sum of Age for each breed whose Metric contains the word "completion", you can take the following approach.

>>> import pandas as pd
>>> df = pd.DataFrame({'Breed': ['Sheltie', 'Bernard', 'Husky', 'Husky', 'pig', 'Sheltie','Bernard'],'Metric': ['One month walked', 'two month walked', 'three month walked', 'four month walked', 'one month waiting', 'two month waiting', 'Three month waiting'],'Age': [1,2,3,4,5,6,7]})
>>> df
   Age    Breed               Metric
0    1  Sheltie     One month walked
1    2  Bernard     two month walked
2    3    Husky   three month walked
3    4    Husky    four month walked
4    5      pig    one month waiting
5    6  Sheltie    two month waiting
6    7  Bernard  Three month waiting

Now, let's create a boolean mask that checks whether the Metric column of the DataFrame df contains the word "completion".

>>> bool = df['Metric'].str.contains('completion')

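Now you can run a groupby on Breed and the bool variable to find the sum of Age. (Note that the name bool shadows Python's built-in type; a name such as mask would be safer in real code.)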

>>> pvt_tbl = df.groupby(['Breed',bool])['Age'].sum()
>>> pvt_tbl
Breed    Metric
Bernard  False     9
Husky    False     7
Sheltie  False     7
pig      False     5
Name: Age, dtype: int64

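Because the sample data does not contain the word "completion", every result comes back False. We can check for the word "walked" instead, since some rows do contain it.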

>>> bool1 = df['Metric'].str.contains('walked')
>>> pvt_tbl1 = df.groupby(['Breed',bool1])['Age'].sum()
>>> pvt_tbl1
Breed    Metric
Bernard  False     7
         True      2
Husky    True      7
Sheltie  False     6
         True      1
pig      False     5
Name: Age, dtype: int64
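
On the wildcard question itself: fnmatch is a standard-library module of functions, not a method on a groupby object, which is why chaining .fnmatch fails. If a shell-style pattern is preferred over a substring test, one possible sketch is to apply fnmatch to each string in the column. The pattern '*walked' and the names walked_mask and walked_sum are assumptions made for illustration.

```python
import fnmatch

import pandas as pd

df = pd.DataFrame({
    'Breed': ['Sheltie', 'Bernard', 'Husky', 'Husky', 'pig', 'Sheltie', 'Bernard'],
    'Metric': ['One month walked', 'two month walked', 'three month walked',
               'four month walked', 'one month waiting', 'two month waiting',
               'Three month waiting'],
    'Age': [1, 2, 3, 4, 5, 6, 7],
})

# Apply the wildcard test to every string in the column.
# fnmatchcase keeps the match case-sensitive on every platform.
walked_mask = df['Metric'].map(lambda s: fnmatch.fnmatchcase(s, '*walked'))

# Keep only the matching rows, then sum Age per breed.
walked_sum = df[walked_mask].groupby('Breed')['Age'].sum()
print(walked_sum)
```

For a plain substring such as "walked", str.contains is simpler; the wildcard form mainly helps when the pattern has to anchor at the start or end of the string.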

Hopefully this is what you were trying to do.

Update based on the comments:

>>> df.groupby(['Breed','Metric'])['Age'].sum()
Breed    Metric
Bernard  Three month waiting    7
         two month walked       2
Husky    four month walked      4
         three month walked     3
Sheltie  One month walked       1
         two month waiting      6
pig      one month waiting      5
Name: Age, dtype: int64
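
Combining this grouping with a string filter gives something close to the function the question asks for. The following is only a sketch: the function name walked_summary and its parameters breed and keyword are hypothetical, and it assumes a plain substring match is enough.

```python
import pandas as pd


def walked_summary(frame, breed, keyword='walked'):
    """Sum Age per (Breed, Metric) for one breed, keeping metrics that contain keyword."""
    # regex=False treats keyword as a literal substring, not a regular expression.
    mask = (frame['Breed'] == breed) & frame['Metric'].str.contains(keyword, regex=False)
    return frame[mask].groupby(['Breed', 'Metric'])['Age'].sum()


df = pd.DataFrame({
    'Breed': ['Sheltie', 'Bernard', 'Husky', 'Husky', 'pig', 'Sheltie', 'Bernard'],
    'Metric': ['One month walked', 'two month walked', 'three month walked',
               'four month walked', 'one month waiting', 'two month waiting',
               'Three month waiting'],
    'Age': [1, 2, 3, 4, 5, 6, 7],
})

husky = walked_summary(df, 'Husky')
print(husky)
```

Because the filtering happens before the groupby, the Metric level of the result holds the original strings rather than True/False.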

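Actually, you also have other unclosed parentheses. Basically, it looks like you started writing what you wanted and then gave up in the middle of the code. It is hard to fix your code… where is X defined?

Hey, thanks for your help, and apologies for the lack of meaningful instructions. For the "Breed" column, I would like a separate DataFrame for each breed, which is why I want a function: in my real data frame I am dealing with 100+ breeds. For the Metric column, I would like the string itself returned rather than a boolean. Thanks for your patience.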
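
For the per-breed requirement described in the last comment, one possible approach is to filter once and then iterate over the groups, collecting one DataFrame per breed in a dictionary. This is a sketch under assumptions: the names walked and per_breed are hypothetical, and "walked" stands in for whatever keyword the real data uses.

```python
import pandas as pd

df = pd.DataFrame({
    'Breed': ['Sheltie', 'Bernard', 'Husky', 'Husky', 'pig', 'Sheltie', 'Bernard'],
    'Metric': ['One month walked', 'two month walked', 'three month walked',
               'four month walked', 'one month waiting', 'two month waiting',
               'Three month waiting'],
    'Age': [1, 2, 3, 4, 5, 6, 7],
})

# Keep only rows whose Metric contains the keyword; the Metric strings are preserved.
walked = df[df['Metric'].str.contains('walked', regex=False)]

# Iterating over a groupby yields (key, sub-frame) pairs, one per breed.
per_breed = {
    breed: group.reset_index(drop=True)
    for breed, group in walked.groupby('Breed')
}

print(per_breed['Husky'])
```

Breeds with no matching rows (here, pig) simply do not appear in the dictionary, so this scales to many breeds without listing them by hand.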