Python pandas conditional count


I have a dataframe of "sentences" in which I want to search for a keyword. Suppose my keyword is just the letter "A". Sample data:

year | sentence | index
-----------------------
2015 | AAX      | 0
2015 | BAX      | 1
2015 | XXY      | -1
2016 | AWY      | 0
2017 | BWY      | -1

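That is, the "index" column holds the position of the first occurrence of "A" in each sentence (-1 if not found). I want to group these rows by year, with one column showing the proportion of each year's records in which "A" occurs. That is: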

year | index
-------------
2015 | 0.667
2016 | 1.0
2017 | 0
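For anyone who wants to reproduce the sample data, here is a minimal sketch. It assumes the "index" column was produced with something like `Series.str.find`, which the question does not actually state; the variable name `df` is also an assumption.

```python
import pandas as pd

# Sample data from the question (column names as in the table above)
df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "sentence": ["AAX", "BAX", "XXY", "AWY", "BWY"],
})

# Position of the first "A" in each sentence, or -1 when it is absent
# (assumption: this is how the "index" column was derived)
df["index"] = df["sentence"].str.find("A")
print(df)
```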

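I have a feeling this somehow involves agg or groupby, but I'm not clear on how to combine them. I've gotten as far as:

df.groupby("index").count()

But the problem here is some kind of conditional count(): first count the rows in year 201X that contain "A", then divide by the total number of rows in year 201X.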
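The two-step computation described above (count the matching rows per year, then divide by all rows per year) could be written out literally as in the sketch below. The names `hits`, `totals` and `share` are illustrative only, and the dataframe is rebuilt from the sample table.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "sentence": ["AAX", "BAX", "XXY", "AWY", "BWY"],
    "index": [0, 1, -1, 0, -1],
})

# Rows per year that contain the keyword (index >= 0)
hits = df[df["index"] >= 0].groupby("year").size()
# All rows per year
totals = df.groupby("year").size()
# Years without any hit are missing from `hits`, so fill them with 0 before dividing
share = hits.reindex(totals.index, fill_value=0) / totals
print(share)
```

Without the `reindex` step, a year with no match (2017 here) would come out as NaN rather than 0.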

You can use [...] or [...] with [...]:

or:

Finally, divide by [...]:

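There are different ways to do this, but as far as I know there is no "native" way. Here is an example that uses only a single groupby:

# Named aggregation gives each result column a stable name
g = df.groupby('year')['index'].agg(found=lambda x: x[x >= 0].count(), total='count')
g['found'] / g['total']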


Using a check on sentence:

df.sentence.str.contains('A').groupby(df.year).mean()

year
2015    0.666667
2016    1.000000
2017    0.000000
Name: sentence, dtype: float64
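One thing to watch with `str.contains`: by default it interprets the pattern as a regular expression, and missing sentences produce missing values rather than False. The sketch below shows the `regex` and `na` parameters on made-up data (the `None` entry and the "A.Y" sentence are not from the question).

```python
import pandas as pd

# Hypothetical data with one missing sentence
df = pd.DataFrame({
    "year": [2015, 2015, 2016],
    "sentence": ["AAX", None, "A.Y"],
})

# regex=False treats the keyword as a literal string;
# na=False counts missing sentences as "not found"
mask = df["sentence"].str.contains("A", regex=False, na=False)
share = mask.groupby(df["year"]).mean()
print(share)
```

`regex=False` makes no difference for a plain letter like "A", but it matters once the keyword contains characters such as "." or "+".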


Using the index column that has already been computed:

df['index'].ne(-1).groupby(df.year).mean()

year
2015    0.666667
2016    1.000000
2017    0.000000
Name: index, dtype: float64
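This works because the mean of a boolean Series is the fraction of True values. To get from that Series to the two-column layout shown in the question, one possibility is to round and reset the index, as sketched here (the name `result` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "sentence": ["AAX", "BAX", "XXY", "AWY", "BWY"],
    "index": [0, 1, -1, 0, -1],
})

# Fraction of rows per year where the keyword was found
share = df["index"].ne(-1).groupby(df["year"]).mean()
# Round to three decimals and turn the result back into a year/index frame
result = share.round(3).reset_index()
print(result)
```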


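from __future__ import division  # only needed on Python 2; harmless on Python 3
import pandas as pd

# Sample data from the question; replace with your own dataframe
x_df = pd.DataFrame({'year': [2015, 2015, 2015, 2016, 2017],
                     'sentence': ['AAX', 'BAX', 'XXY', 'AWY', 'BWY'],
                     'index': [0, 1, -1, 0, -1]})

# Share of sentences per year that contain 'A'
y = x_df.groupby('year')['sentence'].apply(lambda x: sum('A' in i for i in x) / len(x))

# or, using the precomputed index column
y = x_df.groupby('year')['index'].apply(lambda x: sum(i >= 0 for i in x) / len(x))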
