Python: counting occurrences of a value in a dictionary
Given the DataFrame:
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [{"Mon": "Closed", "Tue": "Open", "Wed": "Closed"},
                         {"Mon": "Open", "Tue": "Open", "Wed": "Closed"},
                         {"Mon": "Open", "Tue": "Open", "Wed": "Open"}]
                   })
How can I count how many times "Closed" appears in each dict?
A  B     count
1  {..}  2
2  {..}  1
3  {..}  0
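I really don't know where to start.

You can convert the Series of dictionaries to a DataFrame, stack it, and then sum the values equal to "Closed" within each level-0 group to get the count for each row: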
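# Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
df['Count_closed'] = pd.DataFrame(df['B'].tolist()).stack().eq("Closed").groupby(level=0).sum()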
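To make the chained one-liner easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (`wide`, `stacked`, `closed_per_row`) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [
        {"Mon": "Closed", "Tue": "Open", "Wed": "Closed"},
        {"Mon": "Open", "Tue": "Open", "Wed": "Closed"},
        {"Mon": "Open", "Tue": "Open", "Wed": "Open"},
    ],
})

# Step 1: one column per dict key, one row per original row.
wide = pd.DataFrame(df["B"].tolist(), index=df.index)

# Step 2: stack into a long Series indexed by (row label, dict key).
stacked = wide.stack()

# Step 3: flag the "Closed" entries, then count the flags per original row.
closed_per_row = stacked.eq("Closed").groupby(level=0).sum()

# Assignment aligns on the row labels of df.
df["Count_closed"] = closed_per_row
print(df[["A", "Count_closed"]])
```

Passing `index=df.index` in step 1 is an added precaution so that the final assignment still aligns if `df` does not have a default 0..n-1 index.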
You can use apply:
df['count'] = df.B.apply(pd.Series).eq('Closed').sum(axis=1)
Output:
A B count
0 1 {'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} 2
1 2 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Closed'} 1
2 3 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Open'} 0
I would do:
df.B.astype(str).str.count('Closed')
Out[30]:
0 2
1 1
2 0
Name: B, dtype: int64
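One caveat worth checking before relying on the string approach: it counts the text "Closed" anywhere in the dict's string representation, so keys and longer values that contain the word are counted too. The sketch below uses made-up data (the `Closed_note` key and the "Temporarily Closed" value are hypothetical) to compare it with an exact match on values.

```python
import pandas as pd

# Hypothetical data chosen to expose the difference between the two counts.
s = pd.Series([
    {"Mon": "Closed", "Tue": "Open"},
    {"Mon": "Temporarily Closed", "Closed_note": "Open"},
])

# Text-based count: matches "Closed" anywhere in str(dict), keys included.
by_text = s.astype(str).str.count("Closed")

# Value-based count: only values that are exactly "Closed".
by_value = s.apply(lambda d: sum(v == "Closed" for v in d.values()))

print(by_text.tolist())   # [1, 2]
print(by_value.tolist())  # [1, 0]
```

For data like the question's, where keys are day names and values are only "Open" or "Closed", both counts agree.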
Or a simple .apply() solution:
df['Count'] = df.B.apply(lambda x: sum('Closed' in v for v in x.values()))
print(df)
Prints:
A B Count
0 1 {'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} 2
1 2 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Closed'} 1
2 3 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Open'} 0
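If column B can contain missing entries, the lambda above would fail on the first non-dict cell. A minimal sketch of a more defensive variant follows; `count_value` is a hypothetical helper name, the `None` row is invented for illustration, and it uses exact equality rather than the `in` test above.

```python
import pandas as pd

def count_value(cell, target="Closed"):
    """Count dict values equal to `target`; non-dict cells count as 0."""
    if not isinstance(cell, dict):
        return 0
    return sum(v == target for v in cell.values())

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [
        {"Mon": "Closed", "Tue": "Open", "Wed": "Closed"},
        None,  # hypothetical missing entry, not in the original data
        {"Mon": "Open", "Tue": "Open", "Wed": "Open"},
    ],
})

df["Count"] = df["B"].apply(count_value)
# Extra keyword arguments to Series.apply are forwarded to the function.
df["Count_open"] = df["B"].apply(count_value, target="Open")
print(df[["A", "Count", "Count_open"]])
```

Whether a missing entry should count as 0 or stay missing depends on the data; returning 0 is only one possible choice.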
Benchmark:
import perfplot
import pandas as pd


def f1(df):
    df['Count'] = df.B.apply(lambda x: sum('Closed' in v for v in x.values()))
    return df


def f2(df):
    df['count'] = df.B.astype(str).str.count('Closed')
    return df


# Commented out because it timed out:
# def f3(df):
#     df['count'] = df.B.apply(pd.Series).eq('Closed').sum(1)
#     return df


def f4(df):
    # groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0
    df['count'] = pd.DataFrame(df['B'].tolist()).stack().eq("Closed").groupby(level=0).sum()
    return df


def setup(n):
    A = [*range(n)]
    B = [{'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} for _ in range(n)]
    df = pd.DataFrame({'A': A,
                       'B': B})
    return df


perfplot.show(
    setup=setup,
    kernels=[f1, f2, f4],
    labels=['apply(sum)', 'str.count()', 'stack.eq()'],
    n_range=[10**i for i in range(1, 7)],
    xlabel='N (* len(df))',
    equality_check=None,
    logx=True,
    logy=True)
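Result:
So the plain apply() with sum() appears to be the fastest.

Please don't put dictionaries in a DataFrame column. You lose all the speed of vectorized operations and make the values hard to access.
Clean up your df: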
>>> df = pd.concat([df['A'], df['B'].apply(pd.Series)], axis=1)
>>> df
A Mon Tue Wed
0 1 Closed Open Closed
1 2 Open Open Closed
2 3 Open Open Open
Now counting "Closed" is easy:
>>> df['count'] = df.eq('Closed').sum(axis=1)
>>> df
A Mon Tue Wed count
0 1 Closed Open Closed 2
1 2 Open Open Closed 1
2 3 Open Open Open 0
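The cleanup above relies on `df['B'].apply(pd.Series)`, which the benchmark in this thread had to drop because it timed out. A possible alternative, sketched here under the assumption that every cell in B is a dict, builds the day columns from `df['B'].tolist()`, the same construction another answer uses. The names `raw`, `days`, and `flat` are illustrative.

```python
import pandas as pd

raw = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [
        {"Mon": "Closed", "Tue": "Open", "Wed": "Closed"},
        {"Mon": "Open", "Tue": "Open", "Wed": "Closed"},
        {"Mon": "Open", "Tue": "Open", "Wed": "Open"},
    ],
})

# Build one column per day directly from the list of dicts.
days = pd.DataFrame(raw["B"].tolist(), index=raw.index)

# Replace the dict column with the flat day columns.
flat = raw.drop(columns="B").join(days)

# Count only across the day columns so column A cannot contribute.
flat["count"] = days.eq("Closed").sum(axis=1)
print(flat)
```

Restricting the comparison to `days` is a small design choice: it keeps the count correct even if some other column happened to hold the string "Closed".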
Using a helper function:
def aux_func(x):
    week_days = x.keys()
    count = 0
    for day in week_days:
        if x[day] == 'Closed':
            count += 1
    return count

counts = [aux_func(c) for c in df.loc[:, 'B']]
df['counts'] = counts
You can use a Counter in a simple list comprehension:
from collections import Counter
df['count'] = [Counter(x.values())['Closed'] for x in df.B]
#   A                                                  B  count
#0 1 {'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} 2
#1 2 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Closed'} 1
#2 3 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Open'} 0
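Since a Counter tallies every distinct value, the same idea can produce one count column per status in a single pass. This is only a sketch of that extension; the name `tallies` is illustrative and not from the original answer.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [
        {"Mon": "Closed", "Tue": "Open", "Wed": "Closed"},
        {"Mon": "Open", "Tue": "Open", "Wed": "Closed"},
        {"Mon": "Open", "Tue": "Open", "Wed": "Open"},
    ],
})

# One Counter per row; a status missing from a row becomes NaN, then 0.
tallies = pd.DataFrame(
    [Counter(d.values()) for d in df["B"]],
    index=df.index,
).fillna(0).astype(int)

print(tallies)
```

The resulting frame can be attached to the original with `df.join(tallies)` if the per-status columns are wanted alongside A and B.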
@安基 Aha, I sometimes forget about the axis argument of sum too~