Python: conditionally iterate and output dataframes and charts
I'm trying to output a dataframe and a chart for each value ("VAL"). I'm struggling to piece together some Python basics.
Flow: I take the dataframe, do a groupby, get the percentage of the total... and output a table and a chart. However, I want to do this in a loop: first with the dataframe filtered on Reviewed? == "Yes", then on "No".
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    vals = ['Yes','No']
    for i in range(len(vals)):
        for x in vals:
            gb[i] = df[df['Reviewed?']==x].groupby(['Gender'])['Region'].count().reset_index()
            total[i] = gb[i]['Region'].sum()
            gb[i]['Percentage'] = (gb[i]['Region'] / total[i])
            gb[i] = gb[i].sort_values(by='Percentage', ascending=False)
            sns.barplot(data=gb[i], x='Region', y='Percentage')
            plt.show()
    return gb[i]
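Some of the error messages:
ValueError: could not broadcast input array from shape (0,2) into shape (0)
ValueError: cannot copy sequence with size 2 to array axis with dimension 0
ValueError: cannot set a frame with no defined index and a value that cannot be converted to a Series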
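As posted, the nested `for i ... for x ...` loop runs its body once for every (i, x) pair (four times for two values), and `gb` and `total` are never created as containers inside `func`, so `gb[i] = ...` has nothing sensible to write into. A minimal sketch of the usual pattern, assuming you want one summary table per `Reviewed?` value: loop once over the values and store each result in a dict keyed by that value. The name `summaries` is illustrative, and plotting is left out to keep the focus on how the per-iteration results are collected.

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

summaries = {}  # hypothetical name: one summary DataFrame per 'Reviewed?' value
for val in ['Yes', 'No']:  # a single loop; no index variable needed
    gb = (df[df['Reviewed?'] == val]
          .groupby('Gender')['Region'].count()
          .reset_index())
    # 'Region' now holds the row count per gender; keep it and add the share
    gb['Percentage'] = gb['Region'] / gb['Region'].sum()
    summaries[val] = gb.sort_values(by='Percentage', ascending=False)

for val, table in summaries.items():
    print(f"Reviewed? = {val}")
    print(table)
```

Each table keeps the count column (`Region`) next to `Percentage`, so the counts survive into the final dataframes.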
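Update
This is a brute-force version of what I want. I'm just looking for a more efficient, more dynamic way to do it.
Note: I didn't make it clear at first that I want to keep the counts in the final dataframe.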
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    # "No" rows: count per gender, then share of the total
    gb = df[df['Reviewed?']=='No'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    notyetreviewed = gb.sort_values(by='Percentage', ascending=False)
    sns.barplot(data=notyetreviewed, x='Gender', y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()
    # "Yes" rows: same steps
    gb = df[df['Reviewed?']=='Yes'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    reviewed = gb.sort_values(by='Percentage', ascending=False)
    sns.barplot(data=reviewed, x='Gender', y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()
    return notyetreviewed, reviewed

func(df)
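The brute-force version repeats the same groupby for each value. One alternative (a swapped-in technique, not what either snippet in this thread uses) is `pd.crosstab`, which counts every `Gender`/`Reviewed?` pair in one call; `normalize='columns'` divides each column by its own total, which matches the `Percentage` column computed above. A sketch, assuming the same dummy `df`:

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

# Counts: rows are genders, columns are the 'Reviewed?' values
counts = pd.crosstab(df['Gender'], df['Reviewed?'])
# Shares: each column divided by its own total (same as count / total above)
shares = pd.crosstab(df['Gender'], df['Reviewed?'], normalize='columns')

print(counts)
print(shares)
```

Each column of `shares` can then be charted on its own, for example `shares['No'].plot.bar(ylim=(0, 1))`, while `counts` keeps the raw numbers.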
You can try the following approach:
import pandas as pd
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

for outcome in ['Yes', 'No']:
    # share of each gender among the rows with this outcome
    filtered = df[df['Reviewed?'].eq(outcome)]['Gender'].value_counts(normalize=True)
    filtered.plot.bar()
    plt.show()  # one chart per outcome instead of overlapping bars
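In this example, on each pass through the loop I filter the DF by the Reviewed? outcome, then get the proportions of male and female. Your question poses a binary choice, but I think it can be extended to every outcome in df['Reviewed?'].

It would be great to see a more Pythonic solution that doesn't require me to hard-code "Reviewed?" in the function call: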
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df, group, reviewed):
    # keep only the requested Reviewed? values, then count rows per group
    df = df[df['Reviewed?'].isin(reviewed)].groupby([group])['Region'].count().reset_index()
    df['Percentage'] = df['Region'] / df['Region'].sum()
    sns.barplot(data=df, x=group, y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()
    return df

df1 = func(df,'Gender',['Yes'])
df2 = func(df,'Gender',['No'])
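To avoid hard-coding `'Reviewed?'` (and `'Region'`) inside the function, the filter column, the values to keep, the grouping column and the column to count can all become parameters. A hedged sketch: the function name `summarize` and its parameter names are hypothetical, and it plots with pandas' own `DataFrame.plot.bar` rather than seaborn. It returns the table, counts included, and draws one chart per call.

```python
import matplotlib.pyplot as plt
import pandas as pd


def summarize(df, filter_col, keep, group_col, count_col, show_plot=True):
    """Count `count_col` per `group_col` among rows whose `filter_col` is in `keep`,
    add each group's share of the total, and optionally plot the shares."""
    out = (df[df[filter_col].isin(keep)]
           .groupby(group_col)[count_col].count()
           .reset_index())
    out['Percentage'] = out[count_col] / out[count_col].sum()
    out = out.sort_values(by='Percentage', ascending=False)
    if show_plot:
        fig, ax = plt.subplots()  # a fresh figure per call, so charts don't overlap
        out.plot.bar(x=group_col, y='Percentage', ylim=(0, 1), legend=False, ax=ax)
        plt.show()
    return out


data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

# One table (and one chart) per outcome actually present in the data
results = {val: summarize(df, 'Reviewed?', [val], 'Gender', 'Region')
           for val in df['Reviewed?'].unique()}
for val, table in results.items():
    print(f"Reviewed? = {val}")
    print(table)
```

Passing several values in `keep` (e.g. `['Yes', 'No']`) summarizes them together, mirroring the `isin` behaviour of the original `func`.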
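You can't get 3 ValueErrors in a single run of the code. Which one is it? I appreciate that you're trying to illustrate the various attempts. We should end up with two charts here, each showing a 50/50 M/F split?
Ha, clearly you're not as bad at this as I am :) After the first ValueError I get "During handling of the above exception, another exception occurred:", and the same thing happens after the second ValueError. And yes, I just entered some dummy data so the code is reproducible.
I tried to fix your indentation, because it was indented too far inside func. Please check that it's still correct.
Almost. As mentioned, I'm trying to output both the charts and the dataframes. Also, I like to keep the counts, which is why I calculated the total manually.
@Christopher I don't see anything in the question about keeping the counts.
I didn't specifically ask about the counts, but I built them into my code. At this point it doesn't really matter. What I specifically asked for was outputting the charts and the dataframes. Thanks for your help with this code.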