Python: conditionally iterate and output dataframes and charts

I'm trying to output a dataframe and a chart for each "VAL". I'm struggling to piece together some Python basics.

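Flow: I take the dataframe, do a groupby, and get the percentage of the total, then output a table and a chart. However, I want to do this in a loop: the first pass with the dataframe filtered on Reviewed? = "Yes", then on "No".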

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    vals = ['Yes','No']
    for i in range(len(vals)):
        for x in vals:
            gb[i] = df[df['Reviewed?']==x].groupby(['Gender'])['Region'].count().reset_index()
            total[i] = gb[i]['Region'].sum()
            gb[i]['Percentage'] = (gb[i]['Region'] / total[i])
            gb[i] = gb[i].sort_values(by='Percentage', ascending=False)
            sns.barplot(data=gb[i], x='Region', y='Percentage')
    plt.show()
    return gb[i]
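Some of the error messages:

ValueError: could not broadcast input array from shape (0,2) into shape (0)

ValueError: cannot copy sequence with size 2 to array axis with dimension 0

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series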
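In the loop above, assuming gb and total were created as empty DataFrames somewhere not shown, gb[i] = ... asks pandas to store an entire grouped table in a single column named i, which is a plausible source of the shape and index errors. The nested for i / for x loops also run every filter value once per index, so each pass overwrites the previous result. A minimal sketch that keeps one table per filter value in a plain dict (the name results is only illustrative):

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

# One result table per filter value, keyed by that value,
# instead of indexing an undefined or wrongly typed gb[i].
results = {}
for val in ['Yes', 'No']:  # a single loop: each value is processed once
    gb = (df[df['Reviewed?'] == val]
          .groupby('Gender')['Region'].count()
          .reset_index())
    # 'Region' now holds the counts; add each gender's share of the subset total
    gb['Percentage'] = gb['Region'] / gb['Region'].sum()
    results[val] = gb.sort_values(by='Percentage', ascending=False)

print(results['Yes'])
print(results['No'])
```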

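Update: Here is a brute-force version of what I want. I'm just looking for a more efficient, more dynamic way to do it.

Note: I didn't originally make it clear that I want to keep the counts in the final dataframe.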

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df):
    gb = df[df['Reviewed?']=='No'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    notyetreviewed = gb.sort_values(by='Percentage', ascending=False)
    sns.barplot(data=notyetreviewed, x='Gender', y='Percentage')
    bottom, top = plt.ylim(0,1) 
    plt.show()

    gb = df[df['Reviewed?']=='Yes'].groupby(['Gender'])['Region'].count().reset_index()
    total = gb['Region'].sum()
    gb['Percentage'] = (gb['Region'] / total)
    reviewed = gb.sort_values(by='Percentage', ascending=False)
    bottom, top = plt.ylim(0,1)  
    sns.barplot(data=reviewed, x='Gender', y='Percentage')
    plt.show()

    return notyetreviewed, reviewed
func(df)
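A generalized sketch of the brute-force version: loop over whatever values actually appear in the filter column instead of hard-coding "Yes" and "No", keep the count column, and return every table in a dict. The function summarize and its parameters are hypothetical names, and pandas' built-in .plot.bar is swapped in for sns.barplot to keep dependencies down; sns.barplot(data=table, x=group_col, y='Percentage') could be used in its place.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])


def summarize(df, filter_col, group_col, count_col, plot=True):
    """Hypothetical helper: return {filter value: table}, optionally charting each table."""
    tables = {}
    for val in df[filter_col].unique():  # every value present, e.g. 'Yes' and 'No'
        table = (df[df[filter_col] == val]
                 .groupby(group_col)[count_col].count()
                 .reset_index())
        # Keep the raw counts and add each group's share of the subset total
        table['Percentage'] = table[count_col] / table[count_col].sum()
        table = table.sort_values(by='Percentage', ascending=False)
        tables[val] = table
        if plot:
            table.plot.bar(x=group_col, y='Percentage', ylim=(0, 1),
                           legend=False, title=f'{filter_col} = {val}')
            plt.show()
    return tables


tables = summarize(df, 'Reviewed?', 'Gender', 'Region')
for val, table in tables.items():
    print(val)
    print(table)
```

Keeping the computation separate from the plotting (plot=False) makes it easy to get only the dataframes, for example to concatenate them later.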

You could try something like this:

import pandas as pd
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

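for outcome in ['Yes', 'No']:
    # Share of each Gender among the rows with this Reviewed? outcome
    filtered = df[df['Reviewed?'].eq(outcome)]['Gender'].value_counts(normalize=True)
    filtered.plot.bar()
    plt.show()  # one chart per outcome; otherwise both bar sets land on the same axes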

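In this example, I filter the DF on each pass of the loop by its Reviewed? outcome, then get the proportions of males and females. Your question presents a binary choice, but I think it could be extended to every outcome in df['Reviewed?'] (for example, for outcome in df['Reviewed?'].unique()).

It would be great to see a more Pythonic solution that doesn't require me to hard-code the "Reviewed?" filter in the function call.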

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Region': ["US", "US", "US","US"],
        'Gender': ["M","F","F","M"],
        'Reviewed?': ["Yes","Yes","No","No"]}
df = pd.DataFrame(data, columns=['Region','Gender','Reviewed?'])

def func(df,group,reviewed):
    # Keep rows whose 'Reviewed?' value is in the given list, then count per group
    df = df[df['Reviewed?'].isin(reviewed)].groupby([group])['Region'].count().reset_index()
    # 'Region' now holds the counts; add each group's share of the total
    df['Percentage'] = df['Region'] / df['Region'].sum()
    sns.barplot(data=df, x=group, y='Percentage')
    bottom, top = plt.ylim(0,1)
    plt.show()
    return df

df1 = func(df,'Gender',['Yes'])
df2 = func(df,'Gender',['No'])
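On the wish to avoid hard-coding the filter values in the call: one possible alternative, a different technique from the groupby function above, is pd.crosstab. It builds the per-gender counts for every Reviewed? outcome in a single call, and normalize='columns' turns each column into shares of that column's total. A minimal sketch on the same dummy data:

```python
import pandas as pd

data = {'Region': ["US", "US", "US", "US"],
        'Gender': ["M", "F", "F", "M"],
        'Reviewed?': ["Yes", "Yes", "No", "No"]}
df = pd.DataFrame(data, columns=['Region', 'Gender', 'Reviewed?'])

# Rows: Gender; columns: every distinct 'Reviewed?' value found in the data,
# so no ['Yes'] / ['No'] list has to be passed in by hand.
counts = pd.crosstab(df['Gender'], df['Reviewed?'])
# normalize='columns' divides each column by its own total
shares = pd.crosstab(df['Gender'], df['Reviewed?'], normalize='columns')

print(counts)
print(shares)
```

For charts, shares.plot.bar(subplots=True, ylim=(0, 1)) should draw one panel per outcome (this needs matplotlib).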

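You can't get 3 ValueErrors in a single run of the code. Which one is it? I appreciate you trying to show your various attempts. So we should have two charts here, each showing a 50/50 M/F split?

Ha, clearly you're not as bad at this as I am :) After the first ValueError I get "During handling of the above exception, another exception occurred:", and the same thing happens after the second ValueError. And yes, I just entered some dummy data so the code is reproducible.

I tried to fix your indentation, since it was indented too far inside func. Please check and make sure it's still correct.

Almost. As mentioned, I'm trying to output both the charts and the dataframes. Also, I like to keep the counts, which is why I calculated the total manually.

@Christopher I don't see anything in the question about keeping the counts.

I didn't ask about the counts specifically, but I built them into my code. At this point it doesn't really matter. What I specifically asked for was outputting the charts and the dataframes. Thanks for your help with this code.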
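The "During handling of the above exception, another exception occurred:" line is how one run can print several ValueErrors: a second exception raised while the first is being handled is chained to it, and an uncaught chain prints every traceback. That is presumably what happened inside pandas/NumPy here. A minimal illustration with a hypothetical to_float helper:

```python
def to_float(value):
    """Hypothetical helper used only to demonstrate exception chaining."""
    try:
        return float(value)
    except TypeError:
        # Raising inside the except block chains the new error to the old one;
        # if uncaught, Python prints both tracebacks, separated by
        # "During handling of the above exception, another exception occurred:"
        raise ValueError(f"could not convert {value!r}")


try:
    to_float([1, 2])  # float() of a list raises TypeError first
except ValueError as err:
    caught = err

# The original TypeError is kept as the implicit context of the ValueError
print(type(caught.__context__).__name__)  # TypeError
```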