
Python: summing up observations from columns in pandas


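Suppose I have a large DataFrame `DS_df` that includes, among others, the columns `year`, `dealamount` and `CCS`. For every year from 1985 to 2020 I need a separate pandas Series, e.g. `sum_2019`. Whenever a CCS occurs more than once in the same year, I need to sum its `dealamount` (if it occurs only once, it should simply be added to the Series as-is):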

    year    dealamount  CCS
0   2013    37,522,700  Albania_Azerbaijan
1   2013    37,522,700  Albania_Azerbaijan
2   2016    436,341,300 Albania_Greece
3   2019    763,189,200 Albania_Russia
4   2019    763,189,200 Albania_Russia
5   2019    763,189,200 Albania_Russia
6   2019    763,189,200 Albania_Russia
7   2017    150,931,000 Albania_Turkey
8   2016    275,293,750 Albania_Turkey
9   2009    258,328,000 Albania_Turkey
10  2019    153,452,000 Albania_Venezuela
11  2019    153,452,000 Albania_Venezuela
11  2017    153,452,000 Albania_Venezuela
So in this case, `sum_2019` should be a pandas Series with `CCS` as the index and the summed deal amounts as the observations.

Similarly, for `sum_2013`:

Albania_Azerbaijan 75,045,400
Any help is greatly appreciated: I need this for a lot of data points and feel quite lost (I'm really new to Python). How can I automate this properly?

Thanks
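To experiment with the sample rows locally, one option is to load them with `pd.read_csv`. This is a sketch, not part of the original question: it assumes the rows are whitespace-separated text (the leading row-number column is dropped) and names the frame `df` instead of `DS_df`. Passing `thousands=','` makes pandas parse values such as `37,522,700` as integers, so no separate comma-stripping step is needed.

```python
import io

import pandas as pd

# The sample rows from the question, without the leading row-number column.
# Whitespace-separated, with commas as thousands separators in dealamount.
SAMPLE = """\
year dealamount CCS
2013 37,522,700 Albania_Azerbaijan
2013 37,522,700 Albania_Azerbaijan
2016 436,341,300 Albania_Greece
2019 763,189,200 Albania_Russia
2019 763,189,200 Albania_Russia
2019 763,189,200 Albania_Russia
2019 763,189,200 Albania_Russia
2017 150,931,000 Albania_Turkey
2016 275,293,750 Albania_Turkey
2009 258,328,000 Albania_Turkey
2019 153,452,000 Albania_Venezuela
2019 153,452,000 Albania_Venezuela
2017 153,452,000 Albania_Venezuela
"""

# thousands=',' turns "37,522,700" into the integer 37522700 while parsing.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", thousands=",")

print(df.dtypes)
```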

Is this what you want?

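import pandas as pd
# strip thousands separators so the amounts can be summed as 64-bit integers
df['dealamount'] = df['dealamount'].str.replace(',', '').astype('int64')
new_df = df.groupby(['year', 'CCS']).agg({'dealamount': 'sum'})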
Output:

                         dealamount
year CCS                           
2009 Albania_Turkey       258328000
2013 Albania_Azerbaijan    75045400
2016 Albania_Greece       436341300
     Albania_Turkey       275293750
2017 Albania_Turkey       150931000
     Albania_Venezuela    153452000
2019 Albania_Russia      3052756800
     Albania_Venezuela    306904000
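The grouped result above is indexed by the pair (`year`, `CCS`). One way to get a per-year Series such as `sum_2019` out of it is `xs`, which selects one value of the `year` level and drops that level. This sketch is not from the original answer; it uses a small hand-built subset of the question's rows (with `dealamount` already numeric) in place of the real `DS_df`.

```python
import pandas as pd

# Hand-built subset of the question's rows, dealamount already numeric.
df = pd.DataFrame({
    "year": [2013, 2013, 2019, 2019, 2019, 2019, 2019, 2019],
    "dealamount": [37522700, 37522700, 763189200, 763189200,
                   763189200, 763189200, 153452000, 153452000],
    "CCS": ["Albania_Azerbaijan", "Albania_Azerbaijan",
            "Albania_Russia", "Albania_Russia", "Albania_Russia", "Albania_Russia",
            "Albania_Venezuela", "Albania_Venezuela"],
})

# Same aggregation as the answer: total dealamount per (year, CCS) pair.
new_df = df.groupby(["year", "CCS"]).agg({"dealamount": "sum"})

# xs() picks one year and drops the "year" level,
# leaving a Series indexed by CCS.
sum_2019 = new_df["dealamount"].xs(2019, level="year")
sum_2013 = new_df["dealamount"].xs(2013, level="year")

print(sum_2019)
```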
import pandas as pd

# avoid scientific notation (e-notation) when printing floats
pd.set_option('display.float_format', lambda x: '%.0f' % x)

# first filter by 'year', then group by 'CCS' and finally sum 'dealamount'
sum_2019 = df[df['year'] == 2019].groupby('CCS')['dealamount'].sum()

print(sum_2019)
CCS
Albania_Russia      3052756800
Albania_Venezuela    306904000
Name: dealamount, dtype: float64
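To automate this for every year from 1985 to 2020, the filter-then-group step above can run in a loop. This is a sketch under assumptions: the frame is called `df` (the question's `DS_df`), `dealamount` is already numeric, and `sums_by_year` is a hypothetical name. Storing the results in a dict keyed by year avoids creating 36 separately named variables (`sum_1985` through `sum_2020`).

```python
import pandas as pd

# Hand-built subset of the question's rows, dealamount already numeric.
df = pd.DataFrame({
    "year": [2013, 2013, 2016, 2019, 2019, 2017],
    "dealamount": [37522700, 37522700, 436341300, 763189200, 763189200, 150931000],
    "CCS": ["Albania_Azerbaijan", "Albania_Azerbaijan", "Albania_Greece",
            "Albania_Russia", "Albania_Russia", "Albania_Turkey"],
})

# One Series per year, keyed by year (hypothetical name: sums_by_year).
sums_by_year = {}
for year in range(1985, 2021):  # 1985..2020 inclusive
    year_rows = df[df["year"] == year]
    # Sum dealamount per CCS; years with no deals give an empty Series.
    sums_by_year[year] = year_rows.groupby("CCS")["dealamount"].sum()

print(sums_by_year[2019])
```

Each value then plays the role of `sum_<year>`, e.g. `sums_by_year[2019]`. On a very large frame the loop scans the data once per year; the single `groupby(['year', 'CCS'])` from the first answer computes all totals in one pass and may be preferable there.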