Python loop only takes the last value


python pandas loops


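I have a DataFrame with the population of each country for each year, and a pandas Series with the world population for each year. This is the Series I'm using: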

pop_tot = df3.groupby('Year')['population'].sum()
Year     
1990    4.575442e+09
1991    4.659075e+09
1992    4.699921e+09
1993    4.795129e+09
1994    4.862547e+09
1995    4.949902e+09
...     ...
2017    6.837429e+09

This is the DataFrame I'm using:

        Country      Year   HDI     population
0       Afghanistan 1990    NaN     1.22491e+07
1       Albania     1990    0.645   3.28654e+06
2       Algeria     1990    0.577   2.59124e+07
3       Andorra     1990    NaN     54509
4       Angola      1990    NaN     1.21714e+07
...     ...         ...     ...     ...
4096    Uzbekistan  2017    0.71    3.23872e+07 
4097    Vanuatu     2017    0.603   276244  
4098    Zambia      2017    0.588   1.70941e+07 
4099    Zimbabwe    2017    0.535   1.65299e+07

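For each year, I want to calculate the share of the world population that each country's population represents, so I loop over the Series and the DataFrame like this: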

j = 0
for i in range(len(df3)):
    if df3.iloc[i,1]==pop_tot.index[j]:
        df3['pop_tot']=pop_tot[j] #Sanity check
        df3['weighted']=df3['population']/pop_tot[j]*df3.iloc[i,2]
    else:
        j=j+1

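However, the resulting DataFrame is not what I expected. In the end, all values are divided by the 2017 total population, which gives wrong proportions for every other year. For example, for the first row pop_tot should be 4.575442e+09, because according to the Series above that is the 1990 total, not 6.837429e+09, which is the 2017 total.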

However, I can't see what's wrong with the loop.

Thanks in advance.

You don't need a loop: you can create the column pop_tot directly in df3. Then, for the column weighted, just do a column operation, for example:

df3['pop_tot'] = df3.groupby('Year')['population'].transform('sum')
df3['weighted'] = df3['population']/df3['pop_tot']
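As a quick check of the transform approach, here is a minimal sketch on a hypothetical two-year toy frame in the question's layout (countries and numbers are made up). The last lines go one step further, assuming that the *df3.iloc[i,2] factor in the question's loop means HDI should be weighted by population share; the column name weighted_hdi is hypothetical. Note that sum() skips NaN, so a country with missing HDI simply contributes nothing to its year's total.

```python
import pandas as pd

# Hypothetical toy data in the question's layout; two years so the grouping matters
df3 = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "Year": [1990, 1990, 2017, 2017],
    "HDI": [0.5, 0.7, 0.6, None],  # None becomes NaN, like the missing HDI values in the question
    "population": [100.0, 300.0, 300.0, 200.0],
})

# transform('sum') returns a Series aligned to df3's index,
# so every row receives the total of its own year
df3["pop_tot"] = df3.groupby("Year")["population"].transform("sum")
df3["weighted"] = df3["population"] / df3["pop_tot"]

# Assumption: the goal is a population-weighted HDI per year (hypothetical column name)
df3["weighted_hdi"] = df3["weighted"] * df3["HDI"]
world_hdi = df3.groupby("Year")["weighted_hdi"].sum()  # NaN rows are skipped

print(df3)
print(world_hdi)
```

If missing HDI values are common, you may prefer to renormalize the weights over only the countries that have an HDI value in that year.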

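As @roganjosh pointed out, the problem with your approach is that every time your condition is met, you replace the entire pop_tot and weighted columns. So on the last iteration in which the condition is met, probably for the 2017 rows, you set every value of pop_tot to the 2017 total and compute weighted from it.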
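To see the overwrite concretely, here is a small sketch on hypothetical data (totals is a made-up stand-in for the question's pop_tot). Assigning a scalar to df['pop_tot'] fills every row, so only the last assignment survives. Writing a single cell with .loc[row, column] is the row-wise fix, shown here for illustration only, since the vectorized answer is simpler and faster.

```python
import pandas as pd

# Hypothetical toy data: one row per year
df = pd.DataFrame({"Year": [1990, 2017], "population": [10.0, 20.0]})
totals = {1990: 100.0, 2017: 200.0}  # hypothetical per-year world totals

# What the question's loop effectively does: df['col'] = scalar broadcasts
# the scalar to EVERY row, so each iteration overwrites the whole column
for year, total in totals.items():
    df["pop_tot"] = total
print(df["pop_tot"].tolist())  # [200.0, 200.0]: only the last assignment survives

# Row-wise fix, for illustration only: .loc[row_label, column] writes one cell
for i in df.index:
    df.loc[i, "pop_tot"] = totals[df.loc[i, "Year"]]
df["weighted"] = df["population"] / df["pop_tot"]
print(df)
```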
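You don't need to loop; looping is slower and can make things needlessly complicated. Use the vectorized operations of pandas and numpy instead, for example: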
df['pop_tot'] = df.population.sum()
df['weighted'] =  df.population / df.population.sum()
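Note that df.population.sum() is a single grand total over all rows, not a per-year total. If you want per-year denominators while staying close to plain NumPy, one option is np.unique with return_inverse=True to label each row's year, followed by np.bincount with weights to sum the population per label. A sketch, assuming a hypothetical two-year frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two years
df = pd.DataFrame({"Year": [1990, 1990, 2017, 2017],
                   "population": [100.0, 300.0, 300.0, 200.0]})

years = df["Year"].to_numpy()
pop = df["population"].to_numpy()

# codes[k] is the position of row k's year among the sorted unique years
uniq, codes = np.unique(years, return_inverse=True)
# bincount with weights sums the population for each year code in one pass
year_totals = np.bincount(codes, weights=pop)

df["pop_tot"] = year_totals[codes]  # broadcast each year's total back to its rows
df["weighted"] = pop / df["pop_tot"].to_numpy()

# By contrast, the grand total is the same for every row, regardless of year
global_total = df["population"].sum()
print(df)
print(global_total)
```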

print(df)
       Country  Year    HDI  population     pop_tot  weighted
0  Afghanistan  1990    NaN  12249100.0  53673949.0  0.228213
1      Albania  1990  0.645   3286540.0  53673949.0  0.061232
2      Algeria  1990  0.577  25912400.0  53673949.0  0.482774
3      Andorra  1990    NaN     54509.0  53673949.0  0.001016
4       Angola  1990    NaN  12171400.0  53673949.0  0.226766

Edit after the OP's comment:
df['pop_tot'] = df.groupby('Year').population.transform('sum')

df['weighted'] =  df.population / df['pop_tot']

print(df)
       Country  Year    HDI  population     pop_tot  weighted
0  Afghanistan  1990    NaN  12249100.0  53673949.0  0.228213
1      Albania  1990  0.645   3286540.0  53673949.0  0.061232
2      Algeria  1990  0.577  25912400.0  53673949.0  0.482774
3      Andorra  1990    NaN     54509.0  53673949.0  0.001016
4       Angola  1990    NaN  12171400.0  53673949.0  0.226766
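Since the question already builds pop_tot as a Series indexed by Year, another option is to map it onto the Year column: Series.map looks up each row's year in pop_tot's index, so every row picks up its own year's total. A sketch on hypothetical toy data, which should match the transform version:

```python
import pandas as pd

# Hypothetical toy version of df3 with two years
df3 = pd.DataFrame({"Country": ["A", "B", "A", "B"],
                    "Year": [1990, 1990, 2017, 2017],
                    "population": [100.0, 300.0, 300.0, 200.0]})

# The question already builds this Series (index = Year, values = world total)
pop_tot = df3.groupby("Year")["population"].sum()

# map aligns by label: each row's Year is looked up in pop_tot's index
df3["pop_tot"] = df3["Year"].map(pop_tot)
df3["weighted"] = df3["population"] / df3["pop_tot"]
print(df3)
```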

Note

I used the small dataset you provided as an example:
    Country     Year    HDI     population
0   Afghanistan 1990    NaN     12249100.0
1   Albania     1990    0.645   3286540.0
2   Algeria     1990    0.577   25912400.0
3   Andorra     1990    NaN     54509.0
4   Angola      1990    NaN     12171400.0

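The loop doesn't work because it refers to the whole column on every iteration.
Thanks, this is a better approach to the problem.
@Daniel This is also faster than using a for loop :)
Thanks, now I also see the problem with the loop.
Thanks, but in this case I'm not sure yours is year-specific: df3.population.sum() sums all values in the population column regardless of year. I'll edit the example to make it clearer.
I know, I've edited my original answer, but it's now basically the same as Ben.T's answer, so you should accept that one @Daniel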