Python: iterate over a df column and return a value from the DataFrame based on row index and column reference

python python-2.7 pandas

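My goal is to compare each value in the `year` column against the year-named columns (i.e. 1999, 2000, ...) and then return the value from the matching column. For example, for Afghanistan (the first row), whose year is 2004, I want to find the column named "2004" and return the value from the row that contains Afghanistan.

Here is the table. For reference, it is the result of an SQL join between a table of educational attainment for a single defined year and a table of GDP per country for 1999-2010. My end goal is to return the GDP for the year the education data comes from.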

country year    men_ed_yrs  women_ed_yrs    total_ed_yrs    1999    2000    2001    2002    2003    2004    2005    2006    2007    2008    2009    2010
0   Afghanistan 2004    11  5   8   NaN NaN 2461666315  4128818042  4583648922  5285461999  6.275076e+09    7.057598e+09    9.843842e+09    1.019053e+10    1.248694e+10    1.593680e+10
1   Albania 2004    11  11  11  3414760915  3632043908  4060758804  4435078648  5746945913  7314865176  8.158549e+09    8.992642e+09    1.070101e+10    1.288135e+10    1.204421e+10    1.192695e+10
2   Algeria 2005    13  13  13  48640611686 54790060513 54744714110 56760288396 67863829705 85324998959 1.030000e+11    1.170000e+11    1.350000e+11    1.710000e+11    1.370000e+11    1.610000e+11
3   Andorra 2008    11  12  11  1239840270  1401694156  1484004617  1717563533  2373836214  2916913449  3.248135e+09    3.536452e+09    4.010785e+09    4.001349e+09    3.649863e+09    3.346317e+09
4   Anguilla    2008    11  11  11  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

gdp_ed_list = []
for value in df_combined_column_named['year']:  # loops through each year in year column
    if value in df_combined_column_named.columns:  # compares year to column names
        idx = df_combined_column_named[df_combined_column_named['year'][value]].index.tolist()  # supposed to get the index associated with value
        gdp_ed = df_combined_column_named.get_value(idx, value)  # get the value of the cell found at idx, value
        gdp_ed_list.append(gdp_ed)  # append to a list

Currently, my code gets stuck at the `index.tolist()` part. It returns this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-85-361acb97edd4> in <module>()
      2 for value in df_combined_column_named['year']: #loops through each year in year column
      3     if value in df_combined_column_named.columns: #compares year to column names
----> 4         idx = df_combined_column_named[df_combined_column_named['year'][value]].index.tolist()
      5         gdp_ed = df_combined_column_named.get_value(idx, value)
      6         gdp_ed_list.append(gdp_ed)
KeyError: u'2004'

Any ideas?

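It looks like you are trying to match the values in the `year` column against the column labels and then extract the value in the corresponding cell. You could do that by looping over the rows (see below), but I don't think that is the fastest way. Instead, you could use `pd.melt` to coalesce the columns with year-like labels into a single column, say `year_col`: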

In [38]: melted = pd.melt(df, id_vars=['country', 'year', 'men_ed_yrs', 'women_ed_yrs', 'total_ed_yrs'], var_name='year_col')

In [39]: melted
Out[39]: 
        country  year  men_ed_yrs  women_ed_yrs  total_ed_yrs year_col         value  
0   Afghanistan  2004          11             5             8     1999            NaN   
1       Albania  2004          11            11            11     1999   3.414761e+09   
2       Algeria  2005          13            13            13     1999   4.864061e+10   
3       Andorra  2008          11            12            11     1999   1.239840e+09   
4      Anguilla  2008          11            11            11     1999            NaN   
5   Afghanistan  2004          11             5             8     2000            NaN
...

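The benefit of "melting" the DataFrame this way is that you now have both a `year` column and a `year_col` column. The values you are looking for are in the rows where `year` equals `year_col`, and those rows are easy to select with `.loc`: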

In [41]: melted.loc[melted['year'] == melted['year_col']]
Out[41]: 
        country  year  men_ed_yrs  women_ed_yrs  total_ed_yrs year_col  \
25  Afghanistan  2004          11             5             8     2004   
26      Albania  2004          11            11            11     2004   
32      Algeria  2005          13            13            13     2005   
48      Andorra  2008          11            12            11     2008   
49     Anguilla  2008          11            11            11     2008   

           value  
25  5.285462e+09  
26  7.314865e+09  
32  1.030000e+11  
48  4.001349e+09  
49           NaN

So you could use:

import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'1999': [nan, 3414760915.0, 48640611686.0, 1239840270.0, nan],
 '2000': [nan, 3632043908.0, 54790060513.0, 1401694156.0, nan],
 '2001': [2461666315.0, 4060758804.0, 54744714110.0, 1484004617.0, nan],
 '2002': [4128818042.0, 4435078648.0, 56760288396.0, 1717563533.0, nan],
 '2003': [4583648922.0, 5746945913.0, 67863829705.0, 2373836214.0, nan],
 '2004': [5285461999.0, 7314865176.0, 85324998959.0, 2916913449.0, nan],
 '2005': [6275076000.0, 8158549000.0, 103000000000.0, 3248135000.0, nan],
 '2006': [7057598000.0, 8992642000.0, 117000000000.0, 3536452000.0, nan],
 '2007': [9843842000.0, 10701010000.0, 135000000000.0, 4010785000.0, nan],
 '2008': [10190530000.0, 12881350000.0, 171000000000.0, 4001349000.0, nan],
 '2009': [12486940000.0, 12044210000.0, 137000000000.0, 3649863000.0, nan],
 '2010': [15936800000.0, 11926950000.0, 161000000000.0, 3346317000.0, nan],
 'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Anguilla'],
 'men_ed_yrs': [11, 11, 13, 11, 11],
 'total_ed_yrs': [8, 11, 13, 11, 11],
 'women_ed_yrs': [5, 11, 13, 12, 11],
 'year': ['2004', '2004', '2005', '2008', '2008']})

melted = pd.melt(df, id_vars=['country', 'year', 'men_ed_yrs', 'women_ed_yrs', 
                              'total_ed_yrs'], var_name='year_col')
result = melted.loc[melted['year'] == melted['year_col']]
print(result)
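
One pitfall with the comparison above: it only matches when `year` and `year_col` hold the same type. The sketch below assumes a hypothetical two-row frame whose `year` column is stored as integers (unlike the string years in the sample data), casts both sides to `str` before comparing, and keeps only the three columns of interest. The names `mask` and `tidy` are illustrative and do not come from the original answer.

```python
import pandas as pd

# Hypothetical reduced frame; 'year' holds integers here on purpose
df = pd.DataFrame({
    'country': ['Afghanistan', 'Algeria'],
    'year': [2004, 2005],
    '2004': [5285461999.0, 85324998959.0],
    '2005': [6275076000.0, 103000000000.0],
})

# Stack the year-labelled columns into 'year_col' / 'value'
melted = pd.melt(df, id_vars=['country', 'year'], var_name='year_col')

# Cast both sides to str so that 2004 and '2004' compare equal
mask = melted['year'].astype(str) == melted['year_col'].astype(str)

# Keep one row per country and drop the helper columns
tidy = melted.loc[mask, ['country', 'year', 'value']].reset_index(drop=True)
print(tidy)
```

If the years are already strings, as in the sample data, the casts are harmless and can be left out.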

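Why the `KeyError` occurs:

`df_combined_column_named['year'][value]` is what raises the KeyError. Suppose `value` is `'2004'`. Then `df_combined_column_named['year']` is a Series that contains string representations of years and is indexed by integers (0, 1, 2, ...). `df_combined_column_named['year'][value]` fails because it tries to index that Series with the string `'2004'`, which is not in the integer index.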
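
The failure can be reproduced in isolation. This is a minimal sketch, assuming a small hypothetical Series `years` that stands in for `df_combined_column_named['year']`. It also shows a boolean mask, which is probably what the original line was reaching for: selecting rows by value rather than by index label.

```python
import pandas as pd

# Hypothetical stand-in for df_combined_column_named['year']:
# string years, default integer index 0..4
years = pd.Series(['2004', '2004', '2005', '2008', '2008'])

value = '2004'

# Square brackets on a Series look the key up among the index labels,
# and '2004' is not one of the labels 0..4
try:
    years[value]
    raised = False
except KeyError:
    raised = True

# A boolean mask selects by value instead of by label
matching_rows = years[years == value].index.tolist()

print(raised)         # True
print(matching_rows)  # [0, 1]
```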

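Alternatively, you can reach the same goal by looping over the rows with `iterrows`. This may be easier to follow, but `iterrows` is generally slower than column-based operations such as the melt approach above. For the sample data, the row loop gives: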

       country  year         value
0  Afghanistan  2004  5.285462e+09
1      Albania  2004  7.314865e+09
2      Algeria  2005  1.030000e+11
3      Andorra  2008  4.001349e+09
4     Anguilla  2008           NaN
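
A row loop of this kind might look like the sketch below. It is not the answer's original code: the frame is a reduced, hypothetical version of the sample data with only two year columns, and the names `records` and `looped` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Reduced, hypothetical version of the sample frame
df = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria'],
    'year': ['2004', '2004', '2005'],
    '2004': [5285461999.0, 7314865176.0, 85324998959.0],
    '2005': [6275076000.0, 8158549000.0, 103000000000.0],
})

records = []
for idx, row in df.iterrows():
    year = row['year']
    # Read the cell from the column whose label equals this row's year;
    # fall back to NaN when no such column exists
    value = row[year] if year in df.columns else np.nan
    records.append({'country': row['country'], 'year': year, 'value': value})

looped = pd.DataFrame(records, columns=['country', 'year', 'value'])
print(looped)
```

Each iteration reads one cell by label, which keeps the logic explicit but costs a Python-level loop per row. That is why the melt approach tends to scale better on large frames.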

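Thank you so much for taking the time to provide both solutions! I have successfully implemented the second one. Now I am working through each line to make sure I understand the first.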
