Python: filling a dataframe

python pandas dataframe

I have a dataframe `df`:

    AuthorID  Year  citations
0          1  1995         86
1          2  1995         22
2          3  1995         22
3          4  1995         22
4          5  1995         36
5          6  1995         25

I created another dataframe, `df2`, initialized to all zeros, where each index is an `AuthorID` from `df`:

         1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  
1           0     0     0     0     0     0     0     0     0     0     0   
2           0     0     0     0     0     0     0     0     0     0     0   
3           0     0     0     0     0     0     0     0     0     0     0   
4           0     0     0     0     0     0     0     0     0     0     0   
5           0     0     0     0     0     0     0     0     0     0     0   
6           0     0     0     0     0     0     0     0     0     0     0
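A minimal way to reproduce this setup, assuming `df2`'s year columns are plain integers 1994 to 2004 and its index is the set of `AuthorID` values (the question does not show how `df2` was actually built):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "AuthorID": [1, 2, 3, 4, 5, 6],
    "Year": [1995] * 6,
    "citations": [86, 22, 22, 22, 36, 25],
})

# Zero matrix: one row per AuthorID, one column per year 1994-2004.
# Assumption: integer column labels; the original df2 may have used strings.
df2 = pd.DataFrame(0, index=df["AuthorID"].unique(), columns=range(1994, 2005))
print(df2.shape)  # (6, 11)
```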

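Now I want to iterate over `df` and add each citation value to the right cell of the second matrix. So if I filled in `df2` from the data above, it would look like this: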

         1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004
1           0    86     0     0     0     0     0     0     0     0     0
2           0    22     0     0     0     0     0     0     0     0     0
3           0    22     0     0     0     0     0     0     0     0     0
4           0    22     0     0     0     0     0     0     0     0     0
5           0    36     0     0     0     0     0     0     0     0     0
6           0    25     0     0     0     0     0     0     0     0     0

This seems simple enough. Here is what I did:

for index, row in df.iterrows():
    df2.iloc[row[0]][row[1]] = df2.iloc[row[0]][row[1]] + row[2]

But it keeps giving me this error:

IndexError: index out of bounds
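For reference: `.iloc` works on positions, not labels. With six rows the valid positions are 0 to 5, so using AuthorID 6 as a position is out of bounds, and AuthorIDs 1 to 5 would land one row below the intended one. Chained indexing such as `df2.iloc[r][c] = ...` can also end up writing to a temporary object instead of `df2`. A minimal sketch of the label-based alternative with `.at`, assuming `df2` has integer year columns (reading row values by name also avoids relying on positional lookups like `row[0]`):

```python
import pandas as pd

df = pd.DataFrame({
    "AuthorID": [1, 2, 3, 4, 5, 6],
    "Year": [1995] * 6,
    "citations": [86, 22, 22, 22, 36, 25],
})
df2 = pd.DataFrame(0, index=[1, 2, 3, 4, 5, 6], columns=range(1994, 2005))

# .iloc is positional: 6 rows means valid positions 0..5,
# so AuthorID 6 used as a position is out of bounds.
try:
    df2.iloc[6]
except IndexError as exc:
    print("iloc failed:", exc)

# .at takes (row label, column label) in a single call,
# so there is no off-by-one and no chained indexing.
for _, row in df.iterrows():
    df2.at[row["AuthorID"], row["Year"]] += row["citations"]

print(df2[1995].tolist())  # [86, 22, 22, 22, 36, 25]
```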

So I tried:

for index, row in df.iterrows():
    df2.at[row[0], row[1]] = df2.at[row[0], row[1]] + row[2]

That gave me:

ValueError: At based indexing on an non-integer index can only have non-integer indexers

I also tried `df.iat`, but that didn't work either.

I don't know what I'm doing wrong. When I check `df.dtypes`, every column is `int64`.
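Note that `df.dtypes` describes the *values* in `df`'s columns, not the *labels* of `df2`. The `.at` error suggests that `df2`'s labels are not integers; the later `df2[1]['1996']` notation hints that the year columns are strings such as `'1996'`. A minimal sketch, assuming that is the case (the string-labelled `df2` below is hypothetical), showing how to check the label dtype and convert it once. Looking up with `str(year)` instead would be the other option.

```python
import pandas as pd

# Hypothetical df2 with string year labels ('1994', ...); the question does
# not show how df2 was built, this is only what the .at error points to.
df2 = pd.DataFrame(0, index=range(1, 7), columns=[str(y) for y in range(1994, 2005)])

print(df2.dtypes.unique())  # dtypes of the cell VALUES (all int64)
print(df2.columns.dtype)    # dtype of the column LABELS (not an integer dtype)

# Convert the labels once so integer keys such as row["Year"] match them.
df2.columns = df2.columns.map(int)
df2.at[1, 1995] += 86
print(df2.at[1, 1995])
```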

Why not just pivot the first dataframe, like this:

>> df.pivot(index='AuthorID', columns='Year', values='citations')

This gives you every year present in `df` as a column, with your authors (`AuthorID`) as the index.
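Since `pivot` only creates columns for years that actually occur in `df` (here just 1995), one way to get the full zero-filled grid from the question is to reindex afterwards. A sketch, assuming the year range 1994 to 2004 from the question's `df2`:

```python
import pandas as pd

df = pd.DataFrame({
    "AuthorID": [1, 2, 3, 4, 5, 6],
    "Year": [1995] * 6,
    "citations": [86, 22, 22, 22, 36, 25],
})

# pivot only creates columns for years present in df (here: 1995),
# so reindex to the full year range and fill the gaps with 0.
df2 = (
    df.pivot(index="AuthorID", columns="Year", values="citations")
      .reindex(columns=range(1994, 2005), fill_value=0)
)
print(df2.loc[1, 1995], df2.loc[6, 1994])  # 86 0
```

If an author can have several rows for the same year, `pivot` raises on the duplicate entries; `pivot_table(..., aggfunc='sum')` sums them instead.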

So here is a longer way to get what you want: give each author one third of their 1995 citations for each year other than 1995.

Here `x` is your dataframe (`df` in the question).

Below, we add the years 1996, 1997 and 1998 for every author, store them in the `y` dataframe, and append `y` to `x` to get `z`:

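import pandas as pd
# 0.0 (not 0) makes z['citations'] float, so it can hold the 1/3 shares below
y = pd.DataFrame([[i, yr, 0.0] for yr in [1996, 1997, 1998] for i in x.AuthorID], columns=['AuthorID', 'Year', 'citations'])
z = pd.concat([x, y], ignore_index=True)  # DataFrame.append was removed in pandas 2.0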

Next, we assign one third of each author's 1995 citations to every other year for that author:

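for author_id in z['AuthorID'].unique():
    condition = (z['AuthorID'] == author_id) & (z['Year'] > 1995)
    # one third of this author's 1995 citations, as a scalar
    citation2 = z.loc[(z['Year'] == 1995) & (z['AuthorID'] == author_id), 'citations'].iloc[0] / 3
    # .loc avoids chained assignment (z['citations'][condition] = ...)
    z.loc[condition, 'citations'] = citation2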

In [1541]: z.pivot(index='AuthorID', columns='Year', values='citations')
Out[1541]: 
Year      1995       1996       1997       1998
AuthorID                                       
1           86  28.666667  28.666667  28.666667
2           22   7.333333   7.333333   7.333333
3           22   7.333333   7.333333   7.333333
4           22   7.333333   7.333333   7.333333
5           36  12.000000  12.000000  12.000000
6           25   8.333333   8.333333   8.333333

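What I really want is to do some calculations with the citations rather than just copying them as I did above; I thought it would be simpler to explain it that way. Specifically, I want to split the citation value evenly over the three years after the publication year. So for AuthorID=1, Year=1995, citations=86, I would add 86/3 to df2[1]['1996'], df2[1]['1997'] and df2[1]['1998']. Your approach would work if I only had the years 1995 to 1998, which is not the case. What is wrong with iterating over the first df and adding to the second df2? I'd like the flexibility to do other calculations if needed, and I don't understand why the basic loop doesn't work.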
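The basic loop can work once it uses labels instead of positions. A minimal sketch of the "spread over the next three years" calculation for arbitrary publication years, assuming `df2` has integer year columns and float values (so 86/3 can be stored); `SPREAD` is a hypothetical parameter, and years that fall outside `df2`'s columns are simply skipped:

```python
import pandas as pd

df = pd.DataFrame({
    "AuthorID": [1, 2, 3, 4, 5, 6],
    "Year": [1995] * 6,
    "citations": [86, 22, 22, 22, 36, 25],
})

# Float zero matrix so fractional shares (e.g. 86 / 3) can be stored.
years = range(1994, 2005)
df2 = pd.DataFrame(0.0, index=df["AuthorID"].unique(), columns=years)

SPREAD = 3  # hypothetical parameter: number of years after publication

for row in df.itertuples(index=False):
    share = row.citations / SPREAD
    for year in range(row.Year + 1, row.Year + 1 + SPREAD):
        if year in df2.columns:  # skip years outside df2's range
            df2.at[row.AuthorID, year] += share

print(df2.loc[1, [1996, 1997, 1998]].round(3).tolist())
```

Keeping the per-row logic in a plain loop makes it easy to swap in other calculations; for very large frames a vectorized version (expanding rows per offset, then `pivot_table` with `aggfunc='sum'`) would likely be faster.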