Python: add/insert values into a DataFrame based on one or more columns
I have two data frames:
date sitename Auto_name AutoCount
2012-05-01 chess.com Autobiographer 8
2012-05-05 chess.com Autobiographer 1
2012-05-15 chess.com Autobiographer 3
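and a second data frame that holds the student name and student count for each date and sitename.
The output should look like this: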
date sitename Autoname AutoCount Stu_name Stu_count
2012-05-01 chess.com Autobiographer 8 Student 4
2012-05-02 chess.com Autobiographer 0 Student 2
2012-05-05 chess.com Autobiographer 1 Student 0
2012-05-15 chess.com Autobiographer 3 Student 0
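I want to insert the student name and student count from the second data frame into the first one, aligned on the date column. It doesn't look that hard, but I can't figure it out.

You can use the merge function (see the pandas documentation on merging data frames). Assuming your data frames are called df1 and df2: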
In [13]: df = pd.merge(df1, df2, how='outer')
In [14]: df
Out[14]:
date sitename Auto_name AutoCount Stu_name StudentCount
0 2012-05-01 chess.com Autobiographer 8 Student 4
1 2012-05-05 chess.com Autobiographer 1 NaN NaN
2 2012-05-15 chess.com Autobiographer 3 NaN NaN
3 2012-05-02 chess.com NaN NaN Student 2
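Above, the merge uses the columns the two data frames have in common (date and sitename in this case), but you can also name the columns explicitly with the on keyword (see the documentation).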
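For a version that can be run outside the session above, here is a minimal sketch. Only df1 is shown in the question, so the contents of df2 are an assumption inferred from the merged output. Passing on=["date", "sitename"] should give the same result as the default here, because those are the only columns the two frames share.

```python
import pandas as pd

# df1 as given in the question.
df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})

# df2 is an assumption: reconstructed from the merged output, not from the question.
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

# An outer merge keeps every (date, sitename) pair found in either frame;
# columns missing on one side are filled with NaN.
merged = pd.merge(df1, df2, how="outer", on=["date", "sitename"])
print(merged)
```

The row order of an outer merge can differ between pandas versions, so sort by date afterwards if the order matters.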
As a next step, you can fill the NaN values however you need. Based on your example output, this could be:
In [15]: df.fillna({'Auto_name':'Autobiographer', 'AutoCount':0, 'Stu_name':'Student', 'StudentCount':0})
Out[15]:
date sitename Auto_name AutoCount Stu_name StudentCount
0 2012-05-01 chess.com Autobiographer 8 Student 4
1 2012-05-05 chess.com Autobiographer 1 Student 0
2 2012-05-15 chess.com Autobiographer 3 Student 0
3 2012-05-02 chess.com Autobiographer 0 Student 2
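Note that fillna returns a new data frame rather than modifying df, and that the rows above are not in date order. The following sketch chains the merge, the fill, an integer cast, and a sort to get closer to the layout requested in the question. The df2 contents are again an assumption inferred from the merged output, and the dates are kept as ISO-formatted strings, which happen to sort chronologically.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})
# Assumed contents of the second frame (not shown in the question).
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

merged = pd.merge(df1, df2, how="outer")

# Fill each column with its own default; fillna returns a new frame.
filled = merged.fillna({
    "Auto_name": "Autobiographer",
    "AutoCount": 0,
    "Stu_name": "Student",
    "StudentCount": 0,
})

# NaN forces the count columns to float, so cast them back to int,
# then order the rows by date and renumber the index.
result = (
    filled.astype({"AutoCount": int, "StudentCount": int})
    .sort_values("date")
    .reset_index(drop=True)
)
print(result)
```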
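Thanks a lot, I had the same idea. Does merging on multiple columns (date and sitename in this case) take more time than merging on a single column (date only)? I am merging very large data frames, so I have to take computation time into account.

I'm not sure, but you can always test and time it (e.g., with %timeit in IPython).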
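Outside IPython, the standard-library timeit module can do the same job. The sketch below is only an illustration: the frames are synthetic, the size n is arbitrary, and the function names are hypothetical. Timings on real data will depend on key types, key cardinality, and the pandas version, so treat it as a template for measuring your own frames.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20_000  # arbitrary size, chosen only so the example runs quickly

# Synthetic stand-ins for the real frames: one row per date, a single site.
dates = pd.date_range("2012-01-01", periods=n, freq="D")
left = pd.DataFrame({
    "date": dates,
    "sitename": "chess.com",
    "AutoCount": rng.integers(0, 10, n),
})
right = pd.DataFrame({
    "date": dates,
    "sitename": "chess.com",
    "StudentCount": rng.integers(0, 10, n),
}).sample(frac=1, random_state=0)  # shuffle so the keys are not pre-aligned


def merge_on_date():
    # Single key; the shared sitename column gets _x/_y suffixes.
    return pd.merge(left, right, how="outer", on="date")


def merge_on_both():
    # Two keys, as in the default merge on all common columns.
    return pd.merge(left, right, how="outer", on=["date", "sitename"])


# Take the best of several repeats to reduce noise.
t_one = min(timeit.repeat(merge_on_date, number=5, repeat=3))
t_two = min(timeit.repeat(merge_on_both, number=5, repeat=3))
print(f"on date only: {t_one:.4f}s, on date + sitename: {t_two:.4f}s")
```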