Python: adding/inserting values into a DataFrame based on one or more columns

I have two DataFrames:

date       sitename  Auto_name                  AutoCount                         
2012-05-01 chess.com Autobiographer               8
2012-05-05 chess.com Autobiographer               1
2012-05-15 chess.com Autobiographer               3

The output should look like this:

date       sitename    Autoname                 AutoCount     Stu_name    Stu_count                     
2012-05-01 chess.com Autobiographer               8            Student       4
2012-05-02 chess.com Autobiographer               0            Student       2
2012-05-05 chess.com Autobiographer               1            Student       0
2012-05-15 chess.com Autobiographer               3            Student       0

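I want to insert the student name and student count from the second DataFrame into the first one, matched on the date column. This doesn't look that hard, but I can't figure it out.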

You can use the merge function (see the pandas documentation on merging DataFrames). Assuming your DataFrames are called df1 and df2:

In [13]: df = pd.merge(df1, df2, how='outer')

In [14]: df
Out[14]: 
         date   sitename       Auto_name  AutoCount Stu_name  StudentCount
0  2012-05-01  chess.com  Autobiographer          8  Student             4
1  2012-05-05  chess.com  Autobiographer          1      NaN           NaN
2  2012-05-15  chess.com  Autobiographer          3      NaN           NaN
3  2012-05-02  chess.com             NaN        NaN  Student             2
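
In the example above, the merge is done on the columns the two DataFrames have in common (date and sitename in this case), but you can also specify the columns explicitly with the on keyword (see the documentation).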
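
As a minimal sketch of the on keyword, the merge can name its key columns explicitly. Here df1 is copied from the question, while df2 is an assumption: its rows are inferred from the merge output shown above, since the second DataFrame itself is not reproduced in the post.

```python
import pandas as pd

# df1 as given in the question.
df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})

# df2 is an assumption, reconstructed from the rows of the merge output
# that carry Stu_name / StudentCount values.
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

# Name the key columns explicitly instead of relying on shared column names.
merged = pd.merge(df1, df2, how="outer", on=["date", "sitename"])
print(merged)
```

The row order of an outer merge can differ between pandas versions, so it is safer not to rely on it.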

In the next step, you can fill the NaN values as needed. Based on your example output, this could be:

In [15]: df.fillna({'Auto_name':'Autobiographer', 'AutoCount':0, 'Stu_name':'Student', 'StudentCount':0})
Out[15]: 
         date   sitename       Auto_name  AutoCount Stu_name  StudentCount
0  2012-05-01  chess.com  Autobiographer          8  Student             4
1  2012-05-05  chess.com  Autobiographer          1  Student             0
2  2012-05-15  chess.com  Autobiographer          3  Student             0
3  2012-05-02  chess.com  Autobiographer          0  Student             2
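
The two steps above can be combined into one short script. This is only a sketch: df2 is again an assumption inferred from the merge output, and the integer cast and the sort by date are additions meant to match the layout of the desired output in the question, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})
# Assumed contents of the second DataFrame (not shown in the post).
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

merged = pd.merge(df1, df2, how="outer", on=["date", "sitename"])

# fillna returns a new DataFrame, so the result has to be assigned to keep it.
filled = merged.fillna({
    "Auto_name": "Autobiographer",
    "AutoCount": 0,
    "Stu_name": "Student",
    "StudentCount": 0,
})

# The NaN values introduced by the merge turn the count columns into floats;
# cast them back to integers once the gaps are filled.
filled = filled.astype({"AutoCount": int, "StudentCount": int})

# ISO-formatted date strings sort chronologically.
result = filled.sort_values("date").reset_index(drop=True)
print(result)
```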

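Thank you very much. I had the same idea. Does merging on multiple columns (date and sitename in this case) take more time than merging on a single column (date only)? I am merging very large DataFrames, so computation time is a concern.

I'm not sure, but you can always test and time it (for example, with %timeit in IPython).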
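
Outside IPython, the same comparison can be sketched with the standard timeit module. The data below is synthetic and hypothetical (the row count, the left and right frames, and the helper function names are all made up for illustration), so the timings only mean something once you substitute your own DataFrames.

```python
import timeit

import pandas as pd

# Synthetic, hypothetical data: unique dates and a constant sitename,
# mirroring the shape of the example in the question.
n = 20_000
dates = pd.date_range("2000-01-01", periods=n, freq="D")
left = pd.DataFrame({
    "date": dates,
    "sitename": "chess.com",
    "AutoCount": range(n),
})
right = pd.DataFrame({
    "date": dates[::2],
    "sitename": "chess.com",
    "StudentCount": range(n // 2),
})


def merge_one_key():
    # sitename exists on both sides, so pandas adds _x / _y suffixes to it.
    return pd.merge(left, right, how="outer", on="date")


def merge_two_keys():
    return pd.merge(left, right, how="outer", on=["date", "sitename"])


# Total time for 5 runs of each variant.
t_one = timeit.timeit(merge_one_key, number=5)
t_two = timeit.timeit(merge_two_keys, number=5)
print(f"on='date': {t_one:.3f}s, on=['date', 'sitename']: {t_two:.3f}s")
```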