Python: Why does applying xlrd.xldate_as_datetime() not update a subset of a DataFrame as expected?

python python-3.x datetime

I loaded a DataFrame from an Excel file. It has a datetime column, but a few of its values are in Excel's serial date format, as shown below:

import pandas as pd
import numpy as np
import xlrd

rnd = np.random.randint(0,1000,size=(10, 1))
test = pd.DataFrame(data=rnd,index=range(0,10),columns=['rnd'])
test['Date'] = pd.date_range(start='1/1/1979', periods=len(test), freq='D')
r1 = np.random.randint(0,5)
r2 = np.random.randint(6,10)
test.loc[r1, 'Date'] = 44305
test.loc[r2, 'Date'] = 44287
test

   rnd                 Date
0   56  1979-01-01 00:00:00
1  557  1979-01-02 00:00:00
2  851  1979-01-03 00:00:00
3  553                44305
4  258  1979-01-05 00:00:00
5  946  1979-01-06 00:00:00
6  930  1979-01-07 00:00:00
7  805  1979-01-08 00:00:00
8  362                44287
9  705  1979-01-10 00:00:00

When I convert the bad dates on their own with the xlrd.xldate_as_datetime function, I get a correctly formatted Series:

# Identifying the index of dates in int format
idx_ints = test[test['Date'].map(lambda x: isinstance(x, int))].index

test.loc[idx_ints, 'Date'].map(lambda x: xlrd.xldate_as_datetime(x, 0))

3   2021-04-19
8   2021-04-01
Name: Date, dtype: datetime64[ns]

However, when I try to apply the change in place, I get a completely different int instead:

test.loc[idx_ints,'Date'] = test.loc[idx_ints, 'Date'].map(lambda x: xlrd.xldate_as_datetime(x, 0))

test
  
   rnd                 Date
0   56  1979-01-01 00:00:00
1  557  1979-01-02 00:00:00
2  851  1979-01-03 00:00:00
3  553  1618790400000000000
4  258  1979-01-05 00:00:00
5  946  1979-01-06 00:00:00
6  930  1979-01-07 00:00:00
7  805  1979-01-08 00:00:00
8  362  1617235200000000000
9  705  1979-01-10 00:00:00

Any ideas, or an alternative solution to my date-integer conversion problem? Thanks.

Inverting the logic of the answer I linked, this works for your test df:

# where you have numeric values, i.e. "excel datetime format":
nums = pd.to_numeric(test['Date'], errors='coerce') # timestamps will give NaN here
# now first convert the excel dates:
test.loc[nums.notna(), 'datetime'] = pd.to_datetime(nums[nums.notna()], unit='d', origin='1899-12-30')
# ...and the other, "parseable" timestamps:
test.loc[nums.isna(), 'datetime'] = pd.to_datetime(test['Date'][nums.isna()])

test
   rnd                 Date   datetime
0  840                44305 2021-04-19
1  298  1979-01-02 00:00:00 1979-01-02
2  981  1979-01-03 00:00:00 1979-01-03
3  806  1979-01-04 00:00:00 1979-01-04
4  629  1979-01-05 00:00:00 1979-01-05
5  540  1979-01-06 00:00:00 1979-01-06
6  499  1979-01-07 00:00:00 1979-01-07
7  155  1979-01-08 00:00:00 1979-01-08
8  208                44287 2021-04-01
9  737  1979-01-10 00:00:00 1979-01-10
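To see why origin='1899-12-30' is the right anchor here, a small sanity check may help. This is only a sketch: the helper name excel_serial_to_timestamp is made up for illustration, and it assumes the workbook uses Excel's 1900 date system (the same case as datemode 0 in the question).

```python
import datetime as dt

import pandas as pd

# Day zero that the answer above uses for Excel serial dates.
EXCEL_EPOCH = "1899-12-30"


def excel_serial_to_timestamp(serial):
    """Hypothetical helper: Excel serial day number -> pandas Timestamp."""
    return pd.to_datetime(serial, unit="D", origin=EXCEL_EPOCH)


# The two serial numbers from the question.
converted = [excel_serial_to_timestamp(n) for n in (44305, 44287)]

# The same conversion done with plain datetime arithmetic, as a cross-check.
check = dt.datetime(1899, 12, 30) + dt.timedelta(days=44305)

print(converted)
print(check)
```

Both routes land on 2021-04-19 for 44305, which matches what xlrd.xldate_as_datetime returned in the question.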

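If your input already has datetime objects instead of timestamp strings, you can skip the conversion and simply copy those values over to the new column.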
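A rough sketch of that variant, assuming the column holds datetime.datetime objects mixed with integer serial numbers. The frame df and the isinstance-based mask are my own stand-ins (the answer above uses pd.to_numeric for the mask); the final pd.to_datetime call does no string parsing, it only gives the new column a single datetime dtype.

```python
import datetime as dt
import numbers

import pandas as pd

# Hypothetical input: datetime objects mixed with Excel serial numbers.
df = pd.DataFrame({
    "Date": pd.Series(
        [dt.datetime(1979, 1, 1), 44305, dt.datetime(1979, 1, 3), 44287],
        dtype=object,
    )
})

# True where the value is a plain number, i.e. an Excel serial date.
is_serial = df["Date"].map(lambda v: isinstance(v, numbers.Real))

# Convert only the serial numbers.
from_serial = pd.to_datetime(
    df.loc[is_serial, "Date"].astype("int64"), unit="D", origin="1899-12-30"
)

# The datetime objects are carried over as they are.
already_dt = df.loc[~is_serial, "Date"]

# Stitch both parts together by index and unify the dtype.
combined = pd.concat([from_serial, already_dt]).sort_index()
df["datetime"] = pd.to_datetime(combined)

print(df)
```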

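related:

Thanks, it is related. However, when I try to update the DataFrame, I get exactly the same result as above. I found that xlrd.xldate_as_datetime keeps the timestamp in ns since the epoch (Unix time) rather than converting everything to a pandas datetime.

Thanks for this. If I create a new column like you did, it works; if I try to update the existing column instead, the result is epoch time.

@at8865, hm, you should also be able to use test.loc[nums.notna(), 'Date'] to do the conversion "in place". You might need to add another pd.to_datetime(test.Date) so that every item of the Series has the same dtype.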
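For what it's worth, the big integers from the question are the same dates expressed as nanoseconds since 1970-01-01, which is how pandas stores datetime64[ns] values internally. One way to avoid the partial .loc assignment altogether is to overwrite the whole column in a single step. The following is a sketch rather than a tested recipe for every pandas version: excel_to_timestamp is a hypothetical helper, and the frame is rebuilt with fixed positions (3 and 8) instead of the random ones from the question.

```python
import numbers

import pandas as pd

# 2021-04-19 as nanoseconds since the Unix epoch: the "different int"
# that showed up in row 3 of the question.
ns_since_epoch = pd.Timestamp("2021-04-19").value  # 1618790400000000000

# Rebuild a frame like the one in the question, with an object column
# that mixes Timestamps and Excel serial numbers.
dates = list(pd.date_range(start="1979-01-01", periods=10, freq="D"))
dates[3] = 44305
dates[8] = 44287
test = pd.DataFrame({"rnd": range(10), "Date": pd.Series(dates, dtype=object)})


def excel_to_timestamp(value):
    """Hypothetical helper: convert serial numbers, pass datetimes through."""
    if isinstance(value, numbers.Real):
        return pd.to_datetime(value, unit="D", origin="1899-12-30")
    return value


# Replace the whole column at once, then make sure it has one datetime dtype.
test["Date"] = pd.to_datetime(test["Date"].map(excel_to_timestamp))

print(test)
```

The per-element map is slower than the masked, vectorized version in the answer, but for a column that only needs a one-off cleanup that is usually acceptable.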