Python KeyError: 'date-time' (python, pandas, dataframe, datetime)

I have a DataFrame pd1, created with pandas:

pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',sep=':',
                  header=None, names  = ['date-time','domain','requests-qty','response-bytes'],
                   parse_dates=[1], converters={'date-time': to_datetime}, index_col = 'date-time')

with this index:

>> pd1.index:  

 DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                ...
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00'],
               dtype='datetime64[ns]', name='date-time', length=6084158, freq=None)

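But when I try to set the index on that column, I get the error below. I originally wanted to set a multi-column index and got the error. I then tried to create another DataFrame from it:

pd_new_index = pd1.set_index(['requests-qty', 'domain'])

which set the other columns as the index without problems, and then to set the index back to the 'date-time' column:

pd_new_2 = pd_new_index.set_index(['date-time'])

This gives the same error. 'date-time' does not look like a special keyword, and that column is the index now. Why does the error occur?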

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
----> 1 pd_new_2 = pd_new_index.set_index(['date-time'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4176                 names.append(None)
   4177             else:
-> 4178                 level = frame[col]._values
   4179                 names.append(col)
   4180                 if drop:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

The reason is that date-time is already the index (here a DatetimeIndex), so it cannot be selected by name the way a column can.

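This comes from the index_col parameter: passing index_col = 'date-time' to read_csv moves that column out of the columns and into the index.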
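A minimal sketch that reproduces the error on a small frame. The sample data and the names demo and restored are made up for illustration; only the column names follow the question. Once a column becomes the index it no longer appears in columns, so set_index cannot find it, and reset_index() turns it back into a regular column:

```python
import pandas as pd

# Made-up sample data; only the column names follow the question.
demo = pd.DataFrame({
    'date-time': pd.to_datetime(['2016-01-01', '2016-01-02']),
    'domain': ['d1', 'd2'],
    'requests-qty': [0, 1],
}).set_index('date-time')

# 'date-time' is now the name of the index, not a column label.
print('date-time' in demo.columns)
print(demo.index.name)

# Asking set_index for a label that is not among the columns raises KeyError.
try:
    demo.set_index(['date-time'])
except KeyError as exc:
    print('KeyError:', exc)

# reset_index() moves the index back into the columns,
# after which it can be selected by name again.
restored = demo.reset_index()
print('date-time' in restored.columns)
```

The exact text of the KeyError message depends on the pandas version, so it is not shown here.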

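For a MultiIndex, pass a list of column names to index_col, remove the converters argument, and specify the column name in the parse_dates parameter: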

import pandas as pd
from io import StringIO

temp="""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = ['date-time','domain'])

print (df)
                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
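Once date-time and domain are index levels, their values are read from the index rather than from the columns. A small sketch, assuming the same sample data as above (the names sample, frame and dates are illustrative), using Index.get_level_values and DataFrame.xs:

```python
import pandas as pd
from io import StringIO

sample = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

frame = pd.read_csv(StringIO(sample),
                    sep=':',
                    header=None,
                    names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                    parse_dates=['date-time'],
                    index_col=['date-time', 'domain'])

# Read one level of the MultiIndex by its name.
dates = frame.index.get_level_values('date-time')
print(dates)

# Select the rows for a single domain through the index level.
print(frame.xs('d2', level='domain'))
```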

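How do I add other columns to the index, to create an index like pd1.set_index(['date-time', 'domain'])? I know I can append, can't I? pd_new_index4 = pd1.set_index(['domain'], append=True). When I run pd_new_index_v4.head(5) after that command, it shows the two index column names below the other column names, just like the single index name before. But print(pd_new_index_v4.index) gives nothing, and after a few more clicks I get a "not enough memory to display the page" error in Jupyter. I guess that is a separate problem. But append should work?

@AlexeiMartianov - I think pd_new_index4 = pd1.set_index(['domain'], append=True) is a good solution. What does print(pd_new_index_v4.index) return?

Nothing? That is strange. I guess it is an out-of-memory problem; my dataset could be considered big, a 200 MB text file. Or is it not that big? How can I tell whether Jupyter is just lagging?

@AlexeiMartianov - Sure. The u prefix marks a unicode string in Python 2; it should be removed now, because Python 3 strings support unicode by default.

EDIT1: A solution using the append parameter of set_index: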

import pandas as pd
from io import StringIO


temp="""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = 'date-time')

print (df)
           domain  requests-qty  response-bytes
date-time                                      
2016-01-01     d1             0               0
2016-01-02     d2             0               1
2016-01-03     d3             1               0

print (df.index)
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'], 
              dtype='datetime64[ns]', name='date-time', freq=None)

df1 = df.set_index(['domain'], append = True)
print (df1)
                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df1.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
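As a cross-check, the same MultiIndex can also be built on an already loaded frame by resetting the index and then setting both columns at once. A minimal sketch on made-up data (the names base, appended and rebuilt are illustrative), comparing that route with append=True:

```python
import pandas as pd

# Made-up frame that already has 'date-time' as its index,
# mirroring the situation in the question.
base = pd.DataFrame({
    'date-time': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03']),
    'domain': ['d1', 'd2', 'd3'],
    'requests-qty': [0, 0, 1],
    'response-bytes': [0, 1, 0],
}).set_index('date-time')

# Route 1: keep the existing index and append 'domain' as a second level.
appended = base.set_index(['domain'], append=True)

# Route 2: turn the index back into a column, then set both levels together.
rebuilt = base.reset_index().set_index(['date-time', 'domain'])

print(appended.index.names)
print(appended.index.equals(rebuilt.index))
```

On a frame with millions of rows, append=True avoids the intermediate copy that reset_index() creates, which may matter given the memory problems mentioned in the comments.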