Python KeyError: 'date-time'


I have a DataFrame pd1, created with pandas:

pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',sep=':',
                  header=None, names  = ['date-time','domain','requests-qty','response-bytes'],
                   parse_dates=[1], converters={'date-time': to_datetime}, index_col = 'date-time')
with this index:

>> pd1.index:  

 DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                '2016-01-01 00:00:00', '2016-01-01 00:00:00',
                ...
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00',
                '2016-08-05 12:00:00', '2016-08-05 12:00:00'],
               dtype='datetime64[ns]', name='date-time', length=6084158, freq=None)
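But when I try to set an index on that column, I get the error below. I originally wanted to set a multi-column index and got the error. I then tried creating another DataFrame from it, pd_new_index = pd1.set_index(['requests-qty','domain']), which set the other columns as the index without problems, and then setting the index back to the 'date-time' column with pd_new_2 = pd_new_index.set_index(['date-time']), which gave the same error. 'date-time' does not look like a special keyword, and that column is the index now. Why do I get the error?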

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
----> 1 pd_new_2 = pd_new_index.set_index(['date-time'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4176                 names.append(None)
   4177             else:
-> 4178                 level = frame[col]._values
   4179                 names.append(col)
   4180                 if drop:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

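The reason is that date-time is already the index (here a DatetimeIndex), so it cannot be selected by name the way a column can.

This comes from the index_col = 'date-time' argument passed to read_csv, which moves that column into the index.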
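
To make the cause concrete, here is a minimal sketch on a small made-up frame. The data and variable names are assumptions for illustration, not the asker's file. It shows that the index name is not a column label, that set_index on it raises KeyError, and that reset_index() is one way to turn the index back into an ordinary column.

```python
import pandas as pd

# Hypothetical stand-in for pd1: 'date-time' has been moved into the index.
df = pd.DataFrame(
    {
        "date-time": pd.to_datetime(["2016-01-01", "2016-01-02"]),
        "domain": ["d1", "d2"],
        "requests-qty": [0, 1],
    }
).set_index("date-time")

# The index name is not among the column labels.
in_columns = "date-time" in df.columns
index_name = df.index.name

# set_index looks its keys up among the columns, so this fails.
try:
    df.set_index(["date-time"])
    raised = False
except KeyError:
    raised = True

# reset_index() turns the index back into a column, after which
# it can be combined with other columns in set_index.
restored = df.reset_index().set_index(["date-time", "domain"])

print(in_columns, index_name, raised, list(restored.index.names))
```

Whether reset_index() is affordable on a frame with millions of rows depends on available memory, since it builds a new frame.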

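For a MultiIndex, pass a list of column names to index_col, remove the converters argument, and specify the column name in the parse_dates parameter: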

import pandas as pd
from io import StringIO

temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = ['date-time','domain'])

print (df)

                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
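Comments:

How can I add other columns to the index, to create an index like pd1.set_index(['date-time','domain'])? I know I can append, can't I? pd_new_index4 = pd1.set_index(['domain'], append=True). When I run pd_new_index_v4.head(5) after that command, it shows the two index column names below the other column names, just as the single index name was shown before. But print(pd_new_index_v4.index) gives nothing, and after a few more clicks I get a "not enough memory to display the page" error in Jupyter. I guess that is a separate issue. But append should work?

@AlexeiMartianov - I think pd_new_index4 = pd1.set_index(['domain'], append=True) is a good solution. What does print(pd_new_index_v4.index) return?

Nothing? That is strange. I guess it is an out-of-memory problem; my dataset might be considered big, a 200 MB text file. Or is that not so big? How can I tell whether Jupyter is just lagging?

@AlexeiMartianov - Sure. The u prefix marks a unicode literal in Python 2; it can be removed now, because Python 3 strings are unicode by default.

EDIT1: A solution using the append parameter of set_index: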
import pandas as pd
from io import StringIO


temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp), 
                 sep=':',
                 header=None, 
                 names  = ['date-time','domain','requests-qty','response-bytes'],
                 parse_dates=['date-time'], 
                 index_col = 'date-time')

print (df)
           domain  requests-qty  response-bytes
date-time                                      
2016-01-01     d1             0               0
2016-01-02     d2             0               1
2016-01-03     d3             1               0

print (df.index)
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'], 
              dtype='datetime64[ns]', name='date-time', freq=None)
df1 = df.set_index(['domain'], append = True)
print (df1)
                   requests-qty  response-bytes
date-time  domain                              
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df1.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
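
For the large file mentioned in the comments, printing the whole index may be unnecessary. The sketch below reuses the three-row sample from this answer (not the real 200 MB file) and inspects only the level count, the level names and a short slice. It also checks that set_index with append=True builds the same index as reset_index() followed by set_index on both names. The variable names appended and rebuilt are made up for illustration.

```python
import pandas as pd
from io import StringIO

temp = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(temp),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col='date-time')

# Add 'domain' as a second index level next to the existing DatetimeIndex.
appended = df.set_index('domain', append=True)

# Alternative route: move the index back to a column, then index on both.
rebuilt = df.reset_index().set_index(['date-time', 'domain'])
same_index = appended.index.equals(rebuilt.index)

# Inspect the index without printing all of it.
print(appended.index.nlevels)
print(list(appended.index.names))
print(appended.index[:2])
print(same_index)
```

Slicing the index before printing keeps the output small no matter how many rows the frame has. It does not reduce the memory the frame itself needs.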