Python KeyError: 'date-time'
Tags: python, pandas, dataframe, datetime

I have a DataFrame pd1 that I read with pandas:
pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',sep=':',
header=None, names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=[1], converters={'date-time': to_datetime}, index_col = 'date-time')
It has this index:
>> pd1.index:
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
...
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00'],
dtype='datetime64[ns]', name='date-time', length=6084158, freq=None)
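But when I try to set that column as the index, I get the error below. I originally wanted a multi-column index and hit the error, so I tried building other DataFrames from pd1. Setting other columns as the index works fine:

pd_new_index = pd1.set_index(['requests-qty','domain'])

Setting the index back to the 'date-time' column gives the same error:

pd_new_2 = pd_new_index.set_index(['date-time'])

'date-time' does not look like a special keyword, and that column is the index right now. Why does it fail?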
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
----> 1 pd_new_2 = pd_new_index.set_index(['date-time'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4176                 names.append(None)
   4177             else:
-> 4178                 level = frame[col]._values
   4179                 names.append(col)
   4180                 if drop:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'
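The reason is that date-time is already the index (here a DatetimeIndex), so it cannot be selected by name the way a column can. set_index looks the name up in the columns, does not find it, and raises KeyError.
That happened because of the index_col parameter in read_csv, which moved the column into the index while reading the file.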
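To make the cause concrete, here is a minimal sketch that reproduces the error on a tiny frame. The three sample rows are hypothetical and only mimic the colon-separated layout of the file in the question; the exact KeyError message depends on the pandas version.

```python
import pandas as pd
from io import StringIO

# Hypothetical three-row sample in the same colon-separated layout
sample = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(sample),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col='date-time')

# index_col moved 'date-time' out of the columns and into the index
print(df.columns.tolist())        # only the three remaining columns
print(df.index.name)              # the index carries the name now
print('date-time' in df.columns)  # False

try:
    df.set_index(['date-time'])   # searches the columns, not the index
    raised = False
except KeyError as err:
    raised = True
    print('KeyError:', err)
```

Checking df.columns and df.index.name like this is a quick way to see whether a name refers to a column or to the index before calling set_index.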
For a MultiIndex, pass a list of column names to index_col, remove converters, and pass the column name to parse_dates:
import pandas as pd
from io import StringIO

temp = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col=['date-time', 'domain'])

print (df)
                   requests-qty  response-bytes
date-time  domain
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
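If the frame is already loaded and re-reading the file is not convenient, one possible alternative is to turn the index back into a column first. The sketch below assumes a small hypothetical frame, pd1_like, that stands in for pd1; it is not the data from the question.

```python
import pandas as pd

# Hypothetical stand-in for pd1: 'date-time' is already the index
pd1_like = pd.DataFrame(
    {'domain': ['d1', 'd2', 'd3'],
     'requests-qty': [0, 0, 1],
     'response-bytes': [0, 1, 0]},
    index=pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'],
                           name='date-time'))

# reset_index() turns the named index back into an ordinary column,
# after which set_index can find it by name again
multi = pd1_like.reset_index().set_index(['date-time', 'domain'])

print(multi.index.names)
```

Note that reset_index builds an intermediate frame, which may matter for a frame with millions of rows.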
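Comments:

How do I add other columns to the index, to get something like pd1.set_index(['date-time','domain'])? I know I can append, can't I? pd_new_index4 = pd1.set_index(['domain'], append=True). When I run pd_new_index_v4.head(5) after that command, it shows two leading column names below the other column names, just like the single leading column name before. But print (pd_new_index_v4.index) gives nothing, and after a few more clicks I get a "not enough memory to display the page" error in Jupyter. I guess that is another problem. But append should work?

@AlexeiMartianov - I think pd_new_index4 = pd1.set_index(['domain'], append=True) is a good solution. What does print (pd_new_index_v4.index) return? Nothing? That is strange.

I guess it is an out-of-memory problem; my dataset is a 200 MB text file, which may count as large. Or is it not that large? How can I tell whether Jupyter is just lagging?

@AlexeiMartianov - Sure. The u prefix marks a unicode string in Python 2; it should be removed now, because Python 3 strings are unicode by default.

EDIT1: Solution with the append parameter in set_index: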
import pandas as pd
from io import StringIO

temp = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

# after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col='date-time')

print (df)
           domain  requests-qty  response-bytes
date-time
2016-01-01     d1             0               0
2016-01-02     d2             0               1
2016-01-03     d3             1               0

print (df.index)
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'],
              dtype='datetime64[ns]', name='date-time', freq=None)

# append=True keeps the existing index and adds 'domain' as a second level
df1 = df.set_index(['domain'], append=True)

print (df1)
                   requests-qty  response-bytes
date-time  domain
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0

print (df1.index)
MultiIndex([('2016-01-01', 'd1'),
            ('2016-01-02', 'd2'),
            ('2016-01-03', 'd3')],
           names=['date-time', 'domain'])
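On a frame with millions of rows, printing the whole index can be slow in a notebook. The following is a rough sketch of how the result of set_index(..., append=True) could be checked without rendering everything; the small frame df1 is a hypothetical stand-in built by hand, and whether this avoids the Jupyter memory error from the comments is an assumption, not something verified on the original data.

```python
import pandas as pd

# Hypothetical small frame with a two-level index
idx = pd.MultiIndex.from_arrays(
    [pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03']),
     ['d1', 'd2', 'd3']],
    names=['date-time', 'domain'])
df1 = pd.DataFrame({'requests-qty': [0, 0, 1],
                    'response-bytes': [0, 1, 0]}, index=idx)

print(df1.index.names)    # level names
print(df1.index.nlevels)  # number of levels
print(len(df1.index))     # row count
print(df1.index[:2])      # only the first entries, not the full index

# Approximate memory footprint in bytes, index included
total_bytes = df1.memory_usage(index=True, deep=True).sum()
print(total_bytes)
```

Comparing total_bytes with the RAM available on the machine gives a first hint of whether the data itself is the problem or only its display.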