Python: selecting rows based on index
I tried using
data = data.loc['bids:']
to get all rows corresponding to that index label.
Sample data from the text file:
{"offset":"14726172634","bids":[["871094.22000","0.00200000","0.00200000","0","1081537351","29194","5","14726172633","1"],["871076.11000","0.00808000","0.00808000","0","1081537130","623964","5","14726172043","1"],["871073.96500","0.00100000","0.00100000","0","1081537185","29194","5","14726172231","1"]]],
"asks":[["875644.72000","0.00200000","0.00200000","0","1081606189","29194","5","14726356256","1"],["875669.77637","0.01000000","0.01000000","0","1081606227","29194","5","14726356379","1"],["875678.92000","0.00600000","0.00600000","0","1081606263","29194","5","14726356488","1"],["875731.74364","0.03000000","0.03000000","0","1081606233","29194","5","14726356393","1"],
Code:
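import pandas as pd

# Treat ']' as the line terminator and skip lines that fail to parse
# (on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0)
data = pd.read_csv('20190523-012523_product_5_snapshot_14726172634_14728561053.txt',
                   lineterminator=']', low_memory=False,
                   on_bad_lines='skip', header=None)
# Split column 1 at the first '[' into the label ('bids:', 'asks:' or '') and the price
new = data[1].str.split("[", n=1, expand=True)
data[1] = new[0]
data.drop(data.index[-1], inplace=True)
data[10] = new[1].str.strip('[').str.strip('"')
# Index by the label and column 2, then keep only the price column
data = data.set_index([1, 2])
data = data.loc[:, [10]]
# Returns only rows whose label is exactly 'bids:'; rows labeled '' are left out
data = data.loc['bids:']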
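Because the sample lines look like JSON, one alternative to splitting on ']' (a sketch, not the asker's method) is to parse each snapshot with the standard json module, so that every price level carries its side explicitly. It assumes each line is one complete JSON object with "bids" and "asks" lists, and that the first two strings in each level are price and quantity, as the data sample suggests; the meaning of the remaining fields is unknown, so they are dropped. snapshot_to_frame is a hypothetical helper name.

```python
import json

import pandas as pd


def snapshot_to_frame(line: str) -> pd.DataFrame:
    """Hypothetical helper: turn one JSON order-book snapshot into rows.

    Assumes each level is a list of strings whose first two entries are
    price and quantity; the other fields are ignored.
    """
    snap = json.loads(line)
    rows = []
    for side in ("bids", "asks"):
        for level in snap[side]:
            rows.append((side, float(level[0]), float(level[1])))
    # Every row keeps its side, so there are no '' labels to deal with
    return pd.DataFrame(rows, columns=["side", "price", "quantity"]).set_index("side")
```

If each line of the file really is one snapshot, the per-line frames could then be combined with pd.concat; after that, data.loc['bids'] would return every bid row.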
Data sample:
bids: 0.002000 871094.22000
0.008080 871076.11000
0.001000 871073.96500
bids: 0.005000 871042.87000
0.005000 871038.55000
0.001000 871032.90156
Code output:
bids: 0.002000 871094.22000
bids: 0.005000 871042.87000
How can I get all 6 rows? The goal is to filter the rows that fall between the other index labels.
The index output is:
Index(['bids:', '', '', '', '', '', '', '', '', '',
...
'asks:', '', '', '', '', '', '', '', '', '',
...
'bids:', '', '', '', '', '', '', '', '', '',
...'],
dtype='object', name=1, length=505148)
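Since only the first row of each block carries a label and the rows below it have '', one possible approach (a sketch, assuming every '' row belongs to the nearest non-empty label above it) is to turn '' into NaN and forward-fill before selecting. The small DataFrame is a hypothetical reconstruction built from the sample values, with a flat index for simplicity.

```python
import pandas as pd

# Hypothetical layout mirroring the question: only the first row of each
# block has a label, the rows below it have '' as their label.
labels = ["bids:", "", "", "asks:", "", "bids:", "", ""]
prices = ["871094.22000", "871076.11000", "871073.96500",
          "875644.72000", "875669.77637",
          "871042.87000", "871038.55000", "871032.90156"]
data = pd.DataFrame({10: prices}, index=pd.Index(labels, name=1))

# Replace '' with NaN, then carry the last real label down into the gaps
s = data.index.to_series()
filled = s.where(s != "").ffill()
data.index = pd.Index(filled.to_numpy(), name=1)

# Every row of both bids blocks is now labeled 'bids:'
bids = data.loc["bids:"]
print(bids)
```

In the question's code, the same fill could presumably be applied to column 1 before set_index([1, 2]), for example data[1] = data[1].where(data[1] != '').ffill().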
I believe you need:
print (data)
10
1 2
bids 0.00200000 871094.22000
0.00808000 871076.11000
0.00100000 871073.96500
asks 0.00200000 875644.72000
0.01000000 875669.77637
0.00600000 875678.92000
0.03000000 875731.74364
bids 0.00200000 871094.22000
0.00808000 871076.11000
0.00100000 871073.96500
print (data.index)
MultiIndex(levels=[['asks', 'bids'],
['0.00100000', '0.00200000', '0.00600000',
'0.00808000', '0.01000000', '0.03000000']],
codes=[[1, 1, 1, 0, 0, 0, 0, 1, 1, 1], [1, 3, 0, 1, 4, 2, 5, 1, 3, 0]],
names=[1, 2])
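To reproduce the MultiIndex shown above, a sketch like the following builds the same ten rows with pd.MultiIndex.from_arrays; the variable names are illustrative. It also shows that with a real MultiIndex, filtering on the first level returns every bids row, not just the first of each block.

```python
import pandas as pd

# Hypothetical reconstruction of the answer's sample: level 0 is the side,
# level 1 is the quantity string, column 10 holds the price string.
sides = ["bids"] * 3 + ["asks"] * 4 + ["bids"] * 3
qty = ["0.00200000", "0.00808000", "0.00100000",
       "0.00200000", "0.01000000", "0.00600000", "0.03000000",
       "0.00200000", "0.00808000", "0.00100000"]
price = ["871094.22000", "871076.11000", "871073.96500",
         "875644.72000", "875669.77637", "875678.92000", "875731.74364",
         "871094.22000", "871076.11000", "871073.96500"]

idx = pd.MultiIndex.from_arrays([sides, qty], names=[1, 2])
data = pd.DataFrame({10: price}, index=idx)

# Boolean filter on the first level: all six bids rows, both levels kept
all_bids = data[data.index.get_level_values(0) == "bids"]
print(all_bids)
```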
Get the first value of each run of consecutive duplicates in the first level of the MultiIndex:
s = data.index.get_level_values(0).to_series()
mask = s.ne(s.shift())
print (mask)
1
bids True
bids False
bids False
asks True
asks False
asks False
asks False
bids True
bids False
bids False
Name: 1, dtype: bool
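The mask works by comparing each label with the one directly above it. A minimal sketch on a plain Series (values chosen for illustration) shows the mechanics:

```python
import pandas as pd

s = pd.Series(["bids", "bids", "asks", "asks", "bids"])

# shift() moves every value down one row; the first row becomes NaN
prev = s.shift()

# A row starts a new block when it differs from the row above it.
# "bids" != NaN is True, so the first row always starts a block.
mask = s.ne(prev)
print(mask.tolist())  # [True, False, True, False, True]
```

The answer then indexes with mask.values rather than mask, presumably so the boolean array is applied by position instead of being aligned on the duplicated index labels.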
df = data[mask.values]
print (df)
10
1 2
bids 0.00200000 871094.22000
asks 0.00200000 875644.72000
bids 0.00200000 871094.22000
df = df.xs('bids', drop_level=False)
print (df)
10
1 2
bids 0.00200000 871094.22000
0.00200000 871094.22000
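The two steps (first row of each block, then xs on 'bids') can also be expressed as a single boolean filter. This is a sketch of an equivalent formulation, not the answer's code, and it rebuilds the sample data so that it runs on its own:

```python
import pandas as pd

sides = ["bids"] * 3 + ["asks"] * 4 + ["bids"] * 3
qty = ["0.00200000", "0.00808000", "0.00100000",
       "0.00200000", "0.01000000", "0.00600000", "0.03000000",
       "0.00200000", "0.00808000", "0.00100000"]
price = ["871094.22000", "871076.11000", "871073.96500",
         "875644.72000", "875669.77637", "875678.92000", "875731.74364",
         "871094.22000", "871076.11000", "871073.96500"]
data = pd.DataFrame(
    {10: price}, index=pd.MultiIndex.from_arrays([sides, qty], names=[1, 2])
)

# Plain NumPy boolean arrays, so nothing is aligned on the duplicated labels
s = pd.Series(data.index.get_level_values(0))
starts_block = s.ne(s.shift()).to_numpy()
is_bids = (s == "bids").to_numpy()

# First row of every 'bids' block, keeping both index levels
df = data[starts_block & is_bids]
print(df)
```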
If there is no MultiIndex:
print (data)
2 10
1
bids 0.00200000 871094.22000
bids 0.00808000 871076.11000
bids 0.00100000 871073.96500
asks 0.00200000 875644.72000
asks 0.01000000 875669.77637
asks 0.00600000 875678.92000
asks 0.03000000 875731.74364
bids 0.00200000 871094.22000
bids 0.00808000 871076.11000
bids 0.00100000 871073.96500
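@ALollz Just adding an assumption: when the row index is "", pandas considers that there is no index.
No, it considers the index to be an empty string. You probably want to create a MultiIndex where the first level is ['bids', 'asks'] and the second level is the first small number.
It would help if you could include the code that creates the DataFrame, or at least a version of the sample you are showing. For example, something like
df=pd.DataFrame({'some_column':[1.0,2.0….
That way we know what the structure of your DataFrame looks like, and we can copy and paste the code to develop and test our answers. Right now it is not presented in a standard format that tells us what is actually going on with your data. Thanks!
@Mike The original data comes from a text file that I parsed. Do you still think I should post it? That would make the post hard to read. Thanks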
print (data.index)
Index(['bids', 'bids', 'bids', 'asks', 'asks', 'asks', 'asks', 'bids', 'bids',
'bids'],
dtype='object', name=1)
s = data.index.to_series()
mask = s.ne(s.shift())
print (mask)
1
bids True
bids False
bids False
asks True
asks False
asks False
asks False
bids True
bids False
bids False
Name: 1, dtype: bool
df = data[mask.values].loc['bids']
print (df)
2 10
1
bids 0.00200000 871094.22000
bids 0.00200000 871094.22000
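If the goal is every row of a particular block rather than only its first row, the same comparison can be turned into block numbers with a cumulative sum. This sketch assumes the flat-index layout shown above; the names block_id, bid_blocks and second_bids are illustrative.

```python
import pandas as pd

# Flat-index layout from the answer: the label repeats on every row
sides = ["bids"] * 3 + ["asks"] * 4 + ["bids"] * 3
qty = ["0.00200000", "0.00808000", "0.00100000",
       "0.00200000", "0.01000000", "0.00600000", "0.03000000",
       "0.00200000", "0.00808000", "0.00100000"]
price = ["871094.22000", "871076.11000", "871073.96500",
         "875644.72000", "875669.77637", "875678.92000", "875731.74364",
         "871094.22000", "871076.11000", "871073.96500"]
data = pd.DataFrame({2: qty, 10: price}, index=pd.Index(sides, name=1))

s = data.index.to_series()
# Each row that differs from the one above starts a new block;
# the running total of those starts numbers the blocks 1, 2, 3, ...
block_id = s.ne(s.shift()).cumsum().to_numpy()

# Block numbers that belong to 'bids' (here 1 and 3)
bid_blocks = pd.unique(block_id[data.index == "bids"])

# All rows of the second 'bids' block, not just its first row
second_bids = data[block_id == bid_blocks[1]]
print(second_bids)
```

Passing block_id to data.groupby would similarly iterate over each block in turn.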