Python: Can I update an HDFStore?


Consider the following HDFStore and the dataframes df and df2:

import pandas as pd

store = pd.HDFStore('test.h5')

I first want to write df to the store:

store.append('df', df)

store.get('df')

     C
A B   
0 X  0
  Y  1
  Z  2
1 X  3
  Y  4
  Z  5
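The question does not show how df and df2 are built. As a minimal sketch, assuming both come from MultiIndex.from_product with level names A and B (an assumption based only on the printed output), frames with the same shape and values could be created like this:

```python
import pandas as pd

# Assumed construction: the original post only shows the printed frames.
idx = pd.MultiIndex.from_product([[0, 1], ["X", "Y", "Z"]], names=["A", "B"])
df = pd.DataFrame({"C": range(6)}, index=idx)

# df2 shares the (A, 'X') keys with df and adds the new keys 'V' and 'W'.
idx2 = pd.MultiIndex.from_product([[0, 1], ["V", "W", "X"]], names=["A", "B"])
df2 = pd.DataFrame({"C": range(6)}, index=idx2)

print(df)
print(df2)
```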

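At some later point I will have another dataframe that I want to use to update the store. I want to overwrite the rows whose index values also appear in the new dataframe, while keeping the old rows whose index values do not.

When I do this: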

store.append('df', df2)

store.get('df')

     C
A B   
0 X  0
  Y  1
  Z  2
1 X  3
  Y  4
  Z  5
0 V  0
  W  1
  X  2
1 V  3
  W  4
  X  5

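This is not what I want at all. Note that (0, 'X') and (1, 'X') are duplicated. I could manipulate the combined dataframe and overwrite the store, but I expect to work with so much data that this would not be feasible.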
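For reference, the in-memory approach mentioned above might look like the sketch below. It assumes both frames fit in memory, which is exactly what the question rules out for the real data, so it only serves to pin down the desired result. The frame construction is an assumption based on the printed output.

```python
import pandas as pd

# Assumed construction of the two frames shown in the question.
idx = pd.MultiIndex.from_product([[0, 1], ["X", "Y", "Z"]], names=["A", "B"])
df = pd.DataFrame({"C": range(6)}, index=idx)
idx2 = pd.MultiIndex.from_product([[0, 1], ["V", "W", "X"]], names=["A", "B"])
df2 = pd.DataFrame({"C": range(6)}, index=idx2)

# Stack old and new rows, then keep only the last occurrence of each index
# value, so rows from df2 win over rows from df with the same (A, B) key.
combined = pd.concat([df, df2])
updated = combined[~combined.index.duplicated(keep="last")].sort_index()

print(updated)
```

The keep="last" argument is what makes the newer frame take precedence, because df2 is concatenated after df.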

How can I update the store so that, for each level of 'A', 'Y' and 'Z' stay the same, 'V' and 'W' are new, and 'X' is updated?

What is the correct way to do this?

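Idea: first remove the matching rows (those with matching index values) from the HDF file, then append df2 to the HDFStore.

Problem: I could not find a way to use where="index in df2.index" with a MultiIndex.

Solution: first convert the MultiIndex to a plain index: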

df.index = df.index.get_level_values(0).astype(str) + '_' + df.index.get_level_values(1).astype(str)

df2.index = df2.index.get_level_values(0).astype(str) + '_' + df2.index.get_level_values(1).astype(str)

This yields:

In [348]: df
Out[348]:
     C
0_X  0
0_Y  1
0_Z  2
1_X  3
1_Y  4
1_Z  5

In [349]: df2
Out[349]:
     C
0_V  0
0_W  1
0_X  2
1_V  3
1_W  4
1_X  5
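The flattening step can also be wrapped in a pair of helpers so that the MultiIndex can be rebuilt after reading from the store. This is only a sketch: flatten_index and restore_index are hypothetical names, the separator is assumed not to occur inside any label, and the restored levels come back as strings (so 0 becomes '0').

```python
import pandas as pd

def flatten_index(frame, sep="_"):
    """Return a copy whose MultiIndex is joined into plain string labels."""
    out = frame.copy()
    out.index = [sep.join(map(str, key)) for key in frame.index]
    return out

def restore_index(frame, names, sep="_"):
    """Split string labels back into a MultiIndex (all levels become str)."""
    out = frame.copy()
    out.index = pd.MultiIndex.from_tuples(
        [tuple(label.split(sep)) for label in frame.index], names=names
    )
    return out

# Demo frame, assumed to match the df printed in the question.
idx = pd.MultiIndex.from_product([[0, 1], ["X", "Y", "Z"]], names=["A", "B"])
demo = pd.DataFrame({"C": range(6)}, index=idx)

flat = flatten_index(demo)
restored = restore_index(flat, names=["A", "B"])
print(flat)
print(restored)
```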

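When creating or appending to the HDF5 file, make sure to use format='t' and data_columns=True (this stores the index in the HDF5 file and indexes all columns, which allows us to use them in the where clause):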

store = pd.HDFStore('d:/temp/test1.h5')
store.append('df', df, format='t', data_columns=True)
store.close()

Now we can first remove the rows with matching index values from the HDFStore:

store = pd.HDFStore('d:/temp/test1.h5')

In [345]: store.remove('df', where="index in df2.index")
Out[345]: 2

and then append df2:

In [346]: store.append('df', df2, format='t', data_columns=True, append=True)

Result:

In [347]: store.get('df')
Out[347]:
     C
0_Y  1
0_Z  2
1_Y  4
1_Z  5
0_V  0
0_W  1
0_X  2
1_V  3
1_W  4
1_X  5

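Can you use a plain (non-MultiIndex) index?

Yes... my real data has a MultiIndex, but if you can show something with a single index I would be happy.

OK, I need some time to prepare a demo...

Thank you very much! I learned a lot there. I have some ideas now. I will report back.

Glad if it helps. Yes, please give short feedback on your final solution. It will also help others with the same problem...

There is an issue with the where="index in df.index" syntax. For an explanation and a workaround, see pandas.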
