Multi-index hash table implementation


python c++ boost data-structures


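I have a table of records that needs fast lookups, so I decided to use a hash table.

The problem I eventually ran into is that I have to look up records by more than one key.

For example, all 4 keys below should point to the same record: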

key1 -> a,b,c,d,e
key2 -> a,b,d
key3 -> a,b,e
key4 -> c

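Question 1: This pattern resembles a database lookup in which multiple keys are specified. So would a B-tree be a better fit than a design built on multiple hash tables?

Question 2: Would a specialized trie be a better fit for this problem? The default implementation would need the full key a+b+c+d+e as the lookup key. If I have to look up a+b+d, the lookup would have to skip c and e from this master key. Would such an idea work, or does it already exist?

Question 3: Another idea is to insert the contents into a table and, at the same time, build a second lookup table that holds an index for each record. That way I could define several masks per key and scan this lookup table for matching records, something like a CAM table, I suppose. But performance would suffer if I had to scan the whole table. Is it possible to combine a hash table with indexing logic to get both speed and good memory usage?

So far I have tried Boost Multi-Index, uthash, tries, and others to arrive at a design that handles all 4 cases, without success. I like Boost Multi-Index, but it has problems of its own that prevent me from using it.

Although I am using C to program and test the design, I am comfortable with other languages such as Java, PHP, and Python.

Any other ideas for solving this problem would be much appreciated.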

Pseudocode for the solution I would like to implement:

/* Keys */
struct key1_s {
    int src;
    int dst;
    char name[10];
    int t1;
    int t2;
};

struct key2_s {
    int src;
    int dst;
    char name[10];
};

struct key3_s {
    int src;
    int dst;
    int t1;
};

struct key4_s {
    int src;
    int dst;
    int t2;
};


/* Record */
struct record_s {
    int src;
    int dst;
    char name[10];
    int t1;
    int t2;
    int age;
    int sex;
    int mobile;
};

struct record_s record[2] = {
{1, 2, "jack", 5, 6, 50, 1, 1234567890},
{3, 4, "john", 7, 8, 60, 2, 1122334455}
};
table.insert(record[0]);
table.insert(record[1]);

/* search using key1 */
struct key1_s key1;
key1.src = 1;
key1.dst = 2;
strncpy(key1.name, "jack", 10);
key1.t1 = 5;
key1.t2 = 6;
table.find(key1); // should return pointer to record[0]

/* search using key2 */
struct key2_s key2;
key2.src = 1;
key2.dst = 2;
strncpy(key2.name, "jack", 10);
table.find(key2); // should return pointer to record[0]

/* search using key3 */
struct key3_s key3;
key3.src = 1;
key3.dst = 2;
key3.t1  = 5;
table.find(key3); // should return pointer to record[0]

If a lookup succeeds and returns a pointer, I then want to update fields of the record such as age, sex, and mobile.
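To make the requirement above concrete, here is a minimal sketch in standard C++17 of one record store with three hash indexes that all resolve to the same record. It is not the asker's code and not how Boost.MultiIndex is implemented. The names `Record`, `RecordTable`, `TupleHash`, and `Key1` to `Key3` are hypothetical, `name` is a `std::string` rather than `char[10]`, and the sketch assumes every key is unique across records.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

struct Record {
    int src, dst;
    std::string name;
    int t1, t2;
    int age, sex;
    long long mobile;
};

// Hash a std::tuple by combining the hashes of its elements.
struct TupleHash {
    template <class... Ts>
    std::size_t operator()(const std::tuple<Ts...>& key) const {
        std::size_t seed = 0;
        std::apply(
            [&seed](const Ts&... parts) {
                ((seed ^= std::hash<Ts>{}(parts) + 0x9e3779b9 + (seed << 6) + (seed >> 2)), ...);
            },
            key);
        return seed;
    }
};

using Key1 = std::tuple<int, int, std::string, int, int>;  // src, dst, name, t1, t2
using Key2 = std::tuple<int, int, std::string>;            // src, dst, name
using Key3 = std::tuple<int, int, int>;                    // src, dst, t1

class RecordTable {
public:
    // Stores the record once and registers it under all three keys.
    Record* insert(Record r) {
        store_.push_back(std::move(r));  // deque::push_back keeps element addresses stable
        Record* p = &store_.back();
        const bool ok1 = by_key1_.emplace(Key1{p->src, p->dst, p->name, p->t1, p->t2}, p).second;
        const bool ok2 = by_key2_.emplace(Key2{p->src, p->dst, p->name}, p).second;
        const bool ok3 = by_key3_.emplace(Key3{p->src, p->dst, p->t1}, p).second;
        assert(ok1 && ok2 && ok3 && "sketch assumes every key is unique");
        return p;
    }

    // Each overload returns a pointer to the stored record, or nullptr if absent.
    Record* find(const Key1& k) { return lookup(by_key1_, k); }
    Record* find(const Key2& k) { return lookup(by_key2_, k); }
    Record* find(const Key3& k) { return lookup(by_key3_, k); }

private:
    template <class Map, class Key>
    static Record* lookup(Map& m, const Key& k) {
        auto it = m.find(k);
        return it == m.end() ? nullptr : it->second;
    }

    std::deque<Record> store_;
    std::unordered_map<Key1, Record*, TupleHash> by_key1_;
    std::unordered_map<Key2, Record*, TupleHash> by_key2_;
    std::unordered_map<Key3, Record*, TupleHash> by_key3_;
};
```

With the two sample records inserted, `find(Key1{1, 2, "jack", 5, 6})`, `find(Key2{1, 2, "jack"})`, and `find(Key3{1, 2, 5})` all return the same pointer, and non-key fields such as `age` can be updated through it without touching any index. The cost is that each index keeps its own copy of the key fields, and changing a key field would require re-registering the record.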

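Boost Multi Index can help here.

The composite_keys.cpp example contains a compelling example. You only need to replace `ordered` with `hashed` globally to get what you are after; in your case the key configurations will simply overlap more.

As for performance, I don't think there is a definitive answer; it always depends on the usage pattern. You need to profile, and weigh the gains against the effort spent on optimizing.

Personally, I think Boost Multi Index is a sweet spot when you care about convenience and quick results. Note that this in no way implies that BMI is not optimized, and I believe it is highly optimized; however, it will /always/ depend on usage patterns. Consider an application that bulk-inserts a large amount of data up front and then only reads it; such an application can benefit from explicitly building the indexes once, rather than automatically updating all indexes on every insert.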
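The bulk-insert-then-read pattern mentioned above can be sketched in plain standard C++. This is only an illustration of the usage pattern, not something Boost.MultiIndex provides in this form; `Row`, `BulkTable`, `append`, and `build_index` are hypothetical names, and the sketch assumes `id` values are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Row {
    int id;
    int value;
};

class BulkTable {
public:
    // Load phase: append only, no index maintenance per row.
    void append(Row r) {
        rows_.push_back(r);
        indexed_ = false;
    }

    // Build the lookup index in a single pass once loading is done.
    void build_index() {
        index_.clear();
        index_.reserve(rows_.size());      // size the table once to avoid rehashing
        for (std::size_t i = 0; i < rows_.size(); ++i)
            index_.emplace(rows_[i].id, i);  // store positions; they survive vector growth
        indexed_ = true;
    }

    // Read phase: requires build_index() to have run since the last append.
    const Row* find(int id) const {
        assert(indexed_ && "call build_index() after the bulk load");
        auto it = index_.find(id);
        return it == index_.end() ? nullptr : &rows_[it->second];
    }

private:
    std::vector<Row> rows_;
    std::unordered_map<int, std::size_t> index_;
    bool indexed_ = false;
};
```

Whether this beats per-insert index maintenance depends on the workload, so it is worth profiling both before committing to either.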


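Sorry for the confusion in the notation; I added some pseudocode to clarify what I mean.

I did use BMI to solve this, but ran into some issues with the following use case. My usage pattern is that there are bulk inserts initially, after which updates and deletes arrive at a high rate. I feel this is where I am getting hit. Whenever I need to update a record in the table, calling table.replace(key, newrecord) actually deletes the old record and inserts the new one. I found this to be a rather expensive operation. Instead, I would like to get a pointer to the record so that I can go ahead and update it in place.

You cannot avoid updating the indexes unless you leave the key fields alone. As long as you do not modify key fields, you can modify the record by reference.

Please use `modify` rather than `replace`: it is the fastest option, and no delete/insert takes place.

@JoaquínMLópezMuñoz I would think that, if you know you are not updating key fields, operating directly on the object would be faster? Also, is there nothing you would favor for the specific scenario the OP describes? I admit I had forgotten about it until I saw your name.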
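The difference the comments draw between replacing a record and modifying it in place can be illustrated with a small standard C++ sketch. This is not Boost.MultiIndex's `modify` and does not reproduce its rules for key collisions; `Person`, `PersonTable`, and this `modify` member are hypothetical stand-ins, with `id` as the only key field.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

struct Person {
    int id;            // key field
    std::string name;  // non-key field
    int age;           // non-key field
};

class PersonTable {
public:
    // Returns nullptr if the id is already taken.
    Person* insert(Person p) {
        if (by_id_.count(p.id)) return nullptr;
        store_.push_back(std::move(p));  // std::list keeps element addresses stable
        Person* ptr = &store_.back();
        by_id_[ptr->id] = ptr;
        return ptr;
    }

    Person* find(int id) {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : it->second;
    }

    // Applies `change` in place and touches the index only if the key changed.
    // Simplification: on a key collision only the key is restored; other
    // fields altered by `change` stay altered.
    template <class F>
    bool modify(Person* p, F change) {
        assert(p != nullptr);
        const int old_id = p->id;
        change(*p);
        if (p->id == old_id) return true;  // non-key update: no erase, no insert
        if (by_id_.count(p->id)) {         // new key collides with another record
            p->id = old_id;
            return false;
        }
        by_id_.erase(old_id);              // key changed: relink this one index entry
        by_id_[p->id] = p;
        return true;
    }

private:
    std::list<Person> store_;
    std::unordered_map<int, Person*> by_id_;
};
```

Updating `age` through `modify` leaves the index untouched, which is the cheap path a high update rate benefits from; only a change to `id` pays for an erase and an insert.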

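#include <boost/multi_index_container.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>
#include <iostream>
#include <string>

using namespace boost::multi_index;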

/* A file record maintains some info on name and size as well
 * as a pointer to the directory it belongs (null meaning the root
 * directory.)
 */

struct file_entry
{
  file_entry(
    std::string name_,unsigned size_,bool is_dir_,const file_entry* dir_):
    name(name_),size(size_),is_dir(is_dir_),dir(dir_)
  {}

  std::string       name;
  unsigned          size;
  bool              is_dir;
  const file_entry* dir;

  friend std::ostream& operator<<(std::ostream& os,const file_entry& f)
  {
      os << f.name << "\t" << f.size;
      if (f.is_dir)os << "\t <dir>";
      return os;
  }
};

/* A file system is just a multi_index_container of entries with indices on
 * file and size (per directory). 
 */
struct dir_and_name_key:composite_key<
  file_entry,
  BOOST_MULTI_INDEX_MEMBER(file_entry,const file_entry*,dir),
  BOOST_MULTI_INDEX_MEMBER(file_entry,std::string,name)
>{};

struct dir_and_size_key:composite_key<
  file_entry,
  BOOST_MULTI_INDEX_MEMBER(file_entry,const file_entry* const,dir),
  BOOST_MULTI_INDEX_MEMBER(file_entry,unsigned,size)
>{};

typedef multi_index_container<
  file_entry,
  indexed_by<
    hashed_unique<dir_and_name_key>,
    hashed_non_unique<dir_and_size_key>
  >
> file_system;

/* typedef's of the two indices of file_system */
typedef nth_index<file_system,0>::type file_system_by_name;
typedef nth_index<file_system,1>::type file_system_by_size;

/* We build a rudimentary file system simulation out of some global
 * info and a map of commands provided to the user.
 */

static file_system fs;                 /* the one and only file system */
static file_system_by_name& fs_by_name=fs;         /* name index to fs */
static file_system_by_size& fs_by_size=get<1>(fs); /* size index to fs */
static const file_entry* current_dir=0;            /* root directory   */