Why does an exact BatchInserterIndex query on multiple fields return all nodes?


java lucene neo4j


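I'm loading a bunch of data (14M nodes, 460M edges) into a neo4j database and am using the BatchInserter for performance. I load the data in two passes: first the nodes, then the edges, using a BatchInserterIndex to look up node IDs while adding edges.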

Each node has two properties:

name
type

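Names are not unique, but name+type is. This means I can't use get(String key, Object value) to query the index, so I'm using query(Object query), which is poorly documented. I cribbed from the Ruby docs, where the query object looks like it should be a Lucene query.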

However, when I query for name:"thename" type:"thetype", I get a list of all the nodes in the database.

If all else fails, I can add a third property, "nametype", just to get a unique ID for batch insertion, but I'd prefer not to if I don't have to. Any idea what's going on?

Fragments:

// the load-nodes phase:
BatchInserter inserter = BatchInserters.inserter(dbDir);
Map<String, Object> properties = new HashMap<String, Object>();
BatchInserterIndexProvider indexProvider = 
    new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex nodes = 
    indexProvider.nodeIndex( NODEINDEX, MapUtil.stringMap( "type", "exact" ) );

// for file in filelist
    // all nodes in a file have the same type
    properties.put( NODETYPE_KEY, types.get(file) );
    // for line in file:
        properties.put( NODENAME_KEY, line );
        long node = inserter.createNode( properties );
        nodes.add(node, properties);
    // \for
// \for

// ...

// the load-edges phase:
BatchInserter inserter = BatchInserters.inserter(dbDir);
BatchInserterIndexProvider indexProvider = 
    new LuceneBatchInserterIndexProvider( inserter );
BatchInserterIndex nodes = 
    indexProvider.nodeIndex( NODEINDEX, MapUtil.stringMap( "type", "exact" ) );
nodes.setCacheCapacity( NODENAME_KEY, cache );

// for line in file
    String fromType = fromTypes.get(file);
    String fromName = parseFromName(line);
    String query = String.format("%s:\"%s\" %s:\"%s\"",
        NODETYPE_KEY,fromType,NODENAME_KEY,fromName);
    IndexHits<Long> froms = nodes.query(query);
    // froms has #nodes results ?!
// \for


Aaaaaand the default conjunction in Lucene is "OR" :-/

I made the AND explicit in the query and it worked.
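For reference, a minimal sketch of what the fixed query string might look like. The class and method names (ExactQuery, quote, both) are made up for illustration, and the backslash escaping assumes the classic Lucene query parser syntax; the only change from the question's String.format call is the explicit AND between the two clauses.

```java
// Hypothetical helper, not part of Neo4j or Lucene: it only builds the query string.
final class ExactQuery {

    // Wrap a value in double quotes, escaping backslashes and quotes so the
    // value cannot terminate the phrase early (assumes classic Lucene syntax).
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Require BOTH fields to match. Without the explicit AND, the two clauses
    // are combined with the parser's default operator, which is OR.
    static String both(String typeKey, String type, String nameKey, String name) {
        return String.format("%s:%s AND %s:%s", typeKey, quote(type), nameKey, quote(name));
    }

    public static void main(String[] args) {
        // prints: type:"thetype" AND name:"thename"
        System.out.println(both("type", "thetype", "name", "thename"));
    }
}
```

In the edge-loading loop from the question, the resulting string would be passed to nodes.query(...) in place of the original one.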

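I also tried the alternative of a third key that concatenates type and name. In this case, index.get(key, val) is about twice as fast as index.query(lucene_expression), while constructing and storing the extra property slows node loading down by 50%. Since my dataset has 40x as many relationships as nodes, it actually makes sense to add the extra property to each node. YMMV.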
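A rough sketch of how such a concatenated key could be built. The class name, the method name and the "|" separator are assumptions for illustration (the separator is assumed never to occur in a type); the property name "nametype" is the one floated in the question.

```java
// Hypothetical helper: builds the value stored under the extra "nametype" property.
final class CompositeKey {

    // Assumption: this character never appears inside a type string. If it could,
    // two different (type, name) pairs might produce the same key.
    static final String SEPARATOR = "|";

    static String of(String type, String name) {
        return type + SEPARATOR + name;
    }

    public static void main(String[] args) {
        // prints: thetype|thename
        System.out.println(of("thetype", "thename"));
    }
}
```

During node loading the value would go into the properties map under "nametype" before createNode and nodes.add, and during edge loading the lookup would become an exact nodes.get("nametype", CompositeKey.of(fromType, fromName)) instead of a parsed Lucene query.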
