Join in Lucene

join lucene

Is there any way to implement a join in Lucene?

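Lucene does not support relationships between documents, but a join is nothing more than a specific combination of multiple AND clauses within parentheses. You do need to flatten the relationship first.
Example (SQL => Lucene):
SQL:
Lucene:
Make sure the document carries all the required fields and their respective values, for example Customer.Name => "Customer_Name" and Order.Nr => "Order_Nr".
The query would then be: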

( Customer_Name:"SomeName" AND Order_Nr:"400" )
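
To make the flattening step concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes customers are keyed by an integer id and each order is an {order number, customer id} pair; the class name, the sample data, and the flatten/search helpers are hypothetical and only imitate what an index and an AND query would do. The field names Customer_Name and Order_Nr are the ones used above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlattenSketch {
    // Builds one flat "document" per order and copies the customer's name onto it,
    // so the relationship no longer has to be resolved at query time.
    static List<Map<String, String>> flatten(Map<Integer, String> customerNamesById,
                                             List<int[]> orders) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int[] order : orders) { // order[0] = order number, order[1] = customer id
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("Order_Nr", String.valueOf(order[0]));
            doc.put("Customer_Name", customerNamesById.get(order[1]));
            docs.add(doc);
        }
        return docs;
    }

    // Stand-in for ( Customer_Name:"..." AND Order_Nr:"..." ) over the flat documents.
    static List<Map<String, String>> search(List<Map<String, String>> docs,
                                            String customerName, String orderNr) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (customerName.equals(doc.get("Customer_Name"))
                    && orderNr.equals(doc.get("Order_Nr"))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = Map.of(1, "SomeName", 2, "OtherName");
        List<int[]> orders =
                List.of(new int[] {400, 1}, new int[] {401, 1}, new int[] {400, 2});
        System.out.println(search(flatten(customers, orders), "SomeName", "400"));
    }
}
```

The cost of this design is duplication: the customer name is repeated on every order document, so renaming a customer means re-indexing all of that customer's orders.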

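You can perform a generic join by hand: run two searches, fetch all results (not just the top N), sort them on the join key, and intersect the two ordered lists. This will hit your heap hard, if the lists even fit in it.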
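
The sort-and-intersect step can be sketched in plain Java as follows. This is only an illustration under stated assumptions: each search result has already been reduced to a (join key, doc id) pair, keys are strings, and the Hit class and join method are hypothetical names, not Lucene API. Both full result lists are held in memory, which is exactly the heap cost mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortMergeJoinSketch {
    // A search hit reduced to what the join needs: the join-key value and the doc id.
    static final class Hit {
        final String key;
        final int docId;

        Hit(String key, int docId) {
            this.key = key;
            this.docId = docId;
        }
    }

    // Returns {leftDocId, rightDocId} pairs whose join keys are equal.
    static List<int[]> join(List<Hit> left, List<Hit> right) {
        List<Hit> l = new ArrayList<>(left);
        List<Hit> r = new ArrayList<>(right);
        Comparator<Hit> byKey = Comparator.comparing(h -> h.key);
        l.sort(byKey);
        r.sort(byKey);

        List<int[]> pairs = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < l.size() && j < r.size()) {
            int cmp = l.get(i).key.compareTo(r.get(j).key);
            if (cmp < 0) {
                i++; // left key is smaller: no partner on the right
            } else if (cmp > 0) {
                j++; // right key is smaller: no partner on the left
            } else {
                // Equal keys: pair every left hit in this group with every right hit.
                String key = l.get(i).key;
                int groupStart = j;
                for (; i < l.size() && l.get(i).key.equals(key); i++) {
                    for (j = groupStart; j < r.size() && r.get(j).key.equals(key); j++) {
                        pairs.add(new int[] {l.get(i).docId, r.get(j).docId});
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Hit> left = List.of(new Hit("c", 1), new Hit("a", 2), new Hit("b", 3));
        List<Hit> right = List.of(new Hit("a", 10), new Hit("d", 11), new Hit("b", 12));
        for (int[] pair : join(left, right)) {
            System.out.println(pair[0] + " joins " + pair[1]);
        }
    }
}
```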
Optimizations are possible, but only under very specific conditions. That is, you do a self-join and filter only with (random-access) Filters, not Queries. Then you can manually iterate over the terms of your two join fields (in parallel), intersect the docId lists for each term, and filter them. That is your join.

There is an approach for handling the popular use case of simple parent-child relationships with a relatively small number of children per document. Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases such as the following: you have a number of resumes, each with multiple work-experience children, and you try to find someone who worked at company NNN in year YYY. If the data is simply flattened, you get back resumes of people who worked for NNN in any year and worked somewhere in year YYY.

Another way to handle simple parent-child cases is to flatten your document after all, but make sure the values of different children are separated by a large position gap, and then use a span query to prevent your several subqueries from matching across children. There was a LinkedIn presentation about this a few years ago, but I could not find it.

You can also use the new BlockJoinQuery; I described it in a blog post.

Use JoinUtil. It allows query-time joins.

There are some implementations on top of Lucene that make this kind of join possible across several different indexes. Numere enables this and lets you obtain the results as an RDBMS-style result set.

    select a.type, sum(a.value) as "sales", b.category, count(distinct b.product_id) as "total"
    from a (index) inner join b (index) on (a.seq_id = b.seq_id)
    group by a.type, b.category
    order by a.type asc, b.category asc

    Join join = RequestFactory.newJoin();
    // inner join a.seq_id = b.seq_id
    join.on("seq_id", Type.INTEGER).equal("seq_id", Type.INTEGER);
    // left
    {
        Request left = join.left();
        left.repository(UtilTest.getPath("indexes/md/master"));
        left.addColumn("type").textType().asc();
        left.addMeasure("value").alias("sales").intType().sum();
    }
    // right
    {
        Request right = join.right();
        right.repository(UtilTest.getPath("indexes/md/detail"));
        right.addColumn("category").textType().asc();
        right.addMeasure("product_id").intType().alias("total").count_distinct();
    }
    Processor processor = ProcessorFactory.newProcessor();
    try {
        ResultPacket result = processor.execute(join);
        System.out.println(result);
    } finally {
        processor.close();
    }

Here is a sample result:

    <?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
    <DATAPACKET Version="2.0">
      <METADATA>
        <FIELDS>
          <FIELD attrname="type" fieldtype="string" WIDTH="20" />
          <FIELD attrname="category" fieldtype="string" WIDTH="20" />
          <FIELD attrname="sales" fieldtype="i8" />
          <FIELD attrname="total" fieldtype="i4" />
        </FIELDS>
        <PARAMS />
      </METADATA>
      <ROWDATA>
        <ROW type="Book" category="stand" sales="127003304" total="2" />
        <ROW type="Computer" category="eletronic" sales="44765715835" total="896" />
        <ROW type="Meat" category="food" sales="3193526428" total="110" />
        ...

A little late, but you could use the package org.apache.lucene.search.join. From its documentation: the index-time joining support joins while searching, where joined documents are indexed as a single document block using IndexWriter.addDocuments(). The same package also provides a query-time join through JoinUtil:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.join.JoinUtil;
    import org.apache.lucene.search.join.ScoreMode;

    // searchTerm, fromSearcher and toSearcher are assumed to be defined elsewhere.
    String fromField = "from"; // Name of the from field
    boolean multipleValuesPerDocument = false; // Set to true only when fromField has multiple values per document in your index
    String toField = "to"; // Name of the to field
    ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join
    Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values

    Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
    TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
    // Render topDocs...

Lucene is not a relational database. What do you mean?

@skaffman - I am using Lucene as a full-text search engine for an MS-SQL database. It occurred to me that if I simply stored all of my data in fields in Lucene, I could do away with the SQL DB entirely. To do that, however, I would need some way to join documents together.

Norris, you can't do that. You need to flatten the relationship, because Lucene does not handle relationships between documents.

As far as I know, one of the problems with BlockJoinQuery is that the whole block has to be re-indexed whenever one of the parents or children it contains changes. That is exactly what I was hoping to avoid with a "join concept". So BlockJoinQuery gives more query flexibility, but it does not provide a more atomic storage concept.
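
The resume case discussed in the answers above can be reproduced without Lucene. The following is a minimal sketch in plain Java: the Job class and both matching methods are hypothetical names, and the code only imitates the two matching semantics (naive flattening versus matching each child on its own); it does not use BlockJoinQuery or any other Lucene API.

```java
import java.util.List;

final class ParentChildMatchSketch {
    // One work-experience child of a resume: a company and a year.
    static final class Job {
        final String company;
        final int year;

        Job(String company, int year) {
            this.company = company;
            this.year = year;
        }
    }

    // Naive flattening: all children's values end up on one document, so the
    // company and the year may be satisfied by two different children.
    static boolean matchesFlattened(List<Job> resume, String company, int year) {
        boolean hasCompany = false;
        boolean hasYear = false;
        for (Job job : resume) {
            hasCompany |= job.company.equals(company);
            hasYear |= job.year == year;
        }
        return hasCompany && hasYear;
    }

    // Per-child matching: both conditions must hold on the same child.
    static boolean matchesPerChild(List<Job> resume, String company, int year) {
        for (Job job : resume) {
            if (job.company.equals(company) && job.year == year) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Worked at NNN in 2001 and somewhere else in 2005.
        List<Job> resume = List.of(new Job("NNN", 2001), new Job("Elsewhere", 2005));
        System.out.println("flattened: " + matchesFlattened(resume, "NNN", 2005));
        System.out.println("per child: " + matchesPerChild(resume, "NNN", 2005));
    }
}
```

For the query "worked at NNN in 2005", the flattened check accepts this resume even though no single job satisfies both conditions, while the per-child check rejects it. Keeping each child's values separate at match time is what both the block-based approach and the position-gap approach are meant to achieve.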