Lucene: reading the term vector of a specific document


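Is there a way to read a document's term vector together with the positions of each term?

At index time I enable positions, frequencies, and so on: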

        FieldType fieldType = new FieldType();
        fieldType.setStoreTermVectors(true);
        fieldType.setStoreTermVectorPositions(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        fieldType.setStored(true);

When reading the search index, I use:

        Terms termVector = indexReader.getTermVector(docId, "contents");
        TermsEnum termsEnum = termVector.iterator();

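termsEnum does not appear to be positioned, and I am not sure how to get the position values for each term in the document.

Any help with this would be appreciated.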

I think a little downcasting may solve your problem. My Lucene version is 3.6.2. The code below is written in Scala.

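Suppose the contents field of a document holds "we are family we not love" and a search has matched that document. We can then get the positions of each term: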

        val topDocs = iSearch.search("some query", 1).scoreDocs.toList
        topDocs.foreach { matched =>
          val termVectors = indexReader.getTermFreqVector(matched.doc, "contents")
          // The field is added in document with TermVector.WITH_POSITIONS_OFFSETS,
          // better write some try..catch to make this more robust
          val tpvector = termVectors.asInstanceOf[TermPositionVector]
          val termAndPosition = termVectors.getTerms.toList.map { term =>
            val indexOfTerm = termVectors.indexOf(term)
            // Returns an array of positions in which the term is found
            term -> tpvector.getTermPositions(indexOfTerm).toList
          }
          // Map(family -> List(2), love -> List(5), we -> List(0, 3))
          println(termAndPosition.toMap)
        }

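Basically, the word "are" is dropped during indexing because it is a stop word. The returned map makes sense: the term "we" appears at positions 0 and 3. If you want offsets, use the getOffsets method of TermPositionVector.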
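
The gaps in those position numbers can be reproduced without Lucene. The following is a minimal sketch in plain Java, assuming the analyzer keeps counting positions for the stop words it drops (which the output above suggests). The class name PositionGaps, the two-word stop list, and the sample sentence are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration; this is not Lucene code.
public final class PositionGaps {

    // Assumed stop list, just large enough for the sample sentence.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("are", "not"));

    // Maps each kept term to the token positions at which it occurs.
    public static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> result = new TreeMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!STOP_WORDS.contains(token)) {
                result.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            }
            position++; // a dropped stop word still consumes its position
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {family=[2], love=[5], we=[0, 3]}
        System.out.println(positions("we are family we not love"));
    }
}
```

Positions 1 and 4 never show up because the tokens that occupied them were filtered out before the term vector was written.
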
Anyway, hope this helps.
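
For the newer Terms / TermsEnum API used in the question, the same result can probably be reached along the following lines. This is a sketch rather than a tested answer: it assumes Lucene 5.x to 8.x, where IndexReader.getTermVector, the no-argument Terms.iterator(), and TermsEnum.postings are available. The class TermVectorPositions and the method positionsOf are hypothetical names, not part of Lucene. The point to note is that a TermsEnum starts out unpositioned, so next() has to be called before anything can be read from it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper, assuming Lucene 5.x to 8.x on the classpath.
public final class TermVectorPositions {

    // Returns term -> positions for one field of one document.
    public static Map<String, List<Integer>> positionsOf(
            IndexReader reader, int docId, String field) throws IOException {
        Map<String, List<Integer>> result = new LinkedHashMap<>();

        Terms termVector = reader.getTermVector(docId, field);
        if (termVector == null) {
            return result; // no term vector stored for this document and field
        }

        TermsEnum termsEnum = termVector.iterator();
        PostingsEnum postings = null;
        BytesRef term;
        // The enum starts unpositioned; next() moves it onto each term in turn.
        while ((term = termsEnum.next()) != null) {
            postings = termsEnum.postings(postings, PostingsEnum.POSITIONS);
            // A term vector describes a single document, so one nextDoc() is enough.
            if (postings.nextDoc() == DocIdSetIterator.NO_MORE_DOCS) {
                continue;
            }
            int freq = postings.freq();
            List<Integer> positions = new ArrayList<>(freq);
            for (int i = 0; i < freq; i++) {
                positions.add(postings.nextPosition());
            }
            result.put(term.utf8ToString(), positions);
        }
        return result;
    }
}
```

A call such as TermVectorPositions.positionsOf(indexReader, docId, "contents") would then return the positions per term. To read offsets as well, request PostingsEnum.OFFSETS and call startOffset() and endOffset() after each nextPosition(). That only yields real values if the field type also had setStoreTermVectorOffsets(true) at index time, which the FieldType in the question does not set.
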
