Indexing a txt file in Lucene (Java)

java api twitter indexing lucene


I want to build a small search engine for searching tweets. I have a txt file containing 20,000 tweets. The file format is as follows:

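TommyFrench1
851
85170333395811123
Moira, Lurgan, Armagh. Derry
This week we are doubly delighted with the first of four Champions League goalscorers in store. Champions League

ImAkay
175
85170341430037122
Paris
@ChampionsLeague @AS_Monaco @AS_Monaco_EN no, this is when City got knocked out of the Champions League.
etc.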

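The first line is the username, the second is the number of followers, the third is the id, the fourth is the location, and the last line is the text (the tweet).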

I consider each tweet a document, so I need 20,000 documents, each with 5 fields (username, followers, id, location, text).
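As a minimal sketch of that mapping, assuming every record is exactly five lines in the order described above, the hypothetical helper below (plain Java, no Lucene involved) turns one record into an ordered map from field name to value. Each map entry corresponds to one field that would later be added to a Lucene Document. TweetRecord, FIELD_NAMES and toFields are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Maps one 5-line tweet record onto named fields (hypothetical helper, not Lucene API). */
public class TweetRecord {

    // Field names in the order the lines appear in the file, as described in the question.
    static final List<String> FIELD_NAMES =
            Arrays.asList("username", "followers", "id", "location", "text");

    /** Returns an insertion-ordered map from field name to the matching line. */
    static Map<String, String> toFields(List<String> lines) {
        if (lines.size() != FIELD_NAMES.size()) {
            throw new IllegalArgumentException(
                    "expected " + FIELD_NAMES.size() + " lines, got " + lines.size());
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_NAMES.size(); i++) {
            fields.put(FIELD_NAMES.get(i), lines.get(i).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample record shaped like the second tweet in the question.
        List<String> record = Arrays.asList(
                "exampleUser", "175", "85170341430037122", "Paris",
                "no, this is when City got knocked out of the Champions League.");
        System.out.println(toFields(record));
    }
}
```

Rejecting records with the wrong number of lines early makes a misaligned file fail loudly instead of silently putting the id into the followers field.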

How do I build the index?

I have looked at some tutorials but didn't find anything similar.

EDIT: here is my code

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MyProgram {

    public static void main(String[] args) throws IOException, ParseException {
        FileReader fileReader = new FileReader(new File("myfile.txt"));
        BufferedReader br = new BufferedReader(fileReader);
        String line = null;

        String indexPath = "C:\\Desktop\\myfolder";
        Directory dir = FSDirectory.open(Paths.get(indexPath));

        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);

        IndexWriter writer = new IndexWriter(dir, iwc);


        while ((line = br.readLine()) != null) {
            // reading lines until the end of the file
            Document doc = new Document();
            String username = br.readLine();
            doc.add(new Field("username", username, Field.Store.YES, Field.Index.ANALYZED));  // adding title field
            String followers = br.readLine();
            doc.add(new Field("followers", followers, Field.Store.YES, Field.Index.ANALYZED));
            String id = br.readLine();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.ANALYZED));
            String location = br.readLine();
            doc.add(new Field("location", location, Field.Store.YES, Field.Index.ANALYZED));
            String text = br.readLine();
            doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);  // writing new document to the index


            br.readLine();
         }

    }
}

I get the following error:

Index cannot be resolved or is not a field

How do I fix this?
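Separately from the compile error, the loop in the code above has an off-by-one problem: the condition while ((line = br.readLine()) != null) already consumes the username line, and line is never used afterwards. As a result, username receives the followers line, followers receives the id, and so on, and the trailing br.readLine() skips the next record's username. The writer is also never closed, so the added documents may never be committed. A minimal sketch of a reading loop that keeps every line, assuming records are separated by blank lines as in the sample (TweetFileReader and readRecords are hypothetical names):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Groups the lines of the tweet file into records (hypothetical helper). */
public class TweetFileReader {

    /** Returns one list of lines per record; a blank line ends a record. */
    static List<List<String>> readRecords(Reader source) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // The line read by the loop condition is the one that gets stored,
            // so no field is silently dropped.
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    if (!current.isEmpty()) {
                        records.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    current.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The last record may not be followed by a blank line.
        if (!current.isEmpty()) {
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "user1\n851\n111\nDerry\nfirst tweet\n\n"
                + "user2\n175\n222\nParis\nsecond tweet\n";
        for (List<String> record : readRecords(new StringReader(sample))) {
            System.out.println(record);
        }
    }
}
```

For the real file, readRecords(new FileReader("myfile.txt")) would be called instead, and each returned record would be turned into one Document before writer.addDocument(doc); calling writer.close() (or using try-with-resources) at the end commits the index.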

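It's hard to tell from your question that you are actually facing a compile-time error, not a runtime error.

I had to copy your code to see that it is a compile-time error on the Field.Index.ANALYZED argument.

There is no longer such a constructor in 6.5.0 (the Field.Index enum no longer exists, which is exactly what "Index cannot be resolved" reports).

This is one of the reasons people use higher-level tools such as Solr: these changes keep happening in the low-level Lucene API.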

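Anyway, the Field documentation for that version also says:

"Expert: directly create a field for a document. Most users should use one of the sugar subclasses."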

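In your case, TextField and StringField are the relevant classes. There is a subtle difference between the two: a TextField value is tokenized by the analyzer, while a StringField value is indexed verbatim as a single token.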

So I would use constructors such as new StringField(fieldName, fieldValue, Store.YES) and so on, instead of calling the Field constructor directly.
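A minimal sketch of that advice for the five tweet fields, assuming the Lucene 6.x API (StringField, TextField and Field.Store as in 6.5). Which fields become StringField and which TextField is a design choice, not something the answer prescribes: here username, followers and id are treated as exact-match keys, and location and the tweet text as analyzed full text. TweetDocuments and toDocument are hypothetical names.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

/** Builds one Lucene Document per tweet (hypothetical helper, Lucene 6.x API assumed). */
public class TweetDocuments {

    static Document toDocument(String username, String followers, String id,
                               String location, String text) {
        Document doc = new Document();
        // StringField: indexed verbatim as a single token, suited to exact matches.
        doc.add(new StringField("username", username, Field.Store.YES));
        doc.add(new StringField("followers", followers, Field.Store.YES));
        doc.add(new StringField("id", id, Field.Store.YES));
        // TextField: passed through the analyzer, so individual words become searchable.
        doc.add(new TextField("location", location, Field.Store.YES));
        doc.add(new TextField("text", text, Field.Store.YES));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = toDocument("exampleUser", "175", "85170341430037122", "Paris",
                "no, this is when City got knocked out of the Champions League.");
        System.out.println(doc);
    }
}
```

Storing followers as a StringField only supports exact matches; if range queries on the follower count were needed, a numeric field such as IntPoint (available since Lucene 6.0) would be the usual alternative.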
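You can also still use Field directly, as in new Field(fieldName, fieldValue, fieldType), where fieldType is a FieldType.
You can initialize a FieldType like FieldType txtFieldType = new FieldType(TextField.TYPE_STORED) or FieldType strFieldType = new FieldType(StringField.TYPE_STORED), etc.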
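In short, the way you create fields in Lucene has changed in recent versions, so create your Field instances according to the documentation for the Lucene version you are using, e.g. doc.add(new Field("username", username, new FieldType(TextField.TYPE_STORED))) and so on.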
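The FieldType route is mainly useful when none of the predefined types fits. A minimal sketch, assuming the Lucene 6.x API: it copies StringField.TYPE_STORED and TextField.TYPE_STORED, additionally enables term vectors on the text type as an example customization (an assumption, not something the question needs), and freezes both types so shared instances cannot be modified later. TweetFieldTypes and its constants are hypothetical names.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

/** Custom field types built from Lucene's predefined ones (hypothetical example). */
public class TweetFieldTypes {

    // Stored and untokenized: same settings as StringField.TYPE_STORED.
    static final FieldType KEYWORD_STORED = new FieldType(StringField.TYPE_STORED);

    // Stored and tokenized like TextField.TYPE_STORED, plus term vectors (example tweak).
    static final FieldType TEXT_WITH_VECTORS = new FieldType(TextField.TYPE_STORED);

    static {
        TEXT_WITH_VECTORS.setStoreTermVectors(true);
        // freeze() makes any further setter call on these shared types fail.
        KEYWORD_STORED.freeze();
        TEXT_WITH_VECTORS.freeze();
    }

    static Document toDocument(String username, String text) {
        Document doc = new Document();
        doc.add(new Field("username", username, KEYWORD_STORED));
        doc.add(new Field("text", text, TEXT_WITH_VECTORS));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument("exampleUser", "no, this is when City got knocked out"));
    }
}
```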
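What do you mean by "indexing", and what are you trying to achieve with it?
I have a project to build a small search engine for 20,000 tweets. Indexing is one of the core features Lucene provides. I have to read the txt file, and each tweet must become a document. Each document must then have fields such as username, id, location, etc. I have an idea of how to do it, but I'm a beginner with Lucene and I couldn't find anything similar.
Have you looked at this question:
@Ivan Priorin Yes, I have seen that question, but it is for an old version of Lucene. The current version (Lucene 6.5.0) has many changes. For example, when I write the line IndexWriter writer = new IndexWriter(index, analyzer, true, new IndexWriter.MaxFieldLength(25000)) I get an error; in the old version this line was fine.
If you want a high-quality answer, be sure to state in your question that this is a compile-time error and on which line. From your question it isn't clear what kind of error you get with that static code. Once you do that, your question is valid.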
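For the IndexWriter line mentioned in that comment: IndexWriter.MaxFieldLength no longer exists, and in 6.x the writer is created from a Directory plus an IndexWriterConfig that carries the analyzer. A minimal end-to-end sketch, assuming Lucene 6.x with the lucene-core, lucene-analyzers-common and lucene-queryparser artifacts on the classpath. It indexes a few tweets into a temporary directory (which is not cleaned up), closes the writer so the documents are committed, then runs a query and counts the hits. IndexAndSearch and countMatches are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexAndSearch {

    /** Indexes {username, text} pairs and returns the number of hits (at most 10). */
    static int countMatches(String[][] tweets, String queryText) {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory dir = FSDirectory.open(Files.createTempDirectory("tweet-index"))) {
            // 6.x style: no MaxFieldLength, the analyzer goes into IndexWriterConfig.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String[] tweet : tweets) {
                    Document doc = new Document();
                    doc.add(new StringField("username", tweet[0], Field.Store.YES));
                    doc.add(new TextField("text", tweet[1], Field.Store.YES));
                    writer.addDocument(doc);
                }
            } // close() commits the added documents
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // The same analyzer is used for the query so terms are normalized alike.
                Query query = new QueryParser("text", analyzer).parse(queryText);
                ScoreDoc[] hits = searcher.search(query, 10).scoreDocs;
                for (ScoreDoc hit : hits) {
                    System.out.println(searcher.doc(hit.doc).get("username"));
                }
                return hits.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad query: " + queryText, e);
        }
    }

    public static void main(String[] args) {
        String[][] tweets = {
            {"user1", "This week we are doubly delighted"},
            {"user2", "no, this is when City got knocked out of the Champions League."}
        };
        System.out.println(countMatches(tweets, "champions"));
    }
}
```

Using the same StandardAnalyzer for indexing and for QueryParser matters: it lowercases terms on both sides, so a query for "champions" matches "Champions" in the tweet text.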