Java: OpenNLP sentence detection API for an entire text file
Here is the OpenNLP sentence detector API code for a single string:
package opennlp;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceDetector {
    public static void main(String[] args) throws IOException {
        // Load the sentence model; try-with-resources closes the stream,
        // and a failed load now stops the program instead of leaving model null.
        SentenceModel model;
        try (InputStream modelIn = new FileInputStream("en-sent.zip")) {
            model = new SentenceModel(modelIn);
        }

        SentenceDetectorME sentenceDetector = new SentenceDetectorME(model);
        String[] sentences = sentenceDetector.sentDetect(" First sentence. Second sentence.");
        // Print one detected sentence per line.
        for (String str : sentences) {
            System.out.println(str);
        }
    }
}
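Now my question is: how do I pass in an entire text file and run sentence detection on it, instead of a single string?

The simple way: read the whole file into a string and pass it in the usual way. The following method reads a file's contents into a string: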
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public String readFileToString(String pathToFile) throws IOException {
    StringBuilder strFile = new StringBuilder();
    // try-with-resources closes the reader even if a read fails
    try (BufferedReader reader = new BufferedReader(new FileReader(pathToFile))) {
        char[] buffer = new char[512];
        int num;
        while ((num = reader.read(buffer)) != -1) {
            // append only the characters actually read in this pass
            strFile.append(buffer, 0, num);
        }
    }
    return strFile.toString();
}
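Read the file into a string, then proceed as before.

@andreThompson see the edit on the last line. I hope that gives you your answer. :)

It may be late, but this might be useful to others: Apache Tika can be used to extract metadata and text from different file types, including plain text files as in your case.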
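To make "read the file, then proceed as before" concrete, here is a minimal sketch of the two steps joined together. The class name `FileSentenceDetection`, the method `detectSentences`, and the UTF-8 encoding are assumptions made for illustration, not part of the original answer. So that the sketch runs without the OpenNLP jar or a model file, the detector is passed in as a `Function<String, String[]>` and `main` uses a naive regex splitter as a stand-in. With OpenNLP on the classpath you would presumably pass `sentenceDetector::sentDetect` from the code in the question instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

public class FileSentenceDetection {

    // Hypothetical helper: reads the whole file as UTF-8 text (the encoding is
    // an assumption) and hands the text to the detector in a single call.
    public static String[] detectSentences(Path file, Function<String, String[]> detector)
            throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return detector.apply(text);
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the sketch is self-contained.
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, "First sentence. Second sentence.".getBytes(StandardCharsets.UTF_8));

        // Stand-in detector: splits on whitespace that follows '.', '!' or '?'.
        // This is NOT OpenNLP; swap in sentenceDetector::sentDetect for real use.
        Function<String, String[]> detector = text -> text.trim().split("(?<=[.!?])\\s+");

        for (String sentence : detectSentences(file, detector)) {
            System.out.println(sentence);
        }
        Files.delete(file);
    }
}
```

Reading the whole file at once keeps the code short but holds the full text in memory. For very large files, feeding the detector one paragraph at a time may be a better fit.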