Running RFTagger in Python

I want to use RFTagger() in my Python code. The only way I have gotten it to work is:

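from subprocess import check_output

# write the text to a file inside the RFTagger directory, then tag it
with open("RFTagger/temp.txt", "w", encoding="utf-8") as file:
    file.write(text)
test_tagged = check_output(["cmd/rftagger-german", "temp.txt"], cwd="RFTagger").decode("utf-8")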
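A minimal sketch, not taken from the thread: the same temp-file approach wrapped in a reusable function. It uses Python's tempfile module instead of a fixed RFTagger/temp.txt, so two calls never overwrite each other's input, and it deletes the file afterwards. The name run_on_temp_file is hypothetical, and the sketch assumes the RFTagger command accepts an absolute file path as its last argument.

```python
import os
import subprocess
import tempfile


def run_on_temp_file(text, command, cwd=None):
    """Write `text` to a temporary UTF-8 file, run `command` with that
    file's (absolute) path appended as the last argument, and return the
    decoded stdout. Hypothetical helper, not part of RFTagger."""
    # delete=False so the child process can reopen the file (needed on Windows)
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                     delete=False) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        out = subprocess.check_output([*command, path], cwd=cwd)
        return out.decode("utf-8")
    finally:
        os.remove(path)  # clean up even if the command fails


# Hypothetical call mirroring the question (paths assumed from the post):
# tagged = run_on_temp_file(text, ["cmd/rftagger-german"], cwd="RFTagger")
```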
Is there a simpler or faster way? Or is there a similar library that produces the same output? I specifically need German.
Thanks for your help :)

It is much faster if you compile RFTagger once and call rft-annotate directly on pre-tokenized text:

from subprocess import check_output, run
from nltk.tokenize import sent_tokenize, word_tokenize

# run this once: compile RFTagger (produces src/rft-annotate)
run(["make"], cwd="RFTagger/src", check=True)

# run this for every text (text is a string):
# one token per line, an empty line between sentences
tokenized = "\n\n".join(
    "\n".join(word_tokenize(sentence, language="german"))
    for sentence in sent_tokenize(text, language="german")
)
with open("RFTagger/temp.txt", "w", encoding="utf-8") as file:
    file.write(tokenized)
test_tagged = check_output(["src/rft-annotate", "lib/german.par", "temp.txt"], cwd="RFTagger").decode("utf-8").split("\n")
This reduced the runtime per text from about 40 seconds to 1.5 seconds for me.
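A small sketch, not from the original answer, of helpers around the rft-annotate call: one builds the input layout that the answer's nested join produces (one token per line, an empty line between sentences) from already tokenized sentences, the other groups the split output back into sentences. Both function names are hypothetical, and the parser assumes each non-empty output line has the form word<TAB>tag with empty lines at sentence boundaries; check this against your RFTagger version's actual output.

```python
def to_rftagger_input(sentences):
    """Join tokenized sentences (lists of strings) into one token per line,
    with an empty line between sentences, like the answer's one-liner."""
    return "\n\n".join("\n".join(tokens) for tokens in sentences)


def parse_rftagger_output(lines):
    """Group tagged lines into sentences of (word, tag) tuples.

    ASSUMPTION: each non-empty line is 'word<TAB>tag' and an empty line
    marks a sentence boundary, mirroring the input layout.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\r")
        if not line.strip():
            # sentence boundary (or trailing newline at the end of the output)
            if current:
                sentences.append(current)
                current = []
            continue
        word, _, tag = line.partition("\t")
        current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences
```

With the answer's variables, to_rftagger_input can take [word_tokenize(s, language="german") for s in sent_tokenize(text, language="german")], and parse_rftagger_output(test_tagged) turns the split output into lists of (word, tag) tuples.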