Running RFTagger in Python

python command-line nlp

I want to use RFTagger in my Python code. The only way I have gotten it to work is:

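from subprocess import check_output

# Write the raw text (text is a string) to a temporary file in the RFTagger directory
with open("RFTagger/temp.txt", "w", encoding="utf-8") as file:
    file.write(text)
# Run the rftagger-german wrapper script on that file
test_tagged = check_output(["cmd/rftagger-german", "temp.txt"], cwd="RFTagger").decode("utf-8")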

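Is there a simpler or faster way to do this? Or is there a similar library that produces the same output? I specifically need it for German.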

Thanks for your help :)

It is much faster if you run it like this:

from subprocess import check_output, run
from nltk.tokenize import sent_tokenize, word_tokenize

# Run this once to build the rft-annotate binary
run(["make"], cwd="RFTagger/src", check=True)

# Run this for every text (text is a string).
# Write one token per line, with a blank line between sentences.
sentences = sent_tokenize(text, language='german')
with open("RFTagger/temp.txt", "w", encoding="utf-8") as file:
    file.write("\n\n".join("\n".join(word_tokenize(sentence, language='german')) for sentence in sentences))

# Call the tagger binary directly with the German parameter file
test_tagged = check_output(["src/rft-annotate", "lib/german.par", "temp.txt"], cwd="RFTagger").decode("utf-8").split("\n")
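The `test_tagged` list above holds raw output lines. As a rough sketch, and assuming that each non-empty line has the form `token<TAB>tag` and that blank lines separate sentences (I have not verified this against every RFTagger version), the lines could be grouped like this. `parse_tagged_lines` is a hypothetical helper name, not part of RFTagger or NLTK.

```python
from typing import List, Tuple


def parse_tagged_lines(lines: List[str]) -> List[List[Tuple[str, str]]]:
    """Group "token<TAB>tag" lines into sentences.

    Assumption: blank lines mark sentence boundaries and the tag
    follows the first tab character on each line.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the first tab only; tag is "" if no tab is present
        token, _, tag = line.partition("\t")
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences
```

Calling `parse_tagged_lines(test_tagged)` would then give one list of `(token, tag)` pairs per sentence.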

With this I was able to reduce the runtime for each text from about 40 seconds to about 1.5 seconds.
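If the tagger is called for many texts, it may be convenient to wrap the steps in a function. The following is only a sketch under a few assumptions: `to_rft_input` and `tag_sentences` are hypothetical names, the input is already tokenized (for example with `sent_tokenize` and `word_tokenize`), and RFTagger has been built in a directory called `RFTagger` as in the snippet above. It uses a uniquely named temporary file instead of a fixed `temp.txt`, so parallel calls do not overwrite each other's input.

```python
import os
import tempfile
from subprocess import check_output
from typing import List


def to_rft_input(sentences: List[List[str]]) -> str:
    """Format tokens one per line, with a blank line between sentences."""
    return "\n\n".join("\n".join(tokens) for tokens in sentences)


def tag_sentences(sentences: List[List[str]], rftagger_dir: str = "RFTagger") -> List[str]:
    """Run rft-annotate on pre-tokenized sentences and return its output lines.

    Assumption: rftagger_dir contains src/rft-annotate and lib/german.par.
    """
    # delete=False so the file can be closed before the tagger reads it
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", encoding="utf-8", delete=False
    ) as tmp:
        tmp.write(to_rft_input(sentences))
        tmp_path = tmp.name
    try:
        output = check_output(
            ["src/rft-annotate", "lib/german.par", tmp_path],
            cwd=rftagger_dir,
        )
    finally:
        # Clean up the temporary file even if the tagger fails
        os.remove(tmp_path)
    return output.decode("utf-8").split("\n")
```

A call might look like `tag_sentences([["Das", "ist", "gut", "."]])`; the returned list has the same shape as `test_tagged` above.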