How do I fix this code and build my own POS tagger? (Python)

python nlp

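My program needs to read a file containing sentences and produce output like this:

Input: Ixé Maria. Output: Ixé\PRON Maria\N-PR

So far I have written the code below, but the outfile ends up as an empty text file. (Any suggestions would be appreciated.)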

infle = open('corpus_test.txt', 'r', encoding='utf-8').read()
outfile = open('tag_test.txt', 'w', encoding='utf-8')
diciario = {'mimbira': 'N',
            'anama itá': 'N-PL',
            'Maria': 'N-PR',
            'sumuara kunhã': 'N-FEM',
            'sumuara kunhã-itá': 'N-FEM-PL',
            'sapukaia Apigua': 'N-MASC',
            'sapukaia Apigua itá': 'N-MASC-PL',
            'nhaã': 'DEM',
            'nhaã-itá': 'DEM-PL',
            'ne': 'POS',
            'mukuĩ': 'NUM',
            'muíri': 'QUANT',
            'iepé': 'INDF',
            'pirasua': 'A1',
            'pusé': 'A2',
            'ixé': 'PRON1',
            'se': 'PRON2',
            '.;': 'PUNCT'
            }
np_words = diciario.keys()
np_tags = diciario.values()
for line in infle.splitlines():
    list_of_words = line.split()
    if np_words in list_of_words:
        tag_word = list_of_words.index(np_words) + 1
        word_taged = list_of_words.insert(tag_word, f'\{np_tags}')
        word_taged = " ".join(word_taged)
        print(word_taged, file=outfile)
outfile.close()

Starting simple with NLP makes it easier to understand and appreciate more advanced systems.
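It may help to see why the version in the question writes nothing. The sketch below uses a cut-down, hypothetical dictionary: `dict.keys()` is a view of all the keys, so asking whether that view is an element of the word list is always false, and the `print` inside the `if` never runs.

```python
diciario = {'Maria': 'N-PR', 'ixé': 'PRON1'}  # cut-down example dictionary
np_words = diciario.keys()
list_of_words = 'ixé Maria'.split()

# The whole keys view is compared with each word, so this is never true.
view_in_list = np_words in list_of_words
print(view_in_list)  # False

# Checking word by word is what was intended.
known = [word for word in list_of_words if word in diciario]
print(known)  # ['ixé', 'Maria']

# list.insert modifies the list in place and returns None,
# so assigning its result and then joining it would fail as well.
result = list_of_words.insert(1, '\\PRON1')
print(result)  # None
```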

This should give you what you need:

# Use 'with' so that the file is automatically closed when the 'with' ends.
with open('corpus_test.txt', 'r', encoding='utf-8') as f:
    # readlines() returns a list in which each item is one line of the file
    # (e.g. infile[0] is line 1). A file object has no splitlines() method;
    # that one belongs to str.
    infile = f.readlines()

dicionario = {
    'Maria': 'N-PR',
    'ixé': 'PRON1',
}

# Make a list to hold the new lines
outlines = []

for line in infile:
    list_of_words = line.split()
    
    new_line = ''
    # 'if np_words in list_of_words' asks whether the whole set of keys is
    # one item of the list, which is never true, so check one word at a time.
    for word in list_of_words:
        # todo: Dictionaries are case-sensitive, so ixé is different to Ixé.
        if word in dicionario:
            new_line += word + '\\' + dicionario[word] + ' '
        else:
            new_line += word + ' '

    # Append the completed new line to the list and add a newline character.
    outlines.append(new_line.strip() + '\n')

with open('tag_test.txt', 'w', encoding='utf-8') as f:
    f.writelines(outlines)
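The todo in the code above can be handled by normalising each token before the lookup. This is only a minimal sketch, assuming that tags should be looked up case-insensitively and that trailing punctuation is split off and dropped from tagged words. The name `tag_line`, the lower-cased keys and the `sep` parameter are illustrative choices, not part of the code above.

```python
import string

# Hypothetical tag dictionary; keys are stored in lower case.
dicionario = {
    'maria': 'N-PR',
    'ixé': 'PRON1',
}

def tag_line(line, tags, sep='\\'):
    """Return the line with known words written as word<sep>TAG."""
    out = []
    for token in line.split():
        # Separate trailing punctuation such as '.' or ',' from the word.
        word = token.rstrip(string.punctuation)
        tag = tags.get(word.lower())
        # Tagged words lose their punctuation; unknown tokens are kept as-is.
        out.append(f'{word}{sep}{tag}' if tag else token)
    return ' '.join(out)

print(tag_line('Ixé Maria.', dicionario))
```

Multi-word entries such as 'anama itá' would still not match with this approach, because each line is split on whitespace before the lookup.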

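Welcome to Stack Overflow. Please explain what the problem with the code is; "it doesn't work" doesn't help us.

Also try printing it to the screen... It would be odd if it printed to the screen but not to the file.

In your own words, what is if np_words in list_of_words: supposed to do?

@DominickMaia By the way, what language is this?

A backslash as a POS separator is unusual. Using the same output format as other POS taggers might be a good idea, so that you don't have to develop a separate parser for your own special output format.

Awesome!!! Thank you very much, sir! Thanks for the comments and explanations. This is exactly what I needed. God bless you.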
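On the separator point raised in the comments: tagged corpora such as the Brown corpus write tokens as word/TAG, so it may be worth keeping the separator configurable. A small sketch follows; the function names `format_tagged` and `parse_tagged` are hypothetical, and the slash default is an assumption about which convention you want to follow.

```python
def format_tagged(pairs, sep='/'):
    """Join (word, tag) pairs into a single 'word<sep>TAG ...' string."""
    return ' '.join(f'{word}{sep}{tag}' for word, tag in pairs)

def parse_tagged(text, sep='/'):
    """Split a 'word<sep>TAG ...' string back into (word, tag) pairs."""
    # rsplit with maxsplit=1 so a separator inside the word is preserved.
    return [tuple(token.rsplit(sep, 1)) for token in text.split()]

pairs = [('Ixé', 'PRON1'), ('Maria', 'N-PR')]
line = format_tagged(pairs)
print(line)
print(parse_tagged(line) == pairs)
```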