IndexError in Python while splitting an input file by a pattern


python python-3.x


The code tries to split text data on a separator, but I keep getting this error:

Traceback (most recent call last):
  File "split.py", line 7, in <module>
    en_text = split_text[1].lstrip()
IndexError: list index out of range

The input file for this code can be found here.

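The IndexError occurs because when a line contains no separator, split_text has only one element, so split_text[1] does not exist. You have to handle this case: either drop the line or deal with it in some other way.

A line with more than one separator is another case to consider. Marat has a nice solution for that (see the edit).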
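The three cases described above (no separator, exactly one, more than one) can be reproduced in a few lines. This is only a sketch: the helper name split_pair and the sample strings are made up for illustration and are not taken from the question's input file.

```python
SEP = "+++++SEP+++++"

def split_pair(line, sep=SEP):
    """Hypothetical helper: return (left, right), or None if sep is absent."""
    # maxsplit=1 means any further separators stay inside the second part
    parts = line.rstrip("\n").split(sep, 1)
    if len(parts) < 2:
        return None  # no separator on this line
    return parts[0].rstrip(), parts[1].lstrip()

# Without a separator, split() returns a one-element list,
# so indexing [1] on it is what raises the IndexError.
no_sep_parts = "no separator here\n".split(SEP)
print(len(no_sep_parts))

print(split_pair("no separator here\n"))
print(split_pair("left text +++++SEP+++++ right text\n"))
print(split_pair("a +++++SEP+++++ b +++++SEP+++++ c\n"))
```

Returning None for a line without a separator leaves the decision (skip, log, or abort) to the caller.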

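Some other refactoring tips:

You don't need to read the whole file before processing it.

For faster processing, don't open and close the files multiple times.

Use a debugger to check whether the split result contains newline characters.

If you don't want any whitespace at the ends of the string, you can use strip().

strip() removes all leading and trailing whitespace, including newline characters. To remove only the newline characters, use strip('\n').

Then add a newline to both written lines so that they stay consistent.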

with open('mn_en_sentences_split.txtaa') as inputFile:
    with open("mn_out.txt", "w") as mn_out:
        with open("en_out.txt", "w") as en_out:
            for i in inputFile:
                # Build a real list: in Python 3, map() returns an iterator,
                # which supports neither len() nor indexing.
                split_text = [x.strip('\n') for x in i.split("+++++SEP+++++")]
                if len(split_text) < 2: continue  # drop line if no separator
                mn_out.write(split_text[0].rstrip() + "\n")
                en_out.write(split_text[1].lstrip() + "\n")
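To make the difference between strip() and strip('\n') from the tips above concrete, here is a small sketch. The sample string is invented for illustration.

```python
line = "  some text \t\n"

# strip() with no argument removes all leading and trailing whitespace
stripped_all = line.strip()

# strip('\n') removes only newline characters at either end;
# the spaces and the tab are left in place
stripped_newline = line.strip("\n")

print(repr(stripped_all))      # 'some text'
print(repr(stripped_newline))  # '  some text \t'
```

Using repr() when printing makes otherwise invisible whitespace and newline characters visible.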

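I'm not going to download the file to read it (stranger danger!), but to debug this further I'd suggest adding some logging or print statements so that you know exactly what the list looks like. Personally, I'd start with a quick print(i) and print(split_text) as you iterate over infile. Since that is the crux of the problem, you'll want to confirm that your split is doing what you think it is. If split_text[1] is out of range, it means split(separator) returned a list with only one element, which in turn means the separator text was not found on that line.

Put 10 lines of the file into the question.

Can you explain what the problem was and how this code solves it?

@JohnGordon I didn't read the question carefully (it was late at night). I completely missed the IndexError.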
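Along the lines of the print-debugging suggestion above, one way to find the offending lines is to report every line that lacks the separator. This is a minimal sketch: the function name lines_missing_separator is hypothetical, and the in-memory sample stands in for the real input file, whose contents are not shown here.

```python
import io

SEP = "+++++SEP+++++"

def lines_missing_separator(lines, sep=SEP):
    """Yield (line_number, line) for each line that does not contain sep."""
    for number, line in enumerate(lines, start=1):
        if sep not in line:
            yield number, line

# Made-up sample data; in practice, pass the opened input file instead.
sample = io.StringIO(
    "mn one +++++SEP+++++ en one\n"
    "\n"
    "a line with no separator\n"
    "mn two +++++SEP+++++ en two\n"
)

for number, line in lines_missing_separator(sample):
    # repr() shows blank lines and trailing newlines explicitly
    print(number, repr(line))
```

Blank lines and a trailing empty line at the end of a file are common causes of a line without a separator, and this makes them easy to spot.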


with open('mn_en_sentences_split.txtaa') as inputFile, \
     open("mn_out.txt", "w") as mn_out, \
     open("en_out.txt", "w") as en_out:
    for line in inputFile:
        try:
            # maxsplit=1: any further separators stay in the second part
            mn, en = line.strip('\n').split("+++++SEP+++++", 1)
        except ValueError:
            # no separator on this line, so unpacking into two names fails
            continue
        mn_out.write(mn.rstrip() + "\n")
        en_out.write(en.lstrip() + "\n")