Splitting data with a regular expression in Python

python regex

I have a file with many lines. The format is as follows:

//many lines of normal text

      00.0000125  1319280   9.2  The Shawshank Redemption (1994)
//lines of text
      0000011111      59   6.8  "$#*! My Dad Says" (2010) {You Can't Handle the Truce (#1.10)}
      1...101002      17   6.6  "$1,000,000 Chance of a Lifetime" (1986)

I want to split the data into columns:

1...101002, 17, 6.6, "$1,000,000 Chance of a Lifetime" (1986)

The program I tried is:

import re
f = open("E:/file.list")
reg = re.compile('[+ ].{10,}[+ ][+0-9].{3,}[+ ]')
for each in f:
    if reg.match(each):
        print(each)
        print(reg.split(each))

It does not give the correct result. Which regular expression should I use?

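In this case, matching is easier than splitting:

^\s*(\S+)\s+(\S+)\s+(\S+)\s+(.*)$

Try this. The first three groups capture the whitespace-separated columns, and the last group captures the rest of the line.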
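
A minimal sketch of how this pattern might be applied with `re.match`, using two sample lines from the question. The names `ROW`, `split_row`, and `sample` are illustrative and do not come from the answer.

```python
import re

# Pattern from the answer: three whitespace-delimited columns, then the rest.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")

def split_row(line):
    """Return the four columns as a tuple, or None if the line does not match."""
    m = ROW.match(line)
    return m.groups() if m else None

# Two data lines copied from the question (hypothetical in-memory input).
sample = [
    '      00.0000125  1319280   9.2  The Shawshank Redemption (1994)',
    '      1...101002      17   6.6  "$1,000,000 Chance of a Lifetime" (1986)',
]
rows = [split_row(line) for line in sample]
for row in rows:
    print(row)
```

Note that any line with four or more whitespace-separated tokens also matches this pattern, so ordinary prose lines in the file may need extra filtering.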

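I changed the regex pattern:

import re

# Six leading spaces, a 10-character first column, two numeric columns, then the title.
reg = re.compile(r"      (.{10}) *(\d*) *(\d*\.\d*) +(.*)")
with open("file.txt") as f:
    for each in f:
        m = reg.match(each)
        if m:
            print(each)
            # groups() returns the four captured columns as a tuple
            print(m.groups())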

How about something like this:

>>> import re
>>> line = '1...101002      17   6.6  "$1,000,000 Chance of a Lifetime" (1986)'
>>> re.findall(r'^([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+(.*)', line)
[('1...101002', '17', '6.6', '"$1,000,000 Chance of a Lifetime" (1986)')]
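
For comparison, if the first three columns never contain spaces, the same four fields can probably be obtained without a regular expression by limiting the number of splits. This is an alternative sketch rather than something proposed in the answers; `split_columns` is a hypothetical helper name.

```python
def split_columns(line):
    """Split a data line into at most four fields.

    strip() removes the leading indentation and trailing newline;
    split(None, 3) splits on runs of whitespace at most three times,
    so the title keeps its internal spaces.
    """
    return line.strip().split(None, 3)

line = '1...101002      17   6.6  "$1,000,000 Chance of a Lifetime" (1986)'
fields = split_columns(line)
print(fields)
```

Unlike the regex approaches, this does nothing to reject lines of ordinary text, so it only fits input that has already been filtered down to data rows.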

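First, split each line with the split() function, then use islice to slice the resulting list from its start up to the element that is a number in parentheses (the one for which re.match(r'\(\d+\)', j) succeeds).

If the lines are already in a list (for example, after reading the file with readlines()), a single list comprehension over lines can do this for every line.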

>>> import re
>>> from itertools import islice
>>> lines = ["""00.0000125  1319280   9.2  The Shawshank Redemption (1994)""","""0000011111      59   6.8  "$#*! My Dad Says" (2010) {You Can't Handle the Truce (#1.10)}""", """1...101002      17   6.6  "$1,000,000 Chance of a Lifetime" (1986)"""]

>>> [list(islice(line.split(),0,i+1)) for line in lines for i,j in enumerate(line.split()) if re.match(r'\(\d+\)',j)]
[['00.0000125', '1319280', '9.2', 'The', 'Shawshank', 'Redemption', '(1994)'], ['0000011111', '59', '6.8', '"$#*!', 'My', 'Dad', 'Says"', '(2010)'], ['1...101002', '17', '6.6', '"$1,000,000', 'Chance', 'of', 'a', 'Lifetime"', '(1986)']]

Can you show your expected output? I have commented on the answer.

['00.0000125', '1319280', '9.2', 'The', 'Shawshank', 'Redemption', '(1994)'] is the output for my first line, but I want ['00.0000125', '1319280', '9.2', 'everything else'].

Thanks. Lines such as "Details for getting this post via FTP are also given below" are also being split. Is there a way to avoid that?

@user168983 Try ^\s*(?![a-zA-Z])(\S+)\s+(\S+)\s+(\S+)\s+(.*)$
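
The negative-lookahead pattern suggested in the comments can be sketched as follows. It is meant to skip lines whose first non-space character is an ASCII letter while still returning the first three columns plus everything else. The names `ROW`, `parse_lines`, and `sample` are illustrative, and the prose line is only an example of ordinary text.

```python
import re

# Variant from the comments: the negative lookahead rejects lines whose
# first non-space character is an ASCII letter (ordinary prose).
ROW = re.compile(r"^\s*(?![a-zA-Z])(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")

def parse_lines(lines):
    """Yield a 4-tuple of columns for each line that looks like a data row."""
    for line in lines:
        m = ROW.match(line)
        if m:
            yield m.groups()

sample = [
    "Details for getting this post via FTP are also given below",  # example prose line
    "      00.0000125  1319280   9.2  The Shawshank Redemption (1994)",
]
parsed = list(parse_lines(sample))
print(parsed)
```

A text line that starts with a digit or punctuation and has four or more tokens would still match, so a stricter pattern for the first columns may be needed for real files.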