How to read numbers from a mixed txt file in Python

python list text types


I have a txt file that contains a mix of text and numbers. It looks like this:

this is a paragraph which is introductory which lasts
some more lines

text text text

567 45 32 468
974 35 3578 4467
325 765 355 5466

text text text
1 3 6
text text

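What I need is to store the lines that contain 4 numeric elements.

When I use the read command, all elements are read and stored as strings. I am not sure whether I can convert the numbers to numeric types without filtering first.

I would appreciate any help.

Thanks.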

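Read the file line by line and parse each line. Skip lines that do not have exactly 4 elements and lines that do not consist of 4 space-separated integers: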

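results = []
filename = 'yourfile.txt'  # placeholder: path to your text file
with open(filename) as f:
    for line in f:
        line = line.strip().split()
        if len(line) != 4:
            continue  # line has != 4 elements

        try:
            # list() forces the conversion right away; a bare map() is lazy
            # in Python 3 and would never raise ValueError inside this try
            numbers = list(map(int, line))
        except ValueError:
            continue  # line is not all numbers

        # do something with line
        results.append(line)  # or append(numbers) to add the integers

print(*results, sep="\n")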

Prints:

['567', '45', '32', '468']
['974', '35', '3578', '4467']
['325', '765', '355', '5466']

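Use the splitlines() function:

A = open('yourfile.txt', 'r').read().splitlines()  # 'yourfile.txt' is a placeholder path

This gives you a list of lines, from which you can extract whatever you need. For example:

Req = []
for i in A:
    elem = [s.isnumeric() for s in i.split(' ')]
    if len(elem) == 4 and all(elem):
        Req.append(i)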

If you can assume that the lines you need contain only 4 numbers, this solution should work:


nums = []
with open('filename.txt') as f:
    for line in f:
        line = line.split()
        if len(line) == 4 and all([c.isdigit() for c in line]):
            # use [float(c) for c in line] if needed
            nums.append([int(c) for c in line])

print(nums)
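
The isdigit() check above rejects negative numbers and decimals such as -3 or 4.5, so the float variant mentioned in the comment would never see them. A minimal sketch of an alternative, assuming such values can occur in the file, is to attempt the conversion and skip the line when it fails. The function name parse_four_numbers and the sample lines are made up for illustration:

```python
def parse_four_numbers(lines):
    """Return a list of 4-float lists for lines holding exactly four numbers."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # wrong number of fields
        try:
            rows.append([float(p) for p in parts])
        except ValueError:
            continue  # at least one field could not be parsed by float()
    return rows


# Hypothetical sample lines; in practice pass an open file object instead.
sample = ["text text text", "567 45 32 468", "-1 2.5 3 4", "a b c d", "1 3 6"]
print(parse_four_numbers(sample))
```

Because the function accepts any iterable of strings, it can be called as parse_four_numbers(f) inside a with open(...) as f block.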

To me, this sounds like a task for the re module. I would do:

import re
with open('yourfile.txt', 'r') as f:
    txt = f.read()
lines_w_4_numbers = re.findall(r'^\d+\s\d+\s\d+\s\d+$', txt, re.M)
print(lines_w_4_numbers)

Output:

['567 45 32 468', '974 35 3578 4467', '325 765 355 5466']

Explanation: the re.M flag means that ^ and $ will match at the start/end of each line, \s denotes whitespace, and \d+ denotes one or more digits.
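
One caveat with the pattern above: \s also matches newline characters, so a match can in principle span more than one line (for example a line with two numbers followed by another line with two numbers). A hedged variant, assuming the fields are separated only by spaces or tabs, uses [ \t]+ instead. The sample text below is invented for illustration:

```python
import re

# Invented sample: two short numeric lines followed by one valid line.
text = "intro\n1 2\n3 4\n567 45 32 468\nend"

# \s can match the newline, so "1 2\n3 4" is reported as a single match
loose = re.findall(r'^\d+\s\d+\s\d+\s\d+$', text, re.M)

# [ \t]+ only matches spaces and tabs, so every match stays on one line
strict = re.findall(r'^\d+[ \t]+\d+[ \t]+\d+[ \t]+\d+$', text, re.M)

print(loose)
print(strict)
```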

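So the substrings you are looking for contain exactly four integers, each followed by whitespace, with the last one followed by a newline. You can use a regular expression to locate substrings that follow this pattern. Assuming you have the string stored in a variable s:

import re
matches = [m[0] for m in re.findall(r"((\d+\s){4})", s)]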

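The matches variable now contains the strings that hold exactly four integers. After that, if needed, you can split each string and convert the pieces to integers:

matches = [[int(i) for i in s.split(" ")] for s in matches]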

Result:

[[567, 45, 32, 468], [974, 35, 3578, 4467], [325, 765, 355, 5466]]

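If you know how to use the Python regular expression module, you can do this:

import re

TEST_FILE = 'yourfile.txt'  # placeholder: path to your text file

if __name__ == '__main__':
    with open(TEST_FILE, 'r') as file_1:
        for line in file_1.readlines():
            if re.match(r'(\d+\s){4}', line):
                line = line.strip()  # remove the \n character
                print(line)  # print only the lines with four numbers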

The result for the example file is:

567 45 32 468
974 35 3578 4467
325 765 355 5466

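Regular expressions are the most powerful option here. We create a pattern with re.compile and then use the search or match method to look for that pattern in a string.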

import re

p = re.compile(r'[\d]{4}') # \d matches for single digit and {4} will look for 4 continuous occurrences.
file = open('data.txt', 'r') # Opening the file
line_with_digits = [] 
for line in file:  # reading file line by line
    if p.search(line): # searching for pattern in line
        line_with_digits.append(line.strip())  # if pattern matches adding to list

print(line_with_digits)

The input file for the above program is:

text text text

567 45 32 468
974 35 3578 4467
325 765 355 5466

text text text
1 3 6
text text

text  5566 text 45 text
text text 564 text 458 25 text

The output is:

['974 35 3578 4467', '325 765 355 5466', 'text  5566 text 45 text']

Hope this helps.

You can use a regular expression:

import re

result = []
with open('file_name.txt') as fp:
    for line in fp.readlines():
        if re.search(r'\d{4}', line):
            result.append(line.strip())

print(result)

Output:

['974 35 3578 4467', '325 765 355 5466']
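
Note that \d{4} searches for a run of four consecutive digits, which is why '567 45 32 468' is missing from the output above. If the goal is the one stated in the question (lines made of exactly four integers), one possible sketch, not taken from any of the answers, is to anchor a pattern over the whole line with re.fullmatch. The names FOUR_INTS and four_int_lines are hypothetical, and the sketch assumes the fields are separated by spaces or tabs:

```python
import re

# Exactly four runs of digits separated by spaces or tabs, nothing else.
FOUR_INTS = re.compile(r'\d+(?:[ \t]+\d+){3}')


def four_int_lines(lines):
    """Return each qualifying line as a list of four ints."""
    rows = []
    for line in lines:
        stripped = line.strip()  # drop the trailing newline and outer spaces
        if FOUR_INTS.fullmatch(stripped):
            rows.append([int(tok) for tok in stripped.split()])
    return rows


# Lines taken from the example input used in this thread.
sample = [
    "567 45 32 468\n",
    "974 35 3578 4467\n",
    "text  5566 text 45 text\n",
    "1 3 6\n",
]
print(four_int_lines(sample))
```

Since fullmatch must consume the entire stripped line, lines with extra text or with three or five numbers are skipped.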

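"123 321".isnumeric() does not evaluate to True, so this is not a valid way to find lines with numbers. You have to split each line and check whether each element is numeric. Did my answer help you? If so, don't forget to mark it as helpful.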
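
The point of this comment can be checked directly. A small sketch (the variable names are made up for illustration):

```python
line = "123 321"

# The space is not a numeric character, so the whole-line check fails.
whole_line_ok = line.isnumeric()

# Checking each whitespace-separated token on its own succeeds.
per_token_ok = all(tok.isnumeric() for tok in line.split())

print(whole_line_ok, per_token_ok)
```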