Python: count the number of numeric lines that follow each string in a text file


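I want to count the number of lines containing numbers that follow each string line. If there are consecutive string lines, the first one gets a count of 0, and so on.

For example, suppose I have this text file: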

some text
120 130
1847 1853
other text
207 220
text
306 350
text with no numbers after
some other text
400 435
900 121
125 369

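My expected output is:

2 1 1 0 3

I have a directory of files, and I want to store the result for each file in a list, so that I end up with a list of lists.

Here is what I have tried: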

nb=[]  
c = 0
for filename in sorted(os.listdir("Path_to_txt_files")):
        with open(filename ,'r') as f:
            for line in f:
                if line.strip().replace(" ", "").isdigit(): 
                    c+=1                        
                    nb.append(c)
                else:
                    c=0
                    nb.append(c)

But this gives me a wrong result. How should I write the code?

You can do it like this:

# sample file
f = '''some text
120 130
1847 1853
other text
207 220
text
306 350
text with no numbers after
some other text
400 435
900 121
125 369'''

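lines = f.split('\n')

line2write = []  # one counter per text line

for line in lines:
    line = line.strip()
    if not line:
        continue  # skip blank lines so line[0] cannot raise IndexError
    if not line[0].isdigit():
        line2write.append(0)  # a text line starts a new counter at zero
    else:
        line2write[-1] += 1  # a numeric line increments the latest counter
print(line2write)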

Output:

[2, 1, 1, 0, 3]

Now you can adapt it however you like.
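To apply the same idea to every file in a directory, the loop above can be wrapped in a function. This is only a minimal sketch: the names count_numeric_runs and count_in_directory are hypothetical, and it assumes a "numeric line" is one that contains only digits and spaces, as in the question's own isdigit test.

```python
import os


def count_numeric_runs(lines):
    """Return, for each text line, how many numeric lines follow it."""
    counts = []
    for line in lines:
        compact = line.strip().replace(" ", "")
        if not compact:
            continue  # ignore blank lines
        if compact.isdigit():
            if counts:  # numeric lines before any text line are ignored
                counts[-1] += 1
        else:
            counts.append(0)  # each text line opens a new counter
    return counts


def count_in_directory(folder):
    """Build a list of lists, one inner list per regular file in `folder`."""
    results = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)  # listdir yields bare file names
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            results.append(count_numeric_runs(handle))
    return results
```

Unlike the attempt in the question, the counter is appended once per text line rather than once per line, and each file name is joined with its directory before being opened.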

Use itertools.groupby.

Example:

from itertools import groupby


result = []
with open(filename) as infile:
    for k, v in groupby(infile.readlines(), lambda x: x[0].isalpha()):
        value = list(v)
        if k and len(value) > 1:
            result.append(0)
        if not k:
            result.append(len(value))

Output:

[2, 1, 1, 0, 3]

Edit, per the comments:

result = []
for filename in list_files:
    temp = []
    with open(filename) as infile:
        for k, v in groupby(infile.readlines(), lambda x: x[0].isalpha()):
            value = list(v)
            if k and len(value) > 1:
                temp.append(0)
            if not k:
                temp.append(len(value))
    result.append(temp)
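The comments on this answer point out that testing only the first character with isalpha misclassifies blank lines and symbols. A possible variant, sketched under the assumption that a numeric line contains only digits and spaces, classifies the whole line instead. The helper names is_numeric_line and count_with_groupby are hypothetical and not part of the original answer.

```python
from itertools import groupby


def is_numeric_line(line):
    """True when the line holds only digits and spaces, e.g. '120 130'."""
    compact = line.replace(" ", "")
    return compact.isdigit()


def count_with_groupby(lines):
    """Return one count per text line using itertools.groupby."""
    # Drop blank lines first so they are not mistaken for either kind of line.
    cleaned = [line.strip() for line in lines if line.strip()]
    counts = []
    for numeric, group in groupby(cleaned, key=is_numeric_line):
        size = len(list(group))
        if numeric:
            if counts:
                counts[-1] = size  # the run belongs to the last text line
        else:
            counts.extend([0] * size)  # every text line starts at zero
    return counts
```

With this grouping, a run of several text lines contributes one zero per line, and a text line at the end of the file still receives its own zero.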

Efficiently, with a generator function:

def count_digit_lines(filename):
    with open(filename) as f:
        str_cnt = num_cnt = 0

        for line in f:
            line = line.strip().replace(" ", "")
            if not line:    # skip empty lines
                continue
            if not line.isdigit():   # catch non-digit line
                if num_cnt >= 1:
                    yield num_cnt
                    str_cnt = num_cnt = 0
                str_cnt += 1
            else:
                if str_cnt > 1:
                    yield 0
                    str_cnt = 0
                num_cnt += 1
        if num_cnt:    # check trailing digit lines
            yield num_cnt
        elif str_cnt:
           yield 0


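import os

res = []
folder = "Path_to_txt_files"
for fname in sorted(os.listdir(folder)):
    # os.listdir returns bare file names, so join them with the directory
    gen = count_digit_lines(os.path.join(folder, fname))  # generator
    res.append(list(gen))

print(res)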

Sample output for a single file:

[[2, 1, 1, 0, 3]]

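Comments:

One option is to use a regular expression to search for a digit in each line; see the regular expression documentation.

In my opinion this solution is wrong. isalpha does not distinguish "120" from "ABC;" because it treats digits and special symbols the same.

@Martin. I am actually only testing the first character, x[0].isalpha(), not the full string anyway.

Why not isdigit?

@Rakesh I have a bunch of files to process. How can I store each result in a list so that I end up with a list of lists? For example: [[2,1,1,0,3], [0,1,1,2,1], [3,1,1,1,2,1]]

This solution even recognizes a "number" when it encounters an empty line, because "\n" with isalpha returns False.

If the file ends with a string line that has no numbers after it, I get a false result.

@MejdiDallel, I just checked the case you mentioned: I added "a line with a string at the end of the file, with no numbers after it", and it gives the same correct result. Or did you mean several text lines at the end? In that case it will return 0 and nothing more.

@MejdiDallel. OK, to cover the edge case I added an extra conditional branch; see my update.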
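The regular expression option mentioned in the first comment could look roughly like the following. It is only a sketch under an assumption the thread does not state: that a numeric line consists solely of whitespace-separated groups of digits. The names NUMERIC_LINE and count_with_regex are hypothetical.

```python
import re

# One or more groups of digits separated by whitespace, e.g. "120 130".
NUMERIC_LINE = re.compile(r"\d+(?:\s+\d+)*")


def count_with_regex(lines):
    """Return, for each text line, how many numeric lines follow it."""
    counts = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are neither text nor numbers
        if NUMERIC_LINE.fullmatch(stripped):
            if counts:  # ignore numeric lines that precede any text line
                counts[-1] += 1
        else:
            counts.append(0)  # a text line starts a new counter
    return counts
```

Because fullmatch must cover the entire line, a mixed line such as "abc 12" is treated as text rather than as a number.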