Python: Select items from a list that begin with a specific string format

What I have: an itemized list extracted from a PDF file, in which some items are incorrectly split across adjacent list elements.

A = ["1. 100 Test.1; 200 Test.2; 300 ", 
     "Test.3; 400 Test.4", 
     "2. 500 Test.5; 600 Test.6;", 
     "3. 700 Test.7; 800 Test.8; ", 
     "900 Test.9; 1000 Test.10"]
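What I need: a list in which every element begins with an item number (1., 2., 3., and so on), with each remaining fragment appended to the element before it: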

B = ["1. 100 Test.1; 200 Test.2; 300 Test.3; 400 Test.4", 
     "2. 500 Test.5; 600 Test.6;", 
     "3. 700 Test.7; 800 Test.8; 900 Test.9; 1000 Test.10"]

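What I have tried: I was hoping to find a way to identify the list items that have the format "X.X", but I have not had much luck. I did write a loop that checks whether a list element starts with an integer, but that does not help in cases like the last element of list A, which starts with an integer without being a new item. Any help is much appreciated.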

This solution joins the list into a single text string, then uses re.split() to find the x.x pattern and split on it.

import re
import pprint

A = ["1. 100 Test.1; 200 Test.2; 300 ",
     "Test.3; 400 Test.4",
     "2. 500 Test.5; 600 Test.6;",
     "3. 700 Test.7; 800 Test.8; ",
     "900 Test.9; 1000 Test.10"]

# Combine list into a single string
text = "".join(A)

# Split the string into list elements based on desired pattern
lines = re.split(r'(\d\.\s)', text)

# Remove any blank (empty or whitespace-only) entries
lines = [line for line in lines if line.strip()]

# Combine the line numbers and their matching strings back together
numbered_lines = []
for i in range(0, len(lines), 2):
    numbered_lines.append(lines[i] + lines[i+1])

# Print the results
pprint.pprint(numbered_lines)
Output:

❯ python main.py
['1. 100 Test.1; 200 Test.2; 300 Test.3; 400 Test.4',
 '2. 500 Test.5; 600 Test.6;',
 '3. 700 Test.7; 800 Test.8; 900 Test.9; 1000 Test.10']
Update:
Added a capture group to the regex so that the line numbers are preserved.
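
To illustrate why the capture group matters, here is a minimal sketch on a shortened, made-up input string (the `text` value below is an assumption for demonstration, not the asker's data). When the pattern passed to `re.split` contains a capturing group, the matched separators are returned in the result list as well; without the group they are discarded.

```python
import re

# Shortened, hypothetical input: two numbered items run together
text = "1. a2. b"

# Without a capturing group the "N. " markers are dropped
without_group = re.split(r'\d\.\s', text)

# With a capturing group the markers are kept as separate list entries
with_group = re.split(r'(\d\.\s)', text)

print(without_group)  # ['', 'a', 'b']
print(with_group)     # ['', '1. ', 'a', '2. ', 'b']
```

The leading empty string appears because the text starts with a match, which is why the answer filters out blank entries before pairing each marker with its content.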

Thanks for the pprint example; I will definitely use it in the future!
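
As an alternative to joining and re-splitting, the fragments can be merged by walking the list once. This is only a sketch, not the method from the answer above: the function name `merge_continuations` and the pattern `ITEM_START` are hypothetical, and it assumes that a new item always begins with digits followed by a period and whitespace, and that a continuation fragment never does. Under those assumptions it also copes with item numbers of more than one digit.

```python
import re

# Assumed marker for a new item: one or more digits, a period, then whitespace
ITEM_START = re.compile(r'\d+\.\s')

def merge_continuations(items):
    """Append each fragment that does not start a new item to the previous element."""
    merged = []
    for item in items:
        if ITEM_START.match(item) or not merged:
            # Starts a new numbered item (or there is nothing to append to yet)
            merged.append(item)
        else:
            # Continuation fragment: glue it onto the preceding element
            merged[-1] += item
    return merged

A = ["1. 100 Test.1; 200 Test.2; 300 ",
     "Test.3; 400 Test.4",
     "2. 500 Test.5; 600 Test.6;",
     "3. 700 Test.7; 800 Test.8; ",
     "900 Test.9; 1000 Test.10"]

B = merge_continuations(A)
for element in B:
    print(element)
```

Because `"900 Test.9; 1000 Test.10"` starts with digits followed by a space rather than a period, it is treated as a continuation, which is the case the plain "starts with an integer" check could not distinguish.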