Parsing a text file into lists with Python

python parsing

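I have a generated text file that I want to parse into several lists of dates. I know how to handle one date per "group", but I realized I may need to handle multiple date values per group. My .txt file looks like this: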

DateGroup1
20191129
20191127
20191126
DateGroup2
20191129
20191127
20191126
DateGroup3
2019-12-02
DateGroup4
2019-11-27
DateGroup5
2019-11-27

Ideally, I could break this down into 5 lists, each containing the dates for its group. I'm stumped.

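Here is an example you can build on. Each time it reads a line that starts with a letter rather than a digit, it starts a new list, and it puts all the dates under that group into that list.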

# read the file, stripping newlines and skipping blank lines
with open("test.txt") as f:
    lineList = [line.strip() for line in f if line.strip()]

# list of lists; each inner list starts with the group name
lists = []

# loop through and check whether each line is a date or a group name
for line in lineList:
    if not line[0].isdigit():
        # not a date: start a new list with the group name as its first element
        lists.append([line])
    else:
        # a date: append it to the current (most recent) list
        lists[-1].append(line)

# print the lists
for x in lists:
    print(x)
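The question asks for five lists that contain only the dates. As a minimal sketch, the same first-character test can drive `itertools.groupby`, which collects consecutive date lines into runs. The function name `group_dates` is hypothetical, and the sketch assumes every group has at least one date (a header with no dates would produce no list).

```python
from itertools import groupby


def group_dates(lines):
    """Return one list of date strings per run of date lines."""
    # Strip newlines and drop blank lines so line[0] is always safe.
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    groups = []
    # groupby yields alternating runs: header lines (key False)
    # and date lines (key True). Only the date runs are kept.
    for is_date, run in groupby(cleaned, key=lambda ln: ln[0].isdigit()):
        if is_date:
            groups.append(list(run))
    return groups


sample = ["DateGroup1\n", "20191129\n", "20191127\n",
          "DateGroup2\n", "2019-12-02\n"]
print(group_dates(sample))
```

Passing an open file object instead of `sample` should work the same way, since the function only iterates over lines.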

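Start by reading the whole text file. Then count the occurrences of "DateGroup", which seems to be the constant part of the group separator. You can then parse the file by walking through all the data between any two "DateGroup" identifiers, or between a "DateGroup" identifier and the end of the file. Try to understand the following code and build your application on top of it: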

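with open("dates.txt") as file:
    text = file.read()

amountGroups = text.count("DateGroup")

groups = []  # renamed from "list" to avoid shadowing the built-in

index = 0
for i in range(amountGroups):
    groups.append([])

    # skip past the "DateGroupN" header line
    index = text.find("DateGroup", index)
    index = text.find("\n", index) + 1
    # the group ends at the next header, or at the end of the file
    indexEnd = text.find("DateGroup", index)
    if indexEnd == -1:
        indexEnd = len(text)
    while index < indexEnd:
        indexNewline = text.find("\n", index)
        if indexNewline == -1:
            # the last line has no trailing newline
            indexNewline = len(text)
        groups[i].append(text[index:indexNewline])
        index = indexNewline + 1

print(groups)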
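The idea of cutting the text at every "DateGroup" identifier can also be sketched with `re.split` instead of manual index arithmetic. This is an alternative technique, not the code above; the function name `parse_groups` is hypothetical, and the pattern assumes each header sits on its own line as "DateGroup" followed by optional digits.

```python
import re


def parse_groups(text):
    """Split text at DateGroup header lines; return one list of dates per header."""
    # MULTILINE makes ^ and $ match at line boundaries, so only whole
    # header lines act as separators. [ \t\r]* tolerates trailing
    # spaces and Windows line endings.
    chunks = re.split(r"^DateGroup\d*[ \t\r]*$", text, flags=re.MULTILINE)
    # chunks[0] is whatever precedes the first header, so it is skipped.
    # str.split() with no arguments splits on any whitespace and
    # discards empty strings, leaving just the date tokens.
    return [chunk.split() for chunk in chunks[1:]]


text = "DateGroup1\n20191129\n20191127\nDateGroup2\n2019-12-02"
print(parse_groups(text))
```

Unlike the index-based loop, this version does not care whether the file ends with a newline, and a header with no dates yields an empty list.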


Just loop over each line, check for the key that groups the data, strip the newline, and store each new date.

DATE_GROUP_SEPARATOR = 'DateGroup'
sorted_data = {}

with open('test.txt') as file:
    last_group = None
    for line in file.readlines():
        line = line.replace('\n', '')
        if DATE_GROUP_SEPARATOR in line:
            sorted_data[line] = []
            last_group = line
        else:
            sorted_data[last_group].append(line)

for date_group, dates in sorted_data.items():
    print(f"{date_group}: {dates}")
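As a hedged variation on the loop above, the same logic can be wrapped in a function that accepts any iterable of lines, skips blank lines, and fails with a clear message if a date appears before the first header. The name `parse_date_groups` and its `separator` parameter are hypothetical, chosen only for illustration.

```python
def parse_date_groups(lines, separator="DateGroup"):
    """Map each group header to the list of date strings that follow it."""
    groups = {}
    current = None  # header of the group currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(separator):
            current = line
            groups.setdefault(current, [])
        elif current is None:
            raise ValueError(f"date {line!r} appears before any {separator} header")
        else:
            groups[current].append(line)
    return groups


sample = ["DateGroup1\n", "20191129\n", "\n", "DateGroup2\n", "2019-12-02\n"]
for date_group, dates in parse_date_groups(sample).items():
    print(f"{date_group}: {dates}")
```

An open file can be passed directly, for example `parse_date_groups(open_file)` inside a `with open('test.txt') as open_file:` block.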

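This first part just shows how to treat a string containing the data as if it came from a file. It helps if you don't want to create the OP's actual file but want the data to be visible in your editor.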

from io import StringIO  # lets you treat a string in the editor as if it were a file

dat = StringIO("""DateGroup1
20191129
20191127
20191126
DateGroup2
20191129
20191127
20191126
DateGroup3
2019-12-02
DateGroup4
2019-11-27
DateGroup5
2019-11-27""")

lines = [l.strip() for l in dat.readlines()]
print(lines)

Output:

   ['DateGroup1', '20191129', '20191127', '20191126', 'DateGroup2', '20191129', '20191127', '20191126', 'DateGroup3', '2019-12-02', 'DateGroup4', '2019-11-27', 'DateGroup5', '2019-11-27']

Now, here is one possible way to produce the desired list of lists while making sure both possible date formats are covered:

from datetime import datetime

b = []      # list of lists, one inner list per group
a = None    # dates of the current group; None until the first header is seen
for line in lines:
    try:             # try first date format
        do = datetime.strptime(line, '%Y%m%d')
        a.append(do.strftime('%Y-%m-%d'))
    except ValueError:
        try:         # try second date format
            do = datetime.strptime(line, '%Y-%m-%d')
            a.append(do.strftime('%Y-%m-%d'))
        except ValueError:   # not a date: store the finished list and start a new one
            if a is not None:
                b.append(a)
            a = []
if a is not None:    # store the last group
    b.append(a)

print(b)

Output:

 [['2019-11-29', '2019-11-27', '2019-11-26'],
 ['2019-11-29', '2019-11-27', '2019-11-26'],
 ['2019-12-02'],
 ['2019-11-27'],
 ['2019-11-27']]
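The "try one format, then the other" step could also be sketched as a loop over a tuple of formats, which avoids nesting `try` blocks and makes it easy to add a third format later. The names `DATE_FORMATS`, `normalize_date`, and `group_normalized` are hypothetical; the sketch assumes any non-blank line that matches neither format is a group header.

```python
from datetime import datetime

# Formats to try, in order (an assumption based on the sample data).
DATE_FORMATS = ("%Y%m%d", "%Y-%m-%d")


def normalize_date(text):
    """Return text as 'YYYY-MM-DD', or None if it matches no known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # not this format, try the next one
    return None


def group_normalized(lines):
    """Return one list of normalized dates per header line."""
    groups = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        date = normalize_date(line)
        if date is None:
            groups.append([])        # not a date: treat as a header
        elif groups:
            groups[-1].append(date)  # dates before any header are ignored
    return groups


print(group_normalized(["DateGroup1", "20191129", "DateGroup3", "2019-12-02"]))
```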

TTP can also help parse this text. Here is a sample template, along with code showing how to run it:

from ttp import ttp

data_to_parse = """
DateGroup1
20191129
20191127
20191126
DateGroup2
20191129
20191127
20191126
DateGroup3
2019-12-02
DateGroup4
2019-11-27
DateGroup5
2019-11-27
"""

ttp_template = """
<group name="date_groups.date_group{{ id }}">
DateGroup{{ id }}
{{ dates | to_list | joinmatches() }}
</group>
"""

parser = ttp(data=data_to_parse, template=ttp_template)
parser.parse()
print(parser.result(format="json")[0])

Please show the code you have tried and clearly show the desired output. Unfortunately, "I'm stumped" is not a problem we can solve.