Reading and grouping a list of data in Python

python list

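I have been struggling to manage some data. I have turned the data into a list of lists, where each basic sublist has a structure like the following: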

<1x>begins
<2x>value-1
<3x>value-2
<4x>value-3
 some indeterminate number of other values
<1y>next observation begins
<2y>value-1
<3y>value-2
<4y>value-3
 some indeterminate number of other values

This pattern repeats an indeterminate number of times in each sublist.

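Edit: I need to get all occurrences of …

You are off to a good start in noticing that your original solution may work but lacks elegance.

You should parse the string in a loop, handling each line in turn. Here is some sample code: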

import re

s = """<1x>begins
<2x>value-1
<3x>value-2
<4x>value-3
 some indeterminate number of other values
<1y>next observation begins
<2y>value-1
<3y>value-2
<4y>value-3"""
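# A line tagged <1...> (for example <1x> or <1y>) starts a new observation.
firstMatch = re.compile(r'^<1\D')
# Any other line that starts with a numbered tag belongs to the current observation.
numMatch = re.compile(r'^<(\d+)')
listIneed = []
templist = None
for line in s.splitlines():  # split on line breaks, not on every space
    if firstMatch.match(line):
        # Store the previous observation, if there is one, before starting the next.
        if templist is not None:
            listIneed.append(templist)
        templist = [line]
    elif numMatch.match(line) and templist is not None:
        # numMatch.match(line).group(1) holds the tag number if you need it.
        templist.append(line)
# Store the last observation, which no later <1...> line closes.
if templist is not None:
    listIneed.append(templist)

print(listIneed)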

If you want to pick out the second, third and fourth elements of each sublist, this should work:
listINeed = [sublist[1:4] for sublist in biglist]
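
As a small illustration of the slice above, here is a sketch with a made-up `biglist`. The sample data is an assumption modelled on the structure in the question, and it assumes each sublist holds exactly one observation. `sublist[1:4]` takes the items at indices 1, 2 and 3, so it skips the first line of each sublist and returns fewer items when a sublist is short.

```python
# Hypothetical data shaped like the sublists in the question.
biglist = [
    ["<1x>begins", "<2x>value-1", "<3x>value-2", "<4x>value-3", "other"],
    ["<1y>next observation begins", "<2y>value-1", "<3y>value-2"],
]

# Indices 1, 2 and 3 of every sublist; slicing never raises on short lists.
listINeed = [sublist[1:4] for sublist in biglist]
print(listINeed)
```

If a sublist contains several observations, the lines would first have to be split per observation before this slice is useful.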

This can get you by:
import itertools
import operator

itertools.groupby(biglist, operator.itemgetter(2))
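
To make the `itertools.groupby` idea more concrete, here is a minimal sketch on a flat list of lines. It rests on an assumption that is not confirmed by the question: that every tag has the shape `<digits letter>` and that the letter identifies the observation. The names `TAG`, `observation_key` and `lines` are illustrative only.

```python
import itertools
import re

# Hypothetical flat list of tagged lines, as in the question.
lines = [
    "<1x>begins", "<2x>value-1", "<3x>value-2", "<4x>value-3",
    "<1y>next observation begins", "<2y>value-1", "<3y>value-2", "<4y>value-3",
]

TAG = re.compile(r"^<(\d+)([a-zA-Z])>")  # assumed tag shape: digits, then one letter


def observation_key(line):
    """Return the letter that (by assumption) identifies the observation."""
    return TAG.match(line).group(2)


# groupby only merges consecutive items with equal keys, so the lines
# must already be in observation order.
grouped = [list(group) for _, group in itertools.groupby(lines, key=observation_key)]
print(grouped)
```

Lines without a tag would make `TAG.match` return `None`, so they would need to be filtered out or handled before grouping.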

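If I understand your question correctly:
import re

def getlines(ori):
    # Group 1 is the whole tagged line; group 2 is the tag number (1 to 4 only).
    matches = re.finditer(r'(<([1-4])[a-zA-Z]>.*)', ori)
    mainlist = []
    sublist = []
    for sr in matches:
        if int(sr.group(2)) == 1:
            # A <1...> tag starts a new observation: store the previous one.
            if sublist:
                mainlist.append(sublist)
            sublist = []
        else:
            sublist.append(sr.group(1))
    # Store the last observation, which no later <1...> tag closes.
    if sublist:
        mainlist.append(sublist)
    return mainlist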

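…that will do the job for you if you want to use regular expressions.

The version below breaks all of the data into sublists (not just the first four in each grouping), which may be more useful depending on what else you need to do with the data. Use David's listINeed = [sublist[1:4] for sublist in biglist] to get the first four results from each list for the specific task above.
import re

def getlines(ori):
    # \d+ accepts any tag number, so every tagged line is kept, not only tags 1 to 4.
    matches = re.finditer(r'(<(\d+)[a-zA-Z]>.*)', ori)
    mainlist = []
    sublist = []
    for sr in matches:
        if int(sr.group(2)) == 1:
            print("1 found!")
            # A <1...> tag starts a new observation: store the previous one.
            if sublist:
                mainlist.append(sublist)
            sublist = []
        else:
            sublist.append(sr.group(1))
    # Store the last observation, which no later <1...> tag closes.
    if sublist:
        mainlist.append(sublist)
    return mainlist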
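
Because the question describes the data as a list of lists, the same grouping can also be applied to each sublist directly, without first joining the lines into one string. The following is only a sketch under stated assumptions: every relevant line starts with a numbered tag such as `<1x>`, and a tag numbered 1 opens a new observation. The names `group_observations`, `START`, `TAGGED` and the sample `biglist` are hypothetical.

```python
import re

START = re.compile(r"^<1\D")   # assumed: a <1...> tag opens an observation
TAGGED = re.compile(r"^<\d+")  # any line that starts with a numbered tag


def group_observations(lines):
    """Split one sublist of lines into one list per observation."""
    observations = []
    for line in lines:
        if START.match(line):
            # Start a new observation with its opening line.
            observations.append([line])
        elif TAGGED.match(line) and observations:
            # Attach other tagged lines to the current observation.
            observations[-1].append(line)
        # Untagged lines are skipped.
    return observations


# Hypothetical data: each inner list stands for the lines of one source file.
biglist = [
    ["<1x>begins", "<2x>value-1", " other", "<1y>next", "<2y>value-1"],
    ["<1z>begins", "<2z>value-1"],
]
result = [group_observations(sublist) for sublist in biglist]
print(result)
```

Unlike `getlines`, this sketch keeps the opening `<1...>` line in each group, so `group[1:4]` then selects the second to fourth lines of an observation.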

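It would help if you could edit your question to describe more clearly what you want; as it stands it is really unclear. If it is a list of lists, why show it in the "blah" notation? Why not show it as an actual list of lists? [[1, x, blah], [2, x, value-1], ...] What do you really have? A list of strings?

Why? Because that is how the source files come. I have read in the source files, every line starts with a tag, and I have to use those tags to identify what to process. Each source file is one sublist. The notation is there because each line begins with an SGML tag.

I can't be sure which lines they are. The whole thing I posted above is the sublist, so I might need 1 or 10 units, and they are identified only by their names.

Then you need to be more specific in your question... I really don't understand what exactly you are trying to do.

I appreciate your creativity, but I think my solution is cheaper to implement, though I'm not absolutely sure. It took less than two months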