How can I parse a file into lists based on a specific value in Python?


python file parsing list


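I have a large tab-delimited text file, call it john_file, for example:

1 john1 23 54 54
2 john2 34 45 66
3 john3 35 43 54
4 john2 34 54 78
5 john1 12 34 65
6 john3 34 55 66

What is a quick way to parse this file into 3 lists based on the name (john1, john2 or john3)?

Thanks in advance.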

from collections import defaultdict

d = defaultdict(list)

with open('john_file.txt') as f:
    for line in f:
        fields = line.split('\t')
        d[fields[1]].append(line)

The individual lists are then available in d['john1'], d['john2'], and so on.
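The grouping idea above can also be sketched with the standard csv module, which splits on tabs and strips the trailing newline from each row. This is only an illustration: the SAMPLE string stands in for the real file, and the helper name group_by_name is made up for this example.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the tab-separated file from the question
# (assumed layout: id, name, then three numeric columns).
SAMPLE = (
    "1\tjohn1\t23\t54\t54\n"
    "2\tjohn2\t34\t45\t66\n"
    "3\tjohn3\t35\t43\t54\n"
    "4\tjohn2\t34\t54\t78\n"
    "5\tjohn1\t12\t34\t65\n"
    "6\tjohn3\t34\t55\t66\n"
)


def group_by_name(lines):
    """Group tab-separated rows by the value in their second column."""
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:  # skip blank or malformed lines
            continue
        groups[row[1]].append(row)
    return groups


groups = group_by_name(io.StringIO(SAMPLE))
for name in sorted(groups):
    print(name, groups[name])
```

For a real file, passing the handle from open('john_file.txt', newline='') instead of the StringIO object should behave the same way.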

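You can do it like this:

john_lists = {}
with open('john_file.txt') as fh:
    for line in fh:
        # the second tab-separated field is the name to group by
        name = line.split('\t')[1]
        if name not in john_lists:
            john_lists[name] = []
        john_lists[name].append(line)

This has the advantage of not depending on knowing the possible values of the second column in advance.

As others have pointed out, you can also use defaultdict to do this:

from collections import defaultdict

john_lists = defaultdict(list)
with open('john_file.txt') as fh:
    for line in fh:
        # a missing key is created as an empty list on first access
        name = line.split('\t')[1]
        john_lists[name].append(line)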
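A third spelling of the same idea, offered only as a sketch, uses dict.setdefault on a plain dict. The function name split_by_column, its parameters, and the sample rows are invented for illustration. The point is that the keys are discovered from the data and are never listed up front.

```python
def split_by_column(lines, column=1, sep="\t"):
    """Group lines by the value found in the given column."""
    groups = {}
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= column:  # blank or short line: nothing to group on
            continue
        # setdefault returns the existing list, or stores and returns a new one
        groups.setdefault(fields[column], []).append(fields)
    return groups


# Hypothetical rows; the names are not known to the function in advance.
rows = ["1\talice\t3\n", "2\tbob\t4\n", "3\talice\t5\n", "\n"]
result = split_by_column(rows)
print(sorted(result))
```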


>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> for line in '''1 john1 23 54 54
... 2 john2 34 45 66
... 3 john3 35 43 54
... 4 john2 34 54 78
... 5 john1 12 34 65
... 6 john3 34 55 66
... '''.split('\n'):
...  data = list(filter(None, line.split()))
...  if data:
...   a[data[1]].append(data)
... 
>>> data
[]
>>> a
defaultdict(<class 'list'>, {'john1': [['1', 'john1', '23', '54', '54'], ['5', 'john1', '12', '34', '65']], 'john2': [['2', 'john2', '34', '45', '66'], ['4', 'john2', '34', '54', '78']], 'john3': [['3', 'john3', '35', '43', '54'], ['6', 'john3', '34', '55', '66']]})


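littletable makes this kind of simple slicing and dicing easy. It makes a list of objects accessible, queryable and pivotable by attribute, like a mini in-memory database, but with even less overhead than SQLite.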

from collections import namedtuple
from littletable import Table

data = """\
 1 john1 23 54 54
 2 john2 34 45 66
 3 john3 35 43 54
 4 john2 34 54 78
 5 john1 12 34 65
 6 john3 34 55 66"""

Record = namedtuple("Record", "id name length width height")
def makeRecord(s):
    s = s.strip().split()
    # convert all but name to ints, and build a Record instance
    return Record(*(ss if i == 1 else int(ss) for i,ss in enumerate(s)))

# create a table and load it up 
# (if this were CSV data, would be even simpler)
t = Table("data")
t.create_index("id", unique=True)
t.create_index("name")
t.insert_many(map(makeRecord, data.splitlines()))

# get a record by unique key 
# (unique indexes return just the single record)
print(t.id[4])
print()

# get all records matching an indexed value 
# (non-unique index retrievals return a new Table)
for d in t.name['john1']:
    print(d)
print()

# dump summary pivot tables
t.pivot('name').dump_counts()
print()

t.create_index('length')
t.pivot('name length').dump_counts()

Prints:

Record(id=4, name='john2', length=34, width=54, height=78)

Record(id=1, name='john1', length=23, width=54, height=54)
Record(id=5, name='john1', length=12, width=34, height=65)

Pivot: name
john1       2
john2       2
john3       2

Pivot: name,length
           12      23      34      35   Total
john1       1       1       0       0       2
john2       0       0       2       0       2
john3       0       0       1       1       2
Total       1       1       3       1       6


Are you ready to answer your own question? Your solution seems pretty quick.

@Christian: Thanks for the quick reply. With the code in this example I would have to write 3 loops. My actual file goes from john1 to john30, so I am looking for a more concise way.

If you set john_lists to collections.defaultdict(list), you don't need the if statement.
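With names running from john1 to john30, as the comment above mentions, one dictionary still covers every group, but a plain sorted() on the keys would put john10 before john2. The sketch below shows one possible way to walk the groups in numeric order. The helper name_order and the sample lines are hypothetical and not part of any answer above.

```python
import re
from collections import defaultdict


def name_order(name):
    """Sort key: turn 'john12' into ('john', 12) so john2 comes before john10."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        return (name, -1)  # names without a numeric suffix sort by text only
    return (match.group(1), int(match.group(2)))


# Hypothetical lines standing in for a file with many different names.
lines = ["1\tjohn10\t1\n", "2\tjohn2\t2\n", "3\tjohn1\t3\n", "4\tjohn2\t4\n"]

john_lists = defaultdict(list)
for line in lines:
    john_lists[line.split("\t")[1]].append(line.rstrip("\n"))

# One list per name, ordered john1, john2, john10
ordered = [john_lists[name] for name in sorted(john_lists, key=name_order)]
print(ordered)
```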