Grouping a text file by word in Python (lists only)
I'm new to Python, and I'd like to know how to group the lines of a text file by a single word. For example, suppose my text file looks like this:
eggs monday $5 john
bread monday $3 harry
bananas wednesday $2 john
milk saturday $4 sally
tomatoes sunday $7 sally
In my case, I want to group the file by name. For example, for john I would like it to come out like this:
[john,[eggs,monday],[bananas,wednesday]]
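and likewise for harry and sally.
Right now my code looks like this. I have been able to pick out the pieces I need (the name, item, and day), but I don't know how to group them: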
def grocery():
    file = open('shopping.txt')
    wholelist = []
    innerlist = []
    for line in file:
        lines = line.split()
        name = lines[3]
        item = lines[0]
        day = lines[1]
Thanks in advance. Also, I can only use lists inside lists, so dictionaries are not allowed.

Try this:
from collections import defaultdict

def grocery():
    wholelist = defaultdict(list)
    with open('shopping.txt') as file:
        for line in file:
            lines = line.split()
            wholelist[lines[3]].append(lines[0:2])
    return wholelist

custom_list = [[key] + val for key, val in grocery().items()]
print(sorted(custom_list, key=lambda data: data[0]))
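custom_list has the format you want.

If you are not restricted to lists and dictionaries, I would suggest using pandas for tabular data with heterogeneous field types.
It would look like this: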
import pandas as pd

# sep=r'\s+' splits on runs of whitespace (replaces the deprecated delim_whitespace=True)
df = pd.read_table('data.txt', names=['item', 'day', 'price', 'name'],
                   sep=r'\s+')
for name, group in df.groupby('name'):
    print(name, ':')
    print(group[['item', 'day']])
Output:
harry :
    item     day
1  bread  monday
john :
      item        day
0     eggs     monday
2  bananas  wednesday
sally :
       item       day
3      milk  saturday
4  tomatoes    sunday
If you are restricted to lists, as pointed out in the comments, I would use the list object's .index() method:
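# `lines` was not defined in the original snippet; reading the question's file is assumed here
with open('shopping.txt') as f:
    lines = f.readlines()

table = [line.strip().split() for line in lines]  # strip the newline char
table = [row for row in table if len(row) > 0]    # remove empty lines
names = []   # for keeping names
groups = []  # for keeping groups associated with names
for row in table:
    item, day, price, name = row
    try:
        i = names.index(name)   # position of a name we have already seen
    except ValueError:          # first time we meet this name
        names.append(name)
        i = len(names) - 1
        groups.append([])
    groups[i].append([item, day])
result = list(zip(names, groups))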
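
To try the .index() idea without a file on disk, here is a minimal self-contained sketch. The function name `group_by_name` and the inline `sample` list are assumptions made for illustration, not part of the answer above. It also flattens each (name, group) pair into the `[name, [item, day], ...]` shape asked for in the question.

```python
def group_by_name(lines):
    """Group 'item day price name' rows by name, using only lists."""
    names = []   # names in order of first appearance
    groups = []  # groups[i] holds the [item, day] pairs for names[i]
    for line in lines:
        row = line.split()
        if len(row) != 4:  # skip blank or malformed lines
            continue
        item, day, price, name = row
        try:
            i = names.index(name)
        except ValueError:
            names.append(name)
            groups.append([])
            i = len(names) - 1
        groups[i].append([item, day])
    # Flatten each pair into [name, [item, day], ...]
    return [[name] + pairs for name, pairs in zip(names, groups)]


# Sample data copied from the question (hypothetical stand-in for shopping.txt)
sample = [
    "eggs monday $5 john",
    "bread monday $3 harry",
    "bananas wednesday $2 john",
    "milk saturday $4 sally",
    "tomatoes sunday $7 sally",
]

for entry in group_by_name(sample):
    print(entry)
```

Keeping `names` and `groups` as two parallel lists means `names.index(name)` doubles as the lookup a dictionary would otherwise provide, at the cost of a linear scan per row.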
If you are restricted to using only lists, you can try the following:
a ="""eggs monday $5 john
bread monday $3 harry
bananas wednesday $2 john
milk saturday $4 sally
tomatoes sunday $7 sally"""
sents = [b.split() for b in a.splitlines()]
names = []
# names alternates between a name and the list of its [item, day] pairs
for s in sents:
    if s[3] not in names:
        names.append(s[3])
        names.append([])
for name in names:
    for s in sents:
        if name == s[3]:
            names[names.index(name) + 1].append([s[0], s[1]])
for no in range(0, len(names), 2):
    print([names[no]] + names[no + 1])
Output:
['john', ['eggs', 'monday'], ['bananas', 'wednesday']]
['harry', ['bread', 'monday']]
['sally', ['milk', 'saturday'], ['tomatoes', 'sunday']]
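
The same grouping can also be sketched in a single pass, building the `[name, [item, day], ...]` entries directly instead of alternating names and lists. This is only an illustrative variant: the function name `group_rows` is an assumption, and it relies on every row having exactly four fields.

```python
def group_rows(rows):
    """Build [[name, [item, day], ...], ...] from 4-field rows, lists only."""
    result = []
    for item, day, _price, name in rows:
        for entry in result:
            if entry[0] == name:           # name already has an entry
                entry.append([item, day])
                break
        else:                              # no break: first time we see this name
            result.append([name, [item, day]])
    return result


text = """eggs monday $5 john
bread monday $3 harry
bananas wednesday $2 john
milk saturday $4 sally
tomatoes sunday $7 sally"""

rows = [line.split() for line in text.splitlines()]
for entry in group_rows(rows):
    print(entry)
```

The `for ... else` clause runs only when the inner loop finishes without a `break`, which is what marks a name as new.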
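Comments:
Have you considered using a dictionary?
For the assignment this small example is based on, I was told I have to use lists, so for now I'm trying to work out how to sort it within that restriction. Our lecturer needs a lot of coffee.
Please state the list restriction in the original question. I think that's why the answer was downvoted.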