Grouping a text file by word in Python (restricted to using lists only)


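I'm new to Python, and I'd like to know how to group the lines of a text file by a single word. For example, suppose my text file looks like this: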

eggs monday $5 john
bread monday $3 harry
bananas wednesday $2 john
milk saturday $4 sally
tomatoes sunday $7 sally

In my case, I want to group the file by name. For example, for john I'd like it displayed like this:

[john,[eggs,monday],[bananas,wednesday]]

and likewise for harry and sally.

So far my code looks like this. I've been able to pick out the necessary pieces (the name, the item, and the day), but I don't know how to group them:

def grocery():
    file = open('shopping.txt')

    wholelist = []
    innerlist = [] 

    for line in file:
        lines = line.split()
        name = lines[3]
        item = lines[0]
        day = lines[1]

Thanks in advance. Also, I can only use lists within lists, so dictionaries are not allowed.

Try this:

from collections import defaultdict

def grocery():
    wholelist = defaultdict(list)
    with open('shopping.txt') as file:
        for line in file:
            lines = line.split()
            wholelist[lines[3]].append(lines[0:2])
    return wholelist

# prepend each name to its list of [item, day] pairs
custom_list = [[key] + val for key, val in grocery().items()]
print(sorted(custom_list, key=lambda data: data[0]))

custom_list has the format you want.
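Note that defaultdict is a dict subclass, so the version above may not satisfy a strict lists-only requirement. A minimal sketch of the same grouping built from nested lists alone follows. The helper name group_by_name and the inline sample lines are assumptions made for illustration and are not part of the original answer.

```python
def group_by_name(lines):
    """Group 'item day price name' records by name, using only lists."""
    grouped = []  # each entry looks like [name, [item, day], [item, day], ...]
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        item, day, name = fields[0], fields[1], fields[3]
        for entry in grouped:
            if entry[0] == name:
                entry.append([item, day])
                break
        else:
            # no entry for this name yet: start a new one
            grouped.append([name, [item, day]])
    return grouped


# Sample lines copied from the question (stand-in for the real file).
sample = [
    "eggs monday $5 john",
    "bread monday $3 harry",
    "bananas wednesday $2 john",
    "milk saturday $4 sally",
    "tomatoes sunday $7 sally",
]

for entry in group_by_name(sample):
    print(entry)
```

Because iterating over an open file yields its lines, the same function could probably be fed the file object directly inside a with open('shopping.txt') block.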

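If you are not restricted to lists and dictionaries, I would suggest using pandas for tabular data with heterogeneous field types. It would look like this: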

import pandas as pd

# sep=r'\s+' splits each line on any run of whitespace
df = pd.read_table('data.txt', names=['item', 'day', 'price', 'name'],
                   sep=r'\s+')
for name, group in df.groupby('name'):
    print(name, ':')
    print(group[['item', 'day']])

Output:

harry :
    item     day
1  bread  monday
john :
      item        day
0     eggs     monday
2  bananas  wednesday
sally :
       item       day
3      milk  saturday
4  tomatoes    sunday

If you are restricted to lists, as pointed out in the comments, I would use the list object's .index() method:

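with open('shopping.txt') as f:  # file name taken from the question
    lines = f.readlines()

table = [line.strip().split() for line in lines]  # strip the newline char
table = [row for row in table if len(row) > 0]  # remove empty lines
names = []  # for keeping names
groups = []  # for keeping groups associated with names

for row in table:
    item, day, price, name = row
    try:
        i = names.index(name)  # position of a name seen before
    except ValueError:
        # first time this name appears: register it with an empty group
        names.append(name)
        i = len(names) - 1
        groups.append([])
    groups[i].append([item, day])

result = list(zip(names, groups))

Output (the value of result for the sample file):

[('john', [['eggs', 'monday'], ['bananas', 'wednesday']]), ('harry', [['bread', 'monday']]), ('sally', [['milk', 'saturday'], ['tomatoes', 'sunday']])]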
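The names and groups lists keep each name apart from its [item, day] rows, which is not quite the bracketed shape shown in the question. One way to get that shape, sketched here with hard-coded sample values standing in for the two lists so that the snippet runs on its own, is to prepend each name to its rows. The variable name flat is an assumption for illustration.

```python
# Sample values standing in for the names and groups lists built above.
names = ['john', 'harry', 'sally']
groups = [
    [['eggs', 'monday'], ['bananas', 'wednesday']],
    [['bread', 'monday']],
    [['milk', 'saturday'], ['tomatoes', 'sunday']],
]

# Prepend each name to its own rows: [name, [item, day], ...]
flat = [[name] + rows for name, rows in zip(names, groups)]
for entry in flat:
    print(entry)
```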

If you are restricted to using only lists, you could try the following:

a ="""eggs monday $5 john
bread monday $3 harry
bananas wednesday $2 john
milk saturday $4 sally
tomatoes sunday $7 sally"""

sents = [b.split() for b in a.splitlines()]
names = []
for s in sents:
    if s[3] not in names:
        names.append(s[3])
        names.append([])

# append each [item, day] pair to the list stored right after its name
for s in sents:
    names[names.index(s[3]) + 1].append([s[0], s[1]])

# names alternates between a name and that name's list of pairs
for no in range(0, len(names), 2):
    print([names[no]] + names[no + 1])

Output:

['john', ['eggs', 'monday'], ['bananas', 'wednesday']]
['harry', ['bread', 'monday']]
['sally', ['milk', 'saturday'], ['tomatoes', 'sunday']]
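Another lists-only possibility, offered as an alternative sketch rather than as part of the answer above, is to sort the rows by name and let itertools.groupby from the standard library collect adjacent rows with the same name. The variable names are illustrative, and sorting means the groups come out in alphabetical order of name rather than in order of first appearance.

```python
from itertools import groupby

text = """eggs monday $5 john
bread monday $3 harry
bananas wednesday $2 john
milk saturday $4 sally
tomatoes sunday $7 sally"""

rows = [line.split() for line in text.splitlines() if line.strip()]
rows.sort(key=lambda row: row[3])  # groupby only merges adjacent rows

# Build [name, [item, day], ...] for each run of rows sharing a name.
grouped = [
    [name] + [[row[0], row[1]] for row in group]
    for name, group in groupby(rows, key=lambda row: row[3])
]
for entry in grouped:
    print(entry)
```

Since list.sort is stable, the rows within each group keep their original file order.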

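Have you considered using a dictionary?

For the purposes of the assignment this is based on, I was told I have to use lists, so for now I'm trying to work out how to sort it under those restrictions.

Our lecturer needs a lot of coffee.

Please state the list restriction in the original question. I guess that is why the answers were downvoted.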
