Python, combining the unique contents of 3 columns (Excel spreadsheet)

python excel list

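Hello. I have some data in an Excel spreadsheet, structured as follows:

I want to combine the unique values from the 3 columns and convert them neatly into a format like this: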

Mike to America for Hotel; Meal

and so on.

So far I have only managed to handle two columns:

import xlrd
from collections import defaultdict

the_file = xlrd.open_workbook("testing.xlsx")
the_sheet = the_file.sheet_by_name("Sheet1")

# Map each name (column 0) to the list of destinations (column 1)
products = defaultdict(list)

for row_index in range(1, the_sheet.nrows):  # start at 1 to skip the header row
    products[str(the_sheet.cell(row_index, 0).value)].append(the_sheet.cell(row_index, 1).value)

for product, v in products.items():
    print(product + " to " + ";".join(set(v)))

The output is:

Mike to America
Hulk to America;Asia
Kate to Europe;America
Dave to Europe
Jack to Europe;America;Asia
Luci to Asia

How can I make this work for all 3 columns?

Many thanks, everyone.

First, extract the required rows. Here I store them as a nested list, i.e.

[[col1, col2, col3], [col1, col2, col3]]

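box = list()
bigbox = []
# The row offset (i > 2) and the column range (1..3) fit the sheet I built for testing;
# adjust them to wherever your data starts.
for i in range(len(the_sheet.col(1))):
    if i > 2:
        for j in range(1,4):
            # str(cell) looks like "text:'Mike'"; keep the part after the type tag
            box.append(str(the_sheet.col(j)[i]).split(":")[1])
        bigbox.append(box)
        box = []

print(bigbox)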

Then I convert the nested list into a nested dictionary of sets, i.e.

{'name': {'travel': set of destinations, 'expense': set of expenses}, ...}

dbox = dict()

for name, travel, expense in bigbox:
    if name not in dbox:
        # First row for this name: start one set per column
        dbox[name] = {'travel': {travel}, 'expense': {expense}}
    else:
        # Sets silently drop values that were already added
        dbox[name]['travel'].add(travel)
        dbox[name]['expense'].add(expense)

print(dbox)

Finally, you print it out with a bit of voodoo magic (read the docs for more information):

for name in dbox:
    print(name, 'to', "; ".join(dbox[name]['travel']), 'for', "; ".join(dbox[name]['expense']))

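Hope this helps. One complaint: you didn't provide the Excel file, so I had to create it myself. Please include it next time. This also looks familiar, like an assignment from some programming course.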
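One caveat with the set-based approach above: a set has no defined order, so the joined values may not come out in spreadsheet order. Below is a minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dicts keep insertion order). It uses hard-coded sample rows in place of the sheet, and `rows` and `summarize` are illustrative names, not part of the answer.

```python
# Sample rows standing in for the spreadsheet (assumed data, not the asker's file).
rows = [
    ["Mike", "America", "Hotel"],
    ["Mike", "America", "Meal"],
    ["Jack", "Europe", "Hotel"],
    ["Jack", "America", "Meal"],
]


def summarize(rows):
    """Return one 'name to places for expenses' line per name, in first-seen order."""
    grouped = {}
    for name, travel, expense in rows:
        entry = grouped.setdefault(name, {"travel": {}, "expense": {}})
        # Dict keys act as an ordered set: duplicates collapse, order is kept.
        entry["travel"][travel] = None
        entry["expense"][expense] = None
    return [
        "{} to {} for {}".format(name, "; ".join(e["travel"]), "; ".join(e["expense"]))
        for name, e in grouped.items()
    ]


for line in summarize(rows):
    print(line)
```

With these sample rows the first line printed is `Mike to America for Hotel; Meal`, which matches the format asked for in the question.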

I imagine there is a more Pythonic way, but this is what I came up with:

from collections import defaultdict


l = [
    ['mike', 'america', 'hotel'],
    ['mike', 'america', 'meal'],
    ['jack', 'america', 'meal'],
    ['jack', 'europe', 'hotel'],
    ['jack', 'america', 'bonus'],
    ['jack', 'asia', 'hotel'],
    ['dave', 'europe', 'meal'],
]

people = defaultdict(list)         # name -> places visited
people_places = defaultdict(list)  # "name|place" -> expenses at that place

for name, place, expense in l:
    people[name].append(place)
    people_places[name + '|' + place].append(expense)

for p, k in people.items():
    activity = []
    for place in k:
        activity += people_places[p + '|' + place]
    # set() removes the duplicates before joining
    print('{} to {} for {}'.format(
        p,
        ';'.join(set(k)),
        ';'.join(set(activity))
    ))

You can adapt the code to work directly with the spreadsheet rows and cells, or first extract the list with something like this:

import xlrd

l = []
with xlrd.open_workbook("testing.xlsx") as the_file:
    the_sheet = the_file.sheet_by_name("Sheet1")

    for row_index in range(1, the_sheet.nrows):
        l.append([
            the_sheet.cell(row_index, 0).value, 
            the_sheet.cell(row_index, 1).value, 
            the_sheet.cell(row_index, 2).value])
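The extraction loop above can also be written with `Sheet.row_values`, which returns a slice of one row's values. A small sketch: `rows_from_sheet` is a hypothetical helper, and `FakeSheet` is only a stand-in so the example runs without a workbook (a real `xlrd` sheet would be passed instead).

```python
def rows_from_sheet(sheet, first_row=1, ncols=3):
    """Collect the first `ncols` cell values of every row after the header."""
    return [sheet.row_values(r, 0, ncols) for r in range(first_row, sheet.nrows)]


class FakeSheet(object):
    """Stand-in exposing only the two xlrd Sheet members used above."""

    def __init__(self, data):
        self._data = data
        self.nrows = len(data)

    def row_values(self, rowx, start_colx=0, end_colx=None):
        return self._data[rowx][start_colx:end_colx]


sheet = FakeSheet([
    ["Name", "Travel", "Expense"],  # header row, skipped by first_row=1
    ["Mike", "America", "Hotel"],
    ["Jack", "Europe", "Meal"],
])
print(rows_from_sheet(sheet))
```

The resulting list has the same shape as `l` in the answer, so it can be fed straight into the grouping code.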

The solution I have come up with for now is:

import xlrd
from collections import defaultdict

the_file = xlrd.open_workbook("4_test.xlsx")
the_sheet = the_file.sheet_by_name("Sheet1")

# Dictionary whose missing keys are created as nested dictionaries on first access
nested_dict = lambda: defaultdict(nested_dict)
_dict = nested_dict()

for row_index in range(1, the_sheet.nrows):
    expense = []
    travel = []
    name = str(the_sheet.cell(row_index, 0).value)
    # Rescan every row and collect the values that belong to this name
    for row_index_1 in range(1, the_sheet.nrows):
        if name == str(the_sheet.cell(row_index_1, 0).value):
            travel.append(str(the_sheet.cell(row_index_1, 1).value))
            expense.append(str(the_sheet.cell(row_index_1, 2).value))
            _dict[name]['travel'] = travel
            _dict[name]['expense'] = expense

for name in _dict:
    print(name + " to " + ",".join(set(_dict[name]['travel'])) + " for " + ",".join(set(_dict[name]['expense'])))

Output:

Pintu to Europe for Bone

Jack to Europe,America for Hotel,Meal

Mike to America for Vacation,Hotel,Transport
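The nested loops above rescan the whole sheet once per row, so the work grows quadratically with the number of rows. Here is a single-pass sketch of the same grouping, shown on hard-coded rows (assumed sample data) rather than on the sheet. `group_rows` is an illustrative name.

```python
from collections import defaultdict


def group_rows(rows):
    """Group travel and expense values by name in one pass over the rows."""
    grouped = defaultdict(lambda: {"travel": set(), "expense": set()})
    for name, travel, expense in rows:
        grouped[name]["travel"].add(travel)
        grouped[name]["expense"].add(expense)
    return grouped


rows = [
    ["Mike", "America", "Hotel"],
    ["Mike", "America", "Meal"],
    ["Jack", "Europe", "Hotel"],
    ["Jack", "Asia", "Hotel"],
]

for name, values in group_rows(rows).items():
    # sorted() makes the output order predictable, since sets are unordered
    print(name + " to " + ",".join(sorted(values["travel"]))
          + " for " + ",".join(sorted(values["expense"])))
```

Each row is touched once, and the sets take care of uniqueness, so no second scan of the sheet is needed.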

I can't see where you add cell(row_index, 2) to the pipeline.

@steppo, that is because I don't know how or where to add them...

Thanks. Could you check "box.append(str(the_sheet.col(j)[i]).split(":")[1])"? It doesn't seem to work.

Please use this instead:

box.append(the_sheet.col(j)[i].value)

No split is needed for that line. I had never used this library before, sorry.

Please try changing the loop to "for j in range(3)". Since the code works for me, your Excel file would be very helpful. You need to check whether the index i or j goes out of range, or simply add a try/except block, like

try: box.append(...) except: print(some error)

Thank you again. Unfortunately my knowledge can't keep up... I hope you don't mind that I chose another suitable answer.

Thanks. Could you also add how to build the big list l?

Added some hints about extracting the list. If you are interested in dataset extraction, you may eventually want to look at pandas as a great tool for automating this kind of task.

Thanks for your help. It's a nice approach (but it seems to run very slowly on a long list of spreadsheet data).