How to merge or join lists in Python and match records

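I have searched, but did not find anything close enough.

Consider the following 3 or more lists containing file names or any other objects, as found in specific (related) directories:

What I want to end up with is a table with three columns (it could be Excel, pipe-delimited txt, or something similar):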

column1 (c:\\temp) | column2 (d:\\myfiles) | column3 (d:\\backup)
------------------------------------------------------------------
file1.txt          | file1.txt             | <blank>
file2.txt          | file2.txt             | file2.txt
file3.txt          | <blank>               | file3.txt
<blank>            | file4.txt             | file4.txt

I have the lists, but I don't know of any function or method that can arrange them the way shown above. I am using Python 2.7.

Any input is welcome.


-geo

Wouldn't a dictionary be a better data structure for the problem you are working on? First, let's convert the data into a dictionary:

collections = [list1, list2, list3]
files = {'\\'.join(collection[0].split('\\')[:-1]): [item.split('\\')[-1] for item in collection] for collection in collections}

I know this is a hard-to-read comprehension, but it gives you a nice dictionary:

{'c:\\temp': ['file1.txt', 'file2.txt', 'file3.txt'], 'd:\\myfiles': ['file1.txt', 'file2.tx', 'file4.txt'], 'd:\\backup': ['file2.txt', 'file3.txt', 'file4.txt']}
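
For readers who find the comprehension hard to follow, here is a minimal sketch of the same grouping written as a plain loop. The sample lists are hypothetical (the original lists are not shown in the question), and the helper name `group_by_directory` is made up for illustration. Unlike the comprehension, which takes the directory from the first item of each list, this version groups every path by its own directory. It uses `ntpath` so that Windows-style paths are split the same way on any operating system.

```python
import ntpath  # Windows path rules, regardless of the host OS

# Hypothetical sample data; the question's real lists are not shown.
list1 = [r'c:\temp\file1.txt', r'c:\temp\file2.txt', r'c:\temp\file3.txt']
list2 = [r'd:\myfiles\file1.txt', r'd:\myfiles\file2.txt', r'd:\myfiles\file4.txt']
list3 = [r'd:\backup\file2.txt', r'd:\backup\file3.txt', r'd:\backup\file4.txt']


def group_by_directory(path_lists):
    """Map each directory to the list of file names found in it."""
    files = {}
    for paths in path_lists:
        for path in paths:
            # ntpath.split returns (directory, file name)
            directory, name = ntpath.split(path)
            files.setdefault(directory, []).append(name)
    return files


files = group_by_directory([list1, list2, list3])
```

The result has the same shape as the dictionary shown above: one key per directory, each holding a list of bare file names.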

Now, to display the files the way you want, we can simply loop over the keys and then over the values of the dictionary:

# On Python 2.7, put "from __future__ import print_function" at the top of
# the script so that print(..., end="") works.

# Headers
for key in files:
    print("%-15s" % key, end="")
print("\n" + "=" * 44)

# Values
size = max(len(val) for val in files.values())
for i in range(size):
    for path in files:
        # Assumes the files are named file1.txt, file2.txt, ...
        name = "file%s.txt" % (i + 1)
        if name in files[path]:
            print("%-15s" % name, end="")
        else:
            print("%-15s" % "<blank>", end="")
    print()

The output is as requested:

c:\temp        d:\myfiles     d:\backup
============================================
file1.txt      file1.txt      <blank>        
file2.txt      <blank>        file2.txt      
file3.txt      <blank>        file3.txt 
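
The loop above builds each name as `file%s.txt` and only runs as many times as the longest list has entries, so it depends on that naming pattern and never reaches `file4.txt`. A more general sketch, under stated assumptions, is to take the union of all names and emit one row per distinct name. The helper name `format_table`, the explicit `columns` list (used to fix the column order), and the sample data are all hypothetical. The data assumes `file2.txt` is present in `d:\myfiles`, as in the table the question asks for.

```python
def format_table(files, columns, blank="<blank>", width=15):
    """Return the table as a list of text lines, one row per distinct file name."""
    # Union of every file name seen in any directory, in lexicographic order
    all_names = sorted(set(name for names in files.values() for name in names))
    lines = ["".join("%-*s" % (width, col) for col in columns)]
    lines.append("=" * (width * len(columns)))
    for name in all_names:
        cells = [name if name in files[col] else blank for col in columns]
        lines.append("".join("%-*s" % (width, cell) for cell in cells))
    return lines


# Hypothetical data in the same shape as the dictionary built earlier
files = {
    'c:\\temp': ['file1.txt', 'file2.txt', 'file3.txt'],
    'd:\\myfiles': ['file1.txt', 'file2.txt', 'file4.txt'],
    'd:\\backup': ['file2.txt', 'file3.txt', 'file4.txt'],
}
columns = ['c:\\temp', 'd:\\myfiles', 'd:\\backup']

table = format_table(files, columns)
print("\n".join(table))
```

Because the rows are sorted as plain strings, `file10.txt` would sort before `file2.txt`; a numeric sort key would be needed if that matters. Membership tests against lists are fine for a handful of files, but converting each list to a set first would scale better.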

Note:

I agree with Sam that the first step is to convert the lists into a dictionary of lists:

from collections import defaultdict

flattened_list = [s for sub in [list1, list2, list3] for s in sub]
tracker = defaultdict(list)

for path in flattened_list:
    dirname, _, basename = path.rpartition('\\')
    tracker[dirname].append(basename)

# {'c:\\temp':    ['file1.txt', 'file2.txt', 'file3.txt'], 
#  'd:\\myfiles': ['file1.txt', 'file2.txt', 'file4.txt'], 
#  'd:\\backup':  ['file2.txt', 'file3.txt', 'file4.txt']}

From here, it is straightforward to convert this data into either a list of column data or a list of row data:

dirnames = sorted(tracker)
basenames = sorted(set(sum(tracker.values(), [])))  # sorted list of all distinct file names

# constructs a list for each directory, filling in empty slots with '<blank>'
files = [[b if b in tracker[d] else '<blank>' for b in basenames] for d in dirnames]

# one list per directory, in sorted directory order
column_output = [[d] + f for d, f in zip(dirnames, files)]
# [['c:\\temp',    'file1.txt', 'file2.txt', 'file3.txt', '<blank>'],
#  ['d:\\backup',  '<blank>',   'file2.txt', 'file3.txt', 'file4.txt'],
#  ['d:\\myfiles', 'file1.txt', 'file2.txt', '<blank>',   'file4.txt']]

# transpose columns into rows; list() makes this a list on Python 3 as well
row_output = list(zip(*column_output))
# [('c:\\temp',  'd:\\backup', 'd:\\myfiles'),
#  ('file1.txt', '<blank>',    'file1.txt'),
#  ('file2.txt', 'file2.txt',  'file2.txt'),
#  ('file3.txt', 'file3.txt',  '<blank>'),
#  ('<blank>',   'file4.txt',  'file4.txt')]

Printing these lists the way you want, or writing them to an Excel file, is a separate problem, but it should be straightforward.
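
As one possible way to do that last step, here is a minimal sketch that turns row data into pipe-delimited text with the standard `csv` module. The helper name `rows_to_pipe_text` and the sample `rows` are hypothetical; the rows simply mirror the shape of `row_output` above. It is written for Python 3. On Python 2.7 the `csv` module expects byte strings, so a file opened in `'wb'` mode would be needed instead of `io.StringIO`.

```python
import csv
import io

# Hypothetical rows, in the same shape as row_output above
rows = [
    ('c:\\temp', 'd:\\backup', 'd:\\myfiles'),
    ('file1.txt', '<blank>', 'file1.txt'),
    ('file2.txt', 'file2.txt', 'file2.txt'),
    ('file3.txt', 'file3.txt', '<blank>'),
    ('<blank>', 'file4.txt', 'file4.txt'),
]


def rows_to_pipe_text(rows):
    """Serialize rows as pipe-delimited text, one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='|', lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


text = rows_to_pipe_text(rows)
print(text)
```

To write a file rather than build a string, the same writer can be pointed at `open(path, 'w', newline='')`. A file written with the default comma delimiter can be opened directly in Excel.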

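Are the lists sorted by file name?

That is quite a lot of code you are asking someone to write… Note the "Related" links to earlier answers on the right: they all have upvotes in the 4-digit range (rarely seen). One of them has to work…

No, I'm not asking for code (maybe I wasn't clear :( ), just for ideas about which structure could be used. I have been using lists, but as others have suggested, maybe a dictionary would be better. I could actually do this by putting all the items together (sort, grab the largest list, then append/join once a value is found and continue…), but that would probably mean iterating over every list, which doesn't seem elegant.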