Python: find the indices of a list of substrings in a list of strings; fill in missing values


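I'm trying to make sure that an expected list of substrings appears in a list of strings. I need to know whether one is missing so that I can fill it in. I need to find the indices of the substrings within the list of strings so that I can extract the value of the string next to each one. Using Python 3.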

# List of strings parsed from a document
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone', 
            '555-555-5550']]
# Expected/desired headings
subs = ['name', 'email', 'phone']
Then check whether all of the "subs" were captured. If not, find the missing ones and fill them in with nan.

Expected result:

{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': nan}
{'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}
For this I use a comprehension, and I chose dictionary output:

import numpy as np

for row in strings:
    # Get key:value of each sub in row
    foundSubs = dict((s, row[i+1]) for (i, s) in enumerate([n.lower() for n in row])
                     for sub in subs if sub in s)

    # Check for all subs in result: name, email, phone;
    # if one is missing, fill in nan
    for eachSub in subs:
        if [i for i in foundSubs if eachSub in i] == []:
            foundSubs[eachSub] = np.nan

    print(foundSubs)
Result:

{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': nan}
{'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}
This can be converted to a list-of-tuples format by building a list instead of passing the comprehension to dict:

[('name', 'Joe Sixpack'), ('email', 'beerme@thebrew.com'), ('phone', nan)]
[('name', 'Winnie Cooler'), ('email', 'Winnie Cooler'), ('phone', '555-555-5550')]
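
The list-of-tuples variant is only described above, so here is a rough sketch of what it might look like. It keeps the question's substring matching, but swaps np.nan for math.nan to stay within the standard library, and the name all_pairs is just an illustrative choice. It assumes every heading is directly followed by its value.

```python
import math

strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]
subs = ['name', 'email', 'phone']

all_pairs = []  # illustrative name: one list of (heading, value) tuples per row
for row in strings:
    # Keep (element, next element) wherever the element contains an expected
    # substring; row[:-1] skips the last item, which has nothing after it.
    found = [(s.lower(), row[i + 1])
             for i, s in enumerate(row[:-1])
             if any(sub in s.lower() for sub in subs)]
    # Append a (heading, nan) tuple for every expected heading that was not found.
    for sub in subs:
        if not any(sub in key for key, _ in found):
            found.append((sub, math.nan))
    all_pairs.append(found)
    print(found)
```

Because the result is a plain list, repeated headings in a row would all be kept rather than overwriting each other as they do in a dict.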

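This question seems to be about how to turn the logical steps needed to solve the problem into code. Before even starting with Python, it helps to write pseudocode so that the required logical steps are clearly visible.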

for each row of data:
    * initialize a new output data structure for this row
    for each required key:
        if the key is in the row:
            * find the indices associated with the key/value pair
            * store key/value pair in the output data
        otherwise (i.e. if the key is not in the row):
            * store key/None pair in the output data 
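You can translate this pseudocode into working Python code almost directly. It is a very explicit approach that uses loops and variable assignments at every step of the logic, which makes it a good learning exercise. Later you may want to optimize it for performance and/or style.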

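# List of strings parsed from a document
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]
# Expected/desired headings
subs = ['name', 'email', 'phone']

# Create a dictionary for each row
results = []
for row in strings:
    d = {}
    for key in subs:
        if key in row:
            key_idx = row.index(key)
            val_idx = key_idx + 1
            val = row[val_idx]
        else:
            val = None
        d[key] = val
    results.append(d)
print(results)

Result: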

[{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': None}, {'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}]
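
The question asked for nan rather than None, so here is a sketch of the same loop wrapped in a function with a configurable fill value. The function name rows_to_dicts and its fill parameter are made up for illustration. As in the loop above, it assumes each heading is immediately followed by its value and that no value is identical to a heading.

```python
import math

def rows_to_dicts(rows, keys, fill=None):
    """Hypothetical helper: build one dict per row, using `fill` for absent keys."""
    results = []
    for row in rows:
        d = {}
        for key in keys:
            if key in row:
                # list.index returns the position of the first match;
                # the value is assumed to sit right after the heading.
                d[key] = row[row.index(key) + 1]
            else:
                d[key] = fill
        results.append(d)
    return results

strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]
subs = ['name', 'email', 'phone']

records = rows_to_dicts(strings, subs, fill=math.nan)
for record in records:
    print(record)
```

Leaving fill at its default gives the None-filled output shown above.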
We convert each list to a set and look for missing values. If we find any, we append the missing heading and None to the list.

# List of strings parsed from a document
data = [['name', 'Joe Sixpack', 'email', 'Winnie Cooler'],
        ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
         '555-555-5550']]
# Expected/desired headings
subs = set(['name', 'email', 'phone'])

for node in data:
    missingValue = subs.difference(set(node))
    if missingValue:
        for value in missingValue:
            node.append(value)
            node.append(None)
    print(node)
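Output:

['name', 'Joe Sixpack', 'email', 'Winnie Cooler', 'phone', None]
['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone', '555-555-5550']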

A one-liner:

>>> strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
...            ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone', 
...             '555-555-5550']]
>>> subs = ['name', 'email', 'phone']
>>> [{**{k: None for k in subs}, **dict(zip(s[::2], s[1::2]))} for s in strings]
[{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': None}, {'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}]
Note: None is a better choice than nan for a phone number.

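The core of the list comprehension is dict(zip(s[::2], s[1::2])). s[::2] creates a list of the elements of s at even indices, and s[1::2] a list of the elements at odd indices. The two lists are zipped into an iterable of (even, odd) pairs, that is ('name', 'Joe Sixpack'), ('email', 'beerme@thebrew.com') for the first row. dict then turns those pairs into a dictionary.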
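
To make the slicing and zipping step concrete, here is a small sketch that splits it into intermediate variables. The names headings, values, pairs and record are only for illustration; the one-liner does all of this in a single expression.

```python
# One row from the question's data: heading, value, heading, value, ...
s = ['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com']

headings = s[::2]    # elements at even indices 0, 2, ...
values = s[1::2]     # elements at odd indices 1, 3, ...
pairs = list(zip(headings, values))
record = dict(pairs)

print(headings)  # ['name', 'email']
print(values)    # ['Joe Sixpack', 'beerme@thebrew.com']
print(pairs)     # [('name', 'Joe Sixpack'), ('email', 'beerme@thebrew.com')]
print(record)    # {'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com'}
```

Note that zip stops at the shorter input, so a trailing heading without a value would be dropped silently.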

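Now the default values. {k: None for k in subs} is the dictionary {'name': None, 'email': None, 'phone': None}. The two dictionaries are merged by unpacking both into a new dict literal with **. For duplicate keys the value is taken from the dictionary unpacked last, here the one built from the row, so only the missing headings keep None, and voilà.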
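
A quick sketch of the merge step on its own, showing that when keys collide the value from the dictionary unpacked later wins. The variable names defaults and parsed are illustrative and do not appear in the one-liner.

```python
subs = ['name', 'email', 'phone']

defaults = {k: None for k in subs}
parsed = {'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com'}

# Defaults first, parsed values second: parsed values override the Nones.
merged = {**defaults, **parsed}
print(merged)
# {'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': None}

# Swapping the order lets the defaults overwrite the parsed values.
swapped = {**parsed, **defaults}
print(swapped)
# {'name': None, 'email': None, 'phone': None}
```

This is why the defaults have to come first in the one-liner.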
