Python: find the indices of a list of substrings in a list of strings; fill in missing values
I'm trying to make sure that an expected list of substrings appears in a list of strings. I need to know whether one is missing so that I can fill it in. I need to find the index of each substring in the list of strings so that I can extract the value of the string next to it. Using Python 3:
# List of strings parsed from a document
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]
# Expected/desired headings
subs = ['name', 'email', 'phone']
Then check whether all of the subs were captured. If not, find the missing ones and fill them in with nan.
Expected result:
{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': nan}
{'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}
To do this, I use a comprehension and choose dictionary output:
import numpy as np

for row in strings:
    # Get key:value of each sub in row
    foundSubs = dict((s, row[i+1]) for (i, s) in enumerate([n.lower() for n in row])
                     for sub in subs if sub in s)
    # check for all subs in result: name, email, phone
    # if one missing, fill in nan
    for eachSub in subs:
        if [i for i in foundSubs if eachSub in i] == []:
            foundSubs[eachSub] = np.nan
    print(foundSubs)
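One caveat with the substring test sub in s above: it also matches values that merely contain a heading. With a made-up email such as 'myname@example.com' (hypothetical data, not from the question) the value itself would be taken as a 'name' key, and because it is the last element, row[i+1] would raise an IndexError. A minimal sketch that compares headings exactly instead, assuming headings always sit at even positions followed by their value; it uses math.nan from the standard library in place of np.nan, and the helper names substring_keys and exact_pairs are hypothetical:

```python
import math

strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550'],
           # Hypothetical extra row: its email value happens to contain 'name'
           ['name', 'Ann Example', 'email', 'myname@example.com']]
subs = ['name', 'email', 'phone']


def substring_keys(row):
    # The question's test: any element that *contains* a heading is taken as a key
    return [s for s in (n.lower() for n in row) for sub in subs if sub in s]


def exact_pairs(row):
    # Assumes headings sit at indices 0, 2, 4, ... and are followed by their value
    found = {row[i].lower(): row[i + 1]
             for i in range(0, len(row) - 1, 2)
             if row[i].lower() in subs}
    # Any heading that was not found is filled in with nan
    return {k: found.get(k, math.nan) for k in subs}


for row in strings:
    print(substring_keys(row), exact_pairs(row))
```

For the third row, substring_keys reports the email value as an extra key, while exact_pairs still returns the name, the email and nan for the phone.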
Result:
{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': nan}
{'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}
It can be put into list-of-tuples format by not using dict in the comprehension:
[('name', 'Joe Sixpack'), ('email', 'beerme@thebrew.com'), ('phone', nan)]
[('name', 'Winnie Cooler'), ('email', 'Winnie Cooler'), ('phone', '555-555-5550')]
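Dropping dict on its own leaves a generator of pairs, and the fill-in step (foundSubs[eachSub] = ...) needs a mapping to assign into. One way to get the tuple format while keeping the fill-in, sketched here as an assumption rather than as the asker's code, is to build the filled dict first and then call list(d.items()); dicts keep insertion order since Python 3.7, so the tuples come out in the order of subs. It also assumes exact headings at even positions:

```python
import math

row = ['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com']
subs = ['name', 'email', 'phone']

# Build the key/value mapping first (exact headings at even positions assumed)
found = {row[i]: row[i + 1] for i in range(0, len(row) - 1, 2) if row[i] in subs}
# Fill in the missing headings with nan, in the order given by subs
filled = {k: found.get(k, math.nan) for k in subs}
# Convert to the list-of-tuples format; dicts preserve insertion order
pairs = list(filled.items())
print(pairs)
```

This prints the first row in the same list-of-tuples shape as shown above, with nan as the phone value.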
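This question seems to be about how to turn the logical steps needed to solve the problem into code. Even before starting with Python, it helps to write pseudocode to see the required logical steps clearly: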
for each row of data:
    * initialize a new output data structure for this row
    for each required key:
        if the key is in the row:
            * find the indices associated with the key/value pair
            * store key/value pair in the output data
        otherwise (i.e. if the key is not in the row):
            * store key/None pair in the output data
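You can translate this pseudocode almost directly into working Python code. This is a very explicit approach, with a loop and a variable assignment for each step of the logic, which makes it a good learning exercise. Later you may want to optimize it for performance and/or style: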
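# List of strings parsed from a document
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]

# Expected/desired headings
subs = ['name', 'email', 'phone']

# Create a dictionary for each row
result = []
for row in strings:
    d = {}
    for key in subs:
        if key in row:
            # The value sits right after its key
            key_idx = row.index(key)
            val_idx = key_idx + 1
            val = row[val_idx]
        else:
            val = None
        d[key] = val
    result.append(d)

print(result)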
Result:
[{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': None},
 {'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}]
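In the spirit of optimizing later for performance and/or style: row.index(key) rescans the row for every key. A sketch of one possible alternative (an assumption, not the answerer's code) pairs up the row once with zip and then looks each heading up with dict.get, which returns None for an absent key. Unlike the row.index version, it assumes headings sit at even positions:

```python
strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
           ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
            '555-555-5550']]
subs = ['name', 'email', 'phone']

result = []
for row in strings:
    # One pass over the row: pair element 0 with 1, element 2 with 3, ...
    pairs = dict(zip(row[::2], row[1::2]))
    # dict.get returns None when a heading is absent from the row
    result.append({key: pairs.get(key) for key in subs})

print(result)
```

It produces the same list of dictionaries as the explicit version for this data.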
We convert the list to a set and look for the missing headings: for each one we find, we append the missing heading and None to the list.
# List of strings parsed from a document
data = [['name', 'Joe Sixpack', 'email', 'Winnie Cooler'],
        ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
         '555-555-5550']]

# Expected/desired headings
subs = set(['name', 'email', 'phone'])

for node in data:
    # Headings in subs that do not occur anywhere in this row
    missingValue = subs.difference(set(node))
    if missingValue:
        for value in missingValue:
            # Append the missing heading followed by a None value
            node.append(value)
            node.append(None)
    print(node)
Output:
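A variation on the set-difference approach, sketched under the assumption that headings sit at even positions: set(node) contains the values as well as the headings, so a value that happens to equal a heading string would hide a missing key. Checking only node[::2], keeping the order of subs with a list instead of a set, and pairing the extended row into a dict avoids that and yields the dictionary format the question asks for. This sketch uses the question's data and builds new lists rather than appending to the originals:

```python
data = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
        ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
         '555-555-5550']]
subs = ['name', 'email', 'phone']

filled = []
for node in data:
    # Compare against the headings only (even positions), not the values
    missing = [k for k in subs if k not in node[::2]]
    # Add each missing heading followed by None, without mutating node
    extended = node + [item for k in missing for item in (k, None)]
    # Pair each heading with the value that follows it
    filled.append(dict(zip(extended[::2], extended[1::2])))

print(filled)
```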
A one-liner:
>>> strings = [['name', 'Joe Sixpack', 'email', 'beerme@thebrew.com'],
... ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone',
... '555-555-5550']]
>>> subs = ['name', 'email', 'phone']
>>> [{**{k: None for k in subs}, **dict(zip(s[::2], s[1::2]))} for s in strings]
[{'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com', 'phone': None}, {'name': 'Winnie Cooler', 'email': 'Winnie Cooler', 'phone': '555-555-5550'}]
Note: for a phone number, None is better than nan.
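A short illustration of one practical reason for that note (our reading, not necessarily the answerer's): nan is a float that never compares equal to anything, not even itself, and math.isnan only accepts numbers, so detecting it in an otherwise-string phone column is awkward. None is a singleton that a plain identity test finds. The sample values are made up for the illustration:

```python
import math

# A phone column filled with nan mixes a float into otherwise-string data
phones_nan = ['555-555-5550', math.nan]
# A phone column filled with None keeps a single, easily tested sentinel
phones_none = ['555-555-5550', None]

# nan never compares equal, not even to itself, so == cannot find it
nan_equal = [p == math.nan for p in phones_nan]

# math.isnan only accepts numbers and raises TypeError on the strings
try:
    [math.isnan(p) for p in phones_nan]
    isnan_ok = True
except TypeError:
    isnan_ok = False

# None is a singleton and is found with a plain identity test
none_missing = [p is None for p in phones_none]

print(nan_equal, isnan_ok, none_missing)
```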
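The core of the list comprehension is dict(zip(s[::2], s[1::2])): s[::2] creates the list of the elements of s at even indices (the headings) and s[1::2] the list of the elements at odd indices (the values). The two are zipped into an iterable of (heading, value) pairs, that is ('name', 'Joe Sixpack'), ('email', 'beerme@thebrew.com') for the first string, and that iterable is wrapped in dict to build a dictionary.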
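The slicing and zipping steps can be checked one at a time on the second row of the question's data:

```python
s = ['name', 'Winnie Cooler', 'email', 'Winnie Cooler', 'phone', '555-555-5550']

keys = s[::2]     # elements at indices 0, 2, 4: the headings
values = s[1::2]  # elements at indices 1, 3, 5: the values
pairs = list(zip(keys, values))  # [(heading, value), ...]
row_dict = dict(pairs)

print(keys, values, pairs, row_dict, sep='\n')
```

Because zip stops at the shorter input, a trailing heading with no value after it would be silently dropped rather than raising an error.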
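Now the default values: {k: None for k in subs} is the dictionary {'name': None, 'email': None, 'phone': None}. The two dictionaries are merged with ** unpacking; for duplicate keys the value is taken from the second dictionary (the one built from the row), and voilà.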
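The merge order is what makes the defaults work: in a {**a, **b} display, later entries overwrite earlier ones for the same key. A small check with the first row's values, including the reversed order for contrast:

```python
subs = ['name', 'email', 'phone']
defaults = {k: None for k in subs}
row_values = {'name': 'Joe Sixpack', 'email': 'beerme@thebrew.com'}

# Later entries win: row values overwrite the None defaults
merged = {**defaults, **row_values}
# Reversing the order lets the None defaults win instead
reversed_merge = {**row_values, **defaults}

print(merged)
print(reversed_merge)
```

On Python 3.9 and later, defaults | row_values gives the same result as the first merge.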