Python for loop to save keys and values that contain a specific value

Suppose I have a Python list of dictionaries, structured like this:

[ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
  {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
  {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
  {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]
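I am struggling to find the most efficient way to:

(i) loop over only the keys equal to 'href', and only those whose 'href' value contains 'https://www.simplyrecipes.com/recipes/', and identify the values that contain 'recipes/cuisine', 'recipes/season' or 'recipes/ingredient'

(ii) save each full URL value into a separate list (depending on which 'recipe/…' condition it satisfies), with each list named appropriately

Expected result: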

cuisine = ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/']
season = ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/']
type = ['https://www.simplyrecipes.com/recipes/type/condiment/']
ingredient = ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']

Skip any keys and/or values that do not meet the conditions above.

Any tips would be greatly appreciated.

Assuming the URLs follow the same format as in the question, a better approach is to build a dictionary keyed by recipe category:

In [50]: from collections import defaultdict

In [51]: sep_data = defaultdict(list)

In [52]: lst = [ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

In [59]: for i in lst: sep_data[i["href"].split("/")[-3]].append(i["href"])

In [60]: sep_data
Out[60]:
defaultdict(list,
            {'cuisine': ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/'],
             'season': ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'],
             'type': ['https://www.simplyrecipes.com/recipes/type/condiment/'],
             'ingredient': ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']})
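
A follow-up comment from the asker says the real data also contains keys other than 'href' and values that are not URLs. A minimal sketch of the same defaultdict idea with those entries skipped might look like the following. The function name `group_by_category`, the `PREFIX` constant, the `wanted` tuple and the extra sample entries are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict

# Assumed prefix, taken from the URLs shown in the question.
PREFIX = "https://www.simplyrecipes.com/recipes/"

def group_by_category(items, wanted=("cuisine", "season", "ingredient")):
    """Group 'href' URLs by the path segment that follows PREFIX."""
    grouped = defaultdict(list)
    for item in items:
        url = item.get("href")  # None when the dict has no 'href' key
        if not isinstance(url, str) or not url.startswith(PREFIX):
            continue  # skip other keys and values that are not matching URLs
        category = url[len(PREFIX):].split("/", 1)[0]
        if category in wanted:
            grouped[category].append(url)
    return dict(grouped)

# Hypothetical sample: the 'type' URL and the last three entries are ignored.
sample = [
    {"href": "https://www.simplyrecipes.com/recipes/cuisine/portuguese/"},
    {"href": "https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/"},
    {"href": "https://www.simplyrecipes.com/recipes/type/condiment/"},
    {"href": "https://www.simplyrecipes.com/recipes/ingredient/adobado/"},
    {"title": "not a link"},
    {"href": None},
    {"href": "https://example.com/recipes/cuisine/x/"},
]
result = group_by_category(sample)
print(result)
```

Slicing off a known prefix avoids depending on a negative index such as `[-3]`, which only works when every URL has the same depth and a trailing slash.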

Here is a simple example; I hope it helps:

import re

trash = [ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
          {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
          {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
          {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

for x in trash:
    for y in x.values():
        txt = ''
        for i in re.findall("recipes/.*", y):
            txt += i
            title = txt.split('/')[1]
            print({title: y})

Output:

{'cuisine': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'}
{'season': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'}
{'type': 'https://www.simplyrecipes.com/recipes/type/condiment/'}
{'ingredient': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}
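
The `TypeError` the asker reports in the comments is what the `re` functions raise when they are given a value that is not a string. A hedged variant of the loop above that looks only at the 'href' key and checks the type first could look like this. The pattern, the helper name `category_of` and the extra sample entries are assumptions for illustration.

```python
import re

# Assumed pattern: capture the path segment right after "recipes/".
PATTERN = re.compile(r"https://www\.simplyrecipes\.com/recipes/([^/]+)/")

def category_of(entry):
    """Return the category of entry['href'], or None if it does not qualify."""
    value = entry.get("href")
    if not isinstance(value, str):  # re functions raise TypeError on non-strings
        return None
    match = PATTERN.match(value)
    return match.group(1) if match else None

# Hypothetical data: the second and third entries should be skipped.
entries = [
    {"href": "https://www.simplyrecipes.com/recipes/cuisine/portuguese/"},
    {"src": "https://www.simplyrecipes.com/recipes/season/summer/"},
    {"href": 42},
    {"href": "https://www.simplyrecipes.com/recipes/ingredient/adobado/"},
]

# Keep only the entries for which a category could be extracted.
pairs = [(category_of(e), e["href"]) for e in entries if category_of(e)]
for title, url in pairs:
    print({title: url})
```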

So, roughly:

from itertools import groupby
import re

lst = [{'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
       {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
       {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
       {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

def f(i):
    # Capture the path segment that follows "recipes/"; None when nothing matches.
    x = re.findall("https://www.simplyrecipes.com/recipes/([^/]+)/(?:[^/]+/?)+", i["href"])
    return x and x[0] or None

# Keep only the groups whose key is one of the wanted categories.
r = filter(lambda i: i[0] in ('cuisine', 'season', 'ingredient'), groupby(lst, f))
for i in r:
    print(f"{i[0]} = {list(map(lambda j: j['href'], i[1]))}")
# result:
# cuisine = ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/']
# season = ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/']
# ingredient = ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']
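
One caveat: `itertools.groupby` only merges adjacent items that share a key, so URLs of the same category that are not next to each other in the list would land in separate groups. A sketch that sorts by the key first, assuming the same URL layout as in the question, is shown here. The helper name `key_of`, the compiled `PATTERN`, and the reordered sample list (including the made-up `italian` URL) are illustrative assumptions.

```python
from itertools import groupby
import re

# Assumed pattern: capture the path segment right after "recipes/".
PATTERN = re.compile(r"https://www\.simplyrecipes\.com/recipes/([^/]+)/")

def key_of(item):
    """Category segment of item['href'], or '' when it does not match."""
    href = item.get("href")
    m = PATTERN.match(href) if isinstance(href, str) else None
    return m.group(1) if m else ""  # '' keeps the keys sortable

# Hypothetical list in which the two cuisine URLs are not adjacent.
data = [
    {"href": "https://www.simplyrecipes.com/recipes/cuisine/portuguese/"},
    {"href": "https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/"},
    {"href": "https://www.simplyrecipes.com/recipes/cuisine/italian/"},
]

wanted = ("cuisine", "season", "ingredient")
# Sort by category so that groupby sees equal keys next to each other.
groups = {
    key: [item["href"] for item in items]
    for key, items in groupby(sorted(data, key=key_of), key_of)
    if key in wanted
}
print(groups)
```

Because `sorted` is stable, URLs within each category keep their original relative order.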

Part (i) of the requirements does not include "condiment", but the expected result does. Is /type/condiment/ meant to be included in the filter?

Hi, thanks for pointing out this flaw in my question. I should have been clearer: condiment is not included. I should also have mentioned that my structure contains keys and values that differ from the example above. I would like those to be excluded.

What have you tried so far? I am voting to close this question because it shows no effort.

Thanks for the suggestion. I did not think it would matter, but that was clearly a mistake. The key will not always be named 'href'; I want to ignore the ones that are not 'href'. How would I do that?

Thanks for your great contribution. When I run the code with the trash list, it works perfectly. However, when I run it with my own structure, I get "TypeError: expected string or bytes-like object". This is presumably because my structure contains keys and values that are incompatible with the code. Is there any way to improve this code so that, if a key is not equal to 'href', that key and its value are skipped?

Thanks for the suggestion. Unfortunately, I get the error "TypeError: f() takes 0 positional arguments but 1 was given".