Python for loop to save keys and values that contain a specific value

python list loops dictionary

Suppose I have a Python structure made of a list of dictionaries, like this:

[ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
  {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
  {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
  {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

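I'm struggling to find the most efficient way to:

(i) loop only over keys equal to 'href', and only over 'href' keys whose value contains 'https://www.simplyrecipes.com/recipes/', and identify the values that contain 'recipes/cuisine', 'recipes/season' and 'recipes/ingredient';

(ii) save each full URL value into a separate, appropriately named list, depending on which 'recipes/…' condition it satisfies.

Expected result: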

cuisine = ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/']
season = ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/']
type = ['https://www.simplyrecipes.com/recipes/type/condiment/']
ingredient = ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']

Any keys and/or values that don't meet the conditions above should be skipped.

Any hints would be greatly appreciated.

Assuming the URLs follow the same format as in the question, a better approach is to build a dictionary keyed by recipe category:

In [50]: from collections import defaultdict

In [51]: sep_data = defaultdict(list)

In [52]: lst = [ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
    ...:   {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

In [59]: for i in lst: sep_data[i["href"].split("/")[-3]].append(i["href"])

In [60]: sep_data
Out[60]:
defaultdict(list,
            {'cuisine': ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/'],
             'season': ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'],
             'type': ['https://www.simplyrecipes.com/recipes/type/condiment/'],
             'ingredient': ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']})
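
The follow-up comments ask for entries whose key is not 'href' to be skipped and for only some categories to be kept. A minimal sketch of one way to do that on top of the defaultdict idea above. The names PREFIX, WANTED and group_by_category, and the last two sample entries, are hypothetical and not from the original post:

```python
from collections import defaultdict

PREFIX = "https://www.simplyrecipes.com/recipes/"
WANTED = {"cuisine", "season", "ingredient"}  # assumed category filter


def group_by_category(items, wanted=WANTED):
    """Group 'href' URLs by the path segment that follows PREFIX."""
    grouped = defaultdict(list)
    for item in items:
        url = item.get("href")  # None when the dict has no 'href' key
        # Skip missing keys, non-string values and unrelated URLs.
        if not isinstance(url, str) or not url.startswith(PREFIX):
            continue
        category = url[len(PREFIX):].split("/", 1)[0]
        if category in wanted:
            grouped[category].append(url)
    return dict(grouped)


lst = [
    {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
    {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
    {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
    {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'},
    {'class': ['nav-link']},  # hypothetical entry without an 'href' key
    {'href': None},           # hypothetical entry with a non-string value
]

result = group_by_category(lst)
print(result)
```

Slicing off the known prefix avoids relying on a negative index such as [-3], which only points at the category when every URL has the same depth and a trailing slash.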

Here is a simple example; I hope it helps:

import re

trash = [ {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
          {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
          {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
          {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

for x in trash:
    for y in x.values():
        txt = ''
        for i in re.findall("recipes/.*", y):
            txt += i
            title = txt.split('/')[1]
            print({title: y})

Output:

{'cuisine': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'}
{'season': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'}
{'type': 'https://www.simplyrecipes.com/recipes/type/condiment/'}
{'ingredient': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}
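
The "TypeError: expected string or bytes-like object" reported in the comments is what re raises when it is handed a value that is not a string. A hedged sketch of a guarded variant of the regex approach, which skips keys other than 'href' and values that are not strings. The helper category_of, the PATTERN constant and the two extra sample entries are assumptions for illustration only:

```python
import re

# Capture the path segment right after "recipes/".
PATTERN = re.compile(r"recipes/([^/]+)/")


def category_of(value):
    """Return the category segment of a recipe URL, or None if there is none."""
    if not isinstance(value, str):  # lists, None, numbers, ... are skipped
        return None
    match = PATTERN.search(value)
    return match.group(1) if match else None


entries = [
    {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
    {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
    {'class': ['menu-item']},  # hypothetical: key is not 'href'
    {'href': ['not', 'a', 'string']},  # hypothetical: value is not a string
]

found = []
for entry in entries:
    for key, value in entry.items():
        if key != 'href':
            continue
        title = category_of(value)
        if title is not None:
            found.append({title: value})

for pair in found:
    print(pair)
```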

So, roughly:

from itertools import groupby
import re

lst = [{'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
       {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
       {'href': 'https://www.simplyrecipes.com/recipes/type/condiment/'},
       {'href': 'https://www.simplyrecipes.com/recipes/ingredient/adobado/'}]

def f(i):
    # Return the category segment of the URL, or None when it does not match.
    x = re.findall("https://www.simplyrecipes.com/recipes/([^/]+)/(?:[^/]+/?)+", i["href"])
    return x and x[0] or None

# groupby yields (key, group) pairs; keep only the wanted categories.
r = filter(lambda i: i[0] in ('cuisine', 'season', 'ingredient'), groupby(lst, f))
for i in r:
    print(f"{i[0]} = {list(map(lambda j: j['href'], i[1]))}")

# result:
# cuisine = ['https://www.simplyrecipes.com/recipes/cuisine/portuguese/']
# season = ['https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/']
# ingredient = ['https://www.simplyrecipes.com/recipes/ingredient/adobado/']
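
One caveat with itertools.groupby is that it only merges consecutive items sharing a key, so input that is not already ordered by category would produce several groups for the same category. A small sketch, assuming the data may arrive unordered, that sorts by the same key first. The function name category and the 'italian' URL are hypothetical and only there to show two entries landing in one group:

```python
from itertools import groupby
import re

URL_RE = re.compile(r"https://www\.simplyrecipes\.com/recipes/([^/]+)/")


def category(item):
    """Return the category of an item, or '' when it has no usable 'href'."""
    url = item.get("href")
    if not isinstance(url, str):
        return ""
    match = URL_RE.match(url)
    return match.group(1) if match else ""


lst = [
    {'href': 'https://www.simplyrecipes.com/recipes/cuisine/portuguese/'},
    {'href': 'https://www.simplyrecipes.com/recipes/season/seasonal_favorites_spring/'},
    {'href': 'https://www.simplyrecipes.com/recipes/cuisine/italian/'},  # hypothetical URL
    {'title': 'no href here'},  # hypothetical entry, ends up under '' and is dropped
]

# Sort by category so equal keys become adjacent, then group.
ordered = sorted(lst, key=category)
grouped = {
    key: [item["href"] for item in group]
    for key, group in groupby(ordered, key=category)
    if key  # drop the '' bucket of unusable entries
}
print(grouped)
```

Because sorted is stable, URLs within a category keep their original relative order.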

Part (i) of the requirements doesn't mention 'condiment', but the expected result includes it. Is /type/condiment/ supposed to be part of the filter?

Hi, thanks for pointing out this flaw in my question. I should have been clearer: condiment is not included. I should also have mentioned that my structure contains keys and values that differ from the ones in my example above. I'd like those to be excluded.

What have you tried so far? I'm voting to close this question because it shows no effort.

Thanks for your suggestion. I didn't think it mattered, but apparently that was a mistake. The key won't always be named 'href'; I want to ignore the ones that aren't 'href'. How would I go about that?

Thanks for the great contribution. When I run the code with the trash list, it works perfectly. Unfortunately, when I run it with my own structure, I get "TypeError: expected string or bytes-like object". This is presumably because my structure contains keys and values that the code can't handle. Is there any way to improve this code so that if the key is not equal to 'href', that key and its value are skipped?

Thanks for your suggestion. Unfortunately, I get a "TypeError: f() takes 0 positional arguments but 1 was given" error.