Python: search a list by keyword and append only the matching items
Context
I have a list of links that I scraped from a website. The list of links looks like this:
['https://twitter.com/ONS',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx',
'https://www.facebook.com/ONS',
'https://www.ons.gov.uk/peoplepopulationandcommunity/leisureandtourism',
'https://www.ons.gov.uk/businessindustryandtrade/manufacturingandproductionindustry',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx'...
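I now want to open these with helium/selenium. However, the list contains links I don't need alongside the Excel documents I want to download. I would like to append only the links that contain xlsx to a new list.
I tried this solution, but it didn't work. I also tried the .remove function, but that is more time-consuming. I also tried tidying the list of links with slicing, but that is just as time-consuming.
Question
Is there a simpler way to find a string in the list of links so that I can append the matches to a list and loop over them with selenium? (I can do the selenium part myself; I only need help with the appending.)
Use a list comprehension: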
linklist = ['https://twitter.com/ONS',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx',
'https://www.facebook.com/ONS',
'https://www.ons.gov.uk/peoplepopulationandcommunity/leisureandtourism',
'https://www.ons.gov.uk/businessindustryandtrade/manufacturingandproductionindustry',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx',
'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx']
relevant_links = [link for link in linklist if ".xlsx" in link]
relevant_links will then contain:
['https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx']
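Since the question asks about appending, the same filter can also be sketched as an explicit loop. This is only an illustration: the function name filter_links, the shortened example.com links, and the case-insensitive comparison are assumptions, not part of the original answer.

```python
# Hypothetical sample links, shortened for illustration.
sample_links = [
    "https://twitter.com/ONS",
    "https://example.com/file?uri=%2fdata%2fdataset1.xlsx",
    "https://www.facebook.com/ONS",
    "https://example.com/file?uri=%2fdata%2fROADSTABLES.XLSX",
]


def filter_links(links, keyword=".xlsx"):
    """Return the links that contain keyword, ignoring case."""
    matches = []
    for link in links:
        # Lower-case both sides so ".XLSX" is matched as well (an assumption;
        # the list comprehension above is case-sensitive).
        if keyword.lower() in link.lower():
            matches.append(link)
    return matches


xlsx_links = filter_links(sample_links)
```

The loop and the list comprehension build the same kind of list; the comprehension is simply the shorter spelling when the filter is a single condition.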
Check how the string ends:
new_list = [link for link in original_list if link.endswith(".xlsx")]
You can then open each link in new_list.
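A rough sketch of that last step, assuming the real file path sits in the uri query parameter as in the links above. The helper names target_filename and open_each are hypothetical, and open_link stands for whatever callable opens a URL (for example the get method of a Selenium WebDriver).

```python
import posixpath
from urllib.parse import parse_qs, urlparse


def target_filename(link):
    """Return the file name a link points at, or None if it is not an .xlsx file."""
    parsed = urlparse(link)
    # Links like the ones above carry the real path in the "uri" query
    # parameter; fall back to the URL path when that parameter is absent.
    uri_values = parse_qs(parsed.query).get("uri")
    path = uri_values[0] if uri_values else parsed.path
    name = posixpath.basename(path)
    return name if name.lower().endswith(".xlsx") else None


def open_each(links, open_link):
    """Call open_link(link) for every .xlsx link and return the file names seen."""
    opened = []
    for link in links:
        name = target_filename(link)
        if name is not None:
            open_link(link)
            opened.append(name)
    return opened
```

With a Selenium driver already created, a call such as open_each(new_list, driver.get) would visit each spreadsheet link in turn; passing print instead is a quick way to check which links would be opened.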
Thanks, but I've already solved this with the answer below :)