Python: check whether a link's text starts with a number followed by two words

python, selenium, beautifulsoup

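I want to detect whether a web page has links whose text starts with a number followed by two words, like this:

23 reasons to ... => status is "ok"

5 pictures to ... => status is "ok"

10 photos that ... => status is "ok"

This 10 pictures will ... => status is None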

To check the words of each link in the list, do the following.

Suppose your clean_links looks like this:

    clean_links = ['23 reasons to .. ', '5 pictures to .. ', '10 photos that .. ', 'This 10 pictures will ..']

Now we need to split each element of the list, then check whether the first item is a number and the next two items are words. Strings have the .isdigit() and .isalpha() methods, which will help here:

    status_list = []
    for link in clean_links:
        words = link.split()
        # Only links with at least three words can match
        if len(words) > 2:
            # Keep only the first 3 words separated by whitespace
            first_three_words = words[:3]  # -> ['23', 'reasons', 'to']
            if (first_three_words[0].isdigit()
                    and first_three_words[1].isalpha()
                    and first_three_words[2].isalpha()):
                status_list.append("ok")  # status = "ok"
            else:
                status_list.append(None)
        else:
            status_list.append(None)

status_list will then look like this:

    print(status_list)  # -> ['ok', 'ok', 'ok', None]
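
The same check can be wrapped in a small helper so it can be reused for each link. This is only a minimal sketch, not the answerer's code: the name `link_status` is hypothetical, and it uses `str.isdecimal()` rather than `.isdigit()` on the assumption that only plain decimal digits should count (`.isdigit()` also accepts characters such as superscript digits).

```python
def link_status(text):
    """Return "ok" if text starts with a number followed by two alphabetic words.

    Hypothetical helper for illustration; returns None otherwise.
    """
    words = text.split()
    # Links with fewer than three words can never match, so stop early
    if len(words) < 3:
        return None
    number, first_word, second_word = words[:3]
    if number.isdecimal() and first_word.isalpha() and second_word.isalpha():
        return "ok"
    return None


clean_links = ['23 reasons to .. ', '5 pictures to .. ', '10 photos that .. ', 'This 10 pictures will ..']
status_list = [link_status(link) for link in clean_links]
print(status_list)
```

Keeping the length check inside the helper means callers never have to worry about short links raising an exception.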

Use .isdigit()

Use this regular expression to match the pattern:

    /^[0-9]+\W(\w+\W){2}/gm
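
That pattern is written in JavaScript-style `/.../gm` notation. A rough Python equivalent might look like the sketch below, assuming the `m` flag maps to `re.MULTILINE` and that the intended pattern is digits, a separator, then two word-plus-separator groups. The sample text is made up for illustration.

```python
import re

# Assumed Python translation of /^[0-9]+\W(\w+\W){2}/gm
pattern = re.compile(r'^[0-9]+\W(\w+\W){2}', re.MULTILINE)

# Hypothetical input: one link text per line
text = "23 reasons to ..\nThis 10 pictures will ..\n10 photos that .."

# finditer plays the role of the g flag: it yields every match in the text
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```

Two caveats with this form: it requires a non-word character after the second word, so a link consisting of exactly three words with nothing after them does not match, and `\w` also accepts digits and underscores, so the "words" are not guaranteed to be purely alphabetic.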

You can use a regular expression to solve this:

    import re

    # A number, then two alphabetic words, separated by single spaces
    reg_pattern = re.compile(r'^[0-9]+ [A-Za-z]+ [A-Za-z]+.*')

    clean_links = ["23 reasons to", "5 pictures to", "10 photos that", "This 10 pictures will", "2 3 n"]

    for link in clean_links:
        if reg_pattern.match(link):
            print("ok: " + link)

Output:

    ok: 23 reasons to
    ok: 5 pictures to
    ok: 10 photos that

Adjust the pattern as needed if the words themselves may contain digits.
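
To tie the regex back to the "ok"/None status asked for in the question, the match result can be mapped to a status per link. A minimal sketch, assuming the words may be separated by more than one space (hence `\s+`) and that matching at the start of the string is all that is required:

```python
import re

# One or more digits, then two purely alphabetic words, separated by whitespace
pattern = re.compile(r'[0-9]+\s+[A-Za-z]+\s+[A-Za-z]+')

clean_links = ["23 reasons to", "5 pictures to", "10 photos that", "This 10 pictures will", "2 3 n"]

# re.match only matches at the beginning of the string, so no explicit ^ is needed
status_list = ["ok" if pattern.match(link) else None for link in clean_links]
print(status_list)
```

Unlike the split-based approach, this needs no separate length check: a link with fewer than three words simply fails to match.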

I get an "index out of range" error for some of the clean_links. I can't figure out why.

@Mathieu Some of the clean_links probably contain fewer than three words, and the error is raised when you try to access index 2. How do you want to handle links with fewer than 3 words? Is a number plus one word enough, or something else?

OK. If a link has fewer than 3 words, the status is still "no".

@Mathieu Edited the answer; check whether this fixes the problem.
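
The failure discussed in these comments can be reproduced in isolation. The sketch below uses a made-up two-word link to show why indexing position 2 fails and how a length check avoids it:

```python
short_link = "10 photos"  # hypothetical link with only two words
words = short_link.split()

try:
    words[2]  # there is no third word, so this raises IndexError
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)

# Checking the length first short-circuits before words[2] is ever evaluated
status = None
if len(words) > 2 and words[0].isdigit() and words[1].isalpha() and words[2].isalpha():
    status = "ok"
print(status)
```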