Python: find a word in a div class and save/print the output
I want to find the Snapchat usernames of Instagram users. The div class is always the same, and with an array I can loop over more users. But getting their Snapchat string is still an unsolved problem for me.
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

for i in range(len(Abonnenten)):  # my array 'Abonnenten' holds the URLs of some users
    driver.get(Abonnenten[i])  # opens the page of the next user every time
    # get the text from their instagram bio
    try:
        wait = WebDriverWait(driver, 10)  # let the browser catch up if the internet is slow
        # This is to get the text of the bio. I never got any names to print out
        bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text
        # Snapnames = list.append(bio)  # an attempt to collect the names in a list so I can print all the names I found
        # print(Snapnames)
        # check if text contains "snapchat", but I don't think that works
        # also, can I add "snap" or "sc" to get more results, because everyone writes their snap differently in their bio
        if "snapchat" in bio:
            # split the instagram bio by newline char to get the line with the snapchat name
            bio_lines = bio.split("\n")
            # parse over the instagram bio to find the snapchat username (I don't understand this line)
            for line in bio_lines:
                # if we find the line with the username, strip out text to get the username
                if "Snapchat:" in line:
                    # snapchat_username = []
                    snapchat_username = line.replace("Snapchat:", "")
                    # you probably need to do something here to save to file
                    print(snapchat_username)
    # case: the user does not have a bio, so just move on to the next one
    except TimeoutException:
        continue
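The main problem is that I can't get any Snapchat names out of the program.
Here is also an example of how users show their Snapchat name in their bio, because I want to include more words to search for in the bio.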
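The problem I see when comparing the screenshot with the code is that this Instagram user writes their Snapchat as sc: rather than Snapchat:, which is what the code is looking for. The username is not found in the provided bio because the code first checks whether the text snapchat is present (if ("snapchat" in bio):).
You may want to create a list of strings containing the valid ways a user might introduce their Snapchat username, for example ["snapchat:", "sc:", "snap:"], and then check the user's bio against that list to see whether it contains one of the strings. If it does, you can parse out the username.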
bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text
# print bio to ensure we actually found something
print(bio)
# this should be moved above your for loop so it's not getting initialized every iteration
snapchat_strings = ["snapchat:", "sc:", "snap:"]
# iterate keywords list and check if user's bio has a keyword
for keyword in snapchat_strings:
    # check if user's bio has a snapchat keyword
    if keyword in bio:
        bio_lines = bio.split("\n")
        # iterate lines in their bio to find the line with snapchat username
        for line in bio_lines:
            if keyword in line:
                # case: we found SC username, so save it then break out of this loop
                snapchat_username = line.replace(keyword, "")
                print(snapchat_username)
                break
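The keyword check can also be tried without a browser. The following is a minimal sketch, not part of the original answer: find_snapchat_username is a hypothetical helper name, the keyword tuple is the same illustrative list as above, and the match is made case-insensitive with the re module so that SC: and Snapchat: are both found. A short keyword such as sc: can still match inside unrelated words, so treat the result as a candidate rather than a confirmed username.

```python
import re


def find_snapchat_username(bio, keywords=("snapchat:", "snap:", "sc:")):
    """Return the text after the first keyword found in the bio, or None.

    The keyword list is an assumption; extend it with whatever prefixes
    show up in the bios you actually scrape.
    """
    # Build one case-insensitive pattern such as "snapchat:|snap:|sc:".
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    for line in bio.splitlines():
        match = pattern.search(line)
        if match:
            # Keep everything after the keyword and trim surrounding spaces.
            return line[match.end():].strip()
    return None


print(find_snapchat_username("Ex POINTER\nSeptember 26\nSC: therock"))  # therock
```

Inside the Selenium loop, the string returned by .text would be passed in as bio, and a None result would mean no keyword was found.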
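When I gave you my original solution, I did mention that writing a parser that reliably finds Snapchat usernames is quite difficult, because users write their usernames in their bios in all sorts of ways. For example, if a user writes only their Snapchat username without putting "sc" or "Snapchat" in front of it, the code will never find that username.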
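Another problem: the bio lines may not be separated by \n characters, which would cause the line bio_lines = bio.split("\n") to not work properly. This may take some experimenting with the string printed for bio to find the correct separator.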
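One way to run that experiment, sketched here under the assumption that the separator is some kind of line break: repr() makes hidden characters such as \r and \n visible, and str.splitlines() splits on all common line boundaries instead of only \n. The sample_bio string and the split_bio_lines name are made up for illustration.

```python
def split_bio_lines(bio):
    """Split a bio on any line boundary and drop empty lines."""
    return [line.strip() for line in bio.splitlines() if line.strip()]


# Hypothetical bio that mixes Windows-style and Unix-style line endings.
sample_bio = "Ex POINTER\r\nSeptember 26\n\nsnapchat:therock"

# repr() shows the separators literally instead of rendering them.
print(repr(sample_bio))
print(split_bio_lines(sample_bio))
```

If repr() shows a separator that is not a line break at all (for example a bullet or a pipe character), bio.split() with that exact character would be needed instead.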
Finally, your comment above the for line in bio_lines: line suggests you may want some explanation of how this works, so let me try to break it down.
Here is an example of someone's Instagram bio:
Ex POINTER
September 26
"tú, eres mis buenos dias"
snapchat:therock
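Notice how this is split into 4 lines. I am assuming these lines are separated by the hidden \n character, a form of whitespace that represents a line break. So we run bio_lines = bio.split("\n") to get a list of the lines in their bio. Now, in code, the bio looks like this: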
[ "Ex POINTER", "September 26", "\"tú, eres mis buenos dias\"", "snapchat:therock" ]
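So we use a for loop to look at each item in this list and check whether it contains the Snapchat username. That is what for line in bio_lines: does. Inside the loop, we check if ("Snapchat:" in line): and when we reach the line Snapchat:therock, this evaluates to True. Note that in is case-sensitive, so the keyword has to match the capitalization used in the bio. The idea is then to strip out Snapchat:, so that we only see therock printed to the console.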
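The walkthrough can be reproduced as a small standalone script. This is only a sketch using the sample bio from above: the keyword is written in lowercase to match that sample, and found is a hypothetical list added here to collect the results so they can be printed or saved later.

```python
# The sample bio from above, with its lines joined by "\n".
bio = 'Ex POINTER\nSeptember 26\n"tú, eres mis buenos dias"\nsnapchat:therock'

# Step 1: split the bio into a list of lines.
bio_lines = bio.split("\n")
print(bio_lines)

# Step 2: look at each line and keep the part after the keyword.
found = []
for line in bio_lines:
    if "snapchat:" in line:
        found.append(line.replace("snapchat:", ""))

print(found)
```

Collecting the names in a list like this, rather than printing them one at a time, also makes it straightforward to write them to a file once the loop over all users has finished.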