Python: find a word inside a div class and save/print the output


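I want to find the Snapchat usernames of Instagram users. The div class is always the same, and with an array I can loop over more users. What I still cannot solve is getting their Snapchat string out of the bio.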

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# 'driver' is an already-created selenium webdriver instance
for i in range(len(Abonnenten)): #in my array 'Abonnenten' i got some urls of users
    driver.get(Abonnenten[i]) #loads the profile page of the next user
    # get the text from their instagram bio
    try:
        wait = WebDriverWait(driver, 10) #let the browser catch up, if internet is slow
        #This is to get the text of the bio. I never got any names to print out ever
        bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text
        #Snapnames = list.append(bio) #this was a try to get the names in a list so i can print all of names i found out
        #print(Snapnames)
        # check if text contains "snapchat" but i don't think that works
        # also, can i add "snap" or "sc" to get more results, because everyone puts their snap different in their bio
        if ("snapchat" in bio):

            # split the instagram bio by newline char to get line with snapchat name
            bio_lines = bio.split("\n")

            # parse over the instagram bio to find snapchat username, i dont get this line
            for line in bio_lines:

                # if we find the line with username, strip out text to get the username
                if ("Snapchat:" in line):
                    # replace() is a method call, so it takes parentheses, not square brackets
                    snapchat_username = line.replace("Snapchat:", "")
                    # you probably need to do something here to save to file
                    print(snapchat_username)

    # case: the user does not have a bio, so just move on to the next one
    except TimeoutException:
        continue
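The main problem is that I cannot get any Snapchat names out of the program. Here is also an example of how a user shows their Snapchat name in their bio, because I want to include more words to search for in the bio.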

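The problem I see when comparing the screenshot with the code is that this Instagram user writes their Snapchat as "sc:" rather than "Snapchat:", which is exactly what the code is looking for. The username is never found in the provided bio, because the code first checks whether the text "snapchat" is present (if ("snapchat" in bio):) and this bio does not contain it.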

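You probably want to create a list of strings containing the valid ways a user might label their Snapchat username, for example [ "snapchat:", "sc:", "snap:" ], and then check the user's bio against that list to see whether it contains one of those strings. If it does, you can parse out the username.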

bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text

# print bio to ensure we actually found something
print(bio)

# this should be moved above your for loop so it's not getting initialized every iteration
snapchat_strings = [ "snapchat:", "sc:", "snap:" ]

# iterate keywords list and check if user's bio has a keyword
for keyword in snapchat_strings:

    # check if user's bio has a snapchat keyword
    if (keyword in bio):
        bio_lines = bio.split("\n")

        # iterate lines in their bio to find the line with snapchat username
        for line in bio_lines:
            if (keyword in line):
                # case: we found SC username, so save it then break out of this loop
                snapchat_username = line.replace(keyword, "")
                print(snapchat_username)
                break
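The question also asks about saving the names rather than only printing them. A minimal sketch of one way to do that, assuming the usernames are collected in a list created before the loop over profiles; the names save_usernames, found_usernames and the output file name are hypothetical and not part of the original code:

```python
import tempfile
from pathlib import Path


def save_usernames(usernames, path):
    """Write one username per line, skipping blanks and duplicates.

    Returns the cleaned list that was written.
    """
    cleaned = []
    for name in usernames:
        name = name.strip()
        if name and name not in cleaned:
            cleaned.append(name)
    text = "\n".join(cleaned) + ("\n" if cleaned else "")
    Path(path).write_text(text, encoding="utf-8")
    return cleaned


# Create the list once, before the loop over the profiles.
found_usernames = []

# Inside the loop, append instead of (or in addition to) print(snapchat_username).
found_usernames.append(" therock ")
found_usernames.append("therock")  # duplicate, will be dropped

# Hypothetical output location; any writable path works.
out_path = Path(tempfile.gettempdir()) / "snapchat_usernames.txt"
saved = save_usernames(found_usernames, out_path)
print(saved)
```

Writing the file once after the loop, rather than on every iteration, keeps the scraping loop simple and avoids reopening the file for each profile.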
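When I gave you my original solution, I did mention that writing a parser that reliably finds Snapchat usernames is fairly hard, because users choose all sorts of ways to present their username in their bio. For example, if a user writes only their Snapchat username, without putting "sc" or "Snapchat" in front of it, the code will never be able to find that username.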
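One way to tolerate some of that variation is a case-insensitive regular expression instead of plain substring checks. This is only a sketch under stated assumptions: the label list, the accepted separators (":" or "-"), the optional "@" and the username character set are all guesses about how people write their bios, and extract_snapchat_username is a hypothetical helper name.

```python
import re

# Hypothetical label list; extend it with whatever prefixes show up in real bios.
SNAPCHAT_LABELS = ["snapchat", "snap", "sc"]

# Longest labels first so "snapchat" is tried before "snap".
_labels = sorted((re.escape(label) for label in SNAPCHAT_LABELS), key=len, reverse=True)

# \b keeps "sc" from matching inside another word such as "disco".
# The username character set (letters, digits, ".", "_", "-") is an assumption.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(_labels) + r")\b\s*[:\-]\s*@?([A-Za-z0-9._-]+)",
    re.IGNORECASE,
)


def extract_snapchat_username(bio):
    """Return the first labelled Snapchat username in the bio, or None."""
    match = _PATTERN.search(bio)
    return match.group(1) if match else None


print(extract_snapchat_username("Ex POINTER\nSeptember 26\nsnapchat:therock"))
print(extract_snapchat_username("SC: the.rock_1"))
print(extract_snapchat_username("no handle here"))
```

Like the keyword list, this still cannot find a username that appears without any label in front of it.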

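Another issue: the lines of the bio may not be separated by the \n character, in which case the line bio_lines = bio.split("\n") will not work as intended. You may need to experiment a little with the string printed by print(bio) to find the correct separator.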
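To see which separator is actually in the string, printing repr(bio) shows hidden characters explicitly. A small sketch, assuming the separators are ordinary line-boundary characters: str.splitlines() splits on \n, \r\n, \u2028 and several others in one go. The helper name split_bio_lines and the sample string are made up for illustration.

```python
def split_bio_lines(bio):
    """Split a bio on any standard line boundary and drop empty lines."""
    return [line.strip() for line in bio.splitlines() if line.strip()]


# Made-up sample mixing a Windows newline and a Unicode line separator.
sample = "Ex POINTER\r\nSeptember 26\u2028snapchat:therock\n"

# repr() makes the hidden separator characters visible for debugging.
print(repr(sample))

lines = split_bio_lines(sample)
print(lines)
```

If repr(bio) shows some other separator entirely, for example a literal pipe character, splitting on that specific string is still the way to go.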

Finally, your comment above the line for line in bio_lines: suggests that you might want some explanation of how this works, so let me try to break it down.

Here is an example of someone's Instagram bio:

Ex POINTER 
September 26
"tú, eres mis buenos dias"
snapchat:therock
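Notice how this is broken into 4 lines. I am assuming these lines are separated by the hidden \n character, which is a form of whitespace that represents a line break. So we run bio_lines = bio.split("\n") to get a list of the lines in the bio. Now, in the code, the bio looks like this: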

[ "Ex POINTER", "September 26", "\"tú, eres mis buenos dias\"", "snapchat:therock" ]
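So we use a for loop to look at each item in this list and check whether it contains the Snapchat username. That is what for line in bio_lines: does. Inside the loop we check if ("snapchat:" in line): (the check is case-sensitive, so the keyword has to match the casing used in the bio). When we reach the line snapchat:therock, this evaluates to True, and the idea is to strip out snapchat: so that we only see therock printed to the console.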
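The walk-through above can be run on its own, without Selenium, by hard-coding the example bio. This is just the same split-and-loop logic from the answer applied to that sample string:

```python
# The example bio from above, with lines separated by "\n".
bio = 'Ex POINTER\nSeptember 26\n"tú, eres mis buenos dias"\nsnapchat:therock'
keyword = "snapchat:"

# Turn the bio into a list of its lines.
bio_lines = bio.split("\n")
print(bio_lines)

snapchat_username = None
for line in bio_lines:
    # Case-sensitive check: only the last line contains the keyword.
    if keyword in line:
        # Remove the label so only the username is left.
        snapchat_username = line.replace(keyword, "").strip()
        break

print(snapchat_username)
```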