Python: find words in a div class and save/print the output


python selenium selenium-webdriver web-scraping


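I want to find the Snapchat usernames of Instagram users. The div class is always the same, and with an array I can loop over more users. But getting their Snapchat string is still an unsolved problem for me.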

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

for i in range(len(Abonnenten)):  # my list 'Abonnenten' holds the profile URLs of some users
    driver.get(Abonnenten[i])  # loads each user's profile in turn
    # get the text from their Instagram bio
    try:
        wait = WebDriverWait(driver, 10)  # let the browser catch up if the internet is slow
        # This is to get the text of the bio. I never got any names to print out
        bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text
        # snapnames.append(bio)  # my attempt to collect the names in a list (snapnames = [] before the loop) so I can print all the names I found
        # print(snapnames)
        # check if the text contains "snapchat", but I don't think that works
        # also, can I add "snap" or "sc" to get more results? Everyone writes their snap differently in their bio
        if ("snapchat" in bio):

            # split the Instagram bio on the newline character to get the line with the Snapchat name
            bio_lines = bio.split("\n")

            # go over the lines of the bio to find the Snapchat username (I don't get this line)
            for line in bio_lines:

                # if we find the line with the username, strip out the prefix to get the username
                if ("Snapchat:" in line):
                    snapchat_username = line.replace("Snapchat:", "")
                    # you probably need to do something here to save to file
                    print(snapchat_username)

    # case: the user does not have a bio, so just move on to the next one
    except TimeoutException:
        continue

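The main problem is that I can't get any Snapchat names out of the program. Here is also an example of how users show their Snapchat name in their bio, since I want to include more words to search for in the bio.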

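The problem I see when comparing the screenshot with the code is that this Instagram user writes their Snapchat as `sc:` rather than `Snapchat:`, which is what the code looks for. The username in that bio can't be found because the code only proceeds if the text `snapchat` is present (`if ("snapchat" in bio):`).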

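You'll probably want to create a list of strings holding the valid ways a user might label their Snapchat username, such as `["snapchat:", "sc:", "snap:"]`, and then check the user's bio against that list to see whether it contains one of those strings. If it does, you can parse out the username: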

import re

bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text

# print bio to make sure we actually found something
print(bio)

# this should be moved above your for loop so it isn't re-created on every iteration
snapchat_strings = ["snapchat:", "sc:", "snap:"]

# iterate over the keyword list and check whether the user's bio contains a keyword
for keyword in snapchat_strings:

    # check if the user's bio has a Snapchat keyword (case-insensitive, so "Snapchat:" and "SC:" match too)
    if (keyword in bio.lower()):
        bio_lines = bio.split("\n")

        # iterate over the lines of the bio to find the line with the Snapchat username
        for line in bio_lines:
            match = re.search(re.escape(keyword), line, re.IGNORECASE)
            if match:
                # case: we found the SC username (the text after the keyword), so print it and break out of this loop
                snapchat_username = line[match.end():].strip()
                print(snapchat_username)
                break
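The question also asks about saving the names. A minimal sketch, assuming you want one CSV row per profile: the keyword matching above is moved into a helper that returns the username instead of printing it, so the results can be collected in a list and written out once after the loop. The names `find_snapchat_username`, `save_usernames`, the file name `snapchat_usernames.csv` and the example URLs are all made up for illustration, and the bios are hard-coded stand-ins for what Selenium would return.

```python
import csv
import re

# Hypothetical keyword list, same as in the answer above
SNAPCHAT_KEYWORDS = ["snapchat:", "sc:", "snap:"]


def find_snapchat_username(bio):
    """Return the text after the first Snapchat keyword found in the bio, or None."""
    for line in bio.split("\n"):
        for keyword in SNAPCHAT_KEYWORDS:
            # case-insensitive, so "Snapchat:" and "SC:" are found too
            match = re.search(re.escape(keyword), line, re.IGNORECASE)
            if match:
                username = line[match.end():].strip()
                if username:
                    return username
    return None


def save_usernames(rows, path="snapchat_usernames.csv"):
    """Write (profile_url, username) pairs to a CSV file (hypothetical file name)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["profile_url", "snapchat_username"])
        writer.writerows(rows)


# Stand-in data: in the real script each bio comes from the Selenium loop
# over Abonnenten (these URLs are made up)
bios = {
    "https://www.instagram.com/example_user_a/": "Ex POINTER\nsnapchat:therock",
    "https://www.instagram.com/example_user_b/": "just vibes",
}

found = []
for url, bio in bios.items():
    username = find_snapchat_username(bio)
    if username is not None:
        found.append((url, username))

print(found)
# save_usernames(found)  # writes snapchat_usernames.csv in the working directory
```

Writing once after the loop keeps the file handling in one place; if the scrape might be interrupted partway, appending each row as soon as it is found is the safer choice.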

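When I gave you my original solution, I did mention that writing a parser that reliably finds Snapchat usernames is quite difficult, because users write their username in their bio in all sorts of ways. For example, if a user writes only their Snapchat username without putting `sc` or `Snapchat` in front of it, the code will never find it.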
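One way to catch a few more spellings, sketched under my own assumptions: a single case-insensitive regular expression that accepts `snapchat`, `snap` or `sc` as a whole word, an optional `:` or `-`, and then the username. The pattern and the allowed username characters are guesses, not Snapchat's actual rules. It still cannot find a username written without any keyword, and it can produce false positives (for example `snap back` yields `back`).

```python
import re

# Hypothetical pattern (an assumption, not Snapchat's username rules):
#   a keyword "snapchat", "snap" or "sc" as a whole word,
#   then optional spaces, an optional ":" or "-", optional spaces,
#   then a username starting with a letter, followed by letters,
#   digits, "_", "." or "-".
SNAPCHAT_PATTERN = re.compile(
    r"\b(?:snapchat|snap|sc)\b\s*[:\-]?\s*([A-Za-z][\w.\-]*)",
    re.IGNORECASE,
)


def find_snapchat_username_regex(bio):
    """Return the first username-like token after a Snapchat keyword, or None."""
    match = SNAPCHAT_PATTERN.search(bio)
    return match.group(1) if match else None


for sample in ["snapchat:therock", "SC: the.rock_1", "Snap - therock", "September 26"]:
    print(sample, "->", find_snapchat_username_regex(sample))
```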

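Another issue: the lines of the bio may not be separated by `\n` characters, in which case the line `bio_lines = bio.split("\n")` won't work correctly. Finding the right delimiter may take some experimenting with the string printed from `bio`.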

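Finally, your comment above the `for line in bio_lines:` line suggests you could use an explanation of how this works, so let me try to break it down.
Here is an example of someone's Instagram bio: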
Ex POINTER 
September 26
"tú, eres mis buenos dias"
snapchat:therock

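Notice how this is split into 4 lines. I'm assuming these lines are separated by the hidden `\n` character, a form of whitespace that represents a line break. So we run `bio_lines = bio.split("\n")` to get a list of the lines in their bio; in the code, `bio_lines` now looks like this: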
[ "Ex POINTER", "September 26", "\"tú, eres mis buenos dias\"", "snapchat:therock" ]

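We then use the `for` loop to look at each item in this list and check whether it contains a Snapchat username. That's what `for line in bio_lines:` does. Inside the loop, we check whether the line contains the keyword, e.g. `if ("snapchat:" in line):` (the check is case-sensitive, so the keyword must be written the same way as in the bio; `"Snapchat:"` would not match this example). When we reach the line `snapchat:therock`, this returns `True`, and the idea is to strip out `snapchat:` so that only `therock` is printed to the console.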
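To check the walkthrough, here is a self-contained version that runs directly on the example bio (no Selenium), assuming the lines are separated by `\n`. It also shows why the keyword has to use the same case as the bio: `in` is case-sensitive.

```python
# The example bio from above, assuming "\n" separates the lines
bio = 'Ex POINTER\nSeptember 26\n"tú, eres mis buenos dias"\nsnapchat:therock'

bio_lines = bio.split("\n")
print(bio_lines)

# `in` is case-sensitive: "Snapchat:" does not occur in this lowercase bio
print("Snapchat:" in bio)

keyword = "snapchat:"
for line in bio_lines:
    if keyword in line:
        # strip the keyword off the line, leaving only the username
        snapchat_username = line.replace(keyword, "")
        print(snapchat_username)
```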