Python 3.x: which function will keep these parsed elements from repeating? BeautifulSoup

Which function (or other approach) would be best so that these nicknames don't repeat in my parser? I don't know how to do it, and I would really appreciate your help.

Source:

from urllib.request import urlopen as uReq
from urllib.request import Request
from bs4 import BeautifulSoup as soup

# save all the nicknames to 'CSV' file format
filename = "BattlePassNicknames.csv"
f = open(filename, "a", encoding="utf-8")
headers1 = "Member of JAZE Battle Pass 2019\n"
b = 1
if b < 2:
    f.write(headers1)
    b += 1
# start page
i = 1
while True:
    # disable jaze guard. turn off html 'mod_security'
    link = 'https://jaze.ru/forum/topic?id=50&page='+str(i)
    my_url = Request(
        link,
        headers={'User-Agent': 'Mozilla/5.0'}
    )
    i += 1  # increment page no for next run
    uClient = uReq(my_url)
    # check if there was a redirect (past the last page the site redirects away)
    if uClient.url != link:
        break
    page_html = uClient.read()
    uClient.close()
    # html parsing
    page_soup = soup(page_html, "html.parser")
    # grabs each name of player
    containers = page_soup.findAll("div", {"class": "top-area"})

    for container in containers:
        playerName = container.div.a.text.strip()
        print("BattlePass PlayerName: " + playerName)
        f.write(playerName + "\n")
f.close()

You can add all the names to a set.

A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
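
For instance, a minimal illustration of that deduplication (the names here are made up):

names = set()
for name in ["Alice", "Bob", "Alice"]:
    names.add(name)  # adding a duplicate has no effect
print(names)  # e.g. {'Bob', 'Alice'} -- duplicates gone, iteration order arbitrary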

In your case, let's add all the names to a set and then write them to the file after the loop:

from urllib.request import urlopen as uReq
from urllib.request import Request
from bs4 import BeautifulSoup as soup

# save all the nicknames to 'CSV' file format
filename = "BattlePassNicknames.csv"
f = open(filename, "a", encoding="utf-8")
headers1 = "Member of JAZE Battle Pass 2019\n"
b = 1
if b < 2:
    f.write(headers1)
    b += 1
# start page
i = 1
names = set()
while True:
    # disable jaze guard. turn off html 'mod_security'
    link = 'https://jaze.ru/forum/topic?id=50&page='+str(i)
    my_url = Request(
        link,
        headers={'User-Agent': 'Mozilla/5.0'}
    )
    i += 1  # increment page no for next run
    uClient = uReq(my_url)
    # check if there was a redirect (past the last page the site redirects away)
    if uClient.url != link:
        break
    page_html = uClient.read()
    uClient.close()
    # html parsing
    page_soup = soup(page_html, "html.parser")
    # grabs each name of player
    containers = page_soup.findAll("div", {"class": "top-area"})

    for container in containers:
        playerName = container.div.a.text.strip()
        names.add(playerName)

for name in names:
    f.write(name + "\n")
f.close()

It works, thanks! But it mixed up all the names: the person who was first is now in 44th place. Why, and how can I fix it?
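
A set is unordered by design, so the iteration order has nothing to do with the order in which the names were added. To keep the original order, collect the names in a list instead and only append ones that aren't already in it:
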
...
names = []
while True:
...
    for container in containers:
        playerName = container.div.a.text.strip()
        if playerName not in names:
            names.append(playerName)

for name in names:
    f.write(name + "\n")
f.close()
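
If you'd rather deduplicate in a single step, an alternative sketch: since Python 3.7, a plain dict preserves insertion order, so dict.fromkeys can remove duplicates while keeping first-seen order (and it avoids the O(n) membership check that a list needs). The sample data below is made up:

all_names = ["Alice", "Bob", "Alice", "Carol", "Bob"]  # stand-in for the scraped names
unique_names = list(dict.fromkeys(all_names))  # dict keys are unique and keep insertion order
print(unique_names)  # ['Alice', 'Bob', 'Carol']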