Scraping the English version of a Japanese website with Python
I'm trying to scrape the English version of a Japanese website. The problem is that the Japanese and English versions share the same link. Is there a way to tell BeautifulSoup to scrape the English version instead of the Japanese one?
Tags: python, web-scraping, beautifulsoup
To demonstrate that adding the lang=en URL query parameter actually works:
>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975"
>>> english_url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975&lang=en"
>>>
>>> print(BeautifulSoup(requests.get(url).content, "html.parser").find(class_="team-name").get_text(strip=True))
サガン鳥栖
>>> print(BeautifulSoup(requests.get(english_url).content, "html.parser").find(class_="team-name").get_text(strip=True))
Sagan Tosu
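If you have many match URLs, appending the parameter by hand gets tedious. The following is only a minimal sketch using the standard library; with_lang is a hypothetical helper name, and it assumes the site keeps honoring lang=en as shown above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_lang(url: str, lang: str = "en") -> str:
    """Return url with its lang query parameter set to lang (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous lang value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "lang"
    ]
    query.append(("lang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_lang("https://data.j-league.or.jp/SFMS02/?match_card_id=17975"))
# prints https://data.j-league.or.jp/SFMS02/?match_card_id=17975&lang=en
```

The resulting string can be passed to requests.get exactly like english_url in the session above.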
Note that you can also set the SFCM01LANG cookie to en:
>>> url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975"
>>> response = requests.get(url, cookies={'SFCM01LANG': 'en'})
>>> soup = BeautifulSoup(response.content, "html.parser")
>>> print(soup.find(class_="team-name").get_text(strip=True))
Sagan Tosu
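If you fetch several pages, you could set the cookie once on a requests.Session instead of passing it on every call. This is a sketch under that assumption; make_english_session is a hypothetical name, and the cookie is set without a domain restriction for brevity.

```python
import requests


def make_english_session() -> requests.Session:
    """Build a session that sends SFCM01LANG=en with every request (hypothetical helper)."""
    session = requests.Session()
    # The cookie name and value come from the answer above.
    session.cookies.set("SFCM01LANG", "en")
    return session


session = make_english_session()
# Any later call reuses the cookie, for example:
# session.get("https://data.j-league.or.jp/SFMS02/?match_card_id=17975")
```

A session also reuses the underlying connection, which tends to help when scraping many pages from the same host.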
If you inspect the language button, you'll see that it does not actually point to the same URL: the English version has an extra lang=en parameter. Try passing it.