
Python: scraping the English version of a Japanese website


I'm trying to scrape the English version of a Japanese website. The problem is that the Japanese and English versions have the same link. Is there any way to tell BeautifulSoup to scrape the English version instead of the Japanese one?

The link I want to scrape: https://data.j-league.or.jp/SFMS02/?match_card_id=17975


To demonstrate that adding the lang=en URL query parameter actually works:

>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975"
>>> english_url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975&lang=en"
>>>
>>> print(BeautifulSoup(requests.get(url).content, "html.parser").find(class_="team-name").get_text(strip=True))
サガン鳥栖
>>> print(BeautifulSoup(requests.get(english_url).content, "html.parser").find(class_="team-name").get_text(strip=True))
Sagan Tosu
Note that you can also add the SFCM01LANG cookie with the value en:

>>> url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975"
>>> response = requests.get(url, cookies={'SFCM01LANG': 'en'})
>>> soup = BeautifulSoup(response.content, "html.parser")
>>> print(soup.find(class_="team-name").get_text(strip=True)) 
Sagan Tosu
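If you plan to fetch several pages, you could set the cookie once on a requests.Session so it is reused across requests. A minimal sketch of that variation (the Session is my own addition; the SFCM01LANG cookie name comes from the answer above):

>>> session = requests.Session()
>>> session.cookies["SFCM01LANG"] = "en"  # sent with every request made through this session
>>> soup = BeautifulSoup(session.get(url).content, "html.parser")
>>> print(soup.find(class_="team-name").get_text(strip=True))
Sagan Tosu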


If you inspect that button, you'll find it doesn't actually point to the same URL; there is an extra lang=en parameter. Try passing it along.
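
As a closing sketch, here is one way to pull every team name from the English page with find_all instead of find. This assumes the page marks both the home and away team with the team-name class, which I have not verified; output omitted:

>>> soup = BeautifulSoup(requests.get(english_url).content, "html.parser")
>>> # find_all returns every matching tag rather than only the first one
>>> for team in soup.find_all(class_="team-name"):
...     print(team.get_text(strip=True))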