BeautifulSoup (Python): extract elements from Howlongtobeat.com and export to .csv


python html pandas web-scraping


This is what I have so far:

from requests import get
from bs4 import BeautifulSoup

url = 'https://howlongtobeat.com/game.php?id=38050'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

game_name = html_soup.select('div.profile_header')[0].text
game_length = html_soup.select('div.game_times li div')[-1].text
game_developer = html_soup.find_all('strong', string='\nDeveloper:\n')[0].next_sibling
game_publisher = html_soup.find_all('strong', string='\nPublisher:\n')[0].next_sibling
game_console = html_soup.find_all('strong', string='\nPlayable On:\n')[0].next_sibling
game_genres = html_soup.find_all('strong', string='\nGenres:\n')[0].next_sibling

print(game_name)
print(game_length)
print(game_developer)
print(game_publisher)
print(game_console)
print(game_genres)

This produces:

God of War (2018) 
31 Hours 

SIE Santa Monica Studio 

Sony Interactive Entertainment 

PlayStation 4 

Third-Person, Action, Adventure

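I plan to build a spreadsheet from this data (once I figure out how to extract the game name, the Main + Extras game length, the developer, the publisher, the playable-on platforms, and the genres fields).

So it would store this data, and I think that before storing it, it should print the data like this: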

God of War (2018) 
31 Hours 
SIE Santa Monica Studio
Sony Interactive Entertainment
PlayStation 4
Third-Person, Action, Adventure

Any help would be appreciated.

Edit ---

I did some research, and I think I need Pandas.

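If I understand correctly, you can remove the surrounding whitespace (including the extra newlines) by calling strip() on each string. After that, you can create a csv file and store the data in it: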

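# path_where_to_save is a placeholder for your output directory (including the trailing separator)
with open(path_where_to_save + 'info.csv', 'a') as f:
    # strip() drops the surrounding newlines; the final '\n' ends the row so the next append starts a new line
    f.write(str(game_name).strip() + ',' + str(game_length).strip() + ',' + str(game_developer).strip() + '\n')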
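One caveat with joining fields by hand: the genres value in the question ("Third-Person, Action, Adventure") itself contains commas, so it would be split across several columns. A minimal sketch using the standard csv module, which quotes such fields, might look like the following. The helper names clean and append_row and the temporary output path are illustrative assumptions, and the sample values are copied from the question's output rather than scraped live.

```python
import csv
import os
import tempfile


def clean(value):
    """Convert a scraped value to str and drop surrounding whitespace/newlines."""
    return str(value).strip()


def append_row(csv_path, fields):
    """Append one row; csv.writer quotes any field that contains a comma."""
    with open(csv_path, 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow([clean(v) for v in fields])


# Sample values taken from the question's output, with stray newlines around them
row = [
    'God of War (2018) ',
    '31 Hours ',
    '\nSIE Santa Monica Studio \n',
    '\nSony Interactive Entertainment \n',
    '\nPlayStation 4 \n',
    '\nThird-Person, Action, Adventure\n',
]

# Hypothetical output location; replace with your own path
csv_path = os.path.join(tempfile.mkdtemp(), 'info.csv')
append_row(csv_path, row)
append_row(csv_path, row)  # a second call adds a new line instead of overwriting

# Read the file back to check that each game occupies exactly one six-column row
with open(csv_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
```

Opening the file with newline='' is what the csv module documentation recommends, so that the writer controls line endings itself.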

Note the 'a' in open: it appends to the file instead of overwriting the first line.
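Since the question mentions Pandas, the same append behaviour can be sketched with a DataFrame. This is only one possible approach, assuming pandas is installed; the column names, the append_game helper, and the temporary output path are illustrative assumptions, and the record values are copied from the question's output rather than scraped live.

```python
import os
import tempfile

import pandas as pd


def append_game(csv_path, record):
    """Append one game as a CSV row, writing the header only when the file is new."""
    df = pd.DataFrame([record])
    write_header = not os.path.exists(csv_path)
    df.to_csv(csv_path, mode='a', header=write_header, index=False)


# Already-stripped values from the question; the keys are assumed column names
record = {
    'name': 'God of War (2018)',
    'length': '31 Hours',
    'developer': 'SIE Santa Monica Studio',
    'publisher': 'Sony Interactive Entertainment',
    'playable_on': 'PlayStation 4',
    'genres': 'Third-Person, Action, Adventure',
}

# Hypothetical output location; replace with your own path
csv_path = os.path.join(tempfile.mkdtemp(), 'games.csv')
append_game(csv_path, record)
append_game(csv_path, record)  # mode='a' appends a second row under the same header

# Load the file back to confirm the structure
games = pd.read_csv(csv_path)
```

Writing the header only on the first call keeps the file readable by read_csv after any number of appends.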