Python web-scraping question, trying to get the info into a CSV and charts
python, web-scraping, jupyter-notebook

Here is my code. It gives me pretty complete info. I'm collecting the stock prices of my ten favorite space-tech companies. I want to get the stock price over ten hours, or I could just run the code ten times. I can't use an API; this is a school project. Then I want to use matplotlib to combine all the data into one big chart showing these stock prices, or ten charts, one per stock. Any suggestions would be great. Here is my code so far:
#import libraries
import pandas as pd

#scraping my top ten favorite space companies, attempted to pick companies with pure-play interest in space
urls = ['https://finance.yahoo.com/quote/GILT/',
        'https://finance.yahoo.com/quote/LORL?p=LORL&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/I?p=I&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/VSAT?p=VSAT&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/RTN?p=RTN&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/UTX?ltr=1',
        'https://finance.yahoo.com/quote/TDY?ltr=1',
        'https://finance.yahoo.com/quote/ORBC?ltr=1',
        'https://finance.yahoo.com/quote/SPCE?p=SPCE&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/BA?p=BA&.tsrc=fin-srch']

def parsePrice(r):
    # the first table on the quote page holds the summary stats; transpose it
    df = pd.read_html(r)[0].T
    cols = list(df.iloc[0, :])
    temp_df = pd.DataFrame([list(df.iloc[1, :])], columns=cols)
    temp_df['url'] = r
    return temp_df

df = pd.DataFrame()
for r in urls:
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df, parsePrice(r)], sort=True).reset_index(drop=True)
df.to_csv('C:/Users/n_gor/Desktop/webscape/Nicholas Final Projects/spacestocklisting.csv', index=False)
print(df.to_string())
CSV file output:
52 Week Range Ask Avg. Volume Bid Day's Range Open Previous Close Volume url
0 7.32 - 9.87 8.09 x 800 23415 8.06 x 800 8.01 - 8.11 8.10 8.01 6337 https://finance.yahoo.com/quote/GILT/
1 32.14 - 42.77 32.74 x 1100 41759 32.59 x 1000 32.28 - 32.75 32.32 32.28 14685 https://finance.yahoo.com/quote/LORL?p=LORL&.t...
2 5.55 - 27.29 6.64 x 800 5746553 6.63 x 2900 6.51 - 6.68 6.64 6.65 995245 https://finance.yahoo.com/quote/I?p=I&.tsrc=fi...
3 55.93 - 97.31 72.21 x 800 281600 72.16 x 1000 71.51 - 72.80 72.26 72.32 74758 https://finance.yahoo.com/quote/VSAT?p=VSAT&.t...
4 144.27 - 220.03 215.54 x 1000 1560562 215.37 x 800 214.87 - 217.45 215.85 214.86 203957 https://finance.yahoo.com/quote/RTN?p=RTN&.tsr...
5 100.48 - 149.81 145.03 x 800 2749725 144.96 x 800 144.41 - 145.56 145.49 144.52 489169 https://finance.yahoo.com/quote/UTX?ltr=1
6 189.35 - 351.53 343.34 x 800 280325 342.80 x 800 342.84 - 346.29 344.16 343.58 42326 https://finance.yahoo.com/quote/TDY?ltr=1
7 3.5800 - 9.7900 4.1400 x 1300 778343 4.1300 x 800 4.1200 - 4.2000 4.1700 4.1500 62335 https://finance.yahoo.com/quote/ORBC?ltr=1
8 6.90 - 12.09 7.37 x 900 2280333 7.38 x 800 7.24 - 7.48 7.30 7.22 539082 https://finance.yahoo.com/quote/SPCE?p=SPCE&.t...
9 292.47 - 446.01 348.73 x 800 4420225 348.79 x 800 345.70 - 350.42 350.22 348.84 1258813 https://finance.yahoo.com/quote/BA?p=BA&.tsrc=...
Is there a way I can add in the stock names too? Any advice on how to finish this project? I'm a bit lost.

If you have all the stock names in a list, you can use:
stock_names = ['GILT', 'LORL', 'I', 'VSAT', 'RTN', 'UTX', 'TDY', 'ORBC', 'SPCE', 'BA']
# insert at the beginning (column index 0) of the DataFrame
df.insert(0, "column_heading", stock_names)
Alternatively, you can use a regular expression to grab the stock names from the URLs and add them to the df:

import re
stock_names = [re.findall('[A-Z]+', x)[0] for x in urls]
# insert at the beginning (column index 0) of the DataFrame
df.insert(0, "column_heading", stock_names)
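The regex works for these URLs, but it would also match any other capital letters a URL happens to contain (in a query string, for example). Taking the last path segment is a less fragile sketch of the same idea:

```python
# Pull the ticker out of each Yahoo Finance quote URL by taking
# the last path segment rather than regex-matching capital letters.
from urllib.parse import urlparse

urls = ['https://finance.yahoo.com/quote/GILT/',
        'https://finance.yahoo.com/quote/SPCE?p=SPCE&.tsrc=fin-srch']
# .path is '/quote/GILT/' or '/quote/SPCE'; drop any trailing slash,
# then keep the segment after the final '/'
tickers = [urlparse(u).path.rstrip('/').split('/')[-1] for u in urls]
print(tickers)  # ['GILT', 'SPCE']
```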
You can just parse the title header:
#import libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

#scraping my top ten favorite space companies, attempted to pick companies with pure-play interest in space
urls = ['https://finance.yahoo.com/quote/GILT/',
        'https://finance.yahoo.com/quote/LORL?p=LORL&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/I?p=I&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/VSAT?p=VSAT&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/RTN?p=RTN&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/UTX?ltr=1',
        'https://finance.yahoo.com/quote/TDY?ltr=1',
        'https://finance.yahoo.com/quote/ORBC?ltr=1',
        'https://finance.yahoo.com/quote/SPCE?p=SPCE&.tsrc=fin-srch',
        'https://finance.yahoo.com/quote/BA?p=BA&.tsrc=fin-srch']

def parsePrice(r):
    response = requests.get(r)
    soup = BeautifulSoup(response.text, 'html.parser')
    titleHeader = soup.find('div', {'id': 'quote-header-info'})
    title = titleHeader.find('h1').text
    comp = title.split('-')[-1].strip()   # company name
    abr = title.split('-')[0].strip()     # ticker symbol
    print(title)
    df = pd.read_html(response.text)[0].T
    cols = list(df.iloc[0, :])
    temp_df = pd.DataFrame([list(df.iloc[1, :])], columns=cols)
    temp_df['url'] = r
    temp_df['company name'] = comp
    temp_df['stock name'] = abr
    return temp_df

df = pd.DataFrame()
for r in urls:
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df, parsePrice(r)], sort=True).reset_index(drop=True)
df.to_csv('C:/Users/n_gor/Desktop/webscape/Nicholas Final Projects/spacestocklisting.csv', index=False)
print(df.to_string())
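To collect the ten readings over time without an API, each run can append to the CSV instead of overwriting it, tagged with the time it was scraped. A minimal sketch; the `save_snapshot` helper and the 'scraped at' column name are made up for illustration:

```python
import os
import tempfile
from datetime import datetime
import pandas as pd

def save_snapshot(df, path):
    df = df.copy()
    # record when this run happened so rows can be ordered later
    df['scraped at'] = datetime.now().isoformat(timespec='seconds')
    # write the header only the first time; afterwards just append rows
    header = not os.path.exists(path)
    df.to_csv(path, mode='a', header=header, index=False)

# toy frames standing in for two scrape runs of parsePrice output
path = os.path.join(tempfile.mkdtemp(), 'spacestocklisting.csv')
save_snapshot(pd.DataFrame({'stock name': ['GILT'], 'Open': [8.10]}), path)
save_snapshot(pd.DataFrame({'stock name': ['GILT'], 'Open': [8.15]}), path)
combined = pd.read_csv(path)  # one row per run, ready for charting
```

Running the whole script once per hour (or once per day, per the comments) then produces a growing CSV with one timestamped row per stock per run.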
Yahoo only provides daily stock data. Are you sure you want hourly prices, rather than daily data for the last 10 trading days?

OK, the project just has to be turned in by Monday, so I could run it once a day for the next 3 days. I'd rather turn it in Sunday night, though.

If you need to chat on messenger I'll be around. Email me. Jason. schvach@gmail.com

Yes, awesome. I emailed you. Hope to hear from you.
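Once the CSV holds several timestamped runs, the charting step from the question could draw one line per ticker on a single matplotlib figure. A sketch with toy data standing in for the scraped snapshots; the 'run' column and file name are assumptions:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# toy data: three snapshots of two tickers (stand-in for ten runs of ten)
data = pd.DataFrame({
    'stock name': ['GILT', 'BA'] * 3,
    'run':  [1, 1, 2, 2, 3, 3],
    'Open': [8.10, 350.22, 8.15, 349.80, 8.05, 351.10],
})

fig, ax = plt.subplots()
# one line per ticker, snapshots in run order
for ticker, grp in data.groupby('stock name'):
    ax.plot(grp['run'], grp['Open'], marker='o', label=ticker)
ax.set_xlabel('scrape run')
ax.set_ylabel('Open price')
ax.legend()
fig.savefig('spacestocks.png')
```

For ten charts instead of one, the same loop can call `plt.subplots()` inside the loop and save one figure per ticker; prices on very different scales (BA vs. GILT here) usually read better that way.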