Python: how to append BeautifulSoup output to a DataFrame
I am relatively new to Python. I intend to
a) get a list of URLs from the following URL (), covering the data from 1919 onward ()
b) get the data (date, type, registration, operator, fat, location, category) from 1919 up to the current year
However, I have run into some problems and am still stuck.
Any kind of help is much appreciated, thank you!
#import packages
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup

#start of code
mainurl = "https://aviation-safety.net/database/"

def getAndParseURL(mainurl):
    result = requests.get(mainurl)
    soup = BeautifulSoup(result.content, 'html.parser')
    datatable = soup.find('a', href = True)

#try clause to go through the content and grab the URLs
try:
    for row in datatable:
        cols = row.find_all("|")
        if len(cols) > 1:
            links.append(x, cols = cols)
except: pass

#place links into numpy array
links_array = np.asarray(links)
len(links_array)

#check if links are in dataframe
df = pd.DataFrame(links_array)
df.columns = ['url']
df.head(10)
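I can't seem to get the URLs.
It would be great if I could get the following:
Serial no.  URL
1.
2.
3.

You aren't extracting the href attribute from the tags you are pulling. What you want to do is find all <a> tags that have links (which you did, but you need to use find_all, since find only returns the first one it finds), and then iterate through those tags. I chose to check each href for the substring 'Year' and, if it is there, add the link to a list.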
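The difference between find and find_all is easy to check on a small inline snippet before touching the real site. The sketch below is only an illustration: the HTML string and the href values in it are made up and are not taken from the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real index page (assumption).
html = """
<a href="list?Year=1919">1919</a>
<a href="list?Year=1920">1920</a>
<a href="about">About</a>
<a name="top">anchor without href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching tag (or None if nothing matches).
first = soup.find("a", href=True)

# find_all returns every matching tag; href=True skips <a> tags without an href.
all_links = soup.find_all("a", href=True)

# The URL lives in the href attribute, so it has to be read explicitly.
first_href = first["href"]
year_hrefs = [a["href"] for a in all_links if "Year" in a["href"]]

print(first_href)
print(year_hrefs)
```

Iterating over the single tag returned by find walks over that tag's children rather than over the links on the page, which is why the loop in the question never collects any URLs.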
#import packages
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests

#start of code
mainurl = "https://aviation-safety.net/database/"

def getAndParseURL(mainurl):
    result = requests.get(mainurl)
    soup = BeautifulSoup(result.content, 'html.parser')
    datatable = soup.find_all('a', href = True)
    return datatable

datatable = getAndParseURL(mainurl)

#go through the content and grab the URLs
links = []
for link in datatable:
    if 'Year' in link['href']:
        url = link['href']
        links.append(mainurl + url)

#check if links are in dataframe
df = pd.DataFrame(links, columns=['url'])
df.head(10)
Output:
df.head(10)
Out[24]:
url
0 https://aviation-safety.net/database/dblist.ph...
1 https://aviation-safety.net/database/dblist.ph...
2 https://aviation-safety.net/database/dblist.ph...
3 https://aviation-safety.net/database/dblist.ph...
4 https://aviation-safety.net/database/dblist.ph...
5 https://aviation-safety.net/database/dblist.ph...
6 https://aviation-safety.net/database/dblist.ph...
7 https://aviation-safety.net/database/dblist.ph...
8 https://aviation-safety.net/database/dblist.ph...
9 https://aviation-safety.net/database/dblist.ph...
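The question also asked for a serial number next to each URL. One possible way to get that, sketched below, is to give the DataFrame a 1-based index. The sketch also swaps the plain string concatenation for urllib.parse.urljoin, which may be a bit more robust if some hrefs turn out to be absolute paths. The base URL and href values here are placeholders, not the real ones from the site.

```python
from urllib.parse import urljoin

import pandas as pd

# Placeholder values (assumptions), standing in for mainurl and the scraped hrefs.
base_url = "https://example.com/database/"
hrefs = ["list?Year=1919", "list?Year=1920", "/other/list?Year=1921"]

# urljoin resolves each href against the base URL, whether it is relative or rooted.
links = [urljoin(base_url, h) for h in hrefs]

# Build the DataFrame and number the rows starting from 1.
df = pd.DataFrame({"url": links})
df.index = pd.RangeIndex(start=1, stop=len(df) + 1, name="serial_no")

print(df)
```

The URLs in the output above are cut off only by the pandas display settings; calling pd.set_option("display.max_colwidth", None) before printing should show them in full on recent pandas versions.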