Python: how can I correct my script below to get side-by-side output?
python, python-3.x, web-scraping, beautifulsoup

I have written a script in Python to collect some information from a webpage. I wrote it in a very compact way using CSS selectors, and the script is able to fetch the data. The problem I am facing is that I cannot get the results side by side rather than serially, because I am fetching the two types of values at once with a comma-separated CSS selector. If I have not made that clear, please see the example below.

The script I am trying with:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.drugbank.ca/drugs/DB04789")
soup = BeautifulSoup(res.text, "lxml")
items = '\n'.join([item.text for item in soup.select("dl > dt , dl > dd")])
print(items)
My output:
Name
Accession Number
Type
5-methyltetrahydrofolic acid
DB04789
Small Molecule
The output I am hoping for:
Name 5-methyltetrahydrofolic acid
Accession Number DB04789
Type Small Molecule
Is it possible to get the expected output by applying some small change to the selector, keeping it in one line as I tried above? Thanks for taking a look.

Fetch the dt elements separately (as all_dt) and the dd elements separately (as all_dd), then use zip(all_dt, all_dd) to create the pairs:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.drugbank.ca/drugs/DB04789")
soup = BeautifulSoup(res.text, "lxml")
all_dt = soup.select("dl > dt")
all_dd = soup.select("dl > dd")
for dt, dd in zip(all_dt, all_dd):
print(dt.text, ":", dd.text)
You can also use nextSibling to get the dd that follows each dt:
all_dt = soup.select("dl > dt")
for dt in all_dt:
dd = dt.nextSibling
print(dt.text, ":", dd.text)
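One caveat with the nextSibling approach: when the page's HTML contains whitespace between </dt> and <dd>, nextSibling returns that whitespace text node rather than the tag. find_next_sibling("dd") skips over it. A minimal sketch with hypothetical sample markup standing in for the DrugBank page:

```python
from bs4 import BeautifulSoup

# hypothetical sample HTML with whitespace between the dt and dd tags
html = "<dl><dt>Name</dt> <dd>Aspirin</dd> <dt>Type</dt> <dd>Small Molecule</dd></dl>"
soup = BeautifulSoup(html, "html.parser")

for dt in soup.select("dl > dt"):
    # find_next_sibling("dd") skips the whitespace text node that
    # plain nextSibling would return here
    dd = dt.find_next_sibling("dd")
    print(dt.text, ":", dd.text)
```

With this sample input the loop prints "Name : Aspirin" and "Type : Small Molecule", whereas dt.nextSibling.text would fail on the intervening text node.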
My full code, from my answer to a question two hours ago:

import requests
from bs4 import BeautifulSoup

def get_details(url):
    print('details:', url)

    # get subpage
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "lxml")

    # get data on subpage
    dts = soup.findAll('dt')
    dds = soup.findAll('dd')

    # display details
    for dt, dd in zip(dts, dds):
        print(dt.text)
        print(dd.text)
        print('---')

    print('---------------------------')

def drug_data():
    url = 'https://www.drugbank.ca/drugs/'

    while url:
        print(url)
        r = requests.get(url)
        soup = BeautifulSoup(r.text, "lxml")

        # get links to subpages
        links = soup.select('strong a')
        for link in links:
            # execute function to get subpage
            get_details('https://www.drugbank.ca' + link['href'])

        # next page url
        url = soup.findAll('a', {'class': 'page-link', 'rel': 'next'})
        print(url)
        if url:
            url = 'https://www.drugbank.ca' + url[0].get('href')
        else:
            break

drug_data()
If you fetch the two types of data separately, you can zip them together and then print them out:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.drugbank.ca/drugs/DB04789")
soup = BeautifulSoup(res.text, "lxml")
categories = soup.select("dl > dt")
entries = soup.select("dl > dd")
items = zip(categories, entries)
for item in items:
print(item[0].text + ": " + item[1].text)
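If you also want the two columns visually aligned rather than just separated by ":", str.ljust can pad the left column to a fixed width. A small sketch using the sample values from the question's expected output:

```python
# sample dt/dd texts taken from the question's expected output
names = ["Name", "Accession Number", "Type"]
values = ["5-methyltetrahydrofolic acid", "DB04789", "Small Molecule"]

# pad each label to the width of the longest one so the values line up
width = max(len(n) for n in names)
for name, value in zip(names, values):
    print(name.ljust(width), value)
```

The same padding works inside any of the loops above by replacing the label text with label.ljust(width).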
This is more or less the version of the answer I was looking for:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.drugbank.ca/drugs/DB04789")
soup = BeautifulSoup(res.text, "lxml")
items = [': '.join([item.text,item.find_next().text]) for item in soup.select("dl > dt")]
print(items)
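If the goal is lookups rather than printing, the same find_next idea can build a dict in one line. A sketch with hypothetical inline HTML standing in for the DrugBank page (using find_next("dd") instead of the bare find_next() above, to be explicit about the target tag):

```python
from bs4 import BeautifulSoup

# hypothetical sample markup standing in for the DrugBank page
html = "<dl><dt>Name</dt><dd>Aspirin</dd><dt>Type</dt><dd>Small Molecule</dd></dl>"
soup = BeautifulSoup(html, "html.parser")

# one-liner: map each dt's text to the text of the dd that follows it
details = {dt.text: dt.find_next("dd").text for dt in soup.select("dl > dt")}
print(details)
```

Fields can then be read directly, e.g. details["Name"].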
Fetch the dt elements separately (as all_dt) and the dd elements separately (as all_dd) and use zip(all_dt, all_dd) to create the pairs; see the code in my answer.

Thanks to both of you for the solutions, +1 each. My requirement was to have some one-liner solution. Thanks again.