Excel: storing data parsed with BeautifulSoup4
I am trying to extract some information of interest into an Excel sheet with the headers Name, Formula, EXACTMASS, MOLWEIGHT and CAS, but when I run the loop it puts every letter/number or byte (I am not sure that is the right term) into its own cell. I want it to store the full text shown by the print calls, as a single string in one cell per field for each compound. When the loop starts again for the next link, I want it to start on a new row. I am not sure where I am going wrong.
import urllib
import urllib.request
from bs4 import BeautifulSoup
import os
import csv

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

compoundlist = []
soup = make_soup("http://www.genome.jp/dbget-bin/www_bget?ko00020")
i = 1
file = open("Compoundlist.csv", "w")
for record in soup.findAll("nobr"):
    compound = ''
    if (record.text[0] == "C" and record.text[1] == '0') or (record.text[0] == "C" and record.text[1] == '1'):
        compoundlist ="http://www.genome.jp/dbget-bin/www_bget?cpd:" + record.text[:6] + '\n'
        file.write(compoundlist)
        # print(compoundlist)
file.close()

compoundinfo = []
linklist =open('Compoundlist.csv')
#
# def CASnumber(soup):
#     for tag in soup.findAll("div", {"style":"margin-left:3em"}):
#         tag = tag.text
#     return tag

for items in linklist:
    soupcomp = make_soup(items)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
        for NAMES in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0]:
            NAMES = NAMES.text
    print(NAMES)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
        for INFO in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0:3]:
            FORMULA = INFO.text
    print(FORMULA)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
        for INFO in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0:4]:
            EXACTMASS = INFO.text
    print(EXACTMASS)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
        for INFO in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0:5]:
            MOLWEIGHT = INFO.text
    print(MOLWEIGHT)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
        for CAS in soupcomp.findAll("div", {"style":"margin-left:3em"}):
            CAS = CAS.text
    print(CAS)
    with open("Compoundinfo.csv", 'a') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows([NAMES,FORMULA,EXACTMASS,MOLWEIGHT,CAS])
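Two things:
1) Put with open("Compoundinfo.csv", 'a') as csv_file: before for items in linklist: so the file is opened once. There is no need to reopen it on every loop iteration.
2) The correct method for your case is writer.writerow (you have writerows). writerow takes one-dimensional data (a single row), while writerows takes two-dimensional data (a sequence of rows) as its argument. Given a flat list of strings, writerows treats each string as a row and each character as a cell, which is why every letter ends up in its own cell.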
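
Both points can be sketched as follows. This is only an illustration under stated assumptions: the helper name write_compounds, the HEADER list, and the sample records are hypothetical placeholders rather than values scraped from the site, and an in-memory io.StringIO buffer stands in for the real Compoundinfo.csv file so the snippet runs without network access.

```python
import csv
import io

# Hypothetical column headers, following the ones named in the question.
HEADER = ["NAME", "FORMULA", "EXACTMASS", "MOLWEIGHT", "CAS"]

def write_compounds(csv_file, records):
    """Write a header plus one row per compound to an already-open file."""
    writer = csv.writer(csv_file)      # created once, outside the loop
    writer.writerow(HEADER)
    for record in records:
        writer.writerow(record)        # one list -> one row, one string per cell

# Placeholder records standing in for the scraped fields of each compound.
records = [
    ["Citrate", "C6H8O7", "192.027", "192.1235", "77-92-9"],
    ["Pyruvate", "C3H4O3", "88.016", "88.0621", "127-17-3"],
]

out = io.StringIO()                    # stands in for the opened CSV file
write_compounds(out, records)
good = out.getvalue()

# For contrast: writerows on a flat list of strings treats every string
# as a row and every character as a cell.
bad_buf = io.StringIO()
csv.writer(bad_buf).writerows(["Citrate", "C6H8O7"])
bad = bad_buf.getvalue()

print(good)
print(bad)
```

With a real file, the same function could be called inside with open("Compoundinfo.csv", "w", newline="") as csv_file:, placed before the loop over the links. The csv module documentation recommends newline="" when opening the file so that no extra blank lines appear between rows on Windows.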