Python: extract the <b> tag and the <div> text separately with BeautifulSoup
Edit: I need the data in two separate columns of the df, one for the character and one for the dialogue. Here is my current parser:
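    import pandas as pd
    from bs4 import BeautifulSoup

    def parser(url):
        # `url` must expose `.content` (for example a requests response), not a plain URL string
        page_content = BeautifulSoup(url.content, 'html.parser')
        rows = []
        for item in page_content.find_all('div', {'class': 'quote'}):
            character = item.find('b').text[:-1]  # drop the last character of the name
            quotes = item.text  # note: this still includes the character name
            rows.append({'Dialogues': quotes, 'Character': character})
        # DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
        return pd.DataFrame(rows, columns=['Dialogues', 'Character'])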
Try it this way:

    targets = page_content.select('div.quote')
    for target in targets:
        # stripped_strings yields each text fragment with surrounding whitespace removed
        for s in target.stripped_strings:
            print(s)

Output:

    Head 1
    Text 1
    Head 2
    Text 2
    Head 3
    Text 3

Edit:
To add the data to a DataFrame, do the following:

    import pandas as pd

    heads = []
    tails = []
    targets = page_content.select('div.quote')
    for target in targets:
        mu = list(target.stripped_strings)
        # The strings alternate: character name, then quote text.
        # Stopping at len(mu) - 1 avoids an IndexError when the count is odd.
        for i in range(0, len(mu) - 1, 2):
            heads.append(mu[i])
            tails.append(mu[i + 1])
    items = list(zip(heads, tails))
    pd.DataFrame(items, columns=['Character', 'Quote'])

Output:

    Character  Quote
    Head 1     Text 1
    Head 2     Text 2
    Head 3     Text 3
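If the quote text can span more than one text node, pairing strings by position may misalign the columns. A minimal sketch of an alternative is to read the <b> tag and the remaining text of each div separately. The sample HTML, the trailing colon after the name, and the function name `quotes_to_frame` are assumptions made for illustration, since the real page markup is not shown in the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure is not shown in the question.
SAMPLE_HTML = """
<div class="quote"><b>Head 1:</b> Text 1</div>
<div class="quote"><b>Head 2:</b> Text <i>2</i></div>
<div class="quote">No speaker here</div>
"""

def quotes_to_frame(html):
    """Build a two-column DataFrame from div.quote blocks (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.select("div.quote"):
        bold = div.find("b")
        if bold is None:
            continue  # skip blocks that have no <b> speaker tag
        # Assumes the name ends with a colon; rstrip is a no-op if it does not.
        character = bold.get_text(strip=True).rstrip(":")
        bold.extract()  # remove the <b> tag so only the quote text remains
        quote = div.get_text(" ", strip=True)
        rows.append({"Character": character, "Quote": quote})
    return pd.DataFrame(rows, columns=["Character", "Quote"])

df = quotes_to_frame(SAMPLE_HTML)
print(df)
```

Because the speaker is taken from the <b> tag itself rather than from its position in the string list, a quote that contains nested tags still ends up in a single row.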
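Hi @JackFleeting, I can extract the text with the approach above, but is there a way to get the two parts separately so that I can append them to two separate columns?
@nerd_cs - They are already separated. Edit your question to show exactly what you want the "two separate columns" to look like.
What exactly is the problem? What have you tried, and have you done any research? Stack Overflow is not a free code-writing service. See: .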