Python: extract the <b> tag and the <div> text separately using Beautiful Soup

Edit: I need the data in two separate columns of df, in the following format.

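import pandas as pd
from bs4 import BeautifulSoup

def parser(url):
    # Despite its name, `url` must be a response object exposing `.content`
    page_content = BeautifulSoup(url.content, 'html.parser')
    rows = []
    for item in page_content.find_all('div', {'class': 'quote'}):
        # Drop the last character of the <b> text (presumably a trailing colon)
        character = item.find('b').text[:-1]
        # Full text of the div, which still includes the <b> text
        quotes = item.text
        rows.append({'Dialogues': quotes, 'Character': character})

    # DataFrame.append was removed in pandas 2.0, so build the frame from the rows
    return pd.DataFrame(rows, columns=['Dialogues', 'Character'])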

Try something like this:

targets = page_content.select('div.quote')
for target in targets:
    # stripped_strings yields each text fragment with surrounding whitespace removed
    for s in target.stripped_strings:
        print(s)

Output:

Head 1
Text 1
Head 2
Text 2
Head 3
Text 3

Edit:

To add the data to a DataFrame, do the following:

import pandas as pd
heads = []
tails = []
targets = page_content.select('div.quote')
for target in targets:
    data = target.stripped_strings
    mu = list(data)
    # Strings alternate head, text, head, text; stop before an unpaired trailing item
    for i in range(0, len(mu) - 1, 2):
        heads.append(mu[i])
        tails.append(mu[i+1])

items = list(zip(heads, tails))
pd.DataFrame(items, columns=['Character', 'Quote'])

Output:

Character  Quote
Head 1     Text 1
Head 2     Text 2
Head 3     Text 3
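
The pairing above relies on the strings strictly alternating between speaker and text. As a rough sketch of an alternative, the speaker can be read from each div's <b> tag and the remaining text taken as the quote, so every div yields one row. The sample markup, the trailing colon after the name, and the helper name extract_rows are all assumptions made for illustration; the real page may be structured differently.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup; the structure of the real page is an assumption.
SAMPLE_HTML = """
<div class="quote"><b>Head 1:</b> Text 1</div>
<div class="quote"><b>Head 2:</b> Text 2 <i>continued</i></div>
<div class="quote">Text with no speaker</div>
"""

def extract_rows(html):
    """Return one (character, quote) tuple per div.quote element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.select("div.quote"):
        bold = div.find("b")
        if bold is None:
            # No <b> tag in this div: leave the speaker empty
            character = ""
        else:
            # Assumes the name ends with a colon, which is stripped here
            character = bold.get_text(strip=True).rstrip(":")
            # Remove the <b> tag so its text is not repeated in the quote
            bold.extract()
        # Join the remaining text fragments with single spaces
        quote = div.get_text(" ", strip=True)
        rows.append((character, quote))
    return rows

rows = extract_rows(SAMPLE_HTML)
df = pd.DataFrame(rows, columns=["Character", "Quote"])
print(df)
```

Because each row comes from a single div, a quote containing nested tags or a div without a <b> tag does not shift the remaining rows out of alignment.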

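Hi @JackFleeting, I can extract the data using the method above, but is there a way to get the two parts separately so that I can append them to two separate columns?

@nerd_cs - They are already separated. Edit your question to show exactly what you want the "two separate columns" to look like.

What exactly is the problem? What have you tried, and have you done any research? Stack Overflow is not a free code-writing service. See: .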