Python: Can't produce custom output after scraping data from a webpage


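I'm trying to append data to a dictionary while scraping that data from a webpage. My current output is not arranged the way I want it. This is my latest attempt.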

I've tried:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
for item in soup.select("#transcript p"):
    d = {}

    if "Aimee:" in item.text:
        d['Aimee'] = item.text.replace("Aimee:","").strip()

    elif "Todd:" in item.text:
        d['Todd'] = item.text.replace("Todd:","").strip()

    data.append(d)

pprint(data)
The output I get is:

[{'Aimee': 'So Todd, where are you from?'},
 {'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?'},
 {'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'}]
Expected output:

[{'Aimee': 'So Todd, where are you from?',
  'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?',
  'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'}]

How can I generate the second output?


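How do you decide when a conversation ends? I'll post an answer that shows how to get this effect, but it isn't a truly robust solution.

Instead of data = [], I could use data = {} followed by data['Conversation'] = [] to produce an output that marks where a conversation ends.

That would store the conversation, but it doesn't tell you how to know when the conversation ends. Does every conversation consist of one question and one answer? If so, then I suspect my answer is optimal.

Sorry, I didn't understand your question at first. Yes, the conversations follow a one-question, one-answer pattern.

As an alternative to Tim's answer, you can store the transcript tags in a list l and iterate over it two at a time with for i in range(0, len(l) - 1, 2). Aimee's turn is then in l[i] and Todd's is in l[i+1].

Yes, this solution works, but there is no better way.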
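import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []

r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")

# Keep one dict alive across iterations so a question and its answer
# end up in the same dict.
d = {}
for item in soup.select("#transcript p"):

    if "Aimee:" in item.text:
        d['Aimee'] = item.text.replace("Aimee:", "").strip()

    elif "Todd:" in item.text:
        d['Todd'] = item.text.replace("Todd:", "").strip()
        # Todd's reply closes the exchange: store it and start a new dict.
        data.append(d)
        d = {}

pprint(data)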
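
The alternative mentioned in the comments, stepping through the transcript two lines at a time, might look roughly like the sketch below. It assumes the speakers strictly alternate (question, then answer) and that each line starts with the speaker's name followed by a colon. The lines list is a hard-coded stand-in for the scraped paragraph text, and pair_turns is a hypothetical helper name, not part of the original answer.

```python
# Stand-in for the scraped text; in the real script this would be
# something like [p.text for p in soup.select("#transcript p")].
lines = [
    "Aimee: So Todd, where are you from?",
    "Todd: I am from the U.S., I am from San Francisco. It's on the west coast.",
    "Aimee: And what do you do?",
    "Todd: I'm an English teacher. Also, I create Elllo. I work on Elllo a lot.",
]


def pair_turns(lines):
    """Group consecutive (question, answer) lines into one dict per exchange."""
    pairs = []
    # Walk the list two items at a time; a trailing unpaired line is skipped.
    for i in range(0, len(lines) - 1, 2):
        exchange = {}
        for line in (lines[i], lines[i + 1]):
            # Split on the first colon only, so colons in the speech survive.
            speaker, _, text = line.partition(":")
            exchange[speaker.strip()] = text.strip()
        pairs.append(exchange)
    return pairs


data = pair_turns(lines)
```

Because the speaker name is taken from the text itself, this version does not hard-code "Aimee" or "Todd", but it will pair lines incorrectly if one speaker talks twice in a row.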