Python: Can't produce some customized output after scraping data from a webpage
I'm trying to append data to a dictionary while scraping it from a webpage. The output I get right now is not arranged the way I want. This is my latest attempt:
import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []
r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
for item in soup.select("#transcript p"):
    d = {}
    if "Aimee:" in item.text:
        d['Aimee'] = item.text.replace("Aimee:","").strip()
    elif "Todd:" in item.text:
        d['Todd'] = item.text.replace("Todd:","").strip()
    data.append(d)
pprint(data)
The output I get:
[{'Aimee': 'So Todd, where are you from?'},
 {'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?'},
 {'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'}
Expected output:
[{'Aimee': 'So Todd, where are you from?',
  'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?',
  'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'},
How can I produce the second output?
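How do you decide when a conversation ends? I'll post an answer that shows how to get this effect, but it is not a really robust solution.

I could use data = {} and data['Conversation'] = [] instead of data = [] to produce an output that indicates when the conversation ends.

That would store the conversation, but it doesn't tell you how to know when the conversation ends. Will every conversation have one question and one response? If so, then I suspect my answer is optimal.

Sorry, I failed to understand your question. Yes, the conversation is on a one-question, one-answer basis.

As an alternative to Tim's answer, you could store the transcript tags in a list l and iterate over i in range(0, len(l)-1, 2), stepping two at a time. Then you have Aimee's turn in l[i] and Todd's turn in l[i+1].

Yes, that solution works, but it is no better.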
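import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []
r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
d = {}
for item in soup.select("#transcript p"):
    if "Aimee:" in item.text:
        # Aimee's question opens a new question/answer pair
        d['Aimee'] = item.text.replace("Aimee:","").strip()
    elif "Todd:" in item.text:
        # Todd's answer completes the pair: store it and start a fresh dict
        d['Todd'] = item.text.replace("Todd:","").strip()
        data.append(d)
        d = {}
pprint(data)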
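
The two-step iteration suggested in the comments could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name pair_turns and the hard-coded sample list are assumptions made for the example, and it relies on the transcript strictly alternating between an Aimee line and a Todd line, as the asker confirmed.

```python
def pair_turns(paragraphs, speakers=("Aimee", "Todd")):
    """Pair consecutive transcript lines into one dict per question/answer.

    pair_turns is a hypothetical helper; it assumes the lines strictly
    alternate between the first and the second speaker.
    """
    first, second = speakers
    pairs = []
    # Step through the list two items at a time:
    # paragraphs[i] is the question, paragraphs[i + 1] is the answer.
    for i in range(0, len(paragraphs) - 1, 2):
        question = paragraphs[i].replace(first + ":", "", 1).strip()
        answer = paragraphs[i + 1].replace(second + ":", "", 1).strip()
        pairs.append({first: question, second: answer})
    return pairs


# Sample lines copied from the output shown in the question
transcript = [
    "Aimee: So Todd, where are you from?",
    "Todd: I am from the U.S., I am from San Francisco. It's on the west coast.",
    "Aimee: And what do you do?",
    "Todd: I'm an English teacher. Also, I create Elllo. I work on Elllo a lot.",
]
conversation = pair_turns(transcript)
```

With the scraped page, the list could be built from the question's own selector, for example [p.text for p in soup.select("#transcript p")]. A trailing line without a partner is silently dropped, so this is no more robust than the answer above if the page ever breaks the question/answer pattern.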