Separating information in the output of my scraping code (BeautifulSoup + Python)


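I am scraping a profile page. Education and professional associations are being printed together under the same label. How can I separate them?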

for item in soup.find_all("dl", {"class": "description-list list-with-badges"}):
    y = item.find_all("span",attrs={"itemprop":"name"})
    if y:
        print("Education:", item.get_text(strip=True, separator= '|').split('|'))
The output is:

Education: ['Santa Clara University School of Law', 'J.D. ', '  Law', '1998', 'Honors:', 'Awarded "Certificate in High Technology Law"', 'Activities:', 'Editor, Santa Clara Computer & High Technology Law Journal;  Editor-in-Chief, The Advocate, Santa Clara University Law School Newspaper.']
Education: ['Michigan State University, James Madison College', 'B.A. ', '  Political Philosophy', '1995', 'Honors:', 'Overseas Study Program in Caribbean and South America, Summer Semester 1994Vice-President, MSU Adventure Club']
Education: ['Michigan State University, James Madison College', 'B.A. ', '  International Relations', '1995']
Education: ['California State Bar', '# 200701', 'Member', 'Current']
Education: ['California Bar Association', 'Member', 'Current']
Education: ['San Francisco Bar Association', 'Member', 'Current']
Education: ['American Bar Association', 'Member', 'Current']
Education: ['Internet Corporation for Assigned Names and Numbers (ICANN) - Noncommercial Stakeholders Group', 'Executive Committee', '2010', '- Current']
Education: ['Executive Committee of FreeMuse', 'Member', '2009', '-', '2016']
Education: ['Public Interest Registry - Advisory Council', 'Member', '2012', '-', '2014']
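You are using "class": "description-list list-with-badges" to get your items. If you look at the page source, you will see that the items under both Education and Professional Associations carry these classes, so the loop matches both kinds of entry.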


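If you want to capture them separately, you can use the itemtype attribute. http://schema.org/CollegeOrUniversity is the value for Education entries, and http://schema.org/Organization is the value for Professional Associations entries.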
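The idea above could be sketched roughly as follows. This is only an illustration: the sample HTML is made up, and it assumes the itemtype attribute sits directly on each dl element, which may not match the real page. The names SAMPLE_HTML, LABELS and split_by_itemtype are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page structure may differ.
SAMPLE_HTML = """
<dl class="description-list list-with-badges" itemscope
    itemtype="http://schema.org/CollegeOrUniversity">
  <dt><span itemprop="name">Santa Clara University School of Law</span></dt>
  <dd>J.D.</dd><dd>1998</dd>
</dl>
<dl class="description-list list-with-badges" itemscope
    itemtype="http://schema.org/Organization">
  <dt><span itemprop="name">American Bar Association</span></dt>
  <dd>Member</dd><dd>Current</dd>
</dl>
"""

# Map each schema.org itemtype to the label to print.
LABELS = {
    "http://schema.org/CollegeOrUniversity": "Education",
    "http://schema.org/Organization": "Professional Associations",
}


def split_by_itemtype(soup):
    """Group the text of each matching <dl> under the label for its itemtype."""
    grouped = {label: [] for label in LABELS.values()}
    for item in soup.find_all("dl", {"class": "description-list list-with-badges"}):
        label = LABELS.get(item.get("itemtype"))
        if label is None:
            continue  # assumption: entries with an unknown or missing itemtype are skipped
        # stripped_strings yields each non-empty text fragment, already trimmed
        grouped[label].append(list(item.stripped_strings))
    return grouped


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
grouped = split_by_itemtype(soup)
for label, entries in grouped.items():
    for entry in entries:
        print(label + ":", entry)
```

Using stripped_strings instead of get_text(separator='|').split('|') avoids splitting in the wrong place if a field happens to contain a '|' character. If the itemtype attribute turns out to live on a child or parent element rather than on the dl itself, the lookup would need to be adjusted accordingly.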

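No problem! If your problem is solved, don't forget to accept the answer :)

Awesome, I didn't know that, but ty. I am using the idea you suggested to get the information about awards, but there does not seem to be a unique tag for them the way there is for professional associations and education. Do you know what I could do in this case?

You can first use a text-based search to find the Awards div, then use .parent to get all the information for the awards.

I tried the following:

for item in soup.findAll("div", {"class": "heading-3 block-title icon-title font-w-bold"}):
    j = item.find_parent('div')
    print("Awards:", item.get_text(strip=True, separator='|').split('|'))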
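A minimal sketch of the text-based approach suggested in the comments, under stated assumptions: the sample HTML below is invented, the heading is assumed to be a div whose only text is "Awards", and the award entries are assumed to sit inside that heading's parent element. The names SAMPLE_HTML and section_strings are hypothetical. Note that the attempt above takes the text from item (the heading) rather than from j (its parent), so it would print only the heading text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page structure may differ.
SAMPLE_HTML = """
<div class="section">
  <div class="heading">Awards</div>
  <ul>
    <li>Example Award A, 2010</li>
    <li>Example Award B, 2012</li>
  </ul>
</div>
<div class="section">
  <div class="heading">Publications</div>
  <ul><li>Example Article</li></ul>
</div>
"""


def section_strings(soup, heading_text):
    """Find the div whose stripped text equals heading_text and return
    every text fragment inside that div's parent."""
    heading = soup.find(
        lambda tag: tag.name == "div" and tag.get_text(strip=True) == heading_text
    )
    if heading is None:
        return []  # no such heading on the page
    container = heading.parent  # assumption: the parent wraps the whole section
    return list(container.stripped_strings)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
awards = section_strings(soup, "Awards")
print("Awards:", awards)
```

The first element of the returned list is the heading text itself, so awards[1:] holds just the entries. If the real section is wrapped more deeply, heading.find_parent(...) with a suitable tag name or class may be needed instead of .parent.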