Python 3.x: How do I print "print me 1" and "print me 2" from HTML code?

python-3.x, web-scraping, beautifulsoup

If I have this HTML code:

<div class="_1GGPkHIiaumnRMT-S1cU29"><span>print me 1</span><span><div class="_2ZBv5UiBzOiApuonYSpb92"><div>patates</div></div></span><span>print me 2</span></div>
Logic

1) Put the HTML in a single-quoted string.
2) Initialize BeautifulSoup with that string.
3) Find all span tags; the ones of interest presumably contain only text.
4) Iterate over the returned span elements.
5) If a span contains a div, skip it (other nested tags are not handled in this answer).
6) Otherwise, strip the span tags and print the value.
Code

# Import BeautifulSoup to parse the HTML
from bs4 import BeautifulSoup


# Name the parser explicitly; without it bs4 picks one itself and emits a warning
category = BeautifulSoup('<div class="_1GGPkHIiaumnRMT-S1cU29"><span>print me 1</span><span><div class="_2ZBv5UiBzOiApuonYSpb92"><div>patates</div></div></span><span>print me 2</span></div>', "html.parser")

def printSpan(s):
  # Collect every <span> element in the parsed document
  s = s.find_all("span")
  for string in s:
    # Skip any span that contains a nested <div>
    if len(string.find_all("div")) != 0:
      continue
    # Print the span's markup with the <span> tags stripped
    print(str(string).replace("<span>", "").replace("</span>", ""))

printSpan(category)
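
Step 5 of the logic notes that nested tags other than div are not handled. One possible way to cover that case, offered as a sketch rather than as part of the original answer, is to skip any span that contains a tag of any name. The function name `text_only_spans` and the variable `html` are made up for this illustration, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# The HTML from the question, split across lines for readability
html = (
    '<div class="_1GGPkHIiaumnRMT-S1cU29"><span>print me 1</span>'
    '<span><div class="_2ZBv5UiBzOiApuonYSpb92"><div>patates</div></div></span>'
    '<span>print me 2</span></div>'
)

def text_only_spans(markup):
    """Return the text of every <span> that contains no nested tags (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for span in soup.find_all("span"):
        # find(True) returns the first descendant tag of any name, or None
        if span.find(True) is None:
            results.append(span.get_text())
    return results

for text in text_only_spans(html):
    print(text)
```

Returning a list instead of printing inside the loop keeps the filtering reusable; the caller decides what to do with the values.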

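That isn't HTML code, that's just a string. If you have the actual HTML, please edit the question and include it.

Thanks, good answer. The only thing I would add or change: if you are using BeautifulSoup, you don't need to call .replace or convert to str, because every tag has a .text attribute. Change:
print(str(string).replace("<span>", "").replace("</span>", ""))
to simply:
print(string.text)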
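
To illustrate why reading `.text` is the safer choice, here is a small sketch comparing it with the string-replace approach. The markup below is hypothetical: unlike the question's HTML, its first span carries a `class` attribute. The helper names `via_replace` and `via_text` are made up for this comparison, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first span has an attribute, unlike the question's HTML
html = '<div><span class="note">print me 1</span><span>print me 2</span></div>'
soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span")

def via_replace(span):
    # Only strips the literal "<span>" opening tag, so a span with attributes is left intact
    return str(span).replace("<span>", "").replace("</span>", "")

def via_text(span):
    # Reads the text content directly, whatever attributes the tag has
    return span.text

for span in spans:
    print(via_replace(span), "|", via_text(span))
```

With the replace approach the opening tag of the first span stays in the output, because `<span class="note">` never matches the literal `<span>`; `.text` returns just the text in both cases.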