Python 3.x: How do I print "print me 1" and "print me 2" from HTML code?

python-3.x web-scraping

If I have this HTML code:

<div class="_1GGPkHIiaumnRMT-S1cU29"><span>print me 1</span><span><div class="_2ZBv5UiBzOiApuonYSpb92"><div>patates</div></div></span><span>print me 2</span></div>

Logic

1) Put the HTML in a single-quoted string.
2) Initialize BeautifulSoup with it.
3) Find all span tags; presumably only text occurs between the opening and closing tags.
4) Iterate over all returned span elements.
5) If a span contains a div (or any other tag; that case is not covered in this answer), skip it.
6) Otherwise print its value after removing the span tags.

Code

# Import BeautifulSoup to parse the HTML returned from the website
from bs4 import BeautifulSoup


# Pass the parser explicitly; otherwise BeautifulSoup guesses one and emits a warning
category = BeautifulSoup('<div class="_1GGPkHIiaumnRMT-S1cU29"><span>print me 1</span><span><div class="_2ZBv5UiBzOiApuonYSpb92"><div>patates</div></div></span><span>print me 2</span></div>', "html.parser")

def printSpan(s):
  # Find every <span> in the document
  s = s.find_all("span")
  for string in s:
    # Skip any span that contains a nested <div>
    if len(string.find_all("div")) != 0:
      continue
    else:
      # Strip the <span> tags and print the remaining text
      print(str(string).replace("<span>", "").replace("</span>", ""))

printSpan(category)
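Step 5 of the logic only skips spans that contain a div; spans with other nested tags would still be printed with their markup. A minimal sketch of one way to generalize it, assuming any nested tag should cause the span to be skipped: `span.find(True)` returns the first descendant tag, or `None` when the span holds only text. The helper name `text_only_spans` is hypothetical, not part of the original answer.

```python
from bs4 import BeautifulSoup

HTML = ('<div class="_1GGPkHIiaumnRMT-S1cU29"><span>print me 1</span><span>'
        '<div class="_2ZBv5UiBzOiApuonYSpb92"><div>patates</div></div></span>'
        '<span>print me 2</span></div>')


def text_only_spans(soup):
    """Return the text of every <span> that contains no child tags at all."""
    results = []
    for span in soup.find_all("span"):
        # find(True) matches any tag, so a non-None result means the span
        # contains nested markup (div, b, a, ...) and is skipped.
        if span.find(True) is not None:
            continue
        results.append(span.get_text())
    return results


soup = BeautifulSoup(HTML, "html.parser")
for text in text_only_spans(soup):
    print(text)
```

Using `get_text()` instead of string replacement also avoids problems when a span carries attributes such as `<span class="x">`, which the `.replace("<span>", "")` approach would not strip.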

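That's not HTML code, that's just a string. If you have actual HTML, please update the question to include it.

Thanks, good answer. The only thing I would add or change: if you use BeautifulSoup's .text, you don't need to call .replace or convert to str. Change print(str(string).replace("<span>", "").replace("</span>", "")) to just print(string.text).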
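Applied to the answer's `printSpan` function, the suggested `.text` change might look like the following sketch. It keeps the answer's div check but collects the texts in a list so they can be inspected; `span_texts` is a hypothetical name, and the explicit `"html.parser"` argument is an added assumption rather than part of the original answer.

```python
from bs4 import BeautifulSoup

html = ('<div class="_1GGPkHIiaumnRMT-S1cU29"><span>print me 1</span><span>'
        '<div class="_2ZBv5UiBzOiApuonYSpb92"><div>patates</div></div></span>'
        '<span>print me 2</span></div>')


def span_texts(soup):
    """Collect the text of each <span> that does not contain a <div>."""
    texts = []
    for span in soup.find_all("span"):
        # An empty result list is falsy, so only spans with a nested div are skipped
        if span.find_all("div"):
            continue
        # .text returns the contained text without the surrounding tags
        texts.append(span.text)
    return texts


category = BeautifulSoup(html, "html.parser")
for text in span_texts(category):
    print(text)
```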