Python: How do I detect strong tags and add a "*" to each one?

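I wrote this code in Python. It extracts the text content of the articles on a website and saves each article to a separate file. I would like to know how to detect a strong tag and add a "*" before or after each one.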

This is the result I need:

    # Python 2 code (urllib2); on Python 3 use urllib.request.urlopen instead.
    import io
    import time
    import urllib2

    from bs4 import BeautifulSoup


    def _remove_attrs(soup):
        """Strip every attribute except href and src from all tags."""
        for tag in soup.find_all(True):
            href = tag.get('href', '')
            src = tag.get('src', '')

            tag.attrs = {}
            if href:
                tag['href'] = href
            if src:
                tag['src'] = src

        return soup


    def _remove_empty(soup):
        """Remove tags that contain no text."""
        for x in soup.find_all(True):
            if len(x.text) == 0:
                x.extract()
        return soup


    base_url = 'http://www.scavonehnos.com.py/index.php?mact=Vmcs,cntnt01,print,0&cntnt01articleid='

    for x in range(10, 12):
        n_url = base_url + str(x)
        print("#PAGINA: " + n_url)
        page = urllib2.urlopen(n_url)
        soup = BeautifulSoup(page, 'html.parser')

        # Text of the first <div> on the page.
        contenido = soup.div.get_text()

        # Write the article text to its own file as UTF-8.
        with io.open('vicentec/prod_' + str(x) + '.txt', 'w', encoding='utf-8') as f:
            f.write(contenido)

        # Pause between requests.
        time.sleep(5)

As you can see, I want to add asterisks to the strong tags on the page.
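A minimal sketch of the requested behavior, assuming Python 3 and a hypothetical HTML snippet in place of the real article pages: find_all("strong") detects every strong tag, and insert_before/insert_after put a "*" on each side while leaving the tag itself in the tree. The helper name mark_strong and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for one of the scraped article pages.
html = """<div>
<p><strong>Model</strong>: VX-100</p>
<p>Price: <strong>150 USD</strong></p>
</div>"""


def mark_strong(soup, marker="*"):
    """Hypothetical helper: put `marker` before and after every <strong> tag."""
    # find_all returns a list, so inserting siblings while looping is safe.
    for strong in soup.find_all("strong"):
        strong.insert_before(marker)
        strong.insert_after(marker)
    return soup


soup = BeautifulSoup(html, "html.parser")
mark_strong(soup)

# Plain text of the <div>, now with asterisks around the formerly bold text.
text = soup.div.get_text()
print(text)
```

Because the strong elements stay in the tree, the same soup can still be serialized back to HTML afterwards; if only the plain text matters, replacing each tag with a string works as well.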
For anyone who finds this question: I solved this case, and the solution still works:
    # Python 2 code (urllib2); on Python 3 use urllib.request.urlopen instead.
    import io
    import urllib2

    from bs4 import BeautifulSoup


    # Helpers kept from the original script; the loop below does not call them.
    def _remove_attrs(soup):
        """Strip every attribute except href and src from all tags."""
        for tag in soup.find_all(True):
            href = tag.get('href', '')
            src = tag.get('src', '')

            tag.attrs = {}
            if href:
                tag['href'] = href
            if src:
                tag['src'] = src

        return soup


    def _remove_empty(soup):
        """Remove tags that contain no text."""
        for x in soup.find_all(True):
            if len(x.text) == 0:
                x.extract()
        return soup


    base_url = 'http://www.scavonehnos.com.py/index.php?mact=Vmcs,cntnt01,print,0&cntnt01articleid='

    for x in range(10, 225):
        n_url = base_url + str(x)
        print("#PAGINA: " + n_url)
        page = urllib2.urlopen(n_url)
        soup = BeautifulSoup(page, 'html.parser')

        # Replace each <strong> element with its text prefixed by '#'
        # (use '*' instead, or add it on both sides, to get asterisks).
        for strong in soup.select('strong'):
            strong.replace_with('#' + strong.get_text())

        # Text of the first <div> on the page.
        contenido = soup.div.get_text()

        # Write the article text to its own file as UTF-8.
        fprod = 'vicentec/prod_' + str(x) + '.txt'
        with io.open(fprod, 'w', encoding='utf-8') as f:
            f.write(contenido)
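To try the replace_with technique from this answer without requesting the site, here is a sketch on a hypothetical inline snippet (Python 3), using "*" on both sides instead of the "#" prefix. Note that replace_with swaps the whole element for a plain string, so any markup nested inside the strong tag is dropped; that is harmless when only get_text() is written to the file.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real pages come from base_url in the answer.
html = "<div><p>Price: <strong>150 <em>USD</em></strong></p></div>"

soup = BeautifulSoup(html, "html.parser")

# Same loop as the answer, but wrapping the text in '*' instead of prefixing '#'.
for strong in soup.select("strong"):
    strong.replace_with("*" + strong.get_text() + "*")

print(soup.div.get_text())  # Price: *150 USD*
print(soup.find("strong"))  # None: the <strong> (and its nested <em>) is gone
```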

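Something broke your image; please edit it into your question again. Thank you for answering your own question. Don't forget that you can also accept it later.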