Python: appending list elements "randomly, without repetition" to multiple HTML files


I am trying to use regex to replace the href URL with the resulting value; I also tried the BeautifulSoup module, but without success. I keep getting one and the same URL in all the HTML files.

import random

class RandomChoiceNoImmediateRepeat(object):
    def __init__(self, lst):
        self.lst = lst
        self.last = None
    def choice(self):
        if self.last is None:
            self.last = random.choice(self.lst)
            return self.last
        else:
            nxt = random.choice(self.lst)
            # make a new choice as long as it's equal to the last.
            while nxt == self.last:   
                nxt = random.choice(self.lst)
            # Replace the last and return the choice
            self.last = nxt
            return nxt
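A quick sanity check of the class above (condensed here into an equivalent form so the snippet is self-contained; the sample link names are made up): as long as the list has more than one element, no two consecutive draws are identical.

```python
import random

class RandomChoiceNoImmediateRepeat(object):
    def __init__(self, lst):
        self.lst = lst
        self.last = None

    def choice(self):
        nxt = random.choice(self.lst)
        # Re-draw while the new pick equals the previous one.
        while nxt == self.last:
            nxt = random.choice(self.lst)
        self.last = nxt
        return nxt

gen = RandomChoiceNoImmediateRepeat(["link_a", "link_b", "link_c"])
draws = [gen.choice() for _ in range(20)]
# No two consecutive draws are identical:
print(all(a != b for a, b in zip(draws, draws[1:])))  # True
```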

import glob
from bs4 import BeautifulSoup
from googleapiclient.http import MediaFileUpload

for filename in glob.glob('/docs/*.txt'):
    file_metadata = { 'name': 'file.txt', 'mimeType': '*/*' }
    media = MediaFileUpload(filename, mimetype='*/*', resumable=True)
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    link = 'https://drive.google.com/uc?export=download&id=' + file.get('id')
    linkd = []
    linkd.append(link)
    for html_name in glob.glob('/docs/htmlz/*.html'):
        with open(html_name, "r") as html_file:
            soup = BeautifulSoup(html_file, 'html.parser')
        gen = RandomChoiceNoImmediateRepeat(linkd)
        for anchor in soup.findAll("a", attrs={ "class" : "downloadme" }):
            anchor['href'] = str(gen.choice())
        # Write the modified document back only after reading is finished.
        with open(html_name, "w") as html_file:
            html_file.write(str(soup))



First of all, the root cause is that re.sub expects a string-like or bytes-like object, but you passed it a different type.
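The error can be reproduced with nothing but the standard library; passing anything that is not a str or bytes (here a list, which is also what a Tag's "class" attribute gives you) raises TypeError, while converting to str first works (the URL and replacement below are made up for illustration):

```python
import re

tag_like = ["https://drive.google.com/uc?export=download&id=OLD"]  # a list, not a str

try:
    # re.sub rejects non-string input with TypeError.
    re.sub(r"id=.*", "id=NEW", tag_like)
except TypeError as e:
    print("TypeError:", e)

# Indexing out the string first makes the call legal:
print(re.sub(r"id=.*", "id=NEW", tag_like[0]))
# → https://drive.google.com/uc?export=download&id=NEW
```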

Edit:

I created an example that shows how to access the elements of a bs4.element.ResultSet type.

Code:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<tr class="hello">first_elem</tr><tr>second_elem</tr>', "html.parser")
trs = soup.find_all("tr")
print("Content: {}  -->  Type: {}".format(trs, type(trs)))
print("Content: {}  -->  Type: {}".format(trs[0], type(trs[0])))
print("Content: {}  -->  Type: {}".format(trs[0]["class"], type(trs[0]["class"])))
print("Content: {}  -->  Type: {}".format(trs[0]["class"][0], type(trs[0]["class"][0])))

Output:

>>> python3 ci/common/python_utils/test_file.py
Content: [<tr class="hello">first_elem</tr>, <tr>second_elem</tr>]  -->  Type: <class 'bs4.element.ResultSet'>
Content: <tr class="hello">first_elem</tr>  -->  Type: <class 'bs4.element.Tag'>
Content: ['hello']  -->  Type: <class 'list'>
Content: hello  -->  Type: <class 'str'>
As shown above, .findAll returns a bs4.element.ResultSet, which contains bs4.element.Tag elements. If you select an attribute of a tag, you get a list such as ['hello']; you then have to use the correct index, e.g. [0], to obtain a string-type variable (as you can see in the last line of the output).
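Applied to the question's anchors, the same rule means you index the attribute by name, not by position; calling anchor[0] fails because a Tag is subscripted by attribute name. A minimal sketch (the sample anchor markup and the NEW_ID value are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical anchor mirroring the question's markup.
soup = BeautifulSoup('<a class="downloadme" href="http://old">dl</a>', "html.parser")
anchor = soup.find("a", attrs={"class": "downloadme"})

print(anchor["class"])     # a list: ['downloadme']
print(anchor["class"][0])  # a str:  downloadme

# Tags are indexed by attribute name, so anchor[0] raises KeyError: 0.
# To change the link there is no need for re.sub at all; assign the
# attribute directly:
anchor["href"] = "https://drive.google.com/uc?export=download&id=NEW_ID"
print(soup)
```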

"re.sub needs a string. So convert with str(anchor) and see if it works."

"I did that and the error disappeared, but the resulting URL in href did not change. Now I always get this error: Traceback (most recent call last): File "newpost.py", line 95, in <module> re.sub("https\:\/\/drive.google.com\/uc\?export\=download&id\=(.*)", r'\g' + e, anchor[0]) File "/.local/lib/python3.6/site-packages/bs4/element.py", line 971, in __getitem__ return self.attrs[key] KeyError: 0"

"I see! You should print the anchor variable and check its content and how its elements can be accessed. The problem is that the anchor 'container' has no 0 key. I have updated the answer; I hope it helps solve your problem. Let me know if you need more support."

"I had no success; I tried another approach, but I still have a question if you can help me."