Python: How to bypass googletagmanager when scraping
Tags: python, python-3.x, csv, beautifulsoup

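Since the website added the googletagmanager script, I can't get what I need. With this code I used to be able to extract the iframe links from the page. Now every row contains "www.googletagmanager.com", and I don't know how to deal with it. Thanks.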

[HTML][1]

[What the CSV file looks like now][2]

import csv

import pandas as pd  # only used by the commented-out lines below
import requests
from bs4 import BeautifulSoup

data_list = ["LINKI", "GOWNO", "JAJCO"]

with open('innovators.csv', 'w', newline='') as file:
    writer = csv.writer(file, delimiter=',')
    writer.writerow(data_list)
    for i in range(0, 50):
        #df = pd.read_csv("C:\\Users\\Lukasz\\Desktop\\PROJEKTY PYTHON\\W TRAKCIE\\bf3_strona2.csv")
        #url = "https://bf3.space/" + df['LINKS'][i]
        url = 'https://bf3.space/a-Byu6am3P'
        response = requests.get(url)
        data = response.text
        soup = BeautifulSoup(data, 'lxml')
        # find() returns the first <iframe> on the page, which is now the googletagmanager one
        rows = soup.find('iframe')
        q = rows.get('src')
        writer.writerow([q])


[1]: https://i.stack.imgur.com/Ogq0N.png
[2]: https://i.stack.imgur.com/3JYqc.png

You can use soup.find() with a lambda as the src filter.

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://bf3.space/a-Byu6am3P'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

print( soup.find('iframe', src=lambda s: 'googletagmanager.com' not in s) )
This prints the first <iframe> tag that is not from Google Tag Manager:

<iframe align="center" frameborder="0" height="1500" src="https://ven-way.x.yupoo.com/albums/83591895?uid=1" style="margin: 10px 0;padding: 0px 0px; border:none" width="100%"></iframe>
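To plug this back into the CSV loop from the question, something along these lines might work. It is only a sketch: the helper names (is_content_src, first_content_iframe_src, write_links) and the SAMPLE_HTML string are made up for illustration, and the sample page stands in for the real response so the snippet runs without network access. The filter also checks for None, because BeautifulSoup passes None to the function when an iframe has no src attribute, and the plain lambda above would raise a TypeError in that case.

```python
import csv
import io

from bs4 import BeautifulSoup

GTM_HOST = 'googletagmanager.com'


def is_content_src(src):
    """Return True for a src that is present and not served by Google Tag Manager."""
    # BeautifulSoup passes None when the <iframe> has no src attribute.
    return src is not None and GTM_HOST not in src


def first_content_iframe_src(html):
    """Return the src of the first non-GTM <iframe>, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find('iframe', src=is_content_src)
    return tag['src'] if tag is not None else None


def write_links(pages, out_file):
    """Write one CSV row per page; pages without a usable iframe are skipped."""
    writer = csv.writer(out_file)
    writer.writerow(['LINKI'])
    for html in pages:
        src = first_content_iframe_src(html)
        if src is not None:
            writer.writerow([src])


# Hypothetical sample page: a Google Tag Manager iframe first, then the wanted one.
SAMPLE_HTML = """
<iframe src="https://www.googletagmanager.com/ns.html?id=GTM-XXXX"></iframe>
<iframe src="https://ven-way.x.yupoo.com/albums/83591895?uid=1"></iframe>
"""

buffer = io.StringIO()
write_links([SAMPLE_HTML], buffer)
print(buffer.getvalue())
```

In the real script you would pass requests.get(url).content for each URL instead of SAMPLE_HTML, and an open file instead of the StringIO buffer.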
