Remove /url?q= from a link using Python

Tags: python, list, beautifulsoup, python-requests

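First of all, I'm fairly new to Python, so this program is probably not the most efficient, but I've run into a problem. The program is supposed to fetch news about corona and print each headline together with a link to the news site. I've managed to get it to print both the headlines and the links, but every link in the output contains /url?q= before the https://.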

Here is the code:

import requests
from bs4 import BeautifulSoup
import re

#Fetching the site
r = requests.get('https://www.google.com/search?q=corona+nyheter&source=lnms&tbm=nws&sa=X&ved=2ahUKEwiJ5eut64ztAhUliIsKHVyJDwIQ_AUoAXoECA0QAw&biw=1920&bih=1127')
src = r.content

#Letting soup do its thing
soup = BeautifulSoup(src, 'lxml')
data_text = soup.find_all('div', attrs={'class':'BNeawe vvjwJb AP7Wnd'})
data_link = soup.find_all('div', attrs={'class':'kCrYT'})

#Printing the titles of the news
for i in range(0, len(data_text)):
    print(str(i) + '.'+ data_text[i].text)

#Looking for links inside of the div with the class 'kCrYT'
links = []
for i in range(0, len(data_link)):
    for link in data_link[i].find_all('a'):
        links.append(link.get('href'))

#Printing each link
for i in range(0, len(links), 2):
    print(links[i])
    print('')

#print(data_link[0])
#print(dir(data_link[0]))
The titles work fine. The problem is the link output, which looks like this:

/url?q=https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark&sa=U&ved=2ahUKEwjzgJeEgo3tAhXT7HMBHf9BAYsQxfQBMAF6BAgHEAE&usg=AOvVaw1bQmvySzBxBofyWJgMx6L_
But I don't know how to remove the /url?q= part of the link.
Thanks for the help.

There are many ways to do this. If the same 7 characters are guaranteed to appear at the start of every string you process, the simplest approach is a slice:

link = "/url?q=https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark&sa=U&ved=2ahUKEwjzgJeEgo3tAhXT7HMBHf9BAYsQxfQBMAF6BAgHEAE&usg=AOvVaw1bQmvySzBxBofyWJgMx6L_"
clean_link = link[7:]
print(clean_link) # https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark&sa=U&ved=2ahUKEwjzgJeEgo3tAhXT7HMBHf9BAYsQxfQBMAF6BAgHEAE&usg=AOvVaw1bQmvySzBxBofyWJgMx6L_
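The slice removes the first seven characters of any string, whether or not it actually begins with /url?q=. A slightly more defensive sketch checks the prefix first; strip_prefix is a hypothetical helper name, and len(prefix) replaces the hard-coded 7:

```python
def strip_prefix(link, prefix="/url?q="):
    # Only slice when the link really starts with the prefix;
    # otherwise return it unchanged instead of losing 7 characters.
    if link.startswith(prefix):
        return link[len(prefix):]
    return link

link = "/url?q=https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark&sa=U"
print(strip_prefix(link))            # https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark&sa=U
print(strip_prefix("https://example.com/"))  # https://example.com/
```

This works on any Python 3 version. Like the plain slice, it leaves Google's trailing &sa=... parameters in place.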

You can split() the link on /url?q= and print the second element (index 1):

# Printing each link
for i in range(0, len(links), 2):
    print(links[i].split("/url?q=")[1])
    print('')
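split("/url?q=")[1] raises IndexError for any href that does not contain /url?q=, and a search results page may well contain other links too. A sketch under that assumption using str.partition, which never raises; extract_targets is a hypothetical name and the second sample href is made up:

```python
def extract_targets(hrefs, marker="/url?q="):
    targets = []
    for href in hrefs:
        # partition splits at the first occurrence of marker and never raises;
        # when marker is absent, sep is "" and the href is skipped.
        _, sep, after = href.partition(marker)
        if sep:
            targets.append(after)
    return targets

hrefs = [
    "/url?q=https://www.svt.se/nyheter/snabbkollen/skolor-corona-stangs-ater-i-new-york&sa=U",
    "/search?q=corona&tbm=nws",  # hypothetical href without the /url?q= marker
]
for target in extract_targets(hrefs):
    print(target)
# https://www.svt.se/nyheter/snabbkollen/skolor-corona-stangs-ater-i-new-york&sa=U
```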
Output:

https://www.svt.se/nyheter/inrikes/ingen-storre-skillnad-i-skydd-mot-corona-med-munskydd-visar-dansk-studie&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQxfQBMAB6BAgFEAE&usg=AOvVaw0YLZPy1QJqH0p8cmdj7Icv

https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQxfQBMAF6BAgIEAE&usg=AOvVaw1s9zLwWfl6Yi5-t0stOmoC

https://www.svt.se/nyheter/snabbkollen/skolor-corona-stangs-ater-i-new-york&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQxfQBMAJ6BAgJEAE&usg=AOvVaw0jeJ0xPGp2kcNwuLKLpRRc

https://www.svt.se/datajournalistik/corona-de-senaste-tio-veckorna/&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQxfQBMAN6BAgHEAE&usg=AOvVaw2_Ey5bEltmctv9NvmGymrt

https://www.svt.se/nyheter/lokalt/varmland/kor-pa-lantbruksgymnasium-smittade-av-corona&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQxfQBMAR6BAgEEAE&usg=AOvVaw08P70_kTjfh_QxtxaGNTT7

https://www.svt.se/nyheter/utrikes/vaccin-mot-coronavirus-uppges-ha-95-procents-effektivitet&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQxfQBMAV6BAgCEAE&usg=AOvVaw0I9PE9T-lVfPJpPTkJnO3B

https://www.svt.se/sport/artikel/alpina-landslagets-damchef-har-testats-positivt-for-corona&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQxfQBMAZ6BAgAEAE&usg=AOvVaw0WojYKAkW7sikZxeZzR4m5

https://www.svt.se/nyheter/lokalt/jonkoping/sa-drabbas-utsatta-manniskor-under-pandemin&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQ0Y8FMAd6BAgGEAI&usg=AOvVaw1nCkVbnm6ROrNCv4p8mcqE

https://www.expressen.se/nyheter/myndigheterna-kallar-till-presstraff-om-coronalaget/&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQ0Y8FMAh6BAgDEAI&usg=AOvVaw1W77kVTR2M_69BgNjx49ve

https://www.aftonbladet.se/nyheter/a/7K5K33/fhm-undersoker-aterinsjuknande-i-corona&sa=U&ved=2ahUKEwjdremFhI3tAhXVPXAKHVA5DCgQ0Y8FMAl6BAgBEAI&usg=AOvVaw1pZYw6_UhTjWy79HXPSk7v
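Every link above still ends with Google's own redirect parameters (&sa=U&ved=...&usg=...), which are not part of the article URL. Assuming the href always has the form /url?q=<target>&..., the standard library's urllib.parse can read the q parameter directly, dropping the prefix and the trailing parameters in one step. target_url is a hypothetical helper:

```python
from urllib.parse import urlparse, parse_qs

def target_url(href):
    # For "/url?q=<target>&sa=U&ved=...", the article address is the
    # value of the "q" query parameter; return None if there is none.
    query = urlparse(href).query
    values = parse_qs(query).get("q")
    return values[0] if values else None

href = "/url?q=https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark&sa=U&ved=2ahUKEwjzgJeEgo3tAhXT7HMBHf9BAYsQxfQBMAF6BAgHEAE&usg=AOvVaw1bQmvySzBxBofyWJgMx6L_"
print(target_url(href))
# https://www.svt.se/nyheter/utrikes/200-anstallda-smittade-med-corona-pa-minkfarmar-i-danmark
```

In the question's loop this would be print(target_url(links[i])). parse_qs also percent-decodes the value, which should help if a target URL carries its own encoded query string.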

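lstrip removes characters from the beginning of a string, but its argument is treated as a set of characters rather than an exact prefix: it keeps stripping while the leading character is any of /, u, r, l, ?, q or =.

So you can simply call .lstrip("/url?q=") on the string. With these links it works because the first character after the prefix, h, is not in that set.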
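Because the argument to lstrip is a character set, it can remove more than the prefix when the target itself starts with one of those characters. A small comparison, assuming Python 3.9+ for str.removeprefix; the second href is made up for illustration:

```python
prefix = "/url?q="

# lstrip treats prefix as the character set {"/", "u", "r", "l", "?", "q", "="}
# and strips leading characters while they belong to that set.
good = "/url?q=https://www.svt.se/"
print(good.lstrip(prefix))       # https://www.svt.se/

bad = "/url?q=/relative/page"    # hypothetical href with a relative target
print(bad.lstrip(prefix))        # elative/page  (also ate "/" and "r")
print(bad.removeprefix(prefix))  # /relative/page  (exact prefix only, Python 3.9+)
```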

Try .lstrip("/url?q=")