Python: How to use findAll when web scraping

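I want to scrape the titles (Microsoft Xbox 360 E 250 GB Black Console, Microsoft Xbox One S 1TB White Console with 2 Wireless Controllers, etc.). In due course I want to feed the Python script different eBay URLs, but for the sake of this question I just want to focus on one specific eBay URL.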

Then I want to add the titles to a data frame and write it to Excel. I think I can do that part myself.

Did not work -

for post in soup.findAll('a',id='ListViewInner'):
    print (post.get('href'))
for post in soup.findAll('a',id='body'):
    print (post.get('href'))

h1 = soup.find("a",{"class":"lvtitle"})
print(h1)
for post in soup.findAll('a',attrs={"class":"left-center"}):
    print (post.get('href'))
for post in soup.findAll('a',{'id':'ListViewInner'}):
    print (post.get('href'))
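The following gave me links from the wrong part of the web page. I know href is the hyperlink and not the title, but I figured that if this code had worked, I could amend it to get the titles -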

for post in soup.findAll('a'):
    print (post.get('href'))
Here is all my code -

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import urllib.request
from bs4 import BeautifulSoup

#BaseURL, Syntax1 and Syntax2 should be standard across all
#Ebay URLs, whereas Request and PageNumber can change 

BaseURL = "https://www.ebay.co.uk/sch/i.html?_from=R40&_sacat=0&_nkw="

Syntax1 = "&_skc=50&rt=nc"

Request = "xbox"

Syntax2  = "&_pgn="

PageNumber ="2"

URL = BaseURL + Request + Syntax2 + PageNumber + Syntax1


print (URL)
HTML = urllib.request.urlopen(URL).read()

#print(HTML)

soup = BeautifulSoup(HTML, "html.parser")

#print (soup)

for post in soup.findAll('a'):
    print (post.get('href'))

Use a CSS selector, which is faster

import requests
from bs4 import  BeautifulSoup

url = 'https://www.ebay.co.uk/sch/i.html?_from=R40&_sacat=0&_nkw=xbox&_pgn=2&_skc=50&rt=nc'
Res = requests.get(url)
soup = BeautifulSoup(Res.text,'html.parser')
for post in soup.select("#ListViewInner a"):
    print(post.get('href'))
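
To illustrate why the selector above behaves differently from the attempts in the question, here is a minimal sketch on a small inline HTML string. The markup is hypothetical and heavily simplified (in particular, wrapping each link in an element with class lvtitle is an assumption, and the real eBay page may differ). findAll('a', id='ListViewInner') only matches a tags that carry that id themselves, whereas "#ListViewInner a" matches every a tag nested inside the element with that id. get_text() then returns the title rather than the hyperlink.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real eBay page is larger and may differ.
SAMPLE_HTML = """
<div id="Header"><a href="/help">Help</a></div>
<ul id="ListViewInner">
  <li><h3 class="lvtitle"><a href="/itm/1">Microsoft Xbox 360 E 250 GB Black Console</a></h3></li>
  <li><h3 class="lvtitle"><a href="/itm/2">Microsoft Xbox One S 1TB White Console</a></h3></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matches only <a> tags whose own id is "ListViewInner"; there are none here.
anchors_with_id = soup.find_all("a", id="ListViewInner")

# Matches every <a> nested anywhere inside the element with id "ListViewInner".
listing_links = soup.select("#ListViewInner a")

# The link text is the title; the href attribute is the hyperlink.
titles = [a.get_text(strip=True) for a in listing_links]
hrefs = [a.get("href") for a in listing_links]

print(len(anchors_with_id))
print(titles)
```

The titles list could then be passed to pandas to build the data frame mentioned in the question.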

Use the format() function instead of concatenating strings

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import urllib.request
from bs4 import BeautifulSoup

BaseURL = "https://www.ebay.co.uk/sch/i.html?_from=R40&_sacat=0&_nkw={}&_pgn={}&_skc={}&rt={}"

skc = "50"
rt = "nc"
Request = "xbox"
PageNumber = "2"

URL = BaseURL.format(Request,PageNumber,skc,rt)
print(URL)
HTML = urllib.request.urlopen(URL).read()
soup = BeautifulSoup(HTML,"html.parser")
for post in soup.select('#ListViewInner a'):
    print(post.get('href'))
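
One possible variation, not part of the answer above: format() inserts the search term as-is, so a term containing spaces or other special characters would not be escaped. A minimal sketch using the standard library's urllib.parse.urlencode is shown below. The helper name build_search_url and its default values are hypothetical; the query parameters are the ones from the URL in the question.

```python
from urllib.parse import urlencode

# Path taken from the URL in the question; the query string is built separately.
BASE_URL = "https://www.ebay.co.uk/sch/i.html"


def build_search_url(keyword, page, skc=50, rt="nc"):
    """Build a search URL, escaping the keyword (hypothetical helper)."""
    params = {
        "_from": "R40",
        "_sacat": 0,
        "_nkw": keyword,   # search term, e.g. "xbox"
        "_pgn": page,      # page number
        "_skc": skc,
        "rt": rt,
    }
    # urlencode joins key=value pairs with "&" and escapes special characters.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("xbox", 2)
url_with_space = build_search_url("xbox one", 1)
print(url)
print(url_with_space)
```

For "xbox" on page 2 this produces the same URL as the format() version; for "xbox one" the space is encoded as "+".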

Thanks very much, everyone.
@RossSymonds If your question has been answered satisfactorily, please consider accepting the answer.