How to scrape with Python and BeautifulSoup: handling a table loaded by JavaScript

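I'm trying to learn how to scrape information with Python, and unfortunately I'm having a lot of trouble here. The problem is that the information I want doesn't seem to be in the page source; it only appears after you tick one of the checkboxes.

The URL is: For example, I want all the information that shows up on the page after you tick the Video Cards (20) category. When I look at the page source, there seems to be a script called submitformfilter, which looks like this: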

function submitformfilter()
{
    var querystring = "dofilter=1";
    $("input:checkbox:checked").each(function()
    {
        querystring = querystring + '&'+$(this).attr("name")+'='+$(this).val()
    }
    );
    if($("#promokw").val() !="")
    {
        querystring = querystring+'&promokw='+ $("#promokw").val();
    }
    $.getJSON("http://www.ncix.com/promo/openboxfilter.cfm?jsoncallback=?&"+querystring);
}
function dosearch()
{
    if($("#promokw").val() =="")
    {
        alert("Please enter the keyword.");
        return false;
    }
    submitformfilter();
    return false;
}

I don't know how to parse the data I want in this situation. Any help would be greatly appreciated.

You need to POST the form data, in particular the minorcatid associated with Video Cards:

import requests
from bs4 import BeautifulSoup

data = {"dofilter": "1",
        "minorcatid": ""}
# not strictly required, but it is worth sending at least a User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"}

with requests.Session() as s:
    # use bs4 to get the minorcatid programmatically
    soup = BeautifulSoup(s.get("http://www.ncix.com/openbox/").content, "lxml")
    _id = soup.select_one("img[alt^=Video]")["src"].rsplit("/", 1)[1][:-4]
    data["minorcatid"] = _id
    resp = s.post("http://www.ncix.com/promo/openboxfilter.cfm", data=data, headers=headers).text

    print(resp)
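This gives you the data that the callback receives. The form sends more fields than this, but passing just the id is enough.

You can see exactly what happens when you tick the box in Chrome's developer tools:

The response is the same as the one you see in the developer console:

We get the id by parsing the src attribute of the img tag for Video Cards on the original page: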

<img src="http://img.ncix.com/categoryimages/108.jpg" width="110" height="55" title="Video Cards" alt="Video Cards">

If anything needs to be encoded, requests will handle it for you.
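
As a rough illustration of what that encoding looks like on the wire, the standard library's urlencode builds the same kind of form-encoded string from a dict. This is only a sketch of the format, not a description of what requests does internally, and the promokw value below is invented for the example.

```python
from urllib.parse import urlencode

# The two fields the answer posts, with the Video Cards id filled in.
data = {"dofilter": "1", "minorcatid": "108"}
print(urlencode(data))

# promokw is the keyword field read by submitformfilter();
# the value here is made up to show how special characters are escaped.
with_keyword = dict(data, promokw="video card & fan")
print(urlencode(with_keyword))
```

The space and the ampersand in the keyword are escaped so they cannot be confused with the separators between fields, which is the part you would otherwise have to get right by hand.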

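You won't be able to trigger JavaScript on the page with BeautifulSoup. Use, for example, the Selenium web driver with a headless browser such as PhantomJS to get the page into the state you want, then scrape it.

@Mr.Yetti, what does a headless browser mean? Would it work if I used the Chrome/Firefox driver for Selenium?

@Ambushes, you don't need Selenium, you only need to mimic the ajax request.

@Ambushes, which parts don't you understand? Do you know how ajax post requests work?

@Padriac Cunningham Thank you very much for your thorough answer! Unfortunately I couldn't follow all of it, so I was hoping for some elaboration: are data["minorcatid"] = _id and the process of finding the _id value necessary? I looked at the element in Chrome and found that the minorcatid value for Video Cards is 108. Second, could you explain what exactly happens in the last three lines, and what the data and headers variables are for? Thanks. I have also never dealt with AJAX/post requests before.

@Ambushes, what if the id changes? You have to post the form data, and the form data is exactly what data contains. Do you understand how http requests, headers and so on work?

Unfortunately I have never done this kind of thing before. If it is too hard to explain, do you have a link I should read? I also tried running the code you provided, but print(resp) returns the following result: Is this correct? Thanks again.

@Ambushes, that link should not be there; run the updated code. If you want to get good at scraping web data, you need to understand how html and, most importantly, http work. If you want an excellent book, I highly recommend HTTP: The Definitive Guide http://shop.oreilly.com/product/9781565925090.do
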
In [6]: with requests.Session() as s:
   ...:         soup = BeautifulSoup(s.get("http://www.ncix.com/openbox/").content, "lxml")
   ...:         _id = soup.select_one("img[alt^=Video]")["src"].rsplit("/", 1)[1][:-4]
   ...:         print(_id)
   ...:         data["minorcatid"] = _id
   ...:         resp = s.post("http://www.ncix.com/promo/openboxfilter.cfm", data=data, headers=headers).content.decode("unicode_escape")
   ...:         soup2 = BeautifulSoup(resp.replace(r'\""', ''), "lxml")
   ...:         for tr in soup2.select("table tr"):
   ...:                 print(tr.select_one("div a[title^=SKU]"))
   ...:


108
<a href="http://www.ncix.com/detail/gigabyte-radeon-r9-fury-x-42-110438.htm#openbox" title="SKU: 110438
Mfr part #: GV%2DR9FURYX%2D4GD%2DB">GIGABYTE Radeon R9 Fury X 1050MHZ 4GB 1GHz HBM HDMI DISPLAYPORTX3 PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/gigabyte-geforce-gtx-970-oc-6d-102012.htm#openbox" title="SKU: 102012
Mfr part #: GV%2DN970WF3OC%2D4GD">GIGABYTE GeForce GTX 970 OC 1253 MHz 4GB 7.0GHZ GDDR5 2xDVI HDMI DisplayPort PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/sapphire-radeon-x1950-pro-dual-96-24458.htm#openbox" title="SKU: 24458
Mfr part #: 11095%2D08%2D40R">Sapphire Radeon X1950 Pro Dual 580MHZ 1GB 1.4GHZ GDDR3 PCI-E 2XDVI-I TV Out Dual GPU Video Card</a>
<a href="http://www.ncix.com/detail/zotac-geforce-gt-730-zone-54-129574.htm#openbox" title="SKU: 129574
Mfr part #: ZT%2D71114%2D20L">Zotac GeForce GT 730 Zone Edition 1GB 902MHZ 1600MHz DDR3 DirectX 12 DVI + HDMI + VGA Video Card</a>
<a href="http://www.ncix.com/detail/club3d-radeon-r9-285-royal-43-101460.htm#openbox" title="SKU: 101460
Mfr part #: CGAX%2DR92856">CLUB3D Radeon R9 285 Royal Queen 945MHZ 2GB 5.5GHZ GDDR5 2xDVI HDMI DisplayPort PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/soltek-refurbished-nvidia-fx5600-agp8x-c6-11182.htm#openbox" title="SKU: 11182
Mfr part #: SL%2D5600%2DXD%2DR">SOLTEK REFURBISHED nVIDIA FX5600 AGP8X /128BIT/128MB DDR *30DAY WARRANTY* CARD ONLY</a>
<a href="http://www.ncix.com/detail/chaintech-apogee-geforce-6800-gt-ad-12556.htm#openbox" title="SKU: 12556
Mfr part #: AA6800G">Chaintech APOGEE GeForce 6800 GT 256MB DDR3 AGP8X VGA DVI-I TV Out Video Card</a>
<a href="http://www.ncix.com/detail/sapphire-radeon-x800-gto2-256mb-1f-17166.htm#openbox" title="SKU: 17166
Mfr part #: 102%2DA47466%2D11%2DAT%20%2821067%2D01%2D20%29">Sapphire Radeon X800 GTO2 256MB GDDR3 PCI-E Dual DVI VIVO OEM Video Card</a>
<a href="http://www.ncix.com/detail/sapphire-radeon-x1600-pro-advantage-de-21813.htm#openbox" title="SKU: 21813
Mfr part #: 88%2D8C87%2D11%2DSA">Sapphire Radeon X1600 Pro Advantage PCI-E 256MB DDR VGA DVI-I TV Out Video Card</a>
<a href="http://www.ncix.com/detail/sapphire-radeon-hd-5850-725mhz-96-50790.htm#openbox" title="SKU: 50790
Mfr part #: 21162%2D00%2D40R">Sapphire Radeon HD 5850 725MHZ 1GB 4.0GHZ GDDR5 PCI-E Display Port 2XDVI HDMI DirectX 11 Video Card</a>
<a href="http://www.ncix.com/detail/sapphire-radeon-hd-6570-650mhz-1b-82646.htm#openbox" title="SKU: 82646
Mfr part #: 11191%2D03%2D20G">Sapphire Radeon HD 6570 650MHZ 512MB 4GHZ GDDR5 DVI HDMI PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/sapphire-radeon-r9-fury-core-93-122143.htm#openbox" title="SKU: 122143
Mfr part #: 11247%2D03%2D40G">Sapphire Radeon R9 Fury Core 1050MHZ 4G HBM PCI-E HDMI/DVI-D Triple DP TRI-X OC+ (UEFI) Graphic Card</a>
<a href="http://www.ncix.com/detail/bfg-geforce-7600gs-oc-420mhz-cc-18204.htm#openbox" title="SKU: 18204
Mfr part #: BFGR76256GSOCE">BFG GeForce 7600GS OC 420MHZ PCI-E 256MB 800MHZ DDR2 VGA DVI-I HDTV Out Video Card</a>
<a href="http://www.ncix.com/detail/xfx-geforce-7800gtx-450mhz-256mb-f4-15636.htm#openbox" title="SKU: 15636
Mfr part #: PV%2DT70F%2DUNF7">XFX GeForce 7800GTX 450MHZ 256MB 256BIT 1.25GHZ DDR3 PCI-E Dual DVI-I TV-OUT Video Card</a>
<a href="http://www.ncix.com/detail/xfx-geforce-7600-gs-400mhz-6c-21791.htm#openbox" title="SKU: 21791
Mfr part #: PVT73PYDJ3">XFX GeForce 7600 GS 400MHZ PCI-E 512MB 128BIT 533MHZ DDR2 VGA DVI-I HDTV Out Video Card</a>
<a href="http://www.ncix.com/detail/xfx-radeon-r7-260x-dual-74-95523.htm#openbox" title="SKU: 95523
Mfr part #: R7%2D260X%2DCDF4">XFX Radeon R7 260X Dual Fan OC 1.1GHZ 2GB GDDR5 2xDVI HDMI DisplayPort PCI-E Video Card R7-260X-CDF4</a>
<a href="http://www.ncix.com/detail/evga-geforce-gtx-470-superclocked-8c-53391.htm#openbox" title="SKU: 53391
Mfr part #: 012%2DP3%2D1475%2DAR">EVGA GeForce GTX 470 SUPERCLOCKED+ 625MHZ Fermi 1280MB 3.4GHZ GDDR5 2XDVI Mini-HDMI PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/evga-e-geforce-8600gt-540mhz-256mb-31-23715.htm#openbox" title="SKU: 23715
Mfr part #: 256%2DP2%2DN751%2DTR">EVGA E-GEFORCE 8600GT 540MHZ 256MB 1.4GHZ GDDR3 PCI-E Dual DVI-I HDTV Out DIRECTX10 Video Card</a>
<a href="http://www.ncix.com/detail/evga-e-geforce-8600gt-540mhz-512mb-2e-26168.htm#openbox" title="SKU: 26168
Mfr part #: 512%2DP2%2DN756%2DTR">EVGA E-GEFORCE 8600GT 540MHZ 512MB 800HZ DDR2 PCI-E VGA DVI-I HDTV Out DIRECTX10 Video Card</a>
<a href="http://www.ncix.com/detail/evga-e-geforce-7600-gt-co-13-17949.htm#openbox" title="SKU: 17949
Mfr part #: 256%2DP2%2DN555">EVGA E-GEFORCE 7600 GT CO Superclocked 580MHZ PCI-E 256MB 1.5GHZ GDDR3 Dual DVI HDTV Out Video Card</a>
<a href="http://www.ncix.com/detail/evga-geforce-gtx-980-4gb-5e-102000.htm#openbox" title="SKU: 102000
Mfr part #: 04G%2DP4%2D2982%2DKR">EVGA GeForce GTX 980 4GB Super Clocked GAMING Silent Cooling 1241MHZ Boost 1342MHZ Graphics Card</a>
<a href="http://www.ncix.com/detail/gigabyte-geforce-gtx-960-g1-73-108014.htm#openbox" title="SKU: 108014
Mfr part #: GV%2DN960G1%20GAMING%2D4GD">GIGABYTE GeForce GTX 960 G1 1307MHZ 4GB 7.0GHZ GDDR5 DVI HDMI 3xDisplayPort PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/powercolor-radeon-hd-7870-pcs-70-78372.htm#openbox" title="SKU: 78372
Mfr part #: AX7870%202GBD5%2D2DHPPV3E">Powercolor Radeon HD 7870 PCS+ MYST.(TAHITI LE) 2GB 6Gbps GDDR5 DVI HDMI 2XMINIDP PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/powercolor-radeon-hd-3850-pcs-f8-29730.htm#openbox" title="SKU: 29730
Mfr part #: AG3850%20512MD3%2DP">Powercolor Radeon HD 3850 PCs 668MHZ 512MB 1.65GHZ GDDR3 AGP 2XDVI HDTV Out Video Card</a>
<a href="http://www.ncix.com/detail/msi-geforce-gtx-580-twin-79-58685.htm#openbox" title="SKU: 58685
Mfr part #: N580GTX%20Twin%20Frozr%20II%2FOC">MSI GeForce GTX 580 Twin Frozr II OC 800MHZ 1536MB GDDR5 2xDVI Mini-HDMI PCI-E DirectX 11 Video Card</a>
<a href="http://www.ncix.com/detail/gigabyte-radeon-hd-rx3870-775mhz-12-27208.htm#openbox" title="SKU: 27208
Mfr part #: GV%2DRX387512H%2DB">GIGABYTE Radeon HD RX3870 775MHZ 512MB 2.4GHZ GDDR4 Dual DVI-I HDCP HDTV Out PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/gigabyte-radeon-hd-r7-240-bc-90927.htm#openbox" title="SKU: 90927
Mfr part #: GV%2DR724OC%2D2GI%20REV2%2E0">GIGABYTE Radeon HD R7 240 OC 900MHZ 2GB 1.8GHZ GDDR3 DVI HDMI VGA PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/gigabyte-geforce-gtx-980-ti-ff-121258.htm#openbox" title="SKU: 121258
Mfr part #: GV%2DN98TXTREME%20W%2D6GD">GIGABYTE GeForce GTX 980 Ti Xtreme Waterforce 1317MHZ 6GB 7.2GHZ GDDR5 HDMI/3XDPORT/PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/ati-radeon-x1900xtx-650mhz-512mb-75-17528.htm#openbox" title="SKU: 17528
Mfr part #: 100%2D435805">ATI Radeon X1900XTX 650MHZ 512MB 256BIT 1.55GHZ GDDR3 PCI-E Dual DVI-I VIVO HDTV Video Card</a>
<a href="http://www.ncix.com/detail/asus-geforce-8800gtx-575mhz-768mb-d2-21403.htm#openbox" title="SKU: 21403
Mfr part #: EN8800GTX%2FHTDP%2F768M">ASUS GeForce 8800GTX 575MHZ 768MB 1.8GHZ GDDR3 Dual DVI-I HDTV Out DIRECTX10 Video Card</a>
<a href="http://www.ncix.com/detail/asus-geforce-gtx-750-oc-84-94415.htm#openbox" title="SKU: 94415
Mfr part #: GTX750%2DPHOC%2D1GD5">ASUS GeForce GTX 750 OC 1GB GDDR5 PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/asus-geforce-gtx-550-ti-25-59650.htm#openbox" title="SKU: 59650
Mfr part #: ENGTX550%20TI%20DC%20TOP%2FDI%2F1GD5">ASUS GeForce GTX 550 Ti DC Top 975MHZ 1GB 4.1GHZ GDDR5 DVI HDMI VGA PCI-E Video Card</a>
<a href="http://www.ncix.com/detail/asus-geforce-gt-520-700mhz-54-70010.htm#openbox" title="SKU: 70010
Mfr part #: ENGT520SL%2FDI%2F2GD3%28LP%29">ASUS GeForce GT 520 700MHZ 2GB 1.2GHZ DDR3 Low Profile DVI HDMI PCI-E DirectX 11 Video Card</a>
<a href="http://www.ncix.com/detail/asus-geforce-gtx-980-ti-1a-111058.htm#openbox" title="SKU: 111058
Mfr part #: STRIX%2DGTX980TI%2DDC3OC%2D6GD5%2DGAMING">ASUS GeForce GTX 980 Ti Strix 1317MHZ 6GB 7.2GHZ GDDR5 DVI HDMI 3XDISPLAYPORT PCI-E Video Card</a>
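
The title attributes printed above carry the SKU and a percent-encoded manufacturer part number. One possible way to pull those out is sketched below on a single title string copied from the output. parse_title is a hypothetical helper name, and the sketch assumes every title has the two-line "SKU: ... / Mfr part #: ..." shape seen here.

```python
from urllib.parse import unquote

# One title attribute copied from the output above (the two lines are
# separated by a newline inside the attribute).
title = "SKU: 110438\nMfr part #: GV%2DR9FURYX%2D4GD%2DB"


def parse_title(title):
    """Hypothetical helper: turn a 'SKU: ...' title into a dict of decoded fields."""
    fields = {}
    for line in title.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not in "key: value" form
            fields[key.strip()] = unquote(value.strip())
    return fields


print(parse_title(title))
```

In the loop above you would pass a["title"] for each link. Rows without a matching link make select_one return None, so those need to be skipped before reading the attribute.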
In [8]: with requests.Session() as s:
   ...:         soup = BeautifulSoup(s.get("http://www.ncix.com/openbox/").content, "lxml")
   ...:         _id = soup.select_one("img[alt^=Video]")["src"].rsplit("/", 1)[1][:-4]
   ...:         data["minorcatid"] = _id
   ...:         resp = s.post("http://www.ncix.com/promo/openboxfilter.cfm", data=data, headers=headers)
   ...:         req = resp.request
   ...:         print(req.body)
   ...:     

dofilter=1&minorcatid=108