Python BeautifulSoup is not getting all the data, only some of it


Instead of reading .string on the posting body, call get_text() on it (worked for me).
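Why .string comes back empty for some posts can be seen on a small inline document. The sketch below uses made-up HTML (the markup is an assumption, not the real Craigslist page): .string only returns a value when a tag has exactly one child, while get_text() joins every nested text node.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a posting body whose text is split by a <br/> tag.
html = '<section id="postingbody">First line<br/>Second line</section>'
soup = BeautifulSoup(html, 'html.parser')
body = soup.find('section', {'id': 'postingbody'})

# The section has three children (text, <br/>, text), so .string is None.
print(body.string)                      # None

# get_text() concatenates all nested text nodes.
print(body.get_text())                  # First lineSecond line

# A separator plus strip=True gives cleaner output.
print(body.get_text(' ', strip=True))   # First line Second line

# With a single text child, .string works as expected.
link = BeautifulSoup('<a class="hdrlnk">Free couch</a>', 'html.parser').a
print(link.string)                      # Free couch
```

This would explain why only some posts printed: bodies made of a single text node work with .string, while bodies containing line breaks or other tags give None.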


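As a side note, your script is blocking by nature: each request has to finish before the next one starts. You could speed it up significantly by switching to a non-blocking approach.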
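One way to picture the side note about blocking requests is a thread pool from the standard library. This is only one possible approach and not necessarily the one the answer had in mind. The names fetch_all and fake_fetch are hypothetical, and fake_fetch stands in for a real download such as requests.get(url).text so that the sketch runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for each URL on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    """Stand-in for a real HTTP request: waits briefly, then returns dummy HTML."""
    time.sleep(0.1)
    return '<html>' + url + '</html>'


# Hypothetical item URLs, used only for illustration.
urls = ['http://example.com/item/%d' % i for i in range(8)]

start = time.perf_counter()
pages = fetch_all(urls, fake_fetch)
elapsed = time.perf_counter() - start

print(len(pages), 'pages fetched in %.2f s' % elapsed)
```

Fetched one after another, the eight simulated requests would take about 0.8 s in total; on the pool they overlap, so the waiting time is shared rather than added up.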

You're almost there. Just change item_name.string to item_name.text:

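import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE_URL = 'http://orangecounty.craigslist.org/'


def trade_spider(max_pages):
    """Print the title and body of every listing on pages 0..max_pages."""
    page = 0
    while page <= max_pages:
        # Each results page is addressed by an offset: s=0, s=100, ...
        url = BASE_URL + 'search/foa?s=' + str(page * 100)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.find_all('a', {'class': 'hdrlnk'}):
            # urljoin handles both relative and absolute href values
            href = urljoin(BASE_URL, link.get('href'))
            title = link.string
            print(title)
            # print(href)
            get_single_item_data(href)
        page += 1


def get_single_item_data(item_url):
    """Print the full text of a single listing's posting body."""
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    for item_name in soup.find_all('section', {'id': 'postingbody'}):
        # .text collects all nested text; .string is None when the
        # section has more than one child node
        print(item_name.text)


trade_spider(1)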
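
To also strip leading and trailing whitespace from each piece of text, item_name.get_text(strip=True) can be used in place of item_name.text.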
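
The paging logic in the script is easy to check in isolation. The sketch below rebuilds only the URL construction, using the search URL and the step of 100 from the script. The helper name page_urls and its page_size parameter are hypothetical. Because the loop condition is page <= max_pages, calling the spider with 1 visits two result pages.

```python
BASE = 'http://orangecounty.craigslist.org/search/foa?s='


def page_urls(max_pages, page_size=100):
    """Return the search URLs the spider would request for pages 0..max_pages."""
    # range(max_pages + 1) mirrors the script's "while page <= max_pages" loop.
    return [BASE + str(page * page_size) for page in range(max_pages + 1)]


for url in page_urls(1):
    print(url)
# http://orangecounty.craigslist.org/search/foa?s=0
# http://orangecounty.craigslist.org/search/foa?s=100
```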