Python ValueError: unsupported format character 'a' (0x61) at index 79

python python-2.7

I am trying to scrape data from a website using Beautiful Soup 4 and Python. Here is my code:

from bs4 import BeautifulSoup
import urllib2
i = 0
for i in xrange(0,38):
    page=urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form" %i) 
    soup = BeautifulSoup(page.read())
    for eachuniversity in soup.findAll('div',{'class':'field-item odd'}):
        print ''.join(eachuniversity.findAll(text=True)).encode('utf-8')
    print ',\n'
i= i+ 1

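I think the problem is in the URL I am passing and in the increment statement. I can scrape the pages one at a time; the error only appears when I use xrange.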

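Cause of the ValueError

You are mixing {} formatting (str.format) with % formatting. The % operator reads the %20la in the URL as a conversion specification: 20 is the field width, l is a length modifier, and a is not a conversion character that Python 2.7 supports. In the full URL that a sits at index 79, which is the position the error message reports.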

>>> '{}%20la' % 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unsupported format character 'a' (0x61) at index 6
>>> '{}%20la'.format(1)
'1%20la'
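As a small sketch of the two consistent ways to write such a URL: either use {} with str.format and leave the URL escapes alone, or use % formatting and double every literal percent sign. The example.com URL and the variable names below are made up for illustration; they are not from the question.

```python
# Hypothetical, shortened URL used only for illustration.
format_template = "http://example.com/search?page={}&op=Lancer%20la%20recherche"

# str.format only interprets {} fields, so %20 passes through untouched.
url_from_format = format_template.format(3)

# With the % operator, each literal percent sign has to be written as %%.
percent_template = "http://example.com/search?page=%d&op=Lancer%%20la%%20recherche"
url_from_percent = percent_template % 3

# Both approaches produce the same URL.
print(url_from_format)
print(url_from_format == url_from_percent)
```

Either style works as long as it is not combined with the other in the same string.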

Full code

You don't need i=0 and i=i+1, because for i in xrange(0,38) takes care of it.

import urllib2 # Import standard library module first. (PEP-8)

from bs4 import BeautifulSoup

for i in xrange(0,38):
    page = urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form" .format(i))
    soup = BeautifulSoup(page.read())
    for eachuniversity in soup.findAll('div',{'class':'field-item odd'}):
        print ''.join(eachuniversity.findAll(text=True)).encode('utf-8')
    print ',\n'
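Another option, not part of the answer above, is to let the standard library build the query string so that no percent escapes appear in your own source at all. This is only a sketch: build_url is a hypothetical helper, the parameter values are copied from the URL in the question, and urlencode encodes the spaces as + rather than %20, which servers normally treat the same way in a query string but which I have not checked against this site.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

BASE_URL = "http://www.sfap.org/klsfaprep_search"


def build_url(page_number):
    """Return the search URL for one results page (hypothetical helper)."""
    # A list of pairs keeps the parameters in a fixed order.
    params = [
        ("page", page_number),
        ("type", 1),
        ("strname", ""),
        ("loc", ""),
        ("op", "Lancer la recherche"),
        ("form_build_id", "form-72a297de309517ed5a2c28af7ed15208"),
        ("form_id", "klsfaprep_search_form"),
    ]
    return BASE_URL + "?" + urlencode(params)


print(build_url(0))
```

Inside the loop you would then call urllib2.urlopen(build_url(i)) instead of formatting the long string by hand.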

You may need to ... before processing it. Also, your i value is not used (outside the loop). Are you sure you want to request page 0?

Yes. The pagination starts at 0 and goes to 38. So how should I pass it?

Just use @falsetru's answer; the loop looks right (it goes up to 37, not 38). I only thought that pages in URLs usually start at 1, not at zero.

How can I push the output into a comma-separated .xlx file?

@Venky, use a module to do that.

Writing it to a csv file is getting harder for me, because the data is messy and does not quite land in the exact columns I want. Is there any way to make it line up exactly in columns?

@Venky, I assumed .xlx is similar to .csv (because you said comma-separated). I don't know .xlx.

Sorry, yes. I am trying to write to csv only. On its own it is a mess.
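For the column problem mentioned in the last comments, here is a rough sketch assuming the standard library csv module is what was meant. The writer quotes any field that contains a comma or a quote character, so each value stays in its own column. The function name, the header names and the sample rows are all invented for illustration, and the code targets Python 3; under Python 2.7 the output file would be opened in 'wb' mode and the strings encoded to UTF-8 first.

```python
import csv
import io


def rows_to_csv(rows, out_file):
    """Write rows (lists of strings) to out_file as CSV (hypothetical helper)."""
    writer = csv.writer(out_file)
    writer.writerow(["name", "city"])  # hypothetical column names
    for row in rows:
        # Strip stray whitespace that scraped text often carries.
        writer.writerow([cell.strip() for cell in row])


# Made-up sample data: one value contains a comma, another contains quotes.
sample_rows = [["  Unit A, Paris  ", "Paris"], ['He said "hi"', "Lyon"]]

buffer = io.StringIO(newline="")
rows_to_csv(sample_rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass open("output.csv", "w", newline="", encoding="utf-8") in place of the in-memory buffer.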