URL error in Python Beautiful Soup (Python 2.7, Ubuntu 12.04)

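I am trying to use BeautifulSoup to scrape the PID and price data from Craigslist. I have already written separate code that produces the file CLallsites.txt. In this code I am trying to read each site from that text file and fetch the PIDs of all entries on the first 10 pages. My code is: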

  from bs4 import BeautifulSoup       
  from urllib2 import urlopen 
  readfile = open("CLallsites.txt")
  product = "mcy"
  while 1:
    u = ""
    count = 0
    line = readfile.readline()
    commaposition = line.find(',')
    site = line[0:commaposition]
    location = line[commaposition+1:]
    site_filename = location + '.txt'
    f = open(site_filename, "a")
    while (count < 10):
       sitenow = site + "\\" + product + "\\" + str(u)
       html = urlopen(str(sitenow))                      
       soup = BeautifulSoup(html)                
       postings = soup('p',{"class":"row"})
       for post in postings:
            y = post['data-pid']
            print y
       count = count +1
       index = count*100
       u = "index" + str(index) + ".html"
    if not line:
        break
    pass

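My CLallsites.txt looks like this:

craigslist site, location (Stack Overflow does not allow posts containing craigslist links, so I cannot show the actual text; I can try to attach the text file if that helps.)

When I run the code, I get the following error: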

  Traceback (most recent call last):
    File "reading.py", line 16, in <module>
      html = urlopen(str(sitenow))
    File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 400, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 418, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
      return self.do_open(httplib.HTTPConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
      raise URLError(err)
  urllib2.URLError:

Any idea what I am doing wrong?

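I don't know what sitenow contains, but it looks like an invalid URL. Note that URLs use forward slashes, not backslashes, so the statement should be something like:

  sitenow = site + "/" + product + "/" + str(u)

Could you put print sitenow just before the urlopen call and see what gets printed?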
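
To illustrate the suggestion above, here is a minimal sketch of building the page URLs with forward slashes and printing them before they are opened. It assumes each line of CLallsites.txt has the form "site,location" as described in the question. The helper names parse_site_line and build_page_urls, and the example.org address, are hypothetical and not part of the original code. The paging scheme (an empty suffix for the first page, then index100.html, index200.html and so on) simply mirrors the loop in the question.

```python
def parse_site_line(line):
    """Split a 'site,location' line into (site, location).

    strip() also drops the trailing newline that readline() keeps,
    which would otherwise end up in the location and the file name.
    """
    site, _, location = line.strip().partition(",")
    return site.strip(), location.strip()


def build_page_urls(site, product, pages=10, page_size=100):
    """Return the listing URLs for the first `pages` pages of a category."""
    # Join with forward slashes; "\\" in Python is a backslash, not a URL separator.
    base = site.rstrip("/") + "/" + product + "/"
    urls = [base]  # the first page has no index suffix
    for page in range(1, pages):
        urls.append(base + "index" + str(page * page_size) + ".html")
    return urls


# Hypothetical input line; example.org stands in for a real site.
site, location = parse_site_line("http://example.org,Example City\n")
for url in build_page_urls(site, product="mcy", pages=3):
    print(url)  # inspect each URL before passing it to urlopen
```

Printing each URL first makes it easy to spot a stray backslash, a missing scheme, or a leftover newline before urlopen raises URLError.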