Regular expression in Python raises an exception

python regex

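I'm writing a bot that tracks out-of-stock items and price changes on Walmart, but I'm stuck: when I try to get an item's id (the number at the end of the link), I can't parse it. Here is the code: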

# -*- coding: utf-8 -*-

import re
import urllib2

def walmart():
    fileprod = urllib2.urlopen("http://testh3x.altervista.org/walmart.txt").read()
    prods = fileprod.split("|")
    print prods
    lenp = len(prods)
    counter = 0
    while 1:
        while counter < lenp:
            data = urllib2.urlopen(prods[counter]).read()
            path = re.compile("class=\"Outofstock\"") # \s whitespace - \w word char - \W anything but a word char
            matching = path.match(data)
            if matching == None: 
                pass
            else:
                print "Out of stock"
            name = re.compile("\d") 
            m = name.match(str(prods[counter])).group # prods[counter] is the link
            print m


def main():
    walmart()

if __name__ == "__main__":
    main()

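You should look into a proper HTML parser, which makes parsing HTML manageable and fairly easy. Regular expressions usually don't do that job well.

To answer your question, though: the error occurs because no match was found, so match() returned None and the attribute lookup on it failed. In general, it's better to run a regex like this: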

m = name.match(str(prods[counter]))  # if no match is found, then None is returned
if m:
    m = m.group()  # be sure to call the method here
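
The None check above can be tried out in isolation. The following is a minimal sketch written for Python 3 (the question's code is Python 2); the helper name first_digit and the example.com URL are made up for illustration and do not come from the question.

```python
import re

digit = re.compile(r"\d")  # same one-digit pattern as in the question


def first_digit(text):
    """Return the digit at the very start of text, or None if there is none."""
    m = digit.match(text)  # match() returns None when nothing matches
    if m is None:
        return None
    return m.group()  # group() has to be called to get the matched text


# Hypothetical inputs: a URL (starts with "h") and a bare number.
result_url = first_digit("http://www.example.com/ip/7812821")
result_num = first_digit("7812821")
```

With these inputs the URL yields None, because match() only looks at the start of the string, while the bare number yields its first digit.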

Your regular expression doesn't match. You are using re.match() rather than re.search(); the former only matches at the start of the string:

m = name.search(str(prods[counter])).group()
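
To make the difference concrete, here is a small sketch in Python 3. Note that with the question's pattern \d, search() would return only the first single digit it finds; the pattern \d+$ used here (one or more digits at the end of the string) is an assumption about what is actually wanted, and the URL is a made-up example.

```python
import re

# Hypothetical product link; only the trailing number matters here.
url = "http://www.example.com/ip/some-product/7812821"

# match() anchors at the start of the string, so a URL never matches digits.
at_start = re.match(r"\d+", url)

# search() scans the whole string; "$" pins the digits to the end.
at_end = re.search(r"\d+$", url)

# Guard against None before calling group().
product_id = at_end.group() if at_end else None
```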

You also don't need to recompile the regular expressions inside the loop; move them out of the loop and compile them only once.
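
As a rough illustration of compiling once, the sketch below (Python 3) defines both patterns at module level and reuses them in the loop. The names OUT_OF_STOCK, TRAILING_ID and scan, the sample URLs and the sample HTML are all invented for this example; it also uses search() for the stock check, since a page would not start with the class attribute.

```python
import re

# Compiled once, outside any loop. Single outer quotes avoid escaping.
OUT_OF_STOCK = re.compile(r'class="Outofstock"')
TRAILING_ID = re.compile(r"\d+$")


def scan(pages):
    """Take (url, html) pairs and return (product_id, out_of_stock) pairs."""
    results = []
    for url, html in pages:
        id_match = TRAILING_ID.search(url)
        product_id = id_match.group() if id_match else None
        out_of_stock = OUT_OF_STOCK.search(html) is not None
        results.append((product_id, out_of_stock))
    return results


# Hypothetical pages standing in for the downloaded product HTML.
sample_pages = [
    ("http://www.example.com/ip/a/111", '<div class="Outofstock">Sold out</div>'),
    ("http://www.example.com/ip/b/222", '<span class="camelPrice">$32.98</span>'),
]
report = scan(sample_pages)
```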

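You really shouldn't use regular expressions to parse HTML when better tools are available. Use an HTML parser instead; the example below uses BeautifulSoup.

You should also loop directly over prods; there is no need for a while loop with a counter there: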
import urllib2
from bs4 import BeautifulSoup

fileprod = urllib2.urlopen("http://testh3x.altervista.org/walmart.txt").read()
prods = fileprod.split("|")

for prod in prods:
    # split off last part of the URL for the product code
    product_code = prod.rsplit('/', 1)[-1]

    data = urllib2.urlopen(prod).read()
    soup = BeautifulSoup(data)
    if soup.find(class_='Outofstock'):
        print product_code, 'out of stock!'
        continue

    price = soup.find('span', class_='camelPrice').text
    print product_code, price

For the starting URL, this prints:
7812821 $32.98

You don't need to compile the regex on every iteration; you can do it once, before the while loop. Also, you can write "class=\"Outofstock\"" with single outer quotes so that you don't have to escape the double quotes. And as the answers note, parsing HTML with regular expressions is not a good idea.

I'm parsing links like this and I want to get the number at the end...

@user3423076: Yes, I understood what you were trying to do with that line. Splitting the text is much easier.
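
The splitting approach can be sketched without any regex at all. This is a Python 3 sketch; the helper name product_code and the example.com links are hypothetical, and stripping a trailing slash is an added precaution that the original answer does not include.

```python
def product_code(url):
    """Return the last path segment of url, ignoring one or more trailing slashes."""
    return url.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical links, with and without a trailing slash.
code = product_code("http://www.example.com/ip/some-product/7812821")
code_slash = product_code("http://www.example.com/ip/some-product/7812821/")
```

Unlike the regex version, this returns whatever the last segment is, digits or not, so it may be worth validating the result with str.isdigit() if only numeric ids are expected.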