Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/331.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
python BeautifulSoup从垃圾页获取变量_Python_Beautifulsoup - Fatal编程技术网

python BeautifulSoup从垃圾页获取变量

python BeautifulSoup从垃圾页获取变量,python,beautifulsoup,Python,Beautifulsoup,各位, 试图从格式不好的页面中获取一些变量 html = response.read() soup = BeautifulSoup(html) links = soup.findAll('a') for link in links: for x in link.attrs: print x 输出: (u'href', u"javascript:Set_Variables('FIRSTNAME,LASTNAME', \r\n\t\t\t\t\t\t\t\t\t\t\t\

各位, 试图从格式不好的页面中获取一些变量

html =  response.read()
soup = BeautifulSoup(html)
links = soup.findAll('a')

for link in links:
    for x in link.attrs:
       print x
输出:

(u'href', u"javascript:Set_Variables('FIRSTNAME,LASTNAME', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'123456789123', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'FOOOOOOO',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'54',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'2014',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BAZZZZ',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BARRRRRRRRRR',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'07/31/2015',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'')")
(u'onmouseover', u"javascript: return window.status=''")
(u'href', u"javascript:Set_Variables('FIRSTNAME,LASTNAME', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'123456789123', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'FOOOOOOO',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'54',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'2014',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BAZZZZ',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BARRRRRRRRRR',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'07/31/2015',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'')")
(u'onmouseover', u"javascript: return window.status=''")
问题: 我怎样才能从这些乱七八糟的东西中得到
FIRSTNAME,LASTNAME
foooooo
barrrr
BAZZZZZ
123456789123


谢谢

首先,您只需要关注这里的
href
属性

将所有内容放在括号内,按空格分隔,并删除逗号和引号:

args = link['href'].partition('(')[-1].rpartition(')')[0]
args = [v.rstrip(',').strip("'") for v in args.split()]
演示:

>>> href = u"javascript:Set_Variables('FIRSTNAME,LASTNAME', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'123456789123', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'FOOOOOOO',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'54',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'2014',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BAZZZZ',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BARRRRRRRRRR',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'07/31/2015',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'')"
>>> href.partition('(')[-1].rpartition(')')[0]
u"'FIRSTNAME,LASTNAME', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'123456789123', \r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'FOOOOOOO',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'54',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'2014',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BAZZZZ',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'BARRRRRRRRRR',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t'07/31/2015',\r\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t''"
>>> [v.rstrip(',').strip("'") for v in href.partition('(')[-1].rpartition(')')[0].split()]
[u'FIRSTNAME,LASTNAME', u'123456789123', u'FOOOOOOO', u'54', u'2014', u'BAZZZZ', u'BARRRRRRRRRR', u'07/31/2015', u'']