Python XML string reassembly


I have the following code to extract lottery numbers from the web:

from BeautifulSoup import BeautifulSoup

from selenium import webdriver

lottonumbers=[]

url="https://www.lotto.de/de/ergebnisse/lotto-6aus49/archiv.html"
driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source)

for ul in soup.findAll("div", {"class": "winning_numbers boxRow clearfix"}):
    n = ','.join(''.join(_ for _ in li if _.isdigit()) for li in ul.text.split())
    if n:
        print format(n)

It returns:
625262728475

It should be:
6,25,26,27,28,47,5

The commas are missing. Ideally, each number would be appended to the list lottonumbers.

Can anyone help?

You could initialize an empty lottonumbers list and append each n to it inside a second, nested for loop, as follows:

# ... previous code ...
lottonumbers = []
for ul in soup.findAll("div", {"class": "winning_numbers boxRow clearfix"}):
    for li in ul.text.split():
        n = ''.join(_ for _ in li if _.isdigit())
        if n:
            lottonumbers.append(int(n))
print lottonumbers
[6, 25, 26, 27, 28, 47, 5]
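
The loop above only separates the numbers if the text it splits contains whitespace between them. The following is a minimal sketch of that behaviour, not the page's actual data: the two sample strings are hypothetical stand-ins for what ul.text might contain, and the function name digits_per_token is made up for illustration.

```python
def digits_per_token(text):
    """Split text on whitespace and turn the digits of each token into an int."""
    numbers = []
    for token in text.split():
        digits = ''.join(ch for ch in token if ch.isdigit())
        if digits:
            numbers.append(int(digits))
    return numbers


# Hypothetical inputs (assumptions, not taken from the real page):
spaced = "6 25 26 27 28 47 5"   # numbers separated by whitespace
unspaced = "625262728475"       # same digits, no separators at all

print(digits_per_token(spaced))    # one integer per token
print(digits_per_token(unspaced))  # a single token, so a single integer
```

With the whitespace-separated string the function yields seven integers. With the unseparated string, split() returns one token, so the result is a one-element list holding one large number.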

It would help if you made this code easy to reproduce. For example, other people reading it will not have C://Users//Royskatt//Downloads//phantomjs-2.0.0-windows//bin//phantomjs.exe.

I get [625262728475] (using webdriver.Firefox()). Running on Linux now, by the way.

@royskatt, then the code needs a small change. I'm on the road and will update it this evening.

@royskatt, that's strange. I'm using the Firefox webdriver and the solution gives the correct result. Can you print li.text before n and see what it shows?

That gives me AttributeError: 'unicode' object has no attribute 'text'. By the way, here is the complete code that returns [625262728475]: Python 2.7.6, Ubuntu 14.04

@royskatt Ah, sorry, I meant print li.

In the meantime, I got it working with a hack (a regular expression):

from BeautifulSoup import BeautifulSoup
from selenium import webdriver
import re

url="https://www.lotto.de/de/ergebnisse/lotto-6aus49/archiv.html"
driver = webdriver.PhantomJS(executable_path="C://Users//Royskatt//Downloads//phantomjs-2.0.0-windows//bin//phantomjs.exe")
#driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source)

lottonumbers = []

for ul in soup.findAll("div", {"class": "winning_numbers boxRow clearfix"}):
    for i in re.findall(r'(?<=zahl\([1-6]\)">)\d{1,2}|(?<="last">)\d', str(ul)):
        lottonumbers.append(i)

print lottonumbers
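
The regex approach can be tried without a browser or a network connection. The sketch below reuses the pattern from the code above on a hard-coded HTML string. The markup in SAMPLE_HTML is hypothetical: the thread never shows the page's real HTML, so the snippet is only shaped to fit the pattern's lookbehinds. The function name extract_numbers is also made up for illustration.

```python
import re

# Hypothetical markup (an assumption): built only so that each number is
# preceded by zahl(N)"> or "last">, which is what the lookbehinds expect.
SAMPLE_HTML = (
    '<div class="winning_numbers boxRow clearfix">'
    '<span class="zahl(1)">6</span><span class="zahl(2)">25</span>'
    '<span class="zahl(3)">26</span><span class="zahl(4)">27</span>'
    '<span class="zahl(5)">28</span><span class="zahl(6)">47</span>'
    '<span class="last">5</span>'
    '</div>'
)

# Same pattern as in the answer: one or two digits right after zahl(1..6)">,
# or a single digit right after "last">.
PATTERN = re.compile(r'(?<=zahl\([1-6]\)">)\d{1,2}|(?<="last">)\d')


def extract_numbers(html):
    """Return every number the pattern finds in html, converted to int."""
    return [int(match) for match in PATTERN.findall(html)]


lottonumbers = extract_numbers(SAMPLE_HTML)
print(lottonumbers)
```

Unlike the original, which appends the matched strings, this version converts each match to int. Because the pattern is tied to specific attribute text, it would silently return an empty list if the page's markup differed from what is assumed here.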