Converting a string from HTML into a list
I am trying to convert a string into a list and want to split it at the line breaks. The string in the HTML div looks like this:
[<div class="address-lg w-brk-ln-1 ">\r\n \r\n 1010\r\n \r\n \r\n Wien, 01. Bezirk, Innere Stadt\r\n </div>]
[<div class="address-lg w-brk-ln-1 ">\r\n \r\n 1010\r\n \r\n \r\n Wien, 01. Bezirk, Innere Stadt\r\n </div>]
[<div class="address-lg w-brk-ln-1 ">\r\n \r\n Franz-Josefs-Kai 31,\r\n \r\n 1010\r\n \r\n \r\n Wien, 01. Bezirk, Innere Stadt\r\n </div>]
[<div class="address-lg w-brk-ln-1 ">\r\n \r\n 1010\r\n \r\n \r\n Wien, 01. Bezirk, Innere Stadt\r\n </div>]
...
So far I have tried to solve it with this:
address = result.select('div.bottom-content div.address-lg.w-brk-ln-1')[0].get_text().strip().replace("\r\n","").split()
address2 = list(reversed(address))
But what I always get is:
[u'Stadt', u'Innere', u'Bezirk,', u'01.', u'Wien,', u'1010']
[u'Stadt', u'Innere', u'Bezirk,', u'01.', u'Wien,', u'1010']
[u'Stadt', u'Innere', u'Bezirk,', u'01.', u'Wien,', u'1010', u'Sch\xf6nlaterngasse,']
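Since it is unicode, I think I need to .encode() it, but I also need to split it in the right places.
It looks like you are splitting on whitespace, but you should split on commas: split(","). If you do that, you may need to trim the results, because they may contain a lot of trailing whitespace.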
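A minimal sketch of the comma-split-and-trim idea, assuming the language is Python (the question does not say so explicitly). The helper name `split_on_commas` and the sample string `raw` are hypothetical; `raw` only imitates what `get_text()` might return for one of the divs above.

```python
def split_on_commas(text):
    """Split on commas, then trim each piece and collapse inner whitespace."""
    pieces = []
    for piece in text.split(","):
        # str.split() with no argument splits on any whitespace run,
        # including \r\n, so re-joining with one space cleans the piece.
        cleaned = " ".join(piece.split())
        if cleaned:  # skip pieces that were only whitespace
            pieces.append(cleaned)
    return pieces


# Assumed sample text, modelled on the third div in the question.
raw = "\r\n \r\n Franz-Josefs-Kai 31,\r\n \r\n 1010\r\n \r\n \r\n Wien, 01. Bezirk, Innere Stadt\r\n "

parts = split_on_commas(raw)
print(parts)
```

Note that the postal code and city end up in the same piece, because there is no comma between them in the sample text.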
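But since you did not specify a programming language, this is only a guess.
You did not specify which programming language you are using. It might be a good idea to add that information to the tags.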
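The calls in the question (`select`, `get_text`) look like Python with BeautifulSoup. Assuming that is the case, splitting at the line breaks, as originally asked, could be sketched as follows. The function name `split_address_lines` and the sample string are hypothetical, and the sample only imitates the text `get_text()` might return. The `u'...'` prefix in the printed list is most likely just how Python 2 displays unicode strings inside a list, so an `.encode()` call may not be needed for the splitting itself.

```python
def split_address_lines(text):
    """Split text at line breaks and keep the trimmed, non-empty lines."""
    result = []
    for line in text.splitlines():  # handles \r\n as well as \n
        # Trim surrounding whitespace, then a trailing comma if present.
        cleaned = line.strip().rstrip(",")
        if cleaned:
            result.append(cleaned)
    return result


# Assumed sample text, modelled on the third div in the question.
raw = "\r\n \r\n Franz-Josefs-Kai 31,\r\n \r\n 1010\r\n \r\n \r\n Wien, 01. Bezirk, Innere Stadt\r\n "

address = split_address_lines(raw)
print(address)
```

The key difference from the attempt in the question is that the line breaks are used as separators instead of being removed with `replace("\r\n", "")` before splitting.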