Python: unable to remove escape characters from printed output
Hi, I'm trying to extract information into a list containing plain text, but I can't find a way to remove the escape characters. I'm very new to Python and to programming in general. I've been trying to solve this, but I can't find a solution.

Here is my code:
import urllib
import re
from bs4 import BeautifulSoup

x = 1
while x < 2:
    url = "http://search.insing.com/ts/food-drink/bars-pubs/bars-pubs?page=" + str(x)
    htmlfile = urllib.urlopen(url).read()
    soup = BeautifulSoup(htmlfile.decode('utf-8', 'ignore'))
    reshtml = soup.find("div", "results").find_all("h3")
    reslist = []
    for item in reshtml:
        res = item.get_text()
        reslist.append(res)
    print reslist
    x += 1

The current output looks like this:
[u'\n\r\n Parco Caffe\n',
u'\n\r\n AdstraGold Microbrewery & Bistro Bar\n',
u'\n\r\n Alkaff Mansion Ristorante\n',
u'\n\r\n The Fat Cat Bistro\n',
u'\n\r\n Gravity Bar\n',
u'\n\r\n The Wine Company\r\n (Evans Road)\r\n \n',
u'\n\r\n Serenity Spanish Bar & Restaurant\r\n (VivoCity)\r\n \n',
u'\n\r\n The New Harbour Cafe & Bar\n',
u'\n\r\n Indian Times\n',
u'\n\r\n Sunset Bay Beach Bar\n',
u'\n\r\n Friends @ Jelita\n',
u'\n\r\n Talk Cock Sing Song @ Thomson\n',
u'\n\r\n En Japanese Dining Bar\r\n (UE Square)\r\n \n',
u'\n\r\n Magma German Wine Bistro\n',
u"\n\r\n Tam Kah Shark's Fin\n",
u'\n\r\n Senso Ristorante & Bar\n',
u'\n\r\n Hard Rock Cafe\r\n (HPL House)\r\n \n',
u'\n\r\n St. James Power Station \n',
u'\n\r\n The St. James\n',
u'\n\r\n Brotzeit German Bier Bar & Restaurant\r\n (Vivocity)\r\n \n']

Add the following lines before the print statement:
reslist = [y.replace('\n','').replace('\r','') for y in reslist]
reslist = [y.strip() for y in reslist]
That gives me this output:
[u'Alkaff Mansion Ristorante',
u'Parco Caffe',
u'AdstraGold Microbrewery & Bistro Bar',
u'Gravity Bar',
u'The Fat Cat Bistro',
u'The Wine Company (Evans Road)',
u'Serenity Spanish Bar & Restaurant (VivoCity)',
u'The New Harbour Cafe & Bar',
u'Indian Times',
u'Sunset Bay Beach Bar',
u'Friends @ Jelita',
u'Talk Cock Sing Song @ Thomson',
u'En Japanese Dining Bar (UE Square)',
u'Magma German Wine Bistro',
u"Tam Kah Shark's Fin",
u'Senso Ristorante & Bar',
u'Hard Rock Cafe (HPL House)',
u'St. James Power Station',
u'The St. James',
u'Brotzeit German Bier Bar & Restaurant (Vivocity)']
Is that what you were looking for?
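
To try the two cleanup lines in isolation, here is a minimal sketch. It uses two sample strings copied from the output in the question, and it is written so that it should run on Python 3 as well as Python 2; the names `raw` and `cleaned` are illustrative only. It also shows why the escapes appear at all: printing a list displays the repr of each string, while printing a string on its own displays the text.

```python
# Sample values copied from the output shown in the question.
raw = [
    u'\n\r\n Parco Caffe\n',
    u'\n\r\n The Wine Company\r\n (Evans Road)\r\n \n',
]

# Drop every newline and carriage return, then trim the remaining
# leading and trailing spaces.
cleaned = [s.replace('\n', '').replace('\r', '').strip() for s in raw]

# Printing the list shows each element's repr (quotes and escapes visible).
print(cleaned)

# Printing each string individually shows the plain text.
for name in cleaned:
    print(name)
```

Note that `replace` only removes the line-break characters themselves, so any run of spaces inside a name would be left untouched by this approach.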

Py's answer is much better and more specific to BeautifulSoup. It looks like you are really after the anchor text, so consider changing this line accordingly:
reshtml = soup.find("div", "results").find_all("h3")
Also change:
reslist.append(res)
to:
reslist.append(' '.join(res.split()))
Here is what I got after making the change:
[u'Parco Caffe', u'AdstraGold Microbrewery & Bistro Bar',
u'Alkaff Mansion Ristorante', u'The Fat Cat Bistro', u'Gravity Bar',
u'The Wine Company (Evans Road)', u'Serenity Spanish Bar & Restaurant (VivoCity)',
u'The New Harbour Cafe & Bar', u'Indian Times', u'Sunset Bay Beach Bar',
u'Friends @ Jelita', u'Talk Cock Sing Song @ Thomson',
u'En Japanese Dining Bar (UE Square)', u'Magma German Wine Bistro',
u"Tam Kah Shark's Fin", u'Senso Ristorante & Bar',
u'Hard Rock Cafe (HPL House)', u'St. James Power Station',
u'The St. James', u'Brotzeit German Bier Bar & Restaurant (Vivocity)']

Comments:
What is your expected output compared with the actual output? Also, I don't understand the x=1; while ... part.
This works well. Thanks.
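
The split-and-join idiom can be wrapped in a small helper, sketched below. The function name `normalize_whitespace` is hypothetical, and the sample strings are modelled on the question's output with extra indentation added as an assumption about what the raw page text might look like. The sketch is written to run on Python 3 as well as Python 2.

```python
def normalize_whitespace(text):
    """Collapse each run of whitespace into one space and trim both ends."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, \r, \n) and drops empty pieces, so joining the
    # pieces with a single space normalizes the whole string.
    return ' '.join(text.split())


# Illustrative input; the extra spaces are an assumption, not real page data.
raw = [
    u'\n\r\n   Hard Rock Cafe\r\n        (HPL House)\r\n    \n',
    u'\n\r\n   St. James Power Station \n',
]

names = [normalize_whitespace(s) for s in raw]
print(names)
```

Unlike removing only '\n' and '\r', this also collapses repeated spaces inside a name, which is why a single call is enough.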