Python 3 data type incompatibility issue (Python, BeautifulSoup, Python 3.3)


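Hello Stack community,

I'm having an issue that I can't seem to resolve, because most of the help out there looks like it is for Python 2.7.

I want to pull a table out of a web page and then get just the link text, not the whole anchor.

Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re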

url = 'http://www.craftcount.com/category.php?cat=5'

html = urlopen(url).read()
soup = BeautifulSoup(html)
alltables = soup.findAll("table")

## This bit captures the input from the previous sequence
results=[]
for link in alltables:
    rows = link.findAll('a')
## Find just the names
    top100 = re.findall(r">(.*?)<\/a>",rows)
print(top100)


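When I run it I get: "TypeError: expected string or buffer"

Up to the second-to-last line it does everything it should (checked by swapping "print(top100)" for "print(rows)").

As an example of the output I get: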

<a href="http://www.etsy.com/shop.php?user_id=5323531"target="_blank">michellechangjewelry</a>

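I only need to get: michellechangjewelry

According to pythex.org, my (ir)regular expression should work, so I wanted to see if anyone knows how to do this. A further complication is that most people seem to want to go the other way, that is, they have the full text and want only the URL part.

Finally, I'm using BeautifulSoup out of "convenience", but I'm not beholden to it if you can suggest a better package for narrowing the parsing down to the link text.

Many thanks.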

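BeautifulSoup results are not strings; they are mostly Tag objects, which is why re.findall raises the TypeError.

Instead of a regular expression, read the text of an <a> tag from its .string attribute. Calling table.find('a') gives only the first link in a table. To get the text of every link, do this: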
for table in alltables:
    links = table.find_all('a')
    top100 = [link.string for link in links]
    print(top100)
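
To make the data-type point concrete, here is a small self-contained sketch. It assumes bs4 is installed and uses a hypothetical inline HTML sample (SAMPLE_HTML, including the made-up second shop "exampleshop") in place of the live page, so it runs without network access. It reproduces the TypeError from the question and then reads the link text from the Tag objects directly.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page in the question, so no network is needed.
# The second row ("exampleshop") is invented purely for illustration.
SAMPLE_HTML = """
<table>
  <tr><td><a href="http://www.etsy.com/shop.php?user_id=5323531" target="_blank">michellechangjewelry</a></td></tr>
  <tr><td><a href="http://www.etsy.com/shop.php?user_id=1" target="_blank">exampleshop</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")
links = table.find_all("a")  # a list-like ResultSet of Tag objects, not a str

# Passing that ResultSet to re.findall reproduces the error from the question.
try:
    re.findall(r">(.*?)</a>", links)
    regex_error = None
except TypeError as exc:
    regex_error = exc

first_name = table.find("a").string          # text of the first link only
all_names = [link.string for link in links]  # text of every link

# get_text() also copes with links that contain nested tags,
# a case in which .string returns None.
all_names_text = [link.get_text(strip=True) for link in links]

print(type(links).__name__, type(links[0]).__name__)
print(repr(regex_error))
print(first_name)
print(all_names)
```

If the links on the real page wrap their text in further tags, get_text() is probably the safer choice of the two, since .string only works when a tag has a single text child.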

Thanks, this is exactly what I needed. Your explanation of the different data types was also illustrative. Cheers.
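
On the question's side note about alternatives to BeautifulSoup: the following is a rough sketch, not part of the answer above, of collecting link text with only the standard library's html.parser module. The class name LinkTextParser and the helper extract_link_texts are hypothetical names chosen for this illustration, and the sample string is a one-row stand-in for the real page.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collects the text found inside every <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.link_texts = []
        self._depth = 0    # greater than 0 while inside an <a> element
        self._chunks = []  # text pieces of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.link_texts.append("".join(self._chunks).strip())
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_link_texts(html):
    """Return the text of each link in an HTML string (hypothetical helper)."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.link_texts


sample = (
    '<table><tr><td>'
    '<a href="http://www.etsy.com/shop.php?user_id=5323531" target="_blank">'
    'michellechangjewelry</a>'
    '</td></tr></table>'
)
names = extract_link_texts(sample)
print(names)
```

This avoids a third-party dependency, at the cost of more code and less tolerance for badly broken markup than BeautifulSoup offers.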