Python: how to get a specific HTML tag

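I am trying to get an HTML tag when an item has no text.
For example, I am iterating over all the "a" tags (URLs).
Some of these links contain text and some do not.
In that case, I want to get the URLs that have no text.
So I did this: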

import requests
from bs4 import BeautifulSoup

response = requests.get('https://fw.tmall.com/tmall/ser/tmall_detail.htm?spm=a1z1g.2177293.0.0.qF9gPO&service_code=ts-4078').text
soup = BeautifulSoup(response, 'html.parser')
# All <a> tags inside the "success-case" container
main_wrapper = soup.find('div', attrs={'id': 'success-case'}).find_all('a')
for items in main_wrapper:
    if items.string is None:
        # The link has no text of its own: print its target URL
        print(items['href'])
    else:
        print(items.string)
When items.string is None, how can I get only the URL that belongs to that item instead of all the URLs?
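For reference, a minimal sketch of why items.string comes back as None for image-only links. It parses a small hypothetical snippet (the example.com URLs are placeholders, loosely modeled on the page's markup) with the standard-library "html.parser" backend. In BeautifulSoup, .string is only set when a tag has a single text child, or a single child tag that itself has a .string; an <a> that wraps only an <img>, or has whitespace around the <img>, gives None. get_text(strip=True) returns an empty string in both cases, which may be a more forgiving check.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: placeholder URLs, not the real Tmall page
html = """
<div id="success-case">
  <a href="//example.com/shop/1" class="rel-ink"><img src="//example.com/1.png" class="rel-img"></a>
  <a href="//example.com/shop/1" class="rel-name">Shop One</a>
  <a href="//example.com/shop/2" class="rel-ink">
    <img src="//example.com/2.png" class="rel-img">
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
anchors = soup.find("div", id="success-case").find_all("a")

# .string is None when the tag has no text child of its own (only an <img>)
# or more than one child (whitespace + <img> + whitespace);
# get_text(strip=True) yields "" in both of those cases.
results = [(a.string, a.get_text(strip=True)) for a in anchors]
for string, text in results:
    print(repr(string), repr(text))
```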

"I am trying to get the URLs that have no text"

You can use a list comprehension:

hrefs = [a['href'] for a in main_wrapper if a.string is None]
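A runnable version of that list comprehension against a small hypothetical snippet (placeholder example.com URLs standing in for the downloaded page) might look like this; only the hrefs of the image-only anchors are kept:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page
html = """
<div id="success-case">
  <a href="//example.com/shop/1"><img src="//example.com/1.png"></a>
  <a href="//example.com/shop/1">Shop One</a>
  <a href="//example.com/shop/2"><img src="//example.com/2.png"></a>
  <a href="//example.com/shop/2">Shop Two</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main_wrapper = soup.find("div", id="success-case").find_all("a")

# Keep only the hrefs of anchors that have no text of their own
hrefs = [a["href"] for a in main_wrapper if a.string is None]
print(hrefs)
```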
"only get the URL specific to that item, not all the URLs"

It's not clear what this means. Each <a> tag has exactly one URL. You are iterating over a list of <a> tags, so you get a list of URLs.

"I want to get a specific HTML attribute; in this case it would be the URL of the IMG inside that element"

Then you need to call find again inside the loop to extract that attribute.
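A sketch of that second lookup, assuming (as the asker says in the comments) that each text-less anchor wraps an <img> whose src is the URL wanted. The markup is hypothetical; img.get("src") returns None instead of raising KeyError when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: placeholder URLs only
html = """
<div id="success-case">
  <a href="//example.com/shop/1"><img src="//example.com/1.png"></a>
  <a href="//example.com/shop/1">Shop One</a>
  <a href="//example.com/shop/2"><img src="//example.com/2.png"></a>
  <a href="//example.com/shop/2">Shop Two</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main_wrapper = soup.find("div", id="success-case").find_all("a")

img_urls = []
for a in main_wrapper:
    if a.string is None:
        # Second find, scoped to the current <a>: its nested <img>, if any
        img = a.find("img")
        src = img.get("src") if img is not None else None
        if src:
            img_urls.append(src)
print(img_urls)
```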

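I assume you are trying to get just one anchor per entry from the unordered list inside the div. The two anchors in each entry have different classes, rel-ink vs rel-name: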

 <a href="//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358" target="_blank" class="rel-ink"><img alt="NIHAOMARKET官方海外旗舰店" src="//img.alicdn.com/top/i1/TB1urimJFXXXXabaXXXwu0bFXXX.png" class="rel-img"></a>
 <a href="//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358" target="_blank" class="rel-name">NIHAOMARKET官方海外旗舰店</a>
Or use a CSS selector:

Both will give you:

['//store.taobao.com/shop/view_shop.htm?user_number_id=692020965', '//store.taobao.com/shop/view_shop.htm?user_number_id=2087799889', '//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358', '//store.taobao.com/shop/view_shop.htm?user_number_id=377676745', '//store.taobao.com/shop/view_shop.htm?user_number_id=2367059695', '//store.taobao.com/shop/view_shop.htm?user_number_id=449764134', '//store.taobao.com/shop/view_shop.htm?user_number_id=698389964', '//store.taobao.com/shop/view_shop.htm?user_number_id=509711360', '//store.taobao.com/shop/view_shop.htm?user_number_id=692020965', '//store.taobao.com/shop/view_shop.htm?user_number_id=1125022434', '//store.taobao.com/shop/view_shop.htm?user_number_id=1071997040', '//store.taobao.com/shop/view_shop.htm?user_number_id=795947607', '//store.taobao.com/shop/view_shop.htm?user_number_id=509711360', '//store.taobao.com/shop/view_shop.htm?user_number_id=692020965', '//store.taobao.com/shop/view_shop.htm?user_number_id=1071997040', '//store.taobao.com/shop/view_shop.htm?user_number_id=509711360', '//store.taobao.com/shop/view_shop.htm?user_number_id=377676745', '//store.taobao.com/shop/view_shop.htm?user_number_id=2367059695', '//store.taobao.com/shop/view_shop.htm?user_number_id=2469022358']
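The two variants mentioned above (filtering on the rel-ink class, and a CSS selector) could be sketched as follows. This is a reconstruction under assumptions, not the answerer's original code: the markup is hypothetical, with placeholder example.com URLs, and it assumes the rel-ink anchors sit inside the div with id success-case. find_all(..., class_="rel-ink") and select("#success-case a.rel-ink") should return the same anchors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the rel-ink / rel-name pairs
html = """
<div id="success-case">
  <ul>
    <li>
      <a href="//example.com/shop/1" class="rel-ink"><img src="//example.com/1.png" class="rel-img"></a>
      <a href="//example.com/shop/1" class="rel-name">Shop One</a>
    </li>
    <li>
      <a href="//example.com/shop/2" class="rel-ink"><img src="//example.com/2.png" class="rel-img"></a>
      <a href="//example.com/shop/2" class="rel-name">Shop Two</a>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Variant 1: find_all with a class filter (class_ avoids the Python keyword)
wrapper = soup.find("div", id="success-case")
by_class = [a["href"] for a in wrapper.find_all("a", class_="rel-ink")]

# Variant 2: the equivalent CSS selector via select()
by_css = [a["href"] for a in soup.select("#success-case a.rel-ink")]

print(by_class)
print(by_css)
```

Filtering on the class picks exactly one anchor per entry, which is why the real output above still repeats a shop URL when the same shop appears in several entries.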

Can you clarify? What are you getting, and what do you want to get?
I want to get a specific HTML attribute. In this case, if the element has no text, it would be the URL of the IMG inside that element.
You're welcome. You can show your thanks with the check mark next to the post.