Python BeautifulSoup extraction


I have a few simple BeautifulSoup questions (1-3 go together, 4-6 go together). Suppose I have HTML structured like this:

<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>

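However, after that I have to visit every element in the result list before I can get at things like property and name. If possible, I would rather skip that step and get all instances of property and name directly.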

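Finally, suppose I use requests.get to fetch a URL from a website where you have to click a button at the bottom to make it load more, and I want that extra content as well. How can I do that?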

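I'm no expert with BeautifulSoup, but I gave it a try and here is what I came up with; hopefully it is enough to get you started. Note that there may be more elegant solutions.

Boilerplate: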

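from bs4 import BeautifulSoup

# Sample markup from the question
a = """<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>"""

# Naming the parser avoids bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(a, 'html.parser')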

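3. I'm not sure what you mean; perhaps it is already covered.

4.

alltall = soup.find_all('meta', attrs={'name': 'tall'})
alltall += soup.find_all('meta', attrs={'property': 'tall'})
>> [<meta name="tall"/>, <meta property="tall"/>]

I spent some time looking but did not find an elegant way to do this. Maybe I overlooked something.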
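One possibly tidier option is to hand find_all a function instead of making two calls and joining the lists. The following is only a sketch under the assumption that the markup is the sample from the question; the helper name is_tall_meta is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>"""

soup = BeautifulSoup(html, "html.parser")


def is_tall_meta(tag):
    """True for <meta> tags whose name or property attribute equals 'tall'."""
    return tag.name == "meta" and "tall" in (tag.get("name"), tag.get("property"))


# find_all accepts a function; it is called once per tag, in document order.
alltall = soup.find_all(is_tall_meta)
print(alltall)
```

Unlike the two-call version, the result comes back in document order, so the property="tall" tag is listed before the name="tall" tag.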


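Beautiful Soup is mostly a matter of playing around until the data comes out, but here is something to get started with:

Here test.html is what you posted. The try/except blocks are there so that if a lookup fails, nothing is printed instead of an error.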

from bs4 import BeautifulSoup

# Name the parser explicitly and close the file once parsing is done
with open(r'd:\test.html', 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')
# print(soup.prettify())

items = soup.find_all("meta")

try:
    print("#How can I find all of the instances of property?")
    for all_prop in items:
        # tag['property'] raises KeyError on the first tag without that attribute
        if all_prop['property']:
            print(all_prop)
except KeyError:
    print("")

try:
    print("#How can I then extract tall and wide?")
    for properties in items:
        print(properties['property'])
except KeyError:
    print("")


print("#all of the instances of tall")
print(soup.find_all('meta', attrs={'property': 'tall'}))
print(soup.find_all('meta', attrs={'name': 'tall'}))
print("")

print("#How can I then extract tall?")
for just_tall in items:
    # .get() returns None instead of raising when the attribute is absent
    if just_tall.get('property') == 'tall':
        print(just_tall.get('property'))
    if just_tall.get('name') == 'tall':
        print(just_tall.get('name'))

Output (the order of attributes inside a tag can differ between Python and bs4 versions):

#How can I find all of the instances of property?
<meta property="tall"/>
<meta content="spiral" property="wide"/>

#How can I then extract tall and wide?
tall
wide

#all of the instances of tall
[<meta property="tall"/>]
[<meta name="tall"/>]

#How can I then extract tall?
tall
tall
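For the part of the question about getting the property and name values directly, without a separate loop and error handling, a list comprehension over a filtered find_all may be enough. This is a minimal sketch that assumes the sample markup from the question; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>"""

soup = BeautifulSoup(html, "html.parser")

# property=True matches any <meta> that has a property attribute at all.
properties = [m["property"] for m in soup.find_all("meta", property=True)]

# "name" is also find_all's first parameter (the tag name), so the HTML
# attribute called name has to be passed through attrs instead of a keyword.
names = [m["name"] for m in soup.find_all("meta", attrs={"name": True})]

print(properties)  # ['tall', 'wide']
print(names)       # ['red', 'tall']
```

Because the filter already guarantees the attribute exists, indexing with m["property"] cannot raise KeyError here.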


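The rest is a matter of experimenting, but the above should help you get started. Some of the questions are still ambiguous, so I have given a few examples above to help.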
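The examples above do not cover the last question, about a button that loads more content. requests does not run JavaScript, so one common approach is to find the URL the button requests (for example in the browser's network tab) and call it page by page. The sketch below assumes such an endpoint exists and takes a page number; fetch_page, max_pages, and fake_site are hypothetical names, and the network is replaced by a dictionary so the code runs offline.

```python
from bs4 import BeautifulSoup


def collect_meta_properties(fetch_page, max_pages=50):
    """Gather property values page by page until a page has none.

    fetch_page is a hypothetical callable: page number -> HTML text.
    """
    found = []
    for page in range(1, max_pages + 1):
        soup = BeautifulSoup(fetch_page(page), "html.parser")
        batch = [m["property"] for m in soup.find_all("meta", property=True)]
        if not batch:  # an empty page is treated as "nothing more to load"
            break
        found.extend(batch)
    return found


# Stand-in for the network so the sketch runs offline. With requests, a fetcher
# might look like: lambda page: requests.get(URL, params={"page": page}).text
# where URL and the "page" parameter are assumptions about the target site.
fake_site = {
    1: '<meta property="tall"/><meta property="wide" content="spiral"/>',
    2: '<meta property="deep"/>',
}
result = collect_meta_properties(lambda page: fake_site.get(page, ""))
print(result)
```

If the extra content is only produced by client-side scripts and no such endpoint can be found, a browser automation tool is usually needed instead of requests.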

Tutorials and more examples:


This might be a good place to start... it is probably too many questions for a single thread.
