Python web scraping: text inside multiple tags

python python-3.x web-scraping

I am trying to return some values from a Yahoo Finance page. They are wrapped in tags. I was able to get it to return these values:

543.46
546.8
None
None
595.73
0.65

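My problem is with the None values I am getting. I should be getting "537.51 x 100" and "537.60 x 100" (the numbers do change, because the site is live). I just need that format as the output. Below is the specific HTML I see in the page source. The page contains more tags than this, but BeautifulSoup doesn't care about that.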

<tr>
<th scope="row" width="48%">
    Prev Close:</th>
<td class="yfnc_tabledata1">
    543.46</td>
</tr>

<tr>
<th scope="row" width="48%">
    Open:</th>
<td class="yfnc_tabledata1">
    546.80</td>
</tr>

<tr>
<th scope="row" width="48%">
    Bid:</th>
<td class="yfnc_tabledata1">
    <span id="yfs_b00_aapl">
        536.55</span>
    <small> x 
        <span id="yfs_b60_aapl">
            100</span>
    </small>
</td>
</tr>

<tr>
<th scope="row" width="48%">
    Ask:</th>
<td class="yfnc_tabledata1">
    <span id="yfs_a00_aapl">
        536.63</span>
    <small> x 
        <span id="yfs_a50_aapl">
            100</span>
    </small>
</td>
</tr>

<tr>
<th scope="row" width="48%">
    1y Target Est:</th>
<td class="yfnc_tabledata1">
    595.73</td>
</tr>

<tr>
<th scope="row" width="48%">
    Beta:</th>
<td class="yfnc_tabledata1">
    0.65</td>
</tr>

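I think I need another for loop inside the first for loop, or an if statement, to account for the extra tags. I'm not sure what the code would look like.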
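The question does not show the loop that printed None, so the following is only a guess at the cause: if the code printed data.string, BeautifulSoup returns None for any tag that has more than one child, which is the case for the Bid and Ask cells with their nested span and small tags. A minimal sketch on a trimmed copy of the HTML above (SAMPLE and the variable names are illustrative, not from the original post):

```python
from bs4 import BeautifulSoup

# Trimmed copy of two rows from the question's HTML.
SAMPLE = """
<table>
<tr><th scope="row" width="48%">Open:</th>
<td class="yfnc_tabledata1">546.80</td></tr>
<tr><th scope="row" width="48%">Bid:</th>
<td class="yfnc_tabledata1"><span id="yfs_b00_aapl">536.55</span><small> x <span id="yfs_b60_aapl">100</span></small></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
open_cell, bid_cell = soup.find_all("td", class_="yfnc_tabledata1")

# A cell with a single text child: .string returns that text.
open_string = open_cell.string
# A cell with several children (span + small): .string is None.
bid_string = bid_cell.string
# .get_text() concatenates every nested string instead.
bid_text = bid_cell.get_text()

print(open_string, bid_string, bid_text)
```

If that guess is right, no extra loop is needed: switching from .string to .text or .get_text() should be enough.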

Personally, I would do it this way:

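from urllib2 import urlopen  # Python 2; on Python 3: from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://finance.yahoo.com/q?s=AAPL&q1=1")
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly

# Take the first six value cells and print the full text of each,
# including text inside nested <span>/<small> tags.
for data in soup.find_all('td', class_="yfnc_tabledata1")[0:6]:
    if data.parent.name == "tr":  # only cells that sit directly in a row
        print(data.text)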

Output:

>>>
543.46
546.80
536.50 x 100
536.60 x 100
595.73
0.65
>>>

Works fine :)

Note: I changed the urlopen import to urllib2.

You could also use the following:

for data in soup.find_all('td', class_="yfnc_tabledata1")[0:6]:
    print (data.text)
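The [0:6] slice relies on cell position, which breaks if the table layout shifts. One possible variation, not part of the answer above, is to key each value by the label in its th cell. The helper name quote_table is made up for this sketch, and SAMPLE is a trimmed copy of the question's HTML:

```python
from bs4 import BeautifulSoup

SAMPLE = """
<table>
<tr>
<th scope="row" width="48%">
    Prev Close:</th>
<td class="yfnc_tabledata1">
    543.46</td>
</tr>
<tr>
<th scope="row" width="48%">
    Bid:</th>
<td class="yfnc_tabledata1">
    <span id="yfs_b00_aapl">
        536.55</span>
    <small> x
        <span id="yfs_b60_aapl">
            100</span>
    </small>
</td>
</tr>
</table>
"""


def quote_table(html):
    """Map each row label (th text) to its value (td text)."""
    soup = BeautifulSoup(html, "html.parser")
    values = {}
    for cell in soup.find_all("td", class_="yfnc_tabledata1"):
        row = cell.parent
        if row.name != "tr":
            continue  # same parent check as in the answer
        label = row.find("th")
        if label is None:
            continue  # skip rows without a label cell
        # strip=True trims each fragment; " " joins the fragments.
        key = label.get_text(" ", strip=True).rstrip(":")
        values[key] = cell.get_text(" ", strip=True)
    return values


print(quote_table(SAMPLE))
```

Looking values up by name, for example quote_table(html)["Bid"], avoids counting cells, at the cost of depending on the label text instead.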

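The short answer is that bs4 added something for this (the .strings generator used below).

Your code might look something like this: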

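for data in soup.find_all('td', attrs={'class': 'yfnc_tabledata1'})[0:6]:
    # Python 2 print statement; on Python 3 this line is a SyntaxError.
    # .strings yields every text fragment inside the cell, nested tags included.
    print '--> ',(''.join(data.strings))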

This keeps the "\n" characters, so you can strip and recombine the strings however you like.
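One way to do that stripping and recombining, shown here as a plain-Python sketch with no bs4 dependency: split on any whitespace and rejoin with single spaces. The helper name tidy and the sample string raw_bid are made up for illustration; raw_bid only imitates the kind of text the joined strings of the Bid cell would contain.

```python
def tidy(raw):
    """Collapse every run of whitespace, newlines included, to one space."""
    # str.split() with no arguments splits on any whitespace and drops
    # empty pieces, so leading and trailing whitespace vanish as well.
    return " ".join(raw.split())


# Whitespace-laden text of the kind ''.join(data.strings) produces.
raw_bid = "\n    \n        536.55\n     x \n        \n            100\n    \n"

print(repr(raw_bid))
print(tidy(raw_bid))
```

Inside the loop this would be used as tidy(''.join(data.strings)).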

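You would need an if statement to check for the "x" in the value, since he wants two of the values to show up as "None" instead of 536.52 x 100 and 536.61 x 100, or a check for whether the data is inside a small or span tag.

@Hyflex Actually, that is what his code and yours show; he explicitly said he wants 536.52 x 100 rather than None.

My bad, you're right. It was a bit unclear because I hadn't read his post in full.

I put your print line in and got a syntax error. Is there anything else I should add to that line to get the values I need?

user2859603, can you give the syntax error and show us the code you tried? Try:

print('-->', (''.join(data.strings)))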

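since you are using Python 3.

This doesn't solve the main problem. I don't want the None values. To quote my original post: I should be returning "537.51 x 100" and "537.60 x 100".

user2859603, I added two more alternatives. Does this help? All three options should work, but personally I would use the first one, since it also checks that the parent is tr, in case they do something funny with their site.

Yes, I used your first option. Now I'm going to look at fig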
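Since the thread settles on the first option and the asker is on Python 3, here is a rough Python 3 port of it. This is a sketch under assumptions: the URL is the one from the answer and the page layout may well have changed since; the names cell_texts and fetch_and_print are invented here; and get_text(" ", strip=True) is swapped in for .text to trim whitespace.

```python
from urllib.request import urlopen  # Python 3 location of urlopen

from bs4 import BeautifulSoup

# URL taken from the answer; the page behind it may no longer match.
URL = "http://finance.yahoo.com/q?s=AAPL&q1=1"


def cell_texts(html, limit=6):
    """Return the text of the first `limit` value cells that sit directly in a tr."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", class_="yfnc_tabledata1")[:limit]
    # Same parent check as the first option; nested span/small text is included.
    return [c.get_text(" ", strip=True) for c in cells if c.parent.name == "tr"]


def fetch_and_print(url=URL):
    """Download the page and print one value per line (needs network access)."""
    with urlopen(url) as response:
        html = response.read()
    for text in cell_texts(html):
        print(text)
```

Keeping the parsing in cell_texts, separate from the download, makes it possible to try the extraction on a saved copy of the page without hitting the network.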