Python BeautifulSoup4中的导航

Python BeautifulSoup4中的导航,python,html,web-scraping,beautifulsoup,Python,Html,Web Scraping,Beautifulsoup,我需要从几个片段中提取文本325和550。如何使用Python3.6.0、bs4和urllib实现这一点。我将把获得的数据添加到csv文件中 <div class="a-row a-spacing-none"> <a class="a-link-normal a-text-normal" href="https://www.amazon.in/Game-Thrones-Song-Ice-Fire/dp/0007428545"> <span c

我需要从几个片段中提取文本325和550。如何使用Python3.6.0、bs4和urllib实现这一点。我将把获得的数据添加到csv文件中

<div class="a-row a-spacing-none">
    <a class="a-link-normal a-text-normal" href="https://www.amazon.in/Game-Thrones-Song-Ice-Fire/dp/0007428545">
        <span class="a-size-small a-color-secondary">
        </span>

        <span class="a-size-base a-color-price s-price a-text-bold">

            <span class="currencyINR">  
            </span>
        325
        </span>

    </a>
    <span class="a-letter-space">
    </span>

    <span aria-label='Suggested Retail Price: &lt;span class="currencyINR"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;550' class="a-size-small a-color-secondary a-text-strike">
        <span class="currencyINR"> 
        </span>
    550
    </span>

 </div>

对于初学者,可以选择所有具有currencyINR类的跨度元素


对于初学者,可以选择所有具有currencyINR类的跨度元素


我后来解决了这个问题。显然,导航并不像我预想的那样困难。以下是可行的解决方案

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


# Amazon.in search-results page for "a song of ice and fire".
my_url = "https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire"


# Opening up connection, grabbing the page.
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()


# HTML parsing.
page_soup = soup(page_html, "html.parser")


# Grabs each product container on the results page.
# find_all is the documented bs4 name (findAll is a legacy alias).
containers = page_soup.find_all("div", {"class": "s-item-container"})


# Output CSV file.  Raw string so the backslashes in the Windows path
# are never interpreted as escape sequences.
fileName = r"H:\WEBSCRAPER\Result\Products.csv"
headers = "Product Name, Current Price, Original Price\n"

errorMsg = "Error! Not Found"

# "with" guarantees the file is closed even if an iteration raises.
with open(fileName, "w") as f:
    f.write(headers)

    # Obtains the data from each product container.
    for contain in containers:
        try:
            # contain.h2 is None when the container has no <h2>, and
            # None.text raises AttributeError (not IndexError).
            title = contain.h2.text
        except AttributeError:
            title = errorMsg
        try:
            # [0] raises IndexError when no matching span exists.
            priceCurrent = contain.find_all(
                "span", {"class": "a-size-base a-color-price s-price a-text-bold"})
            CurrentSP = priceCurrent[0].text.strip()
        except IndexError:
            CurrentSP = errorMsg
        try:
            priceSuggested = contain.find_all(
                "span", {"class": "a-size-small a-color-secondary a-text-strike"})
            SuggestedSP = priceSuggested[0].text.strip()
        except IndexError:
            SuggestedSP = errorMsg

        print("title: " + title)
        print("CurrentSP: " + CurrentSP)
        print("SuggestedSP: " + SuggestedSP)

        # Strip/replace commas so fields do not break the naive CSV format.
        f.write(title.replace(",", "|") + "," +
                CurrentSP.replace(",", "") + "," +
                SuggestedSP.replace(",", "") + "\n")

我后来解决了这个问题。显然,导航并不像我预想的那样困难。以下是可行的解决方案

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


# Amazon.in search-results page for "a song of ice and fire".
my_url = "https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire"


# Opening up connection, grabbing the page.
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()


# HTML parsing.
page_soup = soup(page_html, "html.parser")


# Grabs each product container on the results page.
# find_all is the documented bs4 name (findAll is a legacy alias).
containers = page_soup.find_all("div", {"class": "s-item-container"})


# Output CSV file.  Raw string so the backslashes in the Windows path
# are never interpreted as escape sequences.
fileName = r"H:\WEBSCRAPER\Result\Products.csv"
headers = "Product Name, Current Price, Original Price\n"

errorMsg = "Error! Not Found"

# "with" guarantees the file is closed even if an iteration raises.
with open(fileName, "w") as f:
    f.write(headers)

    # Obtains the data from each product container.
    for contain in containers:
        try:
            # contain.h2 is None when the container has no <h2>, and
            # None.text raises AttributeError (not IndexError).
            title = contain.h2.text
        except AttributeError:
            title = errorMsg
        try:
            # [0] raises IndexError when no matching span exists.
            priceCurrent = contain.find_all(
                "span", {"class": "a-size-base a-color-price s-price a-text-bold"})
            CurrentSP = priceCurrent[0].text.strip()
        except IndexError:
            CurrentSP = errorMsg
        try:
            priceSuggested = contain.find_all(
                "span", {"class": "a-size-small a-color-secondary a-text-strike"})
            SuggestedSP = priceSuggested[0].text.strip()
        except IndexError:
            SuggestedSP = errorMsg

        print("title: " + title)
        print("CurrentSP: " + CurrentSP)
        print("SuggestedSP: " + SuggestedSP)

        # Strip/replace commas so fields do not break the naive CSV format.
        f.write(title.replace(",", "|") + "," +
                CurrentSP.replace(",", "") + "," +
                SuggestedSP.replace(",", "") + "\n")
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


# Amazon.in search-results page for "a song of ice and fire".
my_url = "https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=a+song+of+ice+and+fire"


# Opening up connection, grabbing the page.
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()


# HTML parsing.
page_soup = soup(page_html, "html.parser")


# Grabs each product container on the results page.
# find_all is the documented bs4 name (findAll is a legacy alias).
containers = page_soup.find_all("div", {"class": "s-item-container"})


# Output CSV file.  Raw string so the backslashes in the Windows path
# are never interpreted as escape sequences.
fileName = r"H:\WEBSCRAPER\Result\Products.csv"
headers = "Product Name, Current Price, Original Price\n"

errorMsg = "Error! Not Found"

# "with" guarantees the file is closed even if an iteration raises.
with open(fileName, "w") as f:
    f.write(headers)

    # Obtains the data from each product container.
    for contain in containers:
        try:
            # contain.h2 is None when the container has no <h2>, and
            # None.text raises AttributeError (not IndexError).
            title = contain.h2.text
        except AttributeError:
            title = errorMsg
        try:
            # [0] raises IndexError when no matching span exists.
            priceCurrent = contain.find_all(
                "span", {"class": "a-size-base a-color-price s-price a-text-bold"})
            CurrentSP = priceCurrent[0].text.strip()
        except IndexError:
            CurrentSP = errorMsg
        try:
            priceSuggested = contain.find_all(
                "span", {"class": "a-size-small a-color-secondary a-text-strike"})
            SuggestedSP = priceSuggested[0].text.strip()
        except IndexError:
            SuggestedSP = errorMsg

        print("title: " + title)
        print("CurrentSP: " + CurrentSP)
        print("SuggestedSP: " + SuggestedSP)

        # Strip/replace commas so fields do not break the naive CSV format.
        f.write(title.replace(",", "|") + "," +
                CurrentSP.replace(",", "") + "," +
                SuggestedSP.replace(",", "") + "\n")