Python - Web Scraping - BeautifulSoup & CSV

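I would like to pull the cost-of-living differences between one city and many other cities. I plan to list the cities I want to compare in a CSV file and use this list to build the web link that takes me to the site with the information I am looking for.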
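A minimal sketch of that plan, assuming City.txt holds one URL slug per line (such as new-york-city) and that the comparison URLs follow the pattern used in the code in this question. Blank lines are skipped because splitting file text on "\n" leaves an empty string after the final newline, which would otherwise produce a URL ending in a bare slash. The function name comparison_urls is hypothetical.

```python
import io

# URL pattern taken from the question's code; assumed, not checked against the live site.
BASE_URL = "http://www.expatistan.com/cost-of-living/comparison/{home}/{city}"


def comparison_urls(lines, home_city):
    """Build one comparison URL per non-blank city slug."""
    urls = []
    for line in lines:
        city = line.strip()
        if not city:  # skip blank lines, e.g. the one after a trailing newline
            continue
        urls.append(BASE_URL.format(home=home_city, city=city))
    return urls


# io.StringIO stands in for open("City.txt") so the sketch runs without the file.
sample = io.StringIO("new-york-city\n\n")
for url in comparison_urls(sample, "Phoenix"):
    print(url)
```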

Here is a link for one example:

Unfortunately, I am running into several problems. Any help with the following would be greatly appreciated.

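  • The output only shows the percentages, with no indication of whether the city is more expensive or cheaper. For the example above, the output of the current code is 48%, 129%, 63%, 43%, 42% and 42%. I tried to fix this by adding an "if statement" that prefixes a "+" sign when the price is higher and a "-" sign when it is cheaper. However, this "if statement" does not work correctly.
  • When I write the data to the CSV file, each percentage is written to a new row. I can't figure out how to write them as a list on a single row.
  • (Related to item 2) When I write the data for the example above to the CSV file, it is written in the format shown below. How can I correct this and write the data in the preferred format below (also without the percent sign)?
  • Current CSV format (note: the "if statement" does not work correctly):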

    City,Food,Housing,Clothes,Transportation,Personal Care,Entertainment
    n,e,w,-,y,o,r,k,-,c,i,t,y,-,4,8,%
    n,e,w,-,y,o,r,k,-,c,i,t,y,-,1,2,9,%
    n,e,w,-,y,o,r,k,-,c,i,t,y,-,6,3,%
    n,e,w,-,y,o,r,k,-,c,i,t,y,-,4,3,%
    n,e,w,-,y,o,r,k,-,c,i,t,y,-,4,2,%
    n,e,w,-,y,o,r,k,-,c,i,t,y,-,4,2,%
    
Preferred CSV format:

    City,Food,Housing,Clothes,Transportation,Personal Care,Entertainment
    new-york-city,48,129,63,43,42,42

This is my current code:

    import requests
    import csv
    from bs4 import BeautifulSoup
    
    #Read text file
    Textfile = open("City.txt")
    Textfilelist = Textfile.read()
    Textfilelistsplit = Textfilelist.split("\n")
    HomeCity = 'Phoenix'
    
    i=0
    while i<len(Textfilelistsplit):
        url = "http://www.expatistan.com/cost-of-living/comparison/" + HomeCity + "/" + Textfilelistsplit[i]
        page  = requests.get(url).text
        soup_expatistan = BeautifulSoup(page)
    
        #Prepare CSV writer.
        WriteResultsFile = csv.writer(open("Expatistan.csv","w"))
        WriteResultsFile.writerow(["City","Food","Housing","Clothes","Transportation","Personal Care", "Entertainment"])
    
        expatistan_table = soup_expatistan.find("table",class_="comparison")
        expatistan_titles = expatistan_table.find_all("tr",class_="expandable")
    
        for expatistan_title in expatistan_titles:
                percent_difference = expatistan_title.find("th",class_="percent")
                percent_difference_title = percent_difference.span['class']
                if percent_difference_title == "expensiver":
                    WriteResultsFile.writerow(Textfilelistsplit[i] + '+' + percent_difference.span.string)
                else:
                    WriteResultsFile.writerow(Textfilelistsplit[i] + '-' + percent_difference.span.string)
        i+=1
    
Answer:

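    • Issue 1: the class of the span is a list, so you need to check whether
      'expensiver' is in that list. In other words, replace:

      if percent_difference_title == "expensiver":

      with:

      if "expensiver" in percent_difference_title:

    • Issues 2 and 3: you need to pass a list of column values to writerow(),
      not a string. And since you only want one record per city, call
      writerow() outside the loop over the tr elements.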
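A small, network-free sketch of both fixes. BeautifulSoup 4 returns the multi-valued class attribute as a list, so the class lists below are hypothetical stand-ins for percent_difference.span['class'], and the percentage strings stand in for percent_difference.span.string. The helper name signed and the class name "other" are made up for illustration.

```python
def signed(percent_text, span_classes):
    """Prefix '+' when the class list contains 'expensiver', otherwise '-'."""
    # A list never compares equal to a string, so test membership instead.
    sign = "+" if "expensiver" in span_classes else "-"
    return sign + percent_text


# Why the original if statement never matched:
print(["expensiver"] == "expensiver")  # False

# Hypothetical scraped values for one city: (span text, span class list).
scraped = [("48%", ["expensiver"]), ("129%", ["expensiver"]), ("12%", ["other"])]

# One row per city: a list of columns, to be passed to writerow() once.
row = ["new-york-city"] + [signed(text, classes) for text, classes in scraped]
print(row)
```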
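    Other issues:

    • open the CSV file for writing once, before the loop
    • use context managers when working with files
    • try to follow the style guide (PEP 8)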
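To illustrate the context-manager point, here is a sketch written for Python 3 (on Python 2.7 you would open the CSV file in "wb" mode instead of passing newline=""). The temporary path is hypothetical so the example does not touch a real Expatistan.csv.

```python
import csv
import os
import tempfile

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# newline="" is what the Python 3 csv documentation recommends for writer files.
with open(path, "w", newline="") as output_file:
    writer = csv.writer(output_file)
    writer.writerow(["City", "Food"])
    writer.writerow(["new-york-city", "+48%"])

# Leaving the with-block closed and flushed the file,
# and would have done so even if an exception had been raised inside it.
print(output_file.closed)  # True

with open(path, newline="") as f:
    print(list(csv.reader(f)))
```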
    Here is the modified code:

    import requests
    import csv
    from bs4 import BeautifulSoup
    
    BASE_URL = 'http://www.expatistan.com/cost-of-living/comparison/{home_city}/{city}'
    home_city = 'Phoenix'
    
    with open('City.txt') as input_file:
        # "wb" is the mode the csv module expects on Python 2.7
        with open("Expatistan.csv", "wb") as output_file:
            writer = csv.writer(output_file)
            writer.writerow(["City", "Food", "Housing", "Clothes", "Transportation", "Personal Care", "Entertainment"])
            for line in input_file:
                city = line.strip()
                if not city:  # skip blank lines, e.g. a trailing newline
                    continue
                url = BASE_URL.format(home_city=home_city, city=city)
                soup = BeautifulSoup(requests.get(url).text, "html.parser")
    
                table = soup.find("table", class_="comparison")
                differences = []
                for title in table.find_all("tr", class_="expandable"):
                    percent_difference = title.find("th", class_="percent")
                    # span['class'] is a list, so test membership, not equality
                    if "expensiver" in percent_difference.span['class']:
                        differences.append('+' + percent_difference.span.string)
                    else:
                        differences.append('-' + percent_difference.span.string)
                # One row per city, written after the loop over the tr elements
                writer.writerow([city] + differences)
    
    For a City.txt containing the single line new-york-city, it produces an Expatistan.csv with the following contents:

    City,Food,Housing,Clothes,Transportation,Personal Care,Entertainment
    new-york-city,+48%,+129%,+63%,+43%,+42%,+42%
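The question's preferred format has no percent sign. One way to get there, sketched under the assumption that every scraped value looks like +48% or -12%: strip the "%" and convert to an int before adding it to the row. The helper to_number is hypothetical, and io.StringIO stands in for the real output file.

```python
import csv
import io


def to_number(signed_percent):
    """Turn a string such as '+48%' or '-12%' into an int (48 or -12)."""
    return int(signed_percent.strip().rstrip("%"))


header = ["City", "Food", "Housing", "Clothes", "Transportation",
          "Personal Care", "Entertainment"]
# Values copied from the sample output above.
scraped = ["+48%", "+129%", "+63%", "+43%", "+42%", "+42%"]

buf = io.StringIO()  # stands in for the real output file
writer = csv.writer(buf)
writer.writerow(header)
writer.writerow(["new-york-city"] + [to_number(v) for v in scraped])
print(buf.getvalue())
```

Note that positive ints are written without a "+", so a cheaper city shows up as a negative number and a more expensive one as a plain positive number; keep the strings instead if you want the "+" visible.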
    
    Make sure you understand the changes I made. Let me know if you need further help.

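    csv.writer.writerow() takes a sequence and makes each element a column. Normally you give it a list of columns, but you are passing in strings, so each individual character is added as a column.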
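A quick way to see this without touching the network or the real file, using io.StringIO as a stand-in for Expatistan.csv (Python 3):

```python
import csv
import io

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf)
writer.writerow("new-york-city")            # a string is a sequence of characters
writer.writerow(["new-york-city", "+48%"])  # a list is a sequence of columns
print(buf.getvalue())
```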

    Just build a list, then write that to the CSV file.

    First, open the CSV file once, not once per city; every time you open the file in "w" mode, you clear it.
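The truncation is easy to reproduce. This sketch uses a temporary file and two hypothetical city slugs; it compares reopening in "w" mode inside the loop with opening once before the loop.

```python
import os
import tempfile

# Temporary file and hypothetical city slugs, so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "Expatistan.csv")
cities = ["new-york-city", "london"]

# Buggy pattern: reopening in "w" mode on every iteration truncates the file.
for city in cities:
    with open(path, "w") as f:
        f.write(city + "\n")
with open(path) as f:
    after_buggy = f.read().split()
print(after_buggy)  # only the last city survives

# Fixed pattern: open once, then write every city inside the loop.
with open(path, "w") as f:
    for city in cities:
        f.write(city + "\n")
with open(path) as f:
    after_fixed = f.read().split()
print(after_fixed)
```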

    import requests
    import csv
    from bs4 import BeautifulSoup
    
    HomeCity = 'Phoenix'
    
    with open("City.txt") as cities, open("Expatistan.csv", "wb") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["City", "Food", "Housing", "Clothes",
                         "Transportation", "Personal Care", "Entertainment"])
    
        for line in cities:
            city = line.strip()
            url = "http://www.expatistan.com/cost-of-living/comparison/{}/{}".format(
                HomeCity, city)
            resp = requests.get(url)
            soup = BeautifulSoup(resp.content, from_encoding=resp.encoding)
    
            titles = soup.select("table.comparison tr.expandable")
    
            row = [city]
            for title in titles:
                percent_difference = title.find("th", class_="percent")
                changeclass = percent_difference.span['class']
                change = percent_difference.span.string
                if "expensiver" in changeclass:
                    change = '+' + change
                else:
                    change = '-' + change
                row.append(change)
            # One row per city, written after the inner loop finishes
            writer.writerow(row)
    

    So, first: the writerow method takes an iterable, and each object in that iterable is written out, separated by commas. So if you give it a string, every character is separated:

    WriteResultsFile.writerow('hello there')

    writes

    h,e,l,l,o, ,t,h,e,r,e

    whereas

    WriteResultsFile.writerow(['hello', 'there'])

    writes

    hello,there
    
    
    That's why you get results like

    n,e,w,-,y,o,r,k,-,c,i,t,y,-,4,8,%
    
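    The rest of your problems are bugs in your web scraping. First, when I browsed the site, searching for the table with the CSS class "comparison" gave me None, so I had to use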

    expatistan_table = soup_expatistan.find("table","comparison")
    
    Now, the reason your "if statement is broken" is that

    percent_difference.span['class']
    
    returns a list. If we change it to

    percent_difference.span['class'][0]

    things work the way you expect.

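    Now, your real problem is that in the innermost loop you find the percentage price change for a single item. You want those items as entries in one row of price differences, not as separate rows. So I declared an empty list, items, appended percent_difference.span.string to it, and then wrote the row outside the innermost loop, like this: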

    items = []
    for expatistan_title in expatistan_titles:
            percent_difference = expatistan_title.find("th","percent")
            percent_difference_title = percent_difference.span["class"][0]
            print percent_difference_title
            if percent_difference_title == "expensiver":
                items.append('+' + percent_difference.span.string)
            else:
                items.append('-' + percent_difference.span.string)
    row = [Textfilelistsplit[i]]
    row.extend(items)
    WriteResultsFile.writerow(row)
    
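    The last bug is that you reopen the CSV file inside the loop, which overwrites everything, so you end up with only the final city. Taking all of these bugs into account (many of which you should be able to find without help), we end up with the following: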

    import requests
    import csv
    from bs4 import BeautifulSoup
    
    #Read text file, skipping blank lines
    Textfilelistsplit = [line.strip() for line in open("City.txt") if line.strip()]
    HomeCity = 'Phoenix'
    
    #Prepare CSV writer once, before the loop ("wb" for csv on Python 2.7)
    WriteResultsFile = csv.writer(open("Expatistan.csv","wb"))
    #Write the header row once, not once per city
    WriteResultsFile.writerow(["City","Food","Housing","Clothes","Transportation","Personal Care", "Entertainment"])
    
    i=0
    while i<len(Textfilelistsplit):
        url = "http://www.expatistan.com/cost-of-living/comparison/" + HomeCity + "/" + Textfilelistsplit[i]
        page  = requests.get(url).text
        print url
        soup_expatistan = BeautifulSoup(page)
    
        expatistan_table = soup_expatistan.find("table","comparison")
        expatistan_titles = expatistan_table.find_all("tr","expandable")
    
        items = []
        for expatistan_title in expatistan_titles:
                percent_difference = expatistan_title.find("th","percent")
                percent_difference_title = percent_difference.span["class"][0]
                print percent_difference_title
                if percent_difference_title == "expensiver":
                    items.append('+' + percent_difference.span.string)
                else:
                    items.append('-' + percent_difference.span.string)
        #One row per city, written after the inner loop
        row = [Textfilelistsplit[i]]
        row.extend(items)
        WriteResultsFile.writerow(row)
        i+=1
    
YAA: Yet Another Answer

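    Unlike the other answers, this one treats the data as a series of key-value pairs, i.e. a list of dictionaries, which is then written to CSV. The CSV writer (DictWriter) is given the list of fields you want; it can discard any additional information beyond those fields and leave missing information blank. This solution is also unaffected if the order of the information on the original page changes.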
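The answer's own code is not shown here, so this is a separate sketch of the DictWriter idea. The record below is hypothetical (including its extra "Rent index" key). Note that dropping extra keys requires extrasaction="ignore"; with the default, "raise", DictWriter raises ValueError on unknown keys. restval="" fills in fields a record lacks.

```python
import csv
import io

FIELDS = ["City", "Food", "Housing", "Clothes", "Transportation",
          "Personal Care", "Entertainment"]

# Hypothetical scraped record: keys in whatever order the page produced them,
# with one field we don't want and several we never found.
records = [
    {"Housing": "+129%", "City": "new-york-city", "Food": "+48%", "Rent index": "n/a"},
]

buf = io.StringIO()  # stands in for the real output file
writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", extrasaction="ignore")
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```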

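    I also assume you will open the CSV file in something like Excel. The CSV writer needs extra arguments for that to work well (see the dialect parameter). And since we are not sanitizing the returned data, we should delimit it explicitly with unconditional quoting (see the quoting parameter).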
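A sketch of those two parameters (Python 3). dialect="excel" is already the csv module's default; quoting=csv.QUOTE_ALL wraps every field in double quotes, and embedded quotes are doubled. The messy value is a hypothetical example of unsanitized scraped text.

```python
import csv
import io

buf = io.StringIO()  # stands in for the real output file
writer = csv.writer(buf, dialect="excel", quoting=csv.QUOTE_ALL)
writer.writerow(["City", "Food"])
writer.writerow(["new-york-city", 'a "quoted", comma value'])  # hypothetical messy text
print(buf.getvalue())
```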


    TIL: selectors in BeautifulSoup.