Python 2.7 web scraping: why do I only get the last row of an HTML table? (BeautifulSoup)


python-2.7 web-scraping


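I'm trying to scrape a batch of HTML files from a folder on my computer. The data I want is stored in a table. I can get the last row of the table from each file, but the other rows are ignored.

I've copied part of the HTML to Pastebin.

Here is the code I have so far. Again, it works for the last row but not for the others, so I'm guessing something is wrong with the loops? I've tried to figure it out, but no luck so far: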

def processData( pageFile ):
    f = open(pageFile, "r")
    page = f.read()
    f.close()
    soup = BeautifulSoup(page)
    ewo = soup.find_all("td", {"class":"date"})
    ewo2 = soup.find_all("td", {"class":"user"})
    ewo3 = soup.find_all("p", {"class":"single"})
fishs = [ ]
dogs = [ ]
rats = [ ]
for html in ewo:
    feedbacks = BeautifulSoup(str(html).strip()).get_text().encode("utf-8").replace("\n", "") # convert the html to text
    fishs.append(feedbacks.encode("utf-8").strip())
for html2 in ewo2:
    feedbacks2 = BeautifulSoup(str(html2).strip()).get_text().encode("utf-8").replace("\n", "") # convert the html to text
    dogs.append(feedbacks2.encode("utf-8").strip())
    str1 = ''.join(dogs)
for html3 in ewo3:
    feedbacks3 = BeautifulSoup(str(html3).strip()).encode("utf-8").replace("\n", "") # convert the html to text
    rats.append(feedbacks3.encode("utf-8").split('<p class="single">')[1].split("</p>")[0].strip())
csvfile = open(today + ' evo.csv', 'ab')
writer = csv.writer(csvfile)
for fish, dog, rat in zip(fishs, dogs, rats):
    writer.writerow([fish, dog, rat])
csvfile.close()
today = datetime.datetime.now().strftime('%Y-%m-%d')
dir = "files/"
csvFile = today + " file.csv"
csvfile = open(csvFile, 'wb')
writer = csv.writer(csvfile)
writer.writerow(["F", "I", "V"])
csvfile.close()
fileList = os.listdir(dir)
totalLen = len(fileList)
count = 1
for htmlFile in fileList:
    path = os.path.join(dir, htmlFile) # get the file path
    processData(path) # process the data in the file
    print "Processed '" + path + "'(" + str(count) + "/" + str(totalLen) + ")..." # display status
    count = count + 1 # increment counter

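Instead of using BeautifulSoup's find, make your life easier by pulling the HTML you need out of the soup with CSS selectors, e.g. soup.select('tr.your_class td#your_ID').

Your code is incorrect as posted. For example, your first for loop says "for html in ewo", but ewo has not been defined at that point. First, it is assigned inside a function, so it is out of scope at module level. Second, that function is not called until the last for loop. Maybe the indentation in your example is wrong? Should those for loops all be inside the processData function?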
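To make the two comments above concrete, here is a minimal sketch of how the extraction could look when everything happens inside one function and the table is walked row by row with CSS selectors. It is written for Python 3 with the bs4 package rather than the Python 2.7 of the question. The real HTML is only on Pastebin, so the layout is an assumption: each data row is taken to be a `<tr>` containing a `td.date`, a `td.user` and a `p.single`. The names `extract_rows` and `process_file` are hypothetical helpers, not anything from the original code.

```python
import csv
import os

from bs4 import BeautifulSoup


def extract_rows(html):
    """Return one (date, user, comment) tuple per data row in the HTML.

    Assumption: every data row is a <tr> holding td.date, td.user and
    p.single. Adjust the selectors to match the real page.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tr"):
        date = tr.select_one("td.date")
        user = tr.select_one("td.user")
        comment = tr.select_one("p.single")
        # Skip header rows and rows missing any of the three cells.
        if date is None or user is None or comment is None:
            continue
        # The append is inside the loop, so every row is kept,
        # not only the last one.
        rows.append((
            date.get_text(" ", strip=True),
            user.get_text(" ", strip=True),
            comment.get_text(" ", strip=True),
        ))
    return rows


def process_file(path, writer):
    """Parse one HTML file and write all of its rows to a csv writer."""
    with open(path, encoding="utf-8") as f:
        rows = extract_rows(f.read())
    writer.writerows(rows)
    return len(rows)
```

A driver would open the CSV once, write the header, and then call `process_file(os.path.join(folder, name), writer)` for each name in `os.listdir(folder)`. Reading the three cells from the same `<tr>` also avoids the `zip` over three separately collected lists, which silently misaligns or truncates rows when one cell is missing.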