Using Python to read cells in a folder of multi-row CSV files and score each row

python csv

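I'm a newbie with a multi-layered problem.

I'm trying to score every record in every CSV file in a folder, based on whether a keyword is present in one column.

Each folder has five CSV files.

Each CSV file contains about 20 records.

Each record has five columns: title, link, author, date, and body.

I want to read the body, count the number of times "potato" appears, and add that score to the end of the record.

I was excited when I successfully adapted [this code][1] by Ben Welsh, because it reads and scores entire CSV files.

Here, the keyword being counted is "hasse".

This gives me one score per file, but I need one score per row in each file.

Thanks to everyone who helps.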

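EDIT: This code reads every file with a .csv extension in the specified directory, splits each file into a series of rows (all held in the list object splitCsv), and then counts the instances of the searchTerm of interest (defined on line 4) in each row. It writes the results to a new file named "foundHits.txt": for each row, the total number of instances of the search term, followed by a tab, followed by the contents of the row in which the search term does (or does not) appear: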

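If the code below doesn't suit your purposes, just say how you'd like it modified and I'll be happy to help.

Hi duhaime, thanks for your help. I thought this would work, but it seems the column split isn't working. Traceback (most recent call last): File "C:/Python27/Rank", line 21, in if "potato" in splitRow[4]: IndexError: list index out of range

cont'd: it looks like the split creates a list of rows containing the headers ["'title', "'author', ...] and then treats it as a single column???

Are you reading .csv files? Can you edit your question and post a snippet of your data? Are you sure your data has 5 (or more) columns?

Also, can the split recognize commas inside quoted text? Otherwise "16,000 potatoes" gets broken apart at the comma.

Here is a link to the data. Let me know if you have any trouble. Use any keyword you'd like to test.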
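The quoted-comma concern raised in the comments can be checked directly: a plain `str.split(",")` knows nothing about quoting, whereas the standard library's `csv` module does. A minimal sketch of the difference follows. It is written for Python 3, and the sample row is made up for illustration, not taken from the asker's data.

```python
import csv
import io

# Hypothetical sample row: the body field is quoted and contains a comma.
sample = 'Title,http://example.com,Author,2014-01-01,"We grew 16,000 potatoes"\n'

# Naive splitting breaks the quoted body field at its inner comma.
naive_fields = sample.strip().split(",")

# csv.reader honours the quotes and returns exactly five fields.
parsed_fields = next(csv.reader(io.StringIO(sample)))

print(len(naive_fields))   # 6
print(len(parsed_fields))  # 5
print(parsed_fields[4])    # We grew 16,000 potatoes
```

With the naive split, the body no longer sits at index 4 as a whole, so any per-column scoring would look at a fragment of the text.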

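# Code from the question (Python 2): counts matching lines per file.
import re, os

path = "./nietzsche"
freddys_library = os.listdir(path)
hate = open("hate.txt", "w")

for book in freddys_library:
    file = os.path.join(path, book)
    text = open(file, "r")
    # hit_count is reset once per file, so the result is one score per file
    hit_count = 0
    for line in text:
        if re.match("(.*)(hasse|hasst)(.*)", line):
            hit_count = hit_count + 1
            # write the file name and the matching line to hate.txt
            print >> hate, book + "|" + line,

    print book + " => " + str(hit_count)
    text.close()

# close the output file so every matching line is flushed to disk
hate.close()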

import os
import glob

searchTerm = "China"

#create output file in which we'll store all of the data that contains the word of interest
out = open("foundHits.txt", "w")

for csvPath in glob.glob("C:\\Users\\Douglas\\Desktop\\potato\\*.csv"):
    #read the whole file, then close it before processing
    with open(csvPath, "r") as openCsv:
        readCsv = openCsv.read()

    #split each row into a string within the list "splitCsv"
    splitCsv = readCsv.split("\n")

    #for each row in the current csv
    for row in splitCsv:

        #split each row on the phrase "PM ET", which separates your metadata from the text data you want to scan
        splitRow = row.split("PM ET")

        #determine whether the string of interest appears in what I take to be the data of interest

        #use this if condition in order to prevent the script from breaking if it encounters a row that doesn't contain "PM ET"
        if len(splitRow) > 1:

            #split each time you encounter that word
            splitOnTerm = splitRow[1].split(searchTerm)

            searchTermInstances = len(splitOnTerm) - 1

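            #write the hit count, a tab, and the row to the output file
            out.write(str(searchTermInstances) + "\t" + str(row) + "\n")

#close the output file so all rows are flushed to disk
out.close()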
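
For the per-row score the question asks for, one possible approach is to parse each file with the `csv` module and count the keyword in the body column only, then write the rows back out with the score appended. The sketch below is Python 3 and rests on several assumptions that are not in the original: the body is the fifth column, the first row of each file is a header, matching is case-insensitive, and the function names `score_rows` and `score_folder` and the `_scored.csv` output suffix are hypothetical.

```python
import csv
import glob
import os


def score_rows(rows, keyword, body_index=4):
    """Return the rows with a keyword count appended to each one."""
    scored = []
    for row in rows:
        # Rows that are too short (e.g. blank lines) get a score of 0
        # instead of raising IndexError.
        body = row[body_index] if len(row) > body_index else ""
        scored.append(row + [body.lower().count(keyword.lower())])
    return scored


def score_folder(folder, keyword):
    """Write a '<name>_scored.csv' file next to every CSV in the folder."""
    for path in glob.glob(os.path.join(folder, "*.csv")):
        if path.endswith("_scored.csv"):
            continue  # skip output left over from an earlier run
        # newline="" lets the csv module handle line breaks inside quoted cells
        with open(path, newline="") as src:
            rows = list(csv.reader(src))
        if not rows:
            continue  # empty file: nothing to score
        header, records = rows[0], rows[1:]
        out_path = path[:-4] + "_scored.csv"
        with open(out_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            writer.writerow(header + ["score"])
            writer.writerows(score_rows(records, keyword))
```

Calling `score_folder("path/to/folder", "potato")` would leave the original files untouched and produce one scored copy per file. Note that `str.count` counts substrings, so "potatoes" also counts as a hit for "potato"; a regular expression with word boundaries would be needed if only whole words should score.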