Python prints only the contents of the first file in a folder, even though I want to print all files


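I have a function that returns the contents of all files in a folder after removing stop words. The problem is that when I print the function's result, only the first file's content is printed. I want to print all the files after removing stop words.

How can I solve this?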

def remove_stop_word_from_files():
    stop_words_list = get_stop_words()
    dir_path = 'C:/Users/Super/Desktop/IR/homework/Lab4/corpus/corpus/'
    save_dir = "C:/Users/Super/Desktop/IR/homework/Files_Without_SW/"

    for document in os.listdir(dir_path):
        with open(dir_path + document, "r") as reader:
            save_file = open(save_dir + document, 'w')
            text = reader.read()
            text_tokens = word_tokenize(text)
            tokens_without_sw = [word.replace(',', '').replace('.', '') for word in 
                     text_tokens if (word not in stop_words_list)]
            save_file.writelines(["%s " % item.replace(',', '').replace('.', '') for item in 
            tokens_without_sw])
            
    return tokens_without_sw
    
print(remove_stop_word_from_files())

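Your return statement is inside the loop. You need to reduce its indentation by one level. As written, the function returns after executing the first iteration.

Also, you overwrite tokens_without_sw on every iteration instead of appending to a running list.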

import os
from nltk.tokenize import word_tokenize  # assuming word_tokenize comes from NLTK

def remove_stop_word_from_files():
    stop_words_list = get_stop_words()  # the asker's own helper, not shown here
    dir_path = 'C:/Users/Super/Desktop/IR/homework/Lab4/corpus/corpus/'
    save_dir = "C:/Users/Super/Desktop/IR/homework/Files_Without_SW/"

    all_tokens_without_sw = []  # running list of tokens across all files
    for document in os.listdir(dir_path):
        # Open input and output together so both are closed after each file
        with open(dir_path + document, "r") as reader, \
             open(save_dir + document, 'w') as save_file:
            text = reader.read()
            text_tokens = word_tokenize(text)
            tokens_without_sw = [word.replace(',', '').replace('.', '')
                                 for word in text_tokens
                                 if (word not in stop_words_list)]
            save_file.writelines(["%s " % item.replace(',', '').replace('.', '')
                                  for item in tokens_without_sw])
            all_tokens_without_sw.extend(tokens_without_sw)

    # Return only after every file has been processed
    return all_tokens_without_sw

print(remove_stop_word_from_files())
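
The three behaviours discussed here (returning inside the loop, overwriting the variable, and accumulating) can be reproduced without any files. The following is only a minimal sketch: the function names and the sample strings are made up for illustration, and str.split stands in for a real tokenizer.

```python
def return_inside_loop(documents):
    for text in documents:
        tokens = text.split()
        return tokens              # leaves the function during the first iteration


def return_last_only(documents):
    tokens = []                    # default in case there are no documents
    for text in documents:
        tokens = text.split()      # overwritten on every iteration
    return tokens                  # only the last document's tokens survive


def return_accumulated(documents):
    all_tokens = []
    for text in documents:
        all_tokens.extend(text.split())   # add to a running list
    return all_tokens              # returned once, after the loop has finished


documents = ["red apple", "green pear"]   # hypothetical stand-ins for file contents
print(return_inside_loop(documents))   # ['red', 'apple']
print(return_last_only(documents))     # ['green', 'pear']
print(return_accumulated(documents))   # ['red', 'apple', 'green', 'pear']
```

Seeing only the first file's tokens points to the first pattern; seeing only the last file's tokens points to the second.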

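The problem is that you use return tokens_without_sw inside the loop. A return statement inside a loop ends the function, and with it the loop. Instead, try yield, which hands back one result per iteration and then continues. Calling list() on the function collects those results into a list. So your code should look like this: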

def remove_stop_word_from_files():
    stop_words_list = get_stop_words()
    dir_path = 'C:/Users/Super/Desktop/IR/homework/Lab4/corpus/corpus/'
    save_dir = "C:/Users/Super/Desktop/IR/homework/Files_Without_SW/"

    for document in os.listdir(dir_path):
        with open(dir_path + document, "r") as reader, \
             open(save_dir + document, 'w') as save_file:
            text = reader.read()
            text_tokens = word_tokenize(text)
            tokens_without_sw = [word.replace(',', '').replace('.', '') for word in 
                     text_tokens if (word not in stop_words_list)]
            save_file.writelines(["%s " % item.replace(',', '').replace('.', '') for item in 
            tokens_without_sw])
            
        yield tokens_without_sw
    
print(list(remove_stop_word_from_files()))
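
One thing worth noting about the yield version is the shape of the result: list() over the generator gives one token list per file, not a single flat list. A small sketch of this, assuming made-up sample strings in place of file contents and str.split in place of word_tokenize:

```python
def tokens_per_document(documents):
    """Yield one token list per document; the loop resumes after each yield."""
    for text in documents:
        yield text.split()


documents = ["red apple", "green pear"]   # hypothetical stand-ins for file contents

# list() drives the generator to the end and keeps one entry per document
per_file = list(tokens_per_document(documents))
print(per_file)   # [['red', 'apple'], ['green', 'pear']]

# Flatten if a single list of tokens is wanted instead
flat = [token for tokens in tokens_per_document(documents) for token in tokens]
print(flat)       # ['red', 'apple', 'green', 'pear']
```

If the nested shape is not what you want, flattening as above (or accumulating with extend inside the function) gives a single list.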

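The line return tokens_without_sw causes the function to end on the first iteration of the for loop. Instead of returning tokens_without_sw inside the for loop, you can create another variable, such as all_tokens_without_sw, and append tokens_without_sw to it at the end of each iteration. Then, after the for loop, you can return all_tokens_without_sw.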

def remove_stop_word_from_files():
    stop_words_list = get_stop_words()
    dir_path = 'C:/Users/Super/Desktop/IR/homework/Lab4/corpus/corpus/'
    save_dir = "C:/Users/Super/Desktop/IR/homework/Files_Without_SW/"
    
    all_tokens_without_sw = []

    for document in os.listdir(dir_path):
        with open(dir_path + document, "r") as reader, \
             open(save_dir + document, 'w') as save_file:
            text = reader.read()
            text_tokens = word_tokenize(text)
            tokens_without_sw = [word.replace(',', '').replace('.', '') for word in 
                     text_tokens if (word not in stop_words_list)]
            save_file.writelines(["%s " % item.replace(',', '').replace('.', '') for item in 
            tokens_without_sw])
            
        all_tokens_without_sw = all_tokens_without_sw + tokens_without_sw

    return all_tokens_without_sw
    
print(remove_stop_word_from_files())
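
To check the accumulate-then-return approach end to end without the original corpus, here is a self-contained sketch. It is not the asker's code: the function name, the STOP_WORDS set, and the sample files are assumptions for illustration, temporary directories replace the real paths, and str.split replaces word_tokenize.

```python
import os
import tempfile

STOP_WORDS = {"the", "a", "is"}   # hypothetical stop-word set


def remove_stop_words_from_dir(src_dir, dst_dir, stop_words=STOP_WORDS):
    all_tokens = []
    for name in sorted(os.listdir(src_dir)):      # sorted for a stable order
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):          # skip sub-directories
            continue
        with open(src_path, "r", encoding="utf-8") as reader, \
             open(os.path.join(dst_dir, name), "w", encoding="utf-8") as writer:
            tokens = [w for w in reader.read().split()
                      if w.lower() not in stop_words]
            writer.write(" ".join(tokens))
        all_tokens.extend(tokens)                 # keep tokens from every file
    return all_tokens                             # after the loop, not inside it


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    samples = [("a.txt", "the cat is here"), ("b.txt", "a dog barks")]
    for name, text in samples:
        with open(os.path.join(src, name), "w", encoding="utf-8") as f:
            f.write(text)

    result = remove_stop_words_from_dir(src, dst)
    print(result)    # ['cat', 'here', 'dog', 'barks']

    with open(os.path.join(dst, "a.txt"), encoding="utf-8") as f:
        saved_a = f.read()
    print(saved_a)   # cat here
```

If this prints tokens from both sample files but the real function still shows only one, the remaining cause is probably elsewhere, for example in the contents of the input folder or in get_stop_words().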

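I edited my post as you suggested, but the problem still exists.

I have edited my answer as well, but I think there are some other issues I still have to fix.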