
How to limit a string read from a file path in Python


In Python 2, how can I limit the length of the string when importing all the txt files from a directory? For example, wordlength = 6000.

import glob

raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            raw_text += line

words = raw_text.split()
print(words)

This code just reads in all the txt files and prints them to the screen. How can I limit it to 6000 words and print only those 6000 words?

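It depends on your definition of a word. If it is simply text separated by whitespace, this is fairly easy: count the words as they go by, and stop when you have enough. For example: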

    word_limit = 6000
    word_count = 0
    for line in f:
        word_count += len(line.split())
        if word_count > word_limit:
            break
        raw_text += line
If you want exactly 6000 words, you can modify the loop to take just enough words from the last line to reach exactly 6000.
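
One way to sketch that exact-count variant is shown here. The function name `limit_text` and the sample lines are hypothetical, introduced only for illustration; the idea is to trim the line that would cross the limit instead of dropping it.

```python
def limit_text(lines, word_limit):
    """Return text built from `lines` that holds at most `word_limit` words.

    Hypothetical helper: `lines` can be an open file or any iterable of strings.
    """
    raw_text = ""
    word_count = 0
    for line in lines:
        line_words = line.split()
        if word_count + len(line_words) > word_limit:
            # This line would cross the limit: keep only the words still needed.
            needed = word_limit - word_count
            raw_text += " ".join(line_words[:needed])
            break
        raw_text += line
        word_count += len(line_words)
    return raw_text


sample = ["one two three\n", "four five six\n"]
print(limit_text(sample, 4).split())  # ['one', 'two', 'three', 'four']
```

Note that the trimmed last line is rejoined with single spaces, so its original spacing is not preserved.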

If you want it to be a little more efficient, drop raw_text and build words inside the loop, one line at a time, using

        line_words = line.split()
        words.extend(line_words)
In that case, you need to do the check using len(line_words).
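
Put together with the file loop from the question, that approach might look like the sketch below. The function name `collect_words` is hypothetical, and sorting the glob results is an added assumption so that the file order is predictable. The sketch checks `len(words)` after each `extend`, which is equivalent to keeping a running total of `len(line_words)`.

```python
import glob


def collect_words(pattern, word_limit):
    """Collect at most `word_limit` words from all files matching `pattern`.

    Hypothetical helper; files are visited in sorted order (an assumption).
    """
    words = []
    for filename in sorted(glob.glob(pattern)):
        with open(filename, "r") as f:
            for line in f:
                line_words = line.split()
                words.extend(line_words)
                if len(words) >= word_limit:
                    # Returning here stops reading the remaining files too.
                    return words[:word_limit]
    return words


# Path taken from the question; yields an empty list if nothing matches.
words = collect_words("/workspace/simple/*.txt", 6000)
print(len(words))
```

Returning from inside the loops avoids the need for a separate flag to leave the outer file loop once the limit is reached.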

import glob

N = 6000  # word limit; put your number here
raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            # Re-split the accumulated text to count the words read so far.
            if len(raw_text.split()) < N:
                raw_text += line
            else:
                break
words = raw_text.split()
print(words)
Assuming you want 6000 words or fewer from each file:

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
words = []

for file in glob.glob(path):
    with open(file) as f: 
        words += f.read().split()[:count]

print(words)

>>>python test.py "/workspace/simple/*.txt" 6000
You can also build a dictionary that maps each file to its words:

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
fwords = {}

for file in glob.glob(path):
    with open(file) as f: 
        fwords[file] = f.read().split()[:count]

print(fwords)
If you only want the files that contain exactly that number of words:

for file in glob.glob(path):
    with open(file) as f: 
        tmp = f.read().split()
        if len(tmp) == count :  # only the count 
            fwords[file] = tmp

Try replacing your code with the following:

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        word_limit = 12000
        word_count = 0
        for line in f:
            word_count += len(line)  # note: len(line) counts characters, not words
            if word_count > word_limit:
                break
            raw_text += line

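You can add an if statement: if raw_text.split(), then raw_text += line

I think you mean len(line.split())

... why split twice? Put the words into the words list and print it at the end.

I understood it as fewer than 6000 words in total, but I may have misunderstood. If that is the case, I'm happy to delete the answer.

No problem :)

Sorry, I see what you did there. But it isn't efficient.

If you have a more efficient way, feel free to share ;)

I entered 6000 as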