How to limit the length of a string read from a file path in Python


In Python 2, how can I limit the length of the string when reading all the txt files in a directory? For example, wordlength = 6000.

import glob

raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            raw_text += line

words = raw_text.split()
print(words)

This code simply reads all the txt files and prints their words to the screen. How can I limit it to 6000 words and print only those 6000 words?

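That depends on your definition of a word. If a word is simply text separated by whitespace, it is fairly easy: count the words as they go by, and stop when you have enough. For example: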

    word_limit = 6000
    word_count = 0
    for line in f:
        word_count += len(line.split())
        if word_count > word_limit:
            break
        raw_text += line

If you want exactly 6000 words, you can modify the loop to take just enough words from the last line to bring the total to exactly 6000.
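A minimal sketch of that exact-count variant, assuming words are whitespace-separated. The helper name take_words is hypothetical; it accepts any iterable of lines, so an open file object works as well as the sample list used here.

```python
def take_words(lines, word_limit):
    """Collect at most word_limit whitespace-separated words from lines."""
    words = []
    for line in lines:
        remaining = word_limit - len(words)
        if remaining <= 0:
            break
        # Take only as many words from this line as are still needed.
        words.extend(line.split()[:remaining])
    return words


sample = ["one two three\n", "four five\n", "six seven eight\n"]
print(take_words(sample, 4))
```

Slicing the last line with [:remaining] is what keeps the total from overshooting the limit.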

If you want this to be a bit more efficient, drop raw_text and build words inside the loop, one line at a time, using

        line_words = line.split()
        words.extend(line_words)

In that case, you would do the check with len(line_words).
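Put together, the approach might look like the sketch below. It is an illustration under a few assumptions rather than the answer's exact code: the function name read_limited_words is hypothetical, the check uses len(words) on the accumulated list, the files are visited in sorted order so the result is deterministic, and the function returns as soon as the limit is reached so that later files are not opened at all.

```python
import glob


def read_limited_words(pattern, word_limit):
    """Return up to word_limit words gathered from all files matching pattern."""
    words = []
    for filename in sorted(glob.glob(pattern)):
        with open(filename, 'r') as f:
            for line in f:
                line_words = line.split()
                words.extend(line_words)
                if len(words) >= word_limit:
                    # Trim any overshoot from the last line and stop entirely.
                    return words[:word_limit]
    return words
```

With the question's layout this would be called as read_limited_words("/workspace/simple/*.txt", 6000), and the returned list printed.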

import glob

N = 6000  # maximum number of words; put your number here
raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            if len(raw_text.split()) < N:
                raw_text += line
            else:
                break
words = raw_text.split()
print(words)


Assuming you want 6000 words or fewer from each file:

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
words = []

for file in glob.glob(path):
    with open(file) as f: 
        words += f.read().split()[:count]

print(words)

>>>python test.py "/workspace/simple/*.txt" 6000

You can also build a dictionary that maps each file to its words:

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
fwords = {}

for file in glob.glob(path):
    with open(file) as f: 
        fwords[file] = f.read().split()[:count]

print(fwords)

If you only want the files that contain exactly that number of words:

for file in glob.glob(path):
    with open(file) as f: 
        tmp = f.read().split()
        if len(tmp) == count :  # only the count 
            fwords[file] = tmp
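The per-file snippets above read each file completely with f.read() before slicing. For large files, a lazy variant may be preferable. The sketch below is one possible way to do that with itertools.islice; the helper names iter_words and first_words_per_file are hypothetical and not part of the original answer.

```python
import glob
from itertools import islice


def iter_words(f):
    """Yield whitespace-separated words from an open file, line by line."""
    for line in f:
        for word in line.split():
            yield word


def first_words_per_file(pattern, count):
    """Map each matching filename to its first `count` words."""
    fwords = {}
    for filename in glob.glob(pattern):
        with open(filename) as f:
            # islice stops pulling lines once `count` words have been produced.
            fwords[filename] = list(islice(iter_words(f), count))
    return fwords
```

Because the generator is consumed only as far as needed, a file with millions of lines is read only up to the line that supplies the last required word.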

Try replacing your code with the following:

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        word_limit = 12000
        word_count = 0
        for line in f:
            word_count += len(line)  # counts characters; use len(line.split()) to count words
            if word_count > word_limit:
                break
            raw_text += line

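You can add an if statement: if raw_text.split() ... then raw_text += line

I think you mean len(line.split())

... Why split twice? Put the words into the words list and print it at the end.

I understood it as the total number of words being under 6000, but I may have misunderstood. If so, I'm happy to delete the answer.

No problem :)

Sorry, I see what you did there. But it isn't efficient.

If you have a more efficient way, feel free to share ;)

I entered 6000 as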

