Python: how to limit the number of words read from files on a path
In Python 2, how can I limit the length of the string when importing all the txt files from a directory? For example, wordlength=6000.
import glob
raw_text = ""
path = "/workspace/simple/*.txt"
for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            raw_text += line
words = raw_text.split()
print(words)
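This code just reads all the txt files and prints them to the screen. How can I limit it to 6000 words and print only those 6000 words?
It depends on your definition of a word. If it is simply text separated by whitespace, it is fairly easy: count the words as they go past and stop once you have enough. For example: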
word_limit = 6000
word_count = 0
for line in f:
    word_count += len(line.split())
    if word_count > word_limit:
        break
    raw_text += line
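As written, the loop drops the line that would push the count past the limit, so you get at most 6000 words. If you want exactly 6000 words, you can modify the loop to take just enough words from that last line to reach exactly 6000.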
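A minimal sketch of that exact-count variant, assuming words are simply whitespace-separated. The helper names iter_lines and take_text are hypothetical. iter_lines chains the lines of all matching files into a single stream (in sorted filename order, which is also an assumption), so stopping at the limit also stops reading further files; the fragment above only breaks out of the inner loop over one file.

```python
import glob


def iter_lines(pattern):
    """Yield every line of every file matching the glob `pattern`, in sorted order."""
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as f:
            for line in f:
                # A file's last line may lack "\n"; add one so it cannot
                # run into the first word of the next file.
                if not line.endswith("\n"):
                    line += "\n"
                yield line


def take_text(lines, word_limit=6000):
    """Concatenate lines until exactly `word_limit` whitespace-separated words
    have been collected (fewer if the input runs out).

    The last line is trimmed to the words still needed, so its original
    spacing and trailing newline are not preserved.
    """
    raw_text = ""
    word_count = 0
    for line in lines:
        remaining = word_limit - word_count
        if remaining <= 0:
            break
        line_words = line.split()
        if len(line_words) <= remaining:
            raw_text += line
            word_count += len(line_words)
        else:
            # Only part of this line fits: keep just the words still needed.
            raw_text += " ".join(line_words[:remaining])
            break
    return raw_text


if __name__ == "__main__":
    text = take_text(iter_lines("/workspace/simple/*.txt"), 6000)
    print(text.split())
```

Because take_text accepts any iterable of lines, it can be tried on a plain list of strings without touching the filesystem.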
To make it a bit more efficient, drop raw_text and build the word list inside the loop, one line at a time, using:
line_words = line.split()
words.extend(line_words)
In this case, you would use len(line_words) in the check.
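A sketch of the list-based version described above, again assuming whitespace-separated words; collect_words and main are hypothetical names. Each line's words are added with extend(), and since the last line can overshoot the limit, the list is trimmed before stopping. main also breaks out of the loop over files once the limit is reached.

```python
import glob


def collect_words(lines, word_limit=6000):
    """Build the word list directly with extend(), one line at a time,
    stopping as soon as `word_limit` words have been collected."""
    words = []
    for line in lines:
        line_words = line.split()
        words.extend(line_words)
        if len(words) >= word_limit:
            # The last line may overshoot the limit; drop the extra words.
            del words[word_limit:]
            break
    return words


def main(pattern="/workspace/simple/*.txt", word_limit=6000):
    words = []
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as f:
            # Ask each file only for the words still missing.
            words.extend(collect_words(f, word_limit - len(words)))
        if len(words) >= word_limit:
            break  # stop the outer loop over files too
    print(words)


if __name__ == "__main__":
    main()
```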
import glob
raw_text = ""
path = "/workspace/simple/*.txt"
for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            if len(raw_text.split()) < N:  ###here you put your number
                raw_text += line
            else:
                break
words = raw_text.split()
print(words)
Assuming you want at most 6000 words from each file:
import glob, sys
path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
words = []
for file in glob.glob(path):
    with open(file) as f:
        words += f.read().split()[:count]
print(words)
$ python test.py "/workspace/simple/*.txt" 6000
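The script above reads each whole file with f.read() before slicing off the first count words. If the files are large, a lazier alternative (our assumption, not part of the answer) is to generate words line by line and take the first count with itertools.islice, so reading a file stops once enough words have been produced. iter_words and first_words_per_file are hypothetical names; the command-line handling mirrors the script above.

```python
import glob
import sys
from itertools import islice


def iter_words(f):
    """Yield whitespace-separated words from a file object, one line at a time."""
    for line in f:
        for word in line.split():
            yield word


def first_words_per_file(pattern, count):
    """Return up to `count` words from each file matching `pattern`,
    concatenated into one list, without reading whole files into memory."""
    words = []
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as f:
            # islice stops pulling lines from f once `count` words are taken.
            words.extend(islice(iter_words(f), count))
    return words


if __name__ == "__main__":
    if len(sys.argv) > 1:
        path = sys.argv[1]
        count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
        print(first_words_per_file(path, count))
```

It is invoked the same way as the original script, with the glob pattern quoted so the shell does not expand it.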
You can also build a dictionary that maps each file to its words:
import glob, sys
path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
fwords = {}
for file in glob.glob(path):
    with open(file) as f:
        fwords[file] = f.read().split()[:count]
print(fwords)
If you only want the files that contain exactly that number of words:
for file in glob.glob(path):
    with open(file) as f:
        tmp = f.read().split()
        if len(tmp) == count:  # only the count
            fwords[file] = tmp
Try replacing your code with the following:
for filename in glob.glob(path):
    with open(filename, 'r') as f:
        word_limit = 12000
        word_count = 0
        for line in f:
            word_count += len(line)  # note: len(line) counts characters, not words
            if word_count > word_limit:
                break
            raw_text += line
You could add an if statement: if raw_text.split() … then raw_text += line
I think you mean len(line.split())
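... Why split twice? Just put the words in a word list and print it at the end.
I understood it as the total number of words being under 6000, but I may have misunderstood. If so, I'm happy to delete the answer.
no-pb :) Sorry, I see what you did there. But it is not efficient.
If you have a more efficient way, feel free to share ;)
I entered 6000 as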