Python: filter strings of a specific length from a file
I have a file foo.txt with the following contents:
'w3ll' 'i' '4m' 'n0t' '4sed' 't0'
'it'
I am trying to extract all words that contain exactly two characters. That is, the output file should contain only
4m
t0
it
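What I have tried is
with open("foo.txt", 'r') as foo:
    listme = foo.read()
string = listme.strip().split("'")
I think this splits the string on the ' symbol.
How do I select only those strings whose character count between the apostrophes equals 2?

This should work: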
>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         for word in line.split():      # split the line at whitespace
...             word = word.strip("'")     # strip the surrounding ' from each word
...             if len(word) == 2:         # write the word only if it has exactly two characters
...                 f2.write(word + '\n')
...
>>> print(open('output.txt').read())
4m
t0
it
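The same filtering step can be separated from the file handling so that it is easy to try on a string. This is only a minimal sketch: the function name `two_char_words`, its `length` parameter, and the `sample` string are illustrative assumptions, not part of the answer above.

```python
def two_char_words(text, length=2):
    """Return the words in text that have exactly `length` characters
    after the surrounding apostrophes are stripped (hypothetical helper)."""
    result = []
    for word in text.split():       # split at any whitespace, including newlines
        word = word.strip("'")      # remove the quotes around the word
        if len(word) == length:
            result.append(word)
    return result


# Sample text assumed to match the contents of foo.txt from the question.
sample = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0'\n'it'"
print(two_char_words(sample))  # ['4m', 't0', 'it']
```

To produce the output file, each returned word could then be written followed by a newline, as in the loop above.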
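Using a regular expression:
Assuming you want to find all words enclosed in ' symbols that are exactly two characters long:
import re
split = re.compile(r"'\w{2}'")   # a quote, exactly two word characters, a quote
with open("file2", "w") as fw:
    for word in split.findall(open("file", "r").read()):
        fw.write(word.strip("'") + "\n")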
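A variant of the regex approach can use a capture group so that the quotes never need to be stripped afterwards. The sketch below is an assumption-laden illustration: `quoted_words`, its `length` parameter, and the `sample` string are hypothetical names introduced here.

```python
import re


def quoted_words(text, length=2):
    """Return every run of exactly `length` word characters that sits
    between two apostrophes (hypothetical helper)."""
    # The parentheses form a capture group, so findall returns only
    # the text between the quotes, not the quotes themselves.
    pattern = re.compile(r"'(\w{%d})'" % length)
    return pattern.findall(text)


# Sample text assumed to match the contents of the input file.
sample = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0'\n'it'"
print(quoted_words(sample))  # ['4m', 't0', 'it']
```

Because the pattern looks only at the quotes, it does not depend on whether the quoted items are separated by spaces, newlines, or commas.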
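Since you are reading quoted items separated by spaces or commas, you can use the csv module:
import csv
with open('/tmp/2let.txt', 'r') as fin, open('/tmp/out.txt', 'w') as fout:
    reader = csv.reader(fin, delimiter=' ', quotechar="'")
    source = (e for line in reader for e in line)   # flatten the rows into single words
    for word in source:
        # note: <= 2 also keeps one-character words such as i;
        # use == 2 to keep only words of exactly two characters
        if len(word) <= 2:
            print(word)
            fout.write(word + '\n')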
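What error do you get? Please post such examples in the question body rather than in comments, because they are unreadable there.
Thanks @Ashwini. But the regex approach treats two different strings separated by a comma as one string. When I ran the code to find 20-character words, it gave me ,'9,'1186148119', as output. That still works, but it is made up of many different strings rather than just one.
@abhikafle Your sample input does not contain any ',' which is why I did not handle them. Please post the input that causes the problem.
Can you add ',' as a token that separates two strings?
@abhikafle In the first code, replace line.split() with line.split(',')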
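The comma-separated case raised in these comments could also be handled by splitting on commas and whitespace in one step. This is a rough sketch rather than the commenters' exact fix: the helper name `words_of_length` and the `sample` input are made up for illustration, since the real problem input was not posted.

```python
import re


def words_of_length(text, length=2):
    """Split text on runs of commas and/or whitespace, strip the
    apostrophes, and keep words with exactly `length` characters
    (hypothetical helper)."""
    tokens = re.split(r"[,\s]+", text)
    words = (token.strip("'") for token in tokens)
    return [word for word in words if len(word) == length]


# Hypothetical input that mixes commas, spaces, and a newline.
sample = "'ab','cde' 'fg',\n'h','ij'"
print(words_of_length(sample))  # ['ab', 'fg', 'ij']
```

Unlike `line.split(',')`, this treats spaces and commas as equivalent separators, so it works for both input styles.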
i
4m
t0