
Extracting strings of a specific length from a file in Python

python string file python-2.7


I have a file foo.txt with the following contents:

'w3ll' 'i' '4m' 'n0t' '4sed' 't0' 

'it'

I am trying to extract all the words that contain exactly two characters. That is, the output file should contain only

4m
t0
it

What I have tried:

with open("foo.txt" , 'r') as foo:
    listme = foo.read()

string =  listme.strip().split("'")

I think this splits the string on the ' symbol. How can I select only those strings between apostrophes whose character count equals 2?

This should work:

>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         for word in line.split():    # split the line on whitespace
...             word = word.strip("'")   # strip the surrounding `'` from each word
...             if len(word) == 2:       # write only the two-character words
...                 f2.write(word + '\n')
...
>>> print(open('output.txt').read())
4m
t0
it
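For comparison, the split("'") attempt from the question can also be made to work. The following is only a sketch: the `text` string is a hypothetical in-memory copy of foo.txt, and `quoted_tokens` is a helper name invented for illustration. It relies on the fact that splitting on the apostrophe yields pieces that alternate between "outside the quotes" and "inside the quotes", so the quoted words sit at the odd indices.

```python
# Hypothetical in-memory copy of foo.txt from the question.
text = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0' 'it'"


def quoted_tokens(s):
    """Return the pieces of s that sit between pairs of apostrophes."""
    pieces = s.strip().split("'")
    # pieces alternate: outside, inside, outside, inside, ...
    # so every odd index holds the content of one quoted token.
    return pieces[1::2]


# Keep only the tokens that are exactly two characters long.
two_chars = [tok for tok in quoted_tokens(text) if len(tok) == 2]
print(two_chars)
```

This assumes the apostrophes are balanced and never appear inside a word. If that does not hold, the odd/even alternation breaks down.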

Using a regular expression:

Assuming you want to find all words enclosed in ' symbols that are exactly two characters long:

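import re

# a quoted token of exactly two word characters, e.g. '4m'
split = re.compile(r"'\w{2}'")

# open both files in one with-statement so the input file is closed as well
with open("file", "r") as fr, open("file2", "w") as fw:
    for word in split.findall(fr.read()):
        fw.write(word.strip("'") + "\n")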
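To see what the pattern actually returns, here is a small sketch that runs on an in-memory string instead of a file. The `sample` string and the `find_quoted` helper are made up for illustration. It differs from the answer's code in two ways: it uses a capture group so the apostrophes do not need to be stripped afterwards, and it takes the length as a parameter.

```python
import re

# Hypothetical sample mirroring the contents of foo.txt.
sample = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0' 'it'"


def find_quoted(text, n):
    """Return the quoted tokens in text made of exactly n word characters."""
    # The parentheses form a capture group, so findall returns only the
    # part between the apostrophes, not the apostrophes themselves.
    pattern = re.compile(r"'(\w{%d})'" % n)
    return pattern.findall(text)


print(find_quoted(sample, 2))
```

Note that `\w` matches only letters, digits and the underscore, so tokens containing other characters are skipped by this pattern.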

Since you are reading quoted strings separated by spaces or commas, you can use the csv module:

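import csv

with open('/tmp/2let.txt', 'r') as fin, open('/tmp/out.txt', 'w') as fout:
    # space-delimited fields, with ' as the quote character
    reader = csv.reader(fin, delimiter=' ', quotechar="'")
    # flatten the rows into a single stream of words
    source = (e for line in reader for e in line)
    for word in source:
        if len(word) == 2:    # exactly two characters, as the question asks
            print(word)
            fout.write(word + '\n')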
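The csv approach can be tried without touching the filesystem, because csv.reader accepts any iterable of lines. The sketch below uses a hypothetical list of lines in place of the input file. It filters on a length of exactly two, which is what the question asks for.

```python
import csv

# Hypothetical lines standing in for the input file.
lines = [
    "'w3ll' 'i' '4m' 'n0t' '4sed' 't0'",
    "'it'",
]

# Fields are separated by a single space and quoted with apostrophes.
reader = csv.reader(lines, delimiter=' ', quotechar="'")

# Flatten the rows and keep the fields that are exactly two characters long.
two_chars = [field for row in reader for field in row if len(field) == 2]
print(two_chars)
```

The reader removes the apostrophes itself, so no strip("'") call is needed here.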

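What error do you get? Please post examples like this in the question body rather than in comments, because they are unreadable there.

Thanks @Ashwini. But the regex approach treats two different comma-separated strings as a single string. When I run the code to look for a 20-character word, it gives me ,'9,'1186148119', as output, which still matches but is made up of several different strings rather than just one.

@abhikafle Your sample input does not contain any ',' which is why I did not handle them. Please post the problematic input in the question.

Can you add ',' as a delimiter separating two strings?

@abhikafle In the first code, replace line.split() with line.split(',')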
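The comma-separated input discussed in the comments was never posted, so the following is only a sketch under an assumed format: quoted tokens separated by commas, whitespace, or both. The `sample` string and the `words_of_length` helper are invented for illustration. Instead of line.split(','), it splits on any run of commas and whitespace so that both separators are handled.

```python
import re


def words_of_length(line, n):
    """Return the quoted words in line that are exactly n characters long."""
    # Split on runs of commas and/or whitespace (assumed separators).
    tokens = re.split(r"[,\s]+", line.strip())
    # Drop the surrounding apostrophes before measuring the length.
    words = [tok.strip("'") for tok in tokens]
    return [w for w in words if len(w) == n]


# Hypothetical input mixing commas and spaces as separators.
sample = "'w3ll','i', '4m' 'n0t','4sed','t0', 'it'"
print(words_of_length(sample, 2))
```

Because each token is stripped and measured separately, neighbouring tokens cannot be merged into one match, whatever the requested length.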
