Extracting text from a file using Python


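I have opened a text file in my Python code. I want to search through the file and pull out the text that follows a specific symbol. For example, my text file, file.txt, contains:

Hello, this is just a dummy file that has information with no substance at all, and I want to pull the information between the dollar sign symbols. So all of this $ in between here should be pulled out so I can do what ever I want to with it $ and the rest of this will be a second group.

Here is a sample of my code: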

class FileExtract(object):
    def __init__(self):
        pass

    def extractFile(self):
        file = open("File.txt")
        wholeFile = file.read()
        file.close()
        symCount = wholeFile.count("$")
        count = 0  # counts each $ as it is found
        begin = False  # True once a $ has been found and copying has started
        myWant = []  # collects the portion I want
        for word in wholeFile.split():
            while count != symCount:
                if word != "$" and begin == False:
                    break
                if word == "$" and begin == False:
                    myWant.append(word)
                    begin = True
                    count = count + 1  # found one of the total symbols
                    break
                elif word != "$" and begin == True:
                    myWant.append(word)
                    break
                elif word == "$" and begin == True:
                    begin = False
                    break
        print(myWant)

I want it to print:

"$ in between here should be pulled out so I can do what ever I want to with it" 
"$ and the rest of this will be a second group."

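This is the only way I could think of to pull the text out (I know it's horrible, please go easy on me, I'm just learning). The problem is that my approach puts everything into a list, and I want it to just print the string with its spaces, newlines, and all other characters intact. Any suggestions, or other built-in functions/methods that I'm overlooking and that would help me?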

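Well, you can do

wholeFile.split("$")

and then you have a 3-element list: what comes before the first $, what is between the two $, and what comes after the second $ (without the $ signs themselves).

Or even

print("\n$".join(wholeFile.split("$")))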

As a minimal function:

def extract_file(filename):
    # Read the whole file, then start a new line at every "$".
    with open(filename) as f:
        return '\n$'.join(f.read().split('$'))

Of course, split() does not keep the delimiter in its results, but adding it back yourself is trivial.
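As a rough sketch of what adding the delimiter back could look like: the helper name `dollar_groups` and the sample string below are made up for illustration and are not part of the original answers. The sketch assumes the text before the first `$` is not wanted, so it skips the first element that `split` returns.

```python
def dollar_groups(text, sep="$"):
    """Return each run of text that starts at a separator, separator included."""
    parts = text.split(sep)
    # parts[0] is whatever precedes the first separator, so it is skipped.
    return [sep + part for part in parts[1:]]


# Hypothetical sample text, shaped like the file in the question.
sample = "ignored $ first group $ second group."
for group in dollar_groups(sample):
    print(group)
```

Printing each element separately, rather than printing the list itself, keeps the spaces and newlines inside every group exactly as they appear in the file.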

This is the kind of job flex is for. In Python, though, you can do the same thing without flex:

firstFlag = False   # True once the first "$" has been seen
secondFlag = False  # True once the second "$" has been seen
with open('thefile.txt', 'rb') as yourFile, \
     open('first.txt', 'wb') as outputFile1, \
     open('second.txt', 'wb') as outputFile2:
    while True:
        char = yourFile.read(1)  # read one byte at a time
        if not char:             # an empty result means end of file
            break
        if char == b'$':
            if firstFlag:
                secondFlag = True
            firstFlag = True
        if firstFlag and not secondFlag:
            outputFile1.write(char)
        elif secondFlag:
            outputFile2.write(char)

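Since this is not native C code, it will not be very fast. I suggest you take a look at flex, not only because it is a handy tool but also for the learning experience.

The above code in flex: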

%option 8bit outfile="scanner.c"
%option nounput main noyywrap
%option warn

%x First
%x Second
%%

\$ { BEGIN First; yyout = fopen("first.txt", "wb"); }
. { ECHO; }
<First>\$ { BEGIN Second; fclose(yyout); yyout = fopen("second.txt", "wb");}
<First>. { ECHO; }
<Second>. { ECHO; }

%%

It takes its input from stdin.

It's actually quite simple. Without using split or storing the result in a list:

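def extractFile(self):
    file = open("File.txt")
    wholeFile = file.read()
    file.close()

    pos = wholeFile.find("$")  # index of the first "$", or -1 if there is none
    while pos != -1:
        # Search for the next "$" after the current one.
        pos2 = wholeFile.find("$", pos + 1)

        if pos2 != -1:
            print(wholeFile[pos:pos2])
        else:
            print(wholeFile[pos:])
        pos = pos2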
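The same scan can also be written with a regular expression instead of repeated `find` calls. This is a different technique from the one above, shown only as a sketch; the function name `iter_dollar_groups` is invented for illustration. It uses `re.finditer`, which hands back one match at a time, so nothing is collected into a list unless the caller asks for one.

```python
import re


def iter_dollar_groups(text):
    """Yield every stretch of text that begins with "$", one at a time."""
    # "\$" matches a literal dollar sign; "[^$]*" then takes everything up to
    # (but not including) the next dollar sign, newlines included.
    for match in re.finditer(r"\$[^$]*", text):
        yield match.group(0)


# Hypothetical sample text, shaped like the file in the question.
sample = "ignored $ first group $ second group."
for group in iter_dollar_groups(sample):
    print(group)
```

Text that appears before the first `$` never matches the pattern, so it is left out without any special handling.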

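This puts it in list format, which is not really what I am looking for. It also groups the beginning part of the file, the part that does not start with the $ symbol, so how would I know whether that is a part I need?

Clean code, but note that you are assuming there are no memory considerations. If he is running on a 32-bit OS and wants to go through a file of several gigabytes, minus the memory currently in use, he is out of luck.

What is the maximum size of the file? Will it exceed a few hundred megabytes?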
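If memory does turn out to be a concern, one possible approach is to read the file in fixed-size chunks and emit each group as soon as the next `$` shows up. The following is only a sketch under assumptions: the name `stream_dollar_groups`, the `chunk_size` default, and the decision to drop everything before the first `$` are all illustrative choices, not something taken from the answers above.

```python
def stream_dollar_groups(path, sep="$", chunk_size=64 * 1024):
    """Yield each separator-led group from a file without reading it all at once."""
    pending = ""     # text of the group currently being built
    started = False  # becomes True once the first separator has been seen
    with open(path) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty string means end of file
                break
            pieces = chunk.split(sep)
            # pieces[0] continues the group that was open before this chunk;
            # before the first separator there is no open group, so it is dropped.
            if started:
                pending += pieces[0]
            for piece in pieces[1:]:
                if started:
                    yield pending  # the previous group is now complete
                started = True
                pending = sep + piece
    if started:
        yield pending  # the last group runs to the end of the file
```

A caller could then write `for group in stream_dollar_groups("File.txt"): print(group)`. Only the chunk being read and the group being built are held in memory, although a single very long group still has to fit.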

To build the flex scanner shown earlier:

flex -Cf scanner.l
gcc -O -o flexer.exe scanner.c
