Python: How do I strip multiple things from a txt file?

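I am writing a function that reads data from a txt file; the file is laid out with one sentence per line. I have 6 requirements for stripping the file down so that it can be used later in my program: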

 1. Make everything lowercase
 2. Split the line into words
 3. Remove all punctuation, such as “,”, “.”, “!”, etc.
 4. Remove apostrophes and hyphens, e.g. transform “can’t” into “cant” and “first-born” into “firstborn”
 5. Remove the words that are not all alphabetic characters (do not remove “can’t” because you have transformed it to “cant”, similarly for “firstborn”).
 6. Remove the words with less than 2 characters, like “a”.

This is what I have so far:

def read_data(fp):
    file_dict={}
    fp=fp.lower
    fp=fp.strip(string.punctuation)
    lines=fp.readlines()

I'm a bit stuck, so how do I strip these 6 things from this file?

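This can be done with a series of regular-expression checks, followed by a loop that removes every item with fewer than 2 characters:

Code

import re

with open("text.txt", "r") as fi:
    # Lowercase everything, then drop punctuation, apostrophes, and hyphens
    # (whitespace is kept so words on separate lines stay separate).
    lowerFile = re.sub(r"[^\w\s]", "", fi.read().lower())
    # Drop whole words that still contain a non-alphabetic character.
    lowerFile = re.sub(r"(^|\s)\S*[^a-z\s]\S*(?=$|\s)", "", lowerFile)
    # Split into words and keep only those with at least 2 characters.
    words = [word for word in lowerFile.split() if len(word) >= 2]
    print(words)

Input

I li6ke to swim, dance, and Run r8un88.

Output

['to', 'swim', 'dance', 'and', 'run']

As a side note: fp usually stands for file pointer, which is what you get back from open(), and fp has no .lower(). Maybe be a bit more careful with naming?

strip only looks at the beginning and end of the line (it is really meant for removing extra whitespace). Also, you have to actually call lower().

How would I use this to remove all of those characters from a file pointer, where the file is opened in another function?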
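
For the case raised in the comments, where the function receives an already-open file object rather than a filename, a minimal sketch might look like the following. It swaps the regular expressions for str.translate and str.isalpha. The return shape (one list of words per line) and the extra typographic quote and dash characters in the delete table are assumptions, since the question does not say what read_data should return or exactly which characters the file contains.

```python
import io
import string

# Characters to delete: ASCII punctuation (includes ' and -) plus typographic
# quotes and dashes. The extra characters are an assumption about the input.
_DELETE = str.maketrans("", "", string.punctuation + "‘’“”–—")


def read_data(fp):
    """Return one list of cleaned words per line of the open text file fp."""
    cleaned_lines = []
    for line in fp:  # fp is a file object, so iterate over it line by line
        words = []
        for word in line.lower().split():   # requirements 1 and 2
            word = word.translate(_DELETE)  # requirements 3 and 4
            # Requirement 5 (letters only) and requirement 6 (length >= 2).
            if word.isalpha() and len(word) >= 2:
                words.append(word)
        cleaned_lines.append(words)
    return cleaned_lines


# Usage with an in-memory file; a real call would pass the result of open().
sample = io.StringIO("I li6ke to swim, dance, and Run r8un88.\n")
print(read_data(sample))  # [['to', 'swim', 'dance', 'and', 'run']]
```

Unlike the [^a-z] check in the regex version, str.isalpha also accepts non-ASCII letters, so accented words would be kept here; whether that is wanted depends on how "alphabetic" is meant in requirement 5.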