Python: counting lines from a string


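I need to create a program that removes punctuation, certain specific words, and duplicates, and returns the words that are left together with the lines they appear on. I also need to keep track of the duplicates. For example: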

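Python IDLE indexer: type in lines, finish with a . at start of line only
It is a brisk wind that blows
from the north, the north of my youth.
The wind is cold too, colder than the
winds of yesterday.
.
The index is:
brisk 1
blow 1
wind 1, 3, 4
north 2
youth 2
cold 3
yesterday 4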

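The problem is: I need to keep track of the line numbers of the words that are left, including their duplicates. I haven't been able to do that.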

from string import *

stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
              "of", "from", "here", "even", "the", "but", "and", "is", "my", \
              "them", "then", "this", "that", "than", "though", "so", "are" ]

endings = [ "es" , "ed" , "er", "ly"]

punctuation = [ ".", "," , ":" , ";" , "!" , "?" , "&" , "'" ]

unindexed_sentence = raw_input("type in lines, finish with a . at start of line only").lower()

#removing duplicates.
def unique_string(l):
    ulist = []
    ulist2 = []
    [ulist.append(x) for x in l if x not in ulist]
    [ulist2.append(x)]
    global ulist2

    return ulist
unindexed_sentence =' '.join(unique_string(unindexed_sentence.split()))

unindexed_sentence1 = split(unindexed_sentence,"\n")

list_unindexed = []



# splitting 
i = 0
while i<len(unindexed_sentence1):
    list_unindexed += [split(unindexed_sentence1[i])] 
    i+=1
countline = 0
i = 0
while i < len(list_unindexed):
    j = 0
    while j < len(list_unindexed[i]):
        if list_unindexed[i][j][0] in punctuation:
            list_unindexed[i][j] = list_unindexed[i][j][:0]
        if list_unindexed[i][j][-1] in punctuation:
            list_unindexed[i][j] = list_unindexed[i][j][:-1]
        if list_unindexed[i][j][-1] == "s":
            list_unindexed[i][j] = list_unindexed[i][j][:-1]
        if list_unindexed[i][j][-2:] in endings:
            list_unindexed[i][j] = list_unindexed[i][j][:-2]
        if list_unindexed[i][j][-3:] == "ing":
            list_unindexed[i][j] = list_unindexed[i][j][:-3]
        if list_unindexed[i][j] in stopWords:
            del list_unindexed[i][j]

        else:
            j += 1
    i += 1
    countline += 1

def new_line(n):
    split(n,"\n")
    count = 1
    if n[-1] == "\n":
        count += 1
    return count

string1 = str(list_unindexed)

string2 = str(string1)

string2 ='\n'.join(unique_string(string2.split()))   

print string2

Is this your homework?

Here are some hints:

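Don't do: from string import *. You don't need it.
Use data.splitlines() to get a list of lines.
Use enumerate() to get the index, e.g.: for i, line in enumerate(data.splitlines())
Use a dictionary to record all the words. Each value can be a list or a set of line numbers.
Don't remove duplicates at the start. You can handle that with a dictionary or a set.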
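
The hints above could be combined roughly as follows. This is only a minimal sketch in Python 3, not the asker's program: the function name build_index, the variable names, and the sample text (an English rendering of the example in the question) are assumptions made for illustration. It uses string.punctuation from the standard library instead of a hand-written punctuation list, and it leaves out the suffix stripping ("es", "ed", "ing", ...) from the question entirely.

```python
import string

# Stop words copied from the question's stopWords list.
STOP_WORDS = {
    "a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
    "of", "from", "here", "even", "the", "but", "and", "is", "my",
    "them", "then", "this", "that", "than", "though", "so", "are",
}


def build_index(data):
    """Map each kept word to a sorted list of the 1-based line numbers it occurs on."""
    index = {}
    # splitlines() keeps the line structure; enumerate() supplies the line number.
    for lineno, line in enumerate(data.splitlines(), start=1):
        for raw in line.lower().split():
            # Strip punctuation from both ends of the token.
            word = raw.strip(string.punctuation)
            if not word or word in STOP_WORDS:
                continue
            # A set per word absorbs duplicates, so no separate dedup pass is needed.
            index.setdefault(word, set()).add(lineno)
    return {word: sorted(lines) for word, lines in index.items()}


# Hypothetical sample input, standing in for the lines typed by the user.
text = (
    "It is a brisk wind that blows\n"
    "from the north, the north of my youth.\n"
    "The wind is cold too, colder than the\n"
    "winds of yesterday.\n"
)

index = build_index(text)
for word, lines in index.items():
    print(word, ", ".join(str(n) for n in lines))
```

Because duplicates are absorbed by the per-word set, the input never has to be deduplicated before it is split into lines, which is the step that discards line information in the question's code. Without suffix stripping, "wind" and "winds" are indexed as separate words here; adding a stemming step before the dictionary update would merge them.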