Using Python to get the relevant software names (tags: python, nltk)


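I have an Excel sheet that contains many software names, such as Visual studio 2012, Visual studio 2013, Visual studio 2017, Adobe Reader English, Adobe Reader Deutsche, Power shell 4.0, Power shell 2.0, and Power shell 5.0.

I want to keep only one entry per software product. In this example, I would like the output to be Visual studio 2013, Power shell 4.0, and Adobe Reader English, and to discard the rest. I am using Python NLP. I have already removed all the junk characters and version numbers, but I don't know how to proceed.

Any ideas on how to build on this? After reducing two software names to text without digits or junk characters, I tried sequence matching, but the results were neither accurate nor efficient.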

import pandas as pd
from nltk.tokenize import wordpunct_tokenize

df = pd.read_csv('C:\\Users\\533471\\Desktop\\Book2.csv', encoding='Windows-1252')
saved_column = df.RowLabels[:]
second_column = df.RowLabels[:]

print(saved_column)

for eachcol in saved_column:
    eachword = eachcol.split()
    print(eachword)

    for secondcol in second_column:
        sentence = None
        wordo = None
        punct = None

        x = []
        copy = []
        secondword = secondcol.split()[:]

        # Proceed only if the first word of one name occurs in the first word of the other.
        if eachword[0] in secondword[0]:
            print("true")
            sentence = eachword[:]
            sentence += secondword

            # Split each token on punctuation.
            for token in sentence:
                word = wordpunct_tokenize(token)

                if wordo is None:
                    wordo = word
                else:
                    wordo += word

            # Keep only purely alphabetic tokens (drops punctuation and numbers).
            punct = [item for item in wordo if item.isalpha()]
            t = punct[:]
            t.reverse()

            for p in punct:
                print(p)
                if len(x) > 0:
                    print(x, "Appended")
                    a = str(p)
                    x += [p]
                    if p == x[0]:
                        break
                else:
                    print("list is empty")

                    x += [p]

            x.pop()
            for z in t:
                print(z)
                if len(copy) > 0:
                    print(copy, "appended")

                    copy += [z]
                    if z == punct[0]:
                        break
                else:
                    print("list is empty")
                    copy += [z]

                print(copy)

        else:
            print("false")

Your order is all messed up.
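
One possible way to approach the goal described in the question is to reduce each name to a comparison key and keep a single entry per key, rather than comparing every pair of rows. The sketch below is only an illustration under stated assumptions, not the asker's method: the helper names `base_name` and `pick_representatives` and the `LANGUAGE_WORDS` set are hypothetical, and the rule "keep the first name seen in each group" is an arbitrary choice, since the question does not say which version should win.

```python
import re

# Hypothetical set of language words to ignore when comparing names.
# This list is an assumption; extend it to match the real data.
LANGUAGE_WORDS = {"english", "deutsche", "deutsch", "french"}


def base_name(name):
    """Build a comparison key: lowercase alphabetic words only,
    with version numbers, punctuation and language words dropped."""
    words = re.findall(r"[a-z]+", name.lower())
    return " ".join(w for w in words if w not in LANGUAGE_WORDS)


def pick_representatives(names):
    """Return the first name seen for each comparison key, in input order."""
    chosen = {}
    for name in names:
        # setdefault stores the name only if the key is not present yet.
        chosen.setdefault(base_name(name), name)
    return list(chosen.values())


names = [
    "Visual studio 2012", "Visual studio 2013", "Visual studio 2017",
    "Adobe Reader English", "Adobe Reader Deutsche",
    "Power shell 4.0", "Power shell 2.0", "Power shell 5.0",
]

for name in pick_representatives(names):
    print(name)
```

With the sample names from the question this keeps one row each for Visual studio, Adobe Reader, and Power shell. Grouping by key takes a single pass over the data, which avoids the nested loop over all pairs of rows. If the real sheet contains misspelled names, the keys could additionally be compared with `difflib.SequenceMatcher` from the standard library, at the cost of choosing a similarity threshold.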