Python: How to refactor duplicated code


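I have two functions that differ in only one line. To avoid duplicating code, I want to create a base class holding the general form of these functions and then inherit from it in each class.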

Function 1:

def top_similar_traces(self, stack_trace, top=10):
        words_to_test = StackTraceProcessor.preprocess(stack_trace)
        words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

        # Cos-similarity
        all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
            [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

        for i, (doc_id, rwmd_distance) in enumerate(distances):

            doc_words_clean = [w for w in self.corpus[doc_id] if w in model]
            wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

        return sorted(similarities, key=lambda v: v[1])[:top]
Function 2:

def top_similar_traces(self, stack_trace, top=10):
        words_to_test = StackTraceProcessor.preprocess(stack_trace)
        words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

        # Cos-similarity
        all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
            [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

        for i, (doc_id, rwmd_distance) in enumerate(distances):

            doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
            wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

        return sorted(similarities, key=lambda v: v[1])[:top]
As you can see, the only difference is:

        doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
        doc_words_clean = [w for w in self.corpus[doc_id] if w in model]

You can define the function in the superclass, like this:

def top_similar_traces(self, stack_trace, t, top=10):
    words_to_test = StackTraceProcessor.preprocess(stack_trace)
    words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

    # Cos-similarity
    all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
        [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

    for i, (doc_id, rwmd_distance) in enumerate(distances):

        if t == "something":
            doc_words_clean = [w for w in self.corpus[doc_id] if w in model]
        else:
            doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
        wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

    return sorted(similarities, key=lambda v: v[1])[:top]
Here the argument t is a string used to pick the variant you want. You would then call this method from the subclass, like this:

def top_similar_traces(self, stack_trace, top=10):
    return super().top_similar_traces(stack_trace, "option", top)

A solution like this should work. The argument t can be a value of any type (integer, string, etc.).
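To make the flag approach concrete, here is a minimal runnable sketch that isolates only the line that differs. Everything in it is an assumption for illustration: the class names, the Doc type, and the plain set called vocab that stands in for the gensim model.

```python
from collections import namedtuple

# Hypothetical document type exposing a .words attribute.
Doc = namedtuple("Doc", ["words"])


class TraceFinderBase:
    def __init__(self, corpus, vocab):
        self.corpus = corpus  # maps doc_id -> document
        self.vocab = vocab    # stand-in for the model's vocabulary

    def doc_words_clean(self, doc_id, t):
        # t picks the variant; the filtering below is shared.
        if t == "plain":
            words = self.corpus[doc_id]
        else:
            words = self.corpus[doc_id].words
        return [w for w in words if w in self.vocab]


class PlainTraceFinder(TraceFinderBase):
    def doc_words_clean(self, doc_id):
        # Forward to the shared code with a fixed option.
        return super().doc_words_clean(doc_id, "plain")


class DocTraceFinder(TraceFinderBase):
    def doc_words_clean(self, doc_id):
        return super().doc_words_clean(doc_id, "words")
```

Note that each subclass narrows the method signature by dropping t, so code that works with the base class directly has to pass the flag itself.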

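Just extract the part that varies into a separate method. That way, a subclass can override that part and change the behavior of the original method without copying the whole body.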

Something like this:

# Base class
def top_similar_traces(self, stack_trace, top=10):
    words_to_test = StackTraceProcessor.preprocess(stack_trace)
    words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

    # Cos-similarity
    all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
        [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

    for i, (doc_id, rwmd_distance) in enumerate(distances):
        # call another method here
        doc_words_clean = self.top_similar_traces_filter_words(doc_id)
        wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

    return sorted(similarities, key=lambda v: v[1])[:top]

# Subclass A
def top_similar_traces_filter_words(self, doc_id):
    return [w for w in self.corpus[doc_id].words if w in model]

# Subclass B
def top_similar_traces_filter_words(self, doc_id):
    return [w for w in self.corpus[doc_id] if w in model]
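By the way, I don't know where your model comes from, but it appears to be a global variable. You should probably avoid that and keep it on the class instead (or pass it in).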
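Both suggestions above (an overridable hook method, and a model stored on the instance rather than read from a global) can be combined in one small runnable sketch. The class names, the Doc type, and the use of a plain set in place of the gensim model are assumptions made for illustration only.

```python
from collections import namedtuple

# Hypothetical document type; only .words matters here.
Doc = namedtuple("Doc", ["words", "tags"])


class SimilarTracesBase:
    def __init__(self, model, corpus):
        self.model = model    # injected instead of read from a global
        self.corpus = corpus  # maps doc_id -> document

    def _doc_words(self, doc_id):
        # Hook: the only part that differs between subclasses.
        raise NotImplementedError

    def known_words(self, doc_id):
        # Shared logic, written once, calling the hook.
        return [w for w in self._doc_words(doc_id) if w in self.model]


class ListCorpusTraces(SimilarTracesBase):
    def _doc_words(self, doc_id):
        return self.corpus[doc_id]


class DocCorpusTraces(SimilarTracesBase):
    def _doc_words(self, doc_id):
        return self.corpus[doc_id].words
```

Anything that supports the in operator can be passed as model here, which also makes the shared method easy to test without loading real embeddings.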

You mentioned: "... I want to create a base class holding the general form of these functions and then inherit from it in each class."

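I'd like to point out that there is no need to create a class for this. A single function will do. In the example below I added a fourth parameter, words, with a default value of True. If it is left as True, the function uses the line that checks self.corpus[doc_id].words. If the function is called with words=False, it uses the line that checks self.corpus[doc_id].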

def top_similar_traces(self, stack_trace, top=10, words=True):
    words_to_test = StackTraceProcessor.preprocess(stack_trace)
    words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

    # Cos-similarity
    all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[[model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

    for i, (doc_id, rwmd_distance) in enumerate(distances):
        if words:
            doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
        else:
            doc_words_clean = [w for w in self.corpus[doc_id] if w in model]
        wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

    return sorted(similarities, key=lambda v: v[1])[:top]
To have the function check self.corpus[doc_id].words, call it like this:

top_similar_traces(<stack_trace>)
To have the function check self.corpus[doc_id], call it like this:

top_similar_traces(<stack_trace>, words=False)

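Please do not include links to external sites; post all relevant code as part of the question instead.

A slightly ugly solution: you could create two functions that produce the word list and pass one of them in as a function argument. Of course, they would have to take self.corpus[doc_id] as input and return a list.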
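The function-argument idea from the comment above could look roughly like the sketch below. The names clean_doc_words and get_words are hypothetical, and vocab is a plain set standing in for the model; this is one possible reading of the suggestion, not a definitive implementation.

```python
from operator import attrgetter


def clean_doc_words(corpus, doc_id, vocab, get_words=lambda doc: doc):
    """Return the words of one document that are present in vocab.

    get_words takes corpus[doc_id] and returns its list of words.
    The default treats the document itself as that list.
    """
    return [w for w in get_words(corpus[doc_id]) if w in vocab]
```

For a corpus whose documents are plain lists, call clean_doc_words(corpus, doc_id, vocab). For documents that carry a .words attribute, pass get_words=attrgetter("words").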