Python: singular value decomposition (svds) gives different results on every run


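I am trying to do text summarization with svds, but the summary changes every time I run the function. Can anyone tell me why this happens and how to fix it? I even checked the individual arrays u, s and v, and they also change on every run. How can I make them stay fixed? The computation of the sentence matrix, followed by the svds code, is shown below. The dataset is a set of legal documents from the Australian Supreme Court.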

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.preprocessing import normalize


def _compute_matrix(sentences, weighting, norm):
    weighting = weighting.lower()
    if weighting == 'binary':
        vectorizer = CountVectorizer(min_df=1, ngram_range=(1, 1),
                                     binary=True, stop_words=None)
    elif weighting == 'frequency':
        vectorizer = CountVectorizer(min_df=1, ngram_range=(1, 1),
                                     binary=False, stop_words=None)
    elif weighting == 'tfidf':
        vectorizer = TfidfVectorizer(min_df=1, ngram_range=(1, 1),
                                     stop_words=None)
    else:
        raise ValueError('Parameter "weighting" must take one of the values '
                         '"binary", "frequency" or "tfidf".')

    # Extract word features from sentences using sparse vectorizer
    frequency_matrix = vectorizer.fit_transform(sentences).astype(float)

    # get_feature_names() was removed in recent scikit-learn releases;
    # on old versions, call vectorizer.get_feature_names() instead.
    terms = vectorizer.get_feature_names_out()

    if norm in ('l1', 'l2'):
        frequency_matrix = normalize(frequency_matrix, norm=norm, axis=1)
    elif norm is not None:
        raise ValueError('Parameter "norm" can only take values "l1", "l2" '
                         'or None')

    return frequency_matrix, terms

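import numpy as np
from scipy.sparse.linalg import svds

processed_sentences = _createsentences(raw_content)
sentence_matrix, feature_names = _compute_matrix(processed_sentences,
                                                 weighting='tfidf', norm='l2')
# Transpose to a term-by-sentence matrix (rows: terms, columns: sentences)
sentence_matrix = sentence_matrix.transpose()
sentence_matrix = sentence_matrix.multiply(sentence_matrix > 0)
print(sentence_matrix.shape)

# This is the call whose output changes from run to run
u, s, v = svds(sentence_matrix, k=20)
topic_sigma_threshold = 0.5
topic_averages = v.mean(axis=1)

# Within each topic, zero out sentences that score at or below the topic mean
for topic_ndx, topic_avg in enumerate(topic_averages):
    v[topic_ndx, v[topic_ndx, :] <= topic_avg] = 0

# The original test (1 <= x < 0) could never be true, so it never fired
if not 0 <= topic_sigma_threshold <= 1:
    raise ValueError('Parameter topic_sigma_threshold must take a value '
                     'between 0 and 1')

# Drop topics whose singular value is small relative to the largest one
sigma_threshold = max(s) * topic_sigma_threshold
s[s < sigma_threshold] = 0

saliency_vec = np.dot(np.square(s), np.square(v))

# Indices of the 25 highest-scoring sentences, restored to document order
top_sentences = saliency_vec.argsort()[-25:][::-1]
top_sentences.sort()

[processed_sentences[i] for i in top_sentences]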
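The scoring step at the end of this snippet can be checked by hand on a toy matrix. The sketch below is only an illustration: the 4-by-3 matrix and the helper name sentence_saliency are made up, it uses the dense np.linalg.svd in place of svds, and it leaves out the per-topic mean thresholding of v.

```python
import numpy as np

# Hypothetical term-by-sentence matrix (rows: terms, columns: sentences).
term_sentence = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])


def sentence_saliency(a, sigma_ratio=0.5):
    """Score column j of `a` as sum_k s_k^2 * vt[k, j]^2 over the kept topics."""
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    # Zero out topics whose singular value is below sigma_ratio * max(s).
    s = np.where(s < sigma_ratio * s.max(), 0.0, s)
    return np.square(s) @ np.square(vt)


scores = sentence_saliency(term_sentence)
ranking = np.argsort(scores)[::-1]  # sentence indices, highest score first
print(scores, ranking)
```

Because the vectors enter the score only through their squares, the score itself does not depend on the sign of a singular vector. The comparison of v against the topic mean in the question's code does depend on it.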

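I found a solution by experimenting with the parameters of svds and reading its source code. svds starts its iteration from a random initial vector whose length is determined by the dimensions of the sparse matrix. To make that starting vector a fixed choice, we have to pass it explicitly through the v0 parameter, as in the code below: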
np.random.seed(0)
v0 = np.random.rand(min(sentence_matrix.shape))

u, s, v = svds(sentence_matrix, k=20, v0=v0)
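To see the effect of a fixed starting vector in isolation, here is a minimal, self-contained sketch. It makes some assumptions: the random sparse matrix is only a stand-in for the TF-IDF sentence matrix, fixed_start_svds is a hypothetical helper name, and np.random.default_rng is used instead of the global np.random.seed so that the seed does not affect other code.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the term-by-sentence matrix from the question.
matrix = sparse_random(200, 60, density=0.1, format="csr", random_state=42)


def fixed_start_svds(a, k, seed=0):
    """Run svds from a reproducible starting vector of length min(M, N)."""
    rng = np.random.default_rng(seed)
    v0 = rng.random(min(a.shape))
    return svds(a, k=k, v0=v0)


u1, s1, vt1 = fixed_start_svds(matrix, k=5)
u2, s2, vt2 = fixed_start_svds(matrix, k=5)

# Two runs from the same starting vector should agree on all three factors.
same_factors = (np.allclose(u1, u2) and np.allclose(s1, s2)
                and np.allclose(vt1, vt2))

# Sanity check: the singular values should match the largest ones
# computed by a dense SVD of the same matrix.
dense_top = np.linalg.svd(matrix.toarray(), compute_uv=False)[:5]
matches_dense = np.allclose(np.sort(s1)[::-1], dense_top)
print(same_factors, matches_dense)
```

The values are sorted before the comparison with the dense result, so the check does not rely on the order in which svds returns them.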

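If your question is why the first line of the code is non-deterministic, that cannot be answered without knowing what svds and the sentence_matrix variable are. In that sense, the rest of the snippet does not help. Please share all the relevant code and data. See: .

I have added the rest of the code snippet. Hopefully that makes my question clear. The function _createsentences is just a sentence tokenizer from NLTK.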