Python TfidfVectorizer: is conditional re-initialization possible?


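I am trying to conditionally re-initialize an object.

Suppose I have the following initialization:

 TfidfVectorizer(sublinear_tf=True, decode_error='ignore', analyzer='word', tokenizer=nltk.data.load('tokenizers/punkt/english.pickle'))

Now I get a dict from a user containing some parameters they want to add:

 d = {"stop_words": "english"}

How can I add the dict parameters to the already-initialized object, so that the final version of the object is equivalent to: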

TfidfVectorizer(
    stop_words='english',
    sublinear_tf=True,
    decode_error='ignore',
    analyzer='word',
    tokenizer=nltk.data.load('tokenizers/punkt/english.pickle'))
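
Can I do

TfidfVectorizer(**d)

Would this also preserve the previously initialized parameters? I want TfidfVectorizer to have some default settings, and then let the user choose the rest.

Is something like this possible?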

It looks like this can be done with set_params(), judging from this small experiment with set_params() and get_params():

from sklearn.feature_extraction.text import TfidfVectorizer

t = TfidfVectorizer()

t.get_params()
Out[23]: 
{'analyzer': u'word',
 'binary': False,
 'charset': None,
 'charset_error': None,
 'decode_error': u'strict',
 'dtype': numpy.int64,
 'encoding': u'utf-8',
 'input': u'content',
 'lowercase': True,
 'max_df': 1.0,
 'max_features': None,
 'min_df': 1,
 'ngram_range': (1, 1),
 'norm': u'l2',
 'preprocessor': None,
 'smooth_idf': True,
 'stop_words': None,
 'strip_accents': None,
 'sublinear_tf': False,
 'token_pattern': u'(?u)\\b\\w\\w+\\b',
 'tokenizer': None,
 'use_idf': True,
 'vocabulary': None}

t.set_params(binary=True)
Out[24]: 
TfidfVectorizer(analyzer=u'word', binary=True, charset=None,
        charset_error=None, decode_error=u'strict',
        dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), norm=u'l2', preprocessor=None, smooth_idf=True,
        stop_words=None, strip_accents=None, sublinear_tf=False,
        token_pattern=u'(?u)\\b\\w\\w+\\b', tokenizer=None, use_idf=True,
        vocabulary=None)

t.set_params(smooth_idf=False)
Out[25]: 
TfidfVectorizer(analyzer=u'word', binary=True, charset=None,
        charset_error=None, decode_error=u'strict',
        dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), norm=u'l2', preprocessor=None,
        smooth_idf=False, stop_words=None, strip_accents=None,
        sublinear_tf=False, token_pattern=u'(?u)\\b\\w\\w+\\b',
        tokenizer=None, use_idf=True, vocabulary=None)

d = {"stop_words":"english"}

t.set_params(**d)
Out[27]: 
TfidfVectorizer(analyzer=u'word', binary=True, charset=None,
        charset_error=None, decode_error=u'strict',
        dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), norm=u'l2', preprocessor=None,
        smooth_idf=False, stop_words='english', strip_accents=None,
        sublinear_tf=False, token_pattern=u'(?u)\\b\\w\\w+\\b',
        tokenizer=None, use_idf=True, vocabulary=None)
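
Applied to the situation in the question, the same idea might look like the sketch below. It assumes a reasonably recent scikit-learn on Python 3; the name user_params is a hypothetical stand-in for the user's dict, and the NLTK tokenizer from the question is left out so the example has no extra dependency.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Defaults chosen up front (tokenizer from the question omitted on purpose).
vectorizer = TfidfVectorizer(sublinear_tf=True, decode_error='ignore', analyzer='word')

# Hypothetical user-supplied overrides.
user_params = {"stop_words": "english"}

# set_params changes only the keys it is given and returns the same object.
result = vectorizer.set_params(**user_params)
params = vectorizer.get_params()
print(result is vectorizer)
print(params["stop_words"], params["sublinear_tf"], params["decode_error"])

# By contrast, constructing a new object from the dict alone falls back to
# the library defaults for everything that is not in the dict.
fresh = TfidfVectorizer(**user_params)
print(fresh.get_params()["sublinear_tf"])
```

So TfidfVectorizer(**d) on its own does not carry over the earlier settings, while set_params(**d) on the existing object does.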
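
For reference, set_params is implemented in scikit-learn's BaseEstimator roughly as follows (an older version of the source):

def set_params(self, **params):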
    """Set the parameters of this estimator.
    The method works on simple estimators as well as on nested objects
    (such as pipelines). The former have parameters of the form
    ``<component>__<parameter>`` so that it's possible to update each
    component of a nested object.
    Returns
    -------
    self
    """
    if not params:
        # Simple optimisation to gain speed (inspect is slow)
        return self
    valid_params = self.get_params(deep=True)
    for key, value in six.iteritems(params):
        split = key.split('__', 1)
        if len(split) > 1:
            # nested objects case
            name, sub_name = split
            if name not in valid_params:
                raise ValueError('Invalid parameter %s for estimator %s. '
                                 'Check the list of available parameters '
                                 'with `estimator.get_params().keys()`.' %
                                 (name, self))
            sub_object = valid_params[name]
            sub_object.set_params(**{sub_name: value})
        else:
            # simple objects case
            if key not in valid_params:
                raise ValueError('Invalid parameter %s for estimator %s. '
                                 'Check the list of available parameters '
                                 'with `estimator.get_params().keys()`.' %
                                 (key, self.__class__.__name__))
            setattr(self, key, value)
    return self
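
Two behaviours visible in that source can be checked with a short sketch: an unknown key raises ValueError, and a key of the form <component>__<parameter> is forwarded to a nested estimator. This assumes a recent scikit-learn; the step name "tfidf" and the misspelled key are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

vectorizer = TfidfVectorizer(sublinear_tf=True)

# An unknown key (here a deliberate typo for stop_words) is rejected,
# which is useful when the dict comes from a user.
try:
    vectorizer.set_params(stopwords="english")
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

# Nested case: "<step name>__<parameter>" reaches the estimator in the pipeline.
pipe = Pipeline([("tfidf", TfidfVectorizer())])
pipe.set_params(tfidf__stop_words="english")
nested_value = pipe.named_steps["tfidf"].stop_words
print(nested_value)
```

Catching the ValueError is one way to report bad user input; filtering the dict against get_params().keys() beforehand would be another.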