Machine learning: Why does stemming give better results than lemmatization?


machine-learning nlp


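I'm using the IMDB dataset and ClassificationTree. I tried both stemming (PorterStemming) and lemmatization (WordNetLemmatizer), and my results show stemming outperforming lemmatization, yet in papers people always use lemmatization. So I'd like to know: why do my experiments show stemming doing better?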

Results:

STEMMING
Classification Report: 
               precision    recall  f1-score   support

    Negative       0.70      0.71      0.70      4945
    Positive       0.71      0.70      0.71      5055

    accuracy                           0.70     10000
   macro avg       0.71      0.71      0.70     10000
weighted avg       0.71      0.70      0.71     10000

Confusion Matrix: 
 [[3504 1441]
 [1509 3546]]
Accuracy: 0.705



LEMMATIZATION
Classification Report: 
               precision    recall  f1-score   support

    Negative       0.68      0.70      0.69      4945
    Positive       0.70      0.69      0.69      5055

    accuracy                           0.69     10000
   macro avg       0.69      0.69      0.69     10000
weighted avg       0.69      0.69      0.69     10000

Confusion Matrix: 
 [[3441 1504]
 [1589 3466]]
Accuracy: 0.6907

Have you run a statistical significance test? These values look very close. Is stemming actually better, or does it only look that way?
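
One way to check this is sketched below. It is a minimal illustration, not the asker's code: the function names `two_proportion_z_test` and `mcnemar_test` are hypothetical helpers written for this example, and only the Python standard library is used. The first function compares the two accuracies using the counts from the confusion matrices above. It treats the two runs as independent samples, which is only an approximation, because both models were scored on the same 10000 reviews. The second function is a McNemar test with continuity correction, which is the more appropriate choice for paired predictions, but it needs the per-example predictions, which are not shown in the question.

```python
import math


def two_proportion_z_test(correct_a, correct_b, n_a, n_b):
    """Two-sided z-test for the difference between two accuracies.

    Assumes the two test samples are independent (an approximation here).
    Returns (z statistic, p-value).
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar test (with continuity correction) on paired predictions.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    # b: A right and B wrong; c: A wrong and B right.
    b = sum(1 for t, a, c_ in zip(y_true, pred_a, pred_b) if a == t and c_ != t)
    c = sum(1 for t, a, c_ in zip(y_true, pred_a, pred_b) if a != t and c_ == t)
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree on correctness
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Correct predictions are the diagonals of the confusion matrices above.
stem_correct = 3504 + 3546
lemma_correct = 3441 + 3466
z, p = two_proportion_z_test(stem_correct, lemma_correct, 10000, 10000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

On the reported counts this comes out at roughly z ≈ 2.2 (two-sided p ≈ 0.03), so the gap is not obviously noise, but it is small. Since the unpaired test ignores that both models saw the same test set, running `mcnemar_test` on the actual predictions would give a more trustworthy answer.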