JavaScript: Remove stop words from a sentence

javascript json regex reactjs

I have sentences, and each sentence has been split on spaces into an array of words.

My data output looks like this:

const escapeRE = new RegExp(/([/\?""])/g);
const myDatas = data.map(des => des.Sentence.toLowerCase().replace(escapeRE, '').split(' '));

[ [ 'yes',
'keep',
'go',
'apple',
'tabacco',
'javascript',
'no',
'uhh',
'omg',
'hi.' ],
['say',
'hello',
'me',
'allright',
'maybe',
'mi',
'say.' 
....] ]

I also have a stop words JSON file.

Contents of the stop words JSON file:

['yes',
'hi',
'so',
'say',
'me',
'uhh',
'omg',
'go',
'hello',
'hi' 
 ...]

So I want to remove the stop words from the arrays of words. What I want is clean sentences with no stop words in them.

The stopwords definition:

const stopwords = require('./stop_words.json');

How can I do this? I have no idea. I tried myDatas.replace('stopwords', ''), but it did not work.

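You can use the jQuery grep function to achieve this. Because myDatas is an array of word arrays, apply it to each inner array, like below:

    var withoutStopWords = myDatas.map(function (sentence) {
        // Keep only the words that are not in the stop word list
        return jQuery.grep(sentence, function (word) {
            return stopwords.indexOf(word) < 0;
        });
    });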

You can use an Array prototype method like this:
Array.prototype.diff = function(stopwords) {
    return this.filter(function(word) {
        var punctuationlessWord = word.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");
        return stopwords.indexOf(punctuationlessWord) < 0;
    });
};

var myDatas = [['yes', 'keep', 'go', 'apple', 'tabacco', 'javascript', 'no', 'uhh', 'omg', 'hi.'],
               ['say', 'hello', 'me', 'allright', 'maybe', 'mi', 'say.']];
var stopwords = ['yes', 'hi', 'so', 'say', 'me', 'uhh', 'omg', 'go', 'hello', 'hi'];

// Uses the Array.prototype.diff method defined above.
// Replace each inner array with its stop-word-free version.
myDatas.forEach(function (part, index, theArray) {
    theArray[index] = theArray[index].diff(stopwords);
});

console.log(myDatas);
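
If you would rather not extend Array.prototype, the same idea can be sketched as a plain function. This is only an illustration: the names withoutStopWords, PUNCTUATION, sampleSentences and sampleStopWords are made up here, and the sample data is a shortened version of the arrays from the question. A Set is used for the lookup, which is an optional substitution for indexOf.

```javascript
// Hypothetical helper names; punctuation pattern copied from the answer above.
const PUNCTUATION = /[.,\/#!$%\^&\*;:{}=\-_`~()]/g;

// Returns a new array of word arrays with the stop words filtered out.
// The input arrays are not modified.
function withoutStopWords(sentences, stopWordList) {
  const stopSet = new Set(stopWordList); // constant-time membership checks
  return sentences.map(words =>
    // Compare the punctuation-free form, but keep the original word
    words.filter(word => !stopSet.has(word.replace(PUNCTUATION, '')))
  );
}

// Shortened sample data based on the question
const sampleSentences = [
  ['yes', 'keep', 'go', 'apple', 'hi.'],
  ['say', 'hello', 'me', 'maybe', 'say.'],
];
const sampleStopWords = ['yes', 'hi', 'so', 'say', 'me', 'go', 'hello'];

console.log(withoutStopWords(sampleSentences, sampleStopWords));
// [ [ 'keep', 'apple' ], [ 'maybe' ] ]
```

Returning a new array instead of patching a built-in prototype avoids name clashes with other libraries, at the cost of a slightly longer call.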
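The first thing that comes to mind is that you can write a recursive function that iterates over the sentence array and simply checks whether each word is in the stopWords array, like this: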
function removeStopWords(sentenceArray, stopWords, result = []) {
    sentenceArray.forEach((sentence) => {
        if (Array.isArray(sentence)) {
            result = removeStopWords(sentence, stopWords, result);
        } else if (!stopWords.includes(sentence)) {
            result = result.concat(sentence)
        }
    });

    return result;
}
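
Note that the recursive function above collects every remaining word into one flat array. If the per-sentence grouping should be kept, a variant along these lines might work. It is a sketch only: removeStopWordsNested and the sample data are hypothetical names introduced for illustration.

```javascript
// Hypothetical variant: recurses like the function above,
// but rebuilds the nesting instead of flattening it.
function removeStopWordsNested(items, stopWords) {
  return items.reduce((kept, item) => {
    if (Array.isArray(item)) {
      // Recurse into inner arrays and keep them as arrays
      kept.push(removeStopWordsNested(item, stopWords));
    } else if (!stopWords.includes(item)) {
      kept.push(item);
    }
    return kept;
  }, []);
}

// Shortened sample data based on the question
const nestedSample = [
  ['yes', 'keep', 'go', 'apple'],
  ['say', 'hello', 'maybe'],
];

console.log(removeStopWordsNested(nestedSample, ['yes', 'go', 'say', 'hello']));
// [ [ 'keep', 'apple' ], [ 'maybe' ] ]
```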

Here is an ES6 solution:
  myDatas.map(des => des.filter(word => stopwords.indexOf(word) < 0));
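But I only use React with ES6. Maybe something like this ES6 syntax: let filtered = myDatas.filter(e => this.indexOf(e) < 0, stopwords);
Thanks. I solved my problem with your help. This is the ES6 solution. I think the best practice for this problem, for me, is myDatas.map(des => des.filter(word => stopwords.indexOf(word) < 0));
If ES6 is supported, then that is fine.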
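
Since the goal in the question was clean sentences, the whole flow from raw rows to stop-word-free strings could be sketched as follows. The Sentence field comes from the question's code. Everything else (rawData, stopList, cleanedSentences, the sample rows, and the choice to strip every character that is not an ASCII letter, digit, or whitespace) is an assumption made for illustration.

```javascript
// Hypothetical sample rows; only the `Sentence` field name comes from the question.
const rawData = [
  { Sentence: 'Yes keep go apple hi.' },
  { Sentence: 'Say hello me maybe?' },
];

// Hypothetical stop word list, stored in a Set for fast lookups
const stopList = new Set(['yes', 'hi', 'say', 'me', 'go', 'hello']);

const cleanedSentences = rawData.map(row =>
  row.Sentence
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')                  // assumption: ASCII-only text
    .split(/\s+/)                                 // split on any run of whitespace
    .filter(word => word && !stopList.has(word))  // drop empty strings and stop words
    .join(' ')                                    // back to a sentence string
);

console.log(cleanedSentences);
// [ 'keep apple', 'maybe' ]
```

Stripping punctuation before splitting means a word such as 'hi.' is matched against the stop list as 'hi', which the plain indexOf version would miss.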