JavaScript: Removing stop words from a sentence (tags: javascript, json, regex, reactjs, stop-words)


I have sentences, but each sentence has been split on spaces into an array of words.

My data output looks like this:

const escapeRE = new RegExp(/([/\?""])/g);
const myDatas = data.map(des => des.Sentence.toLowerCase().replace(escapeRE, '').split(' '));

[ [ 'yes',
'keep',
'go',
'apple',
'tabacco',
'javascript',
'no',
'uhh',
'omg',
'hi.' ],
['say',
'hello',
'me',
'allright',
'maybe',
'mi',
'say.' 
....] ]
Then I have a stop words JSON file.

Contents of the stop words JSON file:

['yes',
'hi',
'so',
'say',
'me',
'uhh',
'omg',
'go',
'hello',
'hi' 
 ...]
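So I want to remove the stop words from the sentence arrays. What I want is the plain sentences, without the stop words.

The stopwords definition:

const stopwords = require('./stop_words.json');

So what should I do? I have no idea. I tried myDatas.replace('stopwords', ''), but it didn't work.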

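You can use the jQuery grep function to achieve your goal. Since myDatas is an array of arrays, apply it to each inner array, like this:

var withoutStopWords = myDatas.map(function(words) {
    // Keep only the words that are not in the stop word list.
    return jQuery.grep(words, function(element, index) {
        return stopwords.indexOf(element) < 0;
    });
});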

You can use an Array prototype method like this:

Array.prototype.diff = function(stopwords) {
    return this.filter(function(word) {
        var punctuationlessWord = word.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "");
        return stopwords.indexOf(punctuationlessWord) < 0;
    });
};

var myDatas = [
    ['yes', 'keep', 'go', 'apple', 'tabacco', 'javascript', 'no', 'uhh', 'omg', 'hi.'],
    ['say', 'hello', 'me', 'allright', 'maybe', 'mi', 'say.']
];
var stopwords = ['yes', 'hi', 'so', 'say', 'me', 'uhh', 'omg', 'go', 'hello', 'hi'];

// Replace each sentence array with its filtered version,
// using the diff method defined above.
myDatas.forEach(function(part, index, array) {
    array[index] = array[index].diff(stopwords);
});

console.log(myDatas);
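
Extending Array.prototype affects every array in the program, so some codebases avoid it. As a minimal sketch of the same filter written as a plain function (the name withoutStopWords is hypothetical and not part of the answer; the sample data is copied from the question):

```javascript
// Sample data copied from the question.
const myDatas = [
  ['yes', 'keep', 'go', 'apple', 'tabacco', 'javascript', 'no', 'uhh', 'omg', 'hi.'],
  ['say', 'hello', 'me', 'allright', 'maybe', 'mi', 'say.'],
];
const stopwords = ['yes', 'hi', 'so', 'say', 'me', 'uhh', 'omg', 'go', 'hello', 'hi'];

// Hypothetical helper: same logic as diff, but as a standalone function.
function withoutStopWords(words, stopList) {
  return words.filter((word) => {
    // Strip punctuation first so that 'hi.' is compared as 'hi'.
    const bare = word.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, '');
    return stopList.indexOf(bare) < 0;
  });
}

// map() builds a new nested array and leaves myDatas unchanged.
const cleaned = myDatas.map((words) => withoutStopWords(words, stopwords));

console.log(cleaned);
// [ [ 'keep', 'apple', 'tabacco', 'javascript', 'no' ],
//   [ 'allright', 'maybe', 'mi' ] ]
```

Unlike the forEach version, this leaves the original myDatas intact, which may be preferable if the unfiltered sentences are still needed.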
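
The first thing that comes to mind is that you could create a recursive function that iterates over the sentence array and simply checks whether each word is in the stopWords array, like this: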

function removeStopWords(sentenceArray, stopWords, result = []) {
    sentenceArray.forEach((sentence) => {
        if (Array.isArray(sentence)) {
            result = removeStopWords(sentence, stopWords, result);
        } else if (!stopWords.includes(sentence)) {
            result = result.concat(sentence)
        }
    });

    return result;
}
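
Note that the recursive function returns one flat array of words rather than one array per sentence, and that it compares words exactly, so 'hi.' and 'say.' are kept. If that is the desired result, a rough equivalent, assuming an environment that supports Array.prototype.flat (ES2019), could look like this (the variable name remaining is only illustrative):

```javascript
// Sample data copied from the question.
const myDatas = [
  ['yes', 'keep', 'go', 'apple', 'tabacco', 'javascript', 'no', 'uhh', 'omg', 'hi.'],
  ['say', 'hello', 'me', 'allright', 'maybe', 'mi', 'say.'],
];
const stopwords = ['yes', 'hi', 'so', 'say', 'me', 'uhh', 'omg', 'go', 'hello', 'hi'];

// flat(Infinity) collapses any depth of nesting into a single array of words,
// then filter() drops the exact matches from the stop word list.
const remaining = myDatas.flat(Infinity).filter((word) => !stopwords.includes(word));

console.log(remaining);
// [ 'keep', 'apple', 'tabacco', 'javascript', 'no', 'hi.',
//   'allright', 'maybe', 'mi', 'say.' ]
```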

Here is the ES6 solution:

myDatas.map(des => des.filter(word => stopwords.indexOf(word) < 0));
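
If the stop word list is long, indexOf scans it for every word. One possible variation, sketched here with the question's sample data, is to put the stop words in a Set first (stopwordSet and filtered are illustrative names, not from the answer):

```javascript
// Sample data copied from the question.
const myDatas = [
  ['yes', 'keep', 'go', 'apple', 'tabacco', 'javascript', 'no', 'uhh', 'omg', 'hi.'],
  ['say', 'hello', 'me', 'allright', 'maybe', 'mi', 'say.'],
];
const stopwords = ['yes', 'hi', 'so', 'say', 'me', 'uhh', 'omg', 'go', 'hello', 'hi'];

// A Set removes the duplicate 'hi' and makes each lookup a hash lookup.
const stopwordSet = new Set(stopwords);

// Same shape as the one-liner above: one filtered array per sentence.
const filtered = myDatas.map((des) => des.filter((word) => !stopwordSet.has(word)));

console.log(filtered);
// [ [ 'keep', 'apple', 'tabacco', 'javascript', 'no', 'hi.' ],
//   [ 'allright', 'maybe', 'mi', 'say.' ] ]
```

As with the one-liner, words are compared exactly, so tokens that still carry punctuation ('hi.', 'say.') are not removed.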

But I am only using React with ES6.

Maybe something like the following ES6 syntax:
let filtered = myDatas.filter(e => stopwords.indexOf(e) < 0);
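Thanks. I solved it using your answer. Here is the ES6 solution. I think the best practice for this problem, at least for me, is:
myDatas.map(des => des.filter(word => stopwords.indexOf(word) < 0));
If ES6 is supported, then that works fine.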