Computing an array of unique stems in JavaScript?
In JavaScript, I have an array of strings:
[
"test",
"tests",
"abc",
"abcdef"
]
I want to create a new array that contains only the unique stems of those strings. For example, the array above would be reduced to
[
"test",
"abc"
]
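...because "test" is the stem of "tests", and "abc" is the stem of "abcdef".
What is the simplest way to do this?
The simplest way is a loop. First, I suggest sorting the words by length, so you can do this: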
var myArray = ["test", "tests", "abc", "abcdef"];
//this sorts from smallest to largest
myArray.sort(function(a,b){return a.length - b.length});
So myArray is now sorted from shortest to longest. Next, loop over each element and check whether it is a stem of any later element:
//this is the array where we will store the stems
var stemArray = [];
//the temporary stem goes here
var stem;
//this variable is used to capture a substring from each string
//to check against the stem variable
var check;
//loop over all the elements except the last
//since they are ordered from shortest to longest, we are guaranteed that
//the last element has nothing after it to be a stem of
//and thus we can avoid that check
for (var i = 0; i < myArray.length - 1; i++){
    //set the current stem
    stem = myArray[i];
    //then loop over the remaining elements
    for (var j = i+1; j < myArray.length; j++){
        //capture a substring
        //so for example, if stem = "abc" and the element we're testing against
        //is "test", check will be equal to "tes" so its size matches "abc"
        check = myArray[j].substring(0,stem.length);
        //if stem and check are the same
        //and since you wanted unique we check if it is unique
        //alternatively, we could just break inside the next
        //conditional statement and it would be more efficient
        //and thus remove the indexOf test
        //but I include it to explain the logic of how the algorithm works
        if (stem === check){
            if (stemArray.indexOf(stem) === -1){
                //add the verified stem to the array of stems
                stemArray.push(stem);
            }
            //In the case of ["t", "te", "test"], the above code
            //will output ["t", "te"]
            //if you want it to output just ["t"], uncomment the following
            //myArray.splice(j,1);
            //j--;
            //these lines will remove the duplicate from myArray
            //and correct the iteration due to the deletion
        }
    }
}
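The comments in the loop above mention that breaking out of the inner loop on the first match would make the indexOf test unnecessary. A minimal sketch of that variant follows. The function name findStems is hypothetical, and the sketch assumes the input contains no duplicate strings (with duplicates, the same stem could be pushed more than once).

```javascript
// Sketch of the "break" variant described in the comments above.
// findStems is a hypothetical name; assumes no duplicate strings in the input.
function findStems(words) {
  // Copy before sorting so the caller's array is left untouched.
  var sorted = words.slice().sort(function (a, b) {
    return a.length - b.length;
  });
  var stems = [];
  for (var i = 0; i < sorted.length - 1; i++) {
    var stem = sorted[i];
    for (var j = i + 1; j < sorted.length; j++) {
      // Compare the stem against the leading characters of a later word.
      if (sorted[j].substring(0, stem.length) === stem) {
        stems.push(stem);
        break; // one match is enough, so no indexOf check is needed
      }
    }
  }
  return stems;
}
```

Like the loop above, this only reports words that are a prefix of at least one other word, so a word with no longer variant in the array is not returned.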
"Simple" is, of course, relative.
Taking the simple view that a stem always matches the first characters of a longer word, sort the array so that stems come before their longer words (e.g. "test" before "tests"), then iterate over the array, test each member against the members that follow it, and remove those that are extensions of the stem, e.g.
function getStems(arr) {
    var a = arr.slice().sort();
    var stem, len;
    for (var i=0, iLen=a.length - 1; i<iLen; i++) {
        stem = a[i];
        len = stem.length;
        while (i<iLen && stem == a[i+1].substring(0,len)) {
            a.splice(i+1, 1);
            --iLen;
        }
    }
    return a;
}
var a = ["test","tests","abcdef","abcf","abcdefqw","abc"];
alert(getStems(a)); // alerts "abc,test"
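The same sort-then-scan idea can also be written without splicing the array in place. The following is a sketch only: uniqueStems is a hypothetical name, and it relies on the default lexicographic sort placing every extension of a word directly after that word.

```javascript
// Sketch: single pass over a sorted copy, building a new result array.
// uniqueStems is a hypothetical name, not part of the answer above.
function uniqueStems(words) {
  var sorted = words.slice().sort();
  var result = [];
  for (var i = 0; i < sorted.length; i++) {
    var last = result[result.length - 1];
    // Skip the word if it begins with the most recently kept stem.
    if (last !== undefined && sorted[i].substring(0, last.length) === last) {
      continue;
    }
    result.push(sorted[i]);
  }
  return result;
}
```

Comparing only against the last kept word is enough here because, after sorting, a word that extends an earlier stem cannot appear after a word that does not extend it.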
You will also need underscore.string [https://github.com/epeli/underscore.string]
// make a new array that is the old array without...
// (_.reject takes a predicate; _.without only accepts literal values to remove)
var myArrayWithNoStems = _.reject(myArray, function(word) {
    // ...anything that matches the filter
    return _.find(myArray, function(word2) {
        // if this word wholly includes another word, return true
        // (_.str.include matches anywhere in the word, not only at the start)
        if (word != word2 && _.str.include(word, word2)) return true;
    });
});
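For comparison, the same filtering idea can be sketched in plain JavaScript with the ES5 array methods filter and some, using a prefix test rather than the substring test that _.str.include performs. The function name withoutExtensions is hypothetical and is not part of underscore or underscore.string.

```javascript
// Sketch: keep only the words that do not start with another word in the list.
// withoutExtensions is a hypothetical name used for illustration.
function withoutExtensions(words) {
  return words.filter(function (word) {
    var extendsAnother = words.some(function (other) {
      // A different word that matches the leading characters of this word.
      return other !== word && word.substring(0, other.length) === other;
    });
    return !extendsAnother;
  });
}
```

This version keeps the original order of the input and compares every pair of words, so it does quadratic work; exact duplicate strings are all kept, since a word is never compared against an equal string.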
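I'm not the downvoter and I don't know the reason for the downvote, but have you looked at any stemming algorithms in js/jquery?
What result would you expect for ["testa", "testb", "tesa", "tea", "ta", "t"]?
I added a fix for the case where you want to get ["a"] from ["a", "any", "anywhere"].
Based on his comment on the question, that is actually what he wants :)
A closing parenthesis is missing before return true;, as is the closing parenthesis for the _.find call. I couldn't make the edit myself, because it didn't count as a big enough change to be a legitimate edit :(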