Java: Explain the use of a bit vector for determining if all characters are unique
I am confused about how a bit vector would work to do this (I am not too familiar with bit vectors). Here is the code given. Could someone please walk me through it?
public static boolean isUniqueChars(String str) {
int checker = 0;
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a';
if ((checker & (1 << val)) > 0) return false;
checker |= (1 << val);
}
return true;
}
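int checker is used here as storage for bits. Every bit in the integer value can be treated as a flag, so in the end the int is an array of bits (flags). Each bit in your code states whether the character with that bit's index was found in the string. You could use a bit vector instead of int for the same purpose. There are two differences between them: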
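- Size. An int has a fixed size; in Java it is always 4 bytes, which means 8*4=32 bits (flags). A bit vector can usually have a different size, or you specify the size in the constructor.
- API. With a bit vector you will have easier-to-read code, probably something like this:
vector.SetFlag(4, true); // set the flag at index 4 to true
With int you will have lower-level bit logic code: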
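checker |= (1 << val); // set the flag at index val to true

I have a sneaking suspicion you got this code from the same book I am reading. The code itself is not nearly as cryptic as the operators it uses: |=, & and <<.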
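To make the three operators less cryptic, here is a minimal sketch that applies each of them to a small value and prints the result in binary. The class name BitOperatorsDemo and the helper bits are made up for this illustration and do not come from the book or the answers.

```java
public class BitOperatorsDemo {
    // Hypothetical helper: the 32-bit binary form of value, left-padded with zeros.
    static String bits(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int mask = 1 << 3;   // << moves the single 1 three places to the left
        int checker = 0;
        checker |= mask;     // |= switches bit 3 on and leaves every other bit alone

        System.out.println(bits(mask));
        System.out.println(bits(checker));
        System.out.println((checker & mask) != 0);      // & keeps only shared bits: bit 3 is set
        System.out.println((checker & (1 << 4)) != 0);  // bit 4 was never set
    }
}
```

The same three steps (build a mask, test it with &, record it with |=) are all that isUniqueChars does for each character.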
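I also assume that your example comes from the same book, and my answer is related to this context.
In order to use this algorithm to solve the problem, we have to accept that we are only going to pass characters from a to z (lowercase).
As there are only 26 letters and they are properly ordered in the encoding table we use, this guarantees that every possible difference str.charAt(i) - 'a' will be lower than 32 (the size in bits of the int variable checker).
As explained by Snowbear, we are about to use the checker variable as an array of bits. Let's take an approach by example: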
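Let's say str equals "test".
- First pass (i = t)
checker == 0 (00000000000000000000000000000000)
In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19
What about 1 << val?

There are already a couple of excellent answers above, so I do not want to repeat everything that has been said. But I did want to add a few things to help with the program above, since I just worked through the same program and had a couple of questions; after spending some time on it, I understand it better.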
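First of all, "checker" is used to keep track of the characters already traversed in the string, in order to see whether any character is repeated.
Now "checker" is of type int, which in Java is always 32 bits (4 bytes), so this program can only work correctly for a character set within a range of 32 characters. That is why the program subtracts 'a' from each character: it restricts the program to lowercase characters. If you mix lowercase and uppercase characters, it will not work.
By the way, if you do not subtract 'a' from each character (that is, if val is simply str.charAt(i)), the program will work correctly only for strings of uppercase characters or only for strings of lowercase characters. So the scope of the program grows from lowercase characters to uppercase characters as well, but the two cannot be mixed.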
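The reason the unshifted version still behaves for a single case is that Java uses only the low five bits of the shift distance when the left operand is an int. The sketch below, with a made-up class name ShiftMaskingDemo and helper maskWithoutOffset, shows that 'a' (97) and 'A' (65) end up on the same bit, which is why mixed-case input produces false duplicates.

```java
public class ShiftMaskingDemo {
    // Hypothetical helper: the bit isUniqueChars would use if "- 'a'" were removed.
    static int maskWithoutOffset(char c) {
        // c is promoted to int; for an int shift only (c & 31) is used as the distance
        return 1 << c;
    }

    public static void main(String[] args) {
        System.out.println(maskWithoutOffset('a') == (1 << 1)); // true: 97 & 31 == 1
        System.out.println(maskWithoutOffset('A') == (1 << 1)); // true: 65 & 31 == 1
        System.out.println(maskWithoutOffset('b') == (1 << 2)); // true: 98 & 31 == 2
    }
}
```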
00000000000000000000000000000001 a 2^0
00000000000000000000000000000010 b 2^1
00000000000000000000000000000100 c 2^2
00000000000000000000000000001000 d 2^3
00000000000000000000000000010000 e 2^4
00000000000000000000000000100000 f 2^5
00000000000000000000000001000000 g 2^6
00000000000000000000000010000000 h 2^7
00000000000000000000000100000000 i 2^8
00000000000000000000001000000000 j 2^9
00000000000000000000010000000000 k 2^10
00000000000000000000100000000000 l 2^11
00000000000000000001000000000000 m 2^12
00000000000000000010000000000000 n 2^13
00000000000000000100000000000000 o 2^14
00000000000000001000000000000000 p 2^15
00000000000000010000000000000000 q 2^16
00000000000000100000000000000000 r 2^17
00000000000001000000000000000000 s 2^18
00000000000010000000000000000000 t 2^19
00000000000100000000000000000000 u 2^20
00000000001000000000000000000000 v 2^21
00000000010000000000000000000000 w 2^22
00000000100000000000000000000000 x 2^23
00000001000000000000000000000000 y 2^24
00000010000000000000000000000000 z 2^25
a      =00000000000000000000000000000001
checker=00000000000000000000000000000000
a and checker=0 no dupes condition
checker=a or checker;
// checker now becomes = 00000000000000000000000000000001
checker=00000000000000000000000000000001
z =00000010000000000000000000000000
z and checker=0 no dupes
checker=z or checker;
// checker now becomes 00000010000000000000000000000001
checker= 00000010000000000000000000000001
y = 00000001000000000000000000000000
checker and y=0 no dupes condition
checker= checker or y;
// checker now becomes = 00000011000000000000000000000001
checker= 00000011000000000000000000000001
a = 00000000000000000000000000000001
a and checker=1 we have a dupe
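The walkthrough above (a, then z, then y, then a again) can be replayed in code. This is only a sketch under the same assumption of lowercase a to z input; the class CheckerTrace and its methods are names invented for the illustration.

```java
public class CheckerTrace {
    // Hypothetical helper: the 32-bit binary form of value, left-padded with zeros.
    static String bits(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    // Returns the index of the first repeated character, or -1 if none repeats.
    // Prints checker after every character that was not seen before.
    static int firstDuplicateIndex(String str) {
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int mask = 1 << (str.charAt(i) - 'a');
            if ((checker & mask) != 0) {
                return i; // this character's bit is already set
            }
            checker |= mask;
            System.out.println(str.charAt(i) + " -> checker = " + bits(checker));
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicateIndex("azya")); // 3: the second 'a'
    }
}
```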
public static void main (String[] args)
{
//In order to understand this algorithm, it is necessary to understand the following:
//int checker = 0;
//Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0
//Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with
//int val = str.charAt(i) - 'a';
//In order to understand what is going on here, we must realize that all characters have a numeric value
for (int i = 0; i < 256; i++)
{
char val = (char)i;
System.out.print(val);
}
//The output is something like:
// !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
//There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead
//To only print the characters from 'a' on forward:
System.out.println();
System.out.println();
for (int i=0; i < 256; i++)
{
char val = (char)i;
//char val2 = val + 'a'; //incompatible types. required: char found: int
int val2 = val + 'a'; //shift to the 'a', we must use an int here otherwise the compiler will complain
char val3 = (char)val2; //convert back to char. there should be a more elegant way of doing this.
System.out.print(val3);
}
//Notice how the following does not work:
System.out.println();
System.out.println();
for (int i=0; i < 256; i++)
{
char val = (char)i;
int val2 = val - 'a';
char val3 = (char)val2;
System.out.print(val3);
}
//I'm not sure why this spills out into 2 lines:
//EDIT I cant seem to copy this into stackoverflow!
System.out.println();
System.out.println();
//So back to our original algorithm:
//int val = str.charAt(i) - 'a';
//Subtracting 'a' turns the i'th character of the String into an index: 'a' becomes 0, 'b' becomes 1, ..., 'z' becomes 25
//if ((checker & (1 << val)) > 0) return false;
//This line is quite a mouthful, lets break it down:
System.out.println(0<<0);
//00000000000000000000000000000000
System.out.println(0<<1);
//00000000000000000000000000000000
System.out.println(0<<2);
//00000000000000000000000000000000
System.out.println(0<<3);
//00000000000000000000000000000000
System.out.println(1<<0);
//00000000000000000000000000000001
System.out.println(1<<1);
//00000000000000000000000000000010 == 2
System.out.println(1<<2);
//00000000000000000000000000000100 == 4
System.out.println(1<<3);
//00000000000000000000000000001000 == 8
System.out.println(2<<0);
//00000000000000000000000000000010 == 2
System.out.println(2<<1);
//00000000000000000000000000000100 == 4
System.out.println(2<<2);
// == 8
System.out.println(2<<3);
// == 16
System.out.println("3<<0 == "+(3<<0));
//00000000000000000000000000000011 == 3, because shifting by 0 leaves the value unchanged
System.out.println(3<<1);
//00000000000000000000000000000011 == 3
//shift left by 1
//00000000000000000000000000000110 == 6
System.out.println(3<<2);
//00000000000000000000000000000011 == 3
//shift left by 2
//00000000000000000000000000001100 == 12
System.out.println(3<<3);
// 24
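//The code still runs without the - 'a' because Java uses only the low 5 bits of an int shift distance (97 becomes 1),
//but then 'a' and 'A' land on the same bit, so the - 'a' is what makes the mapping of a..z to bits 0..25 explicit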
//Back to if ((checker & (1 << val)) > 0) return false;
//(1 << val means we simply shift 1 by the numeric representation of the current character
//the bitwise & works as such:
System.out.println();
System.out.println();
System.out.println(0&0); //0
System.out.println(0&1); //0
System.out.println(0&2); //0
System.out.println();
System.out.println();
System.out.println(1&0); //0
System.out.println(1&1); //1
System.out.println(1&2); //0
System.out.println(1&3); //1
System.out.println();
System.out.println();
System.out.println(2&0); //0
System.out.println(2&1); //0 0010 & 0001 == 0000 = 0
System.out.println(2&2); //2 0010 & 0010 == 2
System.out.println(2&3); //2 0010 & 0011 = 0010 == 2
System.out.println();
System.out.println();
System.out.println(3&0); //0 0011 & 0000 == 0
System.out.println(3&1); //1 0011 & 0001 == 0001 == 1
System.out.println(3&2); //2 0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1
System.out.println(3&3); //3 why?? 3 == 0011 & 0011 == 3???
System.out.println(9&11); // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay!
//so when we do (1 << val) with val == 97 for 'a', Java masks the shift distance to 97 & 31 == 1; with the - 'a' offset val is 0 for 'a'
//why is it that the result of bitwise & is > 0 means its a dupe?
//lets see..
//0011 & 0011 is 0011 means its a dupe
//0000 & 0011 is 0000 means no dupe
//0010 & 0001 is 0000 means no dupe
//hmm
//a nonzero result means the bit was already set; only when it is all 0000 is there no dupe
//so moving on:
//checker |= (1 << val)
//the |= needs exploring:
int x = 0;
int y = 1;
int z = 2;
int a = 3;
int b = 4;
System.out.println("x|=1 "+(x|=1)); //1
System.out.println(x|=1); //1
System.out.println(x|=1); //1
System.out.println(x|=1); //1
System.out.println(x|=1); //1
System.out.println(y|=1); // 0001 |= 0001 == ?? 1????
System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm
System.out.println(y); //should be 3??
System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3?
System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup!
System.out.println(y|=3); //0011 |= 0011, still 3
System.out.println(y|=4); //0011 |= 0100.. should be... 0111? so... 11? no its 7
System.out.println(y|=5); //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7
System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY!
//so the |= is just a bitwise OR!
}
public static boolean isUniqueChars(String str) {
int checker = 0;
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a'; //the - 'a' maps 'a'..'z' to bits 0..25; Java would mask the shift distance anyway, but then 'a' and 'A' would collide
if ((checker & (1 << val)) > 0) return false;
checker |= (1 << val);
}
return true;
}
public static boolean is_unique(String input)
{
int using_int_as_32_flags = 0;
for (int i=0; i < input.length(); i++)
{
int numeric_representation_of_char_at_i = input.charAt(i);
int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character
int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation;
boolean already_bit_flagged = result_of_bitwise_and != 0; //a nonzero result means this character's bit was already set, so it is a dupe (!= 0 also covers bit 31, which makes the int negative)
if (already_bit_flagged)
return false;
using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation;
}
return true;
}
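Several versions in this thread test the result of the & with > 0. That is fine for bits 0 to 25, but bit 31 is the sign bit of an int, so a mask of 1 << 31 is negative and > 0 would miss it. The sketch below, using the invented names SignBitDemo, isSetGreaterThanZero and isSetNotZero, compares the two tests.

```java
public class SignBitDemo {
    // Hypothetical helper: tests a flag the way the original code does.
    static boolean isSetGreaterThanZero(int flags, int index) {
        return (flags & (1 << index)) > 0;
    }

    // Hypothetical helper: tests a flag by comparing against zero instead.
    static boolean isSetNotZero(int flags, int index) {
        return (flags & (1 << index)) != 0;
    }

    public static void main(String[] args) {
        int flags = 1 << 31; // only the sign bit is set, i.e. Integer.MIN_VALUE
        System.out.println(flags < 0);                        // true
        System.out.println(isSetGreaterThanZero(flags, 31));  // false: the set bit is missed
        System.out.println(isSetNotZero(flags, 31));          // true
    }
}
```

For input restricted to a to z the difference never shows up, which is why the book's version works as written.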
In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19
What about 1 << val ?
1 == 00000000000000000000000000000001
1 << 19 == 00000000000010000000000000000000
checker |= (1 << val) means checker = checker | (1 << val)
so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000
checker == 524288 (00000000000010000000000000000000)
val = 101 - 97 = 4
1 == 00000000000000000000000000000001
1 << 4 == 00000000000000000000000000010000
checker |= (1 << val)
so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000
checker == 524304 (00000000000010000000000000010000)
(checker & (1 << val)) > 0
int val = str.charAt(i) - 'a';
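The decimal values in the "test" walkthrough (524288 after 't', 524304 after 'e') can be checked with a few lines of Java. This is a sketch only; TestWalkthrough and checkerAfter are hypothetical names, and the helper assumes lowercase a to z input.

```java
public class TestWalkthrough {
    // Hypothetical helper: ORs together the bits of every character in prefix.
    static int checkerAfter(String prefix) {
        int checker = 0;
        for (int i = 0; i < prefix.length(); i++) {
            checker |= 1 << (prefix.charAt(i) - 'a');
        }
        return checker;
    }

    public static void main(String[] args) {
        System.out.println(checkerAfter("t"));   // 524288, bit 19
        System.out.println(checkerAfter("te"));  // 524304, bits 19 and 4
        int tMask = 1 << ('t' - 'a');
        // After "tes" the bit for 't' is still set, so the final 't' is a duplicate
        System.out.println((checkerAfter("tes") & tMask) != 0); // true
    }
}
```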
// Requires: import java.util.BitSet;
public static boolean isUniqueStringUsingBitVectorClass(String s) {
final int ASCII_CHARACTER_SET_SIZE = 256;
final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE);
// assuming every character code is below 256, a string longer than 256 characters must contain a repeat
if(s.length() > ASCII_CHARACTER_SET_SIZE) {
return false;
}
//this will be used to keep the location of each character in String
final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE);
for(int i = 0; i < s.length(); i++) {
int charVal = s.charAt(i);
charBitLocation.set(charVal); //set the char location in BitSet
//check if tracker has already bit set with the bit present in charBitLocation
if(tracker.intersects(charBitLocation)) {
return false;
}
//set the tracker with new bit from charBitLocation
tracker.or(charBitLocation);
charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop
}
return true;
}
public static boolean isUniqueChars(String str) {
/*
checker is the bit array, it will have a 1 on the character index that
has appeared before and a 0 if the character has not appeared, you
can see this number initialized as 32 0 bits:
00000000 00000000 00000000 00000000
*/
int checker = 0;
//loop through each String character
for (int i = 0; i < str.length(); ++i) {
/*
a through z in ASCII are characters numbered 97 through 122, 26 characters total
with this, you get a number between 0 and 25 to represent each character index
0 for 'a' and 25 for 'z'
renamed 'val' as 'characterIndex' to be more descriptive
*/
int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 25
/*
created a new variable to make things clearer 'singleBitOnPosition'
It is used to calculate a number that represents the bit value of having that
character index as a 1 and the rest as a 0, this is achieved
by getting the single digit 1 and shifting it to the left as many
times as the character index requires
e.g. character 'd'
00000000 00000000 00000000 00000001
Shift 3 spaces to the left (<<) because 'd' index is number 3
1 shift: 00000000 00000000 00000000 00000010
2 shift: 00000000 00000000 00000000 00000100
3 shift: 00000000 00000000 00000000 00001000
Therefore the number representing 'd' is
00000000 00000000 00000000 00001000
*/
int singleBitOnPosition = 1 << characterIndex;
/*
This performs an AND between the checker, which is the bit array
containing everything that has been found before and the number
representing the bit that will be turned on for this particular
character. e.g.
if we have already seen 'a', 'b' and 'd', checker will have:
checker = 00000000 00000000 00000000 00001011
And if we see 'b' again:
'b' = 00000000 00000000 00000000 00000010
it will do the following:
00000000 00000000 00000000 00001011
& (AND)
00000000 00000000 00000000 00000010
-----------------------------------
00000000 00000000 00000000 00000010
Since this number is different than '0' it means that the character
was seen before, because on that character index we already have a
1 bit value
*/
if ((checker & singleBitOnPosition) > 0) {
return false;
}
/*
Remember that
checker |= singleBitOnPosition is the same as
checker = checker | singleBitOnPosition
Sometimes it is easier to see it expanded like that.
What this achieves is that it builds the checker to have the new
value it hasn't seen, by doing an OR between checker and the value
representing this character index as a 1. e.g.
If the character is 'f' and the checker has seen 'g' and 'a', the
following will happen
'f' = 00000000 00000000 00000000 00100000
checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001
00000000 00000000 00000000 00100000
| (OR)
00000000 00000000 00000000 01000001
-----------------------------------
00000000 00000000 00000000 01100001
Therefore getting a new checker as 00000000 00000000 00000000 01100001
*/
checker |= singleBitOnPosition;
}
return true;
}
function checkIfUniqueChars (str) {
var checker = 0; // JavaScript bitwise operators treat this as a 32-bit integer
for (var i = 0; i< str.length; i++) {
var index = str[i].charCodeAt(0) - 96;
var bitRepresentationOfIndex = 1 << index;
if ( (checker & bitRepresentationOfIndex) !== 0) {
console.log(str, false);
return false;
} else {
checker = (checker | bitRepresentationOfIndex);
}
}
console.log(str, true);
return true;
}
checkIfUniqueChars("abcdefghi"); // true
checkIfUniqueChars("aabcdefghi"); // false
checkIfUniqueChars("abbcdefghi"); // false
checkIfUniqueChars("abcdefghii"); // false
checkIfUniqueChars("abcdefghii"); // false
// checker is initialized to 32-bit-Int(0)
// therefore, checker is
checker= 00000000000000000000000000000000
str[0] is 'a'
str[i].charCodeAt(0) - 96 = 1
1 << 1 = 00000000000000000000000000000010
checker 'AND' (1 << 1) = 00000000000000000000000000000000
Boolean(0) == false
// So, we go for the 'OR' operation.
checker = checker OR (1 << 1)
checker = 00000000000000000000000000000010
str[1] is 'a'
str[i].charCodeAt(0) - 96 = 1
checker= 00000000000000000000000000000010
a = 00000000000000000000000000000010
checker 'AND' (1 << 1) = 00000000000000000000000000000010
Boolean(2) == true
// The result is nonzero, so we have our duplicate
private static String isUniqueCharsUsingBitSet(String string) {
BitSet bitSet =new BitSet();
for (int i = 0; i < string.length(); ++i) {
int val = string.charAt(i);
if(bitSet.get(val)) return "NO";
bitSet.set(val);
}
return "YES";
}
Line 1: public static boolean isUniqueChars(String str) {
Line 2: int checker = 0;
Line 3: for (int i = 0; i < str.length(); ++i) {
Line 4: int val = str.charAt(i) - 'a';
Line 5: if ((checker & (1 << val)) > 0) return false;
Line 6: checker |= (1 << val);
Line 7: }
Line 8: return true;
Line 9: }
Line 4: int val = str.charAt(i) - 'a';
val = 0; // 97 - 97 Which is a - a
val = 1; // 98 - 97 Which is b - a
val = 2; // 99 - 97 Which is c - a
fun isUnique(str: String): Boolean {
var checker = 0
for (i in str.indices) {
val bit = str.get(i) - 'a'
if (checker.and(1 shl bit) > 0) return false
checker = checker.or(1 shl bit)
}
return true
}