
Java: Finding the index of a byte array within another byte array


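Given a byte array, how can I find the position of a (smaller) byte array within it?

Using ArrayUtils looked promising, but if I am correct it only lets me search the array for a single byte.

(I can't see that it would matter, but just in case: sometimes the search byte array will be regular ASCII characters, other times it will be control characters or extended ASCII characters. So using String operations would not always be appropriate.)

The large array could be between 10 and about 10000 bytes, and the smaller array around 10 bytes. In some cases I will have several smaller arrays that I want to find within the larger array in a single search. At times I will also want to find the last index of an instance rather than the first.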

The simplest way is to compare each element:

public int indexOf(byte[] outerArray, byte[] smallerArray) {
    for(int i = 0; i < outerArray.length - smallerArray.length+1; ++i) {
        boolean found = true;
        for(int j = 0; j < smallerArray.length; ++j) {
           if (outerArray[i+j] != smallerArray[j]) {
               found = false;
               break;
           }
        }
        if (found) return i;
     }
   return -1;  
}  
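
The question also asks for the last occurrence rather than the first. A minimal sketch of that variant follows, using the same element-by-element comparison but scanning from the right; the class and method names (LastIndexSearch, lastIndexOf) are made up for this illustration and are not from the original answer.

```java
final class LastIndexSearch {
    /** Returns the start index of the last occurrence of needle in haystack, or -1 if there is none. */
    static int lastIndexOf(byte[] haystack, byte[] needle) {
        // Start at the right-most position where the needle could still fit.
        for (int i = haystack.length - needle.length; i >= 0; i--) {
            boolean found = true;
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    found = false;
                    break;
                }
            }
            if (found) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] big = {1, 2, 3, 1, 2, 3, 4};
        byte[] small = {1, 2, 3};
        System.out.println(lastIndexOf(big, small)); // prints 3
    }
}
```

If the needle is longer than the haystack, the loop never runs and the method returns -1.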

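Regarding your updated question: Java strings are UTF-16 strings and do not care about the extended ASCII set, so you could use String.indexOf().

Java strings are composed of 16-bit chars, not 8-bit bytes. A char can hold a byte, so you can always turn your byte arrays into strings and use indexOf: ASCII characters, control characters, and even zero characters will work fine. This holds as long as the charset used for the conversion maps every byte value to exactly one char; a multi-byte charset such as UTF-8 only does so for values below 0x80.

Here is a demo: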

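import java.nio.charset.StandardCharsets;

// All bytes in this demo are below 0x80, so UTF-8 decodes each of them to exactly one char.
byte[] big = new byte[] {1,2,3,0,4,5,6,7,0,8,9,0,0,1,2,3,4};
byte[] small = new byte[] {7,0,8,9,0,0,1};
String bigStr = new String(big, StandardCharsets.UTF_8);
String smallStr = new String(small, StandardCharsets.UTF_8);
System.out.println(bigStr.indexOf(smallStr)); // prints 7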
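
Because the question mentions extended ASCII and control characters, the choice of charset matters for this approach. The sketch below is an illustration, not part of the original answer: the class name CharsetSearchDemo and its helper are hypothetical. It compares ISO-8859-1, which maps each byte value to one char, with UTF-8, which replaces invalid byte sequences with U+FFFD when decoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSearchDemo {
    /** Decodes both arrays with the given charset and searches with String.indexOf. */
    static int indexOf(byte[] big, byte[] small, Charset charset) {
        String bigStr = new String(big, charset);
        String smallStr = new String(small, charset);
        return bigStr.indexOf(smallStr);
    }

    public static void main(String[] args) {
        byte[] big = {(byte) 0x80, 0x41};
        byte[] small = {(byte) 0xFF}; // this byte does not occur in big

        // ISO-8859-1 keeps every byte distinct, so the search correctly fails.
        System.out.println(indexOf(big, small, StandardCharsets.ISO_8859_1)); // prints -1

        // 0x80 and 0xFF are both invalid on their own in UTF-8, so each is decoded
        // to U+FFFD and the search reports a match that is not really there.
        System.out.println(indexOf(big, small, StandardCharsets.UTF_8)); // prints 0
    }
}
```

For arbitrary binary data, ISO-8859-1 is therefore the safer choice if the String route is taken at all.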

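However, considering that your large array can be up to 10,000 bytes and the small array is only about ten bytes, this solution may not be the most efficient, for two reasons:

  • It requires copying your big array into an array that is twice as large (same capacity, but with char instead of byte). This triples your memory requirements.
  • Java's string search algorithm is not the fastest one available. Implementing one of the more advanced algorithms could cut the execution time by a factor of up to ten (the length of the small string), and it would need additional memory proportional to the length of the small string rather than the big string.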

    • Is this what you are looking for?

      public class KPM {
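          /**
           * Searches the data byte array for the first occurrence of the byte array pattern within the given boundaries.
           * @param data    the array to search in
           * @param start   first index in data to examine (inclusive)
           * @param stop    end index in data (exclusive), so that stop - start = length of the searched range
           * @param pattern what is being searched for; '*' can be used as a wildcard for "ANY character"
           * @return the index in data where the first match starts, or -1 if there is none
           */
          public static int indexOf( byte[] data, int start, int stop, byte[] pattern) {
              if( data == null || pattern == null) return -1;
              // An empty pattern trivially matches at start; this also avoids indexing pattern[0] below.
              if( pattern.length == 0) return start;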
      
              int[] failure = computeFailure(pattern);
      
              int j = 0;
      
              for( int i = start; i < stop; i++) {
                  while (j > 0 && ( pattern[j] != '*' && pattern[j] != data[i])) {
                      j = failure[j - 1];
                  }
                  if (pattern[j] == '*' || pattern[j] == data[i]) {
                      j++;
                  }
                  if (j == pattern.length) {
                      return i - pattern.length + 1;
                  }
              }
              return -1;
          }
      
          /**
           * Computes the failure function using a boot-strapping process,
           * where the pattern is matched against itself.
           */
          private static int[] computeFailure(byte[] pattern) {
              int[] failure = new int[pattern.length];
      
              int j = 0;
              for (int i = 1; i < pattern.length; i++) {
                  while (j>0 && pattern[j] != pattern[i]) {
                      j = failure[j - 1];
                  }
                  if (pattern[j] == pattern[i]) {
                      j++;
                  }
                  failure[i] = j;
              }
      
              return failure;
          }
      }
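
To make the failure function less abstract, here is a small sketch that prints the table for two sample patterns. It repeats the computeFailure logic from the class above inside a hypothetical wrapper class (FailureTableDemo is a name invented for this example) so that it runs on its own. Each entry is the length of the longest proper prefix of the pattern that is also a suffix of the pattern up to that position.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FailureTableDemo {
    /** Same logic as computeFailure above: failure[i] is the longest border of pattern[0..i]. */
    static int[] computeFailure(byte[] pattern) {
        int[] failure = new int[pattern.length];
        int j = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (j > 0 && pattern[j] != pattern[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == pattern[i]) {
                j++;
            }
            failure[i] = j;
        }
        return failure;
    }

    public static void main(String[] args) {
        byte[] abab = "abab".getBytes(StandardCharsets.US_ASCII);
        byte[] aaab = "aaab".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(computeFailure(abab))); // prints [0, 0, 1, 2]
        System.out.println(Arrays.toString(computeFailure(aaab))); // prints [0, 1, 2, 0]
    }
}
```

The table has one int per pattern byte, which is why the extra memory grows with the small array and not with the large one.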
      
      
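      To save you time in testing:

      Here is the code with computeFailure() made static:


      Google's Guava provides Bytes.indexOf(byte[] array, byte[] target).

      So you can find the index of a byte[] within a byte[].


      An example on GitHub is at:

      The code is copied almost verbatim from the char[] version,

      indexOf(char[], int, int, char[], int, int, int).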

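      Using the Knuth-Morris-Pratt algorithm is the most efficient way.

      StreamSearcher is an implementation of it, and is part of Twitter's elephant-bird project.

      Including this library is not recommended, because it is rather sizable for using just a single class.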

      import java.io.IOException;
      import java.io.InputStream;
      import java.util.Arrays;
      
      /**
       * An efficient stream searching class based on the Knuth-Morris-Pratt algorithm.
       * For more on how the algorithm works, see: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm.
       */
      public class StreamSearcher
      {
          private byte[] pattern_;
          private int[] borders_;
      
          // An upper bound on pattern length for searching. Results are undefined for longer patterns.
          @SuppressWarnings("unused")
          public static final int MAX_PATTERN_LENGTH = 1024;
      
          StreamSearcher(byte[] pattern)
          {
              setPattern(pattern);
          }
      
          /**
           * Sets a new pattern for this StreamSearcher to use.
           *
           * @param pattern the pattern the StreamSearcher will look for in future calls to search(...)
           */
          public void setPattern(byte[] pattern)
          {
              pattern_ = Arrays.copyOf(pattern, pattern.length);
              borders_ = new int[pattern_.length + 1];
              preProcess();
          }
      
          /**
           * Searches for the next occurrence of the pattern in the stream, starting from the current stream position. Note
           * that the position of the stream is changed. If a match is found, the stream points to the end of the match -- i.e. the
           * byte AFTER the pattern. Else, the stream is entirely consumed. The latter is because InputStream semantics make it difficult to have
           * another reasonable default, i.e. leave the stream unchanged.
           *
           * @return bytes consumed if found, -1 otherwise.
           */
          long search(InputStream stream) throws IOException
          {
              long bytesRead = 0;
      
              int b;
              int j = 0;
      
              while ((b = stream.read()) != -1)
              {
                  bytesRead++;
      
                  while (j >= 0 && (byte) b != pattern_[j])
                  {
                      j = borders_[j];
                  }
                  // Move to the next character in the pattern.
                  ++j;
      
                  // If we've matched up to the full pattern length, we found it.  Return,
                  // which will automatically save our position in the InputStream at the point immediately
                  // following the pattern match.
                  if (j == pattern_.length)
                  {
                      return bytesRead;
                  }
              }
      
              // No dice. Note that the stream is now completely consumed.
              return -1;
          }
      
          /**
           * Builds up a table of longest "borders" for each prefix of the pattern to find. This table is stored internally
           * and aids in implementation of the Knuth-Morris-Pratt string search.
           * <p>
           * For more information, see: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm.
           */
          private void preProcess()
          {
              int i = 0;
              int j = -1;
              borders_[i] = j;
              while (i < pattern_.length)
              {
                  while (j >= 0 && pattern_[i] != pattern_[j])
                  {
                      j = borders_[j];
                  }
                  borders_[++i] = ++j;
              }
          }
      }
      
      
      public class KPM {
          /**
           * Search the data byte array for the first occurrence
           * of the byte array pattern.
           *
           * @return the index of the first match, or -1 if there is none
           */
          public static int indexOf(byte[] data, byte[] pattern) {
              // Guard: an empty pattern matches at index 0 and must not reach pattern[0] below.
              if (pattern.length == 0) {
                  return 0;
              }

              int[] failure = computeFailure(pattern);

              int j = 0; // number of pattern bytes matched so far

              for (int i = 0; i < data.length; i++) {
                  // On a mismatch, fall back to the longest border of the matched prefix.
                  while (j > 0 && pattern[j] != data[i]) {
                      j = failure[j - 1];
                  }
                  if (pattern[j] == data[i]) {
                      j++;
                  }
                  if (j == pattern.length) {
                      return i - pattern.length + 1;
                  }
              }
              return -1;
          }

          /**
           * Computes the failure function using a boot-strapping process,
           * where the pattern is matched against itself.
           */
          private static int[] computeFailure(byte[] pattern) {
              int[] failure = new int[pattern.length];

              int j = 0;
              for (int i = 1; i < pattern.length; i++) {
                  while (j > 0 && pattern[j] != pattern[i]) {
                      j = failure[j - 1];
                  }
                  if (pattern[j] == pattern[i]) {
                      j++;
                  }
                  failure[i] = j;
              }

              return failure;
          }
      }
      
      public class Test {
          public static void main(String[] args) {
              do_test1();
          }
          static void do_test1() {
            String[] ss = { "",
                          "\r\n\r\n",
                          "\n\n",
                          "\r\n\r\nthis is a test",
                          "this is a test\r\n\r\n",
                          "this is a test\r\n\r\nthis si a test",
                          "this is a test\r\n\r\nthis si a test\r\n\r\n",
                          "this is a test\n\r\nthis si a test",
                          "this is a test\r\nthis si a test\r\n\r\n",
                          "this is a test"
                      };
            for (String s: ss) {
              System.out.println(""+KPM.indexOf(s.getBytes(), "\r\n\r\n".getBytes())+"in ["+s+"]");
            }
      
          }
      }
      
      package org.example;
      
      import java.util.List;
      
      import org.riversun.finbin.BinarySearcher;
      
      public class Sample2 {
      
          public static void main(String[] args) throws Exception {
      
              BinarySearcher bs = new BinarySearcher();
      
              // UTF-8 without BOM
              byte[] srcBytes = "Hello world.It's a small world.".getBytes("utf-8");
      
              byte[] searchBytes = "world".getBytes("utf-8");
      
              List<Integer> indexList = bs.searchBytes(srcBytes, searchBytes);
      
              System.out.println("indexList=" + indexList);
          }
      }
      
      indexList=[6, 25]
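
The output above lists every position at which the search bytes occur. For readers who would rather not add a dependency, the following is a dependency-free sketch of the same idea using a plain nested-loop comparison. It is not the library's implementation; the names AllMatches and indexesOf are invented for this example, and reporting overlapping matches is a choice made here rather than documented library behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AllMatches {
    /** Returns the start index of every occurrence of target in source (overlaps included). */
    static List<Integer> indexesOf(byte[] source, byte[] target) {
        List<Integer> result = new ArrayList<>();
        if (target.length == 0) {
            return result; // nothing sensible to report for an empty target
        }
        for (int i = 0; i <= source.length - target.length; i++) {
            if (matchesAt(source, i, target)) {
                result.add(i);
            }
        }
        return result;
    }

    /** Checks whether target occurs in source starting exactly at offset. */
    private static boolean matchesAt(byte[] source, int offset, byte[] target) {
        for (int j = 0; j < target.length; j++) {
            if (source[offset + j] != target[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] src = "Hello world.It's a small world.".getBytes(StandardCharsets.UTF_8);
        byte[] search = "world".getBytes(StandardCharsets.UTF_8);
        System.out.println("indexList=" + indexesOf(src, search)); // prints indexList=[6, 25]
    }
}
```

The last element of the returned list is also the last index of the pattern, which covers the "last occurrence" case from the question.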
      
      static int indexOf(byte[] source, int sourceOffset, int sourceCount, byte[] target, int targetOffset, int targetCount, int fromIndex) {
          if (fromIndex >= sourceCount) {
              return (targetCount == 0 ? sourceCount : -1);
          }
          if (fromIndex < 0) {
              fromIndex = 0;
          }
          if (targetCount == 0) {
              return fromIndex;
          }
      
          byte first = target[targetOffset];
          int max = sourceOffset + (sourceCount - targetCount);
      
          for (int i = sourceOffset + fromIndex; i <= max; i++) {
              /* Look for first character. */
              if (source[i] != first) {
                  while (++i <= max && source[i] != first)
                      ;
              }
      
              /* Found first character, now look at the rest of v2 */
              if (i <= max) {
                  int j = i + 1;
                  int end = j + targetCount - 1;
                  for (int k = targetOffset + 1; j < end && source[j] == target[k]; j++, k++)
                      ;
      
                  if (j == end) {
                      /* Found whole string. */
                      return i - sourceOffset;
                  }
              }
          }
          return -1;
      }
      
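          /**
           * Copies bytes from is to os until the multipart boundary marker ("--" + boundary) is seen.
           * The most recent bytes are held back in a ring buffer so that the marker itself is never written.
           */
          private boolean multipartUploadParseOutput(InputStream is, OutputStream os, String boundary)
          {
              try
              {
                  String n = "--"+boundary;
                  byte[] bc = n.getBytes("UTF-8"); // marker to look for
                  int s = bc.length;
                  byte[] b = new byte[s];          // ring buffer holding the last s bytes read
                  int p = 0;                       // next write position in the ring buffer
                  long l = 0;                      // total number of bytes read so far
                  int c;
                  boolean r;
                  while ((c = is.read()) != -1)
                  {
                      b[p] = (byte) c;
                      l += 1;
                      p = (int) (l % s);
                      if (l>p) // true once at least s bytes have been read
                      {
                          // Compare the buffer, oldest byte first, with the marker.
                          r = true;
                          for (int i = 0; i < s; i++)
                          {
                              if (b[(p + i) % s] != bc[i])
                              {
                                  r = false;
                                  break;
                              }
                          }
                          if (r)
                              break;
                          // No match: the oldest byte is about to be overwritten, so emit it.
                          os.write(b[p]);
                      }
                  }
                  // Note: if the stream ends without the marker, bytes still in the buffer are not written.
                  os.flush();
                  return true;
              } catch(IOException e) {e.printStackTrace();}
              return false;
          }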
      
      // The Knuth, Morris, and Pratt string searching algorithm remembers information about
      // the past matched characters instead of matching a character with a different pattern
      // character over and over again. It can search for a pattern in O(n) time as it never
      // re-compares a text symbol that has matched a pattern symbol. But, it does use a partial
      // match table to analyze the pattern structure. Construction of a partial match table
      // takes O(m) time. Therefore, the overall time complexity of the KMP algorithm is O(m + n).
      
      public class KMPSearch {
      
          public static int indexOf(byte[] haystack, byte[] needle)
          {
              // needle is null or empty
              if (needle == null || needle.length == 0)
                  return 0;
      
              // haystack is null, or haystack's length is less than that of needle
              if (haystack == null || needle.length > haystack.length)
                  return -1;
      
              // pre construct failure array for needle pattern
              int[] failure = new int[needle.length];
              int n = needle.length;
              failure[0] = -1;
              for (int j = 1; j < n; j++)
              {
                  int i = failure[j - 1];
                  while ((needle[j] != needle[i + 1]) && i >= 0)
                      i = failure[i];
                  if (needle[j] == needle[i + 1])
                      failure[j] = i + 1;
                  else
                      failure[j] = -1;
              }
      
              // find match
              int i = 0, j = 0;
              int haystackLen = haystack.length;
              int needleLen = needle.length;
              while (i < haystackLen && j < needleLen)
              {
                  if (haystack[i] == needle[j])
                  {
                      i++;
                      j++;
                  }
                  else if (j == 0)
                      i++;
                  else
                      j = failure[j - 1] + 1;
              }
              return ((j == needleLen) ? (i - needleLen) : -1);
          }
      }
      
      
      
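      // Imports assume JUnit 5 (Jupiter); the original listing does not show which JUnit version was used.
      import static org.junit.jupiter.api.Assertions.assertEquals;

      import java.util.Random;

      import org.junit.jupiter.api.Test;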
      
      class KMPSearchTest {
          private static Random random = new Random();
          private static String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
      
          @Test
          public void testEmpty() {
              test("", "");
              test("", "ab");
          }
      
          @Test
          public void testOneChar() {
              test("a", "a");
              test("a", "b");
          }
      
          @Test
          public void testRepeat() {
              test("aaa", "aaaaa");
              test("aaa", "abaaba");
              test("abab", "abacababc");
              test("abab", "babacaba");
          }
      
          @Test
          public void testPartialRepeat() {
              test("aaacaaaaac", "aaacacaacaaacaaaacaaaaac");
              test("ababcababdabababcababdaba", "ababcababdabababcababdaba");
          }
      
          @Test
          public void testRandomly() {
              for (int i = 0; i < 1000; i++) {
                  String pattern = randomPattern();
                  for (int j = 0; j < 100; j++)
                      test(pattern, randomText(pattern));
              }
          }
      
          /* Helper functions */
          private static String randomPattern() {
              StringBuilder sb = new StringBuilder();
              int steps = random.nextInt(10) + 1;
              for (int i = 0; i < steps; i++) {
                  if (sb.length() == 0 || random.nextBoolean()) {  // Add literal
                      int len = random.nextInt(5) + 1;
                      for (int j = 0; j < len; j++)
                          sb.append(alphabet.charAt(random.nextInt(alphabet.length())));
                  } else {  // Repeat prefix
                      int len = random.nextInt(sb.length()) + 1;
                      int reps = random.nextInt(3) + 1;
                      if (sb.length() + len * reps > 1000)
                          break;
                      for (int j = 0; j < reps; j++)
                          sb.append(sb.substring(0, len));
                  }
              }
              return sb.toString();
          }
      
          private static String randomText(String pattern) {
              StringBuilder sb = new StringBuilder();
              int steps = random.nextInt(100);
              for (int i = 0; i < steps && sb.length() < 10000; i++) {
                  if (random.nextDouble() < 0.7) {  // Add prefix of pattern
                      int len = random.nextInt(pattern.length()) + 1;
                      sb.append(pattern.substring(0, len));
                  } else {  // Add literal
                      int len = random.nextInt(30) + 1;
                      for (int j = 0; j < len; j++)
                          sb.append(alphabet.charAt(random.nextInt(alphabet.length())));
                  }
              }
              return sb.toString();
          }
      
          private static void test(String pattern, String text) {
              try {
                  assertEquals(text.indexOf(pattern), KMPSearch.indexOf(text.getBytes(), pattern.getBytes()));
              } catch (AssertionError e) {
                  System.out.println("FAILED -> Unable to find '" + pattern + "' in '" + text + "'");
              }
          }
      }