Java: comparing textual representations of binary files

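I have two XML reports produced by executing two programs. Each report contains a section that lists every I/O operation performed, together with the content of each operation. Some of that content is XML and some is binary, but the data stored in the report is always textual, so I end up with something like this: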

.....0.................. .......................@........'F...O)v...O*......................0..........l...c...=
Y!...!pvw.........(.........E...
yY...-qVC......p...K,......Pm.........Si4........,.......C0....?0....'...................K0....0
.   *...H......
....0I1.0   ..U....US1.0...U.
.
Google Inc1%0#..U....Google Internet Authority G20..
140423121609Z.
140722000000Z0f1.0  ..U....US1.0...U...
California1.0...U...
Mountain View1.0...U.
.
Google Inc1.0...U....*.google.com0...."0
.   *...H......
..........0....
..............>..........:...z...S...5...%f............-....*J...i.......c}m......N%...t....G..f.......y.........0x...F.........:......k...k$......!............I...A...........A...G.......q...C...g........r.......b....6.......c...|X.........F...?qs......'.........................mrM.....D....9...
....$...v... .........=.........amAdo..V.....................@.../...   U~....r......... .........g_    ...[y...7=...i... >......b......s...........W......#w..............e..........yI.........{..............0.....0...U.%..0...+.........+.......0.........U..........0.......*.google.com...
*.android.com....*.appengine.google.com....*.cloud.google.com....*.google-analytics.com....*.google.ca....*.google.cl....*.google.co.in....*.google.co.jp....*.google.co.uk....*.google.com.ar....*.google.com.au....*.google.com.br....*.google.com.co....*.google.com.mx....*.google.com.tr....*.google.com.vn....*.google.de....*.google.es....*.google.fr....*.google.hu....*.google.it....*.google.nl....*.google.pl....*.google.pt....*.googleapis.cn....*.googlecommerce.com....*.googlevideo.com...
*.gstatic.com...
*.gvt1.com....*.urchin.com....*.url.google.com....*.youtube-nocookie.com...
*.youtube.com....*.youtubeeducation.com....*.ytimg.com....android.com....g.co....goo.gl....google-analytics.com...
google.com....googlecommerce.com...
urchin.com....youtu.be....youtube.com....youtubeeducation.com0h..+.........0Z0+..+.....0.....http://pki.google.com/GIAG2.crt0+..+.....0.....http://clients1.google.com/ocsp0...U.........XV.H...%....r..!.......y...'0...U.........00...U.#..0.....J............h...v...b....Z.../0...U. ..0.0..
+.......y...00..U...)0'0%...#...!....http://pki.google.com/GIAG2.crl0
.   *...H......
..........A...d...A~A..0...P-JY/........"..M...N.=...H....n%...A......u......2...X......I........F...%....%p..............K...j...A.............g$Y...h....K....E...m......s/......t.....S..SN...Wo.B6.......a......|.............q........?.............y...N....K=....1......|+......3=.....6....j...&...H?.1.....X.H..#V".k.............-.....C.....5S......$.G............eMY(...1+,.e...v"......K...C...}.....V............28K......[......4A.Vr.......C0....?0....'...................K0....0
.   *...H......
....0I1.0   ..U....US1.0...U.
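I have to compare these segments to find similarities, i.e. to tell whether the two programs wrote or read similar content to or from the file system. Also, since there are many I/O operations (on the order of 100) and many reports (on the order of 10,000), the comparison has to be fast. I am using Java.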


Any suggestions?

In the end I used the normalized compression distance. I still don't know whether it is the best approach for my data...
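For readers unfamiliar with it, the normalized compression distance between two byte sequences x and y is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size and xy is the concatenation. The following is only a minimal sketch of that formula, not the code actually used for the reports: the class name `Ncd`, the choice of DEFLATE via `java.util.zip.Deflater`, and the ISO-8859-1 encoding of the segment text are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class Ncd {
    // Size in bytes of the DEFLATE-compressed form of data.
    // (DEFLATE is an assumption; any real compressor can play the role of C.)
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the byte count matters
        }
        deflater.end();
        return total;
    }

    // NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    // Values near 0 suggest similar content, values near 1 unrelated content.
    static double ncd(byte[] x, byte[] y) {
        byte[] xy = new byte[x.length + y.length];
        System.arraycopy(x, 0, xy, 0, x.length);
        System.arraycopy(y, 0, xy, x.length, y.length);
        int cx = compressedSize(x);
        int cy = compressedSize(y);
        int cxy = compressedSize(xy);
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    public static void main(String[] args) {
        // Hypothetical segments; real ones would come from the XML reports.
        byte[] a = "*.google.com....*.android.com....*.youtube.com"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] b = "*.google.com....*.gstatic.com....*.youtube.com"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(ncd(a, b));
    }
}
```

Two caveats apply to this sketch. DEFLATE only looks back 32 KB, so for segments much longer than that the concatenation stops benefiting from shared content and the distance drifts toward 1. And since C(x) depends only on x, caching the compressed size of each segment avoids recomputing it for every pair, which matters at the report volumes described in the question.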

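How do you define "similar" content?

That is one of the problems XD

Compare each character / character set against the entire second document :p

That would be quite a heavy load, sir XD

I would extract the strings of 3 or more letters, build an index of them for both documents, and then compare the indexes for similarity.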
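The last suggestion (extract runs of three or more letters, index them per document, compare the indexes) could be sketched roughly as follows. The commenter did not say how the indexes should be compared, so the Jaccard coefficient used here is an assumption, as are the class name `TokenSimilarity` and the restriction to ASCII letters.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSimilarity {
    // Runs of 3 or more ASCII letters (assumed reading of "3 or more letters").
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]{3,}");

    // Builds the "index" of a segment: the set of distinct letter runs in it.
    static Set<String> tokens(String segment) {
        Set<String> result = new HashSet<>();
        Matcher m = TOKEN.matcher(segment);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Jaccard coefficient |A ∩ B| / |A ∪ B| (an assumed similarity measure).
    // Two empty indexes are treated as identical.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        Set<String> first = tokens("..Google Inc1.0...U....*.google.com0");
        Set<String> second = tokens("*.google.com");
        System.out.println(jaccard(first, second)); // prints 0.5
    }
}
```

Since each segment's token set can be computed once and reused, this keeps the per-pair cost down to a set intersection, at the price of ignoring everything in the binary data that does not happen to look like a word.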