Huffman tree in Java


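I have a problem with my Huffman tree code. In the main method I supply a string of symbols and an integer array containing the frequencies of those symbols. It should print each symbol together with its Huffman code, but I think the output is wrong.

Here is the code: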

package huffman;

import java.util.*;

abstract class HuffmanTree implements Comparable<HuffmanTree> {
    public final int frequency; // the frequency of this tree
    public HuffmanTree(int freq) { frequency = freq; }

    // compares on the frequency
    public int compareTo(HuffmanTree tree) {
        return frequency - tree.frequency;
    }
}

class HuffmanLeaf extends HuffmanTree {
    public final char value; // the character this leaf represents

    public HuffmanLeaf(int freq, char val) {
        super(freq);
        value = val;
    }
}

class HuffmanNode extends HuffmanTree {
    public final HuffmanTree left, right; // subtrees

    public HuffmanNode(HuffmanTree l, HuffmanTree r) {
        super(l.frequency + r.frequency);
        left = l;
        right = r;
    }
}

public class Huffman {
    // input is an array of frequencies, indexed by character code
    public static HuffmanTree buildTree(int[] charFreqs, char[] test2) {
        PriorityQueue<HuffmanTree> trees = new PriorityQueue<HuffmanTree>();
        // initially, we have a forest of leaves
        // one for each non-empty character
        for (int i = 0; i < charFreqs.length; i++)
            if (charFreqs[i] > 0)
                trees.offer(new HuffmanLeaf(charFreqs[i], test2[i]));

        assert trees.size() > 0;
        // loop until there is only one tree left
        while (trees.size() > 1) {
            // two trees with least frequency
            HuffmanTree a = trees.poll();
            HuffmanTree b = trees.poll();

            // put into new node and re-insert into queue
            trees.offer(new HuffmanNode(a, b));
        }
        return trees.poll();
    }

    public static void printCodes(HuffmanTree tree, StringBuffer prefix) {
        assert tree != null;
        if (tree instanceof HuffmanLeaf) {
            HuffmanLeaf leaf = (HuffmanLeaf)tree;

            // print out character, frequency, and code for this leaf (which is just the prefix)
            System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);

        } else if (tree instanceof HuffmanNode) {
            HuffmanNode node = (HuffmanNode)tree;

            // traverse left
            prefix.append('0');
            printCodes(node.left, prefix);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            printCodes(node.right, prefix);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void main(String[] args) {
        //Symbols:
        String str = "12345678"; 
        char[] test2 = str.toCharArray();
        //Frequency (of the symbols above):
        int[] charFreqs = {36,18,12,9,7,6,5,4};


        // build tree
        HuffmanTree tree = buildTree(charFreqs,test2);

        // print out results
        System.out.println("SYMBOL\tFREQ\tHUFFMAN CODE");
        printCodes(tree, new StringBuffer());
    }
}
The output looks odd: for example, symbol 7 should be 11110 and symbol 8 should be 11111.

Can you help me?

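The assignment of bit patterns has no bearing on the optimality of the code. Your assignment will work just fine. There is nothing odd about it. You could just as well have raised the same concern about 2: 110, 3: 100, or 4: 1110, 5: 1011, but those are fine as well.

The only reason to impose an order on the codes is to reduce the number of bits needed to transmit the code from the compressor to the decompressor. Instead of sending the codes themselves, you can send just the code length for each symbol, as long as both sides construct the codes from those lengths in the same way.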
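To make "sending the lengths" concrete, here is a minimal sketch that derives each symbol's code length as its leaf depth in a Huffman tree. The class name `CodeLengths` and its helpers are hypothetical and not part of the question's code. Ties between equal frequencies can be broken either way by the priority queue, so the individual lengths may differ between valid runs, while the total number of encoded bits stays the same.

```java
import java.util.PriorityQueue;

public class CodeLengths {
    // Minimal tree node: leaves carry a symbol index, internal nodes use -1.
    static final class Node implements Comparable<Node> {
        final int freq;
        final int symbol;
        final Node left, right;

        Node(int freq, int symbol, Node left, Node right) {
            this.freq = freq;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(freq, other.freq);
        }
    }

    // Returns the code length (leaf depth) per symbol index; 0 means the symbol is unused.
    static int[] lengths(int[] freqs) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] > 0) queue.add(new Node(freqs[i], i, null, null));
        }
        // Repeatedly merge the two least frequent trees.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a.freq + b.freq, -1, a, b));
        }
        int[] result = new int[freqs.length];
        if (!queue.isEmpty()) fill(queue.poll(), 0, result);
        return result;
    }

    private static void fill(Node node, int depth, int[] result) {
        if (node.left == null) {
            result[node.symbol] = Math.max(depth, 1); // a lone symbol still needs one bit
            return;
        }
        fill(node.left, depth + 1, result);
        fill(node.right, depth + 1, result);
    }

    // Total size of the encoded message in bits.
    static int totalBits(int[] freqs, int[] lens) {
        int total = 0;
        for (int i = 0; i < freqs.length; i++) total += freqs[i] * lens[i];
        return total;
    }

    public static void main(String[] args) {
        int[] freqs = {36, 18, 12, 9, 7, 6, 5, 4}; // frequencies from the question
        int[] lens = lengths(freqs);
        for (int i = 0; i < lens.length; i++) {
            System.out.println("symbol " + (i + 1) + ": " + lens[i] + " bits");
        }
        System.out.println("total bits: " + totalBits(freqs, lens));
    }
}
```

Only the `lens` array would need to reach the decompressor, which can then rebuild the same codes with an agreed assignment rule.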

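When only the lengths are sent, the usual approach is to assign the codes in numerical order to a sorted list of symbols. Then symbol 7 would indeed have a lower code "value" than symbol 8, if that is the order in which they are assigned.

For your example, such a canonical code would be: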

1: 1 - 0
2: 3 - 100
3: 3 - 101
4: 4 - 1100
5: 4 - 1101
6: 4 - 1110
7: 5 - 11110
8: 5 - 11111
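You simply take the lengths and sort the symbols within each length. Then you assign codes incrementing from 0, appending bits at the end as the length increases.


Note that this is an unusual example in which the symbol order is also the frequency order. That is usually not the case.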
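The assignment rule described above can be sketched as follows. This is one possible illustration, not the answerer's own code: the class `CanonicalCodes` and its methods are hypothetical names, and the lengths are taken directly from the table above rather than computed.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class CanonicalCodes {
    // Assigns canonical codes: symbols[i] receives a code of lengths[i] bits.
    // Assumes at least one symbol and lengths that form a valid prefix code.
    static Map<Character, String> assign(char[] symbols, int[] lengths) {
        Integer[] order = new Integer[symbols.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort by code length first, then by symbol value within the same length.
        Arrays.sort(order, (a, b) -> lengths[a] != lengths[b]
                ? Integer.compare(lengths[a], lengths[b])
                : Character.compare(symbols[a], symbols[b]));

        Map<Character, String> codes = new LinkedHashMap<>();
        int code = 0;
        int prevLen = lengths[order[0]];
        for (int k = 0; k < order.length; k++) {
            int i = order[k];
            if (k > 0) {
                // Increment, then append zero bits whenever the length grows.
                code = (code + 1) << (lengths[i] - prevLen);
            }
            prevLen = lengths[i];
            codes.put(symbols[i], toBits(code, lengths[i]));
        }
        return codes;
    }

    // Formats the low 'len' bits of 'code' as a string of 0s and 1s.
    static String toBits(int code, int len) {
        StringBuilder sb = new StringBuilder();
        for (int b = len - 1; b >= 0; b--) sb.append((code >> b) & 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] symbols = "12345678".toCharArray();
        int[] lengths = {1, 3, 3, 4, 4, 4, 5, 5}; // lengths from the table above
        assign(symbols, lengths).forEach((s, c) -> System.out.println(s + ": " + c));
    }
}
```

With the lengths from the table, this reproduces the codes listed there, including 11110 for symbol 7 and 11111 for symbol 8.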

Just add one more 0 to understand the finish bit (for codes read past 3 bits).

1 36 0
3 12 100
6 6 1010
5 7 1011'0
2 18 110
4 9 1110
8 4 11110
7 5 11111'0

Answering the question from the comment on the answer:

"Hey Mark, thanks for your help, but I don't really understand how you got these codes? Do I need to change a lot in the code?"

Example:
2x2: 00, 01  (next is 10)
4x3: 10 + (00, 01, 10) = 1000, 1001, 1010 (next is 1011)
5x3: 1011 + (0, 1, 0 + 10) = 10110, 10111, 10110 + 10 = 11000 (next would be 11001)...

I have posted a complete Java Huffman tree implementation that I created based on the Princeton.EDU version. The version from the Princeton package works as an academic example, but it is not really usable outside that context.

My example below takes a string as input and returns a byte array as output.

If you want to encode the output in a human-readable format, you can use the standard Java Base64 encoder (which is inefficient in terms of space) or check the HumanByte class that I posted.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.PriorityQueue;

/**************************************************************************************************************
 * Used edu.princeton.cs version of huffman algorithm and made it easily usable outside of example context
 * replaced proprietary priority queue by Java standard one
 * created binary output/input that has minimal features needed
 *
 * If human readable encoding of huffman encoded string is desired then look at HumanByte class
 * that was posted here
 * 
 * https://stackoverflow.com/questions/4141317/how-to-convert-byte-array-into-human-readable-format/58332208#58332208
 *
 * @author Stan Sokolov
 * 10/9/19
 **************************************************************************************************************/
public class BetterHuffman {
    private static final int R = 256;
    private static final char PARENT = '\u0000';
    private static final char EMPTY = '\u0001';
    private static final int UNDEFINED = -1;


    /**************************************************************************************************************
     *   Compress string in bytes
     **************************************************************************************************************/
    public static byte[] compress(final String s) {

        final HuffmanOut binaryOut = new HuffmanOut();
        final char[] input = s.toCharArray();
        final int[] freq = new int[R];

        for (char anInput1 : input) {
            ++freq[anInput1];
        }

        final BetterHuffman.Node root = buildTrie(freq);
        final String[] st = new String[R];
        buildCode(st, root, "");

        writeTrie(root, binaryOut);
        binaryOut.write(input.length);


        for (char anInput : input) {
            final String code = st[anInput];
            for (char ch : code.toCharArray()) {
                binaryOut.writeBit(ch == '1');
            }
        }

        return binaryOut.value();
    }

    /**************************************************************************************************************
     *   build huffman tree
     **************************************************************************************************************/
    private static BetterHuffman.Node buildTrie(int[] freq) {
        final PriorityQueue<BetterHuffman.Node> pq2 = new PriorityQueue<>();

        for (char i = 0; i < R; ++i) {
            if (freq[i] > 0) {
                //pq.insert(new BetterHuffman.Node(i, freq[i], null, null));
                pq2.add(new BetterHuffman.Node(i, freq[i], null, null));
            }
        }

        if (pq2.size() == 1) {//entire string is one char repeated: add a dummy leaf so the trie gets two leaves
            if (freq[0] == 0) {//'\u0000' does not occur in the input, so it can serve as the dummy
                pq2.add(new BetterHuffman.Node(PARENT, 0, null, null));
            } else {
                pq2.add(new BetterHuffman.Node(EMPTY, 0, null, null));
            }
        }

        //merge the two least frequent nodes until one tree remains; this must also run
        //after a dummy leaf was added, otherwise the real symbol never receives a code
        while (pq2.size() > 1) {
            final BetterHuffman.Node left = pq2.poll();
            final BetterHuffman.Node right = pq2.poll();
            //aggregate two nodes into one by summing frequency
            final BetterHuffman.Node parent = new BetterHuffman.Node(PARENT, left.freq + right.freq, left, right);
            pq2.add(parent);
        }

        //this will be the root node that would have total length of input as frequency
        return pq2.poll();
    }

    /**************************************************************************************************************
     *   write tree into byte output
     **************************************************************************************************************/
    private static void writeTrie(final BetterHuffman.Node x, final HuffmanOut binaryOut) {
        if (x.isLeaf()) {//if this is a node representing symbol in alphabet
            binaryOut.writeBit(true);
            binaryOut.writeByte((int) x.ch);
        } else {
            binaryOut.writeBit(false); //this is an aggregate node used for branching
            writeTrie(x.left, binaryOut);
            writeTrie(x.right, binaryOut);
        }
    }

    /**************************************************************************************************************
     *   make substitutes for incoming words
     **************************************************************************************************************/
    private static void buildCode(final String[] st, final BetterHuffman.Node x, final String s) {
        if (!x.isLeaf()) {
            buildCode(st, x.left, s + '0');
            buildCode(st, x.right, s + '1');
        } else {
            st[x.ch] = s;
        }
    }

    /**************************************************************************************************************
     *   Return uncompressed string
     **************************************************************************************************************/
    public static String expand(final byte[] value) {

        final HuffmanIn binaryIn = new HuffmanIn(value);
        final StringBuilder out = new StringBuilder();

        final BetterHuffman.Node root = readTrie(binaryIn);
        final int length = binaryIn.readInt();

        for (int i = 0; i < length; ++i) {
            BetterHuffman.Node x = root;

            while (!x.isLeaf()) {
                boolean bit = binaryIn.readBoolean();
                if (bit) {
                    x = x.right;
                } else {
                    x = x.left;
                }
            }

            out.append(x.ch);
        }


        return out.toString();
    }

    /**************************************************************************************************************
     *   get tree from bytes
     **************************************************************************************************************/
    private static BetterHuffman.Node readTrie(final HuffmanIn binaryIn) {
        boolean isLeaf = binaryIn.readBoolean();
        if (isLeaf) {
            char ch = binaryIn.readChar();
            return new BetterHuffman.Node(ch, UNDEFINED, null, null);
        } else {
            return new BetterHuffman.Node(PARENT, UNDEFINED, readTrie(binaryIn), readTrie(binaryIn));
        }
    }


    /**************************************************************************************************************
     *   Simple implementation of node
     **************************************************************************************************************/
    private static class Node implements Comparable<Node> {
        private final char ch;
        private final int freq;
        private final Node left;
        private final Node right;

        Node(char ch, int freq, Node left, Node right) {
            this.ch = ch;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        private boolean isLeaf() {
            return left == null && right == null;
        }

        @Override
        public int compareTo(Node that) {
            return this.freq - that.freq;
        }
    }

    /**************************************************************************************************************
     *   class to read bits from stream
     **************************************************************************************************************/
    private static class HuffmanIn {

        private final ByteArrayInputStream in;
        private int buffer;
        private byte n;

        HuffmanIn(final byte[] input) {
            in = new ByteArrayInputStream(input);
            fillBuffer();
        }

        private void fillBuffer() {
            buffer = in.read();
            n = 8;
        }

        boolean readBoolean() {
            boolean bit = (buffer >> --n & 1) == 1;
            if (n == 0) {
                fillBuffer();
            }
            return bit;
        }

        char readChar() {
            int x = buffer <<= 8 - n;
            if (n == 8) {
                fillBuffer();
            } else {
                byte oldN = n;
                fillBuffer();
                n = oldN;
                x |= buffer >>> n;
            }
            return (char) (x & 255);
        }

        int readInt() {
            int x = 0;
            for (int i = 0; i < 4; ++i) {
                char c = readChar();
                x <<= 8;
                x |= c;
            }
            return x;
        }
    }

    /**************************************************************************************************************
     *   Output
     **************************************************************************************************************/
    private static class HuffmanOut {

        private ByteArrayOutputStream out = new ByteArrayOutputStream();
        private int buffer;
        private byte n;

        /**************************************************************************************************************
         * @return the raw bytes written so far; a partially filled last byte is flushed, padded with zero bits
         **************************************************************************************************************/
        public byte[] value() {
            clearBuffer();
            return out.toByteArray();
        }

        void writeBit(final boolean bit) {
            buffer = (buffer <<= 1) | (bit ? 1 : 0);
            if (++n == 8) {
                clearBuffer();
            }
        }

        void writeByte(final int x) {
            for (int i = 0; i < 8; ++i) {
                writeBit((x >>> 8 - i - 1 & 1) == 1);
            }
        }

        void clearBuffer() {
            if (n != 0) {
                out.write(buffer <<= 8 - n);
                n = 0;
                buffer = 0;
            }
        }


        /**************************************************************************************************************
         *   write all 4 bytes of int
         **************************************************************************************************************/
        void write(final int x) {
            for (int i = 3; i >= 0; i--)
                writeByte(x >>> (i * 8) & 255);//write 4 bytes of int
        }


    }

}
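
For the human-readable option mentioned above, a minimal sketch using the standard `java.util.Base64` class could look like this. The class name `Base64Demo` and the sample bytes are assumptions for illustration; in practice the byte array would come from `BetterHuffman.compress`.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Demo {
    // Encodes arbitrary bytes as printable ASCII text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Restores the original bytes from the Base64 text.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Stand-in for compressed output; these bytes are made up for the demo.
        byte[] compressed = {(byte) 0xA5, 0x00, (byte) 0xFF, 0x10};
        String text = toText(compressed);
        System.out.println(text);
        System.out.println(Arrays.equals(compressed, fromText(text)));
    }
}
```

Base64 emits four characters for every three input bytes, which is the space overhead the answer refers to.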