-
Uh... can't you just type it in straight from the book?
-
Huffman coding is a variable-length coding (VLC) method. In 1952, Huffman proposed a coding method that builds, based entirely on the probabilities with which characters occur, a prefix code with the shortest average codeword length. It is sometimes called optimal coding and is generally known as Huffman coding.
Suppose four characters occur with different frequencies, as follows:
Encoding the example above with this algorithm gives a total length of:
70×1 + 3×3 + 20×3 + 37×2 = 213 bits
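The sum above can be checked with a short sketch. The frequencies (70, 3, 20, 37) and code lengths (1, 3, 3, 2) are read off the sum itself; the symbol names s1..s4 are placeholders, since the original character table is not shown. The comparison against a 2-bit fixed-length code is an added illustration, not part of the original example.

```python
# Frequencies and code lengths read off the sum above.
# The symbol names s1..s4 are hypothetical placeholders.
freqs = {"s1": 70, "s2": 3, "s3": 20, "s4": 37}
code_len = {"s1": 1, "s2": 3, "s3": 3, "s4": 2}


def total_bits(freqs, code_len):
    """Total encoded length: sum of frequency x code length per symbol."""
    return sum(freqs[s] * code_len[s] for s in freqs)


huffman_total = total_bits(freqs, code_len)

# Four symbols need 2 bits each under a fixed-length code.
fixed_total = 2 * sum(freqs.values())

print(huffman_total, fixed_total)
```

Under these assumptions the variable-length code needs fewer bits than the fixed-length one because the most frequent symbol gets the shortest code.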
-
The codewords of a Huffman code (the code assigned to each symbol) form a prefix code: no codeword is the beginning of another codeword. The codewords can therefore be concatenated and transmitted together with no separator symbols between them. As long as no transmission errors occur, the receiver can still split the stream back into individual codewords without ambiguity.
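A minimal sketch of why no separators are needed: the decoder reads bits one at a time and emits a symbol as soon as the buffer matches a codeword. The code table below is a hypothetical prefix code chosen for illustration, not one taken from the text.

```python
# Hypothetical prefix code used only for illustration.
TABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def is_prefix_free(table):
    """True if no codeword is a prefix of another (checked on sorted codes)."""
    codes = sorted(table.values())
    return all(not codes[i + 1].startswith(codes[i])
               for i in range(len(codes) - 1))


def encode(text, table):
    """Concatenate codewords with no separators."""
    return "".join(table[ch] for ch in text)


def decode(bits, table):
    """Split the bit string back into symbols by matching codewords greedily."""
    inverse = {code: ch for ch, code in table.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


bits = encode("badcab", TABLE)
print(bits, decode(bits, TABLE))
```

Greedy matching is safe here only because the table is prefix-free; with a table where one codeword begins another, the first match could be the wrong one.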
-
Huffman coding is a method for lossless file compression. The idea is simple but classic: no code is a prefix of another. If a is encoded as 001, then no other code starts with 001. This holds because Huffman coding is built on a binary tree in which the characters sit at the leaves, and the path from the root to each leaf is unique, so each character's code is unique too.
Huffman coding is a variable-length code. Compared with a fixed-length code such as ASCII, it can save a lot of space because characters do not occur equally often. In English, for example, 'e' occurs most often, so giving 'e' a shorter code takes less space than a fixed-length encoding would.
Following this line of thinking, Huffman coding is carried out as follows:
1) First, count the frequency (weight) of each character in the text.
2) Use these frequencies (weights) to build a Huffman tree.
3) Walk from the root towards the leaves, writing 0 for each step into a left subtree and 1 for each step into a right subtree; the digits collected along the path give the code of the character at each leaf.
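The three steps above can be sketched as follows. This is one possible implementation under stated assumptions, not the only one: it uses a binary heap to pick the two lightest nodes, and the function name `huffman_codes`, the tuple-based tree representation, and the insertion-order tie-breaking are choices made for this illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return a dict mapping each character of text to its Huffman code."""
    # Step 1: count the frequency (weight) of each character.
    freq = Counter(text)
    if not freq:
        return {}

    # Step 2: build the tree by repeatedly merging the two lightest nodes.
    # Heap entries are (weight, tie_breaker, node); a node is either a
    # character (leaf) or a (left, right) tuple (internal node).
    tie = count()
    heap = [(w, next(tie), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Assumption: a text with a single distinct character gets code "0".
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    root = heap[0][2]

    # Step 3: walk from the root; left edge = 0, right edge = 1.
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(root, "")
    return codes


codes = huffman_codes("aaaabbc")
print(codes)
```

The exact bit patterns depend on how ties are broken and which child goes left, but the code lengths for this input do not: the most frequent character always gets the shortest code.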
-
With the left subtree labeled 0 and the right subtree labeled 1 by default, the resulting codes are:
A: 100, B: 01, C: 1011, D: 11, E: 1010, F: 00
The code length of the encoding is:
Given the frequencies w of the symbols, the probability of each symbol can be calculated. The basic idea of Huffman coding is that symbols that occur more often get shorter codewords and symbols that occur less often get longer codewords. This makes the coding very efficient, that is, each bit of a codeword carries more information on average.
Probability of a: 10/27 (code: 11).
Probability of b: 2/27 (code: 101).
Probability of c: 5/27 (code: 01).
Probability of d: 6/27 (code: 00).
Probability of e: 4/27 (code: 100).
The coding rule is: at each step, find the two symbols with the lowest probabilities and merge them. If several symbols tie for the lowest probability, merge any of them. (In real engineering applications the choice should not be arbitrary, because it affects the variance of the codeword lengths in the final code, and in practice that variance should be as small as possible. Beginners do not need to worry about this.)
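The merging rule can be traced on the example above (weights 10, 2, 5, 6, 4 for a to e, out of 27). The sketch below only tracks code lengths: every time a group of symbols takes part in a merge, each symbol in it moves one level deeper. The function name `huffman_code_lengths` and the tie-breaking by insertion order are assumptions of this illustration.

```python
import heapq


def huffman_code_lengths(weights):
    """Return (code length per symbol, list of merged weight pairs)."""
    # Heap entries: (weight, tie_breaker, symbols contained in the subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in weights}
    tie = len(heap)
    merges = []
    while len(heap) > 1:
        # Take the two entries with the lowest weights and merge them.
        w1, _, g1 = heapq.heappop(heap)
        w2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1  # every symbol in the merged group gets one bit longer
        merges.append((w1, w2))
        heapq.heappush(heap, (w1 + w2, tie, g1 + g2))
        tie += 1
    return lengths, merges


weights = {"a": 10, "b": 2, "c": 5, "d": 6, "e": 4}
lengths, merges = huffman_code_lengths(weights)
total = sum(weights[s] * lengths[s] for s in weights)
print(merges, lengths, total)
```

The resulting lengths (2 bits for a, c, d and 3 bits for b, e) agree with the codes listed above. In this particular example the tie between the two nodes of weight 6 does not change the code lengths, so the length variance is the same either way.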
-
Huffman coding is a type of variable-length coding (VLC). In 1952, Huffman proposed this method, which builds a prefix code with the shortest average codeword length based entirely on the probabilities with which characters occur. It is sometimes called optimal coding and is generally known as Huffman coding.
[Basic Introduction] Example of Huffman encoding.
The Huffman tree, that is, the optimal binary tree, is often used for data compression. In computer information processing, "Huffman coding" is an entropy coding method used for lossless data compression. The term refers to encoding source characters, such as the symbols in a file, using a special code table.
What is special about this code table is that it is built from the estimated probability of occurrence of each source character: characters with a high probability of occurrence get shorter codes and characters with a low probability get longer codes, which reduces the expected average length of the encoded string and so achieves lossless compression of the data. This approach was developed by David A. Huffman. In English, for example, e has a high probability of occurrence, while z has the lowest.
When an English article is compressed with Huffman coding, e is most likely represented by a single bit, while z may take 25 bits (not 26). In the ordinary representation, each letter occupies one byte, or 8 bits. Comparing the two, e uses 1/8 of the length of the ordinary encoding, while z uses more than 3 times as much.
If the probability of occurrence of each English letter can be estimated more accurately, the lossless compression ratio can be improved considerably.
This article describes one of the simplest and fastest Huffman coding implementations that can be found online. It does not use any external libraries, such as the STL or other components. It uses only simple C functions, such as:
memset, memmove, qsort, malloc, realloc and memcpy.
As a result, the code should be easy to understand and even to modify.
The Huffman tree is:
The weighted path length of a tree is the sum of the weighted path lengths of all its leaf nodes, where the weighted path length of a node is the path length from the root to that node multiplied by the node's weight.