exercise in the previous class

Write a computer program to construct a Huffman code...
a typical application of the data structure "priority queue":
a collection of data items, each associated with a "priority"
- enqueue: add a data item to the collection
- dequeue: retrieve the data item with the largest priority
[figure: a priority queue holding trees with probabilities 0.1, 0.2, 0.3, 0.4; dequeue removes a tree, enqueue inserts one]
tree = data item, smaller probability = higher priority
- dequeue two trees, join them, and enqueue the result
- at the final stage, give labels to the unified single tree

review of the previous class

Shannon's source coding theorem:
- the average codeword length is H(S) or more
- the length can be reduced to H(S) + ε for any ε > 0
"block" version of Huffman codes:
- encode by block, and we have more compact coding
- there are various approaches to define block patterns
there is an additional remark concerning the block coding

handling the "tail" of the data

block coding cannot be applied if...
- the message is too short
- we have leftover symbols
- the last run is not terminated
no clever solution; get around the problems by operational devices:
- use a different coding for the final (leftover) block
- use padding, etc.

today's class

some more source coding schemes:
- coding for "simple" realization: arithmetic codes
- coding for unknown information sources: Lempel-Ziv algorithms
- coding which sacrifices the uniquely decodable property: non-reversible, or lossy, coding

part I: arithmetic code
coding for "simple" realization

realization issue of coding

coding ≈ translation between symbols and codewords
we need a large translation table to obtain good performance
static approach...
- construct the table beforehand, and keep it in the system
- the translation is made by a simple table lookup
- large memory is needed; not good for realization
dynamic approach...
- the table is not constructed explicitly
- the table lookup is replaced by on-the-fly computation

arithmetic code

arithmetic code... a well-known example of the dynamic approach
the idea is sketched by an example: p(A) = p, p(B) = 1 − p, with block size n
in numerical examples, we use p = 0.7 and n = 3
we have 2^3 = 8 block patterns w_0, ..., w_7, in dictionary order
P(w_i): probability that w_i occurs
S(w_i) = Σ_{j=0}^{i−1} P(w_j) = S(w_{i−1}) + P(w_{i−1})  (sum of the probabilities before w_i)

  i       0      1      2      3      4      5      6      7
  w_i     AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
  P(w_i)  0.343  0.147  0.147  0.063  0.147  0.063  0.063  0.027
  S(w_i)  0      0.343  0.490  0.637  0.700  0.847  0.910  0.973

illustration of probabilities

the eight patterns partition the interval [0, 1):
[figure: [0, 1) split into consecutive subintervals of widths 0.343, 0.147, ..., 0.027 for AAA, AAB, ..., BBB; e.g. S(BAA) = S(ABB) + P(ABB)]
w_i occupies the interval I_i = [S(w_i), S(w_i) + P(w_i))
basic idea: encode w_i as a certain value x ∈ I_i
problem to solve: we need a translation between w_i and x

about the translation

two directions of the translation:
- [encode] the translation from w_i to x
- [decode] the translation from x to w_i
...use recursive computation instead of a static table
[figure: prefix tree from the root through A, B, AA, AB, ... down to the eight patterns; a node w splits into wA and wB, with widths P(wA) and P(wB) starting at S(wA) and S(wB)]

[encode] the translation from w_i to x

recursively determine P(·) and S(·) for the prefixes of w_i:
- P(ε) = 1, S(ε) = 0
- for a pattern wA: P(wA) = P(w)·p, S(wA) = S(w)
- for a pattern wB: P(wB) = P(w)·(1 − p), S(wB) = S(w) + P(w)·p
the interval of "ABB"?  (p = P(A) = 0.7)
- P(A) = 0.7, S(A) = 0
- P(AB) = 0.21, S(AB) = 0.49
- P(ABB) = 0.063, S(ABB) = 0.637
the interval of "ABB" is [0.637, 0.637 + 0.063)

[encode] the translation from w_i to x (cont'd)

given a pattern w_i, we now know the interval I_i, but which x ∈ I_i should be chosen?
we want the x ∈ I_i whose binary representation is the shortest
choose x as S(w_i) + P(w_i), truncated after ⌈−log2 P(w_i)⌉ binary places (everything below that place is cut off)
[figure: binary addition of S(w_i) and P(w_i); truncating the sum at the place of the most significant nonzero bit of P(w_i) removes less than P(w_i), so the result stays inside I_i]
the length of x ≈ −log2 P(w_i) bits, the place of the most significant nonzero bit of P(w_i)

[decode] the translation from x to w_i

goal: given x, determine the leaf node whose interval contains x
almost the same as the first half of the encoding translation:
compute the threshold, compare, and move to the left or right
example with x = 0.600:
- threshold S(B) = 0.7; x < 0.7, move to A  (P(A) = 0.7, S(A) = 0)
- threshold S(AB) = 0.49; x ≥ 0.49, move to AB  (P(AB) = 0.21, S(AB) = 0.49)
- threshold S(ABB) = 0.637; x < 0.637, move to ABA  (P(ABA) = 0.147, S(ABA) = 0.49)
0.600 is contained in the interval of ABA... decoding completed
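The recursive translation above lends itself to a short program. Below is a minimal sketch in Python, assuming the two-symbol source p(A) = p, p(B) = 1 − p with fixed block size n; the function names are mine, not from the lecture, and the small guard after truncation is my own addition to keep x strictly inside the interval.

```python
import math

def interval(pattern, p):
    """Recursively compute S(w) and P(w) for a block pattern w."""
    S, P = 0.0, 1.0                      # S(eps) = 0, P(eps) = 1
    for symbol in pattern:
        if symbol == 'A':                # P(wA) = P(w)*p, S(wA) = S(w)
            P = P * p
        else:                            # P(wB) = P(w)*(1-p), S(wB) = S(w) + P(w)*p
            S, P = S + P * p, P * (1 - p)
    return S, P

def encode(pattern, p):
    """Return x = S(w) + P(w) truncated to ceil(-log2 P(w)) binary places."""
    S, P = interval(pattern, p)
    k = math.ceil(-math.log2(P))         # number of binary places kept
    x = math.floor((S + P) * 2**k) / 2**k
    if x >= S + P:                       # guard (my addition): stay inside I_i
        x -= 2**-k
    return x, k

def decode(x, n, p):
    """Walk the prefix tree: compare x with the wA/wB threshold at each node."""
    S, P, w = 0.0, 1.0, ''
    for _ in range(n):
        threshold = S + P * p            # boundary between the wA and wB subintervals
        if x < threshold:
            w, P = w + 'A', P * p
        else:
            w, S, P = w + 'B', threshold, P * (1 - p)
    return w

x, k = encode('ABB', p=0.7)
print(x, k)                   # 0.6875 4
print(decode(x, n=3, p=0.7))  # ABB
```

For "ABB" with p = 0.7 this yields x = 0.6875 = 0.1011 in binary, a 4-bit codeword inside [0.637, 0.700), and the decoder recovers ABB.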
performance, summary

an n-symbol pattern w with probability P(w) is encoded to a sequence of length ⌈−log2 P(w)⌉
the average codeword length per symbol is
  (1/n) Σ_{w∈V^n} P(w) ⌈−log2 P(w)⌉ ≈ (1/n) Σ_{w∈V^n} (−P(w) log2 P(w)) = H(S)
almost optimum coding, without using a translation table
however... we need much computation with good precision (→ use approximation?)

part II: Lempel-Ziv algorithms
coding for the unknown information sources

probability in advance?

so far, we assumed that the probabilities of symbols are known...
in the real world... the symbol probabilities are often not known in advance
scan the data twice?
- first scan... count the number of symbol occurrences
- second scan... Huffman coding
→ delay of the encoding operation? overhead to transmit the translation table?

Lempel-Ziv algorithms

for information sources whose symbol probabilities are not known...
- LZ77: lha, gzip, zip, zoo, etc.
- LZ78: compress, arc, stuffit, etc.
- LZW: GIF, TIFF, etc.
they work fine for any information source → universal coding

LZ77

proposed by A. Lempel and J. Ziv in 1977
represent a data substring by using a substring which has occurred previously
algorithm overview:
- process the data from the beginning
- partition the data into blocks in a dynamic manner
- represent a block by a three-tuple (i, l, x): "rewind i symbols, copy l symbols, and append x"
[figure: the copy starts i symbols before the current position and extends over l symbols; x follows, and encoding of the block is completed]

encoding example of LZ77

consider encoding ABCBCDBDCBCD:

  symbols   history                                  codeword
  A         first time                               (0, 0, A)
  B         first time                               (0, 0, B)
  C         first time                               (0, 0, C)
  BCD       BC matches 2 symbols back, D does not    (2, 2, D)
  BD        B matches 3 symbols back, D does not     (3, 1, D)
  CBCD      CBCD matches 6 symbols back              (6, 4, *)

decoding example of LZ77

decode (0, 0, A), (0, 0, B), (0, 0, C), (2, 2, D), (3, 1, D), (6, 4, *)
possible problem:
- a large block is good, because we can copy more symbols
- a large block is bad, because a codeword contains a large integer
... the dilemma degrades the performance
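A toy sketch of the LZ77 rules above follows; the names are mine, and real implementations (gzip etc.) bound the rewind distance with a sliding window, which is exactly the dilemma just mentioned. The '*' marker stands for "no appended symbol" when the data ends inside a copy.

```python
def lz77_encode(data):
    """Each block becomes (i, l, x): rewind i, copy l symbols, append x."""
    out, pos = [], 0
    while pos < len(data):
        best_i, best_l = 0, 0
        for i in range(1, pos + 1):                 # candidate rewind distances
            l = 0                                   # length of the match at distance i
            while pos + l < len(data) and data[pos + l - i] == data[pos + l]:
                l += 1
            if l > best_l:
                best_i, best_l = i, l
        if pos + best_l < len(data):
            out.append((best_i, best_l, data[pos + best_l]))
        else:                                       # data ends inside the copy
            out.append((best_i, best_l, '*'))
        pos += best_l + 1
    return out

def lz77_decode(codewords):
    data = []
    for i, l, x in codewords:
        for _ in range(l):                          # copy l symbols from i back
            data.append(data[len(data) - i])        # (handles overlapping copies)
        if x != '*':
            data.append(x)
    return ''.join(data)

cw = lz77_encode('ABCBCDBDCBCD')
print(cw)                # [(0,0,'A'), (0,0,'B'), (0,0,'C'), (2,2,'D'), (3,1,'D'), (6,4,'*')]
print(lz77_decode(cw))   # ABCBCDBDCBCD
```

The same sketch can be used to check the LZ77 exercise at the end of these notes.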
LZ78

proposed by A. Lempel and J. Ziv in 1978
represent a block by a two-tuple (b, x): "copy the b-th previous block, and append x"
[figure: the copied block lies b blocks before the current position; x follows, and encoding of the block is completed]

encoding example of LZ78

consider encoding ABCBCBCDBCDE:

  block   history                            codeword   block #
  A       first time                         (0, A)     1
  B       first time                         (0, B)     2
  C       first time                         (0, C)     3
  BC      B = the block 2 blocks before      (2, C)     4
  BCD     BC = the block 1 block before      (1, D)     5
  BCDE    BCD = the block 1 block before     (1, E)     6

decoding example of LZ78

decode (0, A), (0, B), (0, C), (2, C), (1, D), (1, E)
advantage against LZ77: a large block is good, because we can copy more symbols;
is there anything wrong with large blocks?
→ the performance is slightly better than LZ77
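A matching toy sketch of the LZ78 rules, in the same spirit as the LZ77 sketch above (my naming; b counts blocks backward, as in the example, and the sketch assumes the data does not end in the middle of a block, cf. the "tail" discussion earlier):

```python
def lz78_encode(data):
    """Greedy LZ78: each new block = longest matching previous block + one symbol."""
    blocks, out, pos = [], [], 0
    while pos < len(data):
        best_b, best_blk = 0, ''
        for b, blk in enumerate(reversed(blocks), start=1):
            # b = distance in blocks; keep the longest block matching here
            if data.startswith(blk, pos) and len(blk) > len(best_blk):
                best_b, best_blk = b, blk
        new_blk = best_blk + data[pos + len(best_blk)]   # assumes data does not
        out.append((best_b, new_blk[-1]))                # end mid-block
        blocks.append(new_blk)
        pos += len(new_blk)
    return out

def lz78_decode(codewords):
    """Rebuild the block list: (b, x) = the b-th previous block plus x."""
    blocks = []
    for b, x in codewords:
        blocks.append((blocks[-b] if b > 0 else '') + x)
    return ''.join(blocks)

cw = lz78_encode('ABCBCBCDBCDE')
print(cw)                # [(0,'A'), (0,'B'), (0,'C'), (2,'C'), (1,'D'), (1,'E')]
print(lz78_decode(cw))   # ABCBCBCDBCDE
```

Note that the decoder needs no searching at all: the block list is rebuilt as a side effect of decoding, which is why the table is said to be constructed adaptively.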
summary of LZ algorithms

in LZ algorithms, the translation table is constructed adaptively:
- information sources with unknown symbol probabilities
- information sources with memory
LZW: good material to learn about intellectual property
- UNISYS, CompuServe, GIF format, ...

part III: non-reversible, or lossy, coding
coding which sacrifices the uniquely decodable property

uniquely decodable? really needed?

we have considered codes which are uniquely decodable:
"encode, decode, and we have the original data"
→ reversible code, lossless code
sometimes, the uniquely decodable property is too much:
image, sound, etc. ... OK if similar to the original
→ non-reversible code, lossy code: more compactness
individual coding schemes are not given in this lecture...

Shannon's theorem, reconsidered

Shannon's source coding theorem:
1. for any code, the average codeword length ≥ H(X)
2. a code with average codeword length < H(X) + ε is constructible
[figure: X → encoder → Y]
an encoder is like a communication channel: input X = symbol(s), output Y = codeword(s)
reversible: the knowledge of Y uniquely determines X
entropies: H(X|Y) = 0 ... mutual information I(X; Y) = H(X)
"a codeword is a container of the information, with size I(X; Y)"
... a codeword cannot be smaller than H(X)

non-reversible case

non-reversible code: the knowledge of Y does not eliminate all ambiguity about X
the "after" entropy H(X|Y) > 0, so the mutual information I(X; Y) < H(X)
[figure: X → encoder → Y, with X not uniquely recoverable from Y]
"a codeword is a container of the information, with size I(X; Y)"
it can happen that the average codeword length < H(X):
a code "beyond Shannon's theorem" may exist
the theory is more complicated; cf. the rate-distortion theorem
(a small numeric example appears at the end of these notes)

summary of today's talk

source coding other than Huffman codes:
- coding for "simple" realization: arithmetic codes
- coding for unknown information sources: Lempel-Ziv algorithms
- coding which sacrifices the uniquely decodable property: non-reversible, or lossy, coding

exercise

Encode ABACABADACABAD by the LZ77 algorithm, and decode its result.
Survey what has happened concerning the patent of LZW.
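To make the "container" argument of the non-reversible case concrete, here is a small numeric sketch; the source, encoder, and all names are my own choices, not from the lecture. A uniform four-symbol source X is encoded with one-bit codewords, so the average length (1 bit) is below H(X) = 2 bits, while only I(X; Y) = 1 bit of information is delivered.

```python
import math

p_x = {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}   # uniform source X
enc = {'a': '0', 'b': '0', 'c': '1', 'd': '1'}       # lossy 1-bit encoder Y

H_X = -sum(p * math.log2(p) for p in p_x.values())   # H(X) = 2 bits

p_y = {}
for sym, p in p_x.items():                           # distribution of Y
    p_y[enc[sym]] = p_y.get(enc[sym], 0.0) + p

# "after" entropy H(X|Y): Y = enc(X) is deterministic, so p(x|y) = p(x)/p(y)
H_X_given_Y = -sum(p * math.log2(p / p_y[enc[sym]]) for sym, p in p_x.items())
I_XY = H_X - H_X_given_Y                             # mutual information

print(H_X, H_X_given_Y, I_XY)   # 2.0 1.0 1.0
# the codeword length is 1 bit < H(X) = 2 bits, but the codeword
# delivers only I(X;Y) = 1 bit of information about X
```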