Diss. ETH No. 23608 On Successive Cancellation Decoding of Polar Codes and Related Codes A dissertation submitted to ETH Zurich for the degree of Doctor of Sciences presented by Christian Schürch MSc ETH born on December 23, 1985 citizen of Heimiswil (Bern) accepted on the recommendation of Prof. Dr. Hans-Andrea Loeliger, examiner Prof. Dr. Henry D. Pfister, co-examiner 2016 Series in Signal and Information Processing Vol. 29 Editor: Hans-Andrea Loeliger Bibliographic Information published by Die Deutsche Nationalbibliothek Die Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the internet at http://dnb.d-nb.de. Copyright © 2017 by Christian Schürch First Edition 2017 HARTUNG-GORRE VERLAG KONSTANZ ISSN 1616-671X ISBN-10: 3-86628-580-9 ISBN-13: 978-3-86628-580-4 Aerodynamically, the bumble bee shouldn’t be able to fly, but the bumble bee doesn’t know it so it goes on flying anyway. Mary Kay Ash Acknowledgments Special thanks go to my supervisor, Andi Loeliger. I very much appreciate the freedom he gave me, even though it was not always easy to handle. His skills in motivating people are incredible. His ability to teach ideas in a clear and concise way is even more incredible. I would like to thank Henry Pfister for taking such an interest in my work and for being my co-examiner. It was a pleasure to meet him and to get his advice on how to make the most out of my visit to California. Many thanks to all the colleagues at the ISI lab for providing a relaxing and fun atmosphere. I very much enjoyed playing tabletop soccer (a.k.a. baby foot), playing AoFE, participating in the SOLA or doing other lab activities. I also made some good friends at the lab for which I am deeply grateful. Thank you, Rita, for your kindness and help. I am indebted to Fabienne, who supported and motivated me when I was frustrated or stressed. Without her I am not sure if I would have finished this thesis. Finally, I would like to thank my family for their continuous support and for their love. I am so happy to have you in my life. Although doing my PhD took a lot of hard work and effort, I am, however, very glad to have gone through this experience. v Abstract Polar codes with a successive cancellation decoder were historically the first practical coding scheme that was able to reliably transmit digital data at the highest possible data rate. This thesis gives new insights on successive cancellation decoding of polar codes and related codes. It consists of three parts. The first part presents a partial order for the synthesized channels of a polar code that is independent of the transmitting channel. This partial order is combined with a partial order from the literature. A minimal representation of this combined partial order is derived that enables us to practically use the combined partial order for simplifying the construction of polar codes. The second part investigates successive cancellation decoding of codes based on the Discrete Fourier Transform over finite fields. Successive cancellation decoding in the natural symbol order is impractical for reasonable blocklength. However, based on the Cooley-Tukey Fast Fourier Transform, we found that successive cancellation decoding in the digitreversed symbol order has significantly lower complexity but comes along with slower polarization. However, for most codes, even this reduced complexity is still impractical. We found that for the reasonably high blocklength of 256, successive cancellation decoding in the digit-reversed order is feasible. Simulations showed that the synthesized channels polarize. Moreover, for that blocklength, we found a way to choose the frozen symbols of the Discrete Fourier Transform code such that its performance under successive cancellation list decoding is very well. However, the decoding complexity is still much higher than for other coding schemes with comparable performance. The third part introduces the fat-tree decoder, which is applicable to polar codes and related codes and which is a generalization to the sucvii viii Abstract cessive cancellation decoder. The fat-tree decoder does message passing on a cycle-free factor graph for the polar code and thus computes exact posterior probabilities. The fat-tree decoder is able to decode the information symbols of a polar code in any desired order. Moreover, it has the ability to take into account any frozen symbols even from the future symbols. However, to keep the complexity manageable, only a limited number of frozen symbols can be considered and the symbol decoding order is not completely arbitrary. We present two specific fat-tree decoder strategies that perform substantially better than the successive cancellation decoder and have manageable complexity. Moreover, these strategies approach block maximum a posteriori performance for polar codes of many parameters. The high flexibility of the fat-tree decoder suggests that even better tradeoffs between performance and decoding complexity exist. Keywords: successive cancellation; polar codes; factor graphs; partial order; code construction; Discrete Fourier Transform; Fast Fourier Transform; cycle-free; maximum likelihood decoding; maximum a posteriori decoding Kurzfassung Polar Codes mit einem Successive Cancellation Dekodierer waren historisch gesehen die erste praktikable Kodierungsmethode, die digitale Daten zuverlässig mit der höchstmöglichen Rate übertragen konnte. Diese Dissertation gibt neue Einblicke in das Successive Cancellation Dekodieren von Polar und verwandten Codes. Sie besteht aus drei Teilen. Der erste Teil präsentiert eine Halbordnung für die synthetischen Kanäle eines Polar Codes, die unabhängig vom Übertragungskanal ist. Diese Halbordnung wird kombiniert mit einer anderen Halbordnung aus der Literatur. Eine minimale Darstellung dieser kombinierten Halbordnung wird hergeleitet, welche es uns ermöglicht die kombinierte Halbordnung praktisch zu nutzen um die Konstruktion eines Polar Codes zu vereinfachen. Der zweite Teil untersucht das Successive Cancellation Dekodieren von Codes basierend auf der Diskreten Fourier-Transformation über endliche Körper. Das Successive Cancellation Dekodieren in der natürlichen Symbol Reihenfolge ist unpraktikabel für sinnvolle Blocklängen. Basierend auf der schnellen Fourier-Transformation nach Cooley-Tukey, zeigen wir, dass das Successive Cancellation Dekodieren in Ziffer-umgekehrter Symbol Reihenfolge eine bedeutend tiefere Komplexität besitzt, jedoch auch eine langsamere Polarisierung mit sich zieht. Trotzdem ist für viele Codes auch diese reduzierte Komplexität immer noch unpraktikabel. Jedoch ist das Successive Cancellation Dekodieren in der Zifferumgekehrten Symbol Reihenfolge praktikabel für die vernünftig hohe Blocklänge von 256. Simulationen haben gezeigt, dass für diese Blocklänge die synthetischen Kanäle polarisieren. Wir haben ausserdem einen Weg gefunden um die eingefrorenen Symbole des Diskreten FourierTransformations Codes so zu wählen, dass dessen Blockfehlerwahrscheinlichkeit für den Successive Cancellation Dekodierer mit Liste sehr gut ist. ix x Kurzfassung Jedoch ist die Dekodierkomplexität um einiges höher als für andere Codes mit vergleichbarer Blockfehlerwahrscheinlichkeit. Der dritte Teil führt den Fetter-Baum Dekodierer ein, der sowohl auf Polar wie auch auf verwandte Codes anwendbar ist und der den Successive Cancellation Dekodierer verallgemeinert. Der Fetter-Baum Dekodierer rechnet auf einem kreisfreien Faktorgraphen und berechnet daher exakte A-Posteriori-Wahrscheinlichkeiten. Der Fetter-Baum Dekodierer ist imstande die Informationssymbole eines Polar Codes in jeder gewünschten Reihenfolge zu dekodieren. Er ist ausserdem imstande beliebige eingefrorene Symbole miteinzubeziehen. Damit jedoch die Komplexität immer noch zu bewältigen ist, kann nur eine limitierte Anzahl an eingefrorenen Symbolen miteinbezogen werden und die Dekodier Reihenfolge ist nicht beliebig. Zwei konkrete Fetter-Baum Dekodier Strategien werden präsentiert, die eine bedeutend tiefere Blockfehlerwahrscheinlichkeit als der Successive Cancellation Dekodierer haben und die eine praktikable Komplexität besitzen. Beide Strategien erreichen die Blockfehlerwahrscheinlichkeit eines Block-Maximum-A-Posteriori Dekodierers für viele Parameter. Die hohe Flexibilität des Fetter-Baum Dekodierers legt nahe, dass Strategien mit einem besseren Kompromiss zwischen Komplexität und tiefer Blockfehlerwahrscheinlichkeit existieren. Stichworte: Successive Cancellation; Polar Codes; Faktorgraphen; Halbordnung; Code Konstruktion; Diskrete Fourier Transformation; Schnelle Fourier Transformation; kreisfrei; Maximum-Likelihood Dekodierer; Maximum-A-Posteriori Dekodierer Contents Abstract vii Kurzfassung ix 1 Introduction 1.1 Polar Codes . . . . . . . . . . 1.2 Contributions . . . . . . . . . 1.3 Notation . . . . . . . . . . . . 1.4 Important Concepts . . . . . 1.4.1 Maximum a Posteriori 1.4.2 Channels . . . . . . . 1.4.3 Factor Graphs . . . . 1.4.4 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 2 . 3 . 5 . 6 . 6 . 8 . 11 . 17 2 Polar Codes 2.1 Gn -Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 The Probability Setup . . . . . . . . . . . . . . . . . . . . 2.3 Genie-Aided Successive Cancellation Decoding . . . . . . 2.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 The Tree Factor Graph for the Synthesized Channels 2.3.3 Max-Product on the Tree Factor Graph . . . . . . 2.4 Polarization . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Definition of Polar Codes . . . . . . . . . . . . . . 2.4.2 Construction of Polar Codes . . . . . . . . . . . . 2.5 Successive Cancellation Decoder . . . . . . . . . . . . . . 2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . 2.5.2 Complexity . . . . . . . . . . . . . . . . . . . . . . 2.5.3 Successive Cancellation and Block MAP . . . . . . 2.6 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . xi 19 19 21 21 22 23 27 28 28 29 29 29 29 30 31 xii Contents 2.7 Performance Improvements . . . . . . . . . . . . . . . . . 31 2.7.1 Improve the Decoder . . . . . . . . . . . . . . . . . 31 2.7.2 Improve the Code . . . . . . . . . . . . . . . . . . 32 3 A Partial Order for the Synthesized Channels of SCD 3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . 3.2 A Sufficient Condition for Degradation . . . . . . . . . . 3.3 A Partial Order . . . . . . . . . . . . . . . . . . . . . . . 3.4 Another Partial Order from the Literature . . . . . . . . 3.5 The Combined Partial Order . . . . . . . . . . . . . . . 3.6 Code Construction with the Combined Partial Order . . 3.7 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 36 38 40 43 46 48 51 4 Successive Cancellation Decoding of DFT-Codes 4.1 Discrete Fourier Transform Codes . . . . . . . . . . 4.2 Digit Representation . . . . . . . . . . . . . . . . . 4.3 The Cooley-Tukey Fast Fourier Transform . . . . . 4.4 Successive Cancellation . . . . . . . . . . . . . . . 4.4.1 Partial Distances . . . . . . . . . . . . . . . 4.4.2 Natural Symbol Order . . . . . . . . . . . . 4.4.3 Digit-Reversed Symbol Order . . . . . . . . 4.5 An Example for the Blocklength 256 . . . . . . . . 4.5.1 Polar DFT-Code . . . . . . . . . . . . . . . 4.5.2 Improved Polar DFT-Code . . . . . . . . . 4.5.3 Doubly Improved Polar DFT-Code . . . . . 4.6 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 56 57 59 61 61 61 62 68 69 72 72 72 5 A Generalized SCD with Look-Ahead 5.1 Abbreviations and Symbols . . . . . . . . . . . . . . . . . 5.2 Cycle-Free Factor Graphs for the Polar Transform . . . . 5.3 Fat-Tree Decoder . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 States . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.2 Reduced Graph . . . . . . . . . . . . . . . . . . . . 5.3.3 Message Passing . . . . . . . . . . . . . . . . . . . 5.3.4 Decoding Process . . . . . . . . . . . . . . . . . . . 5.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1 Successive Cancellation . . . . . . . . . . . . . . . 5.4.2 SCD with Different Bit Order . . . . . . . . . . . . 5.4.3 Block Maximum a Posteriori . . . . . . . . . . . . 5.4.4 The Benefits of Future Bits: A Numerical Example 79 80 81 81 84 84 86 92 94 94 95 95 103 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Contents 5.5 5.6 5.7 5.8 xiii Ideas for Good Strategies . . . . . . . . . . . . . . . . 5.5.1 “Decode” Frozen Bits First . . . . . . . . . . . 5.5.2 Taking Decisions: Benefits and Drawbacks . . . 5.5.3 Choose the Bits to Decode with Close Indexes Implemented Strategies . . . . . . . . . . . . . . . . . 5.6.1 Scores . . . . . . . . . . . . . . . . . . . . . . . 5.6.2 Most Known Bits of Lowest Index Strategy . . 5.6.3 Most Known Bits Strategy . . . . . . . . . . . Simulation Results . . . . . . . . . . . . . . . . . . . . 5.7.1 MKBLIS and MKBS . . . . . . . . . . . . . . . 5.7.2 Sum-Product and Max-Product . . . . . . . . . 5.7.3 Varying the Rate . . . . . . . . . . . . . . . . . 5.7.4 Varying the Blocklength . . . . . . . . . . . . . 5.7.5 Experiments . . . . . . . . . . . . . . . . . . . 5.7.6 Comparison to the SC List Decoder . . . . . . Conclusion and Outlook . . . . . . . . . . . . . . . . . A Proofs A.1 Proof A.2 Proof A.3 Proof A.4 Proof A.5 Proof A.6 Proof A.7 Proof A.8 Proof A.9 Proof Bibliography of of of of of of of of of Proposition Proposition Proposition Proposition Proposition Proposition Proposition Proposition Proposition 1.2 1.5 3.1 3.2 3.6 4.3 5.1 5.2 5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 104 104 107 107 107 108 113 114 115 121 121 121 125 128 131 . . . . . . . . . . . . . . . . . . 137 137 138 140 142 143 145 146 147 150 153 List of Figures 1.1 1.2 1.3 1.4 Digital data transmission . . . Factor graph examples . . . . . Message passing . . . . . . . . Some factor graph equivalences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 11 . 12 . 16 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 The factor graph notation of G1 . . . . The recursive construction of Gn . . . . The decomposition of G3 . . . . . . . . A factor graph for the probability setup Useful factor graph equivalences for G1 . (i) . . A factor graph for the channel Wn (i) Intermediate factor graph for Wn . . . (4) The tree factor graph of W3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 20 20 22 24 25 25 26 3.1 3.2 3.3 3.4 3.5 3.6 The Hasse-diagram of the PO 1 for N = 16 . . . . . . The Hasse-diagram of the PO 1 for N = 64, wH (i) = 3 The Hasse-diagram of the PO c for N = 16 . . . . . . Bounds on the union bound by the Maximin strategy . A generalization of G2 . . . . . . . . . . . . . . . . . . . Two tree factor graphs for the generalized G2 . . . . . . . . . . . . 43 44 47 50 51 52 4.1 4.2 4.3 4.4 4.5 4.6 f (i) . . . . . . . . . . . . A factor graph for the channel W N Decomposition of the digit-reversed DFT transform . . . . f (i) . . . . . . . . . . . . . Intermediate factor graph for W N f (5) . . . . . . . . . . . . . . . The tree factor graph for W 12 f (i) for N = 256 The mutual information of the channels W N f (i) The digit-reversed mutual information of the channels W N xv . . . . . . . . . . . . . . . . 63 64 65 66 69 70 xvi List of Figures 4.7 4.8 4.9 4.10 Block error probability of the polar DFT-code . . . . . . . The polar DFT-code compared to polar codes . . . . . . . Block error probability of the improved polar DFT-code . Error probability of the doubly improved polar DFT-code 71 73 74 75 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 5.13 5.14 5.15 5.16 5.17 5.18 5.19 5.20 5.21 5.22 5.23 5.24 5.25 5.26 5.27 5.28 5.29 5.30 5.31 5.32 5.33 5.34 5.35 A factor graph of Gn with both copies of Gn−1 superposed 82 Two cycle-free factor graphs of G8 . . . . . . . . . . . . . . 83 The fat-tree factor graph for G8 with states . . . . . . . . 85 Message passing through an i-node of G1 . . . . . . . . . 86 The FV of the decoding factor graph of Figure 5.3. . . . . 89 A factor graph equivalent to the factor graph in Figure 5.5 90 Allowed state transitions . . . . . . . . . . . . . . . . . . . 93 The FV of the RG at time step 1 in Table 5.3. . . . . . . 96 The FV of the RG at time step 2 in Table 5.3. . . . . . . 96 The FV of the RG at time step 3 in Table 5.3. . . . . . . 97 The FV of the RG at time step 4 in Table 5.3. . . . . . . 97 The FV of the RG at time step 5 in Table 5.3. . . . . . . 98 The FV of the RG at time step 1 in Table 5.4. . . . . . . 99 The FV of the RG at time step 2 in Table 5.4. . . . . . . 99 The FV of the RG at time step 3 in Table 5.4. . . . . . . 100 The FV of the RG at time step 4 in Table 5.4. . . . . . . 100 The FV of the RG at time step 5 in Table 5.4. . . . . . . 101 The FV of the RG at time step 1 in Table 5.5. . . . . . . 102 The FV of the RG at time step 2 in Table 5.5. . . . . . . 102 The FV of the RG at time step 3 in Table 5.5. . . . . . . 103 An example showing the benefit of taking reliable decisions106 The FV of the FTFG at time step 0 in Table 5.7 . . . . . 110 The FV of the FTFG as in Figure 5.22 but with S1 = D . 111 The FV of the FTFG as in Figure 5.23 but with S2 = D . 111 Simulation results: MKBLIS and α = 1.0 . . . . . . . . . 116 Simulation results: MKBS and α = 1.0 . . . . . . . . . . . 117 Simulation results: MKBLIS and α = 0.7 . . . . . . . . . 118 Simulation results: MKBS and α = 0.7 . . . . . . . . . . . 119 Simulation results: MKBS in dependence of α . . . . . . . 120 Simulation results: Max-product message passing . . . . . 122 Simulation results: Low rate . . . . . . . . . . . . . . . . . 123 Simulation results: High rate . . . . . . . . . . . . . . . . 124 Simulation results: Low blocklength . . . . . . . . . . . . 126 Simulation results: High blocklength . . . . . . . . . . . . 127 Simulation results: First modified MKBS . . . . . . . . . 129 List of Figures 5.36 5.37 5.38 5.39 Simulation Simulation Simulation Simulation xvii results: results: results: results: Second modified MKBS . . . . . . . . 130 Successive Cancellation List Decoder . 132 Comparison to SCLD . . . . . . . . . 133 Comparison to SCLD, high blocklength134 A.1 An example for termination edges . . . . . . . . . . . . . 148 Chapter 1 Introduction This thesis deals with practical schemes for the reliable transmission of digital data over a noisy transmission link as depicted in Figure 1.1. A transmitter wants to convey a message U , let us assume a sequence of K symbols, to a receiver. The transmitter encodes the K symbols into the sequence X containing N symbols and transmits them over N uses of a noisy transmission link. The noisy transmission link is referred to as the transmitting channel. It is a statistical model of what happens to the sent signal before it is received. The receiver observes the output sequence Y of the transmitting channel and generates an estimate Û of the message U , which is called decoding. The goal is to reliably transmit as many symbols per channel use as possible with practical encoding and decoding operations. The positive integer N is called the blocklength. The rate is defined as R , K/N . The code is defined as the set of all sequences X of length N that the encoder is able to produce. Shannon showed in his seminal papers [32], [33] that there exists a fundamental limit on the number of symbols per channel use that can be reliably sent over a transmitting channel. He called this limit U X Encoder Y Channel Û Decoder Figure 1.1: A simplified representation of digital data transmission. 1 2 Introduction the capacity of the transmitting channel. However, he did not give a practical scheme that achieves capacity. Since then a huge number of ideas have been created towards this goal. Some ideas led to practical schemes that empirically got very close to capacity, as for example turbo or low-density parity-check codes [29]. However, it took a long time until the first practical scheme was invented that provably achieved capacity on the important subclass of binary-input discrete memoryless channels: polar codes under successive cancellation decoding (SCD) [1], [10]. In the meantime, other capacity-achieving codes with a practical encoder and decoder have been found: In [16] it was shown that also spatially coupled low-density parity-check codes achieve the capacity of symmetric binary-input discrete memoryless channels. Moreover, sparse superposition codes [12] achieve capacity on the additive white Gaussian noise channel. However, these codes are not considered in this thesis. 1.1 Polar Codes Polar codes were introduced by Arıkan [1]. They were the first provably capacity-achieving codes for binary-input discrete memoryless channels with a low-complexity encoder and decoder. Although the rule to construct a polar code was already introduced and investigated in [34], the proof that polar codes achieve the symmetric capacity was first given by Arıkan [1]. Binary polar codes work as follows. Suppose that K out of the N bits u0 , u1 , . . . , uN −1 contain the message bits and the remaining N − K bits are fixed and known to both the transmitter and the receiver. A linear, bijective transform Gn is applied to u0 , u1 , . . . , uN −1 to get the bits x0 , x1 , . . . , xN −1 , which are then sent over N uses of the transmitting channel. After receiving the output sequence y0 , y1 , . . . , yN −1 , the SCD successively decodes the bits u0 , u1 , . . . , uN −1 in the order of ascending index. When decoding ui the SCD assumes the already decoded bits u0 , u1 , . . . , ui−1 to be known and the future bits ui+1 , . . . , uN −1 to be completely unknown. Encoding and SCD of polar codes can be performed with low complexity. The set of K bits from the N bits u0 , u1 , . . . , uN −1 that are used to transmit the message is determined as follows. Arıkan defined for every ui a synthesized binary-input channel that corresponds to SCD of ui . He showed that these N synthesized channels converge either to a noiseless or a pure noise channel as the blocklength N goes to infinity. 1.2 Contributions 3 This phenomenon was called polarization. The bits corresponding to the K best synthesized channels are used to transmit the message and if K is not larger than the number of almost noiseless channels they have a small error probability. Although having a practical scheme that achieves capacity was a milestone in coding theory, being capacity-achieving is a property that holds asymptotically as the blocklength N goes to infinity. In practice, the blocklength N is a finite quantity and it is important that the block error probability is low at practical blocklengths. Polar codes under SCD are not good at practical blocklengths [35]. As the SCD is not an optimal decoder, it is worthwhile to search for practical decoders that improve the performance of polar codes. The successive cancellation list decoder (SCLD) from [35] is a promising practical decoder that substantially reduces the block error probability of polar codes. The SCLD empirically approaches the performance of optimal decoding of polar codes for many parameters. However, even optimal decoding of polar codes is still not very good. Thus, to make polar codes competitive, polar codes themselves need to be improved as well. Several approaches to improve polar codes have been considered so far. A well-performing but still simple approach was given in [35], where a high-rate outer cyclic redundancy check code was added with the consequence that the performance under SCLD was substantially improved. With these modifications, polar codes are competitive. 1.2 Contributions The contributions of this thesis can be divided into the following three parts. A Partial Order for the Synthesized Channels (Chapter 3) Code construction of polar codes consists of finding the K best among all N synthesized channels of a polar code, which is dependent on the transmitting channel. This is not a trivial problem and hence it is desirable to simplify it. We found a partial order on the set of N synthesized channels that is independent of the transmitting channel and that orders the synthesized channels according to their quality. This partial order is combined with a partial order from the literature. The combined partial order is quite powerful. For example for the 4 Introduction blocklength N = 220 , there exists a synthesized channel such that according to the combined partial order, 23.06% of all synthesized channels are better and 23.07% are worse than this synthesized channel. A minimal representation of the combined partial order is derived that enables us to practically use the partial order for simplifying code construction of polar codes. An example shows that a code construction algorithm based on the combined partial order can significantly reduce the number of synthesized channels for which the quality needs to be determined. SCD of Discrete Fourier Transform Codes (Chapter 4) Codes based on the Discrete Fourier Transform are an important class of codes, including the widely-used Reed-Solomon codes. However, SCD of Discrete Fourier Transform codes in the natural symbol order is prohibitively complex for all except the shortest blocklengths. Based on the Cooley-Tukey Fast Fourier Transform, we found a symbol order, called the digit-reversed order, such that the SCD complexity is substantially reduced. But this comes along with slower polarization. However, exact SCD of Discrete Fourier Transform codes in the digit-reversed symbol order is still only feasible if N can be factorized into small factors. A specific example with a reasonably high blocklength of N = 256 is given for which SCD in the digit-reversed symbol order is feasible. As for standard polar codes, we empirically found that many of the synthesized channels polarize to almost noiseless or almost useless channels. Simulation results for this blocklength show that under SCLD the constructed Discrete Fourier Transform code has comparable performance to a polar code, however with much higher decoding complexity. We substantially improved the performance under SCLD by altering the rule to pick the information symbols. Although this resulted in a block error probability comparable to the well-performing polar code with outer cyclic redundancy check code from [35], the decoding complexity is still much higher. A Generalized Successive Cancellation Decoder (Chapter 5) SCD of polar codes is mainly suboptimal because when decoding ui , the bits among ui+1 , . . . , uN −1 that are known by the receiver are assumed to be unknown. We found two cycle-free factor graphs for the polar transform Gn . We defined the fat-tree decoder which does message passing on a reduced version of one of the cycle-free graphs and which generalizes SCD. The fat-tree decoder is not only applicable to polar codes but 1.3 Notation 5 also to all codes based on Gn or based on transforms having a similar structure as Gn . The fat-tree decoder is able to use the values of any known bits even from the future bits ui+1 , . . . , uN −1 . Moreover, decoding can be performed in any desired bit order, not only in ascending index order. However, to keep the complexity manageable, only a limited amount of fixed bits can be incorporated and the bit decoding order is not completely arbitrary. The flexibility of the fat-tree decoder is huge and thus it is a very interesting problem to find good decoding strategies. We proposed two decoding strategies with adjustable decoding complexity. For many parameters, both strategies are able to approach optimal decoding of polar codes with manageable complexity. For other Gn -codes, the block error probability is significantly lower than for SCD even though the performance of optimal decoding is not always reached with manageable complexity. A comparison to the SCLD shows an advantage for the SCLD. However, our generalized SCD is very flexible and hence it is promising to search for better strategies or for codes that are better matched to the fat-tree decoder. We think that the presented cycle-free factor graphs are a suitable and intuitive way of teaching polar codes because the structure of polar codes stands out clearly. 1.3 Notation For an integer k, we denote by Zk the set {0, 1, . . . , k − 1} which is empty if k ≤ 0. The cardinality of a finite set A is denoted by |A|. For a function f with domain A and for B ⊂ A, we denote by f (B) the set {f (b)|b ∈ B} and by argmaxb∈B f (b) the element b ∈ B that maximizes f (b) where ties are broken arbitrarily. P For a variable a taking values in the finite set A, we use the notation a , maxa and argmaxa to mean that these operators are performed over all a ∈ A. We denote by dxe the ceiling (i.e. the smallest integer not smaller than x) of the real number x and by bxc the flooring of x (i.e. the largest integer not larger than x). We use the standard Landau notation O(.) to denote an asymptotic upper bound on the growth rate of a function. We use uppercase particularly but not exclusively for matrices and random variables. We denote by uk−1 the row vector (u0 , u1 , . . . , uk−1 ) and by uji for 0 0 ≤ i, j < k the vector (ui , ui+1 , . . . , uj ), which is empty if i > j. For 6 Introduction A ⊂ Zk , we denote by uA the row vector (ui |i ∈ A). The element at row i and column j of a matrix A is denoted by [A]i,j , where the indexes i, j start both at 0. We denote by Im the m × m identity matrix. The Kronecker product of a k × l matrix A and a m × q matrix B is denoted by A ⊗ B and defined by [A]0,0 B ... [A]0,l−1 B .. .. .. A⊗B , (1.1) . . . [A]k−1,0 B . . . [A]k−1,l−1 B which is a km × lq matrix. For any positive integer n, the Kronecker power A⊗n is defined by A⊗n , A ⊗ A ⊗ . . . ⊗ A . {z } | n times (1.2) We denote by P (A) the probability of an event A. We denote by p X |Y (x|y) the distribution of the random variable X given Y = y which is often abbreviated to p(x|y). The Hamming weight wH (i) of a non-negative integer i is defined as the number binary expansion of i. The Hamming weight of ones in the k−1 wH uk−1 of a vector u with elements in a finite field denotes the 0 0 number of elements of uk−1 that are different from 0. 0 1.4 Important Concepts There are some well-known concepts that we use over and over again in this thesis. We discuss these concepts here by giving a compact summary. 1.4.1 Maximum a Posteriori U0k−1 Let be k discrete random variables and let Y be a random variable that is observed. The block maximum a posteriori (block MAP) decision of U0k−1 based on Y is defined as y . ûk−1 (1.3) (y) = argmax p uk−1 0 0 uk−1 0 In general, dropping or adding a variable in the block MAP decision (1.3) may change the decision. The block MAP decision minimizes the block error probability P U0k−1 6= Û0k−1 (Y ) (1.4) 1.4 Important Concepts 7 independently of how ties are broken in (1.3). The symbol-wise block MAP decision of U0k−1 based on Y is defined by y ûi (y) = argmax max p uk−1 (1.5) 0 ui u0 ,...,ui−1 ,ui+1 ,...,uk−1 for all i ∈ Zk . If there is a unique maximizer in (1.3) then performing (1.5) for all i ∈ Zk is equivalent to (1.3) and is often used in practice instead of (1.3). However, if there are several maximizers in (1.3) then (1.5) is not equivalent to (1.3) in general. Interestingly, the example below shows that the rule (1.5) could even produce decisions that do not y . maximize p uk−1 0 Example 1.1. Let U0 , U1 be binary random variables and let ( 1/2, if u0 = u1 , p(u0 , u1 |y) = 0, otherwise. (1.6) Then the rule (1.3) produces either (0, 0) or (1, 1). However, the rule (1.5) applied to i = 0, 1 could produce all possibilities (0, 0), (1, 0), (0, 1) or (1, 1). Clearly, the possibilities (u0 , u1 ) = (1, 0) and (u0 , u1 ) = (0, 1) do not maximize p(u0 , u1 |y). Thus, we have to be careful in applying (1.5). However, the symbolwise decisions obtained by Algorithm 1 are block MAP decisions. Algorithm 1 Successive symbol-wise block MAP 1: for i = 0 to k − 1 do k−1 2: ûi (y) ← argmaxui maxui+1 ,...,uk−1 p ûi−1 y 0 (y), ui 3: end for It is often not feasible to compute the block MAP decisions. However, it is very useful to know the block error probability of a block MAP decoder as a benchmark. Suppose a practical decoder produces the decisions ũk−1 (y). With the procedure that was used in [35], we can 0 obtain an empirical lower bound on the block error probability of the block MAP decoder: 1. Generate m independent samples uk−1 (i), yi , i ∈ Zm , from the 0 distribution pU k−1 ,Y uk−1 ,y . 0 0 2. Compute the decisions of the practical decoder ũk−1 (yi ) for every 0 i ∈ Zm . 8 Introduction 3. The number of samples where (i) yi (yi ) yi > p U k−1 | Y uk−1 p U k−1 | Y ũk−1 0 0 0 0 (1.7) divided by m is the empirical lower bound. Note that (1.7) implies that there is at least one uk−1 which has a higher 0 probability than uk−1 (i) and hence the block MAP decoder in (1.3) 0 would not have decided for the correct values uk−1 (i). 0 The MAP decision of Ui based on Y is defined as X y ûi (y) = argmax p uk−1 (1.8) 0 ui u0 ,...,ui−1 ,ui+1 ,...,uk−1 which is equal to ûi (y) = argmax p(ui |y). (1.9) ui The MAP decision minimizes the error probability of Ui P Ui 6= Ûi (Y ) . 1.4.2 (1.10) Channels A channel is usually defined as an input alphabet X , an output alphabet Y and transition probabilities p Y |X (y|x), x ∈ X , y ∈ Y [7, pp. 183-184]. However, this thesis is about polar codes, where channels naturally arise as pairs of random variables, the input X and the output Y . Therefore, we use this unusual definition of the term channel. A channel with input X and output Y is denoted by X → Y . Note that with this definition, a prior on the input pX (x) is implicitly included in a channel as in the following example. Example 1.2 (Uniform and Binary-Input Additive White Gaussian Noise Channel). Let X be uniformly distributed over Z2 . Let Z be independent of X and Gaussian distributed with mean 0 and variance σ 2 . The uniform and binary-input additive white Gaussian noise chan nel with noise variance σ 2 , denoted as UBAWGNC σ 2 , is the channel X → Y with Y , µ(X) + Z where µ is the mapping ( 1, if x = 0, µ(x) , (1.11) −1, if x = 1. 1.4 Important Concepts 9 The signal-to-noise ratio (SNR) of the UBAWGNC σ 2 is defined as the ratio between the variance of µ(X) to the variance of the noise Z and hence equals 1/σ 2 . Definition 1.1 (MAP Decision of a Channel). The MAP decision of the channel X → Y is the MAP decision of X based on Y . Let I(X; Y ) and H(X |Y ) denote the mutual information and the conditional entropy, respectively, as defined in [7, Ch. 2]. Definition 1.2 (Channel Parameters). The mutual information of the channel W , X → Y is defined by I(W ) , I(X; Y ), (1.12) H(W ) , H(X |Y ), (1.13) the entropy of W by and the error probability of W by Pe (W ) , P X 6= X̂(Y ) , (1.14) where X̂(Y ) is the MAP decision of W . In the following, our notation assumes that the output alphabet of a channel is discrete. However, the case where it is continuous can be handled analogously. Definition 1.3 (Stochastic Degradation). Let W1 and W2 be the channels X1 → Y1 and X2 → Y2 , respectively and let both channels share the same input alphabet. We say W1 is stochastically degraded with respect to W2 , denoted as W1 W2 , if probability distributions p(y1 |y2 ) exist such that X pX1 ,Y1 (x, y1 ) = p(y1 |y2 )pX2 ,Y2 (x, y2 ). (1.15) y2 If we sum over y1 on both sides of (1.15), we see that a necessary condition for W1 W2 is that both inputs X1 , X2 have the same distribution pX1 (x) = pX2 (x). (1.16) 10 Introduction Example 1.3 (Omit to Degrade). Let W be the channel X → (Y1 , Y2 ) and W 0 be the channel X → Y1 . Then W 0 W which can be seen by using Definition 1.3 with probability distributions ( 1, if y10 = y1 , 0 p(y1 |y1 , y2 ) = (1.17) 0, otherwise. Thus, a channel gets stochastically degraded if parts of the output are omitted. Proposition 1.1 (Degradation is Transitive). Let W1 , W2 , W3 be three channels. If W1 W2 and W2 W3 then W1 W3 . Proof. Let Wi be the channel Xi → Yi for i = 1, 2, 3. According to the assumptions, there are probability distributions p(y1 |y2 ) and p0 (y2 |y3 ) such that X pX1 ,Y1 (x, y1 ) = p(y1 |y2 )pX2 ,Y2 (x, y2 ), (1.18) y2 pX2 ,Y2 (x, y2 ) = X p0 (y2 |y3 )pX3 ,Y3 (x, y3 ). (1.19) y3 We combine equations (1.18) and (1.19) to get X X pX1 ,Y1 (x, y1 ) = p(y1 |y2 ) p0 (y2 |y3 )pX3 ,Y3 (x, y3 ) y2 (1.20) y3 ! = X X y3 0 p(y1 |y2 )p (y2 |y3 ) pX3 ,Y3 (x, y3 ) (1.21) y2 which proves W1 W3 with distributions X p00 (y1 |y3 ) , p(y1 |y2 )p0 (y2 |y3 ). (1.22) y2 The following proposition is proved in the Appendix. Proposition 1.2 (Degradation and Channel Parameters). If W1 W2 then I(W1 ) ≤ I(W2 ), (1.23) H(W1 ) ≥ H(W2 ), (1.24) Pe (W1 ) ≥ Pe (W2 ). (1.25) 1.4 Important Concepts 11 U1 U2 U1 f g0 U0 g1 U3 (a) U2 U3 U0 g2 g3 (b) Figure 1.2: (a) The factor graph for (1.26). (b) The factor graph for (1.27). Proposition 1.2 shows that stochastic degradation can be reasonably used to order channels by their quality. However, note that in general no stochastic degradation order between two channels needs to exist. 1.4.3 Factor Graphs An introduction to factor graphs can be found in [19]. We use Forneystyle factor graphs [9] with some minor changes. A factor graph is a graphical representation of a factorization of a function. Since there is no unique factorization of a function, there is no unique factor graph of a function. For example, suppose f (u0 , u1 , u2 , u3 ) can be factorized in the following two ways f (u0 , u1 , u2 , u3 ) = f (u0 , u1 , u2 , u3 ), (1.26) f (u0 , u1 , u2 , u3 ) = g0 (u0 )g1 (u0 , u1 , u3 )g2 (u1 , u2 )g3 (u2 , u3 ). (1.27) The factor graphs for these two factorizations are shown in Figure 1.2. Every edge in a factor graph corresponds to a variable whereas every node (rectangular box) corresponds to a factor in the factorization. An edge is connected to a node if and only if the factor corresponding to that node depends on the variable corresponding to that edge. In this thesis, all functions take values in the real numbers and scale factors are not important. Hence, we say that a factor graph represents the function f (uk−1 ) even if it is a factorization of the function c·f (uk−1 ) 0 0 where c is a positive real number independently of uk−1 . Moreover, 0 almost all variables are discrete and thus our notation assumes discrete alphabets. Often factor graphs represent the factorization of a probability distribution. Then, factor graphs are a tool to understand and derive algorithms that efficiently solve many statistical inference problems. Such 12 Introduction Zm−1 .. . g Z0 U Figure 1.3: Auxiliary visualization for SPMP and MPMP. algorithms are often based on message passing. In a message passing → − algorithm, a forward message − µ and a backward message ← µ is assigned to every edge in the factor graph which are functions of the variable of that edge. Sum-product message passing (SPMP) and max-product message passing (MPMP) are the two most prominent message passing algorithms and are defined below. Sum-product (See Figure 1.3) The message along the edge U in the direction of the arrow is the function X X → − → → µU (u) , ... g(u, z0 , . . . , zm−1 )− µZ0 (z0 ) · · · − µZm−1 (zm−1 ) (1.28) z0 zm−1 → where − µZi (zi ) is the message along the edge Zi in the direction of the → arrow. If the edge U is an open edge then − µU (u) , 1 for all u. Max-product (See Figure 1.3) The message along the edge U in the direction of the arrow is the function − → → → µU (u) , max . . . max g(u, z0 , . . . , zm−1 )− µZ0 (z0 ) · · · − µZm−1 (zm−1 ) (1.29) z0 zm−1 → where − µZi (zi ) is the message along the edge Zi in the direction of the → arrow. If the edge U is an open edge then − µU (u) , 1 for all u. Suppose we have a cycle-free factor graph that represents the prob y . If we apply ability distribution on U0k−1 given by p U k−1 | Y uk−1 0 0 SPMP to every edge in that factor graph in both directions we get that − → − µUi (ui )← µUi (ui ) ∝ p Ui |Y (ui |y) (1.30) where ∝ means equality up to a positive scale factor that is independent of ui (but possibly dependent on y). Therefore, the MAP decision of Ui 1.4 Important Concepts 13 given Y can be obtained by calculating → − ûi = argmax − µUi (ui )← µUi (ui ). (1.31) ui Similarly, if we apply MPMP we obtain → − − µUi (ui )← µUi (ui ) ∝ max u0 ,...,ui−1 ,ui+1 ,...,uk−1 y (1.32) p U k−1 | Y uk−1 0 0 and thus symbol-wise block MAP (1.5) can also be obtained by (1.31). Let δx,y denote the {0, 1}-valued function that is 1 if and only if x = y. Table 1.1 shows some basic building blocks for larger factor graphs. Note that in general the factors of the permutation and the parity-check node are not symmetric with respect to the involved variables, which is indicated by a black dot in the factor graph. For the binary field, the black dot is omitted since all permutations are their own inverse and x ⊕ y = z is equivalent to x ⊕ y ⊕ z = 0, which is symmetric in x, y, z. Proposition 1.3 (Observations). Let M be a factor graph for the dis, y on U0k−1 , Y . If we add an observation node tribution pU k−1 ,Y uk−1 0 0 with value y 0 to the edge corresponding to Y , then the resulting factor 0 y on U k−1 . graph represents the probability distribution p U k−1 | Y uk−1 0 0 0 Proof. The factor graph of M together with the observation node rep resent the function pU k−1 ,Y uk−1 , y δy,y0 . By summing over y, we get 0 0 X , y0 (1.38) pU k−1 ,Y uk−1 , y δy,y0 = pU k−1 ,Y uk−1 0 0 0 0 y 0 y up to the scale factor pY (y 0 ) which is which equals p U k−1 | Y uk−1 0 0 independent of uk−1 . 0 Proposition 1.3 is very important, because it tells us how to use observed values in a factor graph. Definition 1.4 (Factor Graph Equivalence). Let the factor graphs M and L represent the functions fM (uC , uM ) and fL (uC , uL ), respectively. We say M and L are uC -equivalent if they represent the same marginal with respect to uC , i.e. X X fM (uC , uM ) ∝ fL (uC , uL ) (1.39) uM uL for all uC , where ∝ means equality up to a positive scale factor that is independent of uC . 14 Introduction Node name Graph X x0 observation Factor g(x) , δx,x0 (1.33) where x0 is the observed value of X. X πσ permutation g(x, y) , δx,σ(y) Y (1.34) where X and Y take values in the same finite set and σ is a permutation on that set. Y parity-check X ⊕ g(x, y, z) , δz,x⊕y (1.35) Z where ⊕ denotes addition in a finite field. Y equality X = Z g(x, y, z) , δx,y δy,z (1.36) X channel W Y g(x, y) , pX,Y (x, y) (1.37) where W is the channel X → Y . Table 1.1: The factor graph notation of some basic nodes. 1.4 Important Concepts 15 If the set of variables uC is understood from the context, we just say M and L are equivalent. Proposition 1.4. Let the two factor graphs M and L represent the probability distribution pM (uC , uM |y) and pL (uC , uL |y), respectively. If M and L are uC -equivalent then they represent the same probability distribution on uC . Moreover, for any i ∈ C, the MAP estimate of Ui based on Y is identical for the probability distribution of M and L. Proof. In this case, we can simplify (1.39) to pM (uC |y) = pL (uC |y) for all uC , which proves the first part. The MAP estimate of Ui based on Y for the distribution of M is given by X ûM pM (uC , uM |y) (1.40) i (y) = argmax ui uC\{i} ,uM which can be written as ûM i (y) = argmax ui X pM (uC |y). (1.41) uC\{i} The same calculation reveals the MAP estimate of Ui based on Y for the distribution of L X ûL pL (uC |y) (1.42) i (y) = argmax ui uC\{i} and because pM (uC |y) = pL (uC |y) for all uC , the proposition is proven. The following proposition is proved in the Appendix. Proposition 1.5 (Some Equivalences). Every subfigure in Figure 1.4 is an equivalence relative to all variables outside the dashed box. This is still valid if the same factor graph is added on both sides of a subfigure in Figure 1.4 whose represented function does only depend on variables outside the dashed box. The following proposition is useful to go from SPMP to MPMP and vice versa. Proposition 1.6 (Max Equals Sum). Let the function g(x, z) be nonnegative and x, z take values in the finite sets X , Z, respectively. If for all z ∈ Z there is at most one x ∈ X such that g(x, z) > 0 then X g(x, z) = max g(x, z). (1.43) x x 16 Introduction Y X ⊕ ≡ Z X Z (a) X y0 Y ⊕ ≡ Z X πσ Z (b) Y X = ≡ Z X = Z (c) X y0 Y = ≡ Z X y0 y0 Z (d) Y πσ ≡ X X (e) y0 Y πσ ≡ X σ(y 0 ) X (f) Figure 1.4: Some equivalences according to Definition 1.4. Note that the permutation σ in (b) is given by σ(x) = x ⊕ y 0 . 1.4 Important Concepts 17 Proof. Fix any z ∈ Z. If there is no x ∈ X such that g(x, z) > 0 then both sides of (1.43) are zero. If there is one x ∈ X such that g(x, z) > 0, which we denote by xz , then both sides of (1.43) are equal to g(xz , z). 1.4.4 Permutations A permutation on m letters is a bijective mapping from Zm to itself and is completely defined by its two-line notation 0 1 ... m−1 π= . (1.44) π(0) π(1) . . . π(m − 1) The permutation matrix of a permutation π on m letters is the unique m × m matrix P with elements in {0, 1} that satisfies (x0 , x1 , . . . , xm−1 )P = xπ(0) , xπ(1) , . . . , xπ(m−1) (1.45) for all vectors x. It follows that T P (x0 , x1 , . . . , xm−1 ) = xπ−1 (0) , xπ−1 (1) , . . . , xπ−1 (m−1) T . (1.46) Let Pπ1 and Pπ2 be the permutation matrices of the permutations π1 and π2 on m letters, respectively. Then with Pπ being the permutation matrix of π , π1 ◦ π2 , we get Pπ 1 Pπ 2 = P π . (1.47) Chapter 2 Polar Codes In this chapter, we review and discuss polar codes [1]. Note that throughout this chapter and also in Chapters 3 and 5, n is a non-negative integer, the blocklength N equals 2n and arithmetic is over the binary field. Polar codes belong to the family of Gn -codes. 2.1 Gn -Codes Let the binary matrix G1 be defined by G1 , 1 1 0 . 1 (2.1) The factor graph notation of G1 is shown in Figure 2.1. The N × N matrix Gn is recursively defined according to Figure 2.2. As an example, Figure 2.3 shows G3 . Note that Gn is its own inverse, i.e. G−1 n = Gn . U0 U1 G1 X0 X1 Figure 2.1: The factor graph notation of G1 . The node G1 represents the {0, 1}-valued factor that is 1 if and only if u10 = x10 G1 . 19 20 Polar Codes U0 X0 G1 U1 .. . .. . .. . .. . Gn−1 .. . XN/2−1 U2i G1 U2i+1 XN/2 .. . .. . .. . .. . Gn−1 UN −2 UN −1 .. . G1 XN −1 Gn −1 uN 0 −1 Figure 2.2: The relation = xN Gn is shown, where Gn is re0 cursively constructed based on two copies of Gn−1 U0 U1 U2 U3 U4 U5 U6 U7 G1 G1 G1 G1 G1 G1 G1 G1 G1 G1 G1 G1 X0 X1 X2 X3 X4 X5 X6 X7 Figure 2.3: The relation u70 = x70 G3 is shown where G3 is decomposed recursively according to Figure 2.2. 2.2 The Probability Setup 21 Definition 2.1 (Gn -codes). The family of Gn -codes contains all codes of the form n o |I | −1 Cn (F) , uN Gn uI ∈ Z2 , uF = 0 (2.2) 0 with F ⊂ ZN and I = ZN \F. The bits uF are called frozen bits. The bits uI are called information bits because they are used to transmit the K bits of the message. Note that the values of the frozen bits uF are defined to be zero in (2.2) for simplicity and because for symmetric transmitting channels and for many decoders the performance is independent of the values of uF (see [1, Sec. VI.B] for a proof for the SCD). For example, Reed-Muller codes are a well-known subset of Gn -codes. Definition 2.2 (Reed-Muller Code). A Reed-Muller code of blocklength N and rate R is a Gn -code Cn (F) with F being a set of bN (1 − R)c indexes i ∈ ZN of minimal Hamming weight wH (i). Example 2.1. The Hamming weights of the eight indexes i ∈ Z8 are (0, 1, 1, 2, 1, 2, 2, 3). Thus, a possible Reed-Muller code of blocklength 8 and rate 5/8 is the G3 -code C3 (F) with F = {0, 1, 4}. The encoding complexity of a Gn -code is O(N log(N )) since Gn consists of nN/2 copies of G1 and of n − 1 permutations on N letters (see for example Figure 2.3). 2.2 The Probability Setup Let the channels Xi → Yi , i ∈ ZN , be independent and identically distributed to the uniform and binary-input channel W . We define U0N −1 , X0N −1 Gn . (2.3) A factor graph for the probability distribution of U0N −1 , X0N −1 , Y0N −1 is shown in Figure 2.4. 2.3 Genie-Aided Successive Cancellation Decoding The genie-aided SCD is a tool to understand and construct polar codes [1]. It can not be used for decoding because it needs a genie that reveals the bits U0N −1 . 22 Polar Codes X0 U0 X1 U1 .. . Gn .. . XN −2 UN −2 XN −1 UN −1 W Y0 W .. . Y1 .. . W YN −2 W YN −1 Figure 2.4: A factor graph for the probability distribution of U0N −1 , X0N −1 , Y0N −1 . The node Gn represents the {0, 1}−1 valued factor that is 1 if and only if uN = x0N −1 Gn . 0 2.3.1 Definition (i) For i ∈ ZN , let Wn denote the channel Wn(i) , Ui → Y0N −1 , U0i−1 . (2.4) (i) The genie-aided SCD successively computes the MAP decisions for Wn . Note that U0i−1 are in general not known to the receiver, but are assumed (i) to be provided by a genie. The MAP decision for Wn ûi = argmax p Ui | Y N −1 ,U i−1 ui y0N −1 , ui−1 (2.5) 0 0 0 ui could be computed according to (2.8) given by the following derivation ûi = argmax ui 1 X −1 N −1 p U N −1 | Y N −1 uN y0 0 0 c N −1 0 (2.6) ui+1 1 X −1 = argmax p X N −1 | Y N −1 uN Gn y0N −1 0 0 0 c N −1 ui (2.7) ui+1 = argmax ui N −1 −1 1 X Y p Xi |Yi uN Gn i yi 0 c N −1 j=0 (2.8) ui+1 N −1 y where c = p U i−1 | Y N −1 ui−1 is independent of ui and hence ir0 0 0 0 relevant for the decision on Ui . However, computing ûi according to (2.8) is not feasible for practical blocklengths N . But the MAP decision 2.3 Genie-Aided Successive Cancellation Decoding 23 (i) for Wn can be computed efficiently [1] by making use of the recursive construction of Gn in Figure 2.2. This can be seen by noting that the (i) factor graph for Wn is essentially a tree [22]. We elaborate on that in the following subsection. 2.3.2 The Tree Factor Graph for Wn(i) We need the equivalences in Figure 2.5, which are proved now. In Figure 2.5a, the left hand side and the right hand side represent fl (x0 , x1 , u0 , u1 ) = δ(u0 ,u1 ),(x0 ,x1 )G1 (2.9) fr (x0 , x1 , u0 , u1 , s) = δu1 ,x1 δx1 ,s δu0 ,x0 ⊕s (2.10) respectively. The definition of G1 in (2.1) implies fl (x0 , x1 , u0 , u1 ) = δu1 ,x1 δu0 ,x0 ⊕x1 By summing fr over s, we get X fr (x0 , x1 , u0 , u1 , s) = δu1 ,x1 δu0 ,x0 ⊕x1 (2.11) (2.12) s which equals fl in (2.11) and thus we proved Figure 2.5a. Figures 2.5b and 2.5c follow from Figures 1.4e and 1.4f by noting that G1 is bijective and therefore a permutation. In Figure 2.5d, the left hand side represents again the function (2.11) and the right hand side represents fr (x0 , x1 , u0 ) = δu0 ,x0 ⊕x1 By summing fl over u1 , we get X fl (x0 , x1 , u0 , u1 ) = δu0 ,x0 ⊕x1 (2.13) (2.14) u1 which equals fr and hence we proved Figure 2.5d. Finally, Figure 2.5e follows from Figures 2.5a and 1.4b. (i) To derive the tree factor graph for Wn , we start with the factor graph in Figure 2.6 which represents the distribution −1 −1 N −1 p uN , xN y0 , ui−1 (2.15) 0 0 i where y0N −1 , ui−1 are fixed values. Suppose we replace the node Gn by 0 the equivalent factor graph in Figure 2.2. We can use Figure 2.5 to get 24 Polar Codes U0 X0 ⊕ X0 U1 S = X1 ≡ G1 U1 U0 X1 (a) X0 X0 ≡ G1 X1 X1 (b) u0 u0 ⊕ u1 X0 X0 ≡ G1 u1 X1 u1 X1 (c) U0 U0 X0 ⊕ X0 ≡ G1 X1 X1 (d) u0 X0 ≡ G1 U1 X1 U1 πσ X0 S = X1 (e) Figure 2.5: Useful equivalences that are mostly based on Figure 1.4. Note that the permutation in (e) is given by σ(x0 ) = x0 ⊕ u0 . 2.3 Genie-Aided Successive Cancellation Decoding X0 u0 .. . ui−1 W y0 .. . Gn Ui .. . UN −1 25 XN −1 W yN −1 Figure 2.6: The factor graph from Figure 2.4 if additionally U0i−1 = ui−1 and Y0N −1 = y0N −1 are known. 0 W y0 .. . Gn−1 .. . .. . Ui W yN/2−1 ⊕ W yN/2 .. . Gn−1 .. . .. . W yN −1 Figure 2.7: This factor graph and the one from Figure 2.6 are Ui −equivalent. We assumed i to be even but the case when i is odd can be handled analogously. 26 Polar Codes ⊕ U4 ⊕ ⊕ (4) π W y0 = W y1 π W y2 = W y3 π W y4 = W y5 π W y6 = W y7 Figure 2.8: The tree factor graph of W3 . Note that the four permutations depend on the values u30 which is omitted in the notation for the sake of clarity. 2.3 Genie-Aided Successive Cancellation Decoding 27 the factor graph of Figure 2.7. Using the recursion from Figure 2.2 and the equivalences from Figure 2.5 over and over again leads to a cycle-free factor graph that is tree-like and which is called the tree factor graph (i) of Wn . An example for n = 3 and i = 4 is shown in Figure 2.8. The (i) tree factor graph of Wn and the factor graph from Figure 2.6 are Ui equivalent. Therefore, Proposition 1.4 implies that the MAP decision of Ui coincides in both factor graphs. Note that the MAP decision of Ui for (i) the distribution (2.15) equals the MAP decision for Wn . Thus, as the (i) tree factor graph of Wn is cycle-free and tree-like, the MAP decision (i) of Wn can be efficiently computed by SPMP on the tree factor graph (i) of Wn (see (1.30) and (1.31)). Note that although message passing usually consists of computing the forward and the backward messages, here we only need to compute the messages in the direction towards Ui as we want to decide on Ui only. 2.3.3 Max-Product on the Tree Factor Graph In this section, we show what the decision on Ui according to (1.31) means if MPMP instead of SPMP is used on the tree factor graph of (i) denote the variables of all edges of the tree factor graph Wn . Let sm−1 0 (i) for Wn that do not correspond to Ui , X0N −1 or Y0N −1 . The tree factor (i) −1 m−1 N −1 graph for Wn represents the distribution p ui , xN , s y0 , u0i−1 . 0 0 (i) Suppose we use MPMP on the tree factor graph for Wn and finally decide for Ui according to (1.31). Then by (1.32), we get the decision ûi = argmax ui max −1 m−1 xN ,s0 0 −1 m−1 N −1 p ui , xN , s0 y0 , ui−1 0 0 (2.16) which is the same as ûi = argmax ui max x0N −1 ,s0m−1 X N −1 i−1 −1 y p uN , x0N −1 , sm−1 , u0 . (2.17) 0 0 i N −1 ui+1 N −1 We can use Proposition 1.6 to transform the sum over ui+1 in (2.17) into a maximization, because probabilities are non-negative and because −1 N −1 uN i+1 is determined by x0 ûi = argmax ui max N −1 N −1 m−1 ui+1 ,x0 ,s0 N −1 i−1 −1 y p uN , x0N −1 , sm−1 , u0 . (2.18) 0 0 i 28 Polar Codes Moreover, we can transform the maximization over x0N −1 , sm−1 in (2.18) 0 −1 m−1 −1 into a sum, because xN , s0 are determined by uN , hence 0 0 X −1 −1 m−1 N −1 y0 , ui−1 (2.19) p uN , xN , s0 ûi = argmax max 0 0 i ui −1 uN i+1 −1 m−1 ,s0 xN 0 −1 N −1 = argmax max p uN y0 , ui−1 . 0 i ui −1 uN i+1 (2.20) From (2.20), we see that ûi is the decision on Uiof the symbol-wise block MAP decision of UiN −1 based on Y0N −1 , U0i−1 . 2.4 Polarization It can be shown [1] that if W is a uniform and binary input channel with finite output alphabet then for any δ ∈ (0, 1) n o (i) i ∈ ZN I(Wn ) ∈ [0, δ) = 1 − I(W ), (2.21) lim n→∞ N n o (i) i ∈ ZN I(Wn ) ∈ (1 − δ, 1] = I(W ). (2.22) lim n→∞ N Thus, as the blocklength goes to infinity a fraction of I(W ) of the chan(i) (i) nels Wn will be almost noise-free (I(Wn ) almost one) and the remain(i) ing fraction 1 − I(W ) of channels will be almost useless (I(Wn ) almost zero). This phenomenon was called polarization. 2.4.1 Definition of Polar Codes (i) Let IPC be the index set of the dN Re channels with largest I(Wn ) and let FPC = ZN \IPC . Definition 2.3 (Polar Code). A polar code of blocklength N and rate R is the Gn -code Cn (FP C ). As IPC and FPC depend on the transmitting channel W , the polar code itself depends on W as well. If R < I(W ) then according to (2.22) for n → ∞ the polar code contains only almost noise-free channels. Proofs that the convergence to noise-free channels happens fast enough to counteract the growing number of information bits are given in [1] and [2]. Therefore, polar codes achieve I(W ), which is the capacity for a uniform input distribution. 2.5 Successive Cancellation Decoder 2.4.2 29 Construction of Polar Codes An algorithm which determines the index set IPC of the information bits of a polar code is called a code construction algorithm. An obvious way for a code construction algorithm is based on Monte Carlo simulations [1]. Namely, do Monte Carlo simulations and use the genie-aided (i) SCD until you have for every i ∈ ZN an estimate of I(Wn ) that allows (i) to reliably decide if I(Wn ) belongs to the dN Re largest values. Unfortunately, the number of Monte Carlo samples needed can be intractably high. Several other algorithms have been proposed to construct polar codes: Density evolution in [22] and a Gaussian approximation for density evolution in [14], [38]. In [36], efficient degrading and upgrading quantization methods were proposed. 2.5 Successive Cancellation Decoder The SCD is very similar to the genie-aided SCD, but it does not require a genie and can therefore be used to decode a Gn -code (and thus also a polar code) in practice. 2.5.1 Definition The SCD for the Gn -code Cn (F) computes decisions of the bits u0N −1 successively according to ( argmaxui p Ui | Y N −1 ,U i−1 ui y0N −1 , ûi−1 , if i ∈ I, 0 0 0 ûi , (2.23) 0, otherwise. For i ∈ I, if the past decisions ûi−1 are correct then ûi corresponds to 0 (i) the MAP decision of the channel Wn . Hence, the rule to choose the information bits of a polar code is matched to SCD. 2.5.2 Complexity (i) The MAP decision of the channel Wn can be computed by doing SPMP (i) on the tree factor graph for Wn . Many messages can be shared between (i) message passing on the tree factor graphs for Wn for different i ∈ ZN . −1 More precisely, for doing SCD of one block uN , we need to compute 0 for every G1 the two messages along U0 and U1 in Figures 2.5d and 2.5e, 30 Polar Codes respectively. As Gn consists of nN/2 copies of G1 , the complexity of −1 decoding the whole block uN is O(N log(N )). 0 2.5.3 Successive Cancellation and Block MAP Block MAP decoding of the Gn -code Cn (F) is given by ûI = argmax p UI | Y N −1 ,UF uI y0N −1 , 0 . uI (2.24) 0 It leads to the lowest block error probability among all decoders but is not practical. The SCD on the other hand is practical but leads to a higher block error probability than block MAP. The difference in block error probability can be large [35], [21]. Hence, it is interesting to understand the difference between block MAP decoding and SCD. We discuss this difference based on the decision on one bit Ui , i ∈ I. The SCD does a MAP decision of Ui based on Y0N −1 , U0i−1 and uses the previous decisions ûi−1 as observed values of U0i−1 . A block 0 MAP decoder does a block MAP decision of UI based on Y0N −1 , UF . Thus, there are the following differences 1. The SCD computes MAP decisions (1.8). The block MAP decoder computes block MAP decisions (1.3). 2. Compared to the block MAP decoder, the SCD uses the additional observations U0i−1 \UF . However, the SCD uses previous decisions and not necessarily the true values for them. 3. Compared to the SCD, the block MAP decoder uses the additional observations UF \U0i−1 . In Chapter 5, we will see that the ignorance of the known values of UF \U0i−1 is the main culprit for the performance loss of SCD. Hence, to come close to a block MAP decoder, we need to take into account at least some of the known bits UF \U0i−1 when deciding on Ui . The reason why the SCD ignores the known values of UF \U0i−1 is to maintain its low complexity. Namely, the argument used to obtain a tree-like factor graph in Section 2.3.2 would break down if the values of the bits UF \U0i−1 were not ignored. 2.6 Characteristics 2.6 31 Characteristics Polar codes have many favorable characteristics: 1. Low-complexity encoder (see Section 2.1) 2. Low-complexity decoder (see Section 2.5.2) 3. Capacity-achieving for binary-input discrete memoryless channels [1], [10] and for much more channels 4. No error floor [21] 5. Explicit and efficient construction (see Section 2.4.2) However, the major undesirable characteristic of polar codes is the slow speed at which polarization takes place and hence they are weak at practical blocklengths [35]. The following section gives an incomplete overview of what can be done to improve the performance of polar codes at practical blocklengths. 2.7 Performance Improvements The measures to improve the performance of polar codes can be split into the two groups: 1. Improve upon the SCD, 2. Improve polar codes themselves. However, finally the combination of code and decoder needs to perform well with an acceptable complexity. 2.7.1 Improve the Decoder A practical decoder needs a good tradeoff between performance and complexity. Two practical decoders, both applicable to Gn -codes in general, with a lower block error probability than the SCD are discussed roughly. 32 Polar Codes Iterative Sum-Product Message Passing Iterative SPMP can be used on a factor graph with cycles and hence the known bits UF \U0i−1 that are ignored by the SCD (see Section 2.5.3) can be used. The iterative SPMP decoder has a low complexity and simulation results show that it can lead to a marked block error probability reduction [11]. However, improving the iterative SPMP is not straightforward. Successive Cancellation List Decoder [35] The idea behind SCLD of Gn -codes is as follows. Let L be a positive integer and let L−1 , ∅. The SCLD works successively through steps i = 0, 1, . . . , N − 1. After every step i, a list of maximal L estimates Li = ûi0 (l) l ∈ ZL (2.25) of U0i is generated according to: 1. If Ui is a frozen bit then Li = Li−1 × {0}. i 2. If Ui is an information bit then Li contains the L estimates û0 from Li−1 × Z2 with largest probabilities p U i | Y N −1 ûi0 y0N −1 . 0 0 After the last step i = N − 1, the most probable estimate from the list −1 LN −1 is the final decision for uN . 0 SCLD of Gn -codes has space complexity O(LN ) and time complexity O(LN log(N )). SCLD leads to a remarkable block error probability reduction. For many parameters, SCLD of polar codes achieves the performance of a block MAP decoder even for moderate list sizes L. However, even block MAP decoding of polar codes has still a quite poor performance. Therefore, to make polar codes useful in practice the codes themselves need to be improved as well. 2.7.2 Improve the Code There are many ways of improving a polar code. Two thereof are discussed roughly. Use Other Transforms Than G1 Other transforms than G1 can be used in the recursive construction of (i) Figure 2.2. To keep the property that the factor graph of Wn is treelike, the transform should be bijective. 2.7 Performance Improvements 33 Binary matrices of dimensions larger than 2 × 2 are used instead of G1 in [13]. They showed that there exists no invertible, binary matrix of dimension smaller than 15 × 15 with a better polarization rate than G1 . Moreover, they constructed a 16 × 16 binary matrix with better polarization rate than G1 , but SCD of a 16×16 matrix is computationally demanding. Non-linear transforms over the binary field are considered in [27]. Explicit non-linear transforms were presented with dimensions 14 × 14, 15 × 15 and 16 × 16 and better polarization rates than the best linear transforms of the same dimensions. Linear transforms over an arbitrary finite field are considered in [25]. They showed that Reed-Solomon matrices have the largest possible polarization rate for a given transform and field size. However, SCD of Reed-Solomon matrices is impractical except for small matrices (see Section 4.4.2). It was empirically shown that replacing G1 by a reasonably small Reed-Solomon matrix of size 4 × 4 reduces the block error probability substantially. In [26], mixed transforms were presented, i.e., transforms that depend on the location in the recursive construction of Figure 2.2. They empirically compared several explicit mixed transforms to standard polar codes under SCLD and they found a performance and complexity advantage in favour of their mixed transforms. Use Other Gn -Codes than Polar Codes Polar codes are matched to the SCD. If a more powerful decoder is used, it is manifest to conjecture that additionally to the gain due to the more powerful decoder, choosing a Gn -code other than polar codes would improve the performance even more. For example for the binary erasure channel, it is empirically known [21] that Reed-Muller codes perform better than polar codes under block MAP decoding. As another example, [21] introduced a family of Gn -codes that is parametrized by α ∈ (0, 1], and interpolates between polar (α = 1) and Reed-Muller codes (α = 0). They empirically showed that the block error probability under SCLD or iterative SPMP is lowered by picking a code with α between 0 and 1, i.e., between polar and Reed-Muller codes. Chapter 3 A Partial Order for the Synthesized Channels of Successive Cancellation Decoding (i) A partial order (PO) for the channels Wn is presented that is independent of the transmitting channel W . The PO is based on the observation (j) (i) that a channel Wn is stochastically degraded to Wn if j is obtained by swapping a more significant 1 with a less significant 0 in the binary ex(i) (j) pansion of i. As this order is only partial, not all channels Wn , Wn are (i) comparable. However, knowing the quality of one channel Wn gives us (j) information about the quality of all channels Wn that are comparable (i) to Wn through the PO. (i) In [23], a different PO for the channels Wn was given that is also independent of the transmitting channel W . However, it is completely different from the one in this chapter. The PO in [23] relates channels (i) (j) Wn , Wn with indexes i, j of different Hamming weight. The PO of this chapter relates channels with indexes of the same Hamming weight. The PO of this chapter is combined with the PO from [23]. As an example, for N = 220 and i = 436586, the combined PO implies that (i) 23.06% of all synthesized channels are worse than Wn and 23.07% are 35 36 A Partial Order for the Synthesized Channels of SCD (i) better than Wn , independently of W . Finally, a minimal representation of the combined PO is derived that enables us to practically use the PO for simplifying code construction of polar codes. Part of this chapter was published in [31]. The same combined PO was independently derived by Bardet et al. [3]. This chapter is organized as follows. Section 3.1 introduces the necessary preliminaries. In Section 3.2, we derive different representations of the generator matrix of a polar code. From these representations, we get a sufficient condition for channel degradation, which is the basis of the PO. In Section 3.3, we introduce the PO, derive a minimal representation of the PO and give examples for n = 4, 6. The PO from [23] is reviewed in Section 3.4. In Section 3.5, we combine the PO from Section 3.3 with the PO in [23]. In Section 3.6, we comment on how the combined PO can be used to simplify code construction of a polar code. Finally, Section 3.7 concludes by discussing generalizations of the PO from Section 3.3. 3.1 Preliminaries The definition of a bit significance permutation is given below and an explanatory example is given in Table 3.1. Definition 3.1 (Bit Significance Permutation). Let π be a permutation on m letters. The permutation sπ on 2m letters is called a bit significance permutation and is defined as the mapping k 7→ k 0 where k 0 is the integer with binary representation 0 k00 , k10 , . . . , km−1 = kπ(0) , kπ(1) , . . . , kπ(m−1) (3.1) where k0 , k1 , . . . , km−1 is the binary representation of k. Example 3.1 (Bit-Reversal Permutation). The bit-reversal permutation is a bit significance permutation sπ with 0 1 ... m − 2 m − 1 π= . (3.2) m − 1 m − 2 ... 1 0 For example, Table 3.1 shows that for m = 3 the bit-reversal permutation is given by 0 1 2 3 4 5 6 7 sπ = . (3.3) 0 4 2 6 1 5 3 7 3.1 Preliminaries binary bit permutation integer 37 0 ↓ 000 ↓ 000 ↓ 0 1 ↓ 001 ↓ 100 ↓ 4 2 ↓ 010 ↓ 010 ↓ 2 3 ↓ 011 ↓ 110 ↓ 6 4 ↓ 100 ↓ 001 ↓ 1 5 ↓ 101 ↓ 101 ↓ 5 6 ↓ 110 ↓ 011 ↓ 3 7 ↓ 111 ↓ 111 ↓ 7 Table 3.1: An example of a bit significance permutation on 8 letters. First, the binary representation of every i ∈ Z8 is derived. Second, the bits are permuted by an arbitrary permutation, which reverses the bits in this example. Third, the integer representation is derived. The permutation matrix of the bit-reversal permutation on N letters is denoted by Bn . Definition 3.2 (Partial Order [8, p. 2]). A PO on a set S is a binary relation on S that satisfies for all x, y, z ∈ S that 1. x x (Reflexivity), 2. x y and y x imply x = y (Antisymmetry), 3. x y and y z imply x z (Transitivity). We write x ≺ y if and only if x y and x 6= y. The covering relation of a PO is a minimal representation of the PO. Definition 3.3 (Covering Relation [8, p. 11]). Let S be a set with a PO and let x, y, z ∈ S. Then we say x is covered by z, denoted by x → z, if x ≺ z and if x y ≺ z implies x = y. If the set S is finite, the PO is determined by its covering relation [8, p. 11] and hence the covering relation can be seen as a generator of the PO. Namely, x y if and only if there exists a finite sequence (k ≥ 0) of covering relations between x ∈ S and y ∈ S x = a0 → a1 → . . . → ak = y. (3.4) Definition 3.4 (Channel Equivalence Relation [36]). Let W and W 0 be two channels. If W W 0 and W 0 W then we say W and W 0 are equivalent and denote this by W ≡ W 0 . 38 A Partial Order for the Synthesized Channels of SCD Example 3.2. Let W be the channel X → Y and W 0 be the channel X → f (Y ) where f is a bijective mapping. Then W ≡ W 0 . 3.2 A Sufficient Condition for Degradation In this section, we derive a sufficient condition for stochastic degradation (j) (i) between two channels Wn and Wn . Lemma 3.1 (Gn is Bit Significance Invariant). We can write Gn in the form Gn = P 0 Gn P −1 (3.5) where P is any bit significance permutation matrix of dimensions N × N and P 0 , Bn P Bn−1 . Proof. From [1, Prop. 16] we have Gn = Bn G⊗n 1 . From (1.1) follows that n−1 Y ⊗n [G1 ]ik ,jk (3.6) G1 i,j = k=0 where i0 , i1 , . . . , in−1 and j0 , j1 , . . . , jn−1 are the binary representations of i and j, respectively. Let sπ be the bit significance permutation corresponding to P and let C , P −1 G⊗n 1 P . We get ⊗n Ci,j = G1 sπ (i),sπ (j) (3.7) = = = Thus, we proved that G⊗n 1 n−1 Y [G1 ]iπ(k) ,jπ(k) (3.8) k=0 n−1 Y [G1 ]il ,jl l=0 ⊗n G1 i,j = P −1 G⊗n 1 P . It follows ⊗n −1 G⊗n 1 = P G1 P (3.9) (3.10) that (3.11) and we can conclude that Gn = Bn G⊗n 1 = = = −1 Bn P G⊗n 1 P −1 Bn P Bn−1 Bn G⊗n 1 P Bn P Bn−1 Gn P −1 (3.12) (3.13) (3.14) (3.15) 3.2 A Sufficient Condition for Degradation 39 where P 0 , Bn P Bn−1 is a product of permutation matrices and therefore again a permutation matrix. There exist n! different bit significance permutations on N letters. Therefore, Lemma 3.1 gives us n! different representations of Gn . From (2.3) and (3.5) we get U0N −1 = X0N −1 P 0 Gn P −1 (3.16) and therefore U0N −1 P = X0N −1 P 0 Gn . (3.17) N −1 N −1 0 It follows that the P has the same distribution vector U0 P, Y0 N −1 N −1 as U0 , Y0 and therefore Wn(i) ≡ Usπ (i) → Y0N −1 P 0 , Usπ (0) , Usπ (1) , . . . , Usπ (i−1) (3.18) where sπ is the bit significance permutation corresponding to P . Moreover, as P 0 is a permutation matrix, there is a one-to-one correspondence between Y0N −1 P 0 and Y0N −1 and according to Example 3.2 we get for all i ∈ ZN that Wn(i) ≡ Usπ (i) → Y0N −1 , Usπ (0) , Usπ (1) , . . . , Usπ (i−1) . (3.19) Before we state and prove the sufficient condition for stochastic degra(i) dation of the channels Wn , we give an example that shows the basic idea. Assume n = 2. There are only two bit significance permutations for n = 2, namely the identity and the bit-reversal permutation. Clearly, using the identity in (3.19) gives nothing new. However, using the bitreversal permutation implies (0) W2 (1) W2 (2) W2 (3) W2 ≡ U0 → Y03 , (3.20) ≡ U1 → Y03 , U0 , Y03 , U0 , U2 , ≡ U3 → Y03 , U0 , U2 , U1 ≡ U2 → (3.21) (3.22) . (3.23) The equivalences (3.20) and (3.23) are trivial. However, the equivalences (1) (2) (3.21) and (3.22) let us conclude that W2 W2 by (2) (3.24) W2 ≡ U1 → Y03 , U0 , U2 U1 → Y03 , U0 (3.25) (1) ≡ W2 (3.26) 40 A Partial Order for the Synthesized Channels of SCD where we used Example 1.3 for (3.25). In general, we get the following lemma. Lemma 3.2 (Sufficient Condition for Degradation). Let i, j ∈ ZN and let sπ be a bit significance permutation on N letters. If 1. j = sπ (i) 2. Zj ⊂ sπ (Zi ) (j) then Wn (i) is stochastically degraded with respect to Wn . Proof. The proof is a generalization of (3.24) - (3.26) Wn(i) ≡ Uj → Y0N −1 , Usπ (0) , Usπ (1) , . . . , Usπ (i−1) Uj → Y0N −1 , U0 , U1 , . . . , Uj−1 ≡ Wn(j) (3.27) (3.28) (3.29) where we used (3.19) and the condition j = sπ (i) for (3.27) and the condition Zj ⊂ sπ (Zi ) for (3.28). 3.3 A Partial Order Let i0 , i1 , . . . , in−1 and j0 , j1 , . . . , jn−1 be the binary representations of i, j ∈ ZN , respectively. Definition 3.5. We write j % i if there exists l, l0 ∈ Zn with l < l0 such that 1. jl = 1 and jl0 = 0 2. il = 0 and il0 = 1 3. For all k ∈ Zn \{l, l0 }: jk = ik Example 3.3. For n = 5, the following table shows that 22 % 28 with l = 1 and l0 = 3. 22 in binary: 28 in binary: 10110 11100 Clearly, if j % i then j < i and wH (j) = wH (i). The following theorem is the basis of the PO. 3.3 A Partial Order 41 (j) Theorem 3.1. If j % i then Wn is stochastically degraded with respect (i) to Wn . Proof. We make use of Lemma 3.2. Let sπ be the bit significance permutation that swaps the bits with index l and l0 , where l and l0 are defined according to Definition 3.5. The first condition of Lemma 3.2 is satisfied by j % i. The second condition is equivalent to sπ (Zj ) ⊂ Zi (3.30) because sπ is its own inverse. Let k0 , k1 , . . . , kn−1 be the binary representation of k. We can write sπ as if kl = kl0 , k, 0 (3.31) sπ (k) = k − 2l + 2l , if kl = 1 and kl0 = 0, l0 l k − 2 + 2 , if kl = 0 and kl0 = 1 0 which is therefore upper bounded by k − 2l + 2l . Hence, for k ∈ Zj , we get sπ (k) ≤ k − 2l + 2l < j − 2l + 2l 0 (3.32) 0 (3.33) =i (3.34) and thus (3.30) is satisfied. (i) The PO 1 on the set Wn , i ∈ ZN , is defined as follows. (j) (i) Definition 3.6 (The Partial Order 1 ). Wn 1 Wn if and only if there exists a finite sequence a0 , a1 , . . . , ak ∈ ZN , k ≥ 0, such that a0 = j, ak = i and am % am+1 for all m ∈ Zk . It can be shown that 1 fulfills Definition 3.2. Theorem 3.1 and the transitivity of imply Wn(j) 1 Wn(i) ⇒ Wn(j) Wn(i) (3.35) and hence 1 orders the channels according to their quality. The PO 1 is a consequence of Lemma 3.2. However, it is not obvious whether the PO 1 fully exploits Lemma 3.2. It could be that there are (j) (i) (j) (i) j, i ∈ ZN such that Lemma 3.2 implies Wn Wn but Wn 1 Wn . The following proposition, whose proof is in the Appendix, shows that this is not the case. 42 A Partial Order for the Synthesized Channels of SCD W BSC(0.11) BEC(0.5) (6) (6) Pe (W4 ) 0.1802 0.2660 (9) Pe (W4 ) 0.2073 0.2338 (9) Table 3.2: Pe (W4 ) and Pe (W4 ) for the binary symmetric channel (BSC) with crossover probability 0.11 and for the binary erasure channel (BEC) with erasure probability 0.5. Proposition 3.1. Let j, i ∈ ZN and let the bit significance permutation (j) sπ be such that both conditions of Lemma 3.2 are satisfied. Then Wn 1 (i) Wn . The covering relation of the PO 1 is given in the following proposition, whose proof is in the Appendix. (j) (i) Proposition 3.2 (Covering Relation of 1 ). Wn →1 Wn if and only if j % i with l0 = l + 1, where l and l0 are defined as in Definition 3.5. (j) (i) Proposition 3.2 implies that Wn can only be covered by a Wn , if the binary expansion of j contains a one with a subsequent zero. As the binary expansion of j has n bits, it contains at most n/2 ones with a subsequent zero. Thus for any j ∈ ZN , there exists at most n/2 (i) (j) indexes i ∈ ZN such that Wn →1 Wn . Moreover, an analogue argument implies that there exists at most n/2 indexes i ∈ ZN such that (i) (j) Wn →1 Wn . Thus the number of covering relations for the PO 1 is O(N log(N )). We give examples of the PO 1 for n = 4, 6. It is enough to specify (i) the covering relation of 1 because the set of channels Wn , i ∈ ZN , is finite and hence the PO 1 is generated by its covering relation. A visual representation of a covering relation is the Hasse-diagram [8, pp. 10-11]. An example of a Hasse-diagram for n = 4 is shown in Figure (i) 3.1. It can be seen that every subset of channels W4 with equal Hamming weight wH (i) is totally ordered [8, p. 11] except for the subset (6) (9) with wH (i) = 2 where no order between W4 and W4 exists. Table (6) (9) 3.2 shows Pe (W4 ) and Pe (W4 ) for two different channels W . We (6) (9) can conclude that there exists no order between W4 and W4 that is independent of the transmitting channel W . (i) The Hasse-diagram of all channels W6 with wH (i) = 3 is shown in Figure 3.2. 3.4 Another Partial Order from the Literature 43 (12) W4 (8) (14) W4 W4 (10) W4 (4) (13) W4 (0) W4 (6) W4 (9) W4 (15) W4 (2) W4 (11) W4 W4 (5) W4 (1) (7) W4 W4 (3) W4 (i) Figure 3.1: The Hasse-diagram of the PO 1 on the set W4 , i ∈ Z16 . 3.4 Another Partial Order from the Literature The following PO, which is independent of the transmitting channel W , was given in [23, Sec. V]. (j) (i) Definition 3.7 (The Partial Order 2 from [23]). Wn 2 Wn if and only if for all k ∈ Zn we have jk = 1 ⇒ ik = 1 where j0 , j1 , . . . , jn−1 and i0 , i1 , . . . , in−1 are the binary representations of j and i, respectively. (20) (29) Example 3.4. For n = 5, the following table shows that Wn 2 Wn because if a bit of 20 equals one, the corresponding bit for 29 is never zero. 20 in binary: 29 in binary: 10100 11101 It can be shown that the PO 2 fulfills Definition 3.2. If the transmitting channel W is symmetric then Wn(j) 2 Wn(i) ⇒ Wn(j) Wn(i) (3.36) which is proved in [30, Sec. V]. For a not necessarily symmetric W , we get the following proposition. 44 A Partial Order for the Synthesized Channels of SCD (56) W6 (52) W6 (44) (50) W6 W6 (28) W6 (26) W6 (22) W6 (14) W6 W6 W6 W6 W6 (42) W6 (38) W6 (25) W6 (21) W6 (13) (49) (41) (37) (35) (19) W6 W6 (11) W6 (7) W6 (i) Figure 3.2: The Hasse-diagram of the PO 1 on the set W6 i ∈ Z64 and wH (i) = 3. with 3.4 Another Partial Order from the Literature (j) 45 (i) Proposition 3.3. If Wn 2 Wn then I(Wn(j) ) ≤ I(Wn(i) ), H(Wn(j) ) Pe (Wn(j) ) ≥ ≥ (3.37) H(Wn(i) ), Pe (Wn(i) ). (3.38) (3.39) Proof. Let Ws be the symmetrized channel of W as in [15, Def. 1.3]. As Ws is symmetric, (3.36) and Proposition 1.2 imply that if (j) (i) (Ws )n 2 (Ws )n (3.40) then (j) (i) I((Ws )n ) ≤ I((Ws )n ), (j) H((Ws )n ) (j) Pe ((Ws )n ) ≥ ≥ (3.41) (i) H((Ws )n ), (i) Pe ((Ws )n ). (3.42) (3.43) It can be shown that (k) I((Ws )n ) = I(Wn(k) ), (k) H((Ws )n ) (k) Pe ((Ws )n ) = = (3.44) H(Wn(k) ), Pe (Wn(k) ) (3.45) (3.46) for all k ∈ ZN , which concludes the proof. The covering relation of the PO 2 is given in the following proposition. (j) (i) Proposition 3.4 (Covering Relation of 2 ). Wn →2 Wn if and only (j) (i) if Wn ≺2 Wn and wH (i) = wH (j) + 1. Proof. Let j0 , j1 , . . . , jn−1 and i0 , i1 , . . . , in−1 be the binary representations of j and i, respectively. We start by proving the ⇒ part. If (j) (i) (j) (i) Wn →2 Wn then by definition we obtain Wn ≺2 Wn . It follows that jk = 1 ⇒ ik = 1, k ∈ Zn , and that there is at least one bit index o with jo = 0 and io = 1 and hence wH (i) − wH (j) > 0. Suppose wH (i) − wH (j) ≥ 2, i.e. there are at least two bit indexes o, o0 such that the corresponding bits are 0 for j and 1 for i. Let m ∈ ZN be the integer that has the same binary representation as j but with the bit of index (j) (m) (i) o equal to 1. Then Wn ≺2 Wn ≺2 Wn , which is a contradiction to (j) (i) Wn →2 Wn and hence wH (i) − wH (j) = 1 which proves the ⇒ part. 46 A Partial Order for the Synthesized Channels of SCD In order to prove the ⇐ part, we need to prove (j) (i) (j) (m) 1. Wn ≺2 Wn 2. Wn 2 Wn (i) (m) (j) ≺2 Wn ⇒ Wn = Wn where the first condition is true by assumption. To prove the second con(i) (m) (j) dition, assume Wn 2 Wn ≺2 Wn . This implies wH (j) ≤ wH (m) < wH (i) and with wH (i) = wH (j) + 1 we get wH (j) = wH (m). Because (m) (j) (m) (j) additionally Wn 2 Wn we can conclude Wn = Wn . Proposition 3.4 implies that for any j ∈ ZN there are exactly n − (i) (j) wH (j) indexes i ∈ ZN such that Wn →2 Wn and exactly wH (j) (j) (i) indexes i ∈ ZN such that Wn →2 Wn . Thus the number of covering relations for the PO 2 is O(N log(N )). 3.5 The Combined Partial Order We combine 1 with 2 to get the more powerful PO c , which is still independent of the transmitting channel W . (j) (i) Definition 3.8 (The Combined Partial Order c ). Wn c Wn if and only if there exists a finite sequence a0 , a1 , . . . , ak ∈ ZN , k ≥ 0, such that (a ) (a ) (a ) (a ) a0 = j, ak = i and such that Wn m →1 Wn m+1 or Wn m →2 Wn m+1 for all m ∈ Zk . It can be shown that the PO c fulfills Definition 3.2. The following proposition is a direct consequence of (3.35), Proposition 1.2 and Proposition 3.3. (j) (i) Proposition 3.5. If Wn c Wn then I(Wn(j) ) ≤ I(Wn(i) ), (3.47) H(Wn(j) ) ≥ H(Wn(i) ), (3.48) Pe (Wn(j) ) ≥ Pe (Wn(i) ). (3.49) (i) Hence the PO c orders the channels Wn according to their quality. The covering relation of the PO c is given in the following proposition whose proof is in the Appendix. 3.5 The Combined Partial Order 47 (14) W4 (12) W4 (10) W4 (13) W4 (11) W4 (8) W4 W4 (2) W4 W4 (0) (6) W4 (4) W4 W4 (9) W4 (15) W4 (7) W4 (5) (3) (1) W4 (i) Figure 3.3: The Hasse-diagram of the PO c on the set W4 , i ∈ Z16 . (j) (i) Proposition 3.6 (Covering Relation of c ). Wn →c Wn if and only if one of the following two conditions is satisfied (j) (i) (j) (i) 1. Wn →1 Wn 2. Wn →2 Wn with j0 = 0, i0 = 1 where j0 , j1 , . . . , jn−1 and i0 , i1 , . . . , in−1 are the binary representations of j and i, respectively. Proposition 3.6 implies that for any j ∈ ZN there are at most n/2 + 1 (j) (i) indexes i ∈ ZN such that Wn →c Wn and at most n/2 + 1 indexes (i) (j) i ∈ ZN such that Wn →c Wn . Thus the number of covering relations for the PO c is O(N log(N )). As an example, the Hasse-diagram of the combined PO c for n = 4 is shown in Figure 3.3. 48 3.6 A Partial Order for the Synthesized Channels of SCD Code Construction with the Combined Partial Order (i) This chapter presented the two POs 1 and c for the channels Wn , i ∈ ZN . We give a few remarks on how the more powerful PO c can be used to simplify code construction of a polar code. Suppose the channels (i) (i) with Pe (Wn ) ≤ γ are used to transmit information. Let Pel (Wn ) and (i) (i) u Pe (Wn ) denote the lower and upper bound on Pe (Wn ) obtained by the method in [36], respectively. Algorithm 2 is a possible code construction algorithm. Algorithm 2 Code Construction with the Combined Partial Order 1: I ← {}, F ← {}. 2: while I ∪ F 6= ZN do 3: Pick an i ∈ ZN \(I ∪ F ) . (i) (i) 4: Determine Pel (Wn ) and Peu (Wn ). (i) 5: if Peu (Wn )n≤ γ then o (i) (j) 6: I ← I ∪ j ∈ ZN \I Wn c Wn 7: else o n (j) (i) 8: F ← F ∪ j ∈ ZN \F Wn c Wn 9: end if 10: end while (i) (i) Note that only the channels Wn , for which Pe (Wn ) ≤ γ is guaranteed, are used to transmit information. Note further that the sets in Line 6 and 8 of Algorithm 2 can be efficiently determined with the covering relations of c . An open question is which i should be picked in Line 3 of Algorithm 2. One possibility is the Maximin strategy which picks an i that maximizes the minimum of n o (3.50) j ∈ ZN \I Wn(i) c Wn(j) and n o j ∈ ZN \F Wn(j) c Wn(i) . (3.51) For example, let γ = 10−10 , n = 20 and let W be the UBAWGNC with SNR = 2 dB. Then Algorithm 2 with the Maximin strategy requires the 3.6 Code Construction with the Combined Partial Order (i) (i) 49 (i) computation of Pel (Wn ) and Peu (Wn ) for 10’859 channels Wn , which is 1.0% of all N channels. This requires the execution of 59’435 times each the degrading method and the upgrading method from [36], which is 2.8% of the 2N − 2 executions needed in [36]. In [1], an upper bound on the block error probability of a Gn -code under SCD was given X P U0N −1 6= Û0N −1 Y0N −1 ≤ Pe (Wn(i) ). (3.52) i∈I (i) (i) We can use Pel (Wn ) and Peu (Wn ) to bound the right hand side of (i) (i) (3.52). If we use the bounds Pel (Wn ) and Peu (Wn ) of all synthesized channels, we get X 1.2 · 10−7 ≤ Pe (Wn(i) ) ≤ 2.2 · 10−7 (3.53) i∈I (i) (i) By using only the 10’859 bounds Pel (Wn ) and Peu (Wn ) computed in Algorithm 2 and the PO c to spread them, we get the bounds X 8.7 · 10−8 ≤ Pe (Wn(i) ) ≤ 5.0 · 10−7 (3.54) i∈I which are not much looser than the bounds in (3.53). To elaborate on that, we define as in [36] the minimal value of the right hand side of (3.52) for any Gn -code of rate R = K/N X PW,n (K) , min Pe (Wn(i) ). (3.55) |I |=K i∈I Figure 3.4 shows the upper and lower bound on PW,n (K) obtained by the method in [36] and the upper and lower bound on PW,n (K) obtained (i) (i) by using only the 10’859 bounds Pel (Wn ) and Peu (Wn ) computed in Algorithm 2 and the PO c to spread them. It can be seen that near the rate R = 0.568, which is the rate of the code constructed by Algorithm (i) 2, tight bounds on PW,n (K) are obtained even though Pel (Wn ) and (i) Peu (Wn ) is computed for only 10’859 channels. However, it is not clear whether a sufficiently efficient implementation for the Maximin strategy exists such that Algorithm 2 is faster than the method in [36]. Note that the relation between the parameter γ and the block error probability is not known. Therefore, to meet a target block error 50 A Partial Order for the Synthesized Channels of SCD Bounds on PW,n (K) 104 10−2 10−8 Upper bound by Algorithm 2 Upper bound by [36] Lower bound by [36] Lower bound by Algorithm 2 10−14 10−20 0.5 0.55 0.6 0.65 0.7 R Figure 3.4: Upper and lower bounds on PW,n (K) obtained by [36] or by Algorithm 2 with the Maximin strategy. The vertical black line indicates the rate of the code that was constructed by Algorithm 2. 3.7 Generalizations 51 U0 U1 U2 U3 G1 G0 G1 G0 X0 X1 X2 X3 Figure 3.5: A generalization of G2 where the building blocks G1 and G0 , I2 are used in the two layers. probability, Algorithm 2 needs to be run several times with different (i) (i) γ. However, the determined Pel (Wn ) and Peu (Wn ) from one run of Algorithm 2 can be used in all subsequent runs. 3.7 Generalizations It is an interesting question if and how the POs from this chapter can be generalized if G1 is replaced by other transforms as in Section 2.7.2. The following example shows that (3.35) does not hold if different building blocks are used in the recursive construction of Gn . Let n = 2 and suppose we use G1 and G0 , I2 for the two layers as in Figure 3.5. The tree factor graph for the channels (1) WG1 ,G0 , U1 → Y03 , U0 , (2) WG1 ,G0 , U2 → Y03 , U0 , U1 (3.56) (3.57) are shown in Figure 3.6. Thus, we can conclude that (1) (0) (1) (1) WG1 ,G0 ≡ W1 , (3.58) (2) WG1 ,G0 (3.59) ≡ (0) W1 . Because I(W1 ) < I(W1 ) holds for nontrivial W [1, Prop. 4], Proposi(1) (2) tion 1.2 implies that for nontrivial W we have WG1 ,G0 WG1 ,G0 . Therefore in this example, (3.35) does not hold. e 1 as the Suppose we use always the same m × m binary matrix G 52 A Partial Order for the Synthesized Channels of SCD U1 W y0 G1 W y2 (a) W y1 U2 G1 W y3 (b) (1) Figure 3.6: (a) The tree factor graph for the channel WG1 ,G0 . (b) The (2) tree factor graph for the channel WG1 ,G0 . building block in the recursion of Figure 2.2 as in [13]. We define N , mn , en , B en G e ⊗n , G 1 N −1 N −1 e U ,X Gn , 0 (3.60) (3.61) (3.62) 0 fn(i) , Ui → Y N −1 , U i−1 W 0 0 (3.63) en is the base-m digit-reversal permutation matrix of dimensions where B N × N . From the definition of the Kronecker product in (1.1), we get h e ⊗n G 1 i = i,j n−1 Yh e1 G i k=0 (3.64) ik ,jk where i0 , i1 , . . . , in−1 and j0 , j1 , . . . , jn−1 are the base-m representations of i and j, respectively. With (3.64), we can generalize Lemma 3.1. e n is base-m Significance Invariant). We can write G e n in Lemma 3.3 (G the form en = P 0 G e n P −1 G (3.65) where P is any base-m significance permutation matrix of dimensions en P B e −1 is a permutation matrix. N × N and P 0 = B n 3.7 Generalizations 53 Moreover, Lemma 3.2 can be generalized to the following lemma. Lemma 3.4 (Sufficient Condition for Degradation). Let i, j ∈ ZN and let sπ be a base-m significance permutation on N letters. If 1. j = sπ (i) 2. Zj ⊂ sπ (Zi ) fn(i) . fn(j) is stochastically degraded with respect to W then W Theorem 3.1 can be generalized to the following theorem. Theorem 3.2. For i, j ∈ ZN , let i0 , i1 , . . . , in−1 and j0 , j1 , . . . , jn−1 be the base-m representations of i and j, respectively. If there exists l, l0 ∈ Zn with l < l0 such that 1. jl = m − 1 and jl0 = 0 2. il = 0 and il0 = m − 1 3. For all k ∈ Zn \{l, l0 } : jk = ik fn(i) . fn(j) is stochastically degraded with respect to W then W Proof. Let sπ be the base-m significance permutation on N letters that swaps the digits with index l and l0 . It follows that sπ (i) = j. Let k0 , k1 , . . . , kn−1 be the base-m representation of k. We can write sπ as 0 sπ (k) = k + (kl − kl0 )ml + (kl0 − kl )ml 0 = k + (kl − kl0 ) ml − ml and because kl − kl0 ≤ m − 1 = jl − jl0 we get for k ∈ Zj that 0 sπ (k) ≤ k + (jl − jl0 ) ml − ml 0 < j + (jl − jl0 ) ml − ml =i (3.66) (3.67) (3.68) (3.69) (3.70) Therefore sπ (Zj ) ⊂ Zi and since sπ is its own inverse, the conditions of Lemma 3.4 are satisfied. 54 A Partial Order for the Synthesized Channels of SCD The definition of a PO analogue to 1 and the derivation of the covering relation of the PO is left to the interested reader. Note that for m > 2, Theorem 3.2 does not fully exploit Lemma 3.4. For example, if n = 2, m = 3, j = 1, i = 3 and if sπ is the base-m digit-reversal permutation we obtain fn(3) ≡ U1 → Y07 , U0 , U3 , U6 W (3.71) 7 U1 → Y0 , U0 (3.72) (1) f ≡ Wn (3.73) which does not follow from Theorem 3.2. Improving Theorem 3.2 to fully exploit Lemma 3.4 is left to the interested reader. Chapter 4 Successive Cancellation Decoding of Discrete Fourier Transform Codes SCD is conceptually applicable to codes based on any transform not only on Gn . However, it is in general only practical for transforms of small size. This chapter addresses SCD of Discrete Fourier Transform (DFT) codes, which are an important class of error-correcting codes that include the widely-used Reed-Solomon codes. Although SCD in the natural symbol order of the DFT leads to fast polarization, it is computationally feasible only for a DFT of small size. Based on the Cooley-Tukey Fast Fourier Transform (FFT), we found that SCD in the digit-reversed symbol order has significantly lower complexity but comes along with slower polarization. For a specific example with symbol blocklength N = 256, SCD of the DFT-code in digit-reversed order is feasible and empirically the performance is comparable to a standard binary polar code. However, SCD of the DFT-code is much more complex. With the SCLD from [35] and by switching some frozen and information symbols, the performance is substantially improved and is comparable to the well-performing polar code with outer cyclic redundancy check code from [35]. However, the DFT-code still has a much higher decoding complexity. The result that the Cooley-Tukey FFT can be used to reduce the SCD complexity of DFT-codes was independently derived in [28]. 55 56 Successive Cancellation Decoding of DFT-Codes This chapter is organized as follows. DFT-codes are introduced in Section 4.1. In Section 4.2, we briefly discuss preliminaries that are necessary for the subsequent sections. In Section 4.3, the Cooley-Tukey FFT is discussed, which is the basis of the SCD complexity gain. In Section 4.4, SCD of DFT-codes in the natural and the digit-reversed order are discussed and polar DFT-codes are defined. In Section 4.5, an explicit example for the blocklength N = 256 is given and the block error probability under SCLD is presented for three DFT-codes. Finally, Section 4.6 gives conclusions and an outlook. 4.1 Discrete Fourier Transform Codes Let F be a finite field and let α be an element in F of multiplicative order N . The DFT matrix VN (α) is the N × N matrix over F defined by [VN (α)]i,j = αij [4, Def. 6.1.1]. Note that we often omit the dependency of VN (α) on α by simply writing VN . Example 4.1 (V6 ). Let F = Z7 , α = 5 and 1 1 1 1 1 1 5 4 6 2 1 4 2 1 4 V6 = 1 6 1 6 1 1 2 4 1 2 1 3 2 6 4 N = 6 then 1 3 2 . 6 4 5 It can be shown [4, Th. 6.1.2] that VN is invertible with −1 VN i,j = NF−1 α−ij (4.1) (4.2) where NF is the sum of N ones in F . Definition 4.1 (DFT-Codes). The family of DFT-codes contains all codes of the form n o −1 −1 DFT CN (F ) , uN VN uI ∈ F |I | , uF = 0 (4.3) 0 with F ⊂ ZN and I = ZN \F. Analogue to Gn -codes, the elements of uF and uI are called frozen symbols and information symbols, respectively. An important special case of DFT-codes are the Reed-Solomon codes [4, Sec. 6.2]. 4.2 Digit Representation 57 Definition 4.2 (Reed-Solomon Code). A Reed-Solomon code of blockDFT length N and rate R is the DFT-code CN (F ) where F = ZbN (1−R)c . Note that DFT-codes with F being any contiguous block of bN (1 − R)c integers are usually also called Reed-Solomon codes. However, Definition 4.2 is more suitable for this chapter. Reed-Solomon codes are maximum-distance separable [4, Th. 6.2.1], which means that their minimum Hamming distance equals N − K + 1, where K is the number of information symbols, i.e., K = |I|. 4.2 Digit Representation Qk−1 Let N = i=0 Ni with Ni , i ∈ Zk , being positive integers. The following definition is a well-known generalization of the binary representation of integers. Definition 4.3 (Digit Representation). The N0k−1 -representation of an integer l ∈ ZN is defined as the unique sequence l0k−1 ∈ ZN0 × ZN1 × . . . × ZNk−1 (4.4) such that l= k−1 X i=0 li i−1 Y Nj . (4.5) j=0 Note that the N0k−1 -representation of an integer l depends on the order of the numbers in N0k−1 . The digits l0 and lk−1 can be interpreted as the least significant and the most significant digits, respectively. Example 4.2 ((N0 = 3, N1 = 2)-Representation). The (3, 2)-representation of the integers 0, 1, 2, 3, 4, 5 are given by (0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1). Let N ↓ji denote the vector (Nj , Nj−1 , . . . , Ni ), which is empty if i > j. Definition 4.4 (Digit-Reversal Permutation). The digit-reversal permutation bN k−1 is a permutation on N letters which maps l ∈ ZN with 0 N0k−1 -representation l0k−1 to l0 with N ↓k−1 -representation l↓k−1 . The 0 0 permutation matrix of the digit-reversal permutation bN k−1 is denoted 0 by B N0k−1 . 58 Successive Cancellation Decoding of DFT-Codes Example 4.3. The digit-reversal permutation b3,2 is a permutation on 6 letters given by 0 1 2 3 4 5 b3,2 = . (4.6) 0 2 4 1 3 5 Proposition 4.1 (Inverse of the Digit-Reversal). The inverse of the digit-reversal permutation bN k−1 is given by 0 b−1 N0k−1 = bN ↓k−1 . (4.7) 0 Moreover, we have B N0k−1 −1 = B N ↓k−1 . 0 (4.8) Proof. By Definition 4.4, the permutation bN ↓k−1 maps l with N ↓k−1 0 0 k−1 k−1 0 representation equal to l↓0 to the integer l with N0 -representation equal to l0k−1 . Hence (4.9) bN k−1 ◦ bN ↓k−1 (l) = l 0 0 which proves (4.7). Finally ,(4.9) and (1.47) imply (4.8). Qk−1 Proposition 4.2. Let N 0 = j=1 Nj . Then B N0k−1 = IN0 ⊗ B N1k−1 B(N0 , N 0 ) (4.10) k−1 0 = B(N0 , N ) B N1 ⊗ IN0 . (4.11) Proof. Let b, b0 be the permutations of IN0 ⊗ B N1k−1 and B N1k−1 ⊗ IN0 , respectively. With (1.47), equations (4.10) and (4.11) are equivalent to bN k−1 = b ◦ bN0 ,N 0 (4.12) 0 = bN0 ,N 0 ◦ b0 (4.13) Suppose l ∈ ZN has N0k−1 -representation l0k−1 . The N1k−1 , N0 -rep resentation of bN0 ,N 0 (l) is l1k−1 , l0 and therefore the N ↓k−1 -represen0 k−1 tation of (b ◦ bN0 ,N 0 )(l) is l↓0 , which proves (4.12). Moreover, the N0 , N ↓k−1 -representation of b0 (l) is l0 , l↓k−1 and thus the N ↓k−1 1 1 0 representation of (bN0 ,N 0 ◦ b0 )(l) is l↓k−1 , which proves (4.13). 0 4.3 The Cooley-Tukey Fast Fourier Transform 4.3 59 The Cooley-Tukey Fast Fourier Transform Let M, L be positive integers such that N = M L. The Cooley-Tukey −1 FFT [6] is a way of efficiently computing the DFT u0N −1 = xN VN of 0 N −1 x0 . It is based on the recursive application of the decomposition given in [37, Th. 3.4]. However, we use the slightly different decomposition [37, p. 68, eq. (b2)] that is given by VN = B(L, M ) IM ⊗ VL αM B(M, L) ·TM,L IL ⊗ VM αL B(L, M ) (4.14) where TM,L is the N × N diagonal matrix defined by [TM,L ]i,i , αiM iL with (iM , iL ) being the (M, L)-representation of i. Note that this decomposition is valid for VN over any field F even though [37] discusses only the complex field. The following theorem is the key to more efficient SCD of DFT-codes. Theorem 4.1 (Decomposition of a Permuted DFT-matrix). Let N0k−1 Qk−1 be positive integers such that N = i=0 Ni and let N0 , k−1 Y Nj , (4.15) j=1 VeN , B N0k−1 VN (α)B N0k−1 , VeN 0 , B N1k−1 VN 0 αN0 B N1k−1 , −1 TeN0 ,N 0 , B N1k−1 ⊗ IN0 TN0 ,N 0 B N1k−1 ⊗ IN0 (4.16) (4.17) (4.18) then we have 0 VeN = IN0 ⊗ VeN 0 B(N0 , N 0 )TeN0 ,N 0 IN 0 ⊗ VN0 αN . (4.19) Moreover, TeN0 ,N 0 is a N × N diagonal matrix with h TeN0 ,N 0 i =α i0 ·b N k−1 1 ( i0 ) i,i where (i0 , i0 ) is the (N0 , N 0 )-representation of i. (4.20) 60 Successive Cancellation Decoding of DFT-Codes Proof. We first prove (4.19). We get with (4.14) for M = N0 , L = N 0 and (4.16) that VeN = B N0k−1 B(N 0 , N0 )(IN0 ⊗ VN 0 )B(N0 , N 0 ) (4.21) k−1 0 ·TN0 ,N 0 (IN 0 ⊗ VN0 )B(N , N0 )B N0 . (4.22) Propositions 4.2 and 4.1 imply B(N 0 , N0 )B N0k−1 = B N1k−1 ⊗ IN0 , which commutes with IN 0 ⊗ VN0 (see [37, eq. (2.5)]) and therefore VeN = B N0k−1 B(N 0 , N0 )(IN0 ⊗ VN 0 )B(N0 , N 0 ) (4.23) k−1 ·TN0 ,N 0 B N1 ⊗ IN0 (IN 0 ⊗ VN0 ). (4.24) With (4.18) we obtain VeN = B N0k−1 B(N 0 , N0 )(IN0 ⊗ VN 0 )B(N0 , N 0 ) · B N k−1 ⊗ IN TeN ,N 0 (IN 0 ⊗ VN ). 1 0 0 0 (4.25) (4.26) Another use of Proposition 4.2 gives us VeN = B N0k−1 B(N 0 , N0 )(IN0 ⊗ VN 0 ) IN0 ⊗ B N1k−1 (4.27) 0 e ·B(N0 , N )TN0 ,N 0 (IN 0 ⊗ VN0 ). (4.28) Propositions 4.2 and 4.1 imply B N0k−1 B(N 0 , N0 ) = IN0 ⊗ B N1k−1 and by applying [37, Th. 2.2] twice, we finally get VeN = IN0 ⊗ B N1k−1 VN 0 B N1k−1 B(N0 , N 0 )TeN0 ,N 0 (IN 0 ⊗ VN0 )(4.29) which proves (4.19). To prove that TeN0 ,N 0 is a diagonal matrix satisfying N −1 e (4.20), let z0N −1 , x0 TN0 ,N 0 and let σ be the permutation correspondk−1 ing to B N1 ⊗ IN0 . By using (4.18) and twice (1.45), we get for all i ∈ ZN that zi = xi [TN0 ,N 0 ]σ(i),σ(i) and hence TeN0 ,N 0 is again a diagonal matrix with h i TeN0 ,N 0 = [TN0 ,N 0 ]σ(i),σ(i) . (4.30) (4.31) i,i If (i0 , i0 ) denotes the (N0 , N 0 )-representation of i, we get that (i0 , bN k−1 (i0 )) is the (N0 , N 0 )-representation of σ(i). Thus the defini1 tion of TN0 ,N 0 implies (4.20). 4.4 Successive Cancellation 4.4 61 Successive Cancellation Let W , X → Y be a channel with input alphabet F and with uniform distribution on the input X. Let the channels Xi → Yi , i ∈ ZN , be independent and identically distributed to W . We define U0N −1 , X0N −1 VN . 4.4.1 (4.32) Partial Distances e N −1 , U N −1 P and let SCD Let P be a N ×N permutation matrix, let U 0 0 e e eN −1 . The partial distance be applied in the symbol order U0 , U1 , . . . , U (i) DN (P ) corresponds to the minimum Hamming distance of the step i ∈ e N −1 and is defined analogously to D(i) in [24] as ZN in SCD of U 0 (i) DN (P ) , min wH 0, 0, . . . , 0, ũiN −1 P −1 VN−1 . (4.33) N −1 | {z } ũ i i times ũi 6=0 In [24] the partial distances were used to measure the asymptotic rate of polarization, however we use it to measure the (non-asymptotic) errorcorrecting abilities of SCD, even though they are only a crude measure for that. 4.4.2 Natural Symbol Order (i) For i ∈ ZN , let WN denote the channel (i) WN , Ui → Y0N −1 , U0i−1 . (i) (4.34) (i) Let DN , DN (IN ), i ∈ ZN , denote the partial distances associated with (i) the natural decoding order U0N −1 . Note that WN is linked to ReedSolomon codes in the sense that in both cases the contiguous block of symbols U0i−1 is known. Thus, as Reed-Solomon codes are maximumdistance separable codes, it is not surprising [24] that (i) DN = i + 1 (4.35) which is the highest possible value for a partial distance of the i-th SCD (i) step. It follows that DN is strictly increasing with index i and hence it (i) is tempting to conjecture that the best channels among WN , i ∈ ZN , are the ones with highest index. It would follow that freezing the symbols 62 Successive Cancellation Decoding of DFT-Codes with lowest index is a reasonable choice which results in a Reed-Solomon code. SCD in the natural order U0N −1 consists of successive MAP decisions (i) of the channels WN , i ∈ I, which in turn needs the probabilities (4.36) p Ui | Y N −1 ,U i−1 ui y0N −1 , ui−1 0 0 0 for all ui ∈ F . An obvious way of calculating the in (4.36) N probabilities y −1 is independent of is given in (4.39) where c = p U i−1 | Y N −1 ui−1 0 0 0 0 ui and therefore irrelevant for the decision on Ui . p Ui | Y N −1 ,U i−1 ui y0N −1 , ui−1 0 0 0 1 X −1 N −1 = y0 (4.37) p U N −1 | Y N −1 uN 0 0 0 c N −1 ui+1 = 1 X −1 −1 N −1 V N y0 (4.38) p X N −1 | Y N −1 uN 0 0 0 c N −1 ui+1 = N −1 −1 −1 1 X Y p Xi |Yi uN V N i yi 0 c N −1 i=0 (4.39) ui+1 However, we need N −i−1 A = |F | |F | −1 (4.40) additions, N −i M = (N − 1)|F | (4.41) N −i multiplications and |F | applications of VN−1 to compute (4.39) for all ui ∈ F . This is only feasible for all i ∈ ZN if the blocklength N is small. There exists tricks to decrease the SCD complexity, however it is still exponential in the blocklength N and polynomial in |F |. 4.4.3 Digit-Reversed Symbol Order Qk−1 e N −1 , Let N0k−1 be positive integers such that N = i=0 Ni . Let U 0 (i) N −1 k−1 f U0 B N0 be the symbol order of SCD. For i ∈ ZN , let WN denote the channel f (i) , U ei → Y N −1 , U e i−1 W (4.42) 0 0 N 4.4 Successive Cancellation 63 e0 X ũ0 .. . W ỹ0 ũi−1 ei U .. . VeN .. . eN −1 X eN −1 U W ỹN −1 e N −1 , X e N −1 given Figure 4.1: A factor graph for the distribution on U 0 i e i−1 = ũi−1 are known. that Y0N −1 = y0N −1 , U 0 0 where the dependency on α and on the ordered factorization of N into N0 , . . . , Nk−1 is omitted in the notation for the sake of brevity. fn(i) ) Let IePC be the index set of the dN Re channels with largest I(W e e and let FPC , ZN \IPC . Definition 4.5 (Polar DFT-Code). A polar DFT-code of blocklength N and rate R is the DFT-code DFT (4.43) CN bN k−1 FePC . 0 Note that a polar DFT-code depends on α (the chosen element in F of multiplicative order N ), on the ordered factorization of N into N0 , . . . , Nk−1 and on the transmitting channel W . Let VeN be defined as in Theorem 4.1. We can rewrite (4.32) to get e N −1 = X e N −1 VeN U 0 0 (4.44) e N −1 , X N −1 B N k−1 −1 is a permuted version of X N −1 . If where X 0 0 0 0 e i−1 = ũi−1 are observed, then we we assume that Y0N −1 = y0N −1 , U 0 0 get the factor graph in Figure 4.1. By using the decomposition of VeN from (4.19), the relation (4.44) can be represented as shown in Figure 4.2. Suppose we replace the node VeN in Figure 4.1 by the equivalent factor graph in Figure 4.2. Then by using equivalences analogue to those in Figure 2.5, we obtain the equivalent factor graph in Figure 4.3. Since also VeN 0 can be decomposed by (4.19), the steps that were used to obtain the simplified graph in Figure 4.3 can be applied recursively. 64 Successive Cancellation Decoding of DFT-Codes π .. . e0 U .. . VN0 π .. . .. . .. . π .. . ekN U 0 VN0 π .. . .. . VeN 0 .. . π .. . e 0 U (N −1)N0 VN0 .. . e(j+1)N 0 −1 X .. . eN −1 U ejN 0 X .. . π e(k+1)N −1 U 0 .. . eN 0 −1 X .. . .. . VeN 0 π eN −1 U 0 .. . e0 X .. . e(N −1)N 0 X 0 .. . π .. . .. . VeN 0 π .. . eN −1 X VeN Figure 4.2: The figure shows the relation (4.44), where VeN is decomposed according to (4.19). Note that in total there are N 0 copies of VN0 and N0 copies of VeN 0 . The permutation nodes correspond to the multiplication by the entries of the diagonal matrix TeN0 ,N 0 from Theorem 4.1. More explicitly, if the permutation nodes are numbered from top to bottom starth i e ing by zero, then the permutation σi (x) , x TN0 ,N 0 i,i corresponds to the permutation node i. 4.4 Successive Cancellation 65 W ỹ0 . .. VeN 0 .. . .. . W ỹN 0 −1 .. . ei U . .. .. . π .. . VN0 π .. . π W ỹjN 0 . .. VeN 0 .. . .. . W ỹ(j+1)N 0 −1 .. . W ỹ(N0 −1)N 0 . .. VeN 0 .. . .. . W ỹN −1 Figure 4.3: This factor graph and the factor graph in Figure 4.1 are Ui -equivalent which follows from Figures 1.4e and 1.4f by noting that VN0 is a bijective transform and thus a permutation. 66 Successive Cancellation Decoding of DFT-Codes V3 e5 U V2 π V3 V2 W y0 W y6 V2 W y2 W y8 V2 W y4 W y10 V2 W y1 W y7 V2 W y3 W y9 V2 W y5 W y11 (5) f Figure 4.4: The tree factor graph for W 12 with the ordered factorization N = 2 · 3 · 2. The permutation nodes whose corresponding permutation is the identity are omitted. The only remaining permutation node corresponds to the permutation σ(x) , xα4 . f (i) . An example This leads to the tree factor graph for the channel W N for N = 12, i = 5 with the ordered factorization N = 2 · 3 · 2 is shown in Figure 4.4. Hence, SCD in the digit-reversed order consists of SCD of the transforms VNi , i ∈ Zk . Note that the SCD complexity depends exponentially on the transform sizes (see Section 4.4.2) which decreased from N to N0 , N1 , . . . , Nk−1 . Thus the complexity of SCD in the digite N −1 is substantially lower as compared to the natural reversed order U 0 symbol order. To reduce the complexity the most, the blocklength N should be factorized into prime numbers. Note that the Good-Thomas FFT [37, Ch. 5] could also be used to obtain a similar decomposition of 4.4 Successive Cancellation 67 VN as in Theorem 4.1. However, the factors N0 , . . . , Nk−1 would need to be relatively prime, which results in general in larger transforms and therefore in a higher SCD complexity. e (i) , D(i) B N k−1 , i ∈ ZN , denote the partial distances Let D 0 N N e N −1 . The following proposition, whose associated with the SCD order U 0 e (i) . proof is in the Appendix, specifies D N Qk−1 0 Proposition 4.3. Let N = j=1 Nj and let (i0 , i0 ) be the (N0 , N 0 )representation of i ∈ ZN . Then 0 e (i) = (i0 + 1)D e (i 0) D N N (4.45) e (j)0 , j ∈ ZN 0 , are the partial distances of SCD of X N 0 −1 VeN 0 . where D 0 N By using Proposition 4.3 recursively, we get that e (i) = D N k−1 Y (ij + 1) (4.46) j=0 where ik−1 is the N0k−1 -representation of i ∈ ZN . 0 Proposition 4.4. Let ik−1 be the N0k−1 -representation of i ∈ ZN , then 0 k−1 Y (ij + 1) ≤ i + 1. (4.47) j=0 Proof. We prove by induction that k−1 Y (ij + 1) = 1 + k−1 X it t=0 j=0 t−1 Y (ij + 1). (4.48) j=0 Clearly, (4.48) is true for k = 1. Assume (4.48) is true for k = M , then for k = M + 1 we have M Y (ij + 1) = iM M −1 Y j=0 (ij + 1) + j=0 = iM M −1 Y M −1 Y (ij + 1) + 1 + M X t=0 M −1 X t=0 j=0 =1+ (ij + 1) (4.49) j=0 it t−1 Y j=0 (ij + 1) it t−1 Y (ij + 1) (4.50) j=0 (4.51) 68 Successive Cancellation Decoding of DFT-Codes where we used the induction hypothesis (4.48) for k = M to justify (4.50). This concludes the induction step and thus (4.48) is true. By the upper bound ij + 1 ≤ Nj for all j ∈ Zk and by (4.5), we get (4.47). Therefore, Proposition 4.4 implies that the partial distances corresponding to the digit-reversed order are smaller or equal to the partial distances corresponding to the natural symbol order. This suggests that SCD in the natural order polarizes faster than SCD in digit-reversed order. Thus, we pay a prize for having a reduction in SCD complexity. An extreme example is N = 256 with factorization N = 28 , where for i = 128 we have (i) DN = 129, e (i) = 2. D N 4.5 (4.52) (4.53) An Example for the Blocklength 256 For DFT-codes, the alphabet size |F | has to be larger than the blocklength N (otherwise no element of multiplicative order N would exist) and thus is already large. Moreover, as SCD in digit-reversed order still needs SCD of the transforms VNi , i ∈ Zk , the discussion on SCD complexity in Section 4.4.2 strongly suggests that Ni should be as small as possible. Thus, a reasonable choice is to choose N as a power of two. However, as a DFT-code also needs a finite field with an element of multiplicative order N , the choice is restricted. Let N = 256 and let F be the finite field with 257 elements, denoted by GF(257). There exists an element in F of multiplicative order N . The prime factorization of 256 is 28 and hence all remaining transforms are of dimensions 2 × 2. SCD of the 2 × 2 transform V2 with an alphabet size of 257 is computationally demanding but feasible. We present simulation results of SCLD for the transmitting channel W , X → Y defined as follows. Let X be uniformly distributed over GF(257) and let X0 , X1 , . . . , X8 be the binary representation of X. We transmit X0 , X1 , . . . , X7 over eight uses of a BAWGNC and we denote by Y the vector containing the eight outputs. Note that we did not use the bit X8 and hence X = 0 and X = 256 are mapped to the same transmitted sequence X0 , X1 , . . . , X7 . This implies that the binary inputs to the eight BAWGNC are not exactly uniformly and independently distributed, which makes W worse than eight uses of UBAWGNC. However, for example for a SNR of 2 dB, we get I(W ) = 5.136 bits, which 4.5 An Example for the Blocklength 256 69 1 (i) f ) I(W 256 0.8 0.6 0.4 0.2 0 0 50 100 150 200 250 i Figure 4.5: The mutual information (with logarithm to the base 257) of f (i) , i ∈ ZN , with blocklength factorization the channels W N N = 28 , α = 3 and SNR = 2 dB. is almost equal to the mutual information of eight uses of UBAWGNC which is 5.137 bits. We use the procedure from Section 1.4.1 to obtain a lower bound on the block error probability of the block MAP decoder and refer to this bound as the MAP bound. 4.5.1 Polar DFT-Code (i) f ), i ∈ ZN , for a SNR of 2 dB. It can be seen that Figure 4.5 shows I(W N many channels are very close to a noiseless or useless channel and hence are polarized. Figure 4.6 shows the same as Figure 4.5 but the order of the channels is changed such that it corresponds to the natural symbol order U0N −1 . We see that this polar DFT-code is far away from being a Reed-Solomon code because there exists no long consecutive block of bad channels. Figure 4.7 shows the block error probability for the rate R = 1/2 polar DFT-code under SCLD. For high enough SNR, block MAP per- 70 Successive Cancellation Decoding of DFT-Codes 1 (i) f ) I(W 256 0.8 0.6 0.4 0.2 0 0 50 100 150 200 250 0 i Figure 4.6: The same as shown in Figure 4.5 but the x-coordinate is permuted such that it corresponds to the natural symbol order U0N −1 , i.e., i equals i0 bit-reversed. 4.5 An Example for the Blocklength 256 71 Block error probability 100 L=1 L=2 L=4 L=8 L = 16 L = 32 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Figure 4.7: The block error probability of the rate R = 1/2 polar DFTcode with blocklength factorization N = 28 , α = 3 under SCLD with list size L. Code construction was carried out for SNR = 2 dB which corresponds to using the inputs to the channels with highest mutual information in Figure 4.5 as information symbols. formance is achieved with a moderate list size L. In Figure 4.8a, the block error probability of our rate R = 1/2 polar DFT-code is compared to a polar code with similar blocklength and rate. Both codes show similar performance with an advantage for polar codes at high SNR. Note that this disappointing result might be attributed to the “low” partial distances of our polar DFT-code. Let us compare the complexity of doing SCLD for both codes by counting the number of real multiplications needed on average per block. For the polar DFT-code, to perform SCD for a 2 × 2 kernel, according to (4.41) we need 2572 real multiplications for i = 0 and 257 for i = 1. For a polar code, SCD for the 2 × 2 binary kernel G1 needs 4 real multiplications for i = 0 and 2 for i = 1. The results are shown in Figure 4.8b. It can be seen that the complexity of SCLD of the polar DFT-code is higher by a factor of about 103 . Note that we can use the FFT trick from [20, Sec. 3.3] to lower the message 72 Successive Cancellation Decoding of DFT-Codes passing complexity of SCD of the polar DFT-code. However, it is not easy to get a clear and meaningful complexity measure in that case. 4.5.2 Improved Polar DFT-Code ei Suppose we attribute every block error in Figure 4.7 to the symbol U with lowest index i whose final decision was wrong. Then by increasing e224 . If we the list size L, the errors start to concentrate on the symbol U e224 and use the input to the next best frozen channel as additional freeze U e168 ), then we get a DFT-code of rate R = 1/2 information symbol (i.e. U which we call the improved polar DFT-code. Figure 4.9 shows the block error probability for the improved polar DFT-code under SCLD. Compared to the polar DFT-code the performance is substantially improved without altering the decoding complexity. 4.5.3 Doubly Improved Polar DFT-Code We can repeat this procedure to obtain the doubly improved polar DFTe102 and using code. This results in additionally freezing the symbol U e U208 as an additional information symbol. Figure 4.10 shows the block error probability for the doubly improved polar DFT-code under SCLD. Compared to the improved polar DFT-code the performance gain is not very pronounced. The performance for L = 32 is similar to polar codes with an outer cyclic redundancy check code from [35], however Figure 4.8b shows that our polar DFT-code needs a much higher decoding complexity. 4.6 Conclusion and Outlook We investigated SCD of DFT-codes. Based on the Cooley-Tukey FFT, we found a symbol order that substantially decreased the SCD complexity compared to the natural symbol order. Although this complexity is still impractically high for many parameters, there are DFT-codes with reasonable blocklength such that SCD is feasible. An example is the polar DFT-code of blocklength N = 256 considered in this chapter. However, the SCD complexity of this code is still much higher than the one for standard polar codes of similar blocklength and rate without having a performance advantage (at least not for the considered transmitting channel). There are two culprits for the high SCD complexity of DFT-codes in the digit-reversed order: 4.6 Conclusion and Outlook 73 Block error probability 100 polar DFT-code polar code 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of multiplications per block (a) 1010 polar DFT-code polar code 109 108 107 106 105 104 1 2 4 8 16 32 L (b) Figure 4.8: (a) The block error probability under SCLD with list size L = 32 for the polar DFT-code used in Figure 4.7 compared to a polar code of rate R = 1/2 and blocklength N = 2048 bits. (b) The number of real multiplications used on average to decode one block. 74 Successive Cancellation Decoding of DFT-Codes Block error probability 100 L=1 L=2 L=4 L=8 L = 16 L = 32 MAP bound 10−1 10−2 10−3 10−4 10−5 1 1.5 2 2.5 3 SNR (dB) Figure 4.9: The block error probability of the improved polar DFTcode of rate R = 1/2 under SCLD. 4.6 Conclusion and Outlook 75 Block error probability 100 L=1 L=2 L=4 L=8 L = 16 L = 32 MAP bound 10−1 10−2 10−3 10−4 10−5 1 1.5 2 2.5 3 SNR (dB) Figure 4.10: The block error probability of the doubly improved polar DFT-code of rate R = 1/2 under SCLD. 76 Successive Cancellation Decoding of DFT-Codes 1. the transform sizes N0 , N1 , . . . , Nk−1 can not be chosen freely, 2. the large alphabet size. For the Cooley-Tukey FFT to be applicable, the kernel sizes need to factorize the blocklength N and can thus not be chosen arbitrary. The best we can do is to factorize N into prime numbers. Moreover, the alphabet size has to be larger than the blocklength because we need an element in the field of multiplicative order N to construct a DFT-code. Hence the alphabet size is necessarily large. One way of decreasing the high SCD complexity of DFT-codes is to use an approximate SCD for the transforms of size N0 , N1 , . . . , Nk−1 . In general, all decoders that are applicable to decode Reed-Solomon codes can be used to do SCD of the transforms of size N0 , N1 , . . . , Nk−1 . For example, it was proposed in [28] to use the Berlekamp-Massey algorithm instead of exact SCD. Alternatively, one could try to approximate the messages such that the storing and SPMP complexity is reduced without deteriorating the performance too much. Another direction for future research is to use the decoder from Chapter 5 for DFT-codes. More details are given in Section 5.8. Finally, we give another starting point to reduce the computational complexity caused by the high alphabet size. To that end, we consider an idea from BCH codes [4, Sec. 6.5]. Let Fs be a subfield of F . A BCH code is a Reed-Solomon code but with the additional requirement that the codewords need to have all elements in the subfield Fs . This is translated to our setting by defining the input distribution of the channel W = X → Y to be ( 1 , if x ∈ Fs , pX (x) = |Fs | (4.54) 0, otherwise. We can generalize the discussion by allowing Fs to be any subset of F . Let the channels Xi → Yi , i ∈ ZN , be independent and identically distributed to W . We define U0N −1 , X0N −1 VN . (4.55) Clearly, the transmitting channel W is not a uniform input channel. But this does not change the structure of the factor graph for the distribution U0N −1 , X0N −1 , Y0N −1 and therefore the factor graph for SCD in the digitreversed order still decomposes into transforms of sizes N0 , N1 , . . . , Nk−1 . The objective of restricting the input of X to values in Fs is to decrease 4.6 Conclusion and Outlook 77 the complexity of SCD. This can be achieved by storing and processing only those elements of a message that are nonzero. However, only X0N −1 is restricted to take values in Fs and not necessarily the variables connecting the different transforms of sizes N0 , N1 , . . . , Nk−1 . Hence, it is not obvious how much decoding complexity can be saved. Note that if the input distribution of W is not uniform, then the random variables U0N −1 are not independent anymore. For example, if Fs is a subfield of F , the dependencies between the symbols U0N −1 are given in [4, Th. 6.3.1]. These dependencies imply that not all symbols U0N −1 can be used to transmit log2 (F ) bits of information. The case where the input distribution is not uniform was addressed in [10] for polar codes. Chapter 5 A Generalized Successive Cancellation Decoder with Look-Ahead In this chapter, we describe the fat-tree decoder (FTD) which is a generalization of the SCD. The FTD is applicable to any Gn -code and also to codes based on a transform with a similar structure as Gn . The FTD does message passing on a cycle-free factor graph of Gn that is structured like a binary tree. Thus, it computes exact (up to numerical imprecisions) posterior probabilities. The FTD is able to decode the bits u0N −1 in any desired order. Moreover, it has the ability to take into account N −1 the values of any frozen bits even from the future bits ui+1 . However, to keep the complexity manageable, only a limited number of frozen bits can be considered and the bit decoding order is not completely arbitrary. We give two FTD strategies with a decoding complexity that is adjustable by the parameter Dmax . The time complexity of the FTD with 2Dmax either of the two strategies is O 2 N log(N ) and the space com plexity is O 2Dmax N . Simulations show that both strategies perform significantly better than the SCD already for small Dmax . This chapter is organized as follows. Two cycle-free factor graphs of Gn are derived in Section 5.2 that are both necessary to describe the FTD. Section 5.3 describes the FTD in full generality. Section 5.4 gives examples of the decoding process of a FTD and shows that SCD, block MAP and SCD with different bit order are special cases of the 79 80 A Generalized SCD with Look-Ahead FTD. Section 5.5 discusses ideas for getting good FTD strategies. In Section 5.6, two specific FTD strategies are introduced. In Section 5.7, we present simulation results for the FTD with the two strategies from Section 5.6 and we compare them to the SCLD. Finally, Section 5.8 gives conclusions and an outlook. 5.1 Abbreviations and Symbols For future reference, we summarize all important abbreviations and symbols that are introduced in this chapter. Abbreviations FTD FV TV FTFG RG K N D MKBS MKBLIS fat-tree decoder front view top view fat-tree factor graph reduced graph known noise decode most known bits strategy most known bits of lowest index strategy Symbols Se S0N −1 K N D µ(uD ) Le KU KI li Te the state of the super-edge e in the FV of the FTFG states of the edges corresponding to U0N −1 the set {i ∈ ZN |Si = K} the set {i ∈ ZN |Si = N} the set {i ∈ ZN |Si = D} message along the edge UD in the TV of the RG index set of all U0N −1 that belong to the subtree of the edge e in the FV of the FTFG index set of all known and incorporated U0N −1 index set of all known and ignored U0N −1 log-likelihood ratio of Ui the score of the super-edge e in the FV of the FTFG 5.2 Cycle-Free Factor Graphs for the Polar Transform 5.2 81 Cycle-Free Factor Graphs for Gn In this section, we derive two cycle-free factor graphs for Gn , the socalled front view (FV) and top view (TV). Although message passing is performed on the TV, the FV is necessary to describe the FTD. Suppose we superpose both copies of Gn−1 in Figure 2.2 and bundle superposed edges or nodes to one super-edge or one super-node as shown in Figure 5.1a. The same graph is shown from the top in Figure 5.1b. Both views in Figure 5.1 are cycle-free graphs. We can use this idea again to decompose both copies of Gn−1 in Figure 5.1a according to the recursion in Figure 2.2 and superpose the resulting four copies of Gn−2 . The result is again a graph with a cycle-free FV and TV. We can repeat this procedure until we are left with a factor graph that has a cycle-free FV and TV consisting only of G1 nodes. An example for N = 8 is shown in Figure 5.2. We refer to both views of such a factor graph as the fat-tree factor graph (FTFG) of Gn . Although both views of the FTFG are cycle-free graphs and thus exact posterior probabilities could be obtained by SPMP, the SPMP complexity is impractical for reasonable blocklengths. Note that a similar idea was used in [9, Sec. VI.D] to get cycle-free factor graphs for Reed-Muller codes. Moreover, the FTFG is a particular example of a junction tree [17, Sec. 2.2.3]. We end this section by introducing some terms. We call a super-edge that bundles i edges an i-edge, and a super-node that bundles i copies of the node G an i-node of G. The super-edges in both the FV and TV of the FTFG form a binary tree. Hence the terms root, subtree, children, sibling, descendant and ancestor are defined as in a binary tree. 5.3 Fat-Tree Decoder This section is organized as follows. First in Section 5.3.1, states are assigned to all super-edges in the FV of the FTFG. Then in Section 5.3.2, based on the states, a reduced graph is derived in order to decrease the message passing complexity. Message passing on the TV of the reduced graph is explained in Section 5.3.3. Finally in Section 5.3.4, −1 the decoding process of one block uN is described which consists of 0 alternately performing message passing and updating states according to the messages and a strategy. 82 A Generalized SCD with Look-Ahead U0 G1 U1 .. . 2 U2i G1 U2i+1 2 .. . G1 UN −1 .. . Gn−1 .. . UN −2 X0 , XN/2 2 .. . 2 2 XN/2−1 , XN −1 2 (a) Front view Gn−1 U0N −1 N/2−1 N 2 X0 N 2 N G1 N 2 N 2 Gn−1 N 2 N −1 XN/2 (b) Top view Figure 5.1: The graph of Figure 2.2 but with both copies of Gn−1 superposed. An integer i below an edge or a node indicates that i copies are bundled, i.e., the alphabet size of such an edge is 2i . 5.3 Fat-Tree Decoder U0 83 G1 U1 2 U2 G1 U3 G1 2 4 2 U4 G1 U5 2 G1 G1 4 8 X07 4 2 U6 G1 U7 2 (a) Front view G1 X0 X1 2 G1 U07 2 4 8 X2 X3 2 G1 4 G1 4 G1 2 G1 X4 X5 2 2 G1 X6 X7 (b) Top view Figure 5.2: Two cycle-free factor graphs of G8 . 84 A Generalized SCD with Look-Ahead 5.3.1 States For each super-edge in the FV of the FTFG, we assign a state equal to one of the following three possibilities 1. KNOWN (K) All bits of the super-edge are assumed to be known. 2. NOISE (N) All bits of the super-edge are assumed to be unknown. 3. DECODE (D) All bits of the super-edge are assumed to be unknown. We perform message passing for that super-edge. For i ∈ ZN , let Si denote the state of the edge corresponding to Ui and let K , {i ∈ ZN |Si = K}, (5.1) N , {i ∈ ZN |Si = N}, (5.2) D , {i ∈ ZN |Si = D}. (5.3) The state Se of any other super-edge e is determined by the states Su and Sd of the two children edges u, d of e in the FV of the FTFG through D, if Su = D or Sd = D, (5.4) Se , K, if Su = K and Sd = K, N, otherwise. Hence all states are determined by S0N −1 . An example is shown in Figure 5.3a. Additionally, bit values are assigned to every super-edge with state equal to K. The bit value of a known Ui is its frozen value if i ∈ F or the decision on Ui if Ui is already decoded. For any super-edge e in state K that does not correspond to an Ui , the children edges of e are also in state K. Then the bit values of e are obtained by applying G1 to the bit values of the children of e according to the FTFG. 5.3.2 Reduced Graph We remove all edges from the FTFG with state not equal to D and we remove all G1 -nodes that have no directly connected edge of state D. The remaining graph is called the reduced graph (RG). An example is shown in Figure 5.3. The rule (5.4) implies that the FV of the RG is a connected graph. Note that the structure of the TV of the RG did 5.3 Fat-Tree Decoder U0 U1 U2 U3 85 K N G1 2 D D G1 2 N D G1 4 2 U4 U5 K K K G1 2 D G1 4 D D 7 D G1 8 X0 4 2 U6 U7 D N G1 2 (a) Front view G1 X0 X1 2 G1 U2 U3 U6 2 2 3 X2 X3 2 G1 2 G1 2 G1 2 G1 X4 X5 2 2 G1 X6 X7 (b) Top view Figure 5.3: The FTFG for G8 with states. (a) The FV with states satisfying (5.4). The RG is shown in bold lines. (b) The TV of the RG. The arrow indicates the message we are finally interested in. 86 A Generalized SCD with Look-Ahead µ(s) j G1 i i µt (t) µb (b) i Figure 5.4: An i-node of G1 . We denote by µ(s), s ∈ Zj2 , the message along the j-edge in the direction of the arrow. We denote by µt (t), t ∈ Zi2 , and µb (b), b ∈ Zi2 , the messages along the top and bottom i-edge, respectively, in the direction of the arrows. not change compared to the FTFG but the number of edges or nodes in one bundle decreased (for example compare Figure 5.2b to Figure 5.3b). Therefore, the alphabet size decreased as well and this is one reason why message passing on the TV of the RG can be feasible even for reasonable blocklengths. 5.3.3 Message Passing Let the messages along the edges corresponding to X0N −1 come from the transmitting channel W as usual. From these messages, we want to compute the message corresponding to the U s in the TV of the RG. For example in Figure 5.3b, this corresponds to the message in the direction of the arrow. To derive this message, we need to compute for every superedge in the TV of the RG only the message in the direction towards the U s. Note that all nodes in the TV of the RG are of the same kind as the node in Figure 5.4 and thus it is enough to discuss message passing through such a node. Message Passing Through the Node in Figure 5.4 The j-edge in Figure 5.4 was a 2i-edge before reduction. Let T02i−1 be the states of these 2i edges inherited from the FV of the FTFG and let us define TK , {k ∈ Z2i |Tk = K}, (5.5) TN , {k ∈ Z2i |Tk = N}, (5.6) TD , {k ∈ Z2i |Tk = D}. (5.7) The j-edge in Figure 5.4 contains only those edges with state Tk = D and therefore j = |TD |. Let the two edges with states T2l and T2l+1 5.3 Fat-Tree Decoder 87 belong to the l-th copy of G1 in Figure 5.4. By the construction of the RG, at least one of T2l and T2l+1 equals D (otherwise the l-th copy of G1 would not be in the RG) and hence we get |TD | ≥ i. (5.8) T02i−1 . Let c ∈ Z2i The 2 represent the 2i bits of the 2i edges with states FTFG implies that the bits c are in one-to-one correspondence with t and b of Figure 5.4 according to (c2k , c2k+1 ) = (tk , bk )G1 , k ∈ Zi . (5.9) As the bits cTN are assumed to be unknown, we sum over all possibilities |T | cTN ∈ Z2 N . As the bits cTK are assumed to be known, we plug in the known values. We define message passing through the node in Figure 5.4 by X µ(cTD ) , µt (t)µb (b) (5.10) |T | cTN ∈Z2 N where the known bit values are plugged in for cTK and t, b are determined by the one-to-one correspondence between c and t, b in (5.9). By replacing the summation in (5.10) by a maximization, we obtain an alternative message passing rule given by µ(cTD ) , max |T | cTN ∈Z2 N µt (t)µb (b). (5.11) Complexity The summation or maximization in (5.10) or (5.11), respectively, goes over 2|TN | elements. Therefore, to compute µ(cTD ) for all cTD we need M , 2|TD |+|TN | (5.12) A , 2|TD | 2|TN | − 1 (5.13) real multiplications and real additions or maximizations, respectively. Let us lower and upper bound M and A. As |TN | ≥ 0, we get M ≥ 2|TD | , A≥0 (5.14) (5.15) 88 A Generalized SCD with Look-Ahead Although the bound on A is trivial, it is the best one can obtain because A = 0 can occur. On the other hand, with |TK | + |TN | + |TD | = 2i and (5.8) we get |TN | ≤ |TD | and hence M ≤ 22|TD | A ≤ 2|TD | 2|TD | − 1 ≤ 22|TD | (5.16) (5.17) As a result, increasing |TD | by one increases the number of multiplications or additions by a factor of two in the best case and a factor of four in the worst case. Recall the definition of the set D in (5.3). For all i-nodes of G1 in the TV of the RG, Equation (5.8) is equivalent to j ≥ i and thus the number of edges in any super-edge is upper bounded by |D| (see for example Figure 5.3b where |D| = 3). Combining this with (5.16), we get that M ≤ 22|D| , (5.18) A ≤ 22|D| . (5.19) This suggests that D should be a “small” set and if so the message passing complexity through the node in Figure 5.4 is manageable. Message Interpretation Let µ(uD ) denote the message along the edge corresponding to UD in the TV of the RG obtained by using one of the two message passing rules (5.10) or (5.11). In this paragraph, we specify what the rules (5.10) and (5.11) and the message µ(uD ) represent. We call an edge adjacent to the RG if it does not belong to the RG but is directly connected to it. If we add to the RG 1. all adjacent edges, 2. an observation node (see Table 1.1) with value equal to the assigned bit value for every adjacent edge with state K, 3. a channel node (see Table 1.1) with observed value yi to every edge corresponding to a Xi then the resulting graph is called the decoding factor graph. Figure 5.5 shows the FV of the decoding factor graph corresponding to the RG in Figure 5.3. It is not very difficult to see that the two message passing rules (5.10) and (5.11) correspond to SPMP (1.28) and MPMP (1.29) 5.3 Fat-Tree Decoder U2 U3 89 G1 G1 2 4 2 G1 G1 4 4 X07 8 W y07 8 2 U6 G1 2 Figure 5.5: The FV of the decoding factor graph corresponding to the RG in Figure 5.3. in the TV of the decoding factor graph, respectively. Hence, we refer to (5.10) as SPMP and to (5.11) as MPMP. Consider the factor graph in Figure 5.6. It represents the distribution 7 y0 , û4 , û5 p u30 , u76 , sm−1 0 (5.20) where sm−1 are the variables of all edges in Figure 5.6 not correspond0 ing to U03 or U67 . By using Figures 2.5b and 2.5c, we see that the factor graphs in Figures 5.5 and 5.6 are (U2 , U3 , U6 )-equivalent. By Proposition 1.4, both factor graphs represent the same distribution on U2 , U3 and U6 , namely p u2 , u3 , u6 y07 , û4 , û5 . Therefore (1.30) implies that if µ(u2 , u3 , u6 ) was obtained by SPMP then µ(u2 , u3 , u6 ) ∝ p u2 , u3 , u6 y07 , û4 , û5 (5.21) where we used that the message in the opposite direction in (1.30) is constant because the (u2 , u3 , u6 )-edge is open. Note that the known value û0 for U0 is ignored in (5.21), however the values û4 , û5 are used. The same arguments used to obtain (5.21) can be used to prove the following theorem which is the key result of this section. 90 A Generalized SCD with Look-Ahead U0 G1 U1 2 U2 G1 U3 G1 2 4 2 û4 û5 G1 G1 G1 2 4 X07 8 4 W y07 8 2 U6 U7 G1 2 Figure 5.6: A factor graph that is equivalent to the factor graph in Figure 5.5. Theorem 5.1. Let Le be the index set of all bits U0N −1 that belong to the subtree of the edge e in the FV of the FTFG. Let Ai denote the set of all super-edges in the FV of the FTFG that are adjacent to some edge in the path from Ui to the root of the FTFG. Let [ [ KU , Le , (5.22) i∈D e∈Ai Se =K c KU , ZN \KU , KI , K\KU . (5.23) (5.24) If SPMP was used, then µ(uD ) ∝ p UD | Y N −1 ,UK 0 U uD y0N −1 , ûKU . (5.25) If MPMP was used, then µ(uD ) ∝ max p U uN ,uKI Kc U N −1 N −1 Y0 ,UKU uKUc y0 , ûKU . (5.26) Proof. Note that the set of super-edges ∪i∈D {e ∈ Ai |Se = K} is exactly the set of all super-edges of state K that are adjacent to the RG. We obtain (5.25) by the same argumentation as in the preceding example. 5.3 Fat-Tree Decoder 91 We prove (5.26). The probability distribution on UD in the decoding factor graph is given by the right hand side of (5.25) and thus the decoding factor graph represents the distribution −1 m−1 N −1 y0 , ûKU p uD , xN , s0 (5.27) 0 where sm−1 denotes the variables of all edges not corresponding to UD 0 or X0N −1 . If MPMP is used, then we have by (1.32) that −1 m−1 N −1 , s0 y0 , ûKU (5.28) µ(uD ) ∝ max p uD , xN 0 −1 m−1 ,s0 xN 0 which is the same as µ(uD ) ∝ max X x0N −1 ,s0m−1 N −1 y p uKUc , x0N −1 , sm−1 , ûKU . (5.29) 0 0 uN ,uKI We can use Proposition 1.6 to transform the sum over uN , uKI into a maximization, because uN , uKI is determined by x0N −1 , and obtain N −1 y p uKUc , x0N −1 , sm−1 , ûKU . (5.30) µ(uD ) ∝ max 0 0 uN ,uKI ,x0N −1 ,s0m−1 Moreover, we can transform the maximization over x0N −1 , sm−1 into a 0 −1 m−1 N −1 sum because xN , s is determined by u , therefore we get 0 0 0 X N −1 p uKUc , x0 , s0m−1 y0N −1 , ûKU (5.31) µ(uD ) ∝ max uN ,uKI x0N −1 ,s0m−1 which is the same as the right hand side of (5.26). It is important to note that the set KU depends on the sets D and K and that we can use the values of all known bits, i.e. we can achieve KU = K, by choosing the set D accordingly. For example, the known value for U0 in Figure 5.3 could be used if additionally S1 is set to D. For SPMP, we can compute for every bit Ui , i ∈ D, the log-likelihood ratio p Ui | Y N −1 ,UK 0 y0N −1 , ûKU 0 U li , log (5.32) p Ui | Y N −1 ,UK 1 y0N −1 , ûKU 0 U by P ,ui =0 uD\{i} P li = log uD\{i} ,ui =1 µ(uD ) µ(uD ) . (5.33) 92 A Generalized SCD with Look-Ahead For MPMP, we can compute for every bit Ui , i ∈ D, the max-product log-likelihood ratio li , log max pU max pU N −1 N −1 Y0 ,UKU uKUc y0 , ûKU N −1 (5.34) N −1 c y u , û K K U Y0 ,UKU 0 U max µ(uD ) max µ(uD ) uKc \{i} ,ui =0 U uKc U \{i} ,ui =1 Kc U Kc U by li = log uD\{i} ,ui =0 uD\{i} ,ui =1 . (5.35) Decisions can be made according to ( 0, if li ≥ 0, ûi , 1, otherwise. (5.36) If ûKU are correct decisions for UKU then for SPMP, (5.36) is a MAP decision of the channel Ui → Y0N −1 , UKU (5.37) whereas for MPMP, (5.36) is the decision on U i of the symbol-wise block MAP decision of UKUc based on Y0N −1 , UKU (see (1.5)). 5.3.4 Decoding Process Let I and F be the indices of the information and frozen bits of the Gn -code, respectively. Let r be the root edge of the FV of the FTFG. Decoding the block U0N −1 works according to Algorithm 3. Note that the FTD terminates if Sr = K, which is satisfied if and only if all bits U0N −1 are known. 5.3 Fat-Tree Decoder NOISE 93 DECODE KNOWN Figure 5.7: Allowed state transitions. Known bits remain known and noise bits are not allowed to become known directly. Algorithm 3 Fat-Tree Decoder 1: for i = 0 to N − 1 do 2: if i ∈ F then 3: Si ← K with decision ûi = 0 4: else 5: Si ← N 6: end if 7: end for 8: while Sr 6= K do 9: Compute the message µ(uD ) by SPMP or MPMP. 10: Update the states S0N −1 according to µ(uD ) and a strategy. 11: end while A specific choice of state transitions in Line 10 of Algorithm 3 is called a FTD strategy. The state transitions need to satisfy Figure 5.7. The performance under a specific FTD strategy depends on how good the channels in (5.37) are. Time Complexity We divide the edges of the FV and the TV of the FTFG into n+1 vertical layers. We label these layers with numbers in Zn+1 where the layer 0 consists of all edges corresponding to X0N −1 and the layer n consists of all edges corresponding to U0N −1 . The FTD is able to reuse messages between iterations similar to the SCD. Namely, if no states changed in a certain layer of the FTFG, the RG stays the same in that layer and we do not need to recompute the messages in that layer. The number of reused messages in a decoding process depends heavily on the FTD strategy and hence we can not generally specify it. However, the reuse of messages is essential for the FTD to have manageable complexity. 94 A Generalized SCD with Look-Ahead Space Complexity There are 2n−l super-edges at layer l in the TV of the RG. Therefore, we need to store n X 2n−l = 2N − 1 (5.38) l=0 messages. The number of edges in every super-edge in the TV of the RG is at most |D| and hence the alphabet size of every message is at most 2|D| . Thus, the space complexity is O 2Dmax N where Dmax is the maximal cardinality of |D| in the decoding process. The space complexity for the states is O(N ) because there are 2N −1 states. The space complexity for the assigned bit values is O(N ). To see this, suppose that at the beginning of the decoding process, we use one bit of storage for every Ui , i ∈ ZN , which contains the assigned bit value for a known bit or an arbitrary value for a not yet known bit. As soon as two sibling super-edges in the FV of the FTFG are both in state K, the state of their parent super-edge is also K. Then the storage needed for the bit values of the two siblings can be used to store the bit values of the parent super-edge because the assigned bit values of the two siblings are not needed anymore. Therefore, N bits are enough. Thus, the space complexity for the FTD is O 2Dmax N . 5.4 Examples In this section, we give examples of a full decoding process for a Gn -code of blocklength N = 8. In all examples, we assume the bits U0 , U4 and U5 to be frozen and the remaining bits to be information bits. Moreover, in Section 5.4.4 we give a realistic numerical example to show that incorporating future bits can significantly decrease the error probability (i) of a channel Wn . 5.4.1 Successive Cancellation The SCD is essentially a special case of the FTD. To get a SCD, we use SPMP and the states in Table 5.3. The FV of the RG for the states at time 1-5 are shown in Figures 5.8 - 5.12. Decisions (state transitions from D to K) are made according to (5.36). The strategy used 5.4 Examples t 0 1 2 3 4 5 6 95 S0 K K K K K K K S1 N D K K K K K S2 N N D K K K K S3 N N N D K K K S4 K K K K K K K S5 K K K K K K K S6 N N N N D K K S7 N N N N N D K Table 5.3: An example for the states of a full decoding process of the FTD leading to SCD. The states that changed relative to the previous time step are shown in bold. in this example is called the SC strategy and can easily be generalized to arbitrary blocklengths N and an arbitrary set of frozen bits F. However, there is a subtle difference between the SCD and a FTD with SC strategy. Suppose for example that U3 is additionally frozen. Then when decoding U2 in Figure 5.9, the known value of U3 would be taken into account, but the SCD ignores this value. 5.4.2 SCD with Different Bit Order The FTD is able to decode the bits U0N −1 of a Gn -code in any order. For example, the bit decoding order U2 , U0 , U1 , U4 , U5 , U7 , U6 , U3 is obtained in this example. We use SPMP and the states in Table 5.4. The FV of the RG for the states at time 1-5 are shown in Figures 5.13 - 5.17. Decisions are made according to (5.36) but only for the bit that is supposed to be decoded. Note that although U3 is not decoded before time step 5, the state S3 is equal to D to consider the known value for U2 in the decisions on U1 , U7 , U6 . It is not difficult to generalize this strategy to arbitrary bit decoding orders. However, the complexity of the FTD strongly depends on the bit decoding order. 5.4.3 Block Maximum a Posteriori To get a block MAP decoder (2.24), we need to use MPMP and we need to incorporate all frozen bits in every decision. Although the complexity for a block MAP decoder is in general impractically high, for this small example it is easily feasible. One possibility of states leading to a block 96 A Generalized SCD with Look-Ahead U0 U1 U2 U3 K D G1 N N G1 2 2 D N G1 4 2 U4 U5 K K G1 K 2 N G1 D D 7 N G1 8 X0 4 4 2 U6 U7 N N G1 2 Figure 5.8: The FV of the RG at time step 1 in Table 5.3. U0 U1 U2 U3 K K G1 D N G1 2 2 K D G1 4 2 U4 U5 K K K G1 2 N G1 4 D D 7 N G1 8 X0 4 2 U6 U7 N N G1 2 Figure 5.9: The FV of the RG at time step 2 in Table 5.3. 5.4 Examples U0 U1 U2 U3 97 K K G1 K D G1 2 2 K D G1 4 2 U4 U5 K K G1 K 2 N G1 4 D D 7 N G1 8 X0 4 2 U6 U7 N N G1 2 Figure 5.10: The FV of the RG at time step 3 in Table 5.3. U0 U1 U2 U3 K K G1 K K G1 2 2 K K G1 4 2 U4 U5 K K K G1 2 D G1 4 K D 7 D G1 8 X0 4 2 U6 U7 D N G1 2 Figure 5.11: The FV of the RG at time step 4 in Table 5.3. 98 A Generalized SCD with Look-Ahead U0 U1 U2 U3 K K G1 2 K K G1 2 K K G1 4 2 U4 U5 K K G1 K 2 D G1 K D 7 D G1 8 X0 4 4 2 U6 U7 K D G1 2 Figure 5.12: The FV of the RG at time step 5 in Table 5.3. t 0 1 2 3 4 5 6 S0 K K K K K K K S1 N N D K K K K S2 N D K K K K K S3 N N D D D D K S4 K K K K K K K S5 K K K K K K K S6 N N N N D K K S7 N N N D K K K Channel U2 U1 U7 U6 U3 → Y07 → Y07 , U0 , U2 → Y07 , U02 , U45 → Y07 , U02 , U45 ,U7 → Y07 , U02 , U47 Table 5.4: An example for the states of a full decoding process of the FTD leading to the bit decoding order U2 , U0 , U1 , U4 , U5 , U7 , U6 , U3 . Note that because U0 , U4 and U5 are assumed to be frozen, we do not need to decode them. The states that changed relative to the previous time step are shown in bold. 5.4 Examples U0 U1 U2 U3 99 K N G1 D N G1 2 2 N D G1 4 2 U4 U5 K K G1 K 2 N G1 4 D D 7 N G1 8 X0 4 2 U6 U7 N N G1 2 Figure 5.13: The FV of the RG at time step 1 in Table 5.4. U0 U1 U2 U3 K D G1 K D G1 2 2 D D G1 4 2 U4 U5 K K K G1 2 N G1 4 D D 7 N G1 8 X0 4 2 U6 U7 N N G1 2 Figure 5.14: The FV of the RG at time step 2 in Table 5.4. 100 A Generalized SCD with Look-Ahead U0 U1 U2 U3 K K G1 K D G1 2 2 K D G1 4 2 U4 U5 K K G1 K 2 D G1 D D 7 D G1 8 X0 4 4 2 U6 U7 N D G1 2 Figure 5.15: The FV of the RG at time step 3 in Table 5.4. U0 U1 U2 U3 K K G1 K D G1 2 2 K D G1 4 2 U4 U5 K K K G1 2 D G1 4 D D 7 D G1 8 X0 4 2 U6 U7 D K G1 2 Figure 5.16: The FV of the RG at time step 4 in Table 5.4. 5.4 Examples 101 K U0 K G1 U1 2 K U2 D G1 U3 K D G1 2 D 4 D 7 K G1 8 X0 2 K U4 K K G1 U5 K G1 2 4 4 2 K U6 2 K G1 U7 Figure 5.17: The FV of the RG at time step 5 in Table 5.4. t 0 1 2 3 4 S0 K K K K K S1 N D K K K S2 N N D K K S3 N N N D K S4 K K K K K S5 K K K K K S6 N D K K K S7 N N D K K Table 5.5: An example for the states of a full decoding process of the FTD leading to block MAP decoding. The states that changed relative to the previous time step are shown in bold. MAP decoder are given in Table 5.5. The FV of the RGs for the states at time 1-3 are shown in Figures 5.18 - 5.20. If decisions are made for all i ∈ D according to (5.36) then by Example 1.1 it might happen that we decide for a uD that does not maximize µ(uD ). Therefore, we need to use either ûD = argmax µ(uD ) (5.39) uD which corresponds to (1.3) or we successively decide on the bits Ui , i ∈ D, according to Algorithm 1. 102 A Generalized SCD with Look-Ahead U0 U1 U2 U3 K D G1 N N G1 2 2 D N G1 4 2 U4 U5 K K G1 K 2 D G1 D D 7 D G1 8 X0 4 4 2 U6 U7 D N G1 2 Figure 5.18: The FV of the RG at time step 1 in Table 5.5. U0 U1 U2 U3 K K G1 D N G1 2 2 K D G1 4 2 U4 U5 K K K G1 2 D G1 4 D D 7 D G1 8 X0 4 2 U6 U7 K D G1 2 Figure 5.19: The FV of the RG at time step 2 in Table 5.5. 5.4 Examples U0 U1 U2 U3 103 K K G1 2 K D G1 2 K D G1 4 2 U4 U5 K K K G1 2 K G1 D D 7 K G1 8 X0 4 4 2 U6 U7 K K G1 2 Figure 5.20: The FV of the RG at time step 3 in Table 5.5. 5.4.4 The Benefits of Future Bits: A Numerical Example Let W be the UBAWGNC with SNR = 2 dB. For n = 11, N = 2048 (i) and i = 127, we get from simulations that Pe (Wn ) = 0.26. This is too high for Ui to be an information bit. However, this channel has a large potential for improvement because it is the worst channel among (j) all Wn , j ∈ ZN , with wH (j) = 7 (see Chapter 3). By incorporating some future bits UA , we obtain the improved channel f (i) = Ui → Y N −1 , U i−1 , UA . W n 0 0 (5.40) fn(i) ). By using the FTD, we are able to get a numerical estimate of Pe (W As an example, suppose F is the index set of the frozen bits of a polar code of blocklength N = 2048 and rate R = 0.5 designed for SNR = 2 dB. Let A be the largest subset of {j ∈ F |j > 127} such that the FTD with given cardinality of D can incorporate the known bits U0126 , UA , i.e. KU = Z127 ∪ A (see Section 5.6.3 for how such a D can be determined). fn(i) for several |D| are shown in Table 5.6. The error probabilities of W We see that significant gains can be obtained. However, note that not all synthesized channels have such a large potential for improvement. 104 A Generalized SCD with Look-Ahead |D| (127) f Pe (W11 ) 1 2.6 · 10−1 2 4.8 · 10−3 3 8.8 · 10−5 4 9.5 · 10−6 (127) Table 5.6: The error probabilities of an improved version of W11 5.5 . Ideas for Good Strategies The number of possible FTD strategies is huge, thus this section is supposed to give some intuition on how to design efficient FTD strategies. 5.5.1 “Decode” Frozen Bits First Suppose we can either decode Ui now or first Uj and then Ui . Decoding Uj first has a smaller error probability for the decision on Ui , because the channel Ui → Y0N −1 , UKU (5.41) is stochastically degraded with respect to Ui → Y0N −1 , UKU , Uj . (5.42) However, decoding Uj first has a larger error probability for the decision on Uj , because the channel Uj → Y0N −1 , UKU (5.43) is stochastically degraded with respect to Uj → Y0N −1 , UKU , Ui . (5.44) But if Uj is a frozen bit, then it is impossible to make errors on Uj and hence it is beneficial to “decode” Uj first. In general, it is beneficial to “decode” as many frozen bits as early as possible. In the extreme case, if all frozen bits are “decoded” before all information bits and if we use MPMP, then we obtain a block MAP decoder (see Section 5.4.3). 5.5.2 Taking Decisions: Benefits and Drawbacks In this section, we try to qualitatively understand when it is beneficial to take a decision and when not. Assume Ui and Uj are information bits 5.5 Ideas for Good Strategies 105 and we can choose between decoding Ui now or first Uj and then Ui . For the sake of brevity, all probability distributions in this section should be understood to be conditioned on Y0N −1 = y0N −1 , UKU = ûKU although this is omitted in the notation. Suppose that deciding first on Uj leads to the reliable decision ûj , i.e. the probability pUj (ûj ) is close to 1. We obtain the following bound on the absolute difference between the probabilities of Ui in the two cases. pUi (ui ) − p U |U (ui | ûj ) i j X = p Ui |Uj (ui |uj )pUj (uj ) − p Ui |Uj (ui | ûj ) (5.45) uj = p Ui |Uj (ui | ûj ) pUj (ûj ) − 1 + p Ui |Uj (ui | ûj ⊕ 1)pUj (ûj ⊕ 1)(5.46) (5.47) = p Ui |Uj (ui | ûj ⊕ 1) − p Ui |Uj (ui | ûj )pUj (ûj ⊕ 1) ≤ pUj (ûj ⊕ 1) (5.48) As pUj (ûj ) is almost 1, pUj (ûj ⊕ 1) = 1 − pUj (ûj ) is almost zero and therefore the additional known value Uj = ûj changes the distribution on Ui only minor. Therefore, roughly speaking, a reliable decision does not substantially improve the reliability of future decisions. On the other hand, we show by the example in Figure 5.21 that taking reliable decisions can be indirectly beneficial for future decisions. Suppose N = 8, KU = {0, 1, 3} and suppose we can choose between decoding U4 now (see Figure 5.21a) or first U2 and then U4 (see Figure 5.21b). Let us assume the decision on U2 is reliable. Although the decision on U2 does not substantially improve the reliability of the decision on U4 , it reduces the message passing complexity. Or in other words, taking reliable decisions may bundle known bit values such that they are accessible with lower message passing complexity. Then the freed complexity can be used to incorporate more known bits (as for example U7 in Figure 5.21b). Thus taking decisions that are reliable is recommended. Now suppose that deciding first on Uj leads to an unreliable decision ûj , i.e. the probability pUj (ûj ) is not close to 1. In this case, the decision on Ui might be substantially improved by considering the additional known value Uj = ûj . However, the decision on Uj is not reliable and hence we should postpone it until its reliability is improved or until there is no hope to improve its reliability without exceeding the given complexity. 106 A Generalized SCD with Look-Ahead U0 U1 U2 U3 K K G1 D K G1 2 2 K D G1 D D 7 D G1 8 X0 4 2 U4 U5 D N G1 D 2 N G1 4 4 2 U6 U7 N K G1 2 (a) U4 → Y07 , U0 , U1 , U3 U0 U1 U2 U3 K K G1 K K G1 2 2 K K G1 4 2 U4 U5 D N G1 D 2 N G1 4 K D 7 D G1 8 X0 4 2 U6 U7 N K G1 2 (b) U4 → Y07 , U03 Figure 5.21: An example that shows the benefit of taking reliable decisions. The decision on U2 was taken in (b) but not in (a). If we assume that the decision on U2 was reliable, then the reliability of the decision on U4 is in both graphs very similar but message passing in the TV of the RG corresponding to (b) is more efficient than for (a). 5.6 Implemented Strategies 5.5.3 107 Choose the Bits to Decode with Close Indexes The integers i, j in Figure 5.4 correspond to the number of parallel superedges in a current layer in the FV of the RG (for example see Figure 5.3). Therefore, in view of the decoding complexity, it is desirable to choose the set D such that there are as few as possible parallel edges in the FV of the RG. Suppose D = {0, N − 1}, then we have two parallel edges at all layers. On the other hand, if D = {0, 1}, then we have one parallel edge except at the last stage. We can give the rule of thumb that the decoding complexity tends to be lower if the elements of D are close to each other, i.e. |i − j| is small for many i, j ∈ D. 5.6 Implemented Strategies This section describes two specific FTD strategies, i.e. specifications of Line 10 in Algorithm 3. Note that both strategies can be used with SPMP or MPMP. Let Le , KU and KI be defined as in Theorem 5.1. We use the notation KU (D, K) and KI (D, K) to make the dependency of KU and KI on the sets D and K visible. In Section 5.6.1, scores are assigned to all super-edges in the FV of the FTFG. They are needed in the description of both strategies. Section 5.6.2 introduces the most known bits of lowest index strategy (MKBLIS), which incorporates as many known bits of lowest index as possible without exceeding a given complexity. Section 5.6.3 presents the most known bits strategy (MKBS), which incorporates as many known bits as possible without exceeding a given complexity. 5.6.1 Scores We define for any j ∈ N the set (j) ∆KU (D, K) , KU (D ∪ {j}, K)\KU (D, K) (5.49) which is the index set of all known but ignored bits UKI (D,K) that would be used if the state Sj would be set to D. We assign to every super-edge e in the FV of the FTFG the non-negative score value Te defined by (j) Te , max ∆KU (D, K) ∩ Le (5.50) j∈N ∩Le 108 A Generalized SCD with Look-Ahead where the result of the maximization is defined to be zero if N ∩ Le = ∅. Therefore, the score Te equals the number of known bits from the subtree of the super-edge e that can be incorporated maximally by setting the state of one bit Uj in the subtree of the edge e to D. Note that the score Te is zero if the super-edge e is in state K, because then N ∩ Le = ∅. The scores can be computed recursively as is shown in the following proposition whose proof is given in the Appendix. Proposition 5.1 (Recursion for Te ). If the super-edge e corresponds to an Ui , then we have Te = 0. Otherwise, let u, d denote the two children edges of e with states Su , Sd and scores Tu , Td . Then we have that Tu + |Ld |, Te = |Lu | + Td , max{Tu , Td }, if (Su , Sd ) = (N, K), if (Su , Sd ) = (K, N), otherwise. (5.51) Note that |Lu | or |Ld | are equal to 2n−m where m is the depth of the edge u or d in the FV of the FTFG, respectively. 5.6.2 Most Known Bits of Lowest Index Strategy Let Dmax be a positive integer. The MKBLIS incorporates as many known bits of lowest index as possible under the constraint |D| ≤ Dmax . Definition Let r be the root edge in the FV of the FTFG. Note that Tr = 0 if and only if all known bits are incorporated, i.e. KU = K. The strategy is given in Algorithm 4 where iMKBLIS is defined as follows. We start at the root edge r and sequentially go • to the up child edge u, if Tu > 0 or if Tu = 0,Td = 0 and Su 6= K, • to the down child edge d, otherwise. Then iMKBLIS is the index of the bit Ui we finally end up. A step-by-step example is given in the next paragraph. 5.6 Implemented Strategies 109 Algorithm 4 Most Known Bits of Lowest Index Strategy 1: if |D| > 0 then 2: Compute lj for all j ∈ D according to (5.33) or (5.35). 3: i ← argmaxj∈D |lj | ( 0, if li ≥ 0, . 4: Si ← K with decision ûi = 1, otherwise. 5: end if 6: while |D| < Dmax and Tr > 0 do 7: SiMKBLIS ← D 8: end while In every except the first iteration of the FTD in Algorithm 3, the MKBLIS takes a decision only on the most reliable bit among Ui , i ∈ D, and thus the FTD terminates after |I| + 1 iterations. Example Suppose N = 8, Dmax = 2 and U0 , U3 , U6 and U7 are frozen bits. The MKBLIS leads to the state transitions in Table 5.7. The states and scores after every state transition of the first execution of the MKBLIS (i.e. the state transitions between time step 0 and 1) are shown in Figures 5.22 5.24. Note that in Figure 5.24, the set D reached its maximal cardinality of Dmax = 2 and therefore the first execution of the MKBLIS terminates. Note that Dmax would need to be at least 3 to additionally incorporate the known bits U6 and U7 . As claimed, the most known bits with lowest index U0 , U3 are incorporated without |D| exceeding Dmax . Complexity We divide the edges of the FV and the TV of the FTFG into n+1 vertical layers. We label these layers with numbers in Zn+1 where the layer 0 consists of all edges corresponding to X0N −1 and the layer n consists of all edges corresponding to U0N −1 . Recall from Section 5.3.4 that all messages at one layer can be reused if no states change in that layer. All states either equal K already at the beginning of the decoding process or do the state transitions N → D → K. Thus, for every state at most two state transitions occur in the decoding process. But the number of states at layer l equals 2l and hence at most 2 · 2l state transitions can occur at layer l. Therefore we need to recalculate the messages at layer l at most 2 · 2l times. From the TV of the FTFG follows that the number 110 A Generalized SCD with Look-Ahead t 0 1 2 3 4 5 S0 K K K K K K S1 N D K K K K S2 N D D K K K S3 K K K K K K S4 N N D D K K S5 N N N N D K S6 K K K K K K S7 K K K K K K Table 5.7: An example for the states of a full decoding process of the FTD with MKBLIS. We arbitrarily assumed that the most reliable bit from UD is always the bit with lowest index. The states that changed relative to the previous time step are shown in bold. U0 (K,0) G1 U (N,0) 1 U2 (N,0) G1 U (K,0) 3 U4 U5 U6 U7 (N,0) (N,1) G1 (N,1) (K,0) G1 G1 (N,0) (N,0) G1 (K,0) (N,1) G1 (N,2) X07 (N,2) (K,0) Figure 5.22: The FV of the FTFG at time step 0 in Table 5.7. All pairs, as for example (N,1), refer to the state and the score of its super-edge. It follows that iMKBLIS = 1 and thus S1 is set to D, which incorporates the known bit U0 . 5.6 Implemented Strategies U0 (K,0) G1 U (D,0) 1 U2 (N,0) G1 U (K,0) 3 U4 (N,0) (D,0) G1 (D,1) (N,1) G1 (N,0) G1 U5 (N,0) U6 (K,0) G1 U (K,0) 111 G1 (D,2) X07 (N,2) (K,0) 7 Figure 5.23: The FV of the FTFG as in Figure 5.22 but with S1 = D. The quantities that changed relative to Figure 5.22 are shown in bold. It follows that iMKBLIS = 2 and thus S2 is set to D, which additionally incorporates the known bit U3 . U0 (K,0) G1 U (D,0) 1 U2 (D,0) G1 U (K,0) 3 U4 U5 U6 U7 (N,0) (D,0) G1 (D,0) (K,0) G1 G1 (N,0) G1 (N,0) G1 (K,0) (D,0) (D,2) X07 (N,2) (K,0) Figure 5.24: The FV of the FTFG as in Figure 5.23 but with S2 = D. The quantities that changed relative to Figure 5.23 are shown in bold. It follows that iMKBLIS = 4, but as |D| = 2, the bound Dmax = 2 is already reached and hence the first execution of the MKBLIS terminates. 112 A Generalized SCD with Look-Ahead of messages at layer l equals 2n−l . Therefore, to decode one block we need to calculate at most n X 2 · 2l · 2n−l = (n + 1)2N (5.52) l=0 messages. If we combine this result with the bounds (5.18) and (5.19), we get a message passing complexity of O 22Dmax N log(N ) . We derive the complexity of determining iMKBLIS . As the states and scores can be determined recursively (see (5.4) and (5.51)), a change of a state Si implies a change of the states and scores only of the ancestor super-edges of Ui . Thus, at most n + 1 changes of states and scores are necessary. Because there are at most Dmax + 1 state changes in one execution of the MKBLIS and exactly |I|+1 executions of the MKBLIS, the complexity of updating the states and scores is O(Dmax N log(N )). Moreover, the complexity of determining iMKBLIS once according to the states and scores is O(log(N )) and this has to be performed at most Dmax (|I| + 1) times. Thus, updating the states and scores and finding iMKBLIS has complexity O(Dmax N log(N )). The space complexity for the scores is O(N ) because there are 2N −1 scores. Discussion The bias towards known bits of lowest index comes from going up even if Tu and Td are both positive. Note further that we go up in the case Tu = 0, Td = 0, Su 6= K, Sd 6= K, but we could also go down and would still incorporate the most known bits of lowest index. The following proposition, whose proof is given in the Appendix, justifies why we call this strategy MKBLIS. Proposition 5.2. Let C(D, K) be the number of contiguous known bits of lowest index in KU (D, K): K<m , {i ∈ K|i < m}, <m K C(D, K) , max (5.53) (5.54) m≥0 K<m ⊂KU (D,K) After every execution of the MKBLIS, the set D maximizes C(D, K) among all D ⊂ ZN \K with |D| ≤ Dmax . 5.6 Implemented Strategies 113 The MKBLIS corresponds to SCD but with a possibly different bit decoding order (as in the example of Section 5.4.2). This follows from the proof of Proposition 5.2 which implies that KU in dependence of time is a sequence of increasing sets and the decisions are incorporated just after they are made. However, the bit decoding order is not fix but depends on y0N −1 because the reliabilities of the bits UD depend on y0N −1 . Note that if the Gn -code is a polar code for some transmitting channel W 0 , then the MKBLIS with Dmax = 1 is the same as the SC strategy in Section 5.4.1. Intuition Behind MKBLIS When the SCD decides on Ui , then all bits U0i−1 are incorporated. When the FTD with MKBLIS decides on Ui , then all known bits from U0i−1 are incorporated and often more, i.e. K ∩ Zi ⊂ KU . But the bits Uj , j ∈ Zi \KU , which are incorporated by the SCD but not by the FTD, were either unreliable decisions or if they were reliable then they would not significantly improve the reliability of the decision on Ui (see Section 5.5.2). Thus the FTD with MKBLIS can be viewed as an improved SCD where the improvement comes from the additionally incorporated bits KU \Zi and from only deciding on the bit Ui , i ∈ D, with highest reliability. Moreover, as the set D for the MKBLIS tends to contain close indexes, Section 5.5.3 indicates that the decoding complexity should be relatively low for a FTD with |D| = Dmax . 5.6.3 Most Known Bits Strategy The most known bits strategy (MKBS) incorporates as many known bits as possible under the constraint |D| ≤ Dmax . Definition The MKBS equals the MKBLIS in Algorithm 4 but with iMKBLIS replaced by iMKBS , which is defined as follows. We start at the root edge r and sequentially go • to the up child edge u, if Tu > Td or if Tu = Td and Su 6= K, • to the down child edge d, otherwise. 114 A Generalized SCD with Look-Ahead Then iMKBS is the index of the bit we finally end up. As for the MKBLIS, the MKBS terminates after |I| + 1 iterations. Complexity As for the MKBLIS, updating the scores and states and determining iMKBS has complexity O(Dmax N log(N )) and the message passing complexity is O 22Dmax N log(N ) . Discussion The bias towards incorporating the most known bits comes from going up if Tu > Td and going down if Td > Tu . Note that we go up in the case Tu = Td , Su 6= K, Sd 6= K, but we could also go down and would still incorporate the most known bits. The following proposition, whose proof is given in the Appendix, justifies why we call this strategy MKBS. Proposition 5.3. After every execution of the MKBS, the set D maximizes |KU (D, K)| among all D ⊂ ZN \K with |D| ≤ Dmax . The MKBS corresponds to SCD but with a possibly different bit decoding order (as in the example of Section 5.4.2). This follows from the proof of Proposition 5.3 which implies that KU in dependence of time is a sequence of increasing sets and the decisions are incorporated just after they are made. However, the bit decoding order is not fix but depends on y0N −1 because the reliabilities of the bits UD depend on y0N −1 . Note that if the Gn -code is a polar code for some transmitting channel W 0 , then the MKBS with Dmax = 1 is the same as the SC strategy in Section 5.4.1. Intuition Behind MKBS The MKBS exploits the idea in Section 5.5.1. Note that in contrast to the MKBLIS, when the MKBS decides on the bit Ui there is no guarantee that all known bits from U0i−1 are incorporated. 5.7 Simulation Results We present simulation results for the FTD with MKBLIS and MKBS for the transmitting channel UBAWGNC. In order to improve the perfor- 5.7 Simulation Results 115 mance under the FTD, we use the idea from [21] and consider also polar codes designed for the channel UBAWGNC ασ02 for various α ∈ (0, 1]. Note that for α → 0, such a code approaches a Reed-Muller code [21]. We call 1/σ02 the base SNR and 1/ ασ02 the design SNR. Theterm SNR refers to the SNR of the transmitting channel UBAWGNC σ 2 for which the polar code is used. We measure the decoding complexity by counting the average number of multiplications (5.12) and additions (or maximizations) (5.13) that are needed in total to obtain the message µ(uD ). Moreover, we use the procedure from Section 1.4.1 to obtain a lower bound on the block error probability of the block MAP decoder and refer to this bound as the MAP bound. 5.7.1 MKBLIS and MKBS In this section, we compare the MKBLIS and the MKBS. We use a polar code of rate R = 0.5, blocklength N = 2048 and base SNR = 2 dB. For α = 1.0, the block error probability and the decoding complexity of the FTD with MKBLIS and MKBS are shown in Figures 5.25 and 5.26, respectively. For high enough SNR, both strategies approach the MAP bound. The decoding complexity grows exponentially with Dmax for both strategies. Increasing Dmax by one increases the complexity by a factor of about 2.5 which is less than the factor of 4 suggested by (5.18) and (5.19). Note that the decoding complexity does not depend heavily on the SNR. For the same Dmax , the MKBS strategy performs better than the MKBLIS strategy but the MKBS has also higher decoding complexity. For α = 0.7, the block error probability and the decoding complexity of the FTD with MKBLIS and MKBS are shown in Figures 5.27 and 5.28, respectively. Compared to α = 1.0, we see a marked performance improvement at high SNR without a significant change in decoding complexity. Moreover, if we use Dmax = 5 for the MKBLIS and Dmax = 4 for the MKBS, then both have a similar decoding complexity but the MKBS has lower block error probability. The same could also be observed for other parameters and hence the MKBS has a better trade-off between complexity and performance. Figure 5.29 shows the block error probability and the decoding complexity for the FTD with MKBS for various α. We see that for high SNRs it is better to use a code with a lower α. Note that the decoding complexity does not depend heavily on α. 116 A Generalized SCD with Look-Ahead Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 106 105 104 multiplications, SNR = 1 dB multiplications, SNR = 3 dB additions, SNR = 1 dB additions, SNR = 3 dB 1 2 3 4 5 6 Dmax Figure 5.25: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 2 dB under the FTD with MKBLIS and SPMP. 5.7 Simulation Results 117 Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 106 105 104 multiplications, SNR = 1 dB multiplications, SNR = 3 dB additions, SNR = 1 dB additions, SNR = 3 dB 1 2 3 4 5 6 Dmax Figure 5.26: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 2 dB under the FTD with MKBS and SPMP. 118 A Generalized SCD with Look-Ahead Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 106 105 104 multiplications, SNR = 1 dB multiplications, SNR = 3 dB additions, SNR = 1 dB additions, SNR = 3 dB 1 2 3 4 5 6 Dmax Figure 5.27: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with MKBLIS and SPMP. 5.7 Simulation Results 119 Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 106 105 104 multiplications, SNR = 1 dB multiplications, SNR = 3 dB additions, SNR = 1 dB additions, SNR = 3 dB 1 2 3 4 5 6 Dmax Figure 5.28: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with MKBS and SPMP. 120 A Generalized SCD with Look-Ahead Block error probability 100 α = 1.0 α = 0.9 α = 0.8 α = 0.7 α = 0.6 α = 0.5 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 multiplications, SNR = 2 dB additions, SNR = 2 dB 106 0.5 0.6 0.7 0.8 0.9 1 α Figure 5.29: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and base SNR = 2 dB under the FTD with MKBS, Dmax = 5 and SPMP. 5.7 Simulation Results 5.7.2 121 Sum-Product and Max-Product In this section, we compare SPMP in (5.10) with MPMP in (5.11). We use a polar code of rate R = 0.5, blocklength N = 2048, base SNR = 2 dB and α = 0.7. The block error probability and the complexity of the FTD with MKBS and MPMP are shown in Figure 5.30. We see that the block error probability is slightly worse than for SPMP in Figure 5.28, whereas the decoding complexity is almost identical (if we assume that an addition has the same complexity as a maximization). 5.7.3 Varying the Rate In this section, we compare the FTD with MKBS for polar codes of rates R = 0.05 and R = 0.95. The block error probability and the decoding complexity of the FTD with MKBS are shown in Figure 5.31 for the rate R = 0.05 with base SNR = -9 dB and α = 0.3 and in Figure 5.32 for the rate R = 0.95 with base SNR = 8 dB and α = 0.7. In both cases, the parameter α was chosen to optimize the performance. It can be seen that also for low and high rates the FTD achieves substantially better performance than SCD. Moreover, for R = 0.05, the decoding complexity does not grow as fast with Dmax as for R = 0.5 or R = 0.95. One possible explanation is that the FTD with MKBS or MKBLIS increases the set D only if not all known bits are incorporated (i.e. Tr > 0). When the FTD reaches the phase where most of the bits are known, then all known bits can be incorporated also with a D of cardinality smaller than Dmax . A code with a small rate is relatively longer in that phase than a code with a high rate. Another possible explanation is that more bits are frozen for a code with low rate and thus the message passing complexity is closer to the lower bounds in (5.14) and (5.15). Table 5.8 shows the minimal Dmax that is needed by the FTD to be a block MAP decoder. We see that the minimal Dmax increases with decreasing α. This is not a big surprise as Reed-Muller codes, which are approached for α → 0, are more powerful than polar codes. Moreover, we can see that the minimal Dmax is significantly larger for R = 0.5 than for high or low rates. 5.7.4 Varying the Blocklength In this section, we compare the FTD with MKBS for polar codes of various blocklengths. We use polar codes of rate R = 0.5 and base SNR = 2 dB. 122 A Generalized SCD with Look-Ahead Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 106 105 104 multiplications, SNR = 1 dB multiplications, SNR = 3 dB maximizations, SNR = 1 dB maximizations, SNR = 3 dB 1 2 3 4 5 6 Dmax Figure 5.30: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with MKBS and MPMP. 5.7 Simulation Results 123 Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 −10 −9.5 −9 −8.5 −8 SNR (dB) Number of operations per block 106 105 104 103 multiplications, SNR = -10 dB multiplications, SNR = -8 dB additions, SNR = -10 dB additions, SNR = -8 dB 1 2 3 4 5 6 Dmax Figure 5.31: The block error probability and the decoding complexity for a polar code of rate R = 0.05, blocklength N = 2048 and design SNR = -3.7712 dB under the FTD with MKBS and SPMP. 124 A Generalized SCD with Look-Ahead Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 7 7.5 8 8.5 9 SNR (dB) Number of operations per block 107 106 105 104 multiplications, SNR = 7 dB multiplications, SNR = 9 dB additions, SNR = 7 dB additions, SNR = 9 dB 1 2 3 4 5 6 Dmax Figure 5.32: The block error probability and the decoding complexity for a polar code of rate R = 0.95, blocklength N = 2048 and design SNR = 9.5490 dB under the FTD with MKBS and SPMP. 5.7 Simulation Results R = 0.05 R = 0.5 R = 0.95 α = 1.0 24 93 28 125 α = 0.9 24 102 32 α = 0.8 26 110 36 α = 0.7 27 120 41 α = 0.6 28 135 44 Table 5.8: The minimal Dmax that is needed by the FTD to be a block MAP decoder for a polar code of blocklength N = 2048. The base SNRs are -9 dB, 2 dB and 8 dB for the rates 0.05, 0.5 and 0.95, respectively. N = 256 N = 2048 N = 8192 α = 1.0 20 93 302 α = 0.9 20 102 341 α = 0.8 21 110 385 α = 0.7 21 120 431 α = 0.6 22 135 487 Table 5.9: The minimal Dmax that is needed by the FTD to be a block MAP decoder for a polar code of rate R = 0.5 and base SNR = 2 dB. The block error probability and the decoding complexity of the FTD with MKBS for N = 256 and α = 0.1 are shown in Figure 5.33. We see that even for α = 0.1 the FTD with MKBS approaches the MAP bound. The block error probability and the decoding complexity of the FTD with MKBS for N = 8192 and α = 1.0 are shown in Figure 5.34. We see that for high enough SNR the FTD with MKBS is able to approach the MAP bound for α = 1.0. However, the higher the blocklength, the more difficult it is to achieve the MAP bound with manageable complexity. But it can also be seen in Figure 5.34 that the performance gain obtained by increasing Dmax is much larger than for smaller blocklengths. Table 5.9 shows the minimal Dmax that is needed by the FTD to be a block MAP decoder. It can be seen that the minimal Dmax increases markedly with increasing blocklength. 5.7.5 Experiments In this section, we modify the MKBS slightly and see what happens. We use a polar code of rate R = 0.5, blocklength N = 2048, base SNR = 2 dB and α = 0.7. 126 A Generalized SCD with Look-Ahead Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1.5 2 2.5 3 3.5 SNR (dB) Number of operations per block 106 105 104 103 multiplications, SNR = 1.5 dB multiplications, SNR = 3.5 dB additions, SNR = 1.5 dB additions, SNR = 3.5 dB 1 2 3 4 5 6 Dmax Figure 5.33: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 256 and design SNR = 12 dB under the FTD with MKBS and SPMP. 5.7 Simulation Results 127 Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 MAP bound 10−1 10−2 10−3 10−4 10−5 1 1.5 2 2.5 SNR (dB) Number of operations per block 107 106 105 104 multiplications, SNR = 1 dB multiplications, SNR = 2.5 dB additions, SNR = 1 dB additions, SNR = 2.5 dB 1 2 3 4 Dmax Figure 5.34: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 8192 and design SNR = 2 dB under the FTD with MKBS and SPMP. 128 A Generalized SCD with Look-Ahead Our first experiment is to decide on the bit Ui , i ∈ D, with lowest index i instead of highest reliability. Figure 5.35 shows the block error probability and the decoding complexity of the FTD with modified MKBS. It can be seen that the modification applied to the MKBS substantially deteriorates the performance. Thus, it is very beneficial for the FTD to be able to postpone the decisions on the less reliable bits until more frozen bits are incorporated. Note that the modification applied to the MKBS reduced the decoding complexity by a factor of almost two for high Dmax . Our second experiment is to decide on the two most reliable bits from Ui , i ∈ D. The intention is to get a better trade-off between performance and decoding complexity. Figure 5.36 shows the block error probability and the decoding complexity for the FTD with modified MKBS. It can be seen that the modified MKBS with Dmax = k has an almost identical block error probability as the MKBS with Dmax = k − 1 (see Figure 5.28). However, the decoding complexity of the modified MKBS with Dmax = k is higher than the decoding complexity of the MKBS with Dmax = k − 1. Thus, the modified MKBS has a worse trade-off between decoding complexity and performance than the MKBS. 5.7.6 Comparison to the Successive Cancellation List Decoder In this section, we compare the SCLD and the FTD with MKBS. The SCLD can also achieve block MAP performance of polar codes with a manageable complexity [35]. We give two extreme examples to show that the SCLD and the FTD have different strengths and weaknesses Suppose that at some stage in the FTD, the message µ(uD ) is {0, 1}valued and equals 1 if and only if uD ∈ {(0, 0, . . . , 0), (1, 1, . . . , 1)}. (5.55) From (5.33) or (5.35), we obtain that li = 0 for all i ∈ D. Thus, although only two out of 2|D| probabilities are positive, all possible decisions are as unreliable as possible. However, in the same situation, the SCLD with list size L = 2 would be able to proceed without taking unreliable decisions. On the other hand, suppose ûi−1 are correct decisions and that there 0 are m consecutive information bits Uii+m−1 with probabilities 1 p uii+m−1 y0N −1 , ûi−1 = m 0 2 (5.56) 5.7 Simulation Results 129 Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 Number of operations per block SNR (dB) 107 106 multiplications, SNR = 1 dB multiplications, SNR = 3 dB additions, SNR = 1 dB additions, SNR = 3 dB 105 104 1 2 3 4 5 6 Dmax Figure 5.35: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with modified MKBS and SPMP. The modified MKBS decides on the bit Ui , i ∈ D, with lowest index i instead of highest reliability. 130 A Generalized SCD with Look-Ahead Block error probability 100 Dmax = 1 Dmax = 2 Dmax = 3 Dmax = 4 Dmax = 5 Dmax = 6 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 106 multiplications, SNR = 1 dB multiplications, SNR = 3 dB additions, SNR = 1 dB additions, SNR = 3 dB 105 104 1 2 3 4 5 6 Dmax Figure 5.36: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with modified MKBS and SPMP. The modified MKBS decides on the two most reliable bits from Ui , i ∈ D. 5.8 Conclusion and Outlook 131 m for all ui+m−1 ∈ Zm is likely to 2 . Then, the SCLD with list size L < 2 i prune the correct path. However, the FTD is likely to have only one state of the bits Uii+m−1 equal to D and can use the remaining complexity to incorporate additional frozen bits which make the decisions on Uii+m−1 more reliable. We compare SCLD and the FTD with MKBS in terms of performance and decoding complexity. We use polar codes of rate R = 0.5 and base SNR = 2 dB. The block error probability and the decoding complexity of the SCLD for the blocklength N = 2048 and α = 0.7 are shown in Figure 5.37. Note that the decoding complexity for the SCLD is independent of the SNR. The SCLD with L = 16 needs 830’080 real multiplications and 259’940 real additions per block. The FTD with MKBS and Dmax = 4 needs on average 844’960 real multiplications and 640’150 real additions per block. Thus, the two decoders have roughly the same decoding complexity. Figure 5.38 shows the block error probability of the two decoders. It can be seen that the SCLD is better than the FTD with MKBS. Figure 5.39 shows the block error probability of the two decoders for the blocklength N = 8192 and α = 0.8. It can be seen that the performance difference is similar to Figure 5.38. Qualitatively the same could be observed also for other parameters. 5.8 Conclusion and Outlook We introduced the FTD, which is a generalization to SCD. We gave two specific strategies for the FTD which led to substantially better performance than SCD. For many parameters, the FTD approaches block MAP performance of polar codes with manageable complexity. For a fixed decoding complexity, the performance of the FTD is not quite as good as the one of the SCLD. Nevertheless, as the FTD is very flexible, it is promising to do more research. We think that designing a Gn -code specifically for a FTD strategy is a very promising way of improving the performance of the FTD. For example, suppose we first fix a bit decoding order such that the FTD is able to decode in this order without violating |D| ≤ Dmax . Then we determine the quality of the resulting bit channels in (5.37) by simulations and use the best dN Re channels to transmit information. An interesting but very complicated problem is to determine which known but ignored bits UKI should be incorporated such that the relia- 132 A Generalized SCD with Look-Ahead Block error probability 100 L=1 L=2 L=4 L=8 L = 16 L = 32 MAP bound 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Number of operations per block 107 106 105 multiplications additions 104 1 2 4 8 16 32 L Figure 5.37: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under SCLD with various list sizes L. 5.8 Conclusion and Outlook Block error probability 100 133 FTD with MKBS and Dmax = 4 SCLD with L = 16 10−1 10−2 10−3 10−4 1 1.5 2 2.5 3 SNR (dB) Figure 5.38: The block error probability for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under SCLD and FTD with MKBS of similar decoding complexity. 134 A Generalized SCD with Look-Ahead Block error probability 100 FTD with MKBS and Dmax = 4 SCLD with L = 16 10−1 10−2 10−3 10−4 10−5 1 1.5 2 2.5 SNR (dB) Figure 5.39: The block error probability for a polar code of rate R = 0.5, blocklength N = 8192 and design SNR = 2.9691 dB under SCLD and FTD with MKBS of similar decoding complexity. 5.8 Conclusion and Outlook 135 bility of the bits UD is increased the most. Solving this problem would be very useful for improving the performance of the FTD. Note that this problem is related to the problem of determining the quality of the channels in (5.37). One could try to use the combined partial order from Chapter 3 to obtain bounds for these channels. The SCLD and the FTD work very different and one could call them orthogonal to each other. The discussion in Section 5.7.6 suggests that a combination of the FTD and the SCLD would result in a decoder superior to both. The SPMP complexity of the FTD can be reduced by using the FFT trick mentioned in [20, Sec. 3.3]. It would be interesting to determine how much can be saved and whether this would close the gap to the SCLD. Moreover, we think that the FTD is easier to build if we use this FFT trick. If the bit decisions for |D| = k are reliable, then it is not necessary to use |D| > k. Thus, one could adjust |D| to the reliability of the decisions which would save some decoding complexity. It would be interesting to determine how much can be saved. Note that such kind of mechanism was also given in [18] and [5] for the SCLD. The space complexity and the decoding complexity of the FTD grows exponentially with Dmax . However, there might be a way to approximate messages that live on a large alphabet such that the space complexity and the time complexity is reduced without deteriorating the performance too much. As a result, we could use a higher Dmax . The FTD is applicable also to other codes than Gn -codes. For example, consider the DFT-codes from Chapter 4. One can derive a cycle-free front view and top view of the factor graph in Figure 4.2 and thus the FTD is also applicable to DFT-codes. Although the high alphabet size of a DFT-code of practical blocklength prohibits to have |D| > 1, there is still the freedom of which i ∈ I is chosen to be in D. Moreover, it might even be possible to use a set D with more than one element by approximating the messages as mentioned in the previous item. Note that the FTD is applicable also to all codes discussed in the first part of Section 2.7.2. Finally, the trade-off between decoding complexity and performance might be better if we use many FTD with different strategies and with lower Dmax in parallel. For example, one could even use the FTD many times with the same strategy but every time on a different representation of the Gn -code obtained from Lemma 3.1 (see also [11, Sec. II.B]). Appendix A Proofs A.1 Proof of Proposition 1.2 This proposition is proved along the same line as [36, Lemma 3]. Let Wi be the channel Xi → Yi for i = 1, 2. The condition W1 W2 implies the existence of distributions p(y1 |y2 ) such that X pX1 ,Y1 (x, y1 ) = p(y1 |y2 )pX2 ,Y2 (x, y2 ). (A.1) y2 Define Y10 such that pX2 ,Y2 ,Y10 (x2 , y2 , y10 ) = p(y10 |y2 )pX2 ,Y2 (x2 , y2 ). (A.2) Then X2 , Y2 , Y10 form a Markov chain in that order according to [7, p. 34]. The data-processing inequality [7, Th. 2.8.1] implies I(W2 ) = I(X2 ; Y2 ) ≥ I(X2 ; Y10 ). (A.3) But X2 , Y10 have the same joint distribution as X1 , Y1 and thus we get I(X2 ; Y10 ) = I(W1 ), which proves (1.23). The relation (1.24) follows from (1.23) because (1.16) implies H(X1 ) = H(X2 ) and thus H(W1 ) = H(X1 |Y1 ) (A.4) = H(X1 ) − I(W1 ) (A.5) ≥ H(X1 ) − I(W2 ) (A.6) = H(X2 ) − I(W2 ) (A.7) = H(W2 ) (A.8) 137 138 Proofs Finally, (1.25) follows from Pe (W1 ) = P X2 6= X̂2 (Y10 ) = 1 − P X2 = X̂2 (Y10 ) ( ) X X 0 =1− max pX2 ,Y2 ,Y10 (x2 , y2 , y1 ) x2 y10 ≥1− XX y10 =1− y2 X =1− max pX2 ,Y2 ,Y10 (x2 , y2 , y10 ) x2 x2 X (A.11) X p(y10 |y2 ) (A.12) (A.13) y10 max{pX2 ,Y2 (x2 , y2 )} x2 y2 (A.10) y2 max{pX2 ,Y2 (x2 , y2 )} y2 (A.9) (A.14) = 1 − P X2 = X̂2 (Y2 ) (A.15) = Pe (W2 ) (A.16) where we used for (A.9) that X2 , Y10 have the same joint distribution as X1 , Y1 and we used for (A.12) the inequality X X g(x, y) ≤ max g(x, y). (A.17) max x A.2 y y x Proof of Proposition 1.5 We first prove all equivalences without assuming an additional factor graph. Let fl and fr denote the functions that are represented by the left hand side and the right hand side in Figure 1.4, respectively. In Figure 1.4a, we have fl (x, y, z) = δz,x⊕y fr (x, z) = 1 (A.18) (A.19) For every pair (x, z), there is exactly one y such that x ⊕ y = z and therefore X fl (x, y, z) = 1 (A.20) y A.2 Proof of Proposition 1.5 139 which proves Figure 1.4a. In Figure 1.4b, we have fl (x, y, z) = δy,y0 δz,x⊕y fr (x, z) = δz,σ(x) where σ(x) = x ⊕ y 0 . We get X fl (x, y, z) = δz,x⊕y0 (A.21) (A.22) (A.23) y which proves Figure 1.4b. In Figure 1.4c, we have fl (x, y, z) = δx,y δy,z fr (x, z) = δx,z (A.24) (A.25) We get X fl (x, y, z) = δx,z (A.26) y which proves Figure 1.4c. In Figure 1.4d, we have fl (x, y, z) = δy,y0 δx,y δy,z (A.27) fr (x, z) = δx,y0 δz,y0 (A.28) We get X fl (x, y, z) = δx,y0 δy0 ,z (A.29) y which proves Figure 1.4d. In Figure 1.4e, we have fl (x, y) = δx,σ(y) fr (x) = 1 (A.30) (A.31) For every x, there is exactly one y such that x = σ(y) and therefore X fl (x, y) = 1 (A.32) y which proves Figure 1.4e. In Figure 1.4f, we have fl (x, y) = δy,y0 δx,σ(y) fr (x) = δx,σ(y0 ) (A.33) (A.34) 140 Proofs We get X fl (x, y) = δx,σ(y0 ) (A.35) y which proves Figure 1.4f. The second claim of this proposition follows readily by noting that an additional factor independent of y can be factored out of the sum over y. A.3 Proof of Proposition 3.1 Let j0 , j1 , . . . , jn−1 and i0 , i1 , . . . , in−1 denote the binary representations of j and i, respectively. We define Jb , {o ∈ Zn |jo = b} and Ib , {o ∈ Zn |io = b} for b ∈ {0, 1}. The first condition of Lemma 3.2 implies that jo = iπ(o) for o ∈ Zn and thus π(J1 ) = I1 , (A.36) π(J0 ) = I0 . (A.37) Let t , wH (i) = wH (j). Let l0 , l1 , . . . , lt−1 and m0 , m1 , . . . , mt−1 denote the elements of J1 and I1 in ascending order, respectively. We first prove that lk ≤ mk for all k ∈ Zt by contradiction. Assume there exists a k ∈ Zt such that lk > mk and consider the four sets J1≥lk , {o ∈ J1 |o ≥ lk }, (A.38) I1≥lk , {o ∈ I1 |o ≥ lk }, (A.39) J0<lk I0<lk , {o ∈ J0 |o < lk }, (A.40) , {o ∈ I0 |o < lk }. (A.41) The assumption lk > mk implies ≥lk ≥lk I1 < J1 . (A.42) Then by (A.36) there exists a q ∈ J1≥lk such that π(q) < lk . Moreover, the assumption lk > mk implies <lk <lk (A.43) I0 < J0 . A.3 Proof of Proposition 3.1 141 Then by (A.37) there exists a q 0 ∈ J0<lk such that π(q 0 ) ≥ lk . Thus we have q > q0 , (A.44) π(q) < π(q 0 ). (A.45) 0 Let v , j − 2q + 2q . Since jq = 1 and jq0 = 0, the binary representation v0 , v1 , . . . , vn−1 of v is given by 1. vq = 0 2. vq0 = 1 3. For all o ∈ Zn \{q, q 0 } : vo = jo Moreover, from (A.44) follows that 0 ≤ v < j. But sπ−1 (v) = = = n−1 X vπ−1 (m) 2m m=0 n−1 X 0 vm0 2π(m ) m0 =0 n−1 X (A.47) ! jm0 2 m0 =0 π(q) =i−2 (A.46) π(m0 ) 0 − 2π(q) + 2π(q ) 0 + 2π(q ) (A.48) (A.49) and with (A.45) we get sπ−1 (v) > i which is a contradiction to the second condition of Lemma 3.2. Thus, we can conclude that lk ≤ mk for all k ∈ Zt . Algorithm 5 constructs a sequence a0 , a1 , . . . , ak as in (j) (i) Definition 3.6 which proves Wn 1 Wn . Algorithm 5 Generate a0 , a1 , . . . , ak according to Definition 3.6 1: a0 ← j 2: k ← 0 3: for o = t − 1 to 0 do 4: if lo < mo then 5: k ←k+1 6: ak ← ak−1 − 2lo + 2mo 7: end if 8: end for 142 Proofs A.4 Proof of Proposition 3.2 We need the following weight function w(i) , n−1 X ik k (A.50) k=0 where i0 , i1 , . . . , in−1 is the binary representation of i. If j % i then w(j) < w(i) and therefore we get Wn(j) ≺1 Wn(i) ⇒ w(j) < w(i). (A.51) For the ⇐ part, we need to prove (j) (i) (j) (m) 1. Wn ≺1 Wn , 2. Wn 1 Wn (j) (i) (m) ≺1 Wn ⇒ Wn = Wn . (i) ≺1 Wn . Assume by contradiction condition j % i implies (i) (m) (j) (m) there is a channel Wn such that Wn ≺1 Wn ≺1 Wn . By (j) Wn The that (A.51), we have w(j) < w(m) < w(i). But the weight w(.) is integer valued and there is no integer between w(j) and w(i) = w(j) + 1, which is a contradiction. (i) (j) Let us prove the ⇒ part. Since Wn is covered by Wn we have (j) (i) Wn ≺1 Wn . Let a0 , a1 , . . . , ak be a sequence in Definition 3.6 corre(j) (i) sponding to Wn ≺1 Wn . This sequence contains at least two elements, (j) (i) otherwise Wn = Wn . Moreover, it contains at most two elements, otherwise for any m ∈ {1, . . . , k − 1} Wn(j) ≺1 Wn(am ) ≺1 Wn(i) (j) (i) (A.52) which contradicts that Wn is covered by Wn . Therefore, the sequence consists of exactly two elements and hence we get j = a0 % a1 = i. It remains to prove that l0 = l + 1. Let k be the smallest integer larger than l such that ik = 1, which exists because il0 = 1. It follows that ik−1 = 0. The proof can be divided into the following three cases 1. Case 1: l = k − 1, k = l0 2. Case 2: l = k − 1, k < l0 3. Case 3: l < k − 1 A.5 Proof of Proposition 3.6 l 1 1 0 j m i 143 l0 0 1 1 k 1 0 1 (a) Case 2 j m i l 1 0 0 k−1 0 1 0 l0 0 0 1 (b) Case 3 Table A.1: The bits of j, m, i ∈ ZN with bit indexes l, k − 1, k, l0 ∈ Zn are shown for the cases 2 and 3. All bits of m not listed in the tables are equal to the bits of j (and i). In both cases, we have j % m % i. whereas the first case corresponds to l0 = l + 1. Table A.1 shows that for the cases 2 and 3 there exists an m ∈ ZN such that Wn(j) ≺1 Wn(m) ≺1 Wn(i) . (A.53) (i) (j) But this contradicts that Wn is covered by Wn and therefore l0 = l+1. A.5 Proof of Proposition 3.6 (j) (i) (j) We first prove the ⇒ part. Since Wn →c Wn we have Wn ≺c (i) Wn . Let a0 , a1 , . . . , ak be a sequence as in Definition 3.8 corresponding (j) (i) to Wn ≺c Wn . This sequence a0 , a1 , . . . , ak contains at least two (j) (i) elements, otherwise Wn = Wn . Moreover, it contains at most two elements, otherwise for any m ∈ {1, . . . , k − 1} Wn(j) ≺c Wn(am ) ≺c Wn(i) (j) (i) (A.54) which contradicts Wn →c Wn . Thus, the sequence consists of exactly (j) (i) (j) (i) two elements and hence we get either Wn →1 Wn or Wn →2 Wn . (j) (i) If Wn →2 Wn then it follows that there exists a bit index l ∈ Zn such that jl = 0, il = 1 and for all k ∈ Zn \{l} : jk = ik . We want to show that l 6= 0 leads to a contradiction which would conclude the proof of the ⇒ part. Hence, assume l 6= 0 and let us consider the following two cases 1. Case 1: j0 = 0 2. Case 2: j0 = 1. 144 Proofs j m i l 0 0 1 0 0 1 0 j m i (a) Case 1 l 0 1 1 0 1 0 1 (b) Case 2 Table A.2: The bits of j, m, i ∈ ZN with bit indexes 0, l ∈ Zn are shown for the cases 1 and 2. All bits of m not listed in the tables are equal to the bits of j (and i). (j) If j0 = 0, let m ∈ ZN be defined as in Table A.2a. It follows Wn →2 (m) (j) and hence Wn ≺c Wn . Moreover, we have m % i and hence (i) (m) (i) ≺1 Wn which implies Wn ≺c Wn . Hence, we found an m ∈ ZN such that (m) Wn (m) Wn Wn(j) ≺c Wn(m) ≺c Wn(i) (A.55) (i) (j) which contradicts Wn →c Wn . If j0 = 1, let m ∈ ZN be defined as in Table A.2b. It follows j % m (m) (j) (m) (j) which implies Wn ≺c Wn . Moreover, and hence Wn ≺1 Wn (i) (m) (i) (m) ≺c Wn . As before, this →2 Wn and hence Wn we have Wn (j) (i) contradicts Wn →c Wn and we can conclude l = 0 which proves the ⇒ part. For the ⇐ part, we need to prove (j) (i) (j) (m) 1. Wn ≺c Wn , 2. Wn c Wn (i) (j) (m) ≺c W n ⇒ W n = W n . (j) (i) Any of the two conditions in this proposition implies Wn ≺c Wn . We assume by contradiction that there is a m ∈ ZN such that Wn(j) ≺c Wn(m) ≺c Wn(i) . (j) (i) (A.56) If Wn →1 Wn then it follows wH (j) = wH (i) and therefore both ≺c relations in (A.56) can only be decomposed into →1 relations, which (j) (i) (j) (i) contradicts Wn →1 Wn . Now assume Wn →2 Wn with j0 = 0, i0 = 1. It follows that w(j) = w(i) with w(.) defined in (A.50) and therefore by (A.51) both ≺c relations in (A.56) can only be decomposed into →2 (j) (i) relations, which contradicts Wn →2 Wn . A.6 Proof of Proposition 4.3 A.6 145 Proof of Proposition 4.3 We get from (4.33) with P = B N0k−1 that e (i) = min wH 0, 0, . . . , 0, ũN −1 B N k−1 −1 V −1 D 0 i N N (A.57) N −1 ũ i ũi 6=0 = min wH 0, 0, . . . , 0, ũiN −1 VeN−1 B N0k−1 (A.58) = min wH 0, 0, . . . , 0, ũiN −1 VeN−1 (A.59) N −1 ũ i ũi 6=0 N −1 ũ i ũi 6=0 where we used (4.16) and the fact that the Hamming weight of a vector is independent of the order of its elements. We decompose VeN as in Figure 4.2. Without loss of generality, suppose that kN0 ≤ i ≤ (k + 1)N0 − 1. Then the set over which we minimize in (A.59) satisfies 1. ũj = 0 for 0 ≤ j ≤ kN0 − 1, 2. ũj can be chosen freely for (k + 1)N0 ≤ j ≤ N − 1. But VN0 and the permutations in Figure 4.2 are linear, bijective maps and thus they map zeros to zeros and the set F N0 to F N0 . Therefore we e (i) equals the minimal Hamming weight of all X e N −1 obtained get that D 0 N if in Figure 4.3 1. the values of all known symbols, except those corresponding to y0N −1 , are set to zero, ei , 2. the values of all open edges, except the one corresponding to U are allowed to take any value, ei is allowed to take 3. the value of the open edge corresponding to U any value except zero. It follows from (4.35) that at least i0 + 1 of the N0 connections between the VN0 and VeN 0 layers in Figure 4.3 are nonzero. By the definition of e partial distances, this can also be achieved with equality. Let X(j) be the N −1 0 e N variables of X0 that are directly linked to the connection j ∈ ZN0 through the transform VeN 0 . For every connection j with a zero value, if the remaining open edges of the VeN 0 transform are set to zero, then all e values of X(j) are zero. For every connection j with a nonzero value, at 146 Proofs (i0 ) e 0 values of X(j) e least D are forced to be nonzero by the definition of N 0 (i ) e 0 . Moreover, this can also be achieved with equality. To see this, let D N w 6= 0 denote the value of the connection j and let w̃iN0 w̃i0 6= 0 and 0 e (i 0) = wH 0, 0, . . . , 0, w̃N0 0 −1 Ve −1 D 0 i N N 0 −1 be such that (A.60) 0 e (i 0) . Then this implies which needs to exists by the definition of D N 0 N 0 −1 e −1 e (i 0) = wH ww̃−1 D 0, 0, . . . , 0, w̃ V (A.61) 0 0 0 i i N N and hence the vector 0, 0, . . . , 0, w · w̃i−1 · w̃iN0 −1 has the component 0 0 e (i 0) values of X(j) e indexed by i0 equal to w and results in exactly D N that are nonzero. Therefore we obtain (4.45) with equality and the proposition is proven. A.7 Proof of Proposition 5.1 If e corresponds to Ui , then Le = {i}. Either Si 6= N, and therefore N ∩ Le = ∅, or Si = N but then i is not in K and hence also not in (i) ∆KU (D, K) which proves Te = 0. Suppose e does not correspond to an Ui . Then the set Le can be partitioned into Lu and Ld and thus (j) (j) Te = max max ∆KU (D, K) ∩ Le , max ∆KU (D, K) ∩ Le j∈N ∩Lu j∈N ∩Ld (A.62) and it follows further that (j) (j) (j) ∆KU (D, K) ∩ Le = ∆KU (D, K) ∩ Lu + ∆KU (D, K) ∩ Ld . (A.63) (j) First, we determine ∆KU (D, K) ∩ Ld for j ∈ N ∩ Lu and for the two cases (Su , Sd ) = (N, K) and (Su , Sd ) 6= (N, K). Suppose first (Su , Sd ) = (N, K). Hence, all bits ULd are known but are not incorporated because Su = N: Ld ∩ KU (D, K) = ∅. (A.64) But because Su = N and by (5.4), at least one bit of ULu has state N. Thus, if the state of such a Uj is set to D then by (5.4), the state Su is A.8 Proof of Proposition 5.2 147 set to D as well and it follows that Ld ⊂ KU (D ∪ {j}, K). Thus, we have that (j) Ld ⊂ ∆KU (D, K). (A.65) If (Su , Sd ) 6= (N, K) then it is not difficult to show that no bit from ULd is additionally incorporated by setting any state Sj , j ∈ N ∩ Lu , to D. In summary, we have for any j ∈ N ∩ Lu that (j) 1. if (Su , Sd ) = (N, K), then |N ∩ Lu | > 0 and Ld ⊂ ∆KU (D, K), (j) 2. otherwise, ∆KU (D, K) ∩ Ld = ∅. With (A.63), it follows that ( Tu + |Ld |, if (Su , Sd ) = (N, K), j∈N ∩Lu Tu , otherwise. (A.66) By symmetry, the same holds if we exchange the roles of u and d in (A.66) and therefore with (A.62) we get max{Tu + |Ld |, Td }, if (Su , Sd ) = (N, K), (A.67) Te = max{Tu , |Lu | + Td }, if (Su , Sd ) = (K, N), max{Tu , Td }, otherwise. (j) max ∆KU (D, K) ∩ Le = Finally, if Sd = K, by (5.4) the states of all ULd equal K and thus N ∩ Ld is empty and it follows Td = 0. By the same argument, if Su = K then Tu = 0 and therefore (5.51) follows from (A.67). A.8 Proof of Proposition 5.2 A super-edge e in the FV of the FTFG is called a termination edge if 1. the sibling of e has state K, 2. Le ∩ K = ∅. An example is shown in Figure A.1. It follows that the state of a termination edge can not be K (otherwise Le ∩ K = Le ). Moreover, no termination edge e1 can be an ancestor or descendant of any other termination edge e2 and thus Le1 ∩ Le2 = ∅. (A.68) 148 Proofs U0 U1 U2 U3 K K G1 K N G1 2 K N G1 2 4 2 U4 U5 K K G1 K D G1 2 4 N D 7 D G1 8 X0 4 2 U6 U7 D N G1 2 Figure A.1: This example contains two termination edges which are drawn as dashed lines. We say the termination edge e1 is lower than the termination edge e2 if and only if i1 < i2 for all i1 ∈ Le1 , i2 ∈ Le2 . The importance of termination edges comes from the fact that all known bits are incorporated if and only if Se = D for all termination edges e. The known bits of the sibling of any termination edge e can only be incorporated if the state of at least one of the bits ULe equals D. It then follows by (A.68) that for every termination edge e we need to set a different Si to D in order to incorporate the known bits of the sibling of e. Thus, a set D that maximizes C(D, K) with |D| ≤ Dmax consists of exactly one element from every Le , e ∈ TDmax , where TDmax is the set of the Dmax lowest termination edges. The MKBLIS determines iMKBLIS if and only if Tr > 0, i.e. not all known bits are incorporated. The index iMKBLIS is determined such that if (Tu , Td ) 6= (0, 0) then we go along an edge with positive score. Therefore, at some point we reach a node with children edges u, d of states (Tu , Td ) equal to (N, K) or (K, N). But the score of the edge in state K is zero and the index iMKBLIS is determined such that we do not go along an edge in state K with zero score. Thus, we go along the edge with state N and we get (i ) ∆KU MKBLIS (D, K) 6= ∅ (A.69) A.8 Proof of Proposition 5.2 149 If the score of the edge in state N is zero, then this edge is a termination edge. If the score of the edge in state N is not zero, then we will later reach another node with children edges u, d of states (Tu , Td ) equal to (N, K) or (K, N). And so on, until the state of the edge in state N is zero. Thus, we have that iMKBLIS ∈ Le for some termination edge e with Se = N. Moreover, because we go up if Tu and Td are both positive, iMKBLIS ∈ Le where e is the lowest termination edge with Se = N. If |D| = 0 before the MKBLIS starts, then the MKBLIS repeatedly sets an Si to D from the set Le where e is the lowest termination edge with Se = N, which therefore maximizes C(D, K). If |D| > 0 before the MKBLIS starts, then let i ∈ D denote the index of the bit from UD with highest reliability. Let e denote the termination edge for which i ∈ Le before Si is set to K. After setting Si to K, e is not a termination edge anymore because i ⊂ Le ∩ K. After setting Si to K, the index iMKBLIS ∈ Le0 where e0 is the lowest termination edge with state N. If at most one new termination edge is created by setting Si to K, then clearly setting SiMKBLIS to D leads to a contiguous set of incorporated termination edges and thus C(D, K) is maximized. So we just need to prove that at most one new termination edge is created by setting Si to K. We split the proof into the two cases 1. e does not equal the edge corresponding to Ui 2. e equals the edge corresponding to Ui If e does not equal the edge corresponding to Ui , then after setting Si to K, the sibling of Ui is the only new termination edge. Note that then UiMKBLIS is the sibling of Ui and thus after setting SiMKBLIS to D, the new set of incorporated known bits is the set of incorporated known bits before setting Si to K plus Ui . If e equals the edge corresponding to Ui , the sibling of Ui needs to have state K because Ui was a termination edge. Thus after setting Si to K, the parent edge p of e has state K as well. Let p0 be the sibling of p. 1. If Sp0 = K, then the same discussion applies recursively to the parent of p0 and its sibling. 2. If Sp0 6= K, then either p0 is the only new termination edge or there is no new termination edge. Thus, the proposition is proven. 150 A.9 Proofs Proof of Proposition 5.3 Because the MKBS determines iMKBS by always going along the edge with higher score, we have (j) iMKBS ∈ argmax∆KU (D, K). (A.70) j∈N If |D| = 0 before the MKBS starts, then the MKBS repeatedly sets SiMKBS to D. Let i0 , i1 , . . . , iDmax −1 be the sequence of these indexes iMKBS and let Dt , {i0 , i1 , . . . , it−1 } where D0 , ∅. Suppose that the set D = D0 maximizes |KU (D, K)|. We show now that |KU (D0 , K)| = |KU (DDmax , K)|. By (A.70), we have that (j) i0 ∈ argmax∆KU (D0 , K). (A.71) j∈N We call the path from the root of the FTFG to the edge corresponding to Ui the path of i. Let i00 be an index in D0 that shares the longest path with i0 . Let e and e0 denote the edge in the path of i0 and i00 , respectively, where the two paths divide first. Let Te and Te0 be the score of the edge e and e0 , respectively, for D = D0 . Let D00 , D0 \{i00 } ∪ {i0 }. If we use D00 instead of D0 , then Te known bits are additionally incorporated and at most Te0 are additionally ignored. But Te ≥ Te0 because otherwise it would contradict (A.71). Thus, we obtain that |KU (D00 , K)| ≥ |KU (D0 , K)|. (A.72) But because |KU (D0 , K)| is maximal, we get equality in (A.72). One can use a similar argument repeatedly until we are left with a set Dc that equals DDmax and with |KU (Dc , K)| = |KU (D0 , K)|. If |D| > 0 before the MKBS starts, then let i ∈ D denote the index of the bit from UD with highest reliability. We split the proof into the two cases 1. the state of the sibling of Ui has state 6= K 2. the state of the sibling of Ui has state K In the first case, after setting Si to K, the index iMKBS is the index of the sibling of Ui . After setting SiMKBS to D, the known bits that are incorporated is augmented by Ui . And the only state that changed is Si (without propagation), thus we can not incorporate more bits by setting another state Sj to D, because then we would have set Sj to D before. In the second case, after setting Si to K, the state of the parent edge p of e becomes K as well. Let p0 be the sibling of p. A.9 Proof of Proposition 5.3 151 1. If Sp0 = K, then the same discussion applies recursively to the parent of p0 and its sibling. 2. If Sp0 = D, then the bits with index Lp are already incorporated (note that i ∈ Lp ). Moreover, the number of bits that could be additionally incorporated by setting the state of an Uj with Sj = N to D did not change. Thus we would choose the same D when we would restart the MKBS from scratch. We have already shown, that the MKBS maximizes |KU (D, K)| if |D| = 0 before the MKBS starts. 3. If Sp0 = N, then by setting the state to D of one of the bits Uj , j ∈ Lp0 , with Sj = N, incorporates at least Lp . Moreover, the number of bits that could be additionally incorporated by setting the state of an Uj with Sj = N to D changed only for indexes ∈ Lp0 and is larger than it was for Ui . Thus, iMKBS is an index ∈ Lp0 . As before, we would choose the same D when we would restart the MKBS from scratch. We have already shown, that the MKBS maximizes |KU (D, K)| if |D| = 0 before the MKBS starts. Thus, the proposition is proven. Bibliography [1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009. [2] E. Arıkan and E. Telatar, “On the rate of channel polarization,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2009, pp. 1493 –1495. [3] M. Bardet, V. Dragoi, A. Otmani, and J. P. Tillich, “Algebraic properties of polar codes from a new polynomial formalism,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2016, pp. 230– 234. [4] R. E. Blahut, Algebraic Codes for Data Transmission. Cambridge University Press, 2003. [5] K. Chen, K. Niu, and J. Lin, “A reduced-complexity successive cancellation list decoding of polar codes,” in Proc. IEEE Vehicular Technology Conference (VTC Spring), Jun. 2013, pp. 1–5. [6] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex fourier series,” Mathematics of Computation, vol. 19, no. 90, pp. 297–301, 1965. [7] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006. [8] B. A. Davey and H. A. Priestley, Introduction to Lattices and Order, 2nd ed. Cambridge University Press, 2002. [Online]. Available: http://dx.doi.org/10.1017/CBO9780511809088 153 154 Bibliography [9] G. D. Forney, Jr., “Codes on graphs: normal realizations,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 520–548, 2001. [10] J. Honda and H. Yamamoto, “Polar coding without alphabet extension for asymmetric channels,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2012, pp. 2147–2151. [11] N. Hussami, R. Urbanke, and S. B. Korada, “Performance of polar codes for channel and source coding,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2009, pp. 1488–1492. [12] A. Joseph and A. R. Barron, “Fast sparse superposition codes have near exponential error probability for R < C,” IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 919–942, Feb. 2014. [13] S. B. Korada, E. Şaşoğlu, and R. Urbanke, “Polar codes: Characterization of exponent, bounds, and constructions,” IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6253–6264, 2010. [14] S. B. Korada, A. Montanari, I. E. Telatar, and R. Urbanke, “An empirical scaling law for polar codes,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jun. 2010, pp. 884–888. [15] S. B. Korada, “Polar codes for channel and source coding,” Ph.D. dissertation, EPFL, 2009. [16] S. Kudekar, T. Richardson, and R. L. Urbanke, “Spatially coupled ensembles universally achieve capacity under belief propagation,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7761–7813, Dec. 2013. [17] S. L. Lauritzen, Graphical Models. Clarendon Press, 1996. [18] B. Li, H. Shen, and D. Tse, “An adaptive successive cancellation list decoder for polar codes with cyclic redundancy check,” IEEE Commun. Lett., vol. 16, no. 12, pp. 2044–2047, Dec. 2012. [19] H.-A. Loeliger, “An introduction to factor graphs,” IEEE Signal Processing Magazine, vol. 21, no. 1, pp. 28–41, 2004. [20] D. J. C. MacKay and M. C. Davey, Codes, Systems, and Graphical Models. New York, NY: Springer New York, 2001, pp. 113–130. [Online]. Available: http://dx.doi.org/10.1007/978-1-4613-0165-3_ 6 Bibliography 155 [21] M. Mondelli, S. H. Hassani, and R. L. Urbanke, “From polar to Reed-Muller codes: A technique to improve the finite-length performance,” IEEE Trans. Commun., vol. 62, no. 9, pp. 3084–3091, 2014. [22] R. Mori and T. Tanaka, “Performance and construction of polar codes on symmetric binary-input memoryless channels,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2009, pp. 1496–1500. [23] ——, “Performance of polar codes with the construction using density evolution,” IEEE Commun. Lett., vol. 13, no. 7, pp. 519–521, 2009. [24] ——, “Channel polarization on q-ary discrete memoryless channels by arbitrary kernels,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jun. 2010, pp. 894–898. [25] ——, “Source and channel polarization over finite fields and reedsolomon matrices,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2720–2736, May 2014. [26] N. Presman, O. Shapira, and S. Litsyn, “Mixed-kernels constructions of polar codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2, pp. 239–253, Feb. 2016. [27] N. Presman, O. Shapira, S. Litsyn, T. Etzion, and A. Vardy, “Binary polarization kernels from code decompositions,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2227–2239, May 2015. [28] N. Rengaswamy and H. D. Pfister, “Cyclic polar codes,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jun. 2015, pp. 1287–1291. [29] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, 2008. [Online]. Available: http: //dx.doi.org/10.1017/CBO9780511791338 [30] G. Sarkis, I. Tal, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Flexible and low-complexity encoding and decoding of systematic polar codes,” 2015. [Online]. Available: http://arxiv.org/abs/1507.03614 [31] C. Schürch, “A partial order for the synthesized channels of a polar code,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2016. 156 Bibliography [32] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, Jul. 1948. [33] ——, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, Oct. 1948. [34] N. Stolte, “Rekursive Codes mit der Plotkin-Konstruktion und ihre Decodierung,” Ph.D. dissertation, TU Darmstadt, 2002. [35] I. Tal and A. Vardy, “List decoding of polar codes,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Aug. 2011, pp. 1–5. [36] ——, “How to construct polar codes,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6562–6582, 2013. [37] R. Tolimieri, M. An, and C. Lu, Algorithms for Discrete Fourier Transforms and Convolution, 2nd ed. New York: Springer, 1997. [38] P. Trifonov, “Efficient design and decoding of polar codes,” IEEE Trans. Commun., vol. 60, no. 11, pp. 3221–3227, 2012. Series in Signal and Information Processing edited by Hans-Andrea Loeliger Vol. 1: Hanspeter Schmid, Single-Amplifier Biquadratic MOSFET-C Filters. ISBN 3-89649-616-6 Vol. 2: Felix Lustenberger, On the Design of Analog VLSI Iterative Decoders. ISBN 3-89649-622-0 Vol. 3: Peter Theodor Wellig, Zerlegung von Langzeit-Elektromyogrammen zur Prävention von arbeitsbedingten Muskelschäden. ISBN 3-89649-623-9 Vol. 4: Thomas P. von Hoff, On the Convergence of Blind Source Separation and Deconvolution. ISBN 3-89649-624-7 Vol. 5: Markus Erne, Signal Adaptive Audio Coding using Wavelets and Rate Optimization. ISBN 3-89649-625-5 Vol. 6: Marcel Joho, A Systematic Approach to Adaptive Algorithms for Multichannel System Identification, Inverse Modeling, and Blind Identification. ISBN 3-89649-632-8 Vol. 7: Heinz Mathis, Nonlinear Functions for Blind Separation and Equalization. ISBN 3-89649-728-6 Vol. 8: Daniel Lippuner, Model-Based Step-Size Control for Adaptive Filters. ISBN 3-89649-755-3 Vol. 9: Ralf Kretzschmar, A Survey of Neural Network Classifiers for Local Wind Prediction. ISBN 3-89649-798-7 Vol. 10: Dieter M. Arnold, Computing Information Rates of Finite State Models with Application to Magnetic Recording. ISBN 3-89649-852-5 Vol. 11: Pascal O. Vontobel, Algebraic Coding for Iterative Decoding. ISBN 3-89649-865-7 Vol. 12: Qun Gao, Fingerprint Verification using Cellular Neural Networks. ISBN 3-89649-894-0 Vol. 13: Patrick P. Merkli, Message-Passing Algorithms and Analog Electronic Circuits. ISBN 3-89649-987-4 Vol. 14: Markus Hofbauer, Optimal Linear Separation and Deconvolution of Acoustical Convolutive Mixtures. ISBN 3-89649-996-3 Vol. 15: Sascha Korl, A Factor Graph Approach to Signal Modelling, System Identification and Filtering. ISBN 3-86628-032-7 Vol. 16: Matthias Frey, On Analog Decoders and Digitally Corrected Converters. ISBN 3-86628-074-2 Vol: 17: Justin Dauwels, On Graphical Models for Communications and Machine Learning: Algorithms, Bounds, and Analog Implementation. ISBN 3-86628-080-7 Vol. 18: Volker Maximillian Koch, A Factor Graph Approach to ModelBased Signal Separation. ISBN 3-86628-140-4 Vol. 19: Junli Hu, On Gaussian Approximations in Message Passing Algorithms with Application to Equalization. ISBN 3-86628-212-5 Vol. 20: Maja Ostojic, Multitree Search Decoding of Linear Codes. ISBN 3-86628-363-6 Vol. 21: Murti V.R.S. Devarakonda, Joint Matched Filtering, Decoding, and Timing Synchronization. ISBN 3-86628-417-9 Vol. 22: Lukas Bolliger, Digital Estimation of Continuous-Time Signals Using Factor Graphs. ISBN 3-86628-432-2 Vol. 23: Christoph Reller, State-Space Methods in Statistical Signal Processing: New Ideas and Applications. ISBN 3-86628-447-0 Vol. 24: Jonas Biveroni, On A/D Converters with Low-Precision Analog Circuits and Digital Post-Correction. ISBN 3-86628-452-7 Vol. 25: Georg Wilckens, A New Perspective on Analog-to-Digital Conversion of Continuous-Time Signals. ISBN 3-86628-469-1 Vol. 26: Jiun-Hung Yu, A Partial-Inverse Approach to Decoding ReedSolomon Codes and Polynomial Remainder Codes. ISBN 3-86628-527-2 Vol. 27: Lukas Bruderer, Input Estimation and Dynamical System Identification: New Algorithms and Results. ISBN 3-86628-533-7 Vol. 28: Sarah Neff, A New Approach to Information Processing with Filters and Pulses. ISBN 3-86628-575-2 Hartung-Gorre Verlag Konstanz http://www.hartung-gorre.de
© Copyright 2025 ExpyDoc