
Diss. ETH No. 23608
On Successive
Cancellation Decoding
of Polar Codes and
Related Codes
A dissertation submitted to
ETH Zurich
for the degree of
Doctor of Sciences
presented by
Christian Schürch
MSc ETH
born on December 23, 1985
citizen of Heimiswil (Bern)
accepted on the recommendation of
Prof. Dr. Hans-Andrea Loeliger, examiner
Prof. Dr. Henry D. Pfister, co-examiner
2016
Series in Signal and Information Processing
Vol. 29
Editor: Hans-Andrea Loeliger
Bibliographic Information published by Die Deutsche Nationalbibliothek
Die Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data is available on the Internet at http://dnb.d-nb.de.
Copyright © 2017 by Christian Schürch
First Edition 2017
HARTUNG-GORRE VERLAG KONSTANZ
ISSN 1616-671X
ISBN-10: 3-86628-580-9
ISBN-13: 978-3-86628-580-4
Aerodynamically, the bumble bee shouldn’t be able to fly, but
the bumble bee doesn’t know it so it goes on flying anyway.
Mary Kay Ash
Acknowledgments
Special thanks go to my supervisor, Andi Loeliger. I very much appreciate the freedom he gave me, even though it was not always easy to
handle. His skills in motivating people are incredible. His ability to
teach ideas in a clear and concise way is even more incredible.
I would like to thank Henry Pfister for taking such an interest in my
work and for being my co-examiner. It was a pleasure to meet him and
to get his advice on how to make the most out of my visit to California.
Many thanks to all the colleagues at the ISI lab for providing a relaxing and fun atmosphere. I very much enjoyed playing tabletop soccer
(a.k.a. baby foot), playing AoFE, participating in the SOLA or doing
other lab activities. I also made some good friends at the lab for which
I am deeply grateful. Thank you, Rita, for your kindness and help.
I am indebted to Fabienne, who supported and motivated me when
I was frustrated or stressed. Without her I am not sure if I would have
finished this thesis.
Finally, I would like to thank my family for their continuous support
and for their love. I am so happy to have you in my life.
Although doing my PhD took a lot of hard work and effort, I am very
glad to have gone through this experience.
Abstract
Polar codes with a successive cancellation decoder were historically the
first practical coding scheme that was able to reliably transmit digital
data at the highest possible data rate. This thesis gives new insights
on successive cancellation decoding of polar codes and related codes. It
consists of three parts.
The first part presents a partial order for the synthesized channels
of a polar code that is independent of the transmitting channel. This
partial order is combined with a partial order from the literature. A
minimal representation of this combined partial order is derived that
enables us to practically use the combined partial order for simplifying
the construction of polar codes.
The second part investigates successive cancellation decoding of codes
based on the Discrete Fourier Transform over finite fields. Successive
cancellation decoding in the natural symbol order is impractical for reasonable blocklengths. However, based on the Cooley-Tukey Fast Fourier
Transform, we found that successive cancellation decoding in the digit-reversed symbol order has significantly lower complexity but comes along
with slower polarization. However, for most codes, even this reduced
complexity is still impractical. We found that for the reasonably high
blocklength of 256, successive cancellation decoding in the digit-reversed
order is feasible. Simulations showed that the synthesized channels polarize. Moreover, for that blocklength, we found a way to choose the
frozen symbols of the Discrete Fourier Transform code such that its performance under successive cancellation list decoding is very good. However, the decoding complexity is still much higher than for other coding
schemes with comparable performance.
The third part introduces the fat-tree decoder, which is applicable to
polar codes and related codes and which is a generalization of the successive cancellation decoder. The fat-tree decoder does message passing
on a cycle-free factor graph for the polar code and thus computes exact posterior probabilities. The fat-tree decoder is able to decode the
information symbols of a polar code in any desired order. Moreover, it
has the ability to take into account any frozen symbols even from the
future symbols. However, to keep the complexity manageable, only a
limited number of frozen symbols can be considered and the symbol decoding order is not completely arbitrary. We present two specific fat-tree
decoder strategies that perform substantially better than the successive
cancellation decoder and have manageable complexity. Moreover, these
strategies approach block maximum a posteriori performance for polar
codes of many parameters. The high flexibility of the fat-tree decoder
suggests that even better tradeoffs between performance and decoding
complexity exist.
Keywords: successive cancellation; polar codes; factor graphs; partial order; code construction; Discrete Fourier Transform; Fast Fourier
Transform; cycle-free; maximum likelihood decoding; maximum a posteriori decoding
Kurzfassung
Polar codes with a successive cancellation decoder were, historically, the first practical coding scheme that could reliably transmit digital data at the highest possible rate. This dissertation gives new insights into successive cancellation decoding of polar and related codes. It consists of three parts.
The first part presents a partial order for the synthesized channels of a polar code that is independent of the transmitting channel. This partial order is combined with another partial order from the literature. A minimal representation of this combined partial order is derived, which enables us to use the combined partial order in practice to simplify the construction of a polar code.
The second part investigates successive cancellation decoding of codes based on the Discrete Fourier Transform over finite fields. Successive cancellation decoding in the natural symbol order is impractical for reasonable blocklengths. Based on the Cooley-Tukey Fast Fourier Transform, we show that successive cancellation decoding in the digit-reversed symbol order has significantly lower complexity but also entails slower polarization. Nevertheless, for many codes even this reduced complexity is still impractical. However, successive cancellation decoding in the digit-reversed symbol order is practical for the reasonably high blocklength of 256. Simulations have shown that for this blocklength the synthesized channels polarize. We have furthermore found a way to choose the frozen symbols of the Discrete Fourier Transform code such that its block error probability under successive cancellation list decoding is very good. However, the decoding complexity is considerably higher than for other codes with comparable block error probability.
The third part introduces the fat-tree decoder, which is applicable to polar codes as well as to related codes and which generalizes the successive cancellation decoder. The fat-tree decoder operates on a cycle-free factor graph and therefore computes exact a posteriori probabilities. The fat-tree decoder is able to decode the information symbols of a polar code in any desired order. It is moreover able to take arbitrary frozen symbols into account. However, to keep the complexity manageable, only a limited number of frozen symbols can be incorporated and the decoding order is not arbitrary. Two concrete fat-tree decoding strategies are presented that achieve a significantly lower block error probability than the successive cancellation decoder and that have practical complexity. Both strategies reach the block error probability of a block maximum a posteriori decoder for many parameters. The high flexibility of the fat-tree decoder suggests that strategies with a better trade-off between complexity and low block error probability exist.
Keywords: successive cancellation; polar codes; factor graphs; partial order; code construction; Discrete Fourier Transform; Fast Fourier Transform; cycle-free; maximum-likelihood decoding; maximum a posteriori decoding
Contents
Abstract
Kurzfassung
1 Introduction
  1.1 Polar Codes
  1.2 Contributions
  1.3 Notation
  1.4 Important Concepts
    1.4.1 Maximum a Posteriori
    1.4.2 Channels
    1.4.3 Factor Graphs
    1.4.4 Permutations
2 Polar Codes
  2.1 Gn-Codes
  2.2 The Probability Setup
  2.3 Genie-Aided Successive Cancellation Decoding
    2.3.1 Definition
    2.3.2 The Tree Factor Graph for the Synthesized Channels
    2.3.3 Max-Product on the Tree Factor Graph
  2.4 Polarization
    2.4.1 Definition of Polar Codes
    2.4.2 Construction of Polar Codes
  2.5 Successive Cancellation Decoder
    2.5.1 Definition
    2.5.2 Complexity
    2.5.3 Successive Cancellation and Block MAP
  2.6 Characteristics
  2.7 Performance Improvements
    2.7.1 Improve the Decoder
    2.7.2 Improve the Code
3 A Partial Order for the Synthesized Channels of SCD
  3.1 Preliminaries
  3.2 A Sufficient Condition for Degradation
  3.3 A Partial Order
  3.4 Another Partial Order from the Literature
  3.5 The Combined Partial Order
  3.6 Code Construction with the Combined Partial Order
  3.7 Generalizations
4 Successive Cancellation Decoding of DFT-Codes
  4.1 Discrete Fourier Transform Codes
  4.2 Digit Representation
  4.3 The Cooley-Tukey Fast Fourier Transform
  4.4 Successive Cancellation
    4.4.1 Partial Distances
    4.4.2 Natural Symbol Order
    4.4.3 Digit-Reversed Symbol Order
  4.5 An Example for the Blocklength 256
    4.5.1 Polar DFT-Code
    4.5.2 Improved Polar DFT-Code
    4.5.3 Doubly Improved Polar DFT-Code
  4.6 Conclusion and Outlook
5 A Generalized SCD with Look-Ahead
  5.1 Abbreviations and Symbols
  5.2 Cycle-Free Factor Graphs for the Polar Transform
  5.3 Fat-Tree Decoder
    5.3.1 States
    5.3.2 Reduced Graph
    5.3.3 Message Passing
    5.3.4 Decoding Process
  5.4 Examples
    5.4.1 Successive Cancellation
    5.4.2 SCD with Different Bit Order
    5.4.3 Block Maximum a Posteriori
    5.4.4 The Benefits of Future Bits: A Numerical Example
  5.5 Ideas for Good Strategies
    5.5.1 "Decode" Frozen Bits First
    5.5.2 Taking Decisions: Benefits and Drawbacks
    5.5.3 Choose the Bits to Decode with Close Indexes
  5.6 Implemented Strategies
    5.6.1 Scores
    5.6.2 Most Known Bits of Lowest Index Strategy
    5.6.3 Most Known Bits Strategy
  5.7 Simulation Results
    5.7.1 MKBLIS and MKBS
    5.7.2 Sum-Product and Max-Product
    5.7.3 Varying the Rate
    5.7.4 Varying the Blocklength
    5.7.5 Experiments
    5.7.6 Comparison to the SC List Decoder
  5.8 Conclusion and Outlook
A Proofs
  A.1 Proof of Proposition 1.2
  A.2 Proof of Proposition 1.5
  A.3 Proof of Proposition 3.1
  A.4 Proof of Proposition 3.2
  A.5 Proof of Proposition 3.6
  A.6 Proof of Proposition 4.3
  A.7 Proof of Proposition 5.1
  A.8 Proof of Proposition 5.2
  A.9 Proof of Proposition 5.3
Bibliography
List of Figures
1.1 Digital data transmission
1.2 Factor graph examples
1.3 Message passing
1.4 Some factor graph equivalences
2.1 The factor graph notation of G1
2.2 The recursive construction of Gn
2.3 The decomposition of G3
2.4 A factor graph for the probability setup
2.5 Useful factor graph equivalences for G1
2.6 A factor graph for the channel W_n^(i)
2.7 Intermediate factor graph for W_n^(i)
2.8 The tree factor graph of W_3^(4)
3.1 The Hasse-diagram of the PO ⪯1 for N = 16
3.2 The Hasse-diagram of the PO ⪯1 for N = 64, wH(i) = 3
3.3 The Hasse-diagram of the PO ⪯c for N = 16
3.4 Bounds on the union bound by the Maximin strategy
3.5 A generalization of G2
3.6 Two tree factor graphs for the generalized G2
4.1 A factor graph for the channel W̃_N^(i)
4.2 Decomposition of the digit-reversed DFT transform
4.3 Intermediate factor graph for W̃_N^(i)
4.4 The tree factor graph for W̃_12^(5)
4.5 The mutual information of the channels W̃_N^(i) for N = 256
4.6 The digit-reversed mutual information of the channels W̃_N^(i)
4.7 Block error probability of the polar DFT-code
4.8 The polar DFT-code compared to polar codes
4.9 Block error probability of the improved polar DFT-code
4.10 Error probability of the doubly improved polar DFT-code
5.1 A factor graph of Gn with both copies of Gn−1 superposed
5.2 Two cycle-free factor graphs of G8
5.3 The fat-tree factor graph for G8 with states
5.4 Message passing through an i-node of G1
5.5 The FV of the decoding factor graph of Figure 5.3
5.6 A factor graph equivalent to the factor graph in Figure 5.5
5.7 Allowed state transitions
5.8 The FV of the RG at time step 1 in Table 5.3
5.9 The FV of the RG at time step 2 in Table 5.3
5.10 The FV of the RG at time step 3 in Table 5.3
5.11 The FV of the RG at time step 4 in Table 5.3
5.12 The FV of the RG at time step 5 in Table 5.3
5.13 The FV of the RG at time step 1 in Table 5.4
5.14 The FV of the RG at time step 2 in Table 5.4
5.15 The FV of the RG at time step 3 in Table 5.4
5.16 The FV of the RG at time step 4 in Table 5.4
5.17 The FV of the RG at time step 5 in Table 5.4
5.18 The FV of the RG at time step 1 in Table 5.5
5.19 The FV of the RG at time step 2 in Table 5.5
5.20 The FV of the RG at time step 3 in Table 5.5
5.21 An example showing the benefit of taking reliable decisions
5.22 The FV of the FTFG at time step 0 in Table 5.7
5.23 The FV of the FTFG as in Figure 5.22 but with S1 = D
5.24 The FV of the FTFG as in Figure 5.23 but with S2 = D
5.25 Simulation results: MKBLIS and α = 1.0
5.26 Simulation results: MKBS and α = 1.0
5.27 Simulation results: MKBLIS and α = 0.7
5.28 Simulation results: MKBS and α = 0.7
5.29 Simulation results: MKBS in dependence of α
5.30 Simulation results: Max-product message passing
5.31 Simulation results: Low rate
5.32 Simulation results: High rate
5.33 Simulation results: Low blocklength
5.34 Simulation results: High blocklength
5.35 Simulation results: First modified MKBS
5.36 Simulation results: Second modified MKBS
5.37 Simulation results: Successive Cancellation List Decoder
5.38 Simulation results: Comparison to SCLD
5.39 Simulation results: Comparison to SCLD, high blocklength
A.1 An example for termination edges
Chapter 1
Introduction
This thesis deals with practical schemes for the reliable transmission of
digital data over a noisy transmission link as depicted in Figure 1.1. A
transmitter wants to convey a message U , let us assume a sequence of
K symbols, to a receiver. The transmitter encodes the K symbols into
the sequence X containing N symbols and transmits them over N uses
of a noisy transmission link. The noisy transmission link is referred to
as the transmitting channel. It is a statistical model of what happens
to the sent signal before it is received. The receiver observes the output
sequence Y of the transmitting channel and generates an estimate Û of
the message U , which is called decoding. The goal is to reliably transmit
as many symbols per channel use as possible with practical encoding and
decoding operations.
The positive integer N is called the blocklength. The rate is defined
as R ≜ K/N. The code is defined as the set of all sequences X of length
N that the encoder is able to produce.
Shannon showed in his seminal papers [32], [33] that there exists
a fundamental limit on the number of symbols per channel use that
can be reliably sent over a transmitting channel. He called this limit
the capacity of the transmitting channel.

Figure 1.1: A simplified representation of digital data transmission: the message U is encoded into X, transmitted over the channel, received as Y, and decoded into the estimate Û.

However, he did not give a
practical scheme that achieves capacity. Since then, a huge number of
ideas have been put forward towards this goal. Some ideas led to practical
schemes that empirically get very close to capacity, such as turbo
or low-density parity-check codes [29]. However, it took a long time until
the first practical scheme was invented that provably achieved capacity
on the important subclass of binary-input discrete memoryless channels:
polar codes under successive cancellation decoding (SCD) [1], [10].
In the meantime, other capacity-achieving codes with a practical encoder and decoder have been found: in [16] it was shown that spatially
coupled low-density parity-check codes also achieve the capacity of
symmetric binary-input discrete memoryless channels. Moreover, sparse
superposition codes [12] achieve capacity on the additive white Gaussian
noise channel. However, these codes are not considered in this thesis.
1.1 Polar Codes
Polar codes were introduced by Arıkan [1]. They were the first provably
capacity-achieving codes for binary-input discrete memoryless channels
with a low-complexity encoder and decoder. Although the rule to construct a polar code was already introduced and investigated in [34], the
proof that polar codes achieve the symmetric capacity was first given by
Arıkan [1].
Binary polar codes work as follows. Suppose that K out of the N
bits u0 , u1 , . . . , uN −1 contain the message bits and the remaining N − K
bits are fixed and known to both the transmitter and the receiver. A linear, bijective transform Gn is applied to u0 , u1 , . . . , uN −1 to get the bits
x0 , x1 , . . . , xN −1 , which are then sent over N uses of the transmitting
channel. After receiving the output sequence y0 , y1 , . . . , yN −1 , the SCD
successively decodes the bits u0 , u1 , . . . , uN −1 in the order of ascending
index. When decoding ui the SCD assumes the already decoded bits
u0 , u1 , . . . , ui−1 to be known and the future bits ui+1 , . . . , uN −1 to be
completely unknown. Encoding and SCD of polar codes can be performed with low complexity.
The set of K bits from the N bits u0 , u1 , . . . , uN −1 that are used
to transmit the message is determined as follows. Arıkan defined for
every ui a synthesized binary-input channel that corresponds to SCD of
ui . He showed that these N synthesized channels converge either to a
noiseless or a pure noise channel as the blocklength N goes to infinity.
This phenomenon was called polarization. The bits corresponding to the
K best synthesized channels are used to transmit the message and if K
is not larger than the number of almost noiseless channels they have a
small error probability.
Although having a practical scheme that achieves capacity was a
milestone in coding theory, being capacity-achieving is a property that
holds asymptotically as the blocklength N goes to infinity. In practice,
the blocklength N is a finite quantity and it is important that the block
error probability is low at practical blocklengths. Polar codes under SCD
are not good at practical blocklengths [35]. As the SCD is not an optimal
decoder, it is worthwhile to search for practical decoders that improve
the performance of polar codes. The successive cancellation list decoder
(SCLD) from [35] is a promising practical decoder that substantially
reduces the block error probability of polar codes. The SCLD empirically approaches the performance of optimal decoding of polar codes
for many parameters. However, even optimal decoding of polar codes is
still not very good. Thus, to make polar codes competitive, polar codes
themselves need to be improved as well.
Several approaches to improve polar codes have been considered so
far. A well-performing but still simple approach was given in [35], where
a high-rate outer cyclic redundancy check code was added with the consequence that the performance under SCLD was substantially improved.
With these modifications, polar codes are competitive.
1.2 Contributions
The contributions of this thesis can be divided into the following three
parts.
A Partial Order for the Synthesized Channels (Chapter 3)
Code construction of polar codes consists of finding the K best among
all N synthesized channels of a polar code, which is dependent on the
transmitting channel. This is not a trivial problem and hence it is desirable to simplify it. We found a partial order on the set of N synthesized
channels that is independent of the transmitting channel and that orders
the synthesized channels according to their quality.
This partial order is combined with a partial order from the literature. The combined partial order is quite powerful. For example for the
blocklength N = 2^20, there exists a synthesized channel such that, according to the combined partial order, 23.06% of all synthesized channels
are better and 23.07% are worse than this synthesized channel.
A minimal representation of the combined partial order is derived
that enables us to practically use the partial order for simplifying code
construction of polar codes. An example shows that a code construction
algorithm based on the combined partial order can significantly reduce
the number of synthesized channels for which the quality needs to be
determined.
SCD of Discrete Fourier Transform Codes (Chapter 4)
Codes based on the Discrete Fourier Transform are an important class
of codes, including the widely-used Reed-Solomon codes. However, SCD
of Discrete Fourier Transform codes in the natural symbol order is prohibitively complex for all except the shortest blocklengths. Based on the
Cooley-Tukey Fast Fourier Transform, we found a symbol order, called
the digit-reversed order, such that the SCD complexity is substantially
reduced. But this comes along with slower polarization. However, exact
SCD of Discrete Fourier Transform codes in the digit-reversed symbol
order is still only feasible if N can be factorized into small factors.
A specific example with a reasonably high blocklength of N = 256 is
given for which SCD in the digit-reversed symbol order is feasible. As for
standard polar codes, we empirically found that many of the synthesized
channels polarize to almost noiseless or almost useless channels. Simulation results for this blocklength show that under SCLD the constructed
Discrete Fourier Transform code has comparable performance to a polar
code, however with much higher decoding complexity. We substantially
improved the performance under SCLD by altering the rule to pick the
information symbols. Although this resulted in a block error probability
comparable to the well-performing polar code with outer cyclic redundancy check code from [35], the decoding complexity is still much higher.
A Generalized Successive Cancellation Decoder (Chapter 5)
SCD of polar codes is mainly suboptimal because when decoding ui , the
bits among ui+1 , . . . , uN −1 that are known by the receiver are assumed to
be unknown. We found two cycle-free factor graphs for the polar transform Gn . We defined the fat-tree decoder which does message passing
on a reduced version of one of the cycle-free graphs and which generalizes SCD. The fat-tree decoder is not only applicable to polar codes but
also to all codes based on Gn or based on transforms having a similar
structure as Gn . The fat-tree decoder is able to use the values of any
known bits even from the future bits ui+1 , . . . , uN −1 . Moreover, decoding can be performed in any desired bit order, not only in ascending
index order. However, to keep the complexity manageable, only a limited number of fixed bits can be incorporated and the bit decoding order
is not completely arbitrary.
The flexibility of the fat-tree decoder is huge and thus it is a very
interesting problem to find good decoding strategies. We proposed two
decoding strategies with adjustable decoding complexity. For many parameters, both strategies are able to approach optimal decoding of polar
codes with manageable complexity. For other Gn -codes, the block error
probability is significantly lower than for SCD even though the performance of optimal decoding is not always reached with manageable complexity. A comparison to the SCLD shows an advantage for the SCLD.
However, our generalized SCD is very flexible and hence it is promising
to search for better strategies or for codes that are better matched to
the fat-tree decoder.
We think that the presented cycle-free factor graphs are a suitable
and intuitive way of teaching polar codes because the structure of polar
codes stands out clearly.
1.3 Notation
For an integer k, we denote by Zk the set {0, 1, . . . , k − 1} which is
empty if k ≤ 0. The cardinality of a finite set A is denoted by |A|. For
a function f with domain A and for B ⊂ A, we denote by f (B) the set
{f(b) | b ∈ B} and by argmax_{b∈B} f(b) the element b ∈ B that maximizes
f(b), where ties are broken arbitrarily. For a variable a taking values in
the finite set A, we use the notation Σ_a, max_a and argmax_a to mean
that these operators are performed over all a ∈ A.
We denote by ⌈x⌉ the ceiling (i.e. the smallest integer not smaller
than x) of the real number x and by ⌊x⌋ the floor of x (i.e. the largest
integer not larger than x). We use the standard Landau notation O(.)
to denote an asymptotic upper bound on the growth rate of a function.
We use uppercase particularly but not exclusively for matrices and
random variables.
We denote by u_0^{k−1} the row vector (u_0, u_1, …, u_{k−1}) and by u_i^j for
0 ≤ i, j < k the vector (u_i, u_{i+1}, …, u_j), which is empty if i > j. For
A ⊂ Zk , we denote by uA the row vector (ui |i ∈ A). The element at
row i and column j of a matrix A is denoted by [A]i,j , where the indexes
i, j start both at 0. We denote by Im the m × m identity matrix. The
Kronecker product of a k × l matrix A and a m × q matrix B is denoted
by A ⊗ B and defined by

    A ⊗ B ≜ [ [A]_{0,0} B      …   [A]_{0,l−1} B
                  ⋮            ⋱        ⋮
              [A]_{k−1,0} B    …   [A]_{k−1,l−1} B ],   (1.1)

which is a km × lq matrix. For any positive integer n, the Kronecker power A^{⊗n} is defined by

    A^{⊗n} ≜ A ⊗ A ⊗ ⋯ ⊗ A   (n times).   (1.2)
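As a small, hedged illustration of the Kronecker product (1.1) and the Kronecker power (1.2), the following Python sketch uses numpy; the input matrix is only an example (it happens to equal G1 from (2.1) in Chapter 2).

    import numpy as np

    def kron_power(A, n):
        """Kronecker power A^(x)n as defined in (1.2): A (x) A (x) ... (x) A, n times."""
        result = np.eye(1, dtype=A.dtype)   # neutral element of the Kronecker product
        for _ in range(n):
            result = np.kron(result, A)
        return result

    A = np.array([[1, 0],
                  [1, 1]])                  # example 2x2 matrix (equal to G1 in (2.1))
    print(np.kron(A, A))                    # the 4x4 matrix of (1.1) with k = l = m = q = 2
    print(kron_power(A, 3).shape)           # (8, 8)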
We denote by P (A) the probability of an event A. We denote by
p_{X|Y}(x|y) the distribution of the random variable X given Y = y, which
is often abbreviated to p(x|y).
The Hamming weight w_H(i) of a non-negative integer i is defined as
the number of ones in the binary expansion of i. The Hamming weight
w_H(u_0^{k−1}) of a vector u_0^{k−1} with elements in a finite field denotes the
number of elements of u_0^{k−1} that are different from 0.
1.4 Important Concepts
There are some well-known concepts that we use over and over again in
this thesis. We discuss these concepts here by giving a compact summary.
1.4.1 Maximum a Posteriori

Let U_0^{k−1} be k discrete random variables and let Y be a random variable
that is observed. The block maximum a posteriori (block MAP) decision
of U_0^{k−1} based on Y is defined as

    û_0^{k−1}(y) = argmax_{u_0^{k−1}} p(u_0^{k−1} | y).   (1.3)
In general, dropping or adding a variable in the block MAP decision
(1.3) may change the decision. The block MAP decision minimizes the
block error probability
    P(U_0^{k−1} ≠ Û_0^{k−1}(Y)),   (1.4)
independently of how ties are broken in (1.3). The symbol-wise block
MAP decision of U_0^{k−1} based on Y is defined by

    û_i(y) = argmax_{u_i} max_{u_0,…,u_{i−1},u_{i+1},…,u_{k−1}} p(u_0^{k−1} | y)   (1.5)

for all i ∈ Z_k. If there is a unique maximizer in (1.3) then performing
(1.5) for all i ∈ Z_k is equivalent to (1.3) and is often used in practice
instead of (1.3). However, if there are several maximizers in (1.3) then
(1.5) is not equivalent to (1.3) in general. Interestingly, the example
below shows that the rule (1.5) could even produce decisions that do not
maximize p(u_0^{k−1} | y).
Example 1.1. Let U_0, U_1 be binary random variables and let

    p(u_0, u_1 | y) = 1/2 if u_0 = u_1, and 0 otherwise.   (1.6)
Then the rule (1.3) produces either (0, 0) or (1, 1). However, the rule
(1.5) applied to i = 0, 1 could produce all possibilities (0, 0), (1, 0), (0, 1)
or (1, 1). Clearly, the possibilities (u0 , u1 ) = (1, 0) and (u0 , u1 ) = (0, 1)
do not maximize p(u0 , u1 |y).
Thus, we have to be careful in applying (1.5). However, the symbol-wise decisions obtained by Algorithm 1 are block MAP decisions.
Algorithm 1 Successive symbol-wise block MAP
1: for i = 0 to k − 1 do
2:    û_i(y) ← argmax_{u_i} max_{u_{i+1},…,u_{k−1}} p(û_0^{i−1}(y), u_i^{k−1} | y)
3: end for
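The following Python sketch (a toy illustration, not part of the thesis) evaluates the rules (1.3) and (1.5) and Algorithm 1 on the distribution of Example 1.1. Note that Python's max breaks ties deterministically, so the ambiguity of (1.5) discussed above only appears under other tie-breaking rules.

    from itertools import product

    # Conditional distribution p(u0, u1 | y) from Example 1.1 (y is fixed and omitted).
    def p(u):
        return 0.5 if u[0] == u[1] else 0.0

    U = list(product((0, 1), repeat=2))

    # Block MAP rule (1.3): one maximizer of p(u0, u1 | y).
    block_map = max(U, key=p)

    # Symbol-wise block MAP rule (1.5), applied independently for each i.
    def symbolwise(i):
        return max((0, 1), key=lambda ui: max(p(u) for u in U if u[i] == ui))

    # Algorithm 1: successive symbol-wise block MAP; earlier decisions are reused.
    def successive():
        decisions = []
        for i in range(2):
            decisions.append(max((0, 1), key=lambda ui:
                max(p(tuple(decisions) + (ui,) + rest)
                    for rest in product((0, 1), repeat=1 - i))))
        return tuple(decisions)

    # With Python's first-maximum tie-breaking all three rules return (0, 0);
    # under another tie-break, (1.5) could also return (0, 1) or (1, 0).
    print("block MAP (1.3):          ", block_map)
    print("symbol-wise (1.5):        ", (symbolwise(0), symbolwise(1)))
    print("Algorithm 1 (successive): ", successive())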
It is often not feasible to compute the block MAP decisions. However, it is very useful to know the block error probability of a block
MAP decoder as a benchmark. Suppose a practical decoder produces
the decisions ũ_0^{k−1}(y). With the procedure that was used in [35], we can
obtain an empirical lower bound on the block error probability of the
block MAP decoder:

1. Generate m independent samples (u_0^{k−1}(i), y_i), i ∈ Z_m, from the distribution p_{U_0^{k−1},Y}(u_0^{k−1}, y).

2. Compute the decisions of the practical decoder ũ_0^{k−1}(y_i) for every i ∈ Z_m.

3. The number of samples where

       p_{U_0^{k−1}|Y}(ũ_0^{k−1}(y_i) | y_i) > p_{U_0^{k−1}|Y}(u_0^{k−1}(i) | y_i)   (1.7)

   divided by m is the empirical lower bound.
Note that (1.7) implies that there is at least one u_0^{k−1} which has a higher
probability than u_0^{k−1}(i) and hence the block MAP decoder in (1.3)
would not have decided for the correct values u_0^{k−1}(i).
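As a concrete, hedged illustration of this procedure, the sketch below runs the three steps on an invented toy model (a single uniform bit over a Gaussian channel, decoded with a deliberately suboptimal threshold); the model and decoder are ours, not the thesis'.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 1.0
    m = 100_000

    def posterior(u, y):
        """p(U = u | Y = y) for a uniform bit U and Y = U + Gaussian noise."""
        lik = lambda v: np.exp(-(y - v) ** 2 / (2 * sigma ** 2))
        return lik(u) / (lik(0) + lik(1))

    def practical_decoder(y):
        """A deliberately suboptimal decoder (the MAP threshold would be 0.5)."""
        return 1 if y > 0.8 else 0

    count = 0
    for _ in range(m):
        u = rng.integers(0, 2)                        # step 1: sample (u, y)
        y = u + rng.normal(0.0, sigma)
        u_tilde = practical_decoder(y)                # step 2: decode
        if posterior(u_tilde, y) > posterior(u, y):   # step 3: condition (1.7)
            count += 1

    print("empirical lower bound on the block MAP error probability:", count / m)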
The MAP decision of U_i based on Y is defined as

    û_i(y) = argmax_{u_i} Σ_{u_0,…,u_{i−1},u_{i+1},…,u_{k−1}} p(u_0^{k−1} | y),   (1.8)

which is equal to

    û_i(y) = argmax_{u_i} p(u_i | y).   (1.9)
The MAP decision minimizes the error probability of U_i,

    P(U_i ≠ Û_i(Y)).   (1.10)

1.4.2 Channels
A channel is usually defined as an input alphabet X , an output alphabet
Y and transition probabilities p Y |X (y|x), x ∈ X , y ∈ Y [7, pp. 183-184].
However, this thesis is about polar codes, where channels naturally arise
as pairs of random variables, the input X and the output Y . Therefore,
we use this unusual definition of the term channel. A channel with input
X and output Y is denoted by X → Y . Note that with this definition,
a prior on the input pX (x) is implicitly included in a channel as in the
following example.
Example 1.2 (Uniform and Binary-Input Additive White Gaussian
Noise Channel). Let X be uniformly distributed over Z_2. Let Z be
independent of X and Gaussian distributed with mean 0 and variance
σ². The uniform and binary-input additive white Gaussian noise channel with noise variance σ², denoted as UBAWGNC(σ²), is the channel
X → Y with Y ≜ µ(X) + Z, where µ is the mapping

    µ(x) ≜ 1 if x = 0, and −1 if x = 1.   (1.11)

The signal-to-noise ratio (SNR) of the UBAWGNC(σ²) is defined as the
ratio between the variance of µ(X) and the variance of the noise Z and
hence equals 1/σ².
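A brief hedged Python sketch of Example 1.2: it samples the UBAWGNC(σ²), forms the MAP decision of X based on Y (which, for a uniform prior, reduces to the sign of y), and estimates the resulting error probability; the parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma2 = 0.5                        # illustrative noise variance, SNR = 1/sigma2
    n_samples = 200_000

    x = rng.integers(0, 2, n_samples)                       # X uniform over Z_2
    mu = 1 - 2 * x                                          # mapping (1.11): 0 -> +1, 1 -> -1
    y = mu + rng.normal(0.0, np.sqrt(sigma2), n_samples)

    # With a uniform prior, the MAP decision of X based on Y is 0 if y > 0 and 1 otherwise
    # (equivalently, the sign of the LLR 2*y/sigma2).
    x_hat = (y < 0).astype(int)

    print("SNR:", 1 / sigma2)
    print("estimated error probability of the MAP decision:", np.mean(x_hat != x))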
Definition 1.1 (MAP Decision of a Channel). The MAP decision of
the channel X → Y is the MAP decision of X based on Y .
Let I(X; Y ) and H(X |Y ) denote the mutual information and the
conditional entropy, respectively, as defined in [7, Ch. 2].
Definition 1.2 (Channel Parameters). The mutual information of the
channel W ≜ X → Y is defined by

    I(W) ≜ I(X; Y),   (1.12)

the entropy of W by

    H(W) ≜ H(X | Y),   (1.13)

and the error probability of W by

    P_e(W) ≜ P(X ≠ X̂(Y)),   (1.14)

where X̂(Y) is the MAP decision of W.
In the following, our notation assumes that the output alphabet of
a channel is discrete. However, the case where it is continuous can be
handled analogously.
Definition 1.3 (Stochastic Degradation). Let W_1 and W_2 be the channels X_1 → Y_1 and X_2 → Y_2, respectively, and let both channels share the
same input alphabet. We say W_1 is stochastically degraded with respect
to W_2, denoted as W_1 ⪯ W_2, if probability distributions p(y_1 | y_2) exist
such that

    p_{X_1,Y_1}(x, y_1) = Σ_{y_2} p(y_1 | y_2) p_{X_2,Y_2}(x, y_2).   (1.15)

If we sum over y_1 on both sides of (1.15), we see that a necessary
condition for W_1 ⪯ W_2 is that both inputs X_1, X_2 have the same distribution,

    p_{X_1}(x) = p_{X_2}(x).   (1.16)
Example 1.3 (Omit to Degrade). Let W be the channel X → (Y_1, Y_2)
and W′ be the channel X → Y_1. Then W′ ⪯ W, which can be seen by
using Definition 1.3 with probability distributions

    p(y_1′ | y_1, y_2) = 1 if y_1′ = y_1, and 0 otherwise.   (1.17)

Thus, a channel gets stochastically degraded if parts of the output are
omitted.
Proposition 1.1 (Degradation is Transitive). Let W_1, W_2, W_3 be three
channels. If W_1 ⪯ W_2 and W_2 ⪯ W_3 then W_1 ⪯ W_3.

Proof. Let W_i be the channel X_i → Y_i for i = 1, 2, 3. According to the
assumptions, there are probability distributions p(y_1 | y_2) and p′(y_2 | y_3)
such that

    p_{X_1,Y_1}(x, y_1) = Σ_{y_2} p(y_1 | y_2) p_{X_2,Y_2}(x, y_2),   (1.18)
    p_{X_2,Y_2}(x, y_2) = Σ_{y_3} p′(y_2 | y_3) p_{X_3,Y_3}(x, y_3).   (1.19)

We combine equations (1.18) and (1.19) to get

    p_{X_1,Y_1}(x, y_1) = Σ_{y_2} p(y_1 | y_2) Σ_{y_3} p′(y_2 | y_3) p_{X_3,Y_3}(x, y_3)   (1.20)
                        = Σ_{y_3} ( Σ_{y_2} p(y_1 | y_2) p′(y_2 | y_3) ) p_{X_3,Y_3}(x, y_3),   (1.21)

which proves W_1 ⪯ W_3 with distributions

    p″(y_1 | y_3) ≜ Σ_{y_2} p(y_1 | y_2) p′(y_2 | y_3).   (1.22)
The following proposition is proved in the Appendix.
Proposition 1.2 (Degradation and Channel Parameters). If W_1 ⪯ W_2
then

    I(W_1) ≤ I(W_2),   (1.23)
    H(W_1) ≥ H(W_2),   (1.24)
    P_e(W_1) ≥ P_e(W_2).   (1.25)
Figure 1.2: (a) The factor graph for (1.26). (b) The factor graph for (1.27).
Proposition 1.2 shows that stochastic degradation can be reasonably
used to order channels by their quality. However, note that in general
no stochastic degradation order between two channels needs to exist.
1.4.3 Factor Graphs

An introduction to factor graphs can be found in [19]. We use Forney-style factor graphs [9] with some minor changes. A factor graph is a
graphical representation of a factorization of a function. Since there is
no unique factorization of a function, there is no unique factor graph of
a function. For example, suppose f (u0 , u1 , u2 , u3 ) can be factorized in
the following two ways
    f(u_0, u_1, u_2, u_3) = f(u_0, u_1, u_2, u_3),   (1.26)
    f(u_0, u_1, u_2, u_3) = g_0(u_0) g_1(u_0, u_1, u_3) g_2(u_1, u_2) g_3(u_2, u_3).   (1.27)
The factor graphs for these two factorizations are shown in Figure 1.2.
Every edge in a factor graph corresponds to a variable whereas every
node (rectangular box) corresponds to a factor in the factorization. An
edge is connected to a node if and only if the factor corresponding to
that node depends on the variable corresponding to that edge.
In this thesis, all functions take values in the real numbers and scale
factors are not important. Hence, we say that a factor graph represents
the function f(u_0^{k−1}) even if it is a factorization of the function c·f(u_0^{k−1}),
where c is a positive real number independent of u_0^{k−1}. Moreover,
almost all variables are discrete and thus our notation assumes discrete
alphabets.
Often factor graphs represent the factorization of a probability distribution. Then, factor graphs are a tool to understand and derive algorithms that efficiently solve many statistical inference problems. Such
Figure 1.3: Auxiliary visualization for SPMP and MPMP: a generic node g with edges Z_0, …, Z_{m−1} and U.
algorithms are often based on message passing. In a message passing
algorithm, a forward message →μ and a backward message ←μ are assigned
to every edge in the factor graph, which are functions of the variable
of that edge. Sum-product message passing (SPMP) and max-product
message passing (MPMP) are the two most prominent message passing
algorithms and are defined below.
Sum-product (See Figure 1.3) The message along the edge U in the
direction of the arrow is the function

    →μ_U(u) ≜ Σ_{z_0} ⋯ Σ_{z_{m−1}} g(u, z_0, …, z_{m−1}) →μ_{Z_0}(z_0) ⋯ →μ_{Z_{m−1}}(z_{m−1}),   (1.28)

where →μ_{Z_i}(z_i) is the message along the edge Z_i in the direction of the
arrow. If the edge U is an open edge then →μ_U(u) ≜ 1 for all u.

Max-product (See Figure 1.3) The message along the edge U in the
direction of the arrow is the function

    →μ_U(u) ≜ max_{z_0} ⋯ max_{z_{m−1}} g(u, z_0, …, z_{m−1}) →μ_{Z_0}(z_0) ⋯ →μ_{Z_{m−1}}(z_{m−1}),   (1.29)

where →μ_{Z_i}(z_i) is the message along the edge Z_i in the direction of the
arrow. If the edge U is an open edge then →μ_U(u) ≜ 1 for all u.
Suppose we have a cycle-free factor graph that represents the probability distribution on U_0^{k−1} given by p_{U_0^{k−1}|Y}(u_0^{k−1} | y). If we apply
SPMP to every edge in that factor graph in both directions we get that

    →μ_{U_i}(u_i) ←μ_{U_i}(u_i) ∝ p_{U_i|Y}(u_i | y),   (1.30)

where ∝ means equality up to a positive scale factor that is independent
of u_i (but possibly dependent on y). Therefore, the MAP decision of U_i
given Y can be obtained by calculating

    û_i = argmax_{u_i} →μ_{U_i}(u_i) ←μ_{U_i}(u_i).   (1.31)
Similarly, if we apply MPMP we obtain

    →μ_{U_i}(u_i) ←μ_{U_i}(u_i) ∝ max_{u_0,…,u_{i−1},u_{i+1},…,u_{k−1}} p_{U_0^{k−1}|Y}(u_0^{k−1} | y),   (1.32)

and thus symbol-wise block MAP (1.5) can also be obtained by (1.31).
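The single-node updates (1.28) and (1.29) can be written down literally for variables over small finite alphabets. The following hedged Python sketch is such a brute-force implementation of one outgoing message; it is not the efficient message scheduling used later for the tree factor graphs.

    from itertools import product

    def node_message(g, alphabets, incoming, mode="sum"):
        """Outgoing message along edge U of a node with factor g(u, z_0, ..., z_{m-1}).

        g        : function of (u, z_0, ..., z_{m-1})
        alphabets: alphabets[0] for u, alphabets[1:] for the z_i
        incoming : incoming[i] maps z_i to a nonnegative number
                   (an open edge corresponds to the all-ones message)
        mode     : "sum" implements (1.28), "max" implements (1.29)
        """
        combine = sum if mode == "sum" else max
        msg = {}
        for u in alphabets[0]:
            terms = (g(u, *z) * prod_msgs(incoming, z)
                     for z in product(*alphabets[1:]))
            msg[u] = combine(terms)
        return msg

    def prod_msgs(incoming, z):
        p = 1.0
        for mu, zi in zip(incoming, z):
            p *= mu[zi]
        return p

    # Example: a binary parity-check node g(u, z0, z1) = 1 iff u = z0 xor z1, cf. (1.35).
    g = lambda u, z0, z1: 1.0 if u == (z0 ^ z1) else 0.0
    mu_z0 = {0: 0.9, 1: 0.1}
    mu_z1 = {0: 0.6, 1: 0.4}
    print(node_message(g, [(0, 1), (0, 1), (0, 1)], [mu_z0, mu_z1], mode="sum"))
    print(node_message(g, [(0, 1), (0, 1), (0, 1)], [mu_z0, mu_z1], mode="max"))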
Let δx,y denote the {0, 1}-valued function that is 1 if and only if x =
y. Table 1.1 shows some basic building blocks for larger factor graphs.
Note that in general the factors of the permutation and the parity-check
node are not symmetric with respect to the involved variables, which is
indicated by a black dot in the factor graph. For the binary field, the
black dot is omitted since all permutations are their own inverse and
x ⊕ y = z is equivalent to x ⊕ y ⊕ z = 0, which is symmetric in x, y, z.
Proposition 1.3 (Observations). Let M be a factor graph for the distribution p_{U_0^{k−1},Y}(u_0^{k−1}, y) on (U_0^{k−1}, Y). If we add an observation node
with value y′ to the edge corresponding to Y, then the resulting factor
graph represents the probability distribution p_{U_0^{k−1}|Y}(u_0^{k−1} | y′) on U_0^{k−1}.

Proof. The factor graph of M together with the observation node represents the function p_{U_0^{k−1},Y}(u_0^{k−1}, y) δ_{y,y′}. By summing over y, we get

    Σ_y p_{U_0^{k−1},Y}(u_0^{k−1}, y) δ_{y,y′} = p_{U_0^{k−1},Y}(u_0^{k−1}, y′),   (1.38)

which equals p_{U_0^{k−1}|Y}(u_0^{k−1} | y′) up to the scale factor p_Y(y′), which is
independent of u_0^{k−1}.
Proposition 1.3 is very important, because it tells us how to use
observed values in a factor graph.
Definition 1.4 (Factor Graph Equivalence). Let the factor graphs M
and L represent the functions fM (uC , uM ) and fL (uC , uL ), respectively.
We say M and L are uC -equivalent if they represent the same marginal
with respect to uC , i.e.
    Σ_{u_M} f_M(u_C, u_M) ∝ Σ_{u_L} f_L(u_C, u_L)   (1.39)

for all u_C, where ∝ means equality up to a positive scale factor that is
independent of u_C.
Table 1.1: The factor graph notation of some basic nodes.

Node name      Factor
observation    g(x) ≜ δ_{x,x_0},   (1.33)   where x_0 is the observed value of X.
permutation    g(x, y) ≜ δ_{x,σ(y)},   (1.34)   where X and Y take values in the same finite set and σ is a permutation on that set.
parity-check   g(x, y, z) ≜ δ_{z,x⊕y},   (1.35)   where ⊕ denotes addition in a finite field.
equality       g(x, y, z) ≜ δ_{x,y} δ_{y,z}.   (1.36)
channel        g(x, y) ≜ p_{X,Y}(x, y),   (1.37)   where W is the channel X → Y.
If the set of variables uC is understood from the context, we just say
M and L are equivalent.
Proposition 1.4. Let the two factor graphs M and L represent the
probability distribution pM (uC , uM |y) and pL (uC , uL |y), respectively. If
M and L are uC -equivalent then they represent the same probability distribution on uC . Moreover, for any i ∈ C, the MAP estimate of Ui based
on Y is identical for the probability distribution of M and L.
Proof. In this case, we can simplify (1.39) to pM (uC |y) = pL (uC |y) for
all uC , which proves the first part. The MAP estimate of Ui based on Y
for the distribution of M is given by
    û_i^M(y) = argmax_{u_i} Σ_{u_{C\{i}}, u_M} p_M(u_C, u_M | y),   (1.40)

which can be written as

    û_i^M(y) = argmax_{u_i} Σ_{u_{C\{i}}} p_M(u_C | y).   (1.41)

The same calculation reveals the MAP estimate of U_i based on Y for
the distribution of L,

    û_i^L(y) = argmax_{u_i} Σ_{u_{C\{i}}} p_L(u_C | y),   (1.42)

and because p_M(u_C | y) = p_L(u_C | y) for all u_C, the proposition is proven.
The following proposition is proved in the Appendix.
Proposition 1.5 (Some Equivalences). Every subfigure in Figure 1.4 is
an equivalence relative to all variables outside the dashed box. This is
still valid if the same factor graph is added on both sides of a subfigure
in Figure 1.4 whose represented function depends only on variables
outside the dashed box.
The following proposition is useful to go from SPMP to MPMP and
vice versa.
Proposition 1.6 (Max Equals Sum). Let the function g(x, z) be nonnegative and x, z take values in the finite sets X , Z, respectively. If for
all z ∈ Z there is at most one x ∈ X such that g(x, z) > 0 then
    Σ_x g(x, z) = max_x g(x, z).   (1.43)
Figure 1.4: Some equivalences (a)-(f) according to Definition 1.4. Note that the permutation σ in (b) is given by σ(x) = x ⊕ y′.
Proof. Fix any z ∈ Z. If there is no x ∈ X such that g(x, z) > 0 then
both sides of (1.43) are zero. If there is one x ∈ X such that g(x, z) > 0,
which we denote by xz , then both sides of (1.43) are equal to g(xz , z).
1.4.4 Permutations

A permutation on m letters is a bijective mapping from Z_m to itself and
is completely defined by its two-line notation

    π = ( 0      1      …   m−1
          π(0)   π(1)   …   π(m−1) ).   (1.44)

The permutation matrix of a permutation π on m letters is the unique
m × m matrix P with elements in {0, 1} that satisfies

    (x_0, x_1, …, x_{m−1}) P = (x_{π(0)}, x_{π(1)}, …, x_{π(m−1)})   (1.45)

for all vectors x. It follows that

    P (x_0, x_1, …, x_{m−1})^T = (x_{π^{−1}(0)}, x_{π^{−1}(1)}, …, x_{π^{−1}(m−1)})^T.   (1.46)

Let P_{π_1} and P_{π_2} be the permutation matrices of the permutations π_1
and π_2 on m letters, respectively. Then with P_π being the permutation
matrix of π ≜ π_1 ∘ π_2, we get

    P_{π_1} P_{π_2} = P_π.   (1.47)
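A short hedged numpy sketch of (1.44)-(1.47): it builds permutation matrices from the values π(0), …, π(m−1) and checks properties (1.45) and (1.47) numerically on an arbitrary example.

    import numpy as np

    def perm_matrix(pi):
        """Permutation matrix P with (x_0, ..., x_{m-1}) P = (x_{pi(0)}, ..., x_{pi(m-1)}),
        cf. (1.45)."""
        m = len(pi)
        P = np.zeros((m, m), dtype=int)
        for i in range(m):
            P[pi[i], i] = 1        # column i picks out entry pi(i) of x
        return P

    pi1 = [1, 2, 0]                # one-line version of the two-line notation (1.44)
    pi2 = [2, 0, 1]
    x = np.array([10, 20, 30])

    assert np.array_equal(x @ perm_matrix(pi1), x[pi1])           # property (1.45)
    comp = [pi1[pi2[i]] for i in range(3)]                        # pi = pi1 o pi2
    assert np.array_equal(perm_matrix(pi1) @ perm_matrix(pi2),    # property (1.47)
                          perm_matrix(comp))
    print("checks passed")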
Chapter 2
Polar Codes
In this chapter, we review and discuss polar codes [1]. Note that throughout this chapter and also in Chapters 3 and 5, n is a non-negative integer,
the blocklength N equals 2^n, and arithmetic is over the binary field. Polar
codes belong to the family of Gn-codes.
2.1 Gn-Codes

Let the binary matrix G1 be defined by

    G1 ≜ [ 1  0
           1  1 ].   (2.1)

The factor graph notation of G1 is shown in Figure 2.1. The N × N
matrix Gn is recursively defined according to Figure 2.2. As an example,
Figure 2.3 shows G3. Note that Gn is its own inverse, i.e. Gn^{−1} = Gn.
Figure 2.1: The factor graph notation of G1. The node G1 represents the {0, 1}-valued factor that is 1 if and only if u_0^1 = x_0^1 G1.
Figure 2.2: The relation u_0^{N−1} = x_0^{N−1} Gn is shown, where Gn is recursively constructed based on two copies of Gn−1: each pair (U_{2i}, U_{2i+1}) enters a copy of G1, whose two outputs feed the upper copy of Gn−1 (producing X_0, …, X_{N/2−1}) and the lower copy of Gn−1 (producing X_{N/2}, …, X_{N−1}), respectively.

Figure 2.3: The relation u_0^7 = x_0^7 G3 is shown, where G3 is decomposed recursively according to Figure 2.2.
Definition 2.1 (Gn-codes). The family of Gn-codes contains all codes
of the form

    Cn(F) ≜ { u_0^{N−1} Gn : uI ∈ Z_2^{|I|}, uF = 0 }   (2.2)

with F ⊂ ZN and I = ZN \ F.
The bits uF are called frozen bits. The bits uI are called information
bits because they are used to transmit the K bits of the message. Note
that the values of the frozen bits uF are defined to be zero in (2.2) for
simplicity and because for symmetric transmitting channels and for many
decoders the performance is independent of the values of uF (see [1, Sec.
VI.B] for a proof for the SCD).
For example, Reed-Muller codes are a well-known subset of Gn -codes.
Definition 2.2 (Reed-Muller Code). A Reed-Muller code of blocklength
N and rate R is a Gn-code Cn(F) with F being a set of ⌊N(1 − R)⌋
indexes i ∈ ZN of minimal Hamming weight w_H(i).
Example 2.1. The Hamming weights of the eight indexes i ∈ Z8 are
(0, 1, 1, 2, 1, 2, 2, 3). Thus, a possible Reed-Muller code of blocklength 8
and rate 5/8 is the G3 -code C3 (F) with F = {0, 1, 4}.
The encoding complexity of a Gn -code is O(N log(N )) since Gn consists of nN/2 copies of G1 and of n − 1 permutations on N letters (see
for example Figure 2.3).
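As an illustration, the following hedged Python sketch encodes a Gn-code: the transform follows the recursive structure of Figure 2.2 (each pair (u_{2i}, u_{2i+1}) through a copy of G1, then the two halves through Gn−1), and the frozen set is chosen by minimal Hamming weight as in Definition 2.2. Index conventions beyond what Figure 2.2 shows are our assumption.

    def g_transform(u):
        """Apply the recursive transform of Figure 2.2 (binary arithmetic).
        Since Gn is its own inverse, the same routine maps u to x and x back to u."""
        N = len(u)
        if N == 1:
            return list(u)
        a = [u[2 * i] ^ u[2 * i + 1] for i in range(N // 2)]  # first G1 output of each pair
        b = [u[2 * i + 1] for i in range(N // 2)]             # second G1 output of each pair
        return g_transform(a) + g_transform(b)

    def reed_muller_frozen_set(N, R):
        """Frozen set F of Definition 2.2: floor(N(1-R)) indexes of minimal Hamming weight."""
        by_weight = sorted(range(N), key=lambda i: (bin(i).count("1"), i))
        return set(by_weight[: int(N * (1 - R))])

    def encode(msg_bits, N, frozen):
        """Encode a Gn-code (2.2): frozen bits are 0, message bits fill the remaining positions."""
        u = [0] * N
        info = [i for i in range(N) if i not in frozen]
        for pos, bit in zip(info, msg_bits):
            u[pos] = bit
        return g_transform(u)

    F = reed_muller_frozen_set(8, 5 / 8)
    print(sorted(F))              # one valid choice of 3 minimal-weight indexes; Example 2.1 uses {0, 1, 4}
    x = encode([1, 0, 1, 1, 0], 8, F)
    print(x, g_transform(x))      # applying the transform twice recovers u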
2.2 The Probability Setup

Let the channels X_i → Y_i, i ∈ ZN, be independent and identically distributed copies of the uniform and binary-input channel W. We define

    U_0^{N−1} ≜ X_0^{N−1} Gn.   (2.3)

A factor graph for the probability distribution of (U_0^{N−1}, X_0^{N−1}, Y_0^{N−1}) is
shown in Figure 2.4.
2.3 Genie-Aided Successive Cancellation Decoding

The genie-aided SCD is a tool to understand and construct polar codes
[1]. It cannot be used for decoding because it needs a genie that reveals
the bits U_0^{N−1}.
Figure 2.4: A factor graph for the probability distribution of (U_0^{N−1}, X_0^{N−1}, Y_0^{N−1}). The node Gn represents the {0, 1}-valued factor that is 1 if and only if u_0^{N−1} = x_0^{N−1} Gn.
2.3.1 Definition

For i ∈ ZN, let W_n^{(i)} denote the channel

    W_n^{(i)} ≜ U_i → (Y_0^{N−1}, U_0^{i−1}).   (2.4)

The genie-aided SCD successively computes the MAP decisions for W_n^{(i)}.
Note that U_0^{i−1} are in general not known to the receiver, but are assumed
to be provided by a genie. The MAP decision for W_n^{(i)},

    û_i = argmax_{u_i} p_{U_i | Y_0^{N−1}, U_0^{i−1}}(u_i | y_0^{N−1}, u_0^{i−1}),   (2.5)

could be computed according to (2.8), given by the following derivation:

    û_i = argmax_{u_i} (1/c) Σ_{u_{i+1}^{N−1}} p_{U_0^{N−1} | Y_0^{N−1}}(u_0^{N−1} | y_0^{N−1})   (2.6)
        = argmax_{u_i} (1/c) Σ_{u_{i+1}^{N−1}} p_{X_0^{N−1} | Y_0^{N−1}}(u_0^{N−1} Gn | y_0^{N−1})   (2.7)
        = argmax_{u_i} (1/c) Σ_{u_{i+1}^{N−1}} ∏_{j=0}^{N−1} p_{X_j | Y_j}([u_0^{N−1} Gn]_j | y_j),   (2.8)

where c = p_{U_0^{i−1} | Y_0^{N−1}}(u_0^{i−1} | y_0^{N−1}) is independent of u_i and hence irrelevant for the decision on U_i. However, computing û_i according to
(2.8) is not feasible for practical blocklengths N. But the MAP decision
for W_n^{(i)} can be computed efficiently [1] by making use of the recursive
construction of Gn in Figure 2.2. This can be seen by noting that the
factor graph for W_n^{(i)} is essentially a tree [22]. We elaborate on that in
the following subsection.
2.3.2 The Tree Factor Graph for W_n^{(i)}

We need the equivalences in Figure 2.5, which are proved now. In Figure
2.5a, the left hand side and the right hand side represent

    f_l(x_0, x_1, u_0, u_1) = δ_{(u_0,u_1),(x_0,x_1)G1},   (2.9)
    f_r(x_0, x_1, u_0, u_1, s) = δ_{u_1,x_1} δ_{x_1,s} δ_{u_0,x_0⊕s},   (2.10)

respectively. The definition of G1 in (2.1) implies

    f_l(x_0, x_1, u_0, u_1) = δ_{u_1,x_1} δ_{u_0,x_0⊕x_1}.   (2.11)

By summing f_r over s, we get

    Σ_s f_r(x_0, x_1, u_0, u_1, s) = δ_{u_1,x_1} δ_{u_0,x_0⊕x_1},   (2.12)

which equals f_l in (2.11) and thus we proved Figure 2.5a. Figures 2.5b
and 2.5c follow from Figures 1.4e and 1.4f by noting that G1 is bijective
and therefore a permutation. In Figure 2.5d, the left hand side represents
again the function (2.11) and the right hand side represents

    f_r(x_0, x_1, u_0) = δ_{u_0,x_0⊕x_1}.   (2.13)

By summing f_l over u_1, we get

    Σ_{u_1} f_l(x_0, x_1, u_0, u_1) = δ_{u_0,x_0⊕x_1},   (2.14)

which equals f_r and hence we proved Figure 2.5d. Finally, Figure 2.5e
follows from Figures 2.5a and 1.4b.
To derive the tree factor graph for W_n^{(i)}, we start with the factor
graph in Figure 2.6, which represents the distribution

    p(u_i^{N−1}, x_0^{N−1} | y_0^{N−1}, u_0^{i−1}),   (2.15)

where y_0^{N−1}, u_0^{i−1} are fixed values. Suppose we replace the node Gn by
the equivalent factor graph in Figure 2.2. We can use Figure 2.5 to get
Figure 2.5: Useful equivalences (a)-(e) that are mostly based on Figure 1.4. Note that the permutation in (e) is given by σ(x_0) = x_0 ⊕ u_0.
Figure 2.6: The factor graph from Figure 2.4 if additionally U_0^{i−1} = u_0^{i−1} and Y_0^{N−1} = y_0^{N−1} are known.

Figure 2.7: This factor graph and the one from Figure 2.6 are U_i-equivalent. We assumed i to be even but the case when i is odd can be handled analogously.

Figure 2.8: The tree factor graph of W_3^{(4)}. Note that the four permutations depend on the values u_0^3, which is omitted in the notation for the sake of clarity.
the factor graph of Figure 2.7. Using the recursion from Figure 2.2 and
the equivalences from Figure 2.5 over and over again leads to a cycle-free
factor graph that is tree-like and which is called the tree factor graph
of W_n^{(i)}. An example for n = 3 and i = 4 is shown in Figure 2.8. The
tree factor graph of W_n^{(i)} and the factor graph from Figure 2.6 are U_i-equivalent. Therefore, Proposition 1.4 implies that the MAP decision of
U_i coincides in both factor graphs. Note that the MAP decision of U_i for
the distribution (2.15) equals the MAP decision for W_n^{(i)}. Thus, as the
tree factor graph of W_n^{(i)} is cycle-free and tree-like, the MAP decision
of W_n^{(i)} can be efficiently computed by SPMP on the tree factor graph
of W_n^{(i)} (see (1.30) and (1.31)). Note that although message passing
usually consists of computing the forward and the backward messages,
here we only need to compute the messages in the direction towards U_i
as we want to decide on U_i only.
2.3.3 Max-Product on the Tree Factor Graph

In this section, we show what the decision on U_i according to (1.31)
means if MPMP instead of SPMP is used on the tree factor graph of
W_n^{(i)}. Let s_0^{m−1} denote the variables of all edges of the tree factor graph
for W_n^{(i)} that do not correspond to U_i, X_0^{N−1} or Y_0^{N−1}. The tree factor
graph for W_n^{(i)} represents the distribution p(u_i, x_0^{N−1}, s_0^{m−1} | y_0^{N−1}, u_0^{i−1}).
Suppose we use MPMP on the tree factor graph for W_n^{(i)} and finally
decide for U_i according to (1.31). Then by (1.32), we get the decision

    û_i = argmax_{u_i} max_{x_0^{N−1}, s_0^{m−1}} p(u_i, x_0^{N−1}, s_0^{m−1} | y_0^{N−1}, u_0^{i−1}),   (2.16)

which is the same as

    û_i = argmax_{u_i} max_{x_0^{N−1}, s_0^{m−1}} Σ_{u_{i+1}^{N−1}} p(u_i^{N−1}, x_0^{N−1}, s_0^{m−1} | y_0^{N−1}, u_0^{i−1}).   (2.17)

We can use Proposition 1.6 to transform the sum over u_{i+1}^{N−1} in (2.17)
into a maximization, because probabilities are non-negative and because
u_{i+1}^{N−1} is determined by x_0^{N−1}:

    û_i = argmax_{u_i} max_{u_{i+1}^{N−1}, x_0^{N−1}, s_0^{m−1}} p(u_i^{N−1}, x_0^{N−1}, s_0^{m−1} | y_0^{N−1}, u_0^{i−1}).   (2.18)

Moreover, we can transform the maximization over x_0^{N−1}, s_0^{m−1} in (2.18)
into a sum, because x_0^{N−1}, s_0^{m−1} are determined by u_0^{N−1}, hence

    û_i = argmax_{u_i} max_{u_{i+1}^{N−1}} Σ_{x_0^{N−1}, s_0^{m−1}} p(u_i^{N−1}, x_0^{N−1}, s_0^{m−1} | y_0^{N−1}, u_0^{i−1})   (2.19)
        = argmax_{u_i} max_{u_{i+1}^{N−1}} p(u_i^{N−1} | y_0^{N−1}, u_0^{i−1}).   (2.20)

From (2.20), we see that û_i is the decision on U_i of the symbol-wise block
MAP decision of U_i^{N−1} based on (Y_0^{N−1}, U_0^{i−1}).
2.4 Polarization

It can be shown [1] that if W is a uniform and binary input channel with
finite output alphabet then for any δ ∈ (0, 1)

    lim_{n→∞} |{ i ∈ ZN : I(W_n^{(i)}) ∈ [0, δ) }| / N = 1 − I(W),   (2.21)
    lim_{n→∞} |{ i ∈ ZN : I(W_n^{(i)}) ∈ (1 − δ, 1] }| / N = I(W).   (2.22)

Thus, as the blocklength goes to infinity a fraction of I(W) of the channels W_n^{(i)} will be almost noise-free (I(W_n^{(i)}) almost one) and the remaining fraction 1 − I(W) of channels will be almost useless (I(W_n^{(i)}) almost
zero). This phenomenon was called polarization.
2.4.1 Definition of Polar Codes

Let IPC be the index set of the ⌈NR⌉ channels with largest I(W_n^{(i)}) and
let FPC = ZN \ IPC.

Definition 2.3 (Polar Code). A polar code of blocklength N and rate R
is the Gn-code Cn(FPC).
As IPC and FPC depend on the transmitting channel W , the polar
code itself depends on W as well. If R < I(W ) then according to (2.22)
for n → ∞ the polar code contains only almost noise-free channels.
Proofs that the convergence to noise-free channels happens fast enough
to counteract the growing number of information bits are given in [1]
and [2]. Therefore, polar codes achieve I(W ), which is the capacity for
a uniform input distribution.
2.4.2 Construction of Polar Codes
An algorithm which determines the index set $I_{\rm PC}$ of the information bits of a polar code is called a code construction algorithm. An obvious code construction algorithm is based on Monte Carlo simulations [1]: run Monte Carlo simulations with the genie-aided SCD until, for every $i \in \mathbb{Z}_N$, an estimate of $I(W_n^{(i)})$ is available that allows us to reliably decide whether $I(W_n^{(i)})$ belongs to the $\lceil NR \rceil$ largest values. Unfortunately, the number of Monte Carlo samples needed can be intractably high. Several other algorithms have been proposed to construct polar codes: density evolution in [22] and a Gaussian approximation for density evolution in [14], [38]. In [36], efficient degrading and upgrading quantization methods were proposed.
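Whatever method provides the per-channel reliability estimates, the final selection step is the same. A minimal sketch (hypothetical helper, not from the thesis): given estimates of $P_e(W_n^{(i)})$, e.g. from Monte Carlo simulation or from the BEC recursion above, pick the $\lceil NR \rceil$ most reliable indexes as the information set.

    import math

    def information_set(pe_estimates, R):
        # pe_estimates[i] is an estimate of Pe(W_n^(i)); smaller is better
        N = len(pe_estimates)
        K = math.ceil(N * R)
        order = sorted(range(N), key=lambda i: pe_estimates[i])
        I_PC = set(order[:K])           # indexes of the K most reliable channels
        F_PC = set(range(N)) - I_PC     # frozen indexes
        return I_PC, F_PC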
2.5 Successive Cancellation Decoder
The SCD is very similar to the genie-aided SCD, but it does not require
a genie and can therefore be used to decode a Gn -code (and thus also a
polar code) in practice.
2.5.1 Definition
The SCD for the $G_n$-code $C_n(F)$ computes decisions of the bits $u_0^{N-1}$ successively according to

$\hat{u}_i \triangleq \begin{cases} \arg\max_{u_i} p_{U_i \mid Y_0^{N-1}, U_0^{i-1}}\big(u_i \,\big|\, y_0^{N-1}, \hat{u}_0^{i-1}\big), & \text{if } i \in I, \\ 0, & \text{otherwise.} \end{cases}$   (2.23)

For $i \in I$, if the past decisions $\hat{u}_0^{i-1}$ are correct then $\hat{u}_i$ corresponds to the MAP decision of the channel $W_n^{(i)}$. Hence, the rule to choose the information bits of a polar code is matched to the SCD.
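A schematic view of the decision rule (2.23) is sketched below. The function p_ui is an assumed interface standing in for the SPMP computation on the tree factor graph of $W_n^{(i)}$; it is not defined in the thesis.

    def sc_decode(y, frozen_set, p_ui, N):
        # p_ui(i, y, prefix) is assumed to return the pair
        # (p(U_i = 0 | y, prefix), p(U_i = 1 | y, prefix))
        u_hat = []
        for i in range(N):
            if i in frozen_set:
                u_hat.append(0)                      # frozen bits are set to zero
            else:
                p0, p1 = p_ui(i, y, u_hat)
                u_hat.append(0 if p0 >= p1 else 1)   # MAP decision given past decisions
        return u_hat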
2.5.2 Complexity

The MAP decision of the channel $W_n^{(i)}$ can be computed by doing SPMP on the tree factor graph for $W_n^{(i)}$. Many messages can be shared between message passing on the tree factor graphs for $W_n^{(i)}$ for different $i \in \mathbb{Z}_N$. More precisely, for doing SCD of one block $u_0^{N-1}$, we need to compute for every $G_1$ the two messages along $U_0$ and $U_1$ in Figures 2.5d and 2.5e,
respectively. As $G_n$ consists of $nN/2$ copies of $G_1$, the complexity of decoding the whole block $u_0^{N-1}$ is $O(N \log(N))$.
2.5.3 Successive Cancellation and Block MAP
Block MAP decoding of the $G_n$-code $C_n(F)$ is given by

$\hat{u}_I = \arg\max_{u_I} p_{U_I \mid Y_0^{N-1}, U_F}\big(u_I \,\big|\, y_0^{N-1}, 0\big).$   (2.24)

It leads to the lowest block error probability among all decoders but is not practical. The SCD on the other hand is practical but leads to a higher block error probability than block MAP. The difference in block error probability can be large [35], [21]. Hence, it is interesting to understand the difference between block MAP decoding and SCD. We discuss this difference based on the decision on one bit $U_i$, $i \in I$. The SCD does a MAP decision of $U_i$ based on $\big(Y_0^{N-1}, U_0^{i-1}\big)$ and uses the previous decisions $\hat{u}_0^{i-1}$ as observed values of $U_0^{i-1}$. A block MAP decoder does a block MAP decision of $U_I$ based on $\big(Y_0^{N-1}, U_F\big)$. Thus, there are the following differences:
1. The SCD computes MAP decisions (1.8). The block MAP decoder computes block MAP decisions (1.3).

2. Compared to the block MAP decoder, the SCD uses the additional observations $U_0^{i-1} \setminus U_F$. However, the SCD uses previous decisions and not necessarily the true values for them.

3. Compared to the SCD, the block MAP decoder uses the additional observations $U_F \setminus U_0^{i-1}$.
In Chapter 5, we will see that ignoring the known values of $U_F \setminus U_0^{i-1}$ is the main culprit for the performance loss of SCD. Hence, to come close to a block MAP decoder, we need to take into account at least some of the known bits $U_F \setminus U_0^{i-1}$ when deciding on $U_i$. The reason why the SCD ignores the known values of $U_F \setminus U_0^{i-1}$ is to maintain its low complexity. Namely, the argument used to obtain a tree-like factor graph in Section 2.3.2 would break down if the values of the bits $U_F \setminus U_0^{i-1}$ were not ignored.
2.6 Characteristics
Polar codes have many favorable characteristics:
1. Low-complexity encoder (see Section 2.1)
2. Low-complexity decoder (see Section 2.5.2)
3. Capacity-achieving for binary-input discrete memoryless channels [1], [10] and for many other channels
4. No error floor [21]
5. Explicit and efficient construction (see Section 2.4.2)
However, the major undesirable characteristic of polar codes is the
slow speed at which polarization takes place and hence they are weak
at practical blocklengths [35]. The following section gives an incomplete
overview of what can be done to improve the performance of polar codes
at practical blocklengths.
2.7 Performance Improvements
The measures to improve the performance of polar codes can be split
into the two groups:
1. Improve upon the SCD,
2. Improve polar codes themselves.
Ultimately, however, the combination of code and decoder needs to perform well with an acceptable complexity.
2.7.1 Improve the Decoder
A practical decoder needs a good tradeoff between performance and complexity. Two practical decoders, both applicable to $G_n$-codes in general, with a lower block error probability than the SCD are briefly discussed.
Iterative Sum-Product Message Passing

Iterative SPMP can be used on a factor graph with cycles and hence the known bits $U_F \setminus U_0^{i-1}$ that are ignored by the SCD (see Section 2.5.3) can be used. The iterative SPMP decoder has a low complexity and simulation results show that it can lead to a marked block error probability reduction [11]. However, improving the iterative SPMP is not straightforward.
Successive Cancellation List Decoder [35]

The idea behind SCLD of $G_n$-codes is as follows. Let $L$ be a positive integer and let $\mathcal{L}_{-1} \triangleq \emptyset$. The SCLD works successively through steps $i = 0, 1, \ldots, N-1$. After every step $i$, a list of at most $L$ estimates

$\mathcal{L}_i = \big\{ \hat{u}_0^i(l) \mid l \in \mathbb{Z}_L \big\}$   (2.25)

of $U_0^i$ is generated according to:

1. If $U_i$ is a frozen bit then $\mathcal{L}_i = \mathcal{L}_{i-1} \times \{0\}$.

2. If $U_i$ is an information bit then $\mathcal{L}_i$ contains the $L$ estimates $\hat{u}_0^i$ from $\mathcal{L}_{i-1} \times \mathbb{Z}_2$ with largest probabilities $p_{U_0^i \mid Y_0^{N-1}}\big(\hat{u}_0^i \,\big|\, y_0^{N-1}\big)$.

After the last step $i = N-1$, the most probable estimate from the list $\mathcal{L}_{N-1}$ is the final decision for $u_0^{N-1}$. A sketch of this list procedure is given below.
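The following is an illustration only, not the implementation of [35]; the function prob is an assumed interface standing for $p_{U_0^i \mid Y_0^{N-1}}\big(\hat{u}_0^i \mid y_0^{N-1}\big)$, e.g. computed by SPMP as in the SCD.

    def scl_decode(y, frozen_set, prob, N, L):
        paths = [[]]                                  # list L_{-1}: the empty prefix
        for i in range(N):
            if i in frozen_set:
                paths = [p + [0] for p in paths]      # rule 1: extend by the frozen value
            else:
                paths = [p + [b] for p in paths for b in (0, 1)]
                paths.sort(key=lambda p: prob(p, y), reverse=True)
                paths = paths[:L]                     # rule 2: keep the L most probable prefixes
        return max(paths, key=lambda p: prob(p, y))   # final decision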
SCLD of $G_n$-codes has space complexity $O(LN)$ and time complexity $O(LN \log(N))$. SCLD leads to a remarkable block error probability reduction. For many parameters, SCLD of polar codes achieves the performance of a block MAP decoder even for moderate list sizes $L$. However, even block MAP decoding of polar codes still has rather poor performance. Therefore, to make polar codes useful in practice the codes themselves need to be improved as well.
2.7.2 Improve the Code

There are many ways of improving a polar code. Two of them are briefly discussed.
Use Other Transforms Than G1

Other transforms than $G_1$ can be used in the recursive construction of Figure 2.2. To keep the property that the factor graph of $W_n^{(i)}$ is tree-like, the transform should be bijective.
Binary matrices of dimensions larger than 2 × 2 are used instead of
G1 in [13]. They showed that there exists no invertible, binary matrix
of dimension smaller than 15 × 15 with a better polarization rate than
G1 . Moreover, they constructed a 16 × 16 binary matrix with better
polarization rate than G1 , but SCD of a 16×16 matrix is computationally
demanding.
Non-linear transforms over the binary field are considered in [27].
Explicit non-linear transforms were presented with dimensions 14 × 14,
15 × 15 and 16 × 16 and better polarization rates than the best linear
transforms of the same dimensions.
Linear transforms over an arbitrary finite field are considered in [25].
They showed that Reed-Solomon matrices have the largest possible polarization rate for a given transform and field size. However, SCD of
Reed-Solomon matrices is impractical except for small matrices (see Section 4.4.2). It was empirically shown that replacing G1 by a reasonably
small Reed-Solomon matrix of size 4 × 4 reduces the block error probability substantially.
In [26], mixed transforms were presented, i.e., transforms that depend on the location in the recursive construction of Figure 2.2. They
empirically compared several explicit mixed transforms to standard polar codes under SCLD and they found a performance and complexity
advantage in favour of their mixed transforms.
Use Other Gn-Codes than Polar Codes

Polar codes are matched to the SCD. If a more powerful decoder is used, it is natural to conjecture that, in addition to the gain due to the more powerful decoder, choosing a $G_n$-code other than a polar code would improve the performance even more.
For example for the binary erasure channel, it is empirically known
[21] that Reed-Muller codes perform better than polar codes under block
MAP decoding.
As another example, [21] introduced a family of $G_n$-codes that is parametrized by $\alpha \in [0, 1]$ and interpolates between polar ($\alpha = 1$) and Reed-Muller codes ($\alpha = 0$). They empirically showed that the block error probability under SCLD or iterative SPMP is lowered by picking a code with $\alpha$ strictly between 0 and 1, i.e., between polar and Reed-Muller codes.
Chapter 3

A Partial Order for the Synthesized Channels of Successive Cancellation Decoding
A partial order (PO) for the channels $W_n^{(i)}$ is presented that is independent of the transmitting channel $W$. The PO is based on the observation that a channel $W_n^{(j)}$ is stochastically degraded with respect to $W_n^{(i)}$ if $j$ is obtained by swapping a more significant 1 with a less significant 0 in the binary expansion of $i$. As this order is only partial, not all channels $W_n^{(i)}, W_n^{(j)}$ are comparable. However, knowing the quality of one channel $W_n^{(i)}$ gives us information about the quality of all channels $W_n^{(j)}$ that are comparable to $W_n^{(i)}$ through the PO.

In [23], a different PO for the channels $W_n^{(i)}$ was given that is also independent of the transmitting channel $W$. However, it is completely different from the one in this chapter. The PO in [23] relates channels $W_n^{(i)}, W_n^{(j)}$ with indexes $i, j$ of different Hamming weight. The PO of this chapter relates channels with indexes of the same Hamming weight.

The PO of this chapter is combined with the PO from [23]. As an example, for $N = 2^{20}$ and $i = 436586$, the combined PO implies that 23.06% of all synthesized channels are worse than $W_n^{(i)}$ and 23.07% are
better than $W_n^{(i)}$, independently of $W$. Finally, a minimal representation of the combined PO is derived that enables us to practically use the PO for simplifying code construction of polar codes.

Part of this chapter was published in [31]. The same combined PO was independently derived by Bardet et al. [3].
This chapter is organized as follows. Section 3.1 introduces the necessary preliminaries. In Section 3.2, we derive different representations
of the generator matrix of a polar code. From these representations, we
get a sufficient condition for channel degradation, which is the basis of
the PO. In Section 3.3, we introduce the PO, derive a minimal representation of the PO and give examples for n = 4, 6. The PO from [23]
is reviewed in Section 3.4. In Section 3.5, we combine the PO from Section 3.3 with the PO in [23]. In Section 3.6, we comment on how the
combined PO can be used to simplify code construction of a polar code.
Finally, Section 3.7 concludes by discussing generalizations of the PO
from Section 3.3.
3.1 Preliminaries
The definition of a bit significance permutation is given below and an
explanatory example is given in Table 3.1.
Definition 3.1 (Bit Significance Permutation). Let $\pi$ be a permutation on $m$ letters. The permutation $s_\pi$ on $2^m$ letters is called a bit significance permutation and is defined as the mapping $k \mapsto k'$ where $k'$ is the integer with binary representation

$\big(k'_0, k'_1, \ldots, k'_{m-1}\big) = \big(k_{\pi(0)}, k_{\pi(1)}, \ldots, k_{\pi(m-1)}\big)$   (3.1)

where $k_0, k_1, \ldots, k_{m-1}$ is the binary representation of $k$.
Example 3.1 (Bit-Reversal Permutation). The bit-reversal permutation is a bit significance permutation $s_\pi$ with

$\pi = \begin{pmatrix} 0 & 1 & \ldots & m-2 & m-1 \\ m-1 & m-2 & \ldots & 1 & 0 \end{pmatrix}.$   (3.2)

For example, Table 3.1 shows that for $m = 3$ the bit-reversal permutation is given by

$s_\pi = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 0 & 4 & 2 & 6 & 1 & 5 & 3 & 7 \end{pmatrix}.$   (3.3)
integer:           0    1    2    3    4    5    6    7
binary:          000  001  010  011  100  101  110  111
bit permutation: 000  100  010  110  001  101  011  111
integer:           0    4    2    6    1    5    3    7

Table 3.1: An example of a bit significance permutation on 8 letters. First, the binary representation of every $i \in \mathbb{Z}_8$ is derived. Second, the bits are permuted by an arbitrary permutation, which reverses the bits in this example. Third, the integer representation is derived.
The permutation matrix of the bit-reversal permutation on N letters
is denoted by Bn .
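As a small illustration (a sketch only, with bit index 0 taken as the least significant bit, consistent with Table 3.1 and Example 3.3), the bit significance permutation of Definition 3.1 can be computed as follows.

    def bit_significance_permutation(pi):
        # pi is a permutation on m letters, given as a list with pi[t] = pi(t)
        m = len(pi)
        def s(k):
            bits = [(k >> t) & 1 for t in range(m)]       # k_0, ..., k_{m-1}, LSB first
            permuted = [bits[pi[t]] for t in range(m)]    # k'_t = k_{pi(t)}
            return sum(b << t for t, b in enumerate(permuted))
        return [s(k) for k in range(2 ** m)]

    # bit-reversal for m = 3 (Example 3.1):
    print(bit_significance_permutation([2, 1, 0]))        # [0, 4, 2, 6, 1, 5, 3, 7]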
Definition 3.2 (Partial Order [8, p. 2]). A PO $\preceq$ on a set $S$ is a binary relation on $S$ that satisfies for all $x, y, z \in S$ that

1. $x \preceq x$ (Reflexivity),

2. $x \preceq y$ and $y \preceq x$ imply $x = y$ (Antisymmetry),

3. $x \preceq y$ and $y \preceq z$ imply $x \preceq z$ (Transitivity).

We write $x \prec y$ if and only if $x \preceq y$ and $x \neq y$. The covering relation of a PO is a minimal representation of the PO.
Definition 3.3 (Covering Relation [8, p. 11]). Let $S$ be a set with a PO $\preceq$ and let $x, y, z \in S$. Then we say $x$ is covered by $z$, denoted by $x \to z$, if $x \prec z$ and if $x \preceq y \prec z$ implies $x = y$.

If the set $S$ is finite, the PO is determined by its covering relation [8, p. 11] and hence the covering relation can be seen as a generator of the PO. Namely, $x \preceq y$ if and only if there exists a finite sequence ($k \geq 0$) of covering relations between $x \in S$ and $y \in S$

$x = a_0 \to a_1 \to \ldots \to a_k = y.$   (3.4)
Definition 3.4 (Channel Equivalence Relation [36]). Let $W$ and $W'$ be two channels. If $W \preceq W'$ and $W' \preceq W$ then we say $W$ and $W'$ are equivalent and denote this by $W \equiv W'$.
Example 3.2. Let W be the channel X → Y and W 0 be the channel
X → f (Y ) where f is a bijective mapping. Then W ≡ W 0 .
3.2 A Sufficient Condition for Degradation

In this section, we derive a sufficient condition for stochastic degradation between two channels $W_n^{(j)}$ and $W_n^{(i)}$.
Lemma 3.1 ($G_n$ is Bit Significance Invariant). We can write $G_n$ in the form

$G_n = P' G_n P^{-1}$   (3.5)

where $P$ is any bit significance permutation matrix of dimensions $N \times N$ and $P' \triangleq B_n P B_n^{-1}$.

Proof. From [1, Prop. 16] we have $G_n = B_n G_1^{\otimes n}$. From (1.1) it follows that

$\big[G_1^{\otimes n}\big]_{i,j} = \prod_{k=0}^{n-1} [G_1]_{i_k, j_k}$   (3.6)

where $i_0, i_1, \ldots, i_{n-1}$ and $j_0, j_1, \ldots, j_{n-1}$ are the binary representations of $i$ and $j$, respectively. Let $s_\pi$ be the bit significance permutation corresponding to $P$ and let $C \triangleq P^{-1} G_1^{\otimes n} P$. We get

$C_{i,j} = \big[G_1^{\otimes n}\big]_{s_\pi(i), s_\pi(j)}$   (3.7)
$\phantom{C_{i,j}} = \prod_{k=0}^{n-1} [G_1]_{i_{\pi(k)}, j_{\pi(k)}}$   (3.8)
$\phantom{C_{i,j}} = \prod_{l=0}^{n-1} [G_1]_{i_l, j_l}$   (3.9)
$\phantom{C_{i,j}} = \big[G_1^{\otimes n}\big]_{i,j}.$   (3.10)

Thus, we proved that $G_1^{\otimes n} = P^{-1} G_1^{\otimes n} P$. It follows that

$G_1^{\otimes n} = P G_1^{\otimes n} P^{-1}$   (3.11)

and we can conclude that

$G_n = B_n G_1^{\otimes n}$   (3.12)
$\phantom{G_n} = B_n P G_1^{\otimes n} P^{-1}$   (3.13)
$\phantom{G_n} = B_n P B_n^{-1} B_n G_1^{\otimes n} P^{-1}$   (3.14)
$\phantom{G_n} = B_n P B_n^{-1} G_n P^{-1}$   (3.15)

where $P' \triangleq B_n P B_n^{-1}$ is a product of permutation matrices and therefore again a permutation matrix.
There exist $n!$ different bit significance permutations on $N$ letters. Therefore, Lemma 3.1 gives us $n!$ different representations of $G_n$. From (2.3) and (3.5) we get

$U_0^{N-1} = X_0^{N-1} P' G_n P^{-1}$   (3.16)

and therefore

$U_0^{N-1} P = X_0^{N-1} P' G_n.$   (3.17)

It follows that the vector $\big(U_0^{N-1} P, Y_0^{N-1} P'\big)$ has the same distribution as $\big(U_0^{N-1}, Y_0^{N-1}\big)$ and therefore

$W_n^{(i)} \equiv \Big( U_{s_\pi(i)} \to \big( Y_0^{N-1} P', U_{s_\pi(0)}, U_{s_\pi(1)}, \ldots, U_{s_\pi(i-1)} \big) \Big)$   (3.18)

where $s_\pi$ is the bit significance permutation corresponding to $P$. Moreover, as $P'$ is a permutation matrix, there is a one-to-one correspondence between $Y_0^{N-1} P'$ and $Y_0^{N-1}$ and according to Example 3.2 we get for all $i \in \mathbb{Z}_N$ that

$W_n^{(i)} \equiv \Big( U_{s_\pi(i)} \to \big( Y_0^{N-1}, U_{s_\pi(0)}, U_{s_\pi(1)}, \ldots, U_{s_\pi(i-1)} \big) \Big).$   (3.19)
Before we state and prove the sufficient condition for stochastic degradation of the channels $W_n^{(i)}$, we give an example that shows the basic idea. Assume $n = 2$. There are only two bit significance permutations for $n = 2$, namely the identity and the bit-reversal permutation. Clearly, using the identity in (3.19) gives nothing new. However, using the bit-reversal permutation implies

$W_2^{(0)} \equiv \big( U_0 \to Y_0^3 \big),$   (3.20)
$W_2^{(1)} \equiv \big( U_2 \to ( Y_0^3, U_0 ) \big),$   (3.21)
$W_2^{(2)} \equiv \big( U_1 \to ( Y_0^3, U_0, U_2 ) \big),$   (3.22)
$W_2^{(3)} \equiv \big( U_3 \to ( Y_0^3, U_0, U_2, U_1 ) \big).$   (3.23)

The equivalences (3.20) and (3.23) are trivial. However, the equivalences (3.21) and (3.22) let us conclude that $W_2^{(1)} \preceq W_2^{(2)}$ by

$W_2^{(2)} \equiv \big( U_1 \to ( Y_0^3, U_0, U_2 ) \big)$   (3.24)
$\phantom{W_2^{(2)}} \succeq \big( U_1 \to ( Y_0^3, U_0 ) \big)$   (3.25)
$\phantom{W_2^{(2)}} \equiv W_2^{(1)}$   (3.26)
where we used Example 1.3 for (3.25). In general, we get the following lemma.

Lemma 3.2 (Sufficient Condition for Degradation). Let $i, j \in \mathbb{Z}_N$ and let $s_\pi$ be a bit significance permutation on $N$ letters. If

1. $j = s_\pi(i)$,

2. $\mathbb{Z}_j \subset s_\pi(\mathbb{Z}_i)$,

then $W_n^{(j)}$ is stochastically degraded with respect to $W_n^{(i)}$.

Proof. The proof is a generalization of (3.24)-(3.26):

$W_n^{(i)} \equiv \Big( U_j \to \big( Y_0^{N-1}, U_{s_\pi(0)}, U_{s_\pi(1)}, \ldots, U_{s_\pi(i-1)} \big) \Big)$   (3.27)
$\phantom{W_n^{(i)}} \succeq \Big( U_j \to \big( Y_0^{N-1}, U_0, U_1, \ldots, U_{j-1} \big) \Big)$   (3.28)
$\phantom{W_n^{(i)}} \equiv W_n^{(j)}$   (3.29)

where we used (3.19) and the condition $j = s_\pi(i)$ for (3.27) and the condition $\mathbb{Z}_j \subset s_\pi(\mathbb{Z}_i)$ for (3.28).
3.3 A Partial Order
Let $i_0, i_1, \ldots, i_{n-1}$ and $j_0, j_1, \ldots, j_{n-1}$ be the binary representations of $i, j \in \mathbb{Z}_N$, respectively.

Definition 3.5. We write $j \% i$ if there exists $l, l' \in \mathbb{Z}_n$ with $l < l'$ such that

1. $j_l = 1$ and $j_{l'} = 0$,

2. $i_l = 0$ and $i_{l'} = 1$,

3. for all $k \in \mathbb{Z}_n \setminus \{l, l'\}$: $j_k = i_k$.

Example 3.3. For $n = 5$, the following table shows that $22 \% 28$ with $l = 1$ and $l' = 3$.

22 in binary: 10110
28 in binary: 11100
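A direct test of Definition 3.5 (a sketch only; bit index 0 is the least significant bit, as in Example 3.3):

    def swaps_one_up(j, i, n):
        # True iff j is obtained from i by moving a single 1 from a more
        # significant position l' to a less significant position l (Definition 3.5)
        jb = [(j >> k) & 1 for k in range(n)]
        ib = [(i >> k) & 1 for k in range(n)]
        diff = [k for k in range(n) if jb[k] != ib[k]]
        if len(diff) != 2:
            return False
        l, lp = diff                                  # l < lp by construction
        return jb[l] == 1 and jb[lp] == 0 and ib[l] == 0 and ib[lp] == 1

    print(swaps_one_up(22, 28, 5))                    # True, with l = 1 and l' = 3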
Clearly, if $j \% i$ then $j < i$ and $w_H(j) = w_H(i)$. The following theorem is the basis of the PO.

Theorem 3.1. If $j \% i$ then $W_n^{(j)}$ is stochastically degraded with respect to $W_n^{(i)}$.
Proof. We make use of Lemma 3.2. Let $s_\pi$ be the bit significance permutation that swaps the bits with index $l$ and $l'$, where $l$ and $l'$ are defined according to Definition 3.5. The first condition of Lemma 3.2 is satisfied by $j \% i$. The second condition is equivalent to

$s_\pi(\mathbb{Z}_j) \subset \mathbb{Z}_i$   (3.30)

because $s_\pi$ is its own inverse. Let $k_0, k_1, \ldots, k_{n-1}$ be the binary representation of $k$. We can write $s_\pi$ as

$s_\pi(k) = \begin{cases} k, & \text{if } k_l = k_{l'}, \\ k - 2^l + 2^{l'}, & \text{if } k_l = 1 \text{ and } k_{l'} = 0, \\ k - 2^{l'} + 2^l, & \text{if } k_l = 0 \text{ and } k_{l'} = 1, \end{cases}$   (3.31)

which is therefore upper bounded by $k - 2^l + 2^{l'}$. Hence, for $k \in \mathbb{Z}_j$, we get

$s_\pi(k) \leq k - 2^l + 2^{l'}$   (3.32)
$\phantom{s_\pi(k)} < j - 2^l + 2^{l'}$   (3.33)
$\phantom{s_\pi(k)} = i$   (3.34)

and thus (3.30) is satisfied.
The PO $\preceq_1$ on the set $\{W_n^{(i)}, i \in \mathbb{Z}_N\}$ is defined as follows.

Definition 3.6 (The Partial Order $\preceq_1$). $W_n^{(j)} \preceq_1 W_n^{(i)}$ if and only if there exists a finite sequence $a_0, a_1, \ldots, a_k \in \mathbb{Z}_N$, $k \geq 0$, such that $a_0 = j$, $a_k = i$ and $a_m \% a_{m+1}$ for all $m \in \mathbb{Z}_k$.

It can be shown that $\preceq_1$ fulfills Definition 3.2. Theorem 3.1 and the transitivity of $\preceq$ imply

$W_n^{(j)} \preceq_1 W_n^{(i)} \Rightarrow W_n^{(j)} \preceq W_n^{(i)}$   (3.35)

and hence $\preceq_1$ orders the channels according to their quality.

The PO $\preceq_1$ is a consequence of Lemma 3.2. However, it is not obvious whether the PO $\preceq_1$ fully exploits Lemma 3.2. It could be that there are $j, i \in \mathbb{Z}_N$ such that Lemma 3.2 implies $W_n^{(j)} \preceq W_n^{(i)}$ but $W_n^{(j)} \not\preceq_1 W_n^{(i)}$. The following proposition, whose proof is in the Appendix, shows that this is not the case.
W            Pe(W_4^{(6)})   Pe(W_4^{(9)})
BSC(0.11)    0.1802          0.2073
BEC(0.5)     0.2660          0.2338

Table 3.2: $P_e(W_4^{(6)})$ and $P_e(W_4^{(9)})$ for the binary symmetric channel (BSC) with crossover probability 0.11 and for the binary erasure channel (BEC) with erasure probability 0.5.
Proposition 3.1. Let $j, i \in \mathbb{Z}_N$ and let the bit significance permutation $s_\pi$ be such that both conditions of Lemma 3.2 are satisfied. Then $W_n^{(j)} \preceq_1 W_n^{(i)}$.

The covering relation of the PO $\preceq_1$ is given in the following proposition, whose proof is in the Appendix.
Proposition 3.2 (Covering Relation of $\preceq_1$). $W_n^{(j)} \to_1 W_n^{(i)}$ if and only if $j \% i$ with $l' = l + 1$, where $l$ and $l'$ are defined as in Definition 3.5.

Proposition 3.2 implies that $W_n^{(j)}$ can only be covered by a $W_n^{(i)}$ if the binary expansion of $j$ contains a one with a subsequent zero. As the binary expansion of $j$ has $n$ bits, it contains at most $n/2$ ones with a subsequent zero. Thus for any $j \in \mathbb{Z}_N$, there exist at most $n/2$ indexes $i \in \mathbb{Z}_N$ such that $W_n^{(j)} \to_1 W_n^{(i)}$. Moreover, an analogous argument implies that there exist at most $n/2$ indexes $i \in \mathbb{Z}_N$ such that $W_n^{(i)} \to_1 W_n^{(j)}$. Thus the number of covering relations for the PO $\preceq_1$ is $O(N \log(N))$.
We give examples of the PO $\preceq_1$ for $n = 4, 6$. It is enough to specify the covering relation of $\preceq_1$ because the set of channels $W_n^{(i)}$, $i \in \mathbb{Z}_N$, is finite and hence the PO $\preceq_1$ is generated by its covering relation. A visual representation of a covering relation is the Hasse-diagram [8, pp. 10-11]. An example of a Hasse-diagram for $n = 4$ is shown in Figure 3.1. It can be seen that every subset of channels $W_4^{(i)}$ with equal Hamming weight $w_H(i)$ is totally ordered [8, p. 11] except for the subset with $w_H(i) = 2$ where no order between $W_4^{(6)}$ and $W_4^{(9)}$ exists. Table 3.2 shows $P_e(W_4^{(6)})$ and $P_e(W_4^{(9)})$ for two different channels $W$. We can conclude that there exists no order between $W_4^{(6)}$ and $W_4^{(9)}$ that is independent of the transmitting channel $W$.

The Hasse-diagram of all channels $W_6^{(i)}$ with $w_H(i) = 3$ is shown in Figure 3.2.
Figure 3.1: The Hasse-diagram of the PO $\preceq_1$ on the set $\{W_4^{(i)}\}$, $i \in \mathbb{Z}_{16}$.
3.4 Another Partial Order from the Literature
The following PO, which is independent of the transmitting channel $W$, was given in [23, Sec. V].

Definition 3.7 (The Partial Order $\preceq_2$ from [23]). $W_n^{(j)} \preceq_2 W_n^{(i)}$ if and only if for all $k \in \mathbb{Z}_n$ we have $j_k = 1 \Rightarrow i_k = 1$, where $j_0, j_1, \ldots, j_{n-1}$ and $i_0, i_1, \ldots, i_{n-1}$ are the binary representations of $j$ and $i$, respectively.

Example 3.4. For $n = 5$, the following table shows that $W_n^{(20)} \preceq_2 W_n^{(29)}$ because if a bit of 20 equals one, the corresponding bit of 29 is never zero.

20 in binary: 10100
29 in binary: 11101
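In terms of the integer indexes, Definition 3.7 is simply a bitwise containment test; a one-line sketch:

    def po2_leq(j, i):
        # True iff every 1-bit of j is also a 1-bit of i (Definition 3.7)
        return (j & ~i) == 0

    print(po2_leq(20, 29))    # True: 10100 vs. 11101 (Example 3.4)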
It can be shown that the PO $\preceq_2$ fulfills Definition 3.2. If the transmitting channel $W$ is symmetric then

$W_n^{(j)} \preceq_2 W_n^{(i)} \Rightarrow W_n^{(j)} \preceq W_n^{(i)}$   (3.36)

which is proved in [30, Sec. V]. For a not necessarily symmetric $W$, we get the following proposition.
Figure 3.2: The Hasse-diagram of the PO $\preceq_1$ on the set $\{W_6^{(i)}\}$ with $i \in \mathbb{Z}_{64}$ and $w_H(i) = 3$.
Proposition 3.3. If $W_n^{(j)} \preceq_2 W_n^{(i)}$ then

$I(W_n^{(j)}) \leq I(W_n^{(i)}),$   (3.37)
$H(W_n^{(j)}) \geq H(W_n^{(i)}),$   (3.38)
$P_e(W_n^{(j)}) \geq P_e(W_n^{(i)}).$   (3.39)
Proof. Let $W_s$ be the symmetrized channel of $W$ as in [15, Def. 1.3]. As $W_s$ is symmetric, (3.36) and Proposition 1.2 imply that if

$(W_s)_n^{(j)} \preceq_2 (W_s)_n^{(i)}$   (3.40)

then

$I((W_s)_n^{(j)}) \leq I((W_s)_n^{(i)}),$   (3.41)
$H((W_s)_n^{(j)}) \geq H((W_s)_n^{(i)}),$   (3.42)
$P_e((W_s)_n^{(j)}) \geq P_e((W_s)_n^{(i)}).$   (3.43)

It can be shown that

$I((W_s)_n^{(k)}) = I(W_n^{(k)}),$   (3.44)
$H((W_s)_n^{(k)}) = H(W_n^{(k)}),$   (3.45)
$P_e((W_s)_n^{(k)}) = P_e(W_n^{(k)})$   (3.46)

for all $k \in \mathbb{Z}_N$, which concludes the proof.
The covering relation of the PO $\preceq_2$ is given in the following proposition.

Proposition 3.4 (Covering Relation of $\preceq_2$). $W_n^{(j)} \to_2 W_n^{(i)}$ if and only if $W_n^{(j)} \prec_2 W_n^{(i)}$ and $w_H(i) = w_H(j) + 1$.
Proof. Let $j_0, j_1, \ldots, j_{n-1}$ and $i_0, i_1, \ldots, i_{n-1}$ be the binary representations of $j$ and $i$, respectively. We start by proving the $\Rightarrow$ part. If $W_n^{(j)} \to_2 W_n^{(i)}$ then by definition we obtain $W_n^{(j)} \prec_2 W_n^{(i)}$. It follows that $j_k = 1 \Rightarrow i_k = 1$, $k \in \mathbb{Z}_n$, and that there is at least one bit index $o$ with $j_o = 0$ and $i_o = 1$, and hence $w_H(i) - w_H(j) > 0$. Suppose $w_H(i) - w_H(j) \geq 2$, i.e., there are at least two bit indexes $o, o'$ such that the corresponding bits are 0 for $j$ and 1 for $i$. Let $m \in \mathbb{Z}_N$ be the integer that has the same binary representation as $j$ but with the bit of index $o$ equal to 1. Then $W_n^{(j)} \prec_2 W_n^{(m)} \prec_2 W_n^{(i)}$, which is a contradiction to $W_n^{(j)} \to_2 W_n^{(i)}$, and hence $w_H(i) - w_H(j) = 1$, which proves the $\Rightarrow$ part.
In order to prove the $\Leftarrow$ part, we need to prove

1. $W_n^{(j)} \prec_2 W_n^{(i)}$,

2. $W_n^{(j)} \preceq_2 W_n^{(m)} \prec_2 W_n^{(i)} \Rightarrow W_n^{(m)} = W_n^{(j)}$,

where the first condition is true by assumption. To prove the second condition, assume $W_n^{(j)} \preceq_2 W_n^{(m)} \prec_2 W_n^{(i)}$. This implies $w_H(j) \leq w_H(m) < w_H(i)$ and with $w_H(i) = w_H(j) + 1$ we get $w_H(j) = w_H(m)$. Because additionally $W_n^{(j)} \preceq_2 W_n^{(m)}$, we can conclude $W_n^{(m)} = W_n^{(j)}$.
Proposition 3.4 implies that for any $j \in \mathbb{Z}_N$ there are exactly $n - w_H(j)$ indexes $i \in \mathbb{Z}_N$ such that $W_n^{(j)} \to_2 W_n^{(i)}$ and exactly $w_H(j)$ indexes $i \in \mathbb{Z}_N$ such that $W_n^{(i)} \to_2 W_n^{(j)}$. Thus the number of covering relations for the PO $\preceq_2$ is $O(N \log(N))$.
3.5 The Combined Partial Order
We combine $\preceq_1$ with $\preceq_2$ to get the more powerful PO $\preceq_c$, which is still independent of the transmitting channel $W$.

Definition 3.8 (The Combined Partial Order $\preceq_c$). $W_n^{(j)} \preceq_c W_n^{(i)}$ if and only if there exists a finite sequence $a_0, a_1, \ldots, a_k \in \mathbb{Z}_N$, $k \geq 0$, such that $a_0 = j$, $a_k = i$ and such that $W_n^{(a_m)} \to_1 W_n^{(a_{m+1})}$ or $W_n^{(a_m)} \to_2 W_n^{(a_{m+1})}$ for all $m \in \mathbb{Z}_k$.

It can be shown that the PO $\preceq_c$ fulfills Definition 3.2. The following proposition is a direct consequence of (3.35), Proposition 1.2 and Proposition 3.3.
Proposition 3.5. If $W_n^{(j)} \preceq_c W_n^{(i)}$ then

$I(W_n^{(j)}) \leq I(W_n^{(i)}),$   (3.47)
$H(W_n^{(j)}) \geq H(W_n^{(i)}),$   (3.48)
$P_e(W_n^{(j)}) \geq P_e(W_n^{(i)}).$   (3.49)

Hence the PO $\preceq_c$ orders the channels $W_n^{(i)}$ according to their quality.

The covering relation of the PO $\preceq_c$ is given in the following proposition, whose proof is in the Appendix.
Figure 3.3: The Hasse-diagram of the PO $\preceq_c$ on the set $\{W_4^{(i)}\}$, $i \in \mathbb{Z}_{16}$.
Proposition 3.6 (Covering Relation of $\preceq_c$). $W_n^{(j)} \to_c W_n^{(i)}$ if and only if one of the following two conditions is satisfied:

1. $W_n^{(j)} \to_1 W_n^{(i)}$,

2. $W_n^{(j)} \to_2 W_n^{(i)}$ with $j_0 = 0$, $i_0 = 1$,

where $j_0, j_1, \ldots, j_{n-1}$ and $i_0, i_1, \ldots, i_{n-1}$ are the binary representations of $j$ and $i$, respectively.
Proposition 3.6 implies that for any $j \in \mathbb{Z}_N$ there are at most $n/2 + 1$ indexes $i \in \mathbb{Z}_N$ such that $W_n^{(j)} \to_c W_n^{(i)}$ and at most $n/2 + 1$ indexes $i \in \mathbb{Z}_N$ such that $W_n^{(i)} \to_c W_n^{(j)}$. Thus the number of covering relations for the PO $\preceq_c$ is $O(N \log(N))$.

As an example, the Hasse-diagram of the combined PO $\preceq_c$ for $n = 4$ is shown in Figure 3.3.
3.6 Code Construction with the Combined Partial Order
This chapter presented the two POs $\preceq_1$ and $\preceq_c$ for the channels $W_n^{(i)}$, $i \in \mathbb{Z}_N$. We give a few remarks on how the more powerful PO $\preceq_c$ can be used to simplify code construction of a polar code. Suppose the channels with $P_e(W_n^{(i)}) \leq \gamma$ are used to transmit information. Let $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ denote the lower and upper bound on $P_e(W_n^{(i)})$ obtained by the method in [36], respectively. Algorithm 2 is a possible code construction algorithm.
Algorithm 2 Code Construction with the Combined Partial Order
 1: $I \leftarrow \{\}$, $F \leftarrow \{\}$.
 2: while $I \cup F \neq \mathbb{Z}_N$ do
 3:   Pick an $i \in \mathbb{Z}_N \setminus (I \cup F)$.
 4:   Determine $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$.
 5:   if $P_e^u(W_n^{(i)}) \leq \gamma$ then
 6:     $I \leftarrow I \cup \{ j \in \mathbb{Z}_N \setminus I \mid W_n^{(i)} \preceq_c W_n^{(j)} \}$
 7:   else
 8:     $F \leftarrow F \cup \{ j \in \mathbb{Z}_N \setminus F \mid W_n^{(j)} \preceq_c W_n^{(i)} \}$
 9:   end if
10: end while
Note that only the channels $W_n^{(i)}$ for which $P_e(W_n^{(i)}) \leq \gamma$ is guaranteed are used to transmit information. Note further that the sets in Lines 6 and 8 of Algorithm 2 can be efficiently determined with the covering relations of $\preceq_c$.
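A compact sketch of Algorithm 2 follows (assumed helper interfaces, not from the thesis: bounds(i) returns the pair $(P_e^l(W_n^{(i)}), P_e^u(W_n^{(i)}))$ from the method of [36]; leq_c(j, i) tests $W_n^{(j)} \preceq_c W_n^{(i)}$, e.g. via the covering relations; pick implements the selection in Line 3).

    def construct_with_combined_po(N, gamma, bounds, leq_c, pick):
        I, F = set(), set()
        while len(I) + len(F) < N:                     # Line 2
            i = pick(I, F)                             # Line 3: selection strategy
            pe_l, pe_u = bounds(i)                     # Line 4
            if pe_u <= gamma:                          # Line 5
                # Line 6: channels at least as good as W_n^(i) are reliable too
                I |= {j for j in range(N) if j not in I and leq_c(i, j)}
            else:
                # Line 8: channels at most as good as W_n^(i) are unreliable too
                F |= {j for j in range(N) if j not in F and leq_c(j, i)}
        return I, F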
An open question is which $i$ should be picked in Line 3 of Algorithm 2. One possibility is the Maximin strategy, which picks an $i$ that maximizes the minimum of

$\big| \{ j \in \mathbb{Z}_N \setminus I \mid W_n^{(i)} \preceq_c W_n^{(j)} \} \big|$   (3.50)

and

$\big| \{ j \in \mathbb{Z}_N \setminus F \mid W_n^{(j)} \preceq_c W_n^{(i)} \} \big|.$   (3.51)

For example, let $\gamma = 10^{-10}$, $n = 20$ and let $W$ be the UBAWGNC with SNR = 2 dB. Then Algorithm 2 with the Maximin strategy requires the
computation of $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ for 10'859 channels $W_n^{(i)}$, which is 1.0% of all $N$ channels. This requires 59'435 executions each of the degrading method and the upgrading method from [36], which is 2.8% of the $2N - 2$ executions needed in [36].
In [1], an upper bound on the block error probability of a $G_n$-code under SCD was given:

$P\big( U_0^{N-1} \neq \hat{U}_0^{N-1} \,\big|\, Y_0^{N-1} \big) \leq \sum_{i \in I} P_e(W_n^{(i)}).$   (3.52)

We can use $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ to bound the right hand side of (3.52). If we use the bounds $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ of all synthesized channels, we get

$1.2 \cdot 10^{-7} \leq \sum_{i \in I} P_e(W_n^{(i)}) \leq 2.2 \cdot 10^{-7}.$   (3.53)
By using only the 10'859 bounds $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ computed in Algorithm 2 and the PO $\preceq_c$ to spread them, we get the bounds

$8.7 \cdot 10^{-8} \leq \sum_{i \in I} P_e(W_n^{(i)}) \leq 5.0 \cdot 10^{-7}$   (3.54)

which are not much looser than the bounds in (3.53). To elaborate on that, we define as in [36] the minimal value of the right hand side of (3.52) for any $G_n$-code of rate $R = K/N$:

$P_{W,n}(K) \triangleq \min_{|I| = K} \sum_{i \in I} P_e(W_n^{(i)}).$   (3.55)
Figure 3.4 shows the upper and lower bound on $P_{W,n}(K)$ obtained by the method in [36] and the upper and lower bound on $P_{W,n}(K)$ obtained by using only the 10'859 bounds $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ computed in Algorithm 2 and the PO $\preceq_c$ to spread them. It can be seen that near the rate $R = 0.568$, which is the rate of the code constructed by Algorithm 2, tight bounds on $P_{W,n}(K)$ are obtained even though $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ are computed for only 10'859 channels.

However, it is not clear whether a sufficiently efficient implementation of the Maximin strategy exists such that Algorithm 2 is faster than the method in [36].
Note that the relation between the parameter $\gamma$ and the block error probability is not known. Therefore, to meet a target block error probability, Algorithm 2 needs to be run several times with different $\gamma$. However, the determined $P_e^l(W_n^{(i)})$ and $P_e^u(W_n^{(i)})$ from one run of Algorithm 2 can be used in all subsequent runs.

Figure 3.4: Upper and lower bounds on $P_{W,n}(K)$ obtained by [36] or by Algorithm 2 with the Maximin strategy. The vertical black line indicates the rate of the code that was constructed by Algorithm 2.

Figure 3.5: A generalization of $G_2$ where the building blocks $G_1$ and $G_0 \triangleq I_2$ are used in the two layers.
3.7 Generalizations
It is an interesting question if and how the POs from this chapter can be generalized if $G_1$ is replaced by other transforms as in Section 2.7.2.

The following example shows that (3.35) does not hold if different building blocks are used in the recursive construction of $G_n$. Let $n = 2$ and suppose we use $G_1$ and $G_0 \triangleq I_2$ for the two layers as in Figure 3.5. The tree factor graphs for the channels

$W_{G_1,G_0}^{(1)} \triangleq \big( U_1 \to ( Y_0^3, U_0 ) \big),$   (3.56)
$W_{G_1,G_0}^{(2)} \triangleq \big( U_2 \to ( Y_0^3, U_0, U_1 ) \big)$   (3.57)

are shown in Figure 3.6. Thus, we can conclude that

$W_{G_1,G_0}^{(1)} \equiv W_1^{(1)},$   (3.58)
$W_{G_1,G_0}^{(2)} \equiv W_1^{(0)}.$   (3.59)

Because $I(W_1^{(0)}) < I(W_1^{(1)})$ holds for nontrivial $W$ [1, Prop. 4], Proposition 1.2 implies that for nontrivial $W$ we have $W_{G_1,G_0}^{(1)} \not\preceq W_{G_1,G_0}^{(2)}$. Therefore in this example, (3.35) does not hold.
Figure 3.6: (a) The tree factor graph for the channel $W_{G_1,G_0}^{(1)}$. (b) The tree factor graph for the channel $W_{G_1,G_0}^{(2)}$.

Suppose we always use the same $m \times m$ binary matrix $\widetilde{G}_1$ as the building block in the recursion of Figure 2.2, as in [13]. We define
$N \triangleq m^n,$   (3.60)
$\widetilde{G}_n \triangleq \widetilde{B}_n \widetilde{G}_1^{\otimes n},$   (3.61)
$U_0^{N-1} \triangleq X_0^{N-1} \widetilde{G}_n,$   (3.62)
$\widetilde{W}_n^{(i)} \triangleq \big( U_i \to ( Y_0^{N-1}, U_0^{i-1} ) \big)$   (3.63)

where $\widetilde{B}_n$ is the base-$m$ digit-reversal permutation matrix of dimensions $N \times N$. From the definition of the Kronecker product in (1.1), we get

$\big[ \widetilde{G}_1^{\otimes n} \big]_{i,j} = \prod_{k=0}^{n-1} \big[ \widetilde{G}_1 \big]_{i_k, j_k}$   (3.64)

where $i_0, i_1, \ldots, i_{n-1}$ and $j_0, j_1, \ldots, j_{n-1}$ are the base-$m$ representations of $i$ and $j$, respectively. With (3.64), we can generalize Lemma 3.1.
Lemma 3.3 ($\widetilde{G}_n$ is Base-$m$ Significance Invariant). We can write $\widetilde{G}_n$ in the form

$\widetilde{G}_n = P' \widetilde{G}_n P^{-1}$   (3.65)

where $P$ is any base-$m$ significance permutation matrix of dimensions $N \times N$ and $P' = \widetilde{B}_n P \widetilde{B}_n^{-1}$ is a permutation matrix.
Moreover, Lemma 3.2 can be generalized to the following lemma.

Lemma 3.4 (Sufficient Condition for Degradation). Let $i, j \in \mathbb{Z}_N$ and let $s_\pi$ be a base-$m$ significance permutation on $N$ letters. If

1. $j = s_\pi(i)$,

2. $\mathbb{Z}_j \subset s_\pi(\mathbb{Z}_i)$,

then $\widetilde{W}_n^{(j)}$ is stochastically degraded with respect to $\widetilde{W}_n^{(i)}$.
Theorem 3.1 can be generalized to the following theorem.

Theorem 3.2. For $i, j \in \mathbb{Z}_N$, let $i_0, i_1, \ldots, i_{n-1}$ and $j_0, j_1, \ldots, j_{n-1}$ be the base-$m$ representations of $i$ and $j$, respectively. If there exist $l, l' \in \mathbb{Z}_n$ with $l < l'$ such that

1. $j_l = m - 1$ and $j_{l'} = 0$,

2. $i_l = 0$ and $i_{l'} = m - 1$,

3. for all $k \in \mathbb{Z}_n \setminus \{l, l'\}$: $j_k = i_k$,

then $\widetilde{W}_n^{(j)}$ is stochastically degraded with respect to $\widetilde{W}_n^{(i)}$.
Proof. Let $s_\pi$ be the base-$m$ significance permutation on $N$ letters that swaps the digits with index $l$ and $l'$. It follows that $s_\pi(i) = j$. Let $k_0, k_1, \ldots, k_{n-1}$ be the base-$m$ representation of $k$. We can write $s_\pi$ as

$s_\pi(k) = k + (k_l - k_{l'}) m^{l'} + (k_{l'} - k_l) m^l$   (3.66)
$\phantom{s_\pi(k)} = k + (k_l - k_{l'}) \big( m^{l'} - m^l \big)$   (3.67)

and because $k_l - k_{l'} \leq m - 1 = j_l - j_{l'}$ we get for $k \in \mathbb{Z}_j$ that

$s_\pi(k) \leq k + (j_l - j_{l'}) \big( m^{l'} - m^l \big)$   (3.68)
$\phantom{s_\pi(k)} < j + (j_l - j_{l'}) \big( m^{l'} - m^l \big)$   (3.69)
$\phantom{s_\pi(k)} = i.$   (3.70)

Therefore $s_\pi(\mathbb{Z}_j) \subset \mathbb{Z}_i$ and since $s_\pi$ is its own inverse, the conditions of Lemma 3.4 are satisfied.
The definition of a PO analogous to $\preceq_1$ and the derivation of the covering relation of the PO are left to the interested reader. Note that for $m > 2$, Theorem 3.2 does not fully exploit Lemma 3.4. For example, if $n = 2$, $m = 3$, $j = 1$, $i = 3$ and if $s_\pi$ is the base-$m$ digit-reversal permutation, we obtain

$\widetilde{W}_n^{(3)} \equiv \big( U_1 \to ( Y_0^8, U_0, U_3, U_6 ) \big)$   (3.71)
$\phantom{\widetilde{W}_n^{(3)}} \succeq \big( U_1 \to ( Y_0^8, U_0 ) \big)$   (3.72)
$\phantom{\widetilde{W}_n^{(3)}} \equiv \widetilde{W}_n^{(1)}$   (3.73)

which does not follow from Theorem 3.2. Improving Theorem 3.2 to fully exploit Lemma 3.4 is left to the interested reader.
Chapter 4

Successive Cancellation Decoding of Discrete Fourier Transform Codes
SCD is conceptually applicable to codes based on any transform not
only on Gn . However, it is in general only practical for transforms of
small size. This chapter addresses SCD of Discrete Fourier Transform
(DFT) codes, which are an important class of error-correcting codes
that include the widely-used Reed-Solomon codes. Although SCD in
the natural symbol order of the DFT leads to fast polarization, it is
computationally feasible only for a DFT of small size. Based on the
Cooley-Tukey Fast Fourier Transform (FFT), we found that SCD in the
digit-reversed symbol order has significantly lower complexity but comes
along with slower polarization.
For a specific example with symbol blocklength N = 256, SCD of the
DFT-code in digit-reversed order is feasible and empirically the performance is comparable to a standard binary polar code. However, SCD
of the DFT-code is much more complex. With the SCLD from [35] and
by switching some frozen and information symbols, the performance is
substantially improved and is comparable to the well-performing polar
code with outer cyclic redundancy check code from [35]. However, the
DFT-code still has a much higher decoding complexity.
The result that the Cooley-Tukey FFT can be used to reduce the
SCD complexity of DFT-codes was independently derived in [28].
This chapter is organized as follows. DFT-codes are introduced in
Section 4.1. In Section 4.2, we briefly discuss preliminaries that are
necessary for the subsequent sections. In Section 4.3, the Cooley-Tukey
FFT is discussed, which is the basis of the SCD complexity gain. In
Section 4.4, SCD of DFT-codes in the natural and the digit-reversed
order are discussed and polar DFT-codes are defined. In Section 4.5,
an explicit example for the blocklength N = 256 is given and the block
error probability under SCLD is presented for three DFT-codes. Finally,
Section 4.6 gives conclusions and an outlook.
4.1 Discrete Fourier Transform Codes
Let $F$ be a finite field and let $\alpha$ be an element in $F$ of multiplicative order $N$. The DFT matrix $V_N(\alpha)$ is the $N \times N$ matrix over $F$ defined by $[V_N(\alpha)]_{i,j} = \alpha^{ij}$ [4, Def. 6.1.1]. Note that we often omit the dependency of $V_N(\alpha)$ on $\alpha$ by simply writing $V_N$.
Example 4.1 ($V_6$). Let $F = \mathbb{Z}_7$, $\alpha = 5$ and $N = 6$; then

$V_6 = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 5 & 4 & 6 & 2 & 3 \\ 1 & 4 & 2 & 1 & 4 & 2 \\ 1 & 6 & 1 & 6 & 1 & 6 \\ 1 & 2 & 4 & 1 & 2 & 4 \\ 1 & 3 & 2 & 6 & 4 & 5 \end{pmatrix}.$   (4.1)

It can be shown [4, Th. 6.1.2] that $V_N$ is invertible with

$\big[ V_N^{-1} \big]_{i,j} = N_F^{-1} \alpha^{-ij}$   (4.2)

where $N_F$ is the sum of $N$ ones in $F$.
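For a prime field $F = \mathbb{Z}_p$, the matrices in (4.1) and (4.2) are easy to generate numerically; a small sketch (for illustration only):

    def dft_matrix(p, alpha, N):
        # [V_N(alpha)]_{i,j} = alpha^{i*j} in Z_p
        return [[pow(alpha, i * j, p) for j in range(N)] for i in range(N)]

    def inverse_dft_matrix(p, alpha, N):
        # [V_N^{-1}]_{i,j} = N_F^{-1} * alpha^{-i*j} in Z_p, cf. (4.2)
        n_inv = pow(N % p, -1, p)
        a_inv = pow(alpha, -1, p)
        return [[(n_inv * pow(a_inv, i * j, p)) % p for j in range(N)] for i in range(N)]

    V6 = dft_matrix(7, 5, 6)          # reproduces the matrix of Example 4.1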
Definition 4.1 (DFT-Codes). The family of DFT-codes contains all codes of the form

$\mathcal{C}_N^{\rm DFT}(\mathcal{F}) \triangleq \big\{ u_0^{N-1} V_N^{-1} \,\big|\, u_{\mathcal{I}} \in F^{|\mathcal{I}|},\; u_{\mathcal{F}} = 0 \big\}$   (4.3)

with $\mathcal{F} \subset \mathbb{Z}_N$ and $\mathcal{I} = \mathbb{Z}_N \setminus \mathcal{F}$.

Analogously to $G_n$-codes, the elements of $u_{\mathcal{F}}$ and $u_{\mathcal{I}}$ are called frozen symbols and information symbols, respectively. An important special case of DFT-codes are the Reed-Solomon codes [4, Sec. 6.2].
Definition 4.2 (Reed-Solomon Code). A Reed-Solomon code of blocklength $N$ and rate $R$ is the DFT-code $\mathcal{C}_N^{\rm DFT}(\mathcal{F})$ where $\mathcal{F} = \mathbb{Z}_{\lfloor N(1-R) \rfloor}$.

Note that DFT-codes with $\mathcal{F}$ being any contiguous block of $\lfloor N(1-R) \rfloor$ integers are usually also called Reed-Solomon codes. However, Definition 4.2 is more suitable for this chapter. Reed-Solomon codes are maximum-distance separable [4, Th. 6.2.1], which means that their minimum Hamming distance equals $N - K + 1$, where $K$ is the number of information symbols, i.e., $K = |\mathcal{I}|$.
4.2 Digit Representation
Let $N = \prod_{i=0}^{k-1} N_i$ with $N_i$, $i \in \mathbb{Z}_k$, being positive integers. The following definition is a well-known generalization of the binary representation of integers.

Definition 4.3 (Digit Representation). The $N_0^{k-1}$-representation of an integer $l \in \mathbb{Z}_N$ is defined as the unique sequence

$l_0^{k-1} \in \mathbb{Z}_{N_0} \times \mathbb{Z}_{N_1} \times \ldots \times \mathbb{Z}_{N_{k-1}}$   (4.4)

such that

$l = \sum_{i=0}^{k-1} l_i \prod_{j=0}^{i-1} N_j.$   (4.5)
Note that the $N_0^{k-1}$-representation of an integer $l$ depends on the order of the numbers in $N_0^{k-1}$. The digits $l_0$ and $l_{k-1}$ can be interpreted as the least significant and the most significant digits, respectively.

Example 4.2 ($(N_0 = 3, N_1 = 2)$-Representation). The $(3,2)$-representations of the integers $0, 1, 2, 3, 4, 5$ are given by $(0,0)$, $(1,0)$, $(2,0)$, $(0,1)$, $(1,1)$, $(2,1)$.
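A short sketch of Definition 4.3 (least significant digit first):

    def digit_representation(l, radices):
        # returns (l_0, ..., l_{k-1}) such that l = sum_i l_i * prod_{j<i} N_j
        digits = []
        for Ni in radices:
            digits.append(l % Ni)
            l //= Ni
        return tuple(digits)

    print([digit_representation(l, (3, 2)) for l in range(6)])
    # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)], cf. Example 4.2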
Let $N\!\downarrow_i^j$ denote the vector $(N_j, N_{j-1}, \ldots, N_i)$, which is empty if $i > j$.

Definition 4.4 (Digit-Reversal Permutation). The digit-reversal permutation $b_{N_0^{k-1}}$ is a permutation on $N$ letters which maps $l \in \mathbb{Z}_N$ with $N_0^{k-1}$-representation $l_0^{k-1}$ to $l'$ with $N\!\downarrow_0^{k-1}$-representation $l\!\downarrow_0^{k-1}$. The permutation matrix of the digit-reversal permutation $b_{N_0^{k-1}}$ is denoted by $B_{N_0^{k-1}}$.
Example 4.3. The digit-reversal permutation $b_{3,2}$ is a permutation on 6 letters given by

$b_{3,2} = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 \\ 0 & 2 & 4 & 1 & 3 & 5 \end{pmatrix}.$   (4.6)
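The digit-reversal permutation of Definition 4.4 amounts to reading the digits in reversed order with the reversed radix vector; a self-contained sketch:

    from math import prod

    def digit_reversal(radices):
        N = prod(radices)
        def digits_of(l, rs):
            out = []
            for r in rs:
                out.append(l % r)
                l //= r
            return out
        def value_of(ds, rs):
            l, w = 0, 1
            for d, r in zip(ds, rs):
                l += d * w
                w *= r
            return l
        # l with N_0^{k-1}-representation (l_0,...,l_{k-1}) is mapped to the
        # integer whose reversed-radix representation is (l_{k-1},...,l_0)
        return [value_of(digits_of(l, radices)[::-1], radices[::-1]) for l in range(N)]

    print(digit_reversal([3, 2]))     # [0, 2, 4, 1, 3, 5], cf. Example 4.3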
Proposition 4.1 (Inverse of the Digit-Reversal). The inverse of the digit-reversal permutation $b_{N_0^{k-1}}$ is given by

$b_{N_0^{k-1}}^{-1} = b_{N\downarrow_0^{k-1}}.$   (4.7)

Moreover, we have

$\big( B_{N_0^{k-1}} \big)^{-1} = B_{N\downarrow_0^{k-1}}.$   (4.8)

Proof. By Definition 4.4, the permutation $b_{N\downarrow_0^{k-1}}$ maps the integer with $N\!\downarrow_0^{k-1}$-representation $l\!\downarrow_0^{k-1}$ to the integer $l$ with $N_0^{k-1}$-representation $l_0^{k-1}$. Hence

$\big( b_{N_0^{k-1}} \circ b_{N\downarrow_0^{k-1}} \big)(l) = l$   (4.9)

which proves (4.7). Finally, (4.9) and (1.47) imply (4.8).
Proposition 4.2. Let $N' = \prod_{j=1}^{k-1} N_j$. Then

$B_{N_0^{k-1}} = \big( I_{N_0} \otimes B_{N_1^{k-1}} \big) B(N_0, N')$   (4.10)
$\phantom{B_{N_0^{k-1}}} = B(N_0, N') \big( B_{N_1^{k-1}} \otimes I_{N_0} \big).$   (4.11)

Proof. Let $b$, $b'$ be the permutations of $I_{N_0} \otimes B_{N_1^{k-1}}$ and $B_{N_1^{k-1}} \otimes I_{N_0}$, respectively. With (1.47), equations (4.10) and (4.11) are equivalent to

$b_{N_0^{k-1}} = b \circ b_{N_0, N'}$   (4.12)
$\phantom{b_{N_0^{k-1}}} = b_{N_0, N'} \circ b'.$   (4.13)

Suppose $l \in \mathbb{Z}_N$ has $N_0^{k-1}$-representation $l_0^{k-1}$. The $(N_1^{k-1}, N_0)$-representation of $b_{N_0, N'}(l)$ is $(l_1^{k-1}, l_0)$ and therefore the $N\!\downarrow_0^{k-1}$-representation of $(b \circ b_{N_0, N'})(l)$ is $l\!\downarrow_0^{k-1}$, which proves (4.12). Moreover, the $(N_0, N\!\downarrow_1^{k-1})$-representation of $b'(l)$ is $(l_0, l\!\downarrow_1^{k-1})$ and thus the $N\!\downarrow_0^{k-1}$-representation of $(b_{N_0, N'} \circ b')(l)$ is $l\!\downarrow_0^{k-1}$, which proves (4.13).
4.3 The Cooley-Tukey Fast Fourier Transform
Let $M, L$ be positive integers such that $N = ML$. The Cooley-Tukey FFT [6] is a way of efficiently computing the DFT $u_0^{N-1} = x_0^{N-1} V_N$ of $x_0^{N-1}$. It is based on the recursive application of the decomposition given in [37, Th. 3.4]. However, we use the slightly different decomposition [37, p. 68, eq. (b2)] that is given by

$V_N = B(L, M) \big( I_M \otimes V_L(\alpha^M) \big) B(M, L) \, T_{M,L} \big( I_L \otimes V_M(\alpha^L) \big) B(L, M)$   (4.14)

where $T_{M,L}$ is the $N \times N$ diagonal matrix defined by $[T_{M,L}]_{i,i} \triangleq \alpha^{i_M i_L}$ with $(i_M, i_L)$ being the $(M, L)$-representation of $i$. Note that this decomposition is valid for $V_N$ over any field $F$ even though [37] discusses only the complex field.
The following theorem is the key to more efficient SCD of DFT-codes.

Theorem 4.1 (Decomposition of a Permuted DFT-Matrix). Let $N_0^{k-1}$ be positive integers such that $N = \prod_{i=0}^{k-1} N_i$ and let

$N' \triangleq \prod_{j=1}^{k-1} N_j,$   (4.15)
$\widetilde{V}_N \triangleq B_{N_0^{k-1}} V_N(\alpha) B_{N_0^{k-1}},$   (4.16)
$\widetilde{V}_{N'} \triangleq B_{N_1^{k-1}} V_{N'}(\alpha^{N_0}) B_{N_1^{k-1}},$   (4.17)
$\widetilde{T}_{N_0,N'} \triangleq \big( B_{N_1^{k-1}} \otimes I_{N_0} \big)^{-1} T_{N_0,N'} \big( B_{N_1^{k-1}} \otimes I_{N_0} \big);$   (4.18)

then we have

$\widetilde{V}_N = \big( I_{N_0} \otimes \widetilde{V}_{N'} \big) B(N_0, N') \, \widetilde{T}_{N_0,N'} \big( I_{N'} \otimes V_{N_0}(\alpha^{N'}) \big).$   (4.19)

Moreover, $\widetilde{T}_{N_0,N'}$ is an $N \times N$ diagonal matrix with

$\big[ \widetilde{T}_{N_0,N'} \big]_{i,i} = \alpha^{i_0 \cdot b_{N_1^{k-1}}(i')}$   (4.20)

where $(i_0, i')$ is the $(N_0, N')$-representation of $i$.
Proof. We first prove (4.19). We get with (4.14) for $M = N_0$, $L = N'$ and (4.16) that

$\widetilde{V}_N = B_{N_0^{k-1}} B(N', N_0) (I_{N_0} \otimes V_{N'}) B(N_0, N')$   (4.21)
$\phantom{\widetilde{V}_N =} \cdot\, T_{N_0,N'} (I_{N'} \otimes V_{N_0}) B(N', N_0) B_{N_0^{k-1}}.$   (4.22)

Propositions 4.2 and 4.1 imply $B(N', N_0) B_{N_0^{k-1}} = B_{N_1^{k-1}} \otimes I_{N_0}$, which commutes with $I_{N'} \otimes V_{N_0}$ (see [37, eq. (2.5)]) and therefore

$\widetilde{V}_N = B_{N_0^{k-1}} B(N', N_0) (I_{N_0} \otimes V_{N'}) B(N_0, N')$   (4.23)
$\phantom{\widetilde{V}_N =} \cdot\, T_{N_0,N'} \big( B_{N_1^{k-1}} \otimes I_{N_0} \big) (I_{N'} \otimes V_{N_0}).$   (4.24)

With (4.18) we obtain

$\widetilde{V}_N = B_{N_0^{k-1}} B(N', N_0) (I_{N_0} \otimes V_{N'}) B(N_0, N')$   (4.25)
$\phantom{\widetilde{V}_N =} \cdot \big( B_{N_1^{k-1}} \otimes I_{N_0} \big) \widetilde{T}_{N_0,N'} (I_{N'} \otimes V_{N_0}).$   (4.26)

Another use of Proposition 4.2 gives us

$\widetilde{V}_N = B_{N_0^{k-1}} B(N', N_0) (I_{N_0} \otimes V_{N'}) \big( I_{N_0} \otimes B_{N_1^{k-1}} \big)$   (4.27)
$\phantom{\widetilde{V}_N =} \cdot\, B(N_0, N') \widetilde{T}_{N_0,N'} (I_{N'} \otimes V_{N_0}).$   (4.28)

Propositions 4.2 and 4.1 imply $B_{N_0^{k-1}} B(N', N_0) = I_{N_0} \otimes B_{N_1^{k-1}}$ and by applying [37, Th. 2.2] twice, we finally get

$\widetilde{V}_N = \big( I_{N_0} \otimes B_{N_1^{k-1}} V_{N'} B_{N_1^{k-1}} \big) B(N_0, N') \widetilde{T}_{N_0,N'} (I_{N'} \otimes V_{N_0})$   (4.29)

which proves (4.19). To prove that $\widetilde{T}_{N_0,N'}$ is a diagonal matrix satisfying (4.20), let $z_0^{N-1} \triangleq x_0^{N-1} \widetilde{T}_{N_0,N'}$ and let $\sigma$ be the permutation corresponding to $B_{N_1^{k-1}} \otimes I_{N_0}$. By using (4.18) and twice (1.45), we get for all $i \in \mathbb{Z}_N$ that

$z_i = x_i [T_{N_0,N'}]_{\sigma(i),\sigma(i)}$   (4.30)

and hence $\widetilde{T}_{N_0,N'}$ is again a diagonal matrix with

$\big[ \widetilde{T}_{N_0,N'} \big]_{i,i} = [T_{N_0,N'}]_{\sigma(i),\sigma(i)}.$   (4.31)

If $(i_0, i')$ denotes the $(N_0, N')$-representation of $i$, we get that $(i_0, b_{N_1^{k-1}}(i'))$ is the $(N_0, N')$-representation of $\sigma(i)$. Thus the definition of $T_{N_0,N'}$ implies (4.20).
4.4 Successive Cancellation
Let $W \triangleq (X \to Y)$ be a channel with input alphabet $F$ and with uniform distribution on the input $X$. Let the channels $X_i \to Y_i$, $i \in \mathbb{Z}_N$, be independent and identically distributed to $W$. We define

$U_0^{N-1} \triangleq X_0^{N-1} V_N.$   (4.32)

4.4.1 Partial Distances
Let $P$ be an $N \times N$ permutation matrix, let $\widetilde{U}_0^{N-1} \triangleq U_0^{N-1} P$ and let SCD be applied in the symbol order $\widetilde{U}_0, \widetilde{U}_1, \ldots, \widetilde{U}_{N-1}$. The partial distance $D_N^{(i)}(P)$ corresponds to the minimum Hamming distance of the step $i \in \mathbb{Z}_N$ in SCD of $\widetilde{U}_0^{N-1}$ and is defined analogously to $D^{(i)}$ in [24] as

$D_N^{(i)}(P) \triangleq \min_{\tilde{u}_i^{N-1}:\, \tilde{u}_i \neq 0} w_H\big( (\underbrace{0, 0, \ldots, 0}_{i \text{ times}}, \tilde{u}_i^{N-1}) P^{-1} V_N^{-1} \big).$   (4.33)

In [24] the partial distances were used to measure the asymptotic rate of polarization; here, however, we use them to measure the (non-asymptotic) error-correcting abilities of SCD, even though they are only a crude measure for that.
4.4.2 Natural Symbol Order

For $i \in \mathbb{Z}_N$, let $W_N^{(i)}$ denote the channel

$W_N^{(i)} \triangleq \big( U_i \to ( Y_0^{N-1}, U_0^{i-1} ) \big).$   (4.34)

Let $D_N^{(i)} \triangleq D_N^{(i)}(I_N)$, $i \in \mathbb{Z}_N$, denote the partial distances associated with the natural decoding order $U_0^{N-1}$. Note that $W_N^{(i)}$ is linked to Reed-Solomon codes in the sense that in both cases the contiguous block of symbols $U_0^{i-1}$ is known. Thus, as Reed-Solomon codes are maximum-distance separable codes, it is not surprising [24] that

$D_N^{(i)} = i + 1$   (4.35)

which is the highest possible value for a partial distance of the $i$-th SCD step. It follows that $D_N^{(i)}$ is strictly increasing with index $i$ and hence it is tempting to conjecture that the best channels among $W_N^{(i)}$, $i \in \mathbb{Z}_N$, are the ones with highest index. It would follow that freezing the symbols
with lowest index is a reasonable choice which results in a Reed-Solomon
code.
SCD in the natural order $U_0^{N-1}$ consists of successive MAP decisions of the channels $W_N^{(i)}$, $i \in \mathcal{I}$, which in turn need the probabilities

$p_{U_i \mid Y_0^{N-1}, U_0^{i-1}}\big( u_i \,\big|\, y_0^{N-1}, u_0^{i-1} \big)$   (4.36)

for all $u_i \in F$. An obvious way of calculating the probabilities in (4.36) is given in (4.39), where $c = p_{U_0^{i-1} \mid Y_0^{N-1}}\big(u_0^{i-1} \,\big|\, y_0^{N-1}\big)$ is independent of $u_i$ and therefore irrelevant for the decision on $U_i$:

$p_{U_i \mid Y_0^{N-1}, U_0^{i-1}}\big( u_i \,\big|\, y_0^{N-1}, u_0^{i-1} \big) = \frac{1}{c} \sum_{u_{i+1}^{N-1}} p_{U_0^{N-1} \mid Y_0^{N-1}}\big( u_0^{N-1} \,\big|\, y_0^{N-1} \big)$   (4.37)
$\phantom{p(\cdot)} = \frac{1}{c} \sum_{u_{i+1}^{N-1}} p_{X_0^{N-1} \mid Y_0^{N-1}}\big( u_0^{N-1} V_N^{-1} \,\big|\, y_0^{N-1} \big)$   (4.38)
$\phantom{p(\cdot)} = \frac{1}{c} \sum_{u_{i+1}^{N-1}} \prod_{t=0}^{N-1} p_{X_t \mid Y_t}\big( \big[ u_0^{N-1} V_N^{-1} \big]_t \,\big|\, y_t \big).$   (4.39)
However, we need

$A = |F| \big( |F|^{N-i-1} - 1 \big)$   (4.40)

additions,

$M = (N-1) |F|^{N-i}$   (4.41)

multiplications and $|F|^{N-i}$ applications of $V_N^{-1}$ to compute (4.39) for all $u_i \in F$. This is only feasible for all $i \in \mathbb{Z}_N$ if the blocklength $N$ is small. There exist tricks to decrease the SCD complexity; however, it is still exponential in the blocklength $N$ and polynomial in $|F|$.
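To get a feeling for these numbers, the counts (4.40) and (4.41) can be evaluated directly (a sketch; exact integer arithmetic):

    def natural_order_counts(q, N, i):
        # q = |F|; additions and multiplications needed to evaluate (4.39) at step i
        A = q * (q ** (N - i - 1) - 1)
        M = (N - 1) * q ** (N - i)
        return A, M

    print(natural_order_counts(257, 4, 0))      # small transform: feasible
    print(natural_order_counts(257, 256, 250))  # still astronomically many operations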
4.4.3 Digit-Reversed Symbol Order

Let $N_0^{k-1}$ be positive integers such that $N = \prod_{i=0}^{k-1} N_i$ and let $\widetilde{U}_0^{N-1} \triangleq U_0^{N-1} B_{N_0^{k-1}}$ be the symbol order of SCD. For $i \in \mathbb{Z}_N$, let $\widetilde{W}_N^{(i)}$ denote the channel

$\widetilde{W}_N^{(i)} \triangleq \big( \widetilde{U}_i \to ( Y_0^{N-1}, \widetilde{U}_0^{i-1} ) \big)$   (4.42)

where the dependency on $\alpha$ and on the ordered factorization of $N$ into $N_0, \ldots, N_{k-1}$ is omitted in the notation for the sake of brevity.

Figure 4.1: A factor graph for the distribution on $\widetilde{U}_i^{N-1}, \widetilde{X}_0^{N-1}$ given that $Y_0^{N-1} = y_0^{N-1}$ and $\widetilde{U}_0^{i-1} = \tilde{u}_0^{i-1}$ are known.
Let $\widetilde{\mathcal{I}}_{\rm PC}$ be the index set of the $\lceil NR \rceil$ channels with largest $I(\widetilde{W}_N^{(i)})$ and let $\widetilde{\mathcal{F}}_{\rm PC} \triangleq \mathbb{Z}_N \setminus \widetilde{\mathcal{I}}_{\rm PC}$.

Definition 4.5 (Polar DFT-Code). A polar DFT-code of blocklength $N$ and rate $R$ is the DFT-code

$\mathcal{C}_N^{\rm DFT}\big( b_{N_0^{k-1}}( \widetilde{\mathcal{F}}_{\rm PC} ) \big).$   (4.43)

Note that a polar DFT-code depends on $\alpha$ (the chosen element in $F$ of multiplicative order $N$), on the ordered factorization of $N$ into $N_0, \ldots, N_{k-1}$ and on the transmitting channel $W$.
Let $\widetilde{V}_N$ be defined as in Theorem 4.1. We can rewrite (4.32) to get

$\widetilde{U}_0^{N-1} = \widetilde{X}_0^{N-1} \widetilde{V}_N$   (4.44)

where $\widetilde{X}_0^{N-1} \triangleq X_0^{N-1} \big( B_{N_0^{k-1}} \big)^{-1}$ is a permuted version of $X_0^{N-1}$. If we assume that $Y_0^{N-1} = y_0^{N-1}$ and $\widetilde{U}_0^{i-1} = \tilde{u}_0^{i-1}$ are observed, then we get the factor graph in Figure 4.1. By using the decomposition of $\widetilde{V}_N$ from (4.19), the relation (4.44) can be represented as shown in Figure 4.2. Suppose we replace the node $\widetilde{V}_N$ in Figure 4.1 by the equivalent factor graph in Figure 4.2. Then by using equivalences analogous to those in Figure 2.5, we obtain the equivalent factor graph in Figure 4.3. Since also $\widetilde{V}_{N'}$ can be decomposed by (4.19), the steps that were used to obtain the simplified graph in Figure 4.3 can be applied recursively.
Figure 4.2: The figure shows the relation (4.44), where $\widetilde{V}_N$ is decomposed according to (4.19). Note that in total there are $N'$ copies of $V_{N_0}$ and $N_0$ copies of $\widetilde{V}_{N'}$. The permutation nodes correspond to the multiplication by the entries of the diagonal matrix $\widetilde{T}_{N_0,N'}$ from Theorem 4.1. More explicitly, if the permutation nodes are numbered from top to bottom starting at zero, then the permutation $\sigma_i(x) \triangleq x \big[ \widetilde{T}_{N_0,N'} \big]_{i,i}$ corresponds to permutation node $i$.
Figure 4.3: This factor graph and the factor graph in Figure 4.1 are $U_i$-equivalent, which follows from Figures 1.4e and 1.4f by noting that $V_{N_0}$ is a bijective transform and thus a permutation.
Figure 4.4: The tree factor graph for $\widetilde{W}_{12}^{(5)}$ with the ordered factorization $N = 2 \cdot 3 \cdot 2$. The permutation nodes whose corresponding permutation is the identity are omitted. The only remaining permutation node corresponds to the permutation $\sigma(x) \triangleq x \alpha^4$.
This leads to the tree factor graph for the channel $\widetilde{W}_N^{(i)}$. An example for $N = 12$, $i = 5$ with the ordered factorization $N = 2 \cdot 3 \cdot 2$ is shown in Figure 4.4. Hence, SCD in the digit-reversed order consists of SCD of the transforms $V_{N_i}$, $i \in \mathbb{Z}_k$. Note that the SCD complexity depends exponentially on the transform sizes (see Section 4.4.2), which decreased from $N$ to $N_0, N_1, \ldots, N_{k-1}$. Thus the complexity of SCD in the digit-reversed order $\widetilde{U}_0^{N-1}$ is substantially lower compared to the natural symbol order. To reduce the complexity the most, the blocklength $N$ should be factorized into prime numbers. Note that the Good-Thomas FFT [37, Ch. 5] could also be used to obtain a similar decomposition of $V_N$ as in Theorem 4.1. However, the factors $N_0, \ldots, N_{k-1}$ would need to be relatively prime, which results in general in larger transforms and therefore in a higher SCD complexity.

Let $\widetilde{D}_N^{(i)} \triangleq D_N^{(i)}\big( B_{N_0^{k-1}} \big)$, $i \in \mathbb{Z}_N$, denote the partial distances associated with the SCD order $\widetilde{U}_0^{N-1}$. The following proposition, whose proof is in the Appendix, specifies $\widetilde{D}_N^{(i)}$.
Proposition 4.3. Let $N' = \prod_{j=1}^{k-1} N_j$ and let $(i_0, i')$ be the $(N_0, N')$-representation of $i \in \mathbb{Z}_N$. Then

$\widetilde{D}_N^{(i)} = (i_0 + 1) \widetilde{D}_{N'}^{(i')}$   (4.45)

where $\widetilde{D}_{N'}^{(j)}$, $j \in \mathbb{Z}_{N'}$, are the partial distances of SCD of $X_0^{N'-1} \widetilde{V}_{N'}$.
By using Proposition 4.3 recursively, we get that

$\widetilde{D}_N^{(i)} = \prod_{j=0}^{k-1} (i_j + 1)$   (4.46)

where $i_0^{k-1}$ is the $N_0^{k-1}$-representation of $i \in \mathbb{Z}_N$.
Proposition 4.4. Let $i_0^{k-1}$ be the $N_0^{k-1}$-representation of $i \in \mathbb{Z}_N$; then

$\prod_{j=0}^{k-1} (i_j + 1) \leq i + 1.$   (4.47)

Proof. We prove by induction that

$\prod_{j=0}^{k-1} (i_j + 1) = 1 + \sum_{t=0}^{k-1} i_t \prod_{j=0}^{t-1} (i_j + 1).$   (4.48)
Clearly, (4.48) is true for $k = 1$. Assume (4.48) is true for $k = M$; then for $k = M + 1$ we have

$\prod_{j=0}^{M} (i_j + 1) = i_M \prod_{j=0}^{M-1} (i_j + 1) + \prod_{j=0}^{M-1} (i_j + 1)$   (4.49)
$\phantom{\prod} = i_M \prod_{j=0}^{M-1} (i_j + 1) + 1 + \sum_{t=0}^{M-1} i_t \prod_{j=0}^{t-1} (i_j + 1)$   (4.50)
$\phantom{\prod} = 1 + \sum_{t=0}^{M} i_t \prod_{j=0}^{t-1} (i_j + 1)$   (4.51)

where we used the induction hypothesis (4.48) for $k = M$ to justify (4.50). This concludes the induction step and thus (4.48) is true. By the upper bound $i_j + 1 \leq N_j$ for all $j \in \mathbb{Z}_k$ and by (4.5), we get (4.47).
Therefore, Proposition 4.4 implies that the partial distances corresponding to the digit-reversed order are smaller than or equal to the partial distances corresponding to the natural symbol order. This suggests that SCD in the natural order polarizes faster than SCD in the digit-reversed order. Thus, we pay a price for the reduction in SCD complexity. An extreme example is $N = 256$ with factorization $N = 2^8$, where for $i = 128$ we have

$D_N^{(i)} = 129,$   (4.52)
$\widetilde{D}_N^{(i)} = 2.$   (4.53)
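The formula (4.46) is easy to evaluate; the following sketch reproduces the extreme example above:

    from math import prod

    def partial_distance_digit_reversed(i, radices):
        # D~_N^(i) = prod_j (i_j + 1), where (i_0,...,i_{k-1}) is the
        # N_0^{k-1}-representation of i, cf. (4.46)
        digits = []
        for r in radices:
            digits.append(i % r)
            i //= r
        return prod(d + 1 for d in digits)

    print(partial_distance_digit_reversed(128, [2] * 8))   # 2, while D_N^(128) = 129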
4.5 An Example for the Blocklength 256
For DFT-codes, the alphabet size $|F|$ has to be larger than the blocklength $N$ (otherwise no element of multiplicative order $N$ would exist) and thus is already large. Moreover, as SCD in the digit-reversed order still needs SCD of the transforms $V_{N_i}$, $i \in \mathbb{Z}_k$, the discussion on SCD complexity in Section 4.4.2 strongly suggests that the $N_i$ should be as small as possible. Thus, a reasonable choice is to choose $N$ as a power of two. However, as a DFT-code also needs a finite field with an element of multiplicative order $N$, the choice is restricted. Let $N = 256$ and let $F$ be the finite field with 257 elements, denoted by GF(257). There exists an element in $F$ of multiplicative order $N$. The prime factorization of 256 is $2^8$ and hence all remaining transforms are of dimensions $2 \times 2$. SCD of the $2 \times 2$ transform $V_2$ with an alphabet size of 257 is computationally demanding but feasible.

We present simulation results of SCLD for the transmitting channel $W \triangleq (X \to Y)$ defined as follows. Let $X$ be uniformly distributed over GF(257) and let $X_0, X_1, \ldots, X_8$ be the binary representation of $X$. We transmit $X_0, X_1, \ldots, X_7$ over eight uses of a BAWGNC and we denote by $Y$ the vector containing the eight outputs. Note that we did not use the bit $X_8$ and hence $X = 0$ and $X = 256$ are mapped to the same transmitted sequence $X_0, X_1, \ldots, X_7$. This implies that the binary inputs to the eight BAWGNC uses are not exactly uniformly and independently distributed, which makes $W$ worse than eight uses of the UBAWGNC. However, for example for an SNR of 2 dB, we get $I(W) = 5.136$ bits, which
Figure 4.5: The mutual information (with logarithm to the base 257) of the channels $\widetilde{W}_N^{(i)}$, $i \in \mathbb{Z}_N$, with blocklength factorization $N = 2^8$, $\alpha = 3$ and SNR = 2 dB.
is almost equal to the mutual information of eight uses of UBAWGNC
which is 5.137 bits.
We use the procedure from Section 1.4.1 to obtain a lower bound on
the block error probability of the block MAP decoder and refer to this
bound as the MAP bound.
4.5.1 Polar DFT-Code

Figure 4.5 shows $I(\widetilde{W}_N^{(i)})$, $i \in \mathbb{Z}_N$, for an SNR of 2 dB. It can be seen that many channels are very close to a noiseless or useless channel and hence are polarized. Figure 4.6 shows the same as Figure 4.5 but the order of the channels is changed such that it corresponds to the natural symbol order $U_0^{N-1}$. We see that this polar DFT-code is far away from being a Reed-Solomon code because there exists no long consecutive block of bad channels.

Figure 4.7 shows the block error probability for the rate $R = 1/2$ polar DFT-code under SCLD. For high enough SNR, block MAP per-
Figure 4.6: The same as shown in Figure 4.5 but the x-coordinate is permuted such that it corresponds to the natural symbol order $U_0^{N-1}$, i.e., $i$ equals $i'$ bit-reversed.

Figure 4.7: The block error probability of the rate $R = 1/2$ polar DFT-code with blocklength factorization $N = 2^8$, $\alpha = 3$ under SCLD with list size $L$. Code construction was carried out for SNR = 2 dB, which corresponds to using the inputs to the channels with highest mutual information in Figure 4.5 as information symbols.
formance is achieved with a moderate list size $L$. In Figure 4.8a, the block error probability of our rate $R = 1/2$ polar DFT-code is compared to a polar code with similar blocklength and rate. Both codes show similar performance, with an advantage for the polar code at high SNR. Note that this disappointing result might be attributed to the "low" partial distances of our polar DFT-code. Let us compare the complexity of doing SCLD for both codes by counting the number of real multiplications needed on average per block. For the polar DFT-code, to perform SCD for a $2 \times 2$ kernel, according to (4.41) we need $257^2$ real multiplications for $i = 0$ and 257 for $i = 1$. For a polar code, SCD for the $2 \times 2$ binary kernel $G_1$ needs 4 real multiplications for $i = 0$ and 2 for $i = 1$. The results are shown in Figure 4.8b. It can be seen that the complexity of SCLD of the polar DFT-code is higher by a factor of about $10^3$. Note that we can use the FFT trick from [20, Sec. 3.3] to lower the message passing complexity of SCD of the polar DFT-code. However, it is not easy to get a clear and meaningful complexity measure in that case.
4.5.2 Improved Polar DFT-Code

Suppose we attribute every block error in Figure 4.7 to the symbol $\widetilde{U}_i$ with lowest index $i$ whose final decision was wrong. Then, as the list size $L$ increases, the errors start to concentrate on the symbol $\widetilde{U}_{224}$. If we freeze $\widetilde{U}_{224}$ and use the input to the next best frozen channel as an additional information symbol (i.e., $\widetilde{U}_{168}$), then we get a DFT-code of rate $R = 1/2$ which we call the improved polar DFT-code. Figure 4.9 shows the block error probability for the improved polar DFT-code under SCLD. Compared to the polar DFT-code, the performance is substantially improved without altering the decoding complexity.
4.5.3 Doubly Improved Polar DFT-Code
We can repeat this procedure to obtain the doubly improved polar DFT-code. This results in additionally freezing the symbol $\tilde{U}_{102}$ and using $\tilde{U}_{208}$ as an additional information symbol. Figure 4.10 shows the block
error probability for the doubly improved polar DFT-code under SCLD.
Compared to the improved polar DFT-code the performance gain is not
very pronounced. The performance for L = 32 is similar to polar codes
with an outer cyclic redundancy check code from [35], however Figure
4.8b shows that our polar DFT-code needs a much higher decoding complexity.
4.6 Conclusion and Outlook
We investigated SCD of DFT-codes. Based on the Cooley-Tukey FFT,
we found a symbol order that substantially decreased the SCD complexity compared to the natural symbol order. Although this complexity
is still impractically high for many parameters, there are DFT-codes
with reasonable blocklength such that SCD is feasible. An example is
the polar DFT-code of blocklength N = 256 considered in this chapter.
However, the SCD complexity of this code is still much higher than the
one for standard polar codes of similar blocklength and rate without
having a performance advantage (at least not for the considered transmitting channel). There are two culprits for the high SCD complexity
of DFT-codes in the digit-reversed order:
Figure 4.8: (a) The block error probability under SCLD with list size L = 32 for the polar DFT-code used in Figure 4.7 compared to a polar code of rate R = 1/2 and blocklength N = 2048 bits. (b) The number of real multiplications used on average to decode one block, as a function of the list size L.
Figure 4.9: The block error probability of the improved polar DFT-code of rate R = 1/2 under SCLD with list size $L \in \{1, 2, 4, 8, 16, 32\}$, together with the MAP bound.
Figure 4.10: The block error probability of the doubly improved polar DFT-code of rate R = 1/2 under SCLD with list size $L \in \{1, 2, 4, 8, 16, 32\}$, together with the MAP bound.
1. the transform sizes $N_0, N_1, \ldots, N_{k-1}$ cannot be chosen freely,
2. the large alphabet size.
For the Cooley-Tukey FFT to be applicable, the kernel sizes need to factorize the blocklength N and can thus not be chosen arbitrarily. The
best we can do is to factorize N into prime numbers. Moreover, the
alphabet size has to be larger than the blocklength because we need an
element in the field of multiplicative order N to construct a DFT-code.
Hence the alphabet size is necessarily large.
One way of decreasing the high SCD complexity of DFT-codes is to
use an approximate SCD for the transforms of size N0 , N1 , . . . , Nk−1 . In
general, all decoders that are applicable to decode Reed-Solomon codes
can be used to do SCD of the transforms of size N0 , N1 , . . . , Nk−1 . For
example, it was proposed in [28] to use the Berlekamp-Massey algorithm
instead of exact SCD. Alternatively, one could try to approximate the
messages such that the storing and SPMP complexity is reduced without
deteriorating the performance too much.
Another direction for future research is to use the decoder from Chapter 5 for DFT-codes. More details are given in Section 5.8.
Finally, we give another starting point to reduce the computational
complexity caused by the high alphabet size. To that end, we consider
an idea from BCH codes [4, Sec. 6.5]. Let Fs be a subfield of F . A
BCH code is a Reed-Solomon code but with the additional requirement
that the codewords need to have all elements in the subfield Fs . This is
translated to our setting by defining the input distribution of the channel
W = X → Y to be
$$p_X(x) = \begin{cases} \frac{1}{|F_s|}, & \text{if } x \in F_s,\\ 0, & \text{otherwise}. \end{cases} \qquad (4.54)$$
We can generalize the discussion by allowing Fs to be any subset of
F . Let the channels Xi → Yi , i ∈ ZN , be independent and identically
distributed to W . We define
$$U_0^{N-1} \triangleq X_0^{N-1} V_N. \qquad (4.55)$$
Clearly, the transmitting channel W is not a uniform input channel. But
this does not change the structure of the factor graph for the distribution
$U_0^{N-1}, X_0^{N-1}, Y_0^{N-1}$, and therefore the factor graph for SCD in the digit-reversed order still decomposes into transforms of sizes $N_0, N_1, \ldots, N_{k-1}$.
The objective of restricting the input of X to values in Fs is to decrease
the complexity of SCD. This can be achieved by storing and processing
only those elements of a message that are nonzero. However, only X0N −1
is restricted to take values in Fs and not necessarily the variables connecting the different transforms of sizes N0 , N1 , . . . , Nk−1 . Hence, it is
not obvious how much decoding complexity can be saved.
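As an illustration of this idea, a message over a large field can be stored as a map that keeps only the elements with nonzero probability, so that combining messages costs only as much as their supports. The following minimal Python sketch is illustrative only; the dictionary representation and function names are assumptions, not an implementation used for the results in this thesis.

def sparse_message(support, probs):
    # Build a message as a dict {field element: probability}, dropping zeros.
    return {x: p for x, p in zip(support, probs) if p > 0.0}

def pointwise_product(msg_a, msg_b):
    # Combine two messages about the same variable (illustrative step only):
    # the result is nonzero only where both inputs are nonzero.
    common = msg_a.keys() & msg_b.keys()
    out = {x: msg_a[x] * msg_b[x] for x in common}
    total = sum(out.values())
    return {x: p / total for x, p in out.items()} if total > 0 else {}

if __name__ == "__main__":
    # Input restricted to a small subset F_s of a large field (here |F| = 257):
    f_s = [0, 1, 2, 3]
    prior = sparse_message(f_s, [0.25] * 4)                 # uniform on F_s, zero elsewhere
    channel = sparse_message(range(257), [1 / 257] * 257)   # dense channel message
    print(pointwise_product(prior, channel))                # support shrinks back to F_s

In the best case the stored supports stay close to F_s, but, as noted above, the variables between the transforms need not be restricted to F_s, so the supports can grow.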
Note that if the input distribution of W is not uniform, then the
random variables U0N −1 are not independent anymore. For example, if
Fs is a subfield of F , the dependencies between the symbols U0N −1 are
given in [4, Th. 6.3.1]. These dependencies imply that not all symbols
$U_0^{N-1}$ can be used to transmit $\log_2(|F|)$ bits of information. The case
where the input distribution is not uniform was addressed in [10] for
polar codes.
Chapter 5

A Generalized Successive Cancellation Decoder with Look-Ahead
In this chapter, we describe the fat-tree decoder (FTD) which is a generalization of the SCD. The FTD is applicable to any Gn -code and also
to codes based on a transform with a similar structure as Gn . The FTD
does message passing on a cycle-free factor graph of Gn that is structured
like a binary tree. Thus, it computes exact (up to numerical imprecisions) posterior probabilities. The FTD is able to decode the bits $u_0^{N-1}$ in any desired order. Moreover, it has the ability to take into account the values of any frozen bits, even from the future bits $u_{i+1}^{N-1}$. However,
to keep the complexity manageable, only a limited number of frozen bits
can be considered and the bit decoding order is not completely arbitrary.
We give two FTD strategies with a decoding complexity that is adjustable by the parameter $D_{\max}$. The time complexity of the FTD with either of the two strategies is $O\!\left(2^{2D_{\max}} N \log(N)\right)$ and the space complexity is $O\!\left(2^{D_{\max}} N\right)$. Simulations show that both strategies perform significantly better than the SCD already for small $D_{\max}$.
This chapter is organized as follows. Two cycle-free factor graphs
of Gn are derived in Section 5.2 that are both necessary to describe
the FTD. Section 5.3 describes the FTD in full generality. Section 5.4
gives examples of the decoding process of a FTD and shows that SCD,
block MAP and SCD with different bit order are special cases of the
FTD. Section 5.5 discusses ideas for getting good FTD strategies. In
Section 5.6, two specific FTD strategies are introduced. In Section 5.7,
we present simulation results for the FTD with the two strategies from
Section 5.6 and we compare them to the SCLD. Finally, Section 5.8 gives
conclusions and an outlook.
5.1 Abbreviations and Symbols
For future reference, we summarize all important abbreviations and symbols that are introduced in this chapter.
Abbreviations

FTD     fat-tree decoder
FV      front view
TV      top view
FTFG    fat-tree factor graph
RG      reduced graph
K       known
N       noise
D       decode
MKBS    most known bits strategy
MKBLIS  most known bits of lowest index strategy
Symbols

S_e          the state of the super-edge e in the FV of the FTFG
S_0^{N-1}    states of the edges corresponding to U_0^{N-1}
K            the set {i ∈ Z_N | S_i = K}
N            the set {i ∈ Z_N | S_i = N}
D            the set {i ∈ Z_N | S_i = D}
µ(u_D)       message along the edge U_D in the TV of the RG
L_e          index set of all U_0^{N-1} that belong to the subtree of the edge e in the FV of the FTFG
K_U          index set of all known and incorporated U_0^{N-1}
K_I          index set of all known and ignored U_0^{N-1}
l_i          log-likelihood ratio of U_i
T_e          the score of the super-edge e in the FV of the FTFG
5.2 Cycle-Free Factor Graphs for Gn
In this section, we derive two cycle-free factor graphs for Gn, the so-called front view (FV) and top view (TV). Although message passing is
performed on the TV, the FV is necessary to describe the FTD.
Suppose we superpose both copies of Gn−1 in Figure 2.2 and bundle
superposed edges or nodes to one super-edge or one super-node as shown
in Figure 5.1a. The same graph is shown from the top in Figure 5.1b.
Both views in Figure 5.1 are cycle-free graphs. We can use this idea
again to decompose both copies of Gn−1 in Figure 5.1a according to
the recursion in Figure 2.2 and superpose the resulting four copies of
Gn−2 . The result is again a graph with a cycle-free FV and TV. We
can repeat this procedure until we are left with a factor graph that has
a cycle-free FV and TV consisting only of G1 nodes. An example for
N = 8 is shown in Figure 5.2. We refer to both views of such a factor
graph as the fat-tree factor graph (FTFG) of Gn . Although both views
of the FTFG are cycle-free graphs and thus exact posterior probabilities
could be obtained by SPMP, the SPMP complexity is impractical for
reasonable blocklengths.
Note that a similar idea was used in [9, Sec. VI.D] to get cycle-free
factor graphs for Reed-Muller codes. Moreover, the FTFG is a particular
example of a junction tree [17, Sec. 2.2.3].
We end this section by introducing some terms. We call a super-edge
that bundles i edges an i-edge, and a super-node that bundles i copies
of the node G an i-node of G. The super-edges in both the FV and TV
of the FTFG form a binary tree. Hence the terms root, subtree, children,
sibling, descendant and ancestor are defined as in a binary tree.
5.3
Fat-Tree Decoder
This section is organized as follows. First in Section 5.3.1, states are
assigned to all super-edges in the FV of the FTFG. Then in Section
5.3.2, based on the states, a reduced graph is derived in order to decrease
the message passing complexity. Message passing on the TV of the
reduced graph is explained in Section 5.3.3. Finally in Section 5.3.4,
the decoding process of one block $u_0^{N-1}$ is described, which consists of
alternately performing message passing and updating states according
to the messages and a strategy.
Figure 5.1: The graph of Figure 2.2 but with both copies of $G_{n-1}$ superposed. An integer $i$ below an edge or a node indicates that $i$ copies are bundled, i.e., the alphabet size of such an edge is $2^i$. (a) Front view. (b) Top view.
Figure 5.2: Two cycle-free factor graphs of $G_8$. (a) Front view. (b) Top view.
5.3.1 States
For each super-edge in the FV of the FTFG, we assign a state equal to one of the following three possibilities:
1. KNOWN (K): All bits of the super-edge are assumed to be known.
2. NOISE (N): All bits of the super-edge are assumed to be unknown.
3. DECODE (D): All bits of the super-edge are assumed to be unknown. We perform message passing for that super-edge.
For i ∈ ZN , let Si denote the state of the edge corresponding to Ui and
let
$$K \triangleq \{i \in \mathbb{Z}_N \,|\, S_i = \mathrm{K}\}, \qquad (5.1)$$
$$N \triangleq \{i \in \mathbb{Z}_N \,|\, S_i = \mathrm{N}\}, \qquad (5.2)$$
$$D \triangleq \{i \in \mathbb{Z}_N \,|\, S_i = \mathrm{D}\}. \qquad (5.3)$$
The state Se of any other super-edge e is determined by the states Su
and $S_d$ of the two children edges $u, d$ of $e$ in the FV of the FTFG through
$$S_e \triangleq \begin{cases} \mathrm{D}, & \text{if } S_u = \mathrm{D} \text{ or } S_d = \mathrm{D},\\ \mathrm{K}, & \text{if } S_u = \mathrm{K} \text{ and } S_d = \mathrm{K},\\ \mathrm{N}, & \text{otherwise}. \end{cases} \qquad (5.4)$$
Hence all states are determined by S0N −1 . An example is shown in Figure
5.3a.
Additionally, bit values are assigned to every super-edge with state
equal to K. The bit value of a known Ui is its frozen value if i ∈ F or the
decision on Ui if Ui is already decoded. For any super-edge e in state
K that does not correspond to an Ui , the children edges of e are also in
state K. Then the bit values of e are obtained by applying G1 to the bit
values of the children of e according to the FTFG.
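The rule (5.4) can be evaluated bottom-up from the leaf states $S_0^{N-1}$. The following minimal Python sketch is illustrative only; the heap-style array layout of the FV of the FTFG is an assumption made for the example, not the data structure used in this thesis.

def propagate_states(leaf_states):
    # leaf_states[i] in {'K', 'N', 'D'} is S_i.  The FV of the FTFG is a perfect
    # binary tree with N leaf edges (leaf i corresponds to U_i); super-edge e has
    # children 2e and 2e+1, the root edge is 1.  Returns the states of all edges.
    n = len(leaf_states)
    s = [None] * (2 * n)                  # s[1 .. 2N-1], index 0 unused
    for i, state in enumerate(leaf_states):
        s[n + i] = state
    for e in range(n - 1, 0, -1):         # internal super-edges, bottom-up, rule (5.4)
        up, down = s[2 * e], s[2 * e + 1]
        if up == 'D' or down == 'D':
            s[e] = 'D'
        elif up == 'K' and down == 'K':
            s[e] = 'K'
        else:
            s[e] = 'N'
    return s

if __name__ == "__main__":
    # The example of Figure 5.3a: S = (K, N, D, D, K, K, D, N) for N = 8.
    states = propagate_states(['K', 'N', 'D', 'D', 'K', 'K', 'D', 'N'])
    print("root state:", states[1])       # 'D', so the decoder keeps running

The bit values assigned to super-edges in state K can be propagated along the same bottom-up pass by applying $G_1$ to the values of the two children, as described above.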
5.3.2 Reduced Graph
We remove all edges from the FTFG with state not equal to D and we
remove all G1 -nodes that have no directly connected edge of state D.
The remaining graph is called the reduced graph (RG). An example is
shown in Figure 5.3. The rule (5.4) implies that the FV of the RG is
a connected graph. Note that the structure of the TV of the RG did
not change compared to the FTFG but the number of edges or nodes in one bundle decreased (for example, compare Figure 5.2b to Figure 5.3b). Therefore, the alphabet size decreased as well, and this is one reason why message passing on the TV of the RG can be feasible even for reasonable blocklengths.

Figure 5.3: The FTFG for $G_8$ with states. (a) The FV with states satisfying (5.4). The RG is shown in bold lines. (b) The TV of the RG. The arrow indicates the message we are finally interested in.

Figure 5.4: An $i$-node of $G_1$. We denote by $\mu(s)$, $s \in \mathbb{Z}_2^j$, the message along the $j$-edge in the direction of the arrow. We denote by $\mu_t(t)$, $t \in \mathbb{Z}_2^i$, and $\mu_b(b)$, $b \in \mathbb{Z}_2^i$, the messages along the top and bottom $i$-edge, respectively, in the direction of the arrows.
5.3.3 Message Passing
Let the messages along the edges corresponding to X0N −1 come from
the transmitting channel W as usual. From these messages, we want to
compute the message corresponding to the U s in the TV of the RG. For
example in Figure 5.3b, this corresponds to the message in the direction
of the arrow. To derive this message, we need to compute for every superedge in the TV of the RG only the message in the direction towards the
U s. Note that all nodes in the TV of the RG are of the same kind as
the node in Figure 5.4 and thus it is enough to discuss message passing
through such a node.
Message Passing Through the Node in Figure 5.4
The $j$-edge in Figure 5.4 was a $2i$-edge before reduction. Let $T_0^{2i-1}$ be the states of these $2i$ edges inherited from the FV of the FTFG and let us define
$$T_K \triangleq \{k \in \mathbb{Z}_{2i} \,|\, T_k = \mathrm{K}\}, \qquad (5.5)$$
$$T_N \triangleq \{k \in \mathbb{Z}_{2i} \,|\, T_k = \mathrm{N}\}, \qquad (5.6)$$
$$T_D \triangleq \{k \in \mathbb{Z}_{2i} \,|\, T_k = \mathrm{D}\}. \qquad (5.7)$$
The j-edge in Figure 5.4 contains only those edges with state Tk = D
and therefore j = |TD |. Let the two edges with states T2l and T2l+1
belong to the l-th copy of G1 in Figure 5.4. By the construction of the
RG, at least one of T2l and T2l+1 equals D (otherwise the l-th copy of
G1 would not be in the RG) and hence we get
$$|T_D| \geq i. \qquad (5.8)$$
Let $c \in \mathbb{Z}_2^{2i}$ represent the $2i$ bits of the $2i$ edges with states $T_0^{2i-1}$. The
FTFG implies that the bits c are in one-to-one correspondence with t
and b of Figure 5.4 according to
$$(c_{2k}, c_{2k+1}) = (t_k, b_k)\, G_1, \qquad k \in \mathbb{Z}_i. \qquad (5.9)$$
As the bits $c_{T_N}$ are assumed to be unknown, we sum over all possibilities $c_{T_N} \in \mathbb{Z}_2^{|T_N|}$. As the bits $c_{T_K}$ are assumed to be known, we plug in the known values. We define message passing through the node in Figure 5.4 by
$$\mu(c_{T_D}) \triangleq \sum_{c_{T_N} \in \mathbb{Z}_2^{|T_N|}} \mu_t(t)\,\mu_b(b) \qquad (5.10)$$
where the known bit values are plugged in for cTK and t, b are determined
by the one-to-one correspondence between c and t, b in (5.9). By replacing the summation in (5.10) by a maximization, we obtain an alternative
message passing rule given by
$$\mu(c_{T_D}) \triangleq \max_{c_{T_N} \in \mathbb{Z}_2^{|T_N|}} \mu_t(t)\,\mu_b(b). \qquad (5.11)$$
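The following minimal Python sketch illustrates the rules (5.10) and (5.11) for a single $i$-node of $G_1$ by direct enumeration. It assumes the kernel $G_1 = [[1,0],[1,1]]$, so that (5.9) reads $(c_{2k}, c_{2k+1}) = (t_k \oplus b_k,\, b_k)$; a different kernel convention only relabels the bits. The function names and the brute-force loop are illustrative assumptions and not the complexity-optimized computation counted below.

from itertools import product

def node_message(i, states, known, mu_t, mu_b, use_max=False):
    # i      : number of bundled G_1 copies in the node of Figure 5.4
    # states : list of length 2i with entries 'K', 'N', 'D' (the states T_0^{2i-1})
    # known  : dict {k: bit} with the plugged-in values c_k for k with T_k = 'K'
    # mu_t   : dict {tuple t of length i: value}, message on the top i-edge
    # mu_b   : dict {tuple b of length i: value}, message on the bottom i-edge
    # returns: dict {tuple c_{T_D}: value}, the outgoing message mu(c_{T_D})
    t_d = [k for k in range(2 * i) if states[k] == 'D']
    out = {}
    for t in product((0, 1), repeat=i):
        for b in product((0, 1), repeat=i):
            c = [0] * (2 * i)
            for k in range(i):                       # the correspondence (5.9)
                c[2 * k] = t[k] ^ b[k]
                c[2 * k + 1] = b[k]
            if any(c[k] != bit for k, bit in known.items()):
                continue                             # known bits are plugged in
            key = tuple(c[k] for k in t_d)           # the 'N' bits are marginalized out
            val = mu_t[t] * mu_b[b]
            if use_max:
                out[key] = max(out.get(key, 0.0), val)   # MPMP, rule (5.11)
            else:
                out[key] = out.get(key, 0.0) + val       # SPMP, rule (5.10)
    return out

if __name__ == "__main__":
    uniform = {tb: 0.25 for tb in product((0, 1), repeat=2)}   # i = 2, uniform messages
    print(node_message(2, ['D', 'K', 'N', 'D'], {1: 0}, uniform, uniform))

A direct enumeration over all pairs $(t, b)$ is used here for clarity; the operation counts (5.12) and (5.13) below correspond to the more careful loop that enumerates only $c_{T_D}$ and $c_{T_N}$.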
Complexity
The summation or maximization in (5.10) or (5.11), respectively, goes over $2^{|T_N|}$ elements. Therefore, to compute $\mu(c_{T_D})$ for all $c_{T_D}$ we need
$$M \triangleq 2^{|T_D|+|T_N|} \qquad (5.12)$$
real multiplications and
$$A \triangleq 2^{|T_D|}\left(2^{|T_N|} - 1\right) \qquad (5.13)$$
real additions or maximizations, respectively. Let us lower and upper bound $M$ and $A$. As $|T_N| \geq 0$, we get
$$M \geq 2^{|T_D|}, \qquad (5.14)$$
$$A \geq 0. \qquad (5.15)$$
Although the bound on A is trivial, it is the best one can obtain because
A = 0 can occur. On the other hand, with |TK | + |TN | + |TD | = 2i and
(5.8) we get |TN | ≤ |TD | and hence
$$M \leq 2^{2|T_D|}, \qquad (5.16)$$
$$A \leq 2^{|T_D|}\left(2^{|T_D|} - 1\right) \leq 2^{2|T_D|}. \qquad (5.17)$$
As a result, increasing |TD | by one increases the number of multiplications or additions by a factor of two in the best case and a factor of four
in the worst case. Recall the definition of the set D in (5.3). For all
i-nodes of G1 in the TV of the RG, Equation (5.8) is equivalent to j ≥ i
and thus the number of edges in any super-edge is upper bounded by
|D| (see for example Figure 5.3b where |D| = 3). Combining this with
(5.16), we get that
$$M \leq 2^{2|D|}, \qquad (5.18)$$
$$A \leq 2^{2|D|}. \qquad (5.19)$$
This suggests that D should be a “small” set and if so the message
passing complexity through the node in Figure 5.4 is manageable.
Message Interpretation
Let µ(uD ) denote the message along the edge corresponding to UD in the
TV of the RG obtained by using one of the two message passing rules
(5.10) or (5.11). In this paragraph, we specify what the rules (5.10) and
(5.11) and the message µ(uD ) represent.
We call an edge adjacent to the RG if it does not belong to the RG
but is directly connected to it. If we add to the RG
1. all adjacent edges,
2. an observation node (see Table 1.1) with value equal to the assigned
bit value for every adjacent edge with state K,
3. a channel node (see Table 1.1) with observed value yi to every edge
corresponding to a Xi
then the resulting graph is called the decoding factor graph. Figure 5.5
shows the FV of the decoding factor graph corresponding to the RG in
Figure 5.3. It is not very difficult to see that the two message passing
rules (5.10) and (5.11) correspond to SPMP (1.28) and MPMP (1.29)
in the TV of the decoding factor graph, respectively. Hence, we refer to (5.10) as SPMP and to (5.11) as MPMP.

Figure 5.5: The FV of the decoding factor graph corresponding to the RG in Figure 5.3.
Consider the factor graph in Figure 5.6. It represents the distribution
$$p\!\left(u_0^3, u_6^7, s_0^{m-1} \,\middle|\, y_0^7, \hat{u}_4, \hat{u}_5\right) \qquad (5.20)$$
where $s_0^{m-1}$ are the variables of all edges in Figure 5.6 not corresponding to $U_0^3$ or $U_6^7$. By using Figures 2.5b and 2.5c, we see that the factor
graphs in Figures 5.5 and 5.6 are (U2 , U3 , U6 )-equivalent. By Proposition 1.4, both factor graphs represent the same distribution on U2 , U3
and $U_6$, namely $p(u_2, u_3, u_6 \mid y_0^7, \hat{u}_4, \hat{u}_5)$. Therefore (1.30) implies that if $\mu(u_2, u_3, u_6)$ was obtained by SPMP then
$$\mu(u_2, u_3, u_6) \propto p\!\left(u_2, u_3, u_6 \,\middle|\, y_0^7, \hat{u}_4, \hat{u}_5\right) \qquad (5.21)$$
where we used that the message in the opposite direction in (1.30) is
constant because the (u2 , u3 , u6 )-edge is open. Note that the known
value û0 for U0 is ignored in (5.21), however the values û4 , û5 are used.
The same arguments used to obtain (5.21) can be used to prove the
following theorem which is the key result of this section.
Figure 5.6: A factor graph that is equivalent to the factor graph in Figure 5.5.
Theorem 5.1. Let Le be the index set of all bits U0N −1 that belong to
the subtree of the edge e in the FV of the FTFG. Let Ai denote the set
of all super-edges in the FV of the FTFG that are adjacent to some edge
in the path from $U_i$ to the root of the FTFG. Let
$$K_U \triangleq \bigcup_{i \in D} \;\bigcup_{\substack{e \in A_i\\ S_e = \mathrm{K}}} L_e, \qquad (5.22)$$
$$K_U^c \triangleq \mathbb{Z}_N \setminus K_U, \qquad (5.23)$$
$$K_I \triangleq K \setminus K_U. \qquad (5.24)$$
If SPMP was used, then
$$\mu(u_D) \propto p_{U_D \mid Y_0^{N-1}, U_{K_U}}\!\left(u_D \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right). \qquad (5.25)$$
If MPMP was used, then
$$\mu(u_D) \propto \max_{u_N,\, u_{K_I}} p_{U_{K_U^c} \mid Y_0^{N-1}, U_{K_U}}\!\left(u_{K_U^c} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right). \qquad (5.26)$$
Proof. Note that the set of super-edges ∪i∈D {e ∈ Ai |Se = K} is exactly
the set of all super-edges of state K that are adjacent to the RG. We
obtain (5.25) by the same argumentation as in the preceding example.
We prove (5.26). The probability distribution on UD in the decoding factor graph is given by the right hand side of (5.25) and thus the
decoding factor graph represents the distribution
$$p\!\left(u_D, x_0^{N-1}, s_0^{m-1} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right) \qquad (5.27)$$
where $s_0^{m-1}$ denotes the variables of all edges not corresponding to $U_D$ or $X_0^{N-1}$. If MPMP is used, then we have by (1.32) that
$$\mu(u_D) \propto \max_{x_0^{N-1},\, s_0^{m-1}} p\!\left(u_D, x_0^{N-1}, s_0^{m-1} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right) \qquad (5.28)$$
which is the same as
$$\mu(u_D) \propto \max_{x_0^{N-1},\, s_0^{m-1}} \;\sum_{u_N,\, u_{K_I}} p\!\left(u_{K_U^c}, x_0^{N-1}, s_0^{m-1} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right). \qquad (5.29)$$
We can use Proposition 1.6 to transform the sum over $u_N, u_{K_I}$ into a maximization, because $u_N, u_{K_I}$ is determined by $x_0^{N-1}$, and obtain
$$\mu(u_D) \propto \max_{u_N,\, u_{K_I},\, x_0^{N-1},\, s_0^{m-1}} p\!\left(u_{K_U^c}, x_0^{N-1}, s_0^{m-1} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right). \qquad (5.30)$$
Moreover, we can transform the maximization over $x_0^{N-1}, s_0^{m-1}$ into a sum because $x_0^{N-1}, s_0^{m-1}$ is determined by $u_0^{N-1}$, therefore we get
$$\mu(u_D) \propto \max_{u_N,\, u_{K_I}} \;\sum_{x_0^{N-1},\, s_0^{m-1}} p\!\left(u_{K_U^c}, x_0^{N-1}, s_0^{m-1} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right) \qquad (5.31)$$
which is the same as the right hand side of (5.26).
It is important to note that the set KU depends on the sets D and
K and that we can use the values of all known bits, i.e. we can achieve
KU = K, by choosing the set D accordingly. For example, the known
value for U0 in Figure 5.3 could be used if additionally S1 is set to D.
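The dependence of $K_U$ on $D$ and $K$ can be made concrete with the following Python sketch of the definition (5.22). The heap-style tree layout and the function name are illustrative assumptions, not the bookkeeping used in this thesis.

def incorporated_known_bits(leaf_states):
    # For each i in D, collect the super-edges of state K that are adjacent to the
    # path from U_i to the root (their siblings), and take the union of their L_e.
    n = len(leaf_states)
    s = [None] * (2 * n)
    leaves = [set() for _ in range(2 * n)]            # leaves[e] = L_e
    for i, st in enumerate(leaf_states):
        s[n + i], leaves[n + i] = st, {i}
    for e in range(n - 1, 0, -1):                     # states via (5.4), L_e bottom-up
        up, down = s[2 * e], s[2 * e + 1]
        s[e] = 'D' if 'D' in (up, down) else ('K' if (up, down) == ('K', 'K') else 'N')
        leaves[e] = leaves[2 * e] | leaves[2 * e + 1]
    k_u = set()
    for i, st in enumerate(leaf_states):
        if st != 'D':
            continue
        e = n + i
        while e > 1:                                  # siblings along the path to the root
            sibling = e ^ 1
            if s[sibling] == 'K':
                k_u |= leaves[sibling]
            e //= 2
    return k_u

if __name__ == "__main__":
    # Figure 5.3: D = {2, 3, 6}, K = {0, 4, 5}; u_4 and u_5 are usable but u_0 is not.
    print(incorporated_known_bits(['K', 'N', 'D', 'D', 'K', 'K', 'D', 'N']))  # {4, 5}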
For SPMP, we can compute for every bit Ui , i ∈ D, the log-likelihood
ratio
$$l_i \triangleq \log \frac{p_{U_i \mid Y_0^{N-1}, U_{K_U}}\!\left(0 \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right)}{p_{U_i \mid Y_0^{N-1}, U_{K_U}}\!\left(1 \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right)} \qquad (5.32)$$
by
$$l_i = \log \frac{\sum_{u_{D\setminus\{i\}},\, u_i = 0} \mu(u_D)}{\sum_{u_{D\setminus\{i\}},\, u_i = 1} \mu(u_D)}. \qquad (5.33)$$
For MPMP, we can compute for every bit Ui , i ∈ D, the max-product
log-likelihood ratio
$$l_i \triangleq \log \frac{\max_{u_{K_U^c\setminus\{i\}},\, u_i = 0} p_{U_{K_U^c} \mid Y_0^{N-1}, U_{K_U}}\!\left(u_{K_U^c} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right)}{\max_{u_{K_U^c\setminus\{i\}},\, u_i = 1} p_{U_{K_U^c} \mid Y_0^{N-1}, U_{K_U}}\!\left(u_{K_U^c} \,\middle|\, y_0^{N-1}, \hat{u}_{K_U}\right)} \qquad (5.34)$$
by
$$l_i = \log \frac{\max_{u_{D\setminus\{i\}},\, u_i = 0} \mu(u_D)}{\max_{u_{D\setminus\{i\}},\, u_i = 1} \mu(u_D)}. \qquad (5.35)$$
Decisions can be made according to
$$\hat{u}_i \triangleq \begin{cases} 0, & \text{if } l_i \geq 0,\\ 1, & \text{otherwise}. \end{cases} \qquad (5.36)$$
If ûKU are correct decisions for UKU then for SPMP, (5.36) is a MAP
decision of the channel
$$U_i \to Y_0^{N-1}, U_{K_U} \qquad (5.37)$$
whereas for MPMP, (5.36) is the decision on $U_i$ of the symbol-wise block MAP decision of $U_{K_U^c}$ based on $Y_0^{N-1}, U_{K_U}$ (see (1.5)).
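For concreteness, the following minimal Python sketch computes (5.33) and the decision (5.36) from a message $\mu(u_D)$ stored as a dictionary. The helper is illustrative only; replacing the two sums by maximizations gives (5.35).

import math

def llr_and_decision(mu, d_indices, i):
    # mu        : dict {tuple u_D (ordered like d_indices): nonnegative value}
    # d_indices : sorted list of the indices in D
    # i         : the bit index in D for which the LLR and decision are computed
    pos = d_indices.index(i)
    num = sum(v for u, v in mu.items() if u[pos] == 0)   # sum over u_{D\{i}}, u_i = 0
    den = sum(v for u, v in mu.items() if u[pos] == 1)   # sum over u_{D\{i}}, u_i = 1
    l_i = math.log(num / den)
    return l_i, 0 if l_i >= 0 else 1                     # decision rule (5.36)

if __name__ == "__main__":
    # Toy message over D = {2, 3, 6}; the values are made up for illustration.
    mu = {(a, b, c): 0.4 if a == 0 else 0.1
          for a in (0, 1) for b in (0, 1) for c in (0, 1)}
    print(llr_and_decision(mu, [2, 3, 6], 2))            # positive LLR, decision 0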
5.3.4 Decoding Process
Let I and F be the indices of the information and frozen bits of the
Gn -code, respectively. Let r be the root edge of the FV of the FTFG.
Decoding the block U0N −1 works according to Algorithm 3. Note that
the FTD terminates if Sr = K, which is satisfied if and only if all bits
U0N −1 are known.
Figure 5.7: Allowed state transitions (NOISE, DECODE, KNOWN). Known bits remain known and noise bits are not allowed to become known directly.
Algorithm 3 Fat-Tree Decoder
1: for i = 0 to N − 1 do
2:     if i ∈ F then
3:         S_i ← K with decision û_i = 0
4:     else
5:         S_i ← N
6:     end if
7: end for
8: while S_r ≠ K do
9:     Compute the message µ(u_D) by SPMP or MPMP.
10:    Update the states S_0^{N−1} according to µ(u_D) and a strategy.
11: end while
A specific choice of state transitions in Line 10 of Algorithm 3 is
called a FTD strategy. The state transitions need to satisfy Figure 5.7.
The performance under a specific FTD strategy depends on how good
the channels in (5.37) are.
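The decoding process of Algorithm 3 can be summarized by the following Python skeleton, where compute_message and strategy are placeholders for SPMP/MPMP and for a concrete FTD strategy. The names, the simple list-based bookkeeping and the assumption that N is a power of two are illustrative; this is a sketch, not the implementation used for the simulations.

def fat_tree_decode(n_bits, frozen, compute_message, strategy):
    states = ['K' if i in frozen else 'N' for i in range(n_bits)]   # lines 1-7
    decisions = {i: 0 for i in frozen}                              # frozen bits are 0
    def root_state(s):                                              # S_r via the rule (5.4)
        level = list(s)
        while len(level) > 1:
            level = ['D' if 'D' in pair else ('K' if pair == ('K', 'K') else 'N')
                     for pair in zip(level[0::2], level[1::2])]
        return level[0]
    while root_state(states) != 'K':                                # lines 8-11
        mu = compute_message(states, decisions)
        strategy(states, decisions, mu)                             # updates states/decisions
    return [decisions[i] for i in range(n_bits)]

if __name__ == "__main__":
    def dummy_message(states, decisions):
        return {}
    def dummy_strategy(states, decisions, mu):
        i = states.index('N')          # pretend the next unknown bit is decided reliably
        states[i] = 'K'
        decisions[i] = 0
    print(fat_tree_decode(4, {0}, dummy_message, dummy_strategy))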
Time Complexity
We divide the edges of the FV and the TV of the FTFG into n+1 vertical
layers. We label these layers with numbers in Zn+1 where the layer 0
consists of all edges corresponding to X0N −1 and the layer n consists of
all edges corresponding to U0N −1 . The FTD is able to reuse messages
between iterations similar to the SCD. Namely, if no states changed in
a certain layer of the FTFG, the RG stays the same in that layer and
we do not need to recompute the messages in that layer. The number
of reused messages in a decoding process depends heavily on the FTD
strategy and hence we can not generally specify it. However, the reuse
of messages is essential for the FTD to have manageable complexity.
Space Complexity
There are 2n−l super-edges at layer l in the TV of the RG. Therefore,
we need to store
$$\sum_{l=0}^{n} 2^{n-l} = 2N - 1 \qquad (5.38)$$
messages. The number of edges in every super-edge in the TV of the
RG is at most |D| and hence the alphabet size of every message is at
most $2^{|D|}$. Thus, the space complexity is $O(2^{D_{\max}} N)$, where $D_{\max}$ is the maximal cardinality of $D$ in the decoding process.
The space complexity for the states is O(N ) because there are 2N −1
states.
The space complexity for the assigned bit values is O(N ). To see
this, suppose that at the beginning of the decoding process, we use one
bit of storage for every Ui , i ∈ ZN , which contains the assigned bit value
for a known bit or an arbitrary value for a not yet known bit. As soon as
two sibling super-edges in the FV of the FTFG are both in state K, the
state of their parent super-edge is also K. Then the storage needed for
the bit values of the two siblings can be used to store the bit values of
the parent super-edge because the assigned bit values of the two siblings
are not needed anymore. Therefore, N bits are enough. Thus, the space complexity for the FTD is $O(2^{D_{\max}} N)$.
5.4 Examples
In this section, we give examples of a full decoding process for a Gn -code
of blocklength N = 8. In all examples, we assume the bits U0 , U4 and
U5 to be frozen and the remaining bits to be information bits. Moreover, in Section 5.4.4 we give a realistic numerical example to show that
incorporating future bits can significantly decrease the error probability
of a channel $W_n^{(i)}$.
5.4.1 Successive Cancellation
The SCD is essentially a special case of the FTD. To get a SCD, we
use SPMP and the states in Table 5.3. The FV of the RG for the
states at time 1-5 are shown in Figures 5.8 - 5.12. Decisions (state
transitions from D to K) are made according to (5.36). The strategy used
in this example is called the SC strategy and can easily be generalized to arbitrary blocklengths N and an arbitrary set of frozen bits F. However, there is a subtle difference between the SCD and a FTD with SC strategy. Suppose for example that U3 is additionally frozen. Then when decoding U2 in Figure 5.9, the known value of U3 would be taken into account, but the SCD ignores this value.

t    S0  S1  S2  S3  S4  S5  S6  S7
0    K   N   N   N   K   K   N   N
1    K   D   N   N   K   K   N   N
2    K   K   D   N   K   K   N   N
3    K   K   K   D   K   K   N   N
4    K   K   K   K   K   K   D   N
5    K   K   K   K   K   K   K   D
6    K   K   K   K   K   K   K   K

Table 5.3: An example for the states of a full decoding process of the FTD leading to SCD. The states that changed relative to the previous time step are shown in bold.
5.4.2 SCD with Different Bit Order
The FTD is able to decode the bits U0N −1 of a Gn -code in any order. For
example, the bit decoding order U2 , U0 , U1 , U4 , U5 , U7 , U6 , U3 is obtained
in this example. We use SPMP and the states in Table 5.4. The FV of the
RG for the states at time 1-5 are shown in Figures 5.13 - 5.17. Decisions
are made according to (5.36) but only for the bit that is supposed to be
decoded. Note that although U3 is not decoded before time step 5, the
state S3 is equal to D to consider the known value for U2 in the decisions
on U1 , U7 , U6 . It is not difficult to generalize this strategy to arbitrary bit
decoding orders. However, the complexity of the FTD strongly depends
on the bit decoding order.
5.4.3 Block Maximum a Posteriori
To get a block MAP decoder (2.24), we need to use MPMP and we need
to incorporate all frozen bits in every decision. Although the complexity
for a block MAP decoder is in general impractically high, for this small
example it is easily feasible. One possibility of states leading to a block MAP decoder is given in Table 5.5.
Figure 5.8: The FV of the RG at time step 1 in Table 5.3.
Figure 5.9: The FV of the RG at time step 2 in Table 5.3.
Figure 5.10: The FV of the RG at time step 3 in Table 5.3.
Figure 5.11: The FV of the RG at time step 4 in Table 5.3.
Figure 5.12: The FV of the RG at time step 5 in Table 5.3.
t    S0  S1  S2  S3  S4  S5  S6  S7   Channel
0    K   N   N   N   K   K   N   N
1    K   N   D   N   K   K   N   N    U2 → Y_0^7
2    K   D   K   D   K   K   N   N    U1 → Y_0^7, U0, U2
3    K   K   K   D   K   K   N   D    U7 → Y_0^7, U_0^2, U_4^5
4    K   K   K   D   K   K   D   K    U6 → Y_0^7, U_0^2, U_4^5, U7
5    K   K   K   D   K   K   K   K    U3 → Y_0^7, U_0^2, U_4^7
6    K   K   K   K   K   K   K   K

Table 5.4: An example for the states of a full decoding process of the FTD leading to the bit decoding order U2, U0, U1, U4, U5, U7, U6, U3. Note that because U0, U4 and U5 are assumed to be frozen, we do not need to decode them. The states that changed relative to the previous time step are shown in bold.
Figure 5.13: The FV of the RG at time step 1 in Table 5.4.
Figure 5.14: The FV of the RG at time step 2 in Table 5.4.
Figure 5.15: The FV of the RG at time step 3 in Table 5.4.
Figure 5.16: The FV of the RG at time step 4 in Table 5.4.
Figure 5.17: The FV of the RG at time step 5 in Table 5.4.
t    S0  S1  S2  S3  S4  S5  S6  S7
0    K   N   N   N   K   K   N   N
1    K   D   N   N   K   K   D   N
2    K   K   D   N   K   K   K   D
3    K   K   K   D   K   K   K   K
4    K   K   K   K   K   K   K   K

Table 5.5: An example for the states of a full decoding process of the FTD leading to block MAP decoding. The states that changed relative to the previous time step are shown in bold.
The FV of the RGs for the states
at time 1-3 are shown in Figures 5.18 - 5.20. If decisions are made for
all i ∈ D according to (5.36) then by Example 1.1 it might happen that
we decide for a uD that does not maximize µ(uD ). Therefore, we need
to use either
$$\hat{u}_D = \operatorname*{argmax}_{u_D} \mu(u_D) \qquad (5.39)$$
which corresponds to (1.3) or we successively decide on the bits Ui , i ∈ D,
according to Algorithm 1.
Figure 5.18: The FV of the RG at time step 1 in Table 5.5.
Figure 5.19: The FV of the RG at time step 2 in Table 5.5.
Figure 5.20: The FV of the RG at time step 3 in Table 5.5.
5.4.4 The Benefits of Future Bits: A Numerical Example
Let W be the UBAWGNC with SNR = 2 dB. For n = 11, N = 2048 and i = 127, we get from simulations that $P_e(W_n^{(i)}) = 0.26$. This is too high for $U_i$ to be an information bit. However, this channel has a large potential for improvement because it is the worst channel among all $W_n^{(j)}$, $j \in \mathbb{Z}_N$, with $w_H(j) = 7$ (see Chapter 3). By incorporating some future bits $U_A$, we obtain the improved channel
$$\tilde{W}_n^{(i)} = U_i \to Y_0^{N-1}, U_0^{i-1}, U_A. \qquad (5.40)$$
By using the FTD, we are able to get a numerical estimate of $P_e(\tilde{W}_n^{(i)})$. As an example, suppose F is the index set of the frozen bits of a polar code of blocklength N = 2048 and rate R = 0.5 designed for SNR = 2 dB. Let A be the largest subset of $\{j \in F \,|\, j > 127\}$ such that the FTD with given cardinality of D can incorporate the known bits $U_0^{126}, U_A$, i.e. $K_U = \mathbb{Z}_{127} \cup A$ (see Section 5.6.3 for how such a D can be determined). The error probabilities of $\tilde{W}_n^{(i)}$ for several |D| are shown in Table 5.6.
We see that significant gains can be obtained. However, note that not
all synthesized channels have such a large potential for improvement.
|D|   P_e(W̃_11^(127))
1     2.6 · 10^−1
2     4.8 · 10^−3
3     8.8 · 10^−5
4     9.5 · 10^−6

Table 5.6: The error probabilities of an improved version of $W_{11}^{(127)}$.

5.5 Ideas for Good Strategies
The number of possible FTD strategies is huge, thus this section is supposed to give some intuition on how to design efficient FTD strategies.
5.5.1 “Decode” Frozen Bits First
Suppose we can either decode Ui now or first Uj and then Ui . Decoding
Uj first has a smaller error probability for the decision on Ui , because
the channel
Ui → Y0N −1 , UKU
(5.41)
is stochastically degraded with respect to
Ui → Y0N −1 , UKU , Uj .
(5.42)
However, decoding Uj first has a larger error probability for the decision
on Uj , because the channel
Uj → Y0N −1 , UKU
(5.43)
is stochastically degraded with respect to
Uj → Y0N −1 , UKU , Ui .
(5.44)
But if Uj is a frozen bit, then it is impossible to make errors on Uj and
hence it is beneficial to “decode” Uj first. In general, it is beneficial to
“decode” as many frozen bits as early as possible. In the extreme case,
if all frozen bits are “decoded” before all information bits and if we use
MPMP, then we obtain a block MAP decoder (see Section 5.4.3).
5.5.2 Taking Decisions: Benefits and Drawbacks
In this section, we try to qualitatively understand when it is beneficial
to take a decision and when not. Assume Ui and Uj are information bits
and we can choose between decoding Ui now or first Uj and then Ui . For
the sake of brevity, all probability distributions in this section should be
understood to be conditioned on Y0N −1 = y0N −1 , UKU = ûKU although
this is omitted in the notation.
Suppose that deciding first on Uj leads to the reliable decision ûj , i.e.
the probability pUj (ûj ) is close to 1. We obtain the following bound on
the absolute difference between the probabilities of Ui in the two cases.
$$\left|p_{U_i}(u_i) - p_{U_i|U_j}(u_i \mid \hat{u}_j)\right|$$
$$= \left|\sum_{u_j} p_{U_i|U_j}(u_i \mid u_j)\, p_{U_j}(u_j) - p_{U_i|U_j}(u_i \mid \hat{u}_j)\right| \qquad (5.45)$$
$$= \left|p_{U_i|U_j}(u_i \mid \hat{u}_j)\left(p_{U_j}(\hat{u}_j) - 1\right) + p_{U_i|U_j}(u_i \mid \hat{u}_j \oplus 1)\, p_{U_j}(\hat{u}_j \oplus 1)\right| \qquad (5.46)$$
$$= \left|p_{U_i|U_j}(u_i \mid \hat{u}_j \oplus 1) - p_{U_i|U_j}(u_i \mid \hat{u}_j)\right|\, p_{U_j}(\hat{u}_j \oplus 1) \qquad (5.47)$$
$$\leq p_{U_j}(\hat{u}_j \oplus 1). \qquad (5.48)$$
As pUj (ûj ) is almost 1, pUj (ûj ⊕ 1) = 1 − pUj (ûj ) is almost zero and
therefore the additional known value Uj = ûj changes the distribution
on Ui only marginally. Therefore, roughly speaking, a reliable decision does
not substantially improve the reliability of future decisions.
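The chain (5.45)-(5.48) holds for any joint distribution of two bits. The following small Python sketch (illustrative only) checks the resulting bound numerically on randomly drawn joint distributions.

import random

def check_bound(trials=10_000):
    # For any joint pmf of (U_i, U_j):
    # |p_{U_i}(u_i) - p_{U_i|U_j}(u_i | u_hat_j)| <= p_{U_j}(u_hat_j XOR 1),
    # where u_hat_j is the more likely value of U_j.
    for _ in range(trials):
        p = [random.random() for _ in range(4)]
        s = sum(p)
        p = [x / s for x in p]                                # random joint pmf
        p_joint = {(ui, uj): p[2 * ui + uj] for ui in (0, 1) for uj in (0, 1)}
        p_uj = {uj: p_joint[0, uj] + p_joint[1, uj] for uj in (0, 1)}
        uhat_j = 0 if p_uj[0] >= p_uj[1] else 1
        for ui in (0, 1):
            p_ui = p_joint[ui, 0] + p_joint[ui, 1]
            p_cond = p_joint[ui, uhat_j] / p_uj[uhat_j]
            assert abs(p_ui - p_cond) <= p_uj[uhat_j ^ 1] + 1e-12
    return True

if __name__ == "__main__":
    print(check_bound())   # True: the bound holds on all sampled distributions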
On the other hand, we show by the example in Figure 5.21 that
taking reliable decisions can be indirectly beneficial for future decisions.
Suppose N = 8, KU = {0, 1, 3} and suppose we can choose between
decoding U4 now (see Figure 5.21a) or first U2 and then U4 (see Figure
5.21b). Let us assume the decision on U2 is reliable. Although the decision on U2 does not substantially improve the reliability of the decision
on U4 , it reduces the message passing complexity. Or in other words,
taking reliable decisions may bundle known bit values such that they are
accessible with lower message passing complexity. Then the freed complexity can be used to incorporate more known bits (as for example U7 in
Figure 5.21b). Thus taking decisions that are reliable is recommended.
Now suppose that deciding first on Uj leads to an unreliable decision
ûj , i.e. the probability pUj (ûj ) is not close to 1. In this case, the decision
on Ui might be substantially improved by considering the additional
known value Uj = ûj . However, the decision on Uj is not reliable and
hence we should postpone it until its reliability is improved or until
there is no hope to improve its reliability without exceeding the given
complexity.
Figure 5.21: An example that shows the benefit of taking reliable decisions. (a) $U_4 \to Y_0^7, U_0, U_1, U_3$. (b) $U_4 \to Y_0^7, U_0^3$. The decision on U2 was taken in (b) but not in (a). If we assume that the decision on U2 was reliable, then the reliability of the decision on U4 is in both graphs very similar, but message passing in the TV of the RG corresponding to (b) is more efficient than for (a).
5.5.3 Choose the Bits to Decode with Close Indexes
The integers i, j in Figure 5.4 correspond to the number of parallel superedges in a current layer in the FV of the RG (for example see Figure
5.3). Therefore, in view of the decoding complexity, it is desirable to
choose the set D such that there are as few as possible parallel edges in
the FV of the RG. Suppose D = {0, N − 1}, then we have two parallel
edges at all layers. On the other hand, if D = {0, 1}, then we have one
parallel edge except at the last stage. We can give the rule of thumb
that the decoding complexity tends to be lower if the elements of D are
close to each other, i.e. |i − j| is small for many i, j ∈ D.
5.6 Implemented Strategies
This section describes two specific FTD strategies, i.e. specifications of
Line 10 in Algorithm 3. Note that both strategies can be used with
SPMP or MPMP.
Let Le , KU and KI be defined as in Theorem 5.1. We use the notation
KU (D, K) and KI (D, K) to make the dependency of KU and KI on the
sets D and K visible.
In Section 5.6.1, scores are assigned to all super-edges in the FV
of the FTFG. They are needed in the description of both strategies.
Section 5.6.2 introduces the most known bits of lowest index strategy
(MKBLIS), which incorporates as many known bits of lowest index as
possible without exceeding a given complexity. Section 5.6.3 presents the
most known bits strategy (MKBS), which incorporates as many known
bits as possible without exceeding a given complexity.
5.6.1 Scores
We define for any j ∈ N the set
$$\Delta K_U^{(j)}(D, K) \triangleq K_U(D \cup \{j\}, K) \setminus K_U(D, K) \qquad (5.49)$$
which is the index set of all known but ignored bits UKI (D,K) that would
be used if the state Sj would be set to D. We assign to every super-edge
e in the FV of the FTFG the non-negative score value Te defined by
$$T_e \triangleq \max_{j \in N \cap L_e} \left|\Delta K_U^{(j)}(D, K) \cap L_e\right| \qquad (5.50)$$
where the result of the maximization is defined to be zero if N ∩ Le = ∅.
Therefore, the score Te equals the number of known bits from the subtree
of the super-edge e that can be incorporated maximally by setting the
state of one bit Uj in the subtree of the edge e to D. Note that the score
Te is zero if the super-edge e is in state K, because then N ∩ Le = ∅.
The scores can be computed recursively as is shown in the following
proposition whose proof is given in the Appendix.
Proposition 5.1 (Recursion for Te ). If the super-edge e corresponds to
an Ui , then we have Te = 0. Otherwise, let u, d denote the two children
edges of $e$ with states $S_u, S_d$ and scores $T_u, T_d$. Then we have that
$$T_e = \begin{cases} T_u + |L_d|, & \text{if } (S_u, S_d) = (\mathrm{N}, \mathrm{K}),\\ |L_u| + T_d, & \text{if } (S_u, S_d) = (\mathrm{K}, \mathrm{N}),\\ \max\{T_u, T_d\}, & \text{otherwise}. \end{cases} \qquad (5.51)$$
Note that $|L_u|$ or $|L_d|$ are equal to $2^{n-m}$ where $m$ is the depth of the
edge u or d in the FV of the FTFG, respectively.
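The recursion (5.51) together with (5.4) can be evaluated bottom-up as in the following Python sketch (the heap-style indexing of the FV of the FTFG is an illustrative assumption).

def propagate_scores(leaf_states):
    # Returns states s[] and scores t[] for all super-edges; size[e] = |L_e|.
    n = len(leaf_states)
    s, t, size = [None] * (2 * n), [0] * (2 * n), [0] * (2 * n)
    for i, st in enumerate(leaf_states):
        s[n + i], t[n + i], size[n + i] = st, 0, 1        # T_e = 0 for edges U_i
    for e in range(n - 1, 0, -1):
        up, down = 2 * e, 2 * e + 1
        s[e] = ('D' if 'D' in (s[up], s[down])
                else 'K' if (s[up], s[down]) == ('K', 'K') else 'N')   # rule (5.4)
        size[e] = size[up] + size[down]
        if (s[up], s[down]) == ('N', 'K'):                # rule (5.51)
            t[e] = t[up] + size[down]
        elif (s[up], s[down]) == ('K', 'N'):
            t[e] = size[up] + t[down]
        else:
            t[e] = max(t[up], t[down])
    return s, t

if __name__ == "__main__":
    # Figure 5.22: U_0, U_3, U_6, U_7 frozen (state K), the rest N, no D yet.
    states, scores = propagate_scores(['K', 'N', 'N', 'K', 'N', 'N', 'K', 'K'])
    print(scores[1])   # root score T_r = 2: setting one S_j to D can incorporate at most 2 known bits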
5.6.2 Most Known Bits of Lowest Index Strategy
Let Dmax be a positive integer. The MKBLIS incorporates as many
known bits of lowest index as possible under the constraint |D| ≤ Dmax .
Definition
Let r be the root edge in the FV of the FTFG. Note that Tr = 0 if and
only if all known bits are incorporated, i.e. KU = K. The strategy is
given in Algorithm 4 where iMKBLIS is defined as follows. We start at
the root edge r and sequentially go
• to the up child edge u, if T_u > 0, or if T_u = 0, T_d = 0 and S_u ≠ K,
• to the down child edge d, otherwise.
Then i_MKBLIS is the index of the bit U_i at which we finally end up. A step-by-step
example is given in the next paragraph.
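A minimal Python sketch of this descent is given below. The state and score arrays are assumed to be heap-indexed as in the earlier sketches (an illustrative layout); the hard-coded example reproduces the situation of Figure 5.22.

def i_mkblis(s, t, n):
    # s[e], t[e]: state and score of super-edge e; leaves n..2n-1 correspond to U_0..U_{n-1}.
    e = 1                                         # start at the root edge r
    while e < n:                                  # stop at a leaf edge (a bit U_i)
        up, down = 2 * e, 2 * e + 1
        if t[up] > 0 or (t[up] == 0 and t[down] == 0 and s[up] != 'K'):
            e = up                                # prefer the up child (lower indices)
        else:
            e = down
    return e - n                                  # leaf e corresponds to U_{e-n}

if __name__ == "__main__":
    # States and scores of Figure 5.22 (N = 8), heap-indexed: index 1 is the root,
    # indices 8..15 are the leaf edges U_0..U_7.
    s = [None, 'N', 'N', 'N', 'N', 'N', 'N', 'K', 'K', 'N', 'N', 'K', 'N', 'N', 'K', 'K']
    t = [0,    2,   1,   2,   1,   1,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]
    print(i_mkblis(s, t, 8))    # 1: setting S_1 to D incorporates the known bit U_0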
Algorithm 4 Most Known Bits of Lowest Index Strategy
1: if |D| > 0 then
2:     Compute l_j for all j ∈ D according to (5.33) or (5.35).
3:     i ← argmax_{j∈D} |l_j|
4:     S_i ← K with decision û_i = 0 if l_i ≥ 0, and û_i = 1 otherwise.
5: end if
6: while |D| < D_max and T_r > 0 do
7:     S_{i_MKBLIS} ← D
8: end while
In every iteration of the FTD in Algorithm 3 except the first, the MKBLIS takes a decision only on the most reliable bit among Ui , i ∈ D,
and thus the FTD terminates after |I| + 1 iterations.
Example
Suppose N = 8, Dmax = 2 and U0 , U3 , U6 and U7 are frozen bits. The
MKBLIS leads to the state transitions in Table 5.7. The states and scores
after every state transition of the first execution of the MKBLIS (i.e. the
state transitions between time step 0 and 1) are shown in Figures 5.22 5.24. Note that in Figure 5.24, the set D reached its maximal cardinality
of Dmax = 2 and therefore the first execution of the MKBLIS terminates.
Note that Dmax would need to be at least 3 to additionally incorporate
the known bits U6 and U7 . As claimed, the most known bits with lowest
index U0 , U3 are incorporated without |D| exceeding Dmax .
Complexity
We divide the edges of the FV and the TV of the FTFG into n+1 vertical
layers. We label these layers with numbers in Zn+1 where the layer 0
consists of all edges corresponding to X0N −1 and the layer n consists
of all edges corresponding to U0N −1 . Recall from Section 5.3.4 that all
messages at one layer can be reused if no states change in that layer. All
states either equal K already at the beginning of the decoding process
or do the state transitions N → D → K. Thus, for every state at most
two state transitions occur in the decoding process. But the number of
states at layer l equals 2l and hence at most 2 · 2l state transitions can
occur at layer l. Therefore we need to recalculate the messages at layer l
at most 2 · 2l times. From the TV of the FTFG follows that the number
t    S0  S1  S2  S3  S4  S5  S6  S7
0    K   N   N   K   N   N   K   K
1    K   D   D   K   N   N   K   K
2    K   K   D   K   D   N   K   K
3    K   K   K   K   D   N   K   K
4    K   K   K   K   K   D   K   K
5    K   K   K   K   K   K   K   K

Table 5.7: An example for the states of a full decoding process of the FTD with MKBLIS. We arbitrarily assumed that the most reliable bit from UD is always the bit with lowest index. The states that changed relative to the previous time step are shown in bold.
Figure 5.22: The FV of the FTFG at time step 0 in Table 5.7. All pairs, as for example (N,1), refer to the state and the score of its super-edge. It follows that i_MKBLIS = 1 and thus S1 is set to D, which incorporates the known bit U0.
Figure 5.23: The FV of the FTFG as in Figure 5.22 but with S1 = D. The quantities that changed relative to Figure 5.22 are shown in bold. It follows that i_MKBLIS = 2 and thus S2 is set to D, which additionally incorporates the known bit U3.
Figure 5.24: The FV of the FTFG as in Figure 5.23 but with S2 = D. The quantities that changed relative to Figure 5.23 are shown in bold. It follows that i_MKBLIS = 4, but as |D| = 2, the bound D_max = 2 is already reached and hence the first execution of the MKBLIS terminates.
of messages at layer $l$ equals $2^{n-l}$. Therefore, to decode one block we need to calculate at most
$$\sum_{l=0}^{n} 2 \cdot 2^l \cdot 2^{n-l} = (n+1)\, 2N \qquad (5.52)$$
messages. If we combine this result with the bounds (5.18) and (5.19),
we get a message passing complexity of $O\!\left(2^{2D_{\max}} N \log(N)\right)$.
We derive the complexity of determining iMKBLIS . As the states and
scores can be determined recursively (see (5.4) and (5.51)), a change of
a state Si implies a change of the states and scores only of the ancestor
super-edges of Ui . Thus, at most n + 1 changes of states and scores
are necessary. Because there are at most Dmax + 1 state changes in one
execution of the MKBLIS and exactly |I|+1 executions of the MKBLIS,
the complexity of updating the states and scores is O(Dmax N log(N )).
Moreover, the complexity of determining iMKBLIS once according to the
states and scores is O(log(N )) and this has to be performed at most
Dmax (|I| + 1) times. Thus, updating the states and scores and finding
iMKBLIS has complexity O(Dmax N log(N )).
The space complexity for the scores is O(N ) because there are 2N −1
scores.
Discussion
The bias towards known bits of lowest index comes from going up even
if Tu and Td are both positive. Note further that we go up in the case
Tu = 0, Td = 0, Su ≠ K, Sd ≠ K, but we could also go down and would
still incorporate the most known bits of lowest index.
The following proposition, whose proof is given in the Appendix,
justifies why we call this strategy MKBLIS.
Proposition 5.2. Let C(D, K) be the number of contiguous known bits
of lowest index in KU (D, K):
$$K^{<m} \triangleq \{i \in K \,|\, i < m\}, \qquad (5.53)$$
$$C(D, K) \triangleq \max_{\substack{m \geq 0\\ K^{<m} \subset K_U(D,K)}} \left|K^{<m}\right|. \qquad (5.54)$$
After every execution of the MKBLIS, the set D maximizes C(D, K)
among all D ⊂ ZN \K with |D| ≤ Dmax .
The MKBLIS corresponds to SCD but with a possibly different bit
decoding order (as in the example of Section 5.4.2). This follows from
the proof of Proposition 5.2 which implies that KU in dependence of
time is a sequence of increasing sets and the decisions are incorporated
just after they are made. However, the bit decoding order is not fixed
but depends on y0N −1 because the reliabilities of the bits UD depend on
y0N −1 .
Note that if the Gn -code is a polar code for some transmitting channel
W 0 , then the MKBLIS with Dmax = 1 is the same as the SC strategy in
Section 5.4.1.
Intuition Behind MKBLIS
When the SCD decides on Ui , then all bits U0i−1 are incorporated. When
the FTD with MKBLIS decides on Ui , then all known bits from U0i−1
are incorporated and often more, i.e. K ∩ Zi ⊂ KU . But the bits Uj , j ∈
Zi \KU , which are incorporated by the SCD but not by the FTD, were
either unreliable decisions or if they were reliable then they would not
significantly improve the reliability of the decision on Ui (see Section
5.5.2). Thus the FTD with MKBLIS can be viewed as an improved
SCD where the improvement comes from the additionally incorporated
bits KU \Zi and from only deciding on the bit Ui , i ∈ D, with highest
reliability.
Moreover, as the set D for the MKBLIS tends to contain close indexes, Section 5.5.3 indicates that the decoding complexity should be
relatively low for a FTD with |D| = Dmax .
5.6.3 Most Known Bits Strategy
The most known bits strategy (MKBS) incorporates as many known bits
as possible under the constraint |D| ≤ Dmax .
Definition
The MKBS equals the MKBLIS in Algorithm 4 but with iMKBLIS replaced by iMKBS , which is defined as follows. We start at the root edge
r and sequentially go
• to the up child edge u, if Tu > Td or if Tu = Td and Su ≠ K,
• to the down child edge d, otherwise.
Then i_MKBS is the index of the bit at which we finally end up. As for the MKBLIS,
the MKBS terminates after |I| + 1 iterations.
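Only the branching condition changes compared to the MKBLIS descent, as the following minimal Python sketch shows (heap-style indexing as in the earlier sketches; illustrative only).

def i_mkbs(s, t, n):
    # s[e], t[e]: state and score of super-edge e; leaves n..2n-1 correspond to U_0..U_{n-1}.
    e = 1
    while e < n:
        up, down = 2 * e, 2 * e + 1
        if t[up] > t[down] or (t[up] == t[down] and s[up] != 'K'):
            e = up
        else:
            e = down
    return e - n

if __name__ == "__main__":
    # Same states and scores as in the MKBLIS sketch (Figure 5.22, N = 8):
    s = [None, 'N', 'N', 'N', 'N', 'N', 'N', 'K', 'K', 'N', 'N', 'K', 'N', 'N', 'K', 'K']
    t = [0,    2,   1,   2,   1,   1,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]
    print(i_mkbs(s, t, 8))   # 4: setting S_4 to D incorporates the known bits U_6 and U_7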
Complexity
As for the MKBLIS, updating the scores and states and determining $i_{\mathrm{MKBS}}$ has complexity $O(D_{\max} N \log(N))$ and the message passing complexity is $O\!\left(2^{2D_{\max}} N \log(N)\right)$.
Discussion
The bias towards incorporating the most known bits comes from going
up if Tu > Td and going down if Td > Tu . Note that we go up in the
case Tu = Td , Su ≠ K, Sd ≠ K, but we could also go down and would
still incorporate the most known bits.
The following proposition, whose proof is given in the Appendix,
justifies why we call this strategy MKBS.
Proposition 5.3. After every execution of the MKBS, the set D maximizes |KU (D, K)| among all D ⊂ ZN \K with |D| ≤ Dmax .
The MKBS corresponds to SCD but with a possibly different bit
decoding order (as in the example of Section 5.4.2). This follows from
the proof of Proposition 5.3 which implies that KU in dependence of
time is a sequence of increasing sets and the decisions are incorporated
just after they are made. However, the bit decoding order is not fixed
but depends on y0N −1 because the reliabilities of the bits UD depend on
y0N −1 .
Note that if the Gn -code is a polar code for some transmitting channel
W 0 , then the MKBS with Dmax = 1 is the same as the SC strategy in
Section 5.4.1.
Intuition Behind MKBS
The MKBS exploits the idea in Section 5.5.1. Note that in contrast to
the MKBLIS, when the MKBS decides on the bit Ui there is no guarantee
that all known bits from U0i−1 are incorporated.
5.7 Simulation Results
We present simulation results for the FTD with MKBLIS and MKBS for
the transmitting channel UBAWGNC. In order to improve the perfor-
mance under the FTD, we use the idea from [21] and consider also polar
codes designed for the channel UBAWGNC$(\alpha\sigma_0^2)$ for various $\alpha \in (0, 1]$. Note that for $\alpha \to 0$, such a code approaches a Reed-Muller code [21]. We call $1/\sigma_0^2$ the base SNR and $1/(\alpha\sigma_0^2)$ the design SNR. The term SNR refers to the SNR of the transmitting channel UBAWGNC$(\sigma^2)$ for which the polar code is used.
We measure the decoding complexity by counting the average number
of multiplications (5.12) and additions (or maximizations) (5.13) that
are needed in total to obtain the message µ(uD ). Moreover, we use the
procedure from Section 1.4.1 to obtain a lower bound on the block error
probability of the block MAP decoder and refer to this bound as the
MAP bound.
5.7.1 MKBLIS and MKBS
In this section, we compare the MKBLIS and the MKBS. We use a polar
code of rate R = 0.5, blocklength N = 2048 and base SNR = 2 dB.
For α = 1.0, the block error probability and the decoding complexity
of the FTD with MKBLIS and MKBS are shown in Figures 5.25 and
5.26, respectively. For high enough SNR, both strategies approach the
MAP bound. The decoding complexity grows exponentially with Dmax
for both strategies. Increasing Dmax by one increases the complexity by
a factor of about 2.5 which is less than the factor of 4 suggested by (5.18)
and (5.19). Note that the decoding complexity does not depend heavily
on the SNR. For the same Dmax , the MKBS strategy performs better
than the MKBLIS strategy but the MKBS has also higher decoding
complexity.
For α = 0.7, the block error probability and the decoding complexity
of the FTD with MKBLIS and MKBS are shown in Figures 5.27 and
5.28, respectively. Compared to α = 1.0, we see a marked performance
improvement at high SNR without a significant change in decoding complexity. Moreover, if we use Dmax = 5 for the MKBLIS and Dmax = 4
for the MKBS, then both have a similar decoding complexity but the
MKBS has lower block error probability. The same could also be observed for other parameters and hence the MKBS has a better trade-off
between complexity and performance.
Figure 5.29 shows the block error probability and the decoding complexity for the FTD with MKBS for various α. We see that for high
SNRs it is better to use a code with a lower α. Note that the decoding
complexity does not depend heavily on α.
Figure 5.25: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 2 dB under the FTD with MKBLIS and SPMP.
Figure 5.26: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 2 dB under the FTD with MKBS and SPMP.
Figure 5.27: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with MKBLIS and SPMP.
Figure 5.28: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with MKBS and SPMP.
Figure 5.29: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and base SNR = 2 dB under the FTD with MKBS, D_max = 5 and SPMP, for $\alpha \in \{0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$.
5.7.2 Sum-Product and Max-Product
In this section, we compare SPMP in (5.10) with MPMP in (5.11). We
use a polar code of rate R = 0.5, blocklength N = 2048, base SNR = 2
dB and α = 0.7. The block error probability and the complexity of the
FTD with MKBS and MPMP are shown in Figure 5.30. We see that the
block error probability is slightly worse than for SPMP in Figure 5.28,
whereas the decoding complexity is almost identical (if we assume that
an addition has the same complexity as a maximization).
5.7.3 Varying the Rate
In this section, we compare the FTD with MKBS for polar codes of rates
R = 0.05 and R = 0.95. The block error probability and the decoding
complexity of the FTD with MKBS are shown in Figure 5.31 for the rate
R = 0.05 with base SNR = -9 dB and α = 0.3 and in Figure 5.32 for the
rate R = 0.95 with base SNR = 8 dB and α = 0.7. In both cases, the
parameter α was chosen to optimize the performance. It can be seen
that also for low and high rates the FTD achieves substantially better
performance than SCD. Moreover, for R = 0.05, the decoding complexity
does not grow as fast with Dmax as for R = 0.5 or R = 0.95. One possible
explanation is that the FTD with MKBS or MKBLIS increases the set D
only if not all known bits are incorporated (i.e. Tr > 0). When the FTD
reaches the phase where most of the bits are known, then all known bits
can be incorporated also with a D of cardinality smaller than Dmax . A
code with a small rate is relatively longer in that phase than a code with
a high rate. Another possible explanation is that more bits are frozen for
a code with low rate and thus the message passing complexity is closer
to the lower bounds in (5.14) and (5.15).
Table 5.8 shows the minimal Dmax that is needed by the FTD to be
a block MAP decoder. We see that the minimal Dmax increases with
decreasing α. This is not a big surprise as Reed-Muller codes, which are
approached for α → 0, are more powerful than polar codes. Moreover,
we can see that the minimal Dmax is significantly larger for R = 0.5 than
for high or low rates.
5.7.4 Varying the Blocklength
In this section, we compare the FTD with MKBS for polar codes of
various blocklengths. We use polar codes of rate R = 0.5 and base SNR
= 2 dB.
Figure 5.30: The block error probability and the decoding complexity for a polar code of rate R = 0.5, blocklength N = 2048 and design SNR = 3.549 dB under the FTD with MKBS and MPMP.
[Plots: block error probability vs. SNR (dB) for Dmax = 1, . . . , 6 and the MAP bound; number of multiplications and additions per block at SNR = -10 dB and -8 dB vs. Dmax.]
Figure 5.31: The block error probability and the decoding complexity
for a polar code of rate R = 0.05, blocklength N = 2048
and design SNR = -3.7712 dB under the FTD with MKBS
and SPMP.
[Plots: block error probability vs. SNR (dB) for Dmax = 1, . . . , 6 and the MAP bound; number of multiplications and additions per block at SNR = 7 dB and 9 dB vs. Dmax.]
Figure 5.32: The block error probability and the decoding complexity
for a polar code of rate R = 0.95, blocklength N = 2048
and design SNR = 9.5490 dB under the FTD with MKBS
and SPMP.
              R = 0.05   R = 0.5   R = 0.95
    α = 1.0      24         93        28
    α = 0.9      24        102        32
    α = 0.8      26        110        36
    α = 0.7      27        120        41
    α = 0.6      28        135        44

Table 5.8: The minimal Dmax that is needed by the FTD to be a block
MAP decoder for a polar code of blocklength N = 2048.
The base SNRs are -9 dB, 2 dB and 8 dB for the rates 0.05,
0.5 and 0.95, respectively.
              N = 256   N = 2048   N = 8192
    α = 1.0      20         93        302
    α = 0.9      20        102        341
    α = 0.8      21        110        385
    α = 0.7      21        120        431
    α = 0.6      22        135        487

Table 5.9: The minimal Dmax that is needed by the FTD to be a block
MAP decoder for a polar code of rate R = 0.5 and base
SNR = 2 dB.
The block error probability and the decoding complexity of the FTD
with MKBS for N = 256 and α = 0.1 are shown in Figure 5.33. We see
that even for α = 0.1 the FTD with MKBS approaches the MAP bound.
The block error probability and the decoding complexity of the FTD
with MKBS for N = 8192 and α = 1.0 are shown in Figure 5.34. We see
that for high enough SNR the FTD with MKBS is able to approach the
MAP bound for α = 1.0. However, the larger the blocklength, the more
difficult it is to achieve the MAP bound with manageable complexity.
On the other hand, it can also be seen in Figure 5.34 that the performance gain obtained
by increasing Dmax is much larger than for smaller blocklengths.
Table 5.9 shows the minimal Dmax that is needed by the FTD to be
a block MAP decoder. It can be seen that the minimal Dmax increases
markedly with increasing blocklength.
5.7.5 Experiments
In this section, we slightly modify the MKBS and examine the effect. We
use a polar code of rate R = 0.5, blocklength N = 2048, base SNR = 2
dB and α = 0.7.
[Plots: block error probability vs. SNR (dB) for Dmax = 1, . . . , 6 and the MAP bound; number of multiplications and additions per block at SNR = 1.5 dB and 3.5 dB vs. Dmax.]
Figure 5.33: The block error probability and the decoding complexity
for a polar code of rate R = 0.5, blocklength N = 256
and design SNR = 12 dB under the FTD with MKBS
and SPMP.
[Plots: block error probability vs. SNR (dB) for Dmax = 1, . . . , 4 and the MAP bound; number of multiplications and additions per block at SNR = 1 dB and 2.5 dB vs. Dmax.]
Figure 5.34: The block error probability and the decoding complexity
for a polar code of rate R = 0.5, blocklength N = 8192
and design SNR = 2 dB under the FTD with MKBS and
SPMP.
Our first experiment is to decide on the bit Ui , i ∈ D, with lowest
index i instead of highest reliability. Figure 5.35 shows the block error probability and the decoding complexity of the FTD with modified
MKBS. It can be seen that the modification applied to the MKBS substantially deteriorates the performance. Thus, it is very beneficial for the
FTD to be able to postpone the decisions on the less reliable bits until
more frozen bits are incorporated. Note that the modification applied to
the MKBS reduced the decoding complexity by a factor of almost two
for high Dmax .
Our second experiment is to decide on the two most reliable bits from
Ui , i ∈ D. The intention is to get a better trade-off between performance
and decoding complexity. Figure 5.36 shows the block error probability
and the decoding complexity for the FTD with modified MKBS. It can
be seen that the modified MKBS with Dmax = k has a block error probability
almost identical to that of the MKBS with Dmax = k − 1 (see Figure
5.28). However, the decoding complexity of the modified MKBS with
Dmax = k is higher than the decoding complexity of the MKBS with
Dmax = k − 1. Thus, the modified MKBS has a worse trade-off between
decoding complexity and performance than the MKBS.
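To make the three decision rules explicit, the following Python sketch shows only the selection step. The reliabilities li from (5.33)/(5.35) and the rest of the FTD are assumed to be given; all names are illustrative.

    def bits_to_decide(D, reliability, rule="most_reliable"):
        # D: indices i of the bits currently in state D.
        # reliability: maps an index i to its reliability l_i (assumed given).
        if rule == "most_reliable":        # the MKBS as used above
            return [max(D, key=lambda i: reliability[i])]
        if rule == "lowest_index":         # first experiment
            return [min(D)]
        if rule == "two_most_reliable":    # second experiment
            return sorted(D, key=lambda i: reliability[i], reverse=True)[:2]
        raise ValueError(rule)

    # Illustration with made-up reliabilities:
    D = [5, 9, 12]
    rel = {5: 0.2, 9: 1.7, 12: 0.9}
    print(bits_to_decide(D, rel, "most_reliable"))      # [9]
    print(bits_to_decide(D, rel, "lowest_index"))       # [5]
    print(bits_to_decide(D, rel, "two_most_reliable"))  # [9, 12]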
5.7.6 Comparison to the Successive Cancellation List Decoder
In this section, we compare the SCLD and the FTD with MKBS. The
SCLD can also achieve block MAP performance of polar codes with a
manageable complexity [35]. We give two extreme examples to show
that the SCLD and the FTD have different strengths and weaknesses.
Suppose that at some stage in the FTD, the message µ(uD) is {0, 1}-valued and equals 1 if and only if
uD ∈ {(0, 0, . . . , 0), (1, 1, . . . , 1)}.
(5.55)
From (5.33) or (5.35), we obtain that li = 0 for all i ∈ D. Thus, although
only two out of 2^|D| probabilities are positive, all possible decisions are
with list size L = 2 would be able to proceed without taking unreliable
decisions.
On the other hand, suppose û_0^{i−1} are correct decisions and that there
are m consecutive information bits U_i^{i+m−1} with probabilities

    p(u_i^{i+m−1} | y_0^{N−1}, û_0^{i−1}) = 1/2^m    (5.56)
[Plots: block error probability vs. SNR (dB) for Dmax = 1, . . . , 6 and the MAP bound; number of multiplications and additions per block at SNR = 1 dB and 3 dB vs. Dmax.]
Figure 5.35: The block error probability and the decoding complexity
for a polar code of rate R = 0.5, blocklength N = 2048
and design SNR = 3.549 dB under the FTD with modified MKBS and SPMP. The modified MKBS decides on
the bit Ui , i ∈ D, with lowest index i instead of highest
reliability.
[Plots: block error probability vs. SNR (dB) for Dmax = 1, . . . , 6 and the MAP bound; number of multiplications and additions per block at SNR = 1 dB and 3 dB vs. Dmax.]
Figure 5.36: The block error probability and the decoding complexity
for a polar code of rate R = 0.5, blocklength N = 2048
and design SNR = 3.549 dB under the FTD with modified
MKBS and SPMP. The modified MKBS decides on the
two most reliable bits from Ui , i ∈ D.
for all u_i^{i+m−1} ∈ Z_2^m. Then, the SCLD with list size L < 2^m is likely to
prune the correct path. However, the FTD is likely to have only one state
of the bits U_i^{i+m−1} equal to D and can use the remaining complexity to
incorporate additional frozen bits which make the decisions on U_i^{i+m−1}
more reliable.
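As a purely illustrative numerical example (the values are not taken from the thesis): for m = 5, all 2^5 = 32 patterns u_i^{i+4} are equally likely, so an SCLD with list size L = 16 can keep at most half of them and prunes the correct path with probability at least 1 − 16/32 = 1/2.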
We compare SCLD and the FTD with MKBS in terms of performance
and decoding complexity. We use polar codes of rate R = 0.5 and base
SNR = 2 dB. The block error probability and the decoding complexity of
the SCLD for the blocklength N = 2048 and α = 0.7 are shown in Figure
5.37. Note that the decoding complexity for the SCLD is independent of
the SNR. The SCLD with L = 16 needs 830’080 real multiplications and
259’940 real additions per block. The FTD with MKBS and Dmax = 4
needs on average 844’960 real multiplications and 640’150 real additions
per block. Thus, the two decoders have roughly the same decoding
complexity. Figure 5.38 shows the block error probability of the two
decoders. It can be seen that the SCLD is better than the FTD with
MKBS.
Figure 5.39 shows the block error probability of the two decoders
for the blocklength N = 8192 and α = 0.8. It can be seen that the
performance difference is similar to that in Figure 5.38. Qualitatively, the same
behavior was also observed for other parameters.
5.8 Conclusion and Outlook
We introduced the FTD, which is a generalization of the SCD. We gave two
specific strategies for the FTD which led to substantially better performance than the SCD. For many parameters, the FTD approaches the block
MAP performance of polar codes with manageable complexity. For a
fixed decoding complexity, the performance of the FTD is not quite as
good as that of the SCLD. Nevertheless, as the FTD is very flexible,
further research on it is promising.
We think that designing a Gn-code specifically for an FTD strategy
is a very promising way of improving the performance of the FTD. For
example, suppose we first fix a bit decoding order such that the FTD is
able to decode in this order without violating |D| ≤ Dmax. Then we determine the quality of the resulting bit channels in (5.37) by simulations
and use the best ⌈NR⌉ channels to transmit information.
An interesting but very complicated problem is to determine which
known but ignored bits UKI should be incorporated such that the reliability of the bits UD is increased the most.
[Plots: block error probability vs. SNR (dB) for list sizes L = 1, 2, 4, 8, 16, 32 and the MAP bound; number of multiplications and additions per block vs. L.]
Figure 5.37: The block error probability and the decoding complexity
for a polar code of rate R = 0.5, blocklength N = 2048
and design SNR = 3.549 dB under SCLD with various
list sizes L.
[Plot: block error probability vs. SNR (dB) for the FTD with MKBS and Dmax = 4 and for the SCLD with L = 16.]
Figure 5.38: The block error probability for a polar code of rate R =
0.5, blocklength N = 2048 and design SNR = 3.549 dB
under SCLD and FTD with MKBS of similar decoding
complexity.
[Plot: block error probability vs. SNR (dB) for the FTD with MKBS and Dmax = 4 and for the SCLD with L = 16.]
Figure 5.39: The block error probability for a polar code of rate R =
0.5, blocklength N = 8192 and design SNR = 2.9691 dB
under SCLD and FTD with MKBS of similar decoding
complexity.
Solving this problem would
be very useful for improving the performance of the FTD. Note that
this problem is related to the problem of determining the quality of the
channels in (5.37). One could try to use the combined partial order from
Chapter 3 to obtain bounds for these channels.
The SCLD and the FTD work very differently and one could call them
orthogonal to each other. The discussion in Section 5.7.6 suggests that
a combination of the FTD and the SCLD would result in a decoder
superior to both.
The SPMP complexity of the FTD can be reduced by using the FFT
trick mentioned in [20, Sec. 3.3]. It would be interesting to determine
how much can be saved and whether this would close the gap to the
SCLD. Moreover, we think that the FTD is easier to build if we use this
FFT trick.
If the bit decisions for |D| = k are reliable, then it is not necessary to
use |D| > k. Thus, one could adjust |D| to the reliability of the decisions,
which would save some decoding complexity. It would be interesting to
determine how much can be saved. Note that such a mechanism
was also given in [18] and [5] for the SCLD.
The space complexity and the decoding complexity of the FTD grow
exponentially with Dmax. However, there might be a way to approximate
messages that live on a large alphabet such that the space complexity and
the time complexity are reduced without deteriorating the performance
too much. As a result, we could use a higher Dmax.
The FTD is applicable not only to Gn-codes but also to other codes. For example, consider the DFT-codes from Chapter 4. One can derive a cycle-free
front view and top view of the factor graph in Figure 4.2 and thus the
FTD is also applicable to DFT-codes. Although the large alphabet size
of a DFT-code of practical blocklength prohibits |D| > 1, there
is still the freedom of which i ∈ I is chosen to be in D. Moreover, it
might even be possible to use a set D with more than one element by
approximating the messages as mentioned above. Note
that the FTD is applicable also to all codes discussed in the first part of
Section 2.7.2.
Finally, the trade-off between decoding complexity and performance
might be better if we use several FTDs with different strategies and with
lower Dmax in parallel. For example, one could even use the FTD many
times with the same strategy but each time on a different representation
of the Gn-code obtained from Lemma 3.1 (see also [11, Sec. II.B]).
Appendix A
Proofs
A.1 Proof of Proposition 1.2
This proposition is proved along the same line as [36, Lemma 3]. Let Wi
be the channel Xi → Yi for i = 1, 2. The condition W1 ⪯ W2 implies
the existence of distributions p(y1|y2) such that

    pX1,Y1(x, y1) = Σ_{y2} p(y1|y2) pX2,Y2(x, y2).    (A.1)

Define Y1′ such that

    pX2,Y2,Y1′(x2, y2, y1′) = p(y1′|y2) pX2,Y2(x2, y2).    (A.2)

Then X2, Y2, Y1′ form a Markov chain in that order according to [7, p.
34]. The data-processing inequality [7, Th. 2.8.1] implies

    I(W2) = I(X2; Y2) ≥ I(X2; Y1′).    (A.3)

But X2, Y1′ have the same joint distribution as X1, Y1 and thus we get
I(X2; Y1′) = I(W1), which proves (1.23). The relation (1.24) follows from
(1.23) because (1.16) implies H(X1) = H(X2) and thus

    H(W1) = H(X1|Y1)           (A.4)
          = H(X1) − I(W1)      (A.5)
          ≥ H(X1) − I(W2)      (A.6)
          = H(X2) − I(W2)      (A.7)
          = H(W2).             (A.8)
Finally, (1.25) follows from

    Pe(W1) = P[X2 ≠ X̂2(Y1′)]                                               (A.9)
           = 1 − P[X2 = X̂2(Y1′)]                                           (A.10)
           = 1 − Σ_{y1′} max_{x2} { Σ_{y2} pX2,Y2,Y1′(x2, y2, y1′) }         (A.11)
           ≥ 1 − Σ_{y1′} Σ_{y2} max_{x2} pX2,Y2,Y1′(x2, y2, y1′)             (A.12)
           = 1 − Σ_{y2} max_{x2}{pX2,Y2(x2, y2)} Σ_{y1′} p(y1′|y2)           (A.13)
           = 1 − Σ_{y2} max_{x2}{pX2,Y2(x2, y2)}                             (A.14)
           = 1 − P[X2 = X̂2(Y2)]                                             (A.15)
           = Pe(W2)                                                          (A.16)

where we used for (A.9) that X2, Y1′ have the same joint distribution as
X1, Y1 and we used for (A.12) the inequality

    max_x Σ_y g(x, y) ≤ Σ_y max_x g(x, y).    (A.17)

A.2 Proof of Proposition 1.5
We first prove all equivalences without assuming an additional factor
graph. Let fl and fr denote the functions that are represented by the
left hand side and the right hand side in Figure 1.4, respectively. In
Figure 1.4a, we have

    fl(x, y, z) = δz,x⊕y    (A.18)
    fr(x, z) = 1            (A.19)

For every pair (x, z), there is exactly one y such that x ⊕ y = z and
therefore

    Σ_y fl(x, y, z) = 1    (A.20)
which proves Figure 1.4a. In Figure 1.4b, we have

    fl(x, y, z) = δy,y′ δz,x⊕y    (A.21)
    fr(x, z) = δz,σ(x)            (A.22)

where σ(x) = x ⊕ y′. We get

    Σ_y fl(x, y, z) = δz,x⊕y′    (A.23)

which proves Figure 1.4b. In Figure 1.4c, we have

    fl(x, y, z) = δx,y δy,z    (A.24)
    fr(x, z) = δx,z            (A.25)

We get

    Σ_y fl(x, y, z) = δx,z    (A.26)

which proves Figure 1.4c. In Figure 1.4d, we have

    fl(x, y, z) = δy,y′ δx,y δy,z    (A.27)
    fr(x, z) = δx,y′ δz,y′           (A.28)

We get

    Σ_y fl(x, y, z) = δx,y′ δy′,z    (A.29)

which proves Figure 1.4d. In Figure 1.4e, we have

    fl(x, y) = δx,σ(y)    (A.30)
    fr(x) = 1             (A.31)

For every x, there is exactly one y such that x = σ(y) and therefore

    Σ_y fl(x, y) = 1    (A.32)

which proves Figure 1.4e. In Figure 1.4f, we have

    fl(x, y) = δy,y′ δx,σ(y)    (A.33)
    fr(x) = δx,σ(y′)            (A.34)
We get

    Σ_y fl(x, y) = δx,σ(y′)    (A.35)
which proves Figure 1.4f.
The second claim of this proposition follows readily by noting that
an additional factor independent of y can be factored out of the sum
over y.
A.3 Proof of Proposition 3.1
Let j0, j1, . . . , jn−1 and i0, i1, . . . , in−1 denote the binary representations
of j and i, respectively. We define Jb ≜ {o ∈ Zn | jo = b} and Ib ≜ {o ∈
Zn | io = b} for b ∈ {0, 1}. The first condition of Lemma 3.2 implies that
jo = iπ(o) for o ∈ Zn and thus

    π(J1) = I1,    (A.36)
    π(J0) = I0.    (A.37)
Let t ≜ wH(i) = wH(j). Let l0, l1, . . . , lt−1 and m0, m1, . . . , mt−1
denote the elements of J1 and I1 in ascending order, respectively. We
first prove that lk ≤ mk for all k ∈ Zt by contradiction. Assume there
exists a k ∈ Zt such that lk > mk and consider the four sets

    J1^{≥lk} ≜ {o ∈ J1 | o ≥ lk},    (A.38)
    I1^{≥lk} ≜ {o ∈ I1 | o ≥ lk},    (A.39)
    J0^{<lk} ≜ {o ∈ J0 | o < lk},    (A.40)
    I0^{<lk} ≜ {o ∈ I0 | o < lk}.    (A.41)

The assumption lk > mk implies

    |I1^{≥lk}| < |J1^{≥lk}|.    (A.42)

Then by (A.36) there exists a q ∈ J1^{≥lk} such that π(q) < lk. Moreover,
the assumption lk > mk implies

    |I0^{<lk}| < |J0^{<lk}|.    (A.43)

Then by (A.37) there exists a q′ ∈ J0^{<lk} such that π(q′) ≥ lk. Thus we
have

    q > q′,           (A.44)
    π(q) < π(q′).     (A.45)

Let v ≜ j − 2^q + 2^{q′}. Since jq = 1 and jq′ = 0, the binary representation
v0, v1, . . . , vn−1 of v is given by
1. vq = 0
2. vq′ = 1
3. For all o ∈ Zn\{q, q′}: vo = jo
Moreover, from (A.44) follows that 0 ≤ v < j. But
    s_{π^{−1}}(v) = Σ_{m=0}^{n−1} v_{π^{−1}(m)} 2^m                              (A.46)
                  = Σ_{m′=0}^{n−1} v_{m′} 2^{π(m′)}                               (A.47)
                  = ( Σ_{m′=0}^{n−1} j_{m′} 2^{π(m′)} ) − 2^{π(q)} + 2^{π(q′)}    (A.48)
                  = i − 2^{π(q)} + 2^{π(q′)}                                      (A.49)

and with (A.45) we get s_{π^{−1}}(v) > i which is a contradiction to the
second condition of Lemma 3.2. Thus, we can conclude that lk ≤ mk
for all k ∈ Zt. Algorithm 5 constructs a sequence a0, a1, . . . , ak as in
Definition 3.6 which proves Wn(j) ⪯1 Wn(i).
Algorithm 5 Generate a0, a1, . . . , ak according to Definition 3.6
1: a0 ← j
2: k ← 0
3: for o = t − 1 to 0 do
4:    if lo < mo then
5:       k ← k + 1
6:       ak ← ak−1 − 2^lo + 2^mo
7:    end if
8: end for
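A minimal Python sketch of Algorithm 5, assuming that bit index 0 denotes the least significant bit as in the proof; the function name is illustrative.

    def algorithm5(j, i, n):
        # Positions of the ones of j and i in ascending order,
        # i.e. l_0 < ... < l_{t-1} and m_0 < ... < m_{t-1}.
        l = [o for o in range(n) if (j >> o) & 1]
        m = [o for o in range(n) if (i >> o) & 1]
        assert len(l) == len(m)               # wH(j) = wH(i) is assumed
        a = [j]
        for o in range(len(l) - 1, -1, -1):   # o = t-1 down to 0
            if l[o] < m[o]:
                a.append(a[-1] - 2 ** l[o] + 2 ** m[o])
        return a

    # Example: j = 0b0011, i = 0b0110 gives the chain [3, 5, 6].
    print(algorithm5(0b0011, 0b0110, 4))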
A.4 Proof of Proposition 3.2
We need the following weight function

    w(i) ≜ Σ_{k=0}^{n−1} i_k · k    (A.50)

where i0, i1, . . . , in−1 is the binary representation of i. If j % i then
w(j) < w(i) and therefore we get

    Wn(j) ≺1 Wn(i) ⇒ w(j) < w(i).    (A.51)
For the ⇐ part, we need to prove
1. Wn(j) ≺1 Wn(i),
2. Wn(j) ⪯1 Wn(m) ≺1 Wn(i) ⇒ Wn(m) = Wn(j).
The condition j % i implies Wn(j) ≺1 Wn(i). Assume by contradiction
that there is a channel Wn(m) such that Wn(j) ≺1 Wn(m) ≺1 Wn(i). By
(A.51), we have w(j) < w(m) < w(i). But the weight w(·) is integer
valued and there is no integer between w(j) and w(i) = w(j) + 1, which
is a contradiction.
Let us prove the ⇒ part. Since Wn(j) is covered by Wn(i) we have
Wn(j) ≺1 Wn(i). Let a0, a1, . . . , ak be a sequence as in Definition 3.6 corresponding
to Wn(j) ≺1 Wn(i). This sequence contains at least two elements,
otherwise Wn(j) = Wn(i). Moreover, it contains at most two elements, otherwise for any m ∈ {1, . . . , k − 1}

    Wn(j) ≺1 Wn(am) ≺1 Wn(i)    (A.52)

which contradicts that Wn(j) is covered by Wn(i). Therefore, the sequence
consists of exactly two elements and hence we get j = a0 % a1 = i. It
remains to prove that l′ = l + 1. Let k be the smallest integer larger
than l such that ik = 1, which exists because il′ = 1. It follows that
ik−1 = 0. The proof can be divided into the following three cases
1. Case 1: l = k − 1, k = l′
2. Case 2: l = k − 1, k < l′
3. Case 3: l < k − 1
    (a) Case 2
           l   k   l′
      j    1   1   0
      m    1   0   1
      i    0   1   1

    (b) Case 3
           l   k−1   l′
      j    1    0    0
      m    0    1    0
      i    0    0    1

Table A.1: The bits of j, m, i ∈ ZN with bit indexes l, k − 1, k, l′ ∈ Zn
are shown for the cases 2 and 3. All bits of m not listed in
the tables are equal to the bits of j (and i). In both cases,
we have j % m % i.
whereas the first case corresponds to l′ = l + 1. Table A.1 shows that
for the cases 2 and 3 there exists an m ∈ ZN such that

    Wn(j) ≺1 Wn(m) ≺1 Wn(i).    (A.53)

But this contradicts that Wn(j) is covered by Wn(i) and therefore l′ = l + 1.
A.5 Proof of Proposition 3.6
We first prove the ⇒ part. Since Wn(j) →c Wn(i) we have Wn(j) ≺c
Wn(i). Let a0, a1, . . . , ak be a sequence as in Definition 3.8 corresponding
to Wn(j) ≺c Wn(i). This sequence a0, a1, . . . , ak contains at least two
elements, otherwise Wn(j) = Wn(i). Moreover, it contains at most two
elements, otherwise for any m ∈ {1, . . . , k − 1}

    Wn(j) ≺c Wn(am) ≺c Wn(i)    (A.54)

which contradicts Wn(j) →c Wn(i). Thus, the sequence consists of exactly
two elements and hence we get either Wn(j) →1 Wn(i) or Wn(j) →2 Wn(i).
If Wn(j) →2 Wn(i) then it follows that there exists a bit index l ∈ Zn such
that jl = 0, il = 1 and for all k ∈ Zn\{l} : jk = ik. We want to show
that l ≠ 0 leads to a contradiction which would conclude the proof of
the ⇒ part. Hence, assume l ≠ 0 and let us consider the following two
cases
1. Case 1: j0 = 0
2. Case 2: j0 = 1.
    (a) Case 1
           0   l
      j    0   0
      m    1   0
      i    0   1

    (b) Case 2
           0   l
      j    1   0
      m    0   1
      i    1   1

Table A.2: The bits of j, m, i ∈ ZN with bit indexes 0, l ∈ Zn are shown
for the cases 1 and 2. All bits of m not listed in the tables
are equal to the bits of j (and i).
If j0 = 0, let m ∈ ZN be defined as in Table A.2a. It follows Wn(j) →2
Wn(m) and hence Wn(j) ≺c Wn(m). Moreover, we have m % i and hence
Wn(m) ≺1 Wn(i) which implies Wn(m) ≺c Wn(i). Hence, we found an m ∈
ZN such that

    Wn(j) ≺c Wn(m) ≺c Wn(i)    (A.55)

which contradicts Wn(j) →c Wn(i).
If j0 = 1, let m ∈ ZN be defined as in Table A.2b. It follows j % m
and hence Wn(j) ≺1 Wn(m) which implies Wn(j) ≺c Wn(m). Moreover,
we have Wn(m) →2 Wn(i) and hence Wn(m) ≺c Wn(i). As before, this
contradicts Wn(j) →c Wn(i) and we can conclude l = 0 which proves the
⇒ part.
For the ⇐ part, we need to prove
1. Wn(j) ≺c Wn(i),
2. Wn(j) ⪯c Wn(m) ≺c Wn(i) ⇒ Wn(m) = Wn(j).
Any of the two conditions in this proposition implies Wn(j) ≺c Wn(i). We
assume by contradiction that there is an m ∈ ZN such that

    Wn(j) ≺c Wn(m) ≺c Wn(i).    (A.56)

If Wn(j) →1 Wn(i) then it follows wH(j) = wH(i) and therefore both ≺c
relations in (A.56) can only be decomposed into →1 relations, which
contradicts Wn(j) →1 Wn(i). Now assume Wn(j) →2 Wn(i) with j0 = 0, i0 =
1. It follows that w(j) = w(i) with w(·) defined in (A.50) and therefore
by (A.51) both ≺c relations in (A.56) can only be decomposed into →2
relations, which contradicts Wn(j) →2 Wn(i).
A.6 Proof of Proposition 4.3
We get from (4.33) with P = B_{N_0^{k−1}} that

    D̃_N^{(i)} = min_{ũ_i^{N−1}: ũ_i ≠ 0} wH( (0, 0, . . . , 0, ũ_i^{N−1}) B_{N_0^{k−1}}^{−1} V_N^{−1} )    (A.57)
              = min_{ũ_i^{N−1}: ũ_i ≠ 0} wH( (0, 0, . . . , 0, ũ_i^{N−1}) Ṽ_N^{−1} B_{N_0^{k−1}} )          (A.58)
              = min_{ũ_i^{N−1}: ũ_i ≠ 0} wH( (0, 0, . . . , 0, ũ_i^{N−1}) Ṽ_N^{−1} )                         (A.59)
where we used (4.16) and the fact that the Hamming weight of a vector is
independent of the order of its elements. We decompose Ṽ_N as in Figure
4.2. Without loss of generality, suppose that kN_0 ≤ i ≤ (k + 1)N_0 − 1.
Then the set over which we minimize in (A.59) satisfies
1. ũ_j = 0 for 0 ≤ j ≤ kN_0 − 1,
2. ũ_j can be chosen freely for (k + 1)N_0 ≤ j ≤ N − 1.
But V_{N_0} and the permutations in Figure 4.2 are linear, bijective maps
and thus they map zeros to zeros and the set F^{N_0} to F^{N_0}. Therefore we
get that D̃_N^{(i)} equals the minimal Hamming weight of all X̃_0^{N−1} obtained
if in Figure 4.3
1. the values of all known symbols, except those corresponding to
y_0^{N−1}, are set to zero,
2. the values of all open edges, except the one corresponding to Ũ_i,
are allowed to take any value,
3. the value of the open edge corresponding to Ũ_i is allowed to take
any value except zero.
It follows from (4.35) that at least i′ + 1 of the N_0 connections between
the V_{N_0} and Ṽ_{N′} layers in Figure 4.3 are nonzero. By the definition of
partial distances, this can also be achieved with equality. Let X̃(j) be the
N′ variables of X̃_0^{N−1} that are directly linked to the connection j ∈ Z_{N_0}
through the transform Ṽ_{N′}. For every connection j with a zero value, if
the remaining open edges of the Ṽ_{N′} transform are set to zero, then all
values of X̃(j) are zero. For every connection j with a nonzero value, at
least D̃_{N′}^{(i′)} values of X̃(j) are forced to be nonzero by the definition of
D̃_{N′}^{(i′)}. Moreover, this can also be achieved with equality. To see this, let
w ≠ 0 denote the value of the connection j and let w̃_{i′}^{N′−1} be such that
w̃_{i′} ≠ 0 and

    D̃_{N′}^{(i′)} = wH( (0, 0, . . . , 0, w̃_{i′}^{N′−1}) Ṽ_{N′}^{−1} )    (A.60)

which needs to exist by the definition of D̃_{N′}^{(i′)}. Then this implies

    D̃_{N′}^{(i′)} = wH( (0, 0, . . . , 0, w · w̃_{i′}^{−1} · w̃_{i′}^{N′−1}) Ṽ_{N′}^{−1} )    (A.61)

and hence the vector (0, 0, . . . , 0, w · w̃_{i′}^{−1} · w̃_{i′}^{N′−1}) has the component
indexed by i′ equal to w and results in exactly D̃_{N′}^{(i′)} values of X̃(j)
that are nonzero. Therefore we obtain (4.45) with equality and the
proposition is proven.
A.7 Proof of Proposition 5.1
If e corresponds to Ui, then Le = {i}. Either Si ≠ N, and therefore
N ∩ Le = ∅, or Si = N but then i is not in K and hence also not in
∆KU^(i)(D, K), which proves Te = 0.
Suppose e does not correspond to an Ui. Then the set Le can be
partitioned into Lu and Ld and thus

    Te = max{ max_{j∈N∩Lu} |∆KU^(j)(D, K) ∩ Le|, max_{j∈N∩Ld} |∆KU^(j)(D, K) ∩ Le| }    (A.62)

and it follows further that

    |∆KU^(j)(D, K) ∩ Le| = |∆KU^(j)(D, K) ∩ Lu| + |∆KU^(j)(D, K) ∩ Ld|.    (A.63)
First, we determine ∆KU^(j)(D, K) ∩ Ld for j ∈ N ∩ Lu and for the
two cases (Su, Sd) = (N, K) and (Su, Sd) ≠ (N, K).
Suppose first (Su, Sd) = (N, K). Hence, all bits ULd are known but
are not incorporated because Su = N:

    Ld ∩ KU(D, K) = ∅.    (A.64)

But because Su = N and by (5.4), at least one bit of ULu has state N.
Thus, if the state of such a Uj is set to D then by (5.4), the state Su is
set to D as well and it follows that Ld ⊂ KU(D ∪ {j}, K). Thus, we have
that

    Ld ⊂ ∆KU^(j)(D, K).    (A.65)

If (Su, Sd) ≠ (N, K) then it is not difficult to show that no bit from ULd
is additionally incorporated by setting any state Sj, j ∈ N ∩ Lu, to D.
In summary, we have for any j ∈ N ∩ Lu that
1. if (Su, Sd) = (N, K), then |N ∩ Lu| > 0 and Ld ⊂ ∆KU^(j)(D, K),
2. otherwise, ∆KU^(j)(D, K) ∩ Ld = ∅.
With (A.63), it follows that

    max_{j∈N∩Lu} |∆KU^(j)(D, K) ∩ Le| = Tu + |Ld|  if (Su, Sd) = (N, K),
                                      = Tu          otherwise.    (A.66)

By symmetry, the same holds if we exchange the roles of u and d in
(A.66) and therefore with (A.62) we get

    Te = max{Tu + |Ld|, Td}   if (Su, Sd) = (N, K),
       = max{Tu, |Lu| + Td}   if (Su, Sd) = (K, N),
       = max{Tu, Td}          otherwise.    (A.67)

Finally, if Sd = K, by (5.4) the states of all ULd equal K and thus N ∩ Ld
is empty and it follows Td = 0. By the same argument, if Su = K then
Tu = 0 and therefore (5.51) follows from (A.67).
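As a small illustration of the case distinction in (A.67), combined with the final remarks about Su = K and Sd = K, the score Te of a super-edge could be computed from its children as follows; representing the states as strings is an assumption made only for this sketch.

    def edge_score(T_u, T_d, S_u, S_d, size_Lu, size_Ld):
        # Per the last paragraph of the proof: a child in state K has score 0.
        if S_u == "K":
            T_u = 0
        if S_d == "K":
            T_d = 0
        if (S_u, S_d) == ("N", "K"):
            return max(T_u + size_Ld, T_d)
        if (S_u, S_d) == ("K", "N"):
            return max(T_u, size_Lu + T_d)
        return max(T_u, T_d)

    # Example: (Su, Sd) = (N, K) with Tu = 2 and |Ld| = 4 gives Te = 6.
    print(edge_score(2, 0, "N", "K", 4, 4))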
A.8 Proof of Proposition 5.2
A super-edge e in the FV of the FTFG is called a termination edge if
1. the sibling of e has state K,
2. Le ∩ K = ∅.
An example is shown in Figure A.1. It follows that the state of a termination edge can not be K (otherwise Le ∩ K = Le ). Moreover, no
termination edge e1 can be an ancestor or descendant of any other termination edge e2 and thus
Le1 ∩ Le2 = ∅.
(A.68)
[Diagram: factor graph with bits U0, . . . , U7 and edge states K, N and D.]
Figure A.1: This example contains two termination edges which are
drawn as dashed lines.
We say the termination edge e1 is lower than the termination edge e2 if
and only if i1 < i2 for all i1 ∈ Le1 , i2 ∈ Le2 . The importance of termination edges comes from the fact that all known bits are incorporated if
and only if Se = D for all termination edges e.
The known bits of the sibling of any termination edge e can only
be incorporated if the state of at least one of the bits ULe equals D. It
then follows by (A.68) that for every termination edge e we need to set
a different Si to D in order to incorporate the known bits of the sibling
of e. Thus, a set D that maximizes C(D, K) with |D| ≤ Dmax consists
of exactly one element from every Le , e ∈ TDmax , where TDmax is the set
of the Dmax lowest termination edges.
The MKBLIS determines iMKBLIS if and only if Tr > 0, i.e. not all
known bits are incorporated. The index iMKBLIS is determined such
that if (Tu, Td) ≠ (0, 0) then we go along an edge with positive score.
Therefore, at some point we reach a node with children edges u, d of
states (Su, Sd) equal to (N, K) or (K, N). But the score of the edge in
state K is zero and the index iMKBLIS is determined such that we do not
go along an edge in state K with zero score. Thus, we go along the edge
with state N and we get

    ∆KU^(iMKBLIS)(D, K) ≠ ∅    (A.69)
If the score of the edge in state N is zero, then this edge is a termination
edge. If the score of the edge in state N is not zero, then we will later
reach another node with children edges u, d of states (Su, Sd) equal to
(N, K) or (K, N). And so on, until the score of the edge in state N is
zero. Thus, we have that iMKBLIS ∈ Le for some termination edge e
with Se = N. Moreover, because we go up if Tu and Td are both positive,
iMKBLIS ∈ Le where e is the lowest termination edge with Se = N.
If |D| = 0 before the MKBLIS starts, then the MKBLIS repeatedly
sets an Si to D from the set Le where e is the lowest termination edge
with Se = N, which therefore maximizes C(D, K).
If |D| > 0 before the MKBLIS starts, then let i ∈ D denote the index
of the bit from UD with highest reliability. Let e denote the termination
edge for which i ∈ Le before Si is set to K. After setting Si to K, e is
not a termination edge anymore because i ∈ Le ∩ K.
After setting Si to K, the index iMKBLIS ∈ Le′ where e′ is the lowest
termination edge with state N. If at most one new termination edge is
created by setting Si to K, then clearly setting SiMKBLIS to D leads to
a contiguous set of incorporated termination edges and thus C(D, K) is
maximized. So we just need to prove that at most one new termination
edge is created by setting Si to K.
We split the proof into the two cases
1. e does not equal the edge corresponding to Ui
2. e equals the edge corresponding to Ui
If e does not equal the edge corresponding to Ui , then after setting
Si to K, the sibling of Ui is the only new termination edge. Note that
then UiMKBLIS is the sibling of Ui and thus after setting SiMKBLIS to D,
the new set of incorporated known bits is the set of incorporated known
bits before setting Si to K plus Ui .
If e equals the edge corresponding to Ui , the sibling of Ui needs to
have state K because Ui was a termination edge. Thus after setting Si
to K, the parent edge p of e has state K as well. Let p′ be the sibling of
p.
1. If Sp′ = K, then the same discussion applies recursively to the
parent of p′ and its sibling.
2. If Sp′ ≠ K, then either p′ is the only new termination edge or there
is no new termination edge.
Thus, the proposition is proven.
A.9 Proof of Proposition 5.3
Because the MKBS determines iMKBS by always going along the edge
with higher score, we have

    iMKBS ∈ argmax_{j∈N} |∆KU^(j)(D, K)|.    (A.70)
If |D| = 0 before the MKBS starts, then the MKBS repeatedly sets
SiMKBS to D. Let i0, i1, . . . , iDmax−1 be the sequence of these indexes
iMKBS and let Dt ≜ {i0, i1, . . . , it−1} where D0 ≜ ∅. Suppose that the
set D = D′ maximizes |KU(D, K)|. We show now that |KU(D′, K)| =
|KU(DDmax, K)|. By (A.70), we have that

    i0 ∈ argmax_{j∈N} |∆KU^(j)(D0, K)|.    (A.71)

We call the path from the root of the FTFG to the edge corresponding
to Ui the path of i. Let i′0 be an index in D′ that shares the longest path
with i0. Let e and e′ denote the edge in the path of i0 and i′0, respectively,
where the two paths divide first. Let Te and Te′ be the scores of the edges
e and e′, respectively, for D = D0. Let D′′ ≜ D′\{i′0} ∪ {i0}. If we use
D′′ instead of D′, then Te known bits are additionally incorporated and
at most Te′ are additionally ignored. But Te ≥ Te′ because otherwise it
would contradict (A.71). Thus, we obtain that

    |KU(D′′, K)| ≥ |KU(D′, K)|.    (A.72)

But because |KU(D′, K)| is maximal, we get equality in (A.72). One can
use a similar argument repeatedly until we are left with a set Dc that
equals DDmax and with |KU(Dc, K)| = |KU(D′, K)|.
If |D| > 0 before the MKBS starts, then let i ∈ D denote the index
of the bit from UD with highest reliability. We split the proof into the
two cases
1. the sibling of Ui has state ≠ K
2. the sibling of Ui has state K
In the first case, after setting Si to K, the index iMKBS is the index
of the sibling of Ui. After setting SiMKBS to D, the set of incorporated
known bits is augmented by Ui. And the only state that changed is Si
(without propagation), thus we cannot incorporate more bits by setting
another state Sj to D, because then we would have set Sj to D before.
In the second case, after setting Si to K, the state of the parent edge
p of e becomes K as well. Let p′ be the sibling of p.
1. If Sp′ = K, then the same discussion applies recursively to the
parent of p′ and its sibling.
2. If Sp′ = D, then the bits with indexes in Lp are already incorporated
(note that i ∈ Lp). Moreover, the number of bits that could be
additionally incorporated by setting the state of an Uj with Sj =
N to D did not change. Thus we would choose the same D if we
restarted the MKBS from scratch. We have already shown that
the MKBS maximizes |KU(D, K)| if |D| = 0 before the MKBS
starts.
3. If Sp′ = N, then setting the state of one of the bits Uj, j ∈ Lp′,
with Sj = N to D incorporates at least the bits ULp. Moreover, the number
of bits that could be additionally incorporated by setting the state
of an Uj with Sj = N to D changed only for indexes in Lp′ and
is larger than it was for Ui. Thus, iMKBS is an index in Lp′. As
before, we would choose the same D if we restarted the
MKBS from scratch. We have already shown that the MKBS
maximizes |KU(D, K)| if |D| = 0 before the MKBS starts.
Thus, the proposition is proven.
Bibliography
[1] E. Arıkan, “Channel polarization: A method for constructing
capacity-achieving codes for symmetric binary-input memoryless
channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073,
2009.
[2] E. Arıkan and E. Telatar, “On the rate of channel polarization,” in
Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2009, pp. 1493–1495.
[3] M. Bardet, V. Dragoi, A. Otmani, and J. P. Tillich, “Algebraic
properties of polar codes from a new polynomial formalism,” in
Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2016, pp. 230–
234.
[4] R. E. Blahut, Algebraic Codes for Data Transmission. Cambridge
University Press, 2003.
[5] K. Chen, K. Niu, and J. Lin, “A reduced-complexity successive
cancellation list decoding of polar codes,” in Proc. IEEE Vehicular
Technology Conference (VTC Spring), Jun. 2013, pp. 1–5.
[6] J. W. Cooley and J. W. Tukey, “An algorithm for the machine
calculation of complex Fourier series,” Mathematics of Computation,
vol. 19, no. 90, pp. 297–301, 1965.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory,
2nd ed. New York: Wiley, 2006.
[8] B. A. Davey and H. A. Priestley, Introduction to Lattices and
Order, 2nd ed. Cambridge University Press, 2002. [Online].
Available: http://dx.doi.org/10.1017/CBO9780511809088
153
154
Bibliography
[9] G. D. Forney, Jr., “Codes on graphs: normal realizations,” IEEE
Trans. Inf. Theory, vol. 47, no. 2, pp. 520–548, 2001.
[10] J. Honda and H. Yamamoto, “Polar coding without alphabet extension for asymmetric channels,” in Proc. IEEE Int. Symp. on Inf.
Theory (ISIT), Jul. 2012, pp. 2147–2151.
[11] N. Hussami, R. Urbanke, and S. B. Korada, “Performance of polar
codes for channel and source coding,” in Proc. IEEE Int. Symp. on
Inf. Theory (ISIT), Jul. 2009, pp. 1488–1492.
[12] A. Joseph and A. R. Barron, “Fast sparse superposition codes have
near exponential error probability for R < C,” IEEE Trans. Inf.
Theory, vol. 60, no. 2, pp. 919–942, Feb. 2014.
[13] S. B. Korada, E. Şaşoğlu, and R. Urbanke, “Polar codes: Characterization of exponent, bounds, and constructions,” IEEE Trans.
Inf. Theory, vol. 56, no. 12, pp. 6253–6264, 2010.
[14] S. B. Korada, A. Montanari, I. E. Telatar, and R. Urbanke, “An
empirical scaling law for polar codes,” in Proc. IEEE Int. Symp. on
Inf. Theory (ISIT), Jun. 2010, pp. 884–888.
[15] S. B. Korada, “Polar codes for channel and source coding,” Ph.D.
dissertation, EPFL, 2009.
[16] S. Kudekar, T. Richardson, and R. L. Urbanke, “Spatially coupled
ensembles universally achieve capacity under belief propagation,”
IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7761–7813, Dec. 2013.
[17] S. L. Lauritzen, Graphical Models.
Clarendon Press, 1996.
[18] B. Li, H. Shen, and D. Tse, “An adaptive successive cancellation
list decoder for polar codes with cyclic redundancy check,” IEEE
Commun. Lett., vol. 16, no. 12, pp. 2044–2047, Dec. 2012.
[19] H.-A. Loeliger, “An introduction to factor graphs,” IEEE Signal
Processing Magazine, vol. 21, no. 1, pp. 28–41, 2004.
[20] D. J. C. MacKay and M. C. Davey, Codes, Systems, and Graphical
Models. New York, NY: Springer New York, 2001, pp. 113–130.
[Online]. Available: http://dx.doi.org/10.1007/978-1-4613-0165-3_
6
Bibliography
155
[21] M. Mondelli, S. H. Hassani, and R. L. Urbanke, “From polar to
Reed-Muller codes: A technique to improve the finite-length performance,” IEEE Trans. Commun., vol. 62, no. 9, pp. 3084–3091,
2014.
[22] R. Mori and T. Tanaka, “Performance and construction of polar
codes on symmetric binary-input memoryless channels,” in Proc.
IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2009, pp. 1496–1500.
[23] ——, “Performance of polar codes with the construction using density evolution,” IEEE Commun. Lett., vol. 13, no. 7, pp. 519–521,
2009.
[24] ——, “Channel polarization on q-ary discrete memoryless channels
by arbitrary kernels,” in Proc. IEEE Int. Symp. on Inf. Theory
(ISIT), Jun. 2010, pp. 894–898.
[25] ——, “Source and channel polarization over finite fields and Reed-Solomon matrices,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp.
2720–2736, May 2014.
[26] N. Presman, O. Shapira, and S. Litsyn, “Mixed-kernels constructions of polar codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2,
pp. 239–253, Feb. 2016.
[27] N. Presman, O. Shapira, S. Litsyn, T. Etzion, and A. Vardy, “Binary polarization kernels from code decompositions,” IEEE Trans.
Inf. Theory, vol. 61, no. 5, pp. 2227–2239, May 2015.
[28] N. Rengaswamy and H. D. Pfister, “Cyclic polar codes,” in Proc.
IEEE Int. Symp. on Inf. Theory (ISIT), Jun. 2015, pp. 1287–1291.
[29] T. Richardson and R. Urbanke, Modern Coding Theory.
Cambridge University Press, 2008. [Online]. Available: http:
//dx.doi.org/10.1017/CBO9780511791338
[30] G. Sarkis, I. Tal, P. Giard, A. Vardy, C. Thibeault, and
W. J. Gross, “Flexible and low-complexity encoding and
decoding of systematic polar codes,” 2015. [Online]. Available:
http://arxiv.org/abs/1507.03614
[31] C. Schürch, “A partial order for the synthesized channels of a polar
code,” in Proc. IEEE Int. Symp. on Inf. Theory (ISIT), Jul. 2016.
156
Bibliography
[32] C. E. Shannon, “A mathematical theory of communication,” The
Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, Jul. 1948.
[33] ——, “A mathematical theory of communication,” The Bell System
Technical Journal, vol. 27, no. 4, pp. 623–656, Oct. 1948.
[34] N. Stolte, “Rekursive Codes mit der Plotkin-Konstruktion und ihre
Decodierung,” Ph.D. dissertation, TU Darmstadt, 2002.
[35] I. Tal and A. Vardy, “List decoding of polar codes,” in Proc. IEEE
Int. Symp. on Inf. Theory (ISIT), Aug. 2011, pp. 1–5.
[36] ——, “How to construct polar codes,” IEEE Trans. Inf. Theory,
vol. 59, no. 10, pp. 6562–6582, 2013.
[37] R. Tolimieri, M. An, and C. Lu, Algorithms for Discrete Fourier
Transforms and Convolution, 2nd ed. New York: Springer, 1997.
[38] P. Trifonov, “Efficient design and decoding of polar codes,” IEEE
Trans. Commun., vol. 60, no. 11, pp. 3221–3227, 2012.