# Error correction coding/latex dump

\maketitle

The story so far: we all agree that Shannon, damn him, has already said everything that there is to be said about computationally unbounded error correction algorithms. So we focus on efficient encoders/decoders.

Our main focus is on computationally bounded Shannon adversaries. There is also another setting where you have a Hamming adversary and the adversary structure is nonthreshold, but that's on the back burner for now.

The main problem is that the adversary can incorporate a one-way function (OWF), even if it is allowed no randomness.

{\bf Space bounds.} Space bounds alone do not seem to be sufficient to guarantee that OWFs don't exist, because $NC^0 \subset L$ and, as far as we can tell, OWFs may exist even in $NC^0$. If we further restrict the adversary to a TM that uses a read-once input tape and a logspace work tape, then there is a simple decoding algorithm (not sure about the encoder yet). However, it is more promising to go from the bottom up right now.

The most brain-dead adversary possible is a finite automaton with no randomness. The output alphabet of the automaton is $\{0,1,\lambda\}$ and $\epsilon$-transitions are allowed, to allow for the possibility of producing more or fewer bits than the input. We treat the automaton as a transformation on streams, i.e., its state does not magically reset itself on reaching the boundary of a block. This is quite natural (after all, the block length is chosen {\em after} the automaton), and can make a significant difference to the model (see below).

For an FA adversary $A_F$ we have a tentative algorithm: consider $S \subset \{0,1\}^n$ such that $A_F$ is injective on $S$ and $A_F(S) = \{A_F(x): x \in S\}$ is prefix-free. Given such a set $S$, for $m \leq \log |S|$, any injective mapping from $\{0,1\}^m$ into $S$ is an encoding with rate $\frac{m}{n}$. Our encoder will pick the best of all encodings quantified over choices of $S$. How large does $n$ have to be before the rate becomes reasonably close to optimal? Can the optimal rate be achieved for any finite block length? At first glance it looks like the answer is no, for example when the FA is one that eats up the first bit of its input. But recall that the state of $F$ doesn't magically reset itself. So an encoder with block length 2 will actually achieve the optimal rate here. Also, if we only achieve optimality asymptotically, it's not clear that we are saying anything that's not a special case of the channel coding theorem. My intuition is that when $n$ equals the number of states of $F$ you will actually achieve optimality (and I've verified this for a few examples).
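The exhaustive search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the test automaton is the channel that drops the bit following every $1$ (the example from the variable-length discussion in these notes), the names \texttt{run}, \texttt{max\_prefix\_free} and \texttt{best\_block\_code} are made up here, and, as a simplification that is not in the notes, only inputs that return the automaton to its start state are considered so that consecutive blocks do not interact.

```python
from itertools import product
from math import floor, log2

# Hypothetical test channel: copy bits, but drop the bit that follows every 1.
# Keys are (state, input bit); values are (next state, output), "" = no output.
CHANNEL = {
    (0, "0"): (0, "0"),
    (0, "1"): (1, "1"),
    (1, "0"): (0, ""),
    (1, "1"): (0, ""),
}

def run(table, bits, state=0):
    """Feed bits through the transducer; return (output string, final state)."""
    out = []
    for b in bits:
        state, o = table[(state, b)]
        out.append(o)
    return "".join(out), state

def max_prefix_free(words):
    """Largest prefix-free subset of a set of strings.

    In sorted order every string is immediately followed by the strings it
    prefixes, so one recursive pass can choose, for each string, between
    keeping it and keeping the best selection among its extensions.
    """
    words = sorted(set(words))

    def solve(i):
        root, j, below = words[i], i + 1, []
        while j < len(words) and words[j].startswith(root):
            chosen, j = solve(j)
            below.extend(chosen)
        return (below or [root]), j

    result, i = [], 0
    while i < len(words):
        chosen, i = solve(i)
        result.extend(chosen)
    return result

def best_block_code(table, n, start=0):
    """Exhaustive (2^n) search for a set S whose image is prefix-free.

    Assumption (not from the notes): only inputs that bring the automaton
    back to `start` are kept, so consecutive blocks do not interact.
    """
    preimage = {}
    for bits in product("01", repeat=n):
        x = "".join(bits)
        y, end = run(table, x, start)
        if end == start:
            preimage.setdefault(y, x)  # one representative input per output
    S = sorted(preimage[y] for y in max_prefix_free(preimage))
    m = floor(log2(len(S))) if S else 0  # whole message bits per block
    return S, m / n

if __name__ == "__main__":
    for n in range(1, 9):
        S, rate = best_block_code(CHANNEL, n)
        print(n, len(S), rate)
```

The search is exponential in $n$, which is exactly the cost the efficiency question below is about; it is meant for checking small examples by hand, not as a candidate encoder.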

{\bf Efficiency.} We need to find a way to bring the complexity down from $2^n$ to $\textsf{poly}(n)$.

{\bf Randomness.} It is not clear how one would model randomness in the traditional way (i.e., giving the automaton a random tape). Even if it were somehow possible, how would one throttle the rate at which the automaton consumes random bits? The model used in the literature for randomized FA seems to be Markov chains. Two other machines to consider are bounded-width branching programs and read-once space-bounded TMs (in the latter model, the randomness must probably be encoded into the transition function instead of on a separate tape because of the throttling issue). I'm pretty sure there are many equivalences here, and that any algorithm for FA will generalize easily to some if not all of these other models, but I don't yet understand the minutiae.

{\bf Variable length.} Once you have a channel that ``inspects'' the message, you need to consider a probability distribution on messages in order to make meaningful statements. For example, consider the FA that has the following transition diagram (first co-ordinate denotes state, second denotes input/output): $$(0, 0) \rightarrow (0, 0); (0, 1) \rightarrow (1, 1); \forall x\ (1, x) \rightarrow (0, \lambda)$$ This nasty little critter outputs whatever it sees until it sees a $1$, in which case it eats up its next bit and then behaves as usual.
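To make the behaviour of this automaton concrete, here is a minimal simulation sketch. The table transcribes the transition diagram above; the names \texttt{CHANNEL} and \texttt{run\_transducer} are invented for illustration, and the empty string stands in for the empty output $\lambda$.

```python
# Transition table of the example channel: (state, input bit) -> (next state, output).
# The empty string "" plays the role of the empty output lambda.
CHANNEL = {
    (0, "0"): (0, "0"),
    (0, "1"): (1, "1"),
    (1, "0"): (0, ""),   # state 1 swallows the next bit, whatever it is
    (1, "1"): (0, ""),
}

def run_transducer(table, bits, state=0):
    """Feed `bits` through the transducer; return (output, final state)."""
    out = []
    for b in bits:
        state, o = table[(state, b)]
        out.append(o)
    return "".join(out), state

if __name__ == "__main__":
    # Whole stream in one go.
    print(run_transducer(CHANNEL, "0110100"))

    # Same stream split into two blocks: the state is carried across the
    # block boundary, it does not reset.
    first, s = run_transducer(CHANNEL, "01")
    second, s = run_transducer(CHANNEL, "10100", state=s)
    print(first + second, s)
```

Passing the final state of one block as the start state of the next is what models the stream view of the adversary: splitting the input into blocks does not change the output.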

If you look at the worst case codeword length, the best rate you can achieve here is clearly $\frac{1}{2}$ (the entropy of each bit is at most $1$, and in the worst case you will lose one out of every two bits.) But if you assume the messages are uniformly distributed and look at the average codeword length, the natural algorithm gives you a rate of $\frac{2}{3}$.
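Spelled out as a worked calculation (assuming the ``natural algorithm'' means inserting one filler bit after every $1$, so that a message bit costs one channel bit if it is $0$ and two if it is $1$; an $n$-bit all-ones message is the worst case and costs $2n$ channel bits): $$\text{worst case: } \frac{n}{2n} = \frac{1}{2}, \qquad \text{uniform average: } \frac{1}{\frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 2} = \frac{1}{3/2} = \frac{2}{3}.$$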

But you can do even better: first bias your message so that $0$ has probability $p$ (and each bit is independent). Now the entropy of the message is $H(p) = -p \log p - (1-p) \log (1-p)$ per bit (logs base $2$) and the expected length of the codeword is $p+2(1-p) = 2 - p$ bits per bit of message, and so the rate is $\frac{H(p)}{2-p} = \frac{-p \log p - (1-p) \log (1-p)}{2-p}$. The maximum rate is $\approx 0.694242$, and is attained when $p$ is (surprise!) $\frac{\sqrt 5 - 1}{2}$. I think we are on to something here: codes of this type are actually used when the physical medium implements $0$'s and $1$'s differently (such as when one requires more power to transmit), and the channel capacity $0.694242$ appears to be known in the coding theory literature. Furthermore, you just know you're on the right track when you run into the golden ratio! Maybe we should talk to some EE people and find out more about this.
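The maximization is easy to check numerically. The sketch below is a crude grid search, not an analytic proof; the function names are invented here, and the rate is taken to be entropy per transmitted bit, $H(p)/(2-p)$, with logs base $2$.

```python
from math import log2, sqrt

def rate(p):
    """H(p) / (2 - p): message entropy per transmitted bit when P(bit = 0) = p."""
    entropy = -p * log2(p) - (1 - p) * log2(1 - p)
    return entropy / (2 - p)

def argmax_rate(steps=100_000):
    """Grid search over p in (0, 1); crude but dependency-free."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=rate)

if __name__ == "__main__":
    p_star = argmax_rate()
    print("numerical optimum:", p_star, rate(p_star))
    # Closed forms to compare against: p = (sqrt(5) - 1) / 2, and the
    # base-2 log of the golden ratio.
    print("closed forms:     ", (sqrt(5) - 1) / 2, log2((1 + sqrt(5)) / 2))
```

At $p = \frac{\sqrt 5 - 1}{2}$ we have $1 - p = p^2$, so $H(p) = -(p + 2p^2)\log p = -(2-p)\log p$ and the rate collapses to $-\log p = \log \frac{1+\sqrt 5}{2}$, which is where the $0.694242$ comes from.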

{\bf Adversarial input.} If the message is also chosen by an adversary, then assume that this adversary is a PPT, and encode the message $m' = s \,\|\, (m \oplus \textsf{prg}(s))$ where \textsf{prg} is a PRG, $s$ is a seed and $\|$ denotes concatenation. I'm not sure if \textsf{prg} needs to be a PRG against the adversary choosing the message or the one on the channel, but either way we're fine.
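A minimal sketch of this masking step, working on bytes rather than bits. The choice of SHAKE-256 as a stand-in for \textsf{prg}, the 16-byte seed, and the names \texttt{mask} and \texttt{unmask} are all assumptions made for illustration; the notes do not fix a particular PRG.

```python
import hashlib
import secrets

def prg(seed: bytes, nbytes: int) -> bytes:
    """Stand-in PRG: stretch the seed with SHAKE-256 (an assumption, not from the notes)."""
    return hashlib.shake_256(seed).digest(nbytes)

def mask(message: bytes, seed_len: int = 16) -> bytes:
    """Return m' = s || (m XOR prg(s)) for a fresh random seed s."""
    seed = secrets.token_bytes(seed_len)
    pad = prg(seed, len(message))
    return seed + bytes(a ^ b for a, b in zip(message, pad))

def unmask(masked: bytes, seed_len: int = 16) -> bytes:
    """Invert mask(): split off the seed, regenerate the pad, XOR it away."""
    seed, body = masked[:seed_len], masked[seed_len:]
    pad = prg(seed, len(body))
    return bytes(a ^ b for a, b in zip(body, pad))

if __name__ == "__main__":
    m = b"\x00" * 8  # a worst-case looking, highly structured message
    masked = mask(m)
    print(masked.hex())
    print(unmask(masked) == m)
```

The seed travels in the clear as the prefix of $m'$, so the receiver can undo the mask without any shared secret; the point is only that the bits actually fed to the encoder look uniformly distributed to a bounded adversary.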

{\bf The encoder.} That was the easy part. The hard part is to do all this shit automatically given the description of the FA. %Here's what I have in mind: associate a prefix set with each state of the FA; now the transitions of the FA map to equations linking these states. The prefix set corresponding to any {\em recurrent state} can be expressed recursively. Markov chains. Transition probabilities.

{\bf Two independent problems.} Shannon defines channel capacity as $$C = \lim_{n\rightarrow \infty}\frac{1}{n} \max_{X_n}I(X_n; A(X_n))$$ This gives two more or less independent problems: finding the optimal message distribution and finding the encoder that will achieve the optimal rate when its input is optimally distributed. We want these two to have the following properties:
\begin{itemize}
\item For simple adversaries like finite automata the optimal distribution should have a form that is much simpler than Shannon's expression, and preferably be computable by a PPT (although not necessarily uniform).
\item The encoder should only ``look ahead'' a finite number of bits of its input (for an FA, bounded by the size of the FA).
\item The optimal rate should be achieved in the limit of the message length.
\item For any deterministic adversary, the decoder is the identity function (not sure about this. Actually we should try to prove this as a general theorem.)
\end{itemize}

For the channel in the previous example, the encoder is: $$\forall x\ [(0, 0) \rightarrow (0, 0); (0, 1) \rightarrow (1, 1); (1, \lambda) \rightarrow (0, x)]$$ where the quantification means that for every $x$, the automaton parametrized by $x$ is a valid encoder. Observe that the channel can be converted into the encoder by swapping the second component to the left and right of the $\rightarrow$ in each transition. However, this may be due to the simplicity of this channel. Consider instead the following channel: $$(0, 0) \rightarrow (0, 0); (0, 1) \rightarrow (1, 1); (1, 0) \rightarrow (2, 0); (1, 1) \rightarrow (1, 1); \forall x\ (2, x) \rightarrow (0, \lambda)$$ This channel behaves normally until it sees the string ``$10$'', whereupon it eats the next bit and then behaves as usual. (State $1$ means ``the last bit read was a $1$'', so reading another $1$ must stay in state $1$; otherwise the ``$10$'' in ``$110$'' would be missed.) Neither the input distribution nor the encoder for this channel is clear. We cannot reduce the channel to one that has an alphabet of size $4$, because the ``$10$'' string can occur both at the odd and even positions. Solving this channel should tell us a lot about the problem and help us get a general algorithm.
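For the first (bit-eating) channel, the claim that the encoder above works with an identity decoder can be checked exhaustively on short messages. The sketch below hard-codes that channel and reads the encoder as ``copy each bit, and after every $1$ emit the filler bit $x$''; the function names are made up for illustration.

```python
from itertools import product

# The bit-eating channel: (state, input bit) -> (next state, output), "" = lambda.
CHANNEL = {
    (0, "0"): (0, "0"),
    (0, "1"): (1, "1"),
    (1, "0"): (0, ""),
    (1, "1"): (0, ""),
}

def channel(bits, state=0):
    """Output of the channel on `bits`, starting from `state`."""
    out = []
    for b in bits:
        state, o = CHANNEL[(state, b)]
        out.append(o)
    return "".join(out)

def encode(message, x="0"):
    """Copy each bit; after every 1 emit the arbitrary filler bit x."""
    return "".join(b + x if b == "1" else b for b in message)

if __name__ == "__main__":
    # Check channel(encode(m)) == m for every message of up to 6 bits
    # and for both choices of the filler bit.
    for x in "01":
        for n in range(7):
            for bits in product("01", repeat=n):
                m = "".join(bits)
                assert channel(encode(m, x)) == m
    print(encode("0110"), "->", channel(encode("0110")))
```

Because every $1$ in the codeword is immediately followed by its filler bit, the channel is always back in state $0$ at the end of a codeword, so codewords can be concatenated freely.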

I have a conceptual difficulty with our program above: the optimal distribution as defined by Shannon corresponds to {\em no encoding}; however the one we're looking for corresponds to the optimal encoder. Why should these two be the same?