Recall the definition of a typical sequence: it is a sequence of the kind we expect (probabilistically) to occur, given the statistics of the source. We had a theorem giving the approximate number of typical sequences, the probability of a typical sequence, and so forth. In this section we generalize these ideas to pairs of sequences.
Throughout this semester we have used the notion of representing
sequences of random variables as vectors. For example, the r.v.'s
$(X_1, X_2)$ can be represented by a vector-valued random variable
$\mathbf{X}$. In a sense, this is all that we are doing with
jointly typical sequences. We treat a pair of sequences $(x^n, y^n)$ as if
it were simply a single sequence $z^n$, and ask: among the sequences
$z^n$, which are the typical sequences?
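The standard definition of the jointly typical set, assumed here to be the one intended, can be written as follows. The notation $A_\epsilon^{(n)}$, the tolerance $\epsilon > 0$, and the base-2 logarithm are conventional textbook choices rather than something taken from the text above.

```latex
\[
A_\epsilon^{(n)} = \left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n :
\begin{array}{l}
\left| -\frac{1}{n} \log p(x^n) - H(X) \right| < \epsilon, \\[4pt]
\left| -\frac{1}{n} \log p(y^n) - H(Y) \right| < \epsilon, \\[4pt]
\left| -\frac{1}{n} \log p(x^n, y^n) - H(X,Y) \right| < \epsilon
\end{array}
\right\},
\qquad
p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i).
\]
```

The first two conditions say that $x^n$ and $y^n$ are each typical on their own; the third is the typicality condition applied to the pair, viewed as a single sequence $z^n$.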
From this definition, we can draw conclusions about jointly typical
sequences similar to those we drew about typical sequences.
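For reference, the usual statement of these conclusions (the joint asymptotic equipartition property) is sketched below. It is assumed to match the theorem these notes rely on; the symbols $A_\epsilon^{(n)}$, $\epsilon$, $\tilde{X}^n$ and $\tilde{Y}^n$ are standard notation introduced here for illustration.

```latex
Let $(X^n, Y^n)$ be drawn i.i.d. according to
$p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i)$. Then:
\begin{enumerate}
  \item $\Pr\bigl( (X^n, Y^n) \in A_\epsilon^{(n)} \bigr) \to 1$
        as $n \to \infty$.
  \item $\bigl| A_\epsilon^{(n)} \bigr| \le 2^{n(H(X,Y) + \epsilon)}$.
  \item If $(\tilde{X}^n, \tilde{Y}^n) \sim p(x^n)\,p(y^n)$, that is,
        $\tilde{X}^n$ and $\tilde{Y}^n$ are independent with the same
        marginals as $p(x^n, y^n)$, then
        \[
        \Pr\bigl( (\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)} \bigr)
        \le 2^{-n(I(X;Y) - 3\epsilon)},
        \]
        and for sufficiently large $n$,
        \[
        \Pr\bigl( (\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)} \bigr)
        \ge (1 - \epsilon)\, 2^{-n(I(X;Y) + 3\epsilon)}.
        \]
\end{enumerate}
```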
There are about $2^{nH(X)}$ typical $X$ sequences, about $2^{nH(Y)}$ typical $Y$ sequences, and only about $2^{nH(X,Y)}$ jointly typical sequences. This means that if we choose a typical $X$ sequence and independently choose a typical $Y$ sequence (without regard to the $X$ sequence), the pair $(X^n, Y^n)$ so chosen will not always be jointly typical. In fact, from the last part of the theorem, the probability that independently chosen sequences are jointly typical is about $2^{-nI(X;Y)}$. This means that we would have to try (at random) about $2^{nI(X;Y)}$ sequence pairs before we choose a jointly typical pair. Thinking now in terms of fixing a $Y$ sequence and choosing $X$ sequences at random, this suggests that there are about $2^{nI(X;Y)}$ distinguishable $X$ sequences.
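The counting argument can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the joint pmf `JOINT` (a uniform binary input observed through a binary symmetric channel with crossover probability 0.1), the tolerance `eps`, and all function names are hypothetical choices, not something taken from these notes. It tests the three usual typicality conditions (per-symbol log-probability of $x^n$, of $y^n$, and of the pair, each within `eps` of the corresponding entropy) and computes the exponents that appear above.

```python
import math


def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def is_jointly_typical(xs, ys, joint, eps):
    """Check the three typicality conditions for the pair (xs, ys).

    Assumes every symbol pair that occurs has positive probability
    under `joint`, so that all logarithms are finite.
    """
    n = len(xs)
    px, py = marginals(joint)
    # Per-symbol negative log-probabilities of x^n, y^n and the pair.
    sx = -sum(math.log2(px[x]) for x in xs) / n
    sy = -sum(math.log2(py[y]) for y in ys) / n
    sxy = -sum(math.log2(joint[(x, y)]) for x, y in zip(xs, ys)) / n
    return (abs(sx - entropy(px)) < eps
            and abs(sy - entropy(py)) < eps
            and abs(sxy - entropy(joint)) < eps)


# Hypothetical example: uniform binary X, Y = X flipped with probability 0.1.
JOINT = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
PX, PY = marginals(JOINT)
H_X, H_Y, H_XY = entropy(PX), entropy(PY), entropy(JOINT)
I_XY = H_X + H_Y - H_XY  # mutual information, about 0.531 bits here

if __name__ == "__main__":
    n = 100
    print(f"H(X)={H_X:.3f}  H(Y)={H_Y:.3f}  H(X,Y)={H_XY:.3f}  I(X;Y)={I_XY:.3f}")
    print(f"log2 of the number of typical (x, y) combinations: {n * (H_X + H_Y):.1f}")
    print(f"log2 of the number of jointly typical pairs:       {n * H_XY:.1f}")
    print(f"log2 of the number of tries before a joint hit:    {n * I_XY:.1f}")
```

In this example both marginals are uniform, so every $X$ sequence and every $Y$ sequence is typical on its own, yet `is_jointly_typical([0] * 10, [1] * 10, JOINT, 0.1)` is false while a pair that disagrees in one position out of ten passes. That is the gap between the $2^{n(H(X)+H(Y))}$ combinations of typical sequences and the roughly $2^{nH(X,Y)}$ jointly typical pairs.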