How much data is needed to describe a message?

An old joke tells of a man sending a telegram to his brother inviting him to his son’s wedding. Initially he writes this message:

Dear brother,
You are invited to my son’s wedding, two weeks from now.
Looking forward to meeting you.
Your brother.

After learning the price of each word he reasons that Dear brother is redundant since his brother is going to be getting the message by hand, and he already knows how much he loves him. The words you are invited are also redundant, since it is obvious that once there is a wedding his brother is invited, similarly Looking forward to meeting you can be deleted. The words my son’s are also redundant because it wouldn’t make sense to report another man’s wedding, and so the man ended up with the shorter telegram:

wedding in two weeks
which exactly captures the information he wanted to transfer to his brother.

Given a message, in information theory we try to assess how many bits are needed to encode this message, or in other words given an encoded message how many bits are redundant and can be deleted. To do that we first need to quantify the amount of data encoded in a message of n bits. In information theory this is referred to as entropy. The higher the entropy the more data is encapsulated in those bits (therefore fewer bits are redundant). Given a message M = (m_1, m_2, ..., m_n) of n symbols over an alphabet \sigma = {\sigma_1, \sigma_2, ..., \sigma_s}, of s letters the entropy is given by this formula:

Entropy(M) = - \sum_{i=1}^s Pr(\sigma_i)\log Pr(\sigma_i),

where Pr(\sigma_i)is the probability of an arbitrary letter in M to be \sigma_i. In the simple case, Pr(\sigma_i)=\frac{number\;of\;times\;\sigma_i\;appears\;in\;M}{n}.

As can be easily seen the higher the entropy is the more random the message seems (i.e. There are less patterns in it), on the other hand, the message aaaaaaaaaa….aa has entropy 0 (remember that \lim_{x \rightarrow 0} x\log x = 0. For compression, we want to re-encode a message (with fewer bits) such that its entropy increases. In encryption (the other end of the spectrum), on the other hand, we want to re-encode a message (with the same number of bits) such that its entropy increases.

The joke we opened with goes on, as the man decides to delete the word wedding (because what other reason is there to be sending a telegram in the first place), and the words in two weeks (which is the appropriate time prior a wedding to be sending invitations), and so he returns home without sending any telegram at all.