Outstanding news, via the BBC's Chris Vallance, of the public release [by GCHQ] of two papers authored by Alan Turing, Ph.D., Entitled 'The Applications of Probability to Crypt' and 'The Paper on the Statistics of Repetitions'. Both documents target cryptographic techniques, with the utilization of specific mathematical constructs.
"...Review of "Statistics of Repetitions" by A.M. Turing The report begins by stating the task as being able to "obtain reliable estimates of the value of given repeats". It does not say what the purpose of this is, but an example is given of two messages of length 200 and 250 characters respectively, with an overlap of 105 characters. It is evident then that what is being sought is a means of telling whether two machine enciphered messages have different plaintext but the same encipherment key during the overlap (presumably at the end of one message and the beginning of the other) - in other words, an offset depth. The paper uses polygraphic statistics, with care to avoid double counting of r-graphs within n-graphs, to get a more accurate estimate of the odds of a pattern of repeats occurring causally compared with randomly, using an "urn" model.
Review of "The Applications of Probability to Cryptography" by A.M. Turing This paper first describes the use of "odds" and "log odds" to simplify cryptanalytic probability tests, and then applies this to 4 problems: "Vigenere", "A Letter Substitution Problem", "Theory of Repeats", and "Transposition Ciphers". The introductory section presents conditional probability arguments in an elementary fashion without using much mathematical notation, and gives a number of examples from everyday life. One of these examples is life expectancy, and the example which uses "Hitler is now of age 52". Bayesian principles, Bayes' Theorem (or Law) is not mentioned until page 5, where Turing calls it the "Factor Principle". Nowadays we would write, if X are observations, H is a hypothesis with negation ~H, and P[] is probability, that the odds in favour of H given X are
P[H] P[X|H]
----- x -------
P[~H] P[X|~H]
The first factor is the prior odds, the second factor is the odds for the observations, and the combination is the posterior odds. Since Turing's time, mathematicians at GCHQ have become very familiar with this concept. The section closes with an example of log odds expressed as "bans" or "decibans", with that terminology "introduced in honour of the famous town of Banbury". The "Vigenere" section applies now standard scoring theory to the solution of those ciphers, and provides tables of language counts and log probabilities written in Turing's hand. The "Letter Substitution Problem" shows how to score the fitting of a crib against the convolution of 3 slides giving a range of 0 to 27. Turing uses generating functions (viz. (1-x^10)^3/(1-x)^3) to calculate the probability of each slide, and uses the non-uniformity to help with scoring the crib placement. The "theory of repeats" section is very similar to his paper "Statisitics of Repetitions". The "transposition ciphers" section uses what we would call digraph scores (and he calls "bigramme scores") to score the selection of adjacent columns in the cipher. In the margin next to one of Turing's assertions is the comment "I doubt it S.W." which was probably written by Sean Wiley, former Chief Mathematician at GCHQ. This shows that in some details even the leading mathematicians could disagree! The section also has a (long) derivation that all the eigenvalues of a particular Markov chain matrix are at most 1, which is now pretty standard. 3. The papers are available to view at The National Archives in Kew, West London, for anyone with a valid reader's ticket. Visit the website for further information on how to access public records: www.nationalarchives.gov.uk/visit ..." via GCHQ
