# Cryptanalysis Exercises

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

# Cryptanalysis Exercises

More-than-slightly contrived exercises in cryptanalysis with bioinformatics equivalents.

Aimed at an introductory-level class in probability & information theory for compbio.

## Information content of a sequence

• Blog.NumbersStations -- artificial voices reading numbers over shortwave radio, containing coded spy messages from Cuba -- does it get any better?

## Compositional analysis

• Single-character, di-character analysis
• Embed a signal in noise. Detect by compositional analysis
• More cunning embedding: try and match the letter frequencies. Countermeasure: use adjacency freqs
• Bring Bayesian scoring schemes into it somehow...

## MCMC

• This is a generative model-based approach, following the classic 1949 paper of Claude Shannon: Wikipedia:Communication_Theory_of_Secrecy_Systems
• General idea, illustrated for a Wikipedia:transposition_cipher:
• Train an order-N HMM on English (see "compositional analysis" above)
• Create an MCMC sampler over cipherkeys
• Let K=cipherkey, C=ciphertext, T=plaintext=T(K,C)
• You are then sampling from P(T,K)=P(C)*P(T(K,C)) where P(C) = 1/26! which disappears from Hastings ratio
• Options for guessing the key:
• Simplest is just to do full MCMC and take the mode
• Can also do a simulated annealing version (gradually lower the temperature)
• Lots of cheap & cheerful cryptanalysis can be done this way
• No direct bioinformatics analogy (I think), but there ARE lots of MCMC algorithms in sequence analysis, so...

## Pairwise sequence comparison

• Pairwise homology detection: try to match plaintext with ciphertext encrypted using a running key cipher with a tabula recta
• Both key and plaintext have a known frequency distribution (English), effectively yielding a substitution matrix
• Variation: try to match two ciphertexts encrypted from the same plaintext with different running keys
• Empirical demonstration: separation of signal and noise score distributions
• In some cryptanalytic scenarios, evidence of homology may be all you need (can then apply rubber hose cryptanalysis)
• Another pairwise homology example (better analogy, but not cryptanalysis per se - more like signals/traffic analysis): try to find the pair (or N/2 pairs) of matching transmissions in N noisily-intercepted transmissions
• the source of randomness here is imperfect eavesdropping by Eve (e.g. line noise), rather than some cipher that can be treated as pseudo-random
• for example, could have a "substitution matrix for Morse code": dot vs dash errors are like nucleotide substitutions
• can introduce indels in similar ways
• can make noise dependent on "length" of wire and thereby attempt to reconstruct topology of enemy network (c.f. phylogeny)... this may be stretching things a bit far, though...
• Local alignment (ungapped): padded version of pairwise homology detection

## Bayesian decision theory

Would be nice to have an example where you use posterior probabilities to maximize your expected reward (i.e. decision theory, specifically expected utility).

Perhaps an example involving industrial espionage?

---