Reinhard: MIC (mutual index of coincidence)

Beitrag lesen

Guten Abend,

ich versuche gerade zu verstehen, wie die MIC-Formel angewendet wird. Das Zitat stammt von hier. Es dreht sich um Abschnitt 2.2

„2.2 Determining the Keyword
We employ a new technique suggested by Dan Velleman[6] which uses mutual index of coincidence (MIC) in a very smart fashion.

The mutual index of coincidence of x and s, denoted MIC(x,s), is defined to be the probability that two random elements, one from each, are identical. Let x is a string of standard source language e.g. English. Let string x has length l and frequency distribution $$ r_1,r_2,...,r_n $$ where n is the size of the alphabet e.g. 26. Clearly, for large l, probability distributions of n letters $$ r_i/l $$ will be similar to the standard probability distributions $$ p_i $$ of the source language. Now, let string y has length l' and frequency distribution $$ r'_1,r'_2,...,r'_n $$. Then,

$$ \displaystyle MIC (x,s) = \frac{\sum\nolimits_{i=1}^n r_i r'i}{ll'} \simeq \frac{\sum\nolimits{i=1}^n p_i r'_i}{l'} $$

Suppose $$ s_1,s_2,..,s_m $$ be the m substrings of the ciphertext s. Each of these substrings are obtained by shift encryption of the unknown plaintext. Though plaintext is unknown at the moment, its probability distribution and therefore its IC is known. Consider, substring $$ s_j\ (1 \le j \le m) $$ is encrypted by unknown key $$ k_j $$. Suppose we shift $$ s_j $$ by $$ b\ (1 \le b \le n) $$ and obtain n different $$ s_j^b $$ each of which corresponds to a decryption with a different key value. If $$ s_j $$ has frequency distribution $$ r''_1,r''_2,...,r''_n $$ and length l'', then

$$ \displaystyle MIC (x, s_j^b) \simeq \frac{\sum\nolimits_{i=1}^n p_i r''_{i-b}}{l''} $$

It is expected that $$ MIC (x,s_j^b) \simeq IC(source\ language) $$ if b is the additive inverse of the correct key modulo n; otherwise we obtain a much smaller MIC value. For instance if n = 26, and maximum MIC value for a substring is obtained when b = 5, then the probable key for that substring is −5 mod 26 = 21. For the example ciphertext our prediction is m = 6. So, we divide the ciphertext into 6 substrings, and we shift each substring by one, two, up to 26 and compute the MIC values shown in Figure2. By simply selecting the maximum MIC value for each substring we get a probable keyword, in our example it is crypto. To see whether the keyword we have obtained is correct we decrypt the ciphertext discovering[4] a passage from Tanenbaum[12].“

So: Wie berechnet man nun den MIC? In der Tabelle bei String 1 Shift 26 (Key A) soll nun 0,0314 rauskommen.

var language = {
    en : {
        //Buchstabenhäufigkeiten A-Z in %
        frequencies : [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074],
        ic : 0.0655
    }
};
function MICtest(str, lang, len)
{
//str (String) : komplettes Kryptogramm
//lang (String) : Sprachkürzel (hier: 'en')
//len (Number) : Anzahl der Spalten in die das Kryptogramm zerteilt wird
lang = language[lang];
var substrings = [],
    frequencyDistribution = [],
    sum = 0;
//substring-Spalten bilden
for (var i = 0; i < len; i++)
  substrings[i] = [];
for (var i = 0, l = str.length; i < l; i++)
  substrings[i % len].push(str[i]);
//Soweit ich verstanden habe geht es nun so weiter: Buchstaben zusammenzählen und durch Gesamtlänge teilen
//Im folgenden nur noch für den 1. Substring
for (var i = 0; i < 26; i++)
  frequencyDistribution[i] = 0;
//_alp (Array) : beinhalten das Alphabet ['A', 'B',...,'Z']
for (var i = 0, l = substrings[0].length; i < l; i++)
  frequencyDistribution[_alp.indexOf(substrings[0][i])]++;
for (var i = 0; i < 26; i++)
  frequencyDistribution[i] /= substrings[0].length;
//Nun Summe bilden (ober Teil des Bruchstrichs  - 2. Formel!)
for (var i = 0; i < 26; i++)
  sum += lang.frequencies[i] * frequencyDistribution[i];
//Zuletzt alles durch die Substring Länge teilen
sum /= substrings[0].length;
console.log(sum); //~0,064

Nun ist 0,064 ganz sicher nicht 0,0314. Ich hoffe jemand kann mir sagen was ich nicht berücksichtigt habe. Oder besser: ein To-Do-Plan in welcher Reihenfolge ich was machen soll.

Reinhard


Wenn es so weiter geht, sollte der Forumsgeist mal ein 'Kryptologie'-Tag hinzufügen :-)

akzeptierte Antworten