Links to stuff on this blog

Use the Site Index of Projects page link above for an easier way to find stuff on my blog that you might be looking for!

Saturday, March 6, 2010

Linear Approximation Table for Cryptanalysis in MS Excel

This post I suppose doesn't have a real purpose other than a place to put the code that I wrote to look for linear relationships in substitution tables. This is along the same lines of the Differential Cryptanalysis post I did awhile back. Whats the point? I ask myself that often... mostly it's something to do... anyway in a cipher at some point there is something that is non linear. Most often but not always that is a look up table where one value is substituted for another one. This is basically the same idea of the decoder rings some of us had as kids where one letter in a message is changed for another one and so on until the whole message has been encrypted. Even the 'real' ciphers these days have some kind of substitution in them or some non linear equivalent. One example of a cipher with a substitution table is the Advanced Encryption Standard but there are many many others. There are many many different approaches to getting a cipher to be non linear but the table look up is a common way. Anyway... From the title of this post and what I have been writing so far you have probably guessed the goal is to make the output have a non linear relationship to the input. The reason for doing this is because if there is a linear (predictable) relationship between what goes in to what comes out you haven't really encrypted the message very well. The way this is commonly presented is in a Linear Approximation Table. Click HERE for some in depth info on this topic. What I am going to describe below is my non mathematical take on it.

Linear Cryptanalysis is a method for looking at the relationships between certain bits going into a part of a cipher and the bits coming out. Really it is asking how probable is it that I'll get a 1 or 0 on the output if I put a 1 or 0 on the input. This is usually done when looking at the substitution table in a cipher. An example of this might go something like this: (this is a goofy explanation but I think it gets the point across)
 
Suppose there is one byte value (8 bits) of data that goes into a look up table and some different byte value (8 bits) comes out. Also suppose that you look at the first 2 bits of that byte going in each time you send a value in there, and you send in all possible values that have the first 2 bits set to one. That would be a total of 64 byte values that you send into the table to get substituted for new values. Each time you send a value in you look at the value that comes out of the table and write it down. Once you are done you have a list of 64 numbers that came out of the table. Now maybe while looking at that list of numbers you notice that the 1st and 3rd bit of every one of the numbers in the list is a 1. You might say at this point that the first two bits of the input equal the first and third bit of the output. To put it another way you could say that those bits all have the same parity. To put it yet another way you could say that if the first and second input bits are I1 and I2 and the first and third output bits are O1 and O3 then: I1 + I2 = O1 + O3 = 0 where the + (plus sign) is binary Exclusive Or.
  
Another equation that also might work could be I3 + I5 + I8 = O3 + O4 = 1. The difference between these two formulas other than the bits being used is one equals 0 and the other equals 1. That's because one equation is a linear relationship and the other is affine. Affine to my nut and bolt pea sized brain is symmetry so I haven't quite figured out why those two are not exactly the same thing but for a linear approximation table that doesn't really matter. What matters is how many equations like the two above hold true with all possible combinations of input and output bits.

Finally here is the last detail of this stuff: If the table has all the possible values in it and all the values are different (no numbers repeat) and non linear then any given equation we pick will hold true half the time. In the case of a table from 0 to 255 values (or a total of 256 values) exactly half is 128. To stick with the way the professionals do this sort of thing I'll subtract 128 from the total number of occurances so that we will be counting from -128 to +128 with 0 right in the middle. That way a 0 in the output from the code I'll eventually get around to writing about in this post will indicate a non linear combinations of bits in the equation. A more negative value will indicate a affine relationship and a positive number a linear one. The more 0's you have in the output and the closer any numbers are to 0 the more non linear the table is.

OK that's really fun! Are you still awake after reading all this?? If an equation like the one above holds true most of the time for a substitution table then the table isn't doing a very good job of being non linear. In fact by knowing a few of the output bits you have a pretty good idea of what the input is, or at least you can narrow it down. What you would expect to see if the table is non linear or random is the equation like the one above would be correct half of the time and only half the time. So the goal of this is to look at combinations of input bits and compare them to combinations of the output bits looking for combinations that are 'correct' more than half the time. Once we have found input to output combinations that are more likely than others we can predict how the substitution table is doing it's thing.

So having said all that let me summarize by saying that to do this a person needs to choose all combinations of input bits and compare them to all the combinations of output bits with every possible input to the table. Doing this by hand would take a really long time so I wrote some Visual Basic scripts that run in Excel to do it for me. Computers are real time savers!

I ran into a couple of interesting programming issues when trying to do this. One is a really annoying trait of Visual Basic that allows you to misspell a variable name in a program and it won't give you an error. In fact it  runs along just fine treating the misspelled variable as a brand new variable. In fact I beat my head against the wall trying to figure out why the code didn't give the expected results until I noticed I had misspelled a name in the code... Computers are real time savers!

The other issue that I had a problem with is figuring out how to exclusive or together only a few bits in a byte at a time. What I decided to do it pre-compute the exclusive or of all the bits ahead of time and put them in the Excel sheet. I did this by creating a column of all the numbers from 0 to 255 then right next to it converting them to binary with the Excel formula:
=DEC2BIN(A23,8)
That converts the numbers from decimal to 8 bits of binary ones and zeros. Then I used this formula:
=LEN(G5)-LEN(SUBSTITUTE(G5,"1",""))
to count the numbers of ones in each 8 bit value. After I did that I took the number of 1's and reduced it mod 2 with:
=MOD(A23,2)
which then gave a value of 0 if the number of 1's was even and a 1 if the number of 1's was odd. That is the same as Exclusive Or'ing all the 1's together.

Isn't that fun!! Wow OK!! Once I had the column of 1's and 0's that was equal to the parity of each value from 0 to 255 all I had to do was select every combination of input and output bits and with every combination run through all input values from 0 to 255. Check out the code at the bottom of the post, there are three Visual Basic Functions to look at. One sends to the worksheet the maximum Linear approximation occurrences value (either + or negative with no sign), the input and output bit combinations that created it (one of possibly many combinations) and the number of 0's in the table. The other piece of code prints out the entire Linear Approximation Table table to the screen with the output bit combinations across the top and the input combinations down the side. Be warned this is a big table and it takes about 10 minutes to make it. The last piece of code runs through a table looking for differential cryptanalysis relationships similar to my other post but without the table output.
 
To test these pieces of code I ran them on a couple of substitution tables that I knew the answers to. One was the Advanced Encryption Standard substitution table and another table I got off the internet in a theoretical paper.
 
AES Sub Table has max linear approximation value of 16 and a difference distribution value of 4. The other table that I got off the internet that was proposed as a substitution table has a max linear table value of 22 and a max difference distribution table value of 6.
 
In anyone actually reads all this and notices any errors please let me know!
 
//******************
Excel Visual Basic Code:
Linear Approximation Values Code

This code needs the "Parity Chart" column starting in cell B6 and will print out the max parity value (unsigned) the Input and Output masks that created that parity value - there may be others but at least one set and also print out the number of 0 or non-linear values. The substitution table being investigated should start at cell C6.

Sub LinearList()
' This code will run thru all possible
'  input and output bit combinations (masks)
'  checking them against all inputs, keeping
'  track of the highest + and - parity masks
'  then print out which masks produced the
'  greatest number of parities
' http://ottobelden.blogspot.com
' Locate stuff on the sheet
Dim ParChart As Byte    ' Parity Chart Top
Dim SubTab As Byte      ' Sub Table Top
SubTab = 6      ' Sub Table row 6
ParChart = 6    ' Parity chart row 6
Dim IMasked As Integer ' Mask Values
Dim OMasked As Integer
Dim ParMax As Integer ' keep track of Max Parity
Dim IMaskMax As Integer ' Max input mask
Dim OMaskMax As Integer ' Max output mask
Dim Zeros As Long ' how many zero entries
Dim Temp As Integer ' temp value of parity
Cells(2, 3) = Time
Application.ScreenUpdating = False
For Imask = 1 To 255 ' Input Mask
    For Omask = 1 To 255 ' Output Mask
        For Input1 = 0 To 255 ' Input Value
            IMasked = Input1 And Imask
            IMasked = Cells(IMasked + ParChart, 2)
            OMasked = Cells(Input1 + SubTab, 3)
            OMasked = OMasked And Omask
            OMasked = Cells(OMasked + ParChart, 2)
            If OMasked = IMasked Then
                Temp = Temp + 1
            End If
        Next Input1
        '
    Temp = Abs(Temp - 128)
    If Temp = 0 Then
        Zeros = Zeros + 1
    End If
    If Temp > ParMax Then
        ParMax = Temp
        IMaskMax = Imask
        OMaskMax = Omask
    End If
    Temp = 0
    Next Omask
Next Imask
Application.ScreenUpdating = True
' Print out data
Cells(3, 3) = Time
Cells(5, 6) = ParMax
Cells(6, 6) = IMaskMax
Cells(7, 6) = OMaskMax
Cells(8, 6) = Zeros
End Sub

Linear Approximation Table Code

This code needs the "Parity Chart" column starting in cell B6 and will print out the entire table. The substitution table being investigated should start at cell C6.

Sub Linear()
' This code will create a Linear Approximation Table
'  of a 256 byte substitution table in a column
'  starting at cell SubTab
' Set up 3 loops, one for an input mask, second output
'  mask and one to loop thru all 256 inputs
' An additional table column at cell ParChart is
'  the parity of each value from 0 to 255
' http://ottobelden.blogspot.com
'
'These values locate stuff on the Excel sheet
Dim ParChart As Byte    ' Parity Chart Top
Dim SubTab As Byte      ' Sub Table Top
Dim ChartOut As Byte    ' Output chart column
Dim ChartIn As Byte     ' Output Chart Row
ChartOut = 5    ' Chart column E = 5
ChartIn = 6     ' Chart row = 6
ParChart = 6    ' Parity chart row 6
SubTab = 6      ' Sub Table row 6
Dim IMasked As Long ' input bits masked
Dim OMasked As Long ' output bits masked
Application.ScreenUpdating = False
Cells(2, 3) = Time
For Imask = 0 To 255 ' Input Mask
    For Omask = 0 To 255 ' Output Mask
        For Input1 = 0 To 255 ' Input Value
            IMasked = Input1 And Imask
            IMasked = Cells(IMasked + ParChart, 2)
            OMasked = Cells(Input1 + SubTab, 3)
            OMasked = OMasked And Omask
            OMasked = Cells(OMasked + ParChart, 2)
            If IMasked = OMasked Then
            Cells(ChartIn + Imask, ChartOut + Omask) = Cells(ChartIn + Imask, ChartOut + Omask) + 1
            End If
        Next Input1
    Cells(ChartIn + Imask, ChartOut + Omask) = Cells(ChartIn + Imask, ChartOut + Omask) - 128
    Next Omask
Next Imask
Application.ScreenUpdating = True
Cells(3, 3) = Time
End Sub


Difference Distribution Table Max Value Code

Sub DList()
' This code will create an array
'  corresponding to the Xor or 2 inputs
'  (indif) then index into the array a value
'  that is the Xor of the sub of each
'  input, incrementing the array value
'  then find the max value of the array
' http://ottobelden.blogspot.com
Dim Outs(256) As Integer
Dim Temp As Integer
Dim MaxVal As Integer
Dim Input2 As Integer
Cells(2, 3) = Time
Application.ScreenUpdating = False
For Indif = 1 To 255
    For Input1 = 0 To 255 ' Loop thru input values
        Input2 = Input1 Xor Indif
        Temp = Cells(Input1 + 6, 1) Xor Cells(Input2 + 6, 1)
        Outs(Temp) = Outs(Temp) + 1
    Next Input1
    ' Find max value in array
    For Check = 0 To 255
        If Outs(Check) > MaxVal Then
            MaxVal = Outs(Check)
        End If
    Next Check
Erase Outs
Next Indif
Application.ScreenUpdating = True
Cells(3, 3) = Time
Cells(4, 4) = MaxVal
End Sub

No comments:

Post a Comment