The Hardware Side of Cryptography

15 March 2008

Hardware Implementation of Rijndael SBOX using Logic Gates

Have you ever questioning about how Rijndael’s (AES) SBOX was done is hardware? Or perhaps you were questioning where Rijndael’s SBOX comes from, is it drop from from the sky? Or it just because of Mr. Rijmen and Mr. Daemen found it while jogging in the morning? Just kidding:mrgreen: .

All of that were not true, actually both of two crypto geniuses which is Mr. Rijmen and Mr. Daemen both has strong reason behind the design of Rijndael’s SBOX design. There should be a strong reason behind the scene, yeah for sure.

When i was reading the Offical Rijndael specification (fips-197), something is bothering me, a big question mark exists in my mind. I was just like a dummyhead student who is trying to digest cryptic lecture and pretending to be understand😯 . I can not just swallow the specification and pretending to be convinced. I can’t fool myself, i have to convince my self really 8) .

After sitting several hours in the front of my old computer, finally uncle Google gave me some hints. Aha! a miracle just happened, my broken brain is working. An idea just drop from the sky and i caught it immediately,and it got it😉 !

Ok, lets start the story!

After uncle Google gave me some hints, I found that there are several ways to implement Rijndael’s (AES) SBOX in such hardware sense. It just a lookup table that maps 8 bit combinations of input into unique 8 bit combinations of output.

The easieast wasy to do this is by ROM/EEPROM to emulate lookup table. Even kindergarten kids can do this😀 . But wait, since ROM/EEPROM has fixed parameter such access time, this way is gonna be slow. If, however, your application is not critical in term of speed, you may use this concept. Just to save your time. Beside that, ROM/EEPROM is a bit expensive you have to consider the cost too. I named this trivially as “ROM/EEPROM Lookup Table Emulation”.

Second way of this, which is a bit harder is by decomposing minterm of maxterm. You have to list, when each bit of SBOX output reach “1” and “0”,If you divide 8 bits of input into two nibbles, you will get “high nibble” and “low nibble” while each nibble takes 4 bits. Furthermore, you can decode each combination of 4 bits of low and high nibbles into 16 combinations each. Here, you have 32 outputs, first sixteenth bits came from decoding high nibble and second sixteenth bits came from decoding low nibble. For example, if SBOX results “1” when its input reach 0x56 0r “01010110” for bit say 7, you can gate y5 of first decoder, which is connected to high nibble and y6 of second decoder which is connected to low nibble. This is true since y5 will active when its input reach “0x05” or “0101” and y6 will active when its input reach “0x06” or “0110”. Here we have to consider “what active mean”, whether logic “1” or logic zero, so we can choose proper gate. For example 74LS154 uses “active low”, means its output pin will produce “0” when ever active. We will talk about this implementation later on. It might sounds too intuitive and funny to you, but it works then. Lets just call it “minterm/maxterm decomposition”.

The third one, is about boolean algebra. This way sounds horrible and disgusting to you, lets just swallow it first. I believe that it might bennefit us, at least to satisfy our madness ;p . To do this, you map each bit of output as a function of 8 bits of input, so you have 8 boolean algebraic equations. Since the equation is sum of bit products, we will name it “Sum Of Product Decomposition”. This job is tedious, or probabably will get you sick. But hope not so. Lets sacrifice our brain and deal with those complicated equation just to satisty ourself. However, this equation might be important if you want to design such an ASIC (Application Specific Integrated Circuit) for Rijndael cipher. Lets name this “Sum Of Product decomposition”.

The last one, which named “analytical implementation” was derived theoretically using abstract math. Although it sound more horrible than “Sum Of Product decomposition”, but believe me, this way will result the most efficient Rijndael’s SBOX implementation in logic gates. Furthermore, you can also hack the circuit to reduce circuit cost by combining SBOX and Inverse SBOX of Rijndael algorithm. Hmmm… sound nice and more practical huh🙂 . Besides it results simpler circuit, it also results cheaper circuit. Every engineer are looking for fast, simple and cheap circuit, no one could deny this. However, to proceeed this method, we need to understand abstract math at least in simple way.

Am i talking too much? sorry then. Lets get ourself dirty🙄 . Next, i will explain you those 4 ways of implementing Rijndael’s SBOX and even inverse of Rijndael’s SBOX. Hopefully you get the the point, dont blame me if i confuse you:mrgreen: .

1. ROM Lookup Table

This way of implemention is wasy to do. All you have to do is to allocate 2×256 bytes or 512 Bytes of ROM/EEPROM and fill the first 256 bytes of it with the value of Rijndael’s SBOX and the other 256 bytes by the inverse of Rijndael’s SBOX. Sounds easy huh!. But you have to guarantee when you connect SBOX input to address of ROM/EEPROM and you conncet connect SBOX output from ROM/EEPROM, you get exactly like what the specification said. Simply treat addresses as input port and data busses as output port of SBOX. Below are the table of SBOX and inverse SBOX of rijndael algorithm:

Rijndael SBOX

Rijndael SBOX

Rijndael Inverse SBOX

Rijndael Inverse SBOX

I’ll draw you some pictures also, hopefully it will helps you.

Rijndael SBOB and inverse SBOX Mapping

If you use 256 bytes of ROM/EEPROM use configuration below:

Rijndael SBOX and Inverse SBOX Lookup Table

If you use 512 bytes of ROM/EEPROM and cascaded SBOX and inverse SBOX, use configuration below:

Rijndael Cascaded SBOX and inverse SBOX

Please note that:
x1..x7 : the input lookup table (SBOX or inverse SBOX)
y1..y7 : the output lookup table (SBOX or inverse SBOX)
a1..a7 : the ROM/EEPROM address bits
d1..d7 : the ROM/EEPROM data bits

2. Minterm/Maxterm decomposition

Deriving each minterms of each bit of SBOX/Inverse SBOX probably too much if it shown here, I’ll just show one of them, to keep this explanation clear and understandable. Lets have a look to 7th bit of SBOX table. First, we have to convert hexadecimal notation and then we extract them. Below is a portion of Rijndael SBOX shown in binary.

0x00 01100011
0x01 01111100
0x02 01110111
0x03 01111011
0x04 11110010
…. ……..
…. ……..
…. ……..
…. ……..
0xfc 10110000
0xfd 01010100
0xfe 10111011
0xff 00010110

If we list them, we will find that probability of “0” occurence and probability of “1” occurence on each bit is always the same, means 50%-50% for each of them. you can proof this if you want to, please let me know if there are mistaken here. Sounds logically true, because each of input combination will be mapped uniquely to output combination and no collision at all. In addition, SBOX and inverse of SBOX are reciprocal, this is also proof that there are no collision on both SBOX and Inverse of SBOX. Hopefully you understand what i’m saying. I don’t know the best way to express this. Just let it flow😀 .

For example, If “0” occur on minterm 0x51,0x43,0xa5 and 0xf7 (else will be “1”). Since 74LS54 is an active low 4 to 16 decoder, and we use active low gating, therefore we use OR gate to combine activation signals from both decoder and finally we use AND gate to combine active low signals for OR gates. The picture is depicted below.

Rijndael SBOX (Decoder)

note that i will not implement SBOX or Inverse of SBOX on this way, because its innefective. I wrote this implementation here just to demonstrate you that SBOX can be implemented using minterm/maxterm decomposition. So, no more drama guys, story just ended here :p .

3. Sum Of Product Decomposition

Going further from previous method, now we are going to expand each output bit as a function of 8 bit input. After function decoposed, we can express them as a sum of product. To speed up this step, we will not discuss how equation was derived from truth table i’ll just show you the result. If you learned Fundamental of Digital Design, i’m sure you can solve karnaugh map deriving equation problem while brushing teeth🙂 . In addition, this way of implementation is fast but not cost effective, because you have to such specific circuit for each SBOX and Inverse SBOX implementation. I’ll skip this method and go further to next step.

4. Analytical Implementation

The last way that i found so far to implement SBOX is about decomposition of polynomial boolean function. I really appreciate to Mr. Edwin NC Mui’s paper of Practical Implementation of AES SBOX. This paper inspired me to write analytical implementation of Rijndael SBOX.

From official explanation of AES (fips-197), we showed that SBOX table is defined by:

Rijndael SBOX equation

And Inverse SBOX is defined by :

Rijndael Inverse SBOX Equation

While {a} is the value of original number that we would find for, {b} is the multiplicative inverse of {a} and {b’} is the result of SBOX substitution. In addition, s(a) is operation of finding SBOX substitution of {a} and s-1(a) is operation of finding the inverse SBOX of {a}.

Since matrix multiplication is easlily implemented on Logic gates, now the problem is about finding the multiplicative inverse of GF(28).

Based on paped authored by Edwin NC Mui, which entitled “Practical Implementation of Rijndael S-Box Using Combinational Logic“, we can define several module needed to find the implementation of AES SBOX/Inverse SBOX. The module are:

  • Affine Transformation Module
  • Inverse Affine Transformation Module
  • Isomorphic Mapping Module
  • Inverse Isomorphic Mapping Module
  • GF(24) Squarer Module
  • GF(24) Lambda Multiplication Module
  • GF(24) Multiplicative Inverse Generator Module
  • GF(24) Multiplication Module

First, since affine transformation is defined by:

Rijndael Affine Transformation Equation

If translate the equation above into combinational logic circuit, we will find that the schematic of Affine Transformation Module is drawn as follow:

Rijndael Affine Transform Module

Second, the Inverse affine transformation of Rijndael algorithm is defined by:

Rijndael Inverse Affine Transform Equation

Again, if transform equation above into combinational logic, we will find that the circuit of Inverse Affine Transformation Module is defined as follow:

Third, Isomorphic Mapping is defined by:

Isomorphic Mapping Equation

Forth, Inverse Isomorphic Mapping is defined by:

Inverse Isomorphic Equation

Fifth, The squaring function of GF(24) is defined by:

Rijndael GF(2^4) Squarer Equation

Sixth, GF(24) Lambda Multiplication is defined by:

GF(2^8) multiplication by Lambda
Seventh, GF(24) Multiplicative Inverse is defined by:
(coming soon)
Eighth, GF(24) Multiplication Module is defined by:
(coming soon)

RELATED STUFF

 Practical Implementation of Rijndael S-Box Using Combinational Logic

Link:
AES (Rijndael) Simulator

Blog at WordPress.com.

%d bloggers like this: