The SCOP Stream Cipher

Simeon V. Maltchev          Peter T. Antonov

December, 1997

Original post to sci.crypt.research
news group could be found here.

ABSTRACT

This paper describes a software-efficient encryption algorithm named SCOP. SCOP is a stream cipher specially designed for optimal software performance on the Intel Pentium processor. The computational cost of SCOP on this processor is less than 2 clock cycles per byte of text. The cipher is designed around one key-dependent S-box, one part of which dynamically changes during the encryption process and the other part is static. The cipher works in internal-feedback mode (IFB). The keystream consist of 32-bit words and is generated independently from the encrypted message.

Key words: Cryptography, Fast Software Encryption, Intel Pentium Processor, Stream Cipher

INTRODUCTION

In the last few years the software-optimized cryptography is a subject of active investigations. In 1991 Merkle described the advantages of the encryption functions designed for software implementation and he proposed a suite of three software-efficient algorithms (two block ciphers named Khufu and Khafre, and a one-way hash function named Snefru) [3].

In this paper we will discuss the software-efficient stream ciphers. Such algorithms are SEAL [4], designed by Rogaway and Coppersmith, RC4 [5] designed by Rivest, WAKE [5] designed by Wheeler.

The advent of the Pentium processor established new rules for designing encryption algorithms. Some of them are discussed in [6]. In [1] and [6] is shown that carefully coded implementations of some of the encryption algorithms, designed without Pentium in mind, are able to exploit his superscalar architecture almost to its maximum.

As it is shown above SCOP is specially designed to run on Pentium, but he also can run very fast on other 32-bit processors. SCOP is also designed to meet the following criteria:

SCOP is optimized for applications in which the key does not change very often, e.g. a crypto-channel between two machines or a file encryption/decryption program.

DESCRIPTION OF THE ALGORITHM

SCOP is a stream cipher working in IFB mode and uses a variable size key. The keystream consist of 32-bit words and is independent from the encrypted message. The algorithm has two parts: key expansion and data encryption. The first one converts the key, which is up to 48 bytes long, into an array of 1540 bytes.

The main operation used in the cipher is arithmetical addition of 32- bit words. The additional operations are 4 table lookups, which are performed for every word to be encrypted.

SCOP uses one 32-bit S-box (named V) with 384 entries, two 8-bit indexes i and j, and three 32-bit temporary variables T1, T2 and T3. The S-box V is conventionally divided into two parts: the first 128 32- bit words of V form the first part which is static, and the next 256 32- bit word of V form the second part which changes during the encryption process. The index j addresses only the variable part, while index i addresses the static part and the first half of the changing part (the first 128 32-bit words of the changing part).

The S-box V is initialized with the first 1536 bytes of the expanded key. The next three bytes of the expanded key are used to initialize the indexes i and j, and the temporary variable T3 (the higher 24 bits of T3 are initialized to 0). The usage of the last one byte of the expanded key is a bit special: just the lower 7 bits of it are used. They index a word into the static part of the S-box V. This word is made to be odd without regard whether it is odd before that or not.

The 32-bit keystream word K is generated by the following steps:

1.  T1 = V[128 + j]
2.  j  = (j + T3) mod 256
3.  T2 = V[128 + j]
4.  T3 = V[128 + j] + V[i]
5.  V[128 + j] = T3
6.  i  = (i + 1) mod 256
7.  j  = (j + T2) mod 256
8.  K  = T1 + T2

K is added to the plaintext word to produce the ciphertext word or is subtracted from the ciphertext word to produce the plaintext word.

Key Expansion

The main goal of the cipher design was the data encryption part of the algorithm. On the preprocessing of the key to the array less attention was paid. The array has no special properties and can be regarded as an array of random numbers. For completeness, definition of the key expansion part of the algorithm is given in Appendix A.

DESIGN PRINCIPLES OF THE ALGORITHM

The main idea of the SCOP design is to use one or more S-boxes, which are generated depending on the key and change dynamically during the encryption process, in a manner, which can not be determined from the generated keystream (IFB mode). The goal is to make difficult to determine the internal state of the cipher and to prevent crypto-attacks designed for fixed and static S-boxes.

To achieve high performance, the number of operations necessary to implement the above idea must be minimized. Variants with one or two S-boxes, as well as with one or two index variables are considered.

Algorithm is designed with respect to the following security requirements:

  1. Every word from the S-box which changes during the encryption, has to be changed in a way, which can not be determined from the generated keystream (IFB mode).
  2. Having in mind, that the i-th word (Ki) of the keystream is function of a part of the i-th internal state (pISi) of the keystream generator, then the pISi must be changed immediately after computing the value of Ki.
  3. The change in the values of the indexes of the S-box words, that form the keystream, must be done in a way, which can not be determined from the generated keystream, and the values of these indexes must not form short cycles.
  4. An internal state of the keystream generator after which it is possible to generate only such numbers, the n-th bit of which is always 0 or always 1 (in particular, only even or only odd numbers), must be impossible to happen.

These security requirements of the algorithm will be explained below.
Some performance-related design principles of the algorithm are given:

EXPALANATION OF THE SECURITY REQUIRMENTS OF THE ALGORITHM

We believe that security requirements 1, 2 and the first part of requirement 3 are general security requirements, which must be implemented by all ciphers which uses variable S-box and work in IFB mode.

The last part of security requirement 3 is fairly complex. It is of critical importance for the cipher security, because the forming of short cycles by the values of the indexes will cause the forming of such cycles by the values of the generated keystream words. It is hard to give a mathematical proof that the values of the indexes will not form short cycles. Design heuristics will be explained.

One of the most important points of the algorithm is whether it is secure if none of the indexes is altered through a constant (regarding that when encrypting each plaintext word, in the processor registers the values of T1, T2, S-box word indexed by i, T3, i and j are loaded or calculated, then the new values of the indexes can be formed as a function of some of these values). Two basic variants of the algorithm are considered.

In the first variant only one entirely changing S-box (named V) is used. In this variant there is one simple internal state of the keystream generator after which it is possible to generate only zeros: i = j = 0 and V[0] = 0. This seems to be a common problem for all algorithms with similar design. Because of this problem this variant of the algorithm should be discarded as not secure.

In the second variant of the algorithm two S-boxes are used. The first S-box is entirely static (named C) and the second S-box is entirely changing (named V). Index i is used for addressing words only in the static S-box, and index j - only in the variable S-box. The goal of the static S-box C is that it is possible to be guaranteed in the key expansion part of the algorithm that C[0] # 0, which can be used to solve the above problem.

It can be ascertained that it is possible to come to a cycle, not only when C[0] = 0, but also when in C there are words the lower 8 bits of which are zeros. To avoid these cycles there should not be such words in C. In principle this is not difficult to be guaranteed by the key expansion part of the algorithm, but the further investigations showed that similar cycles are possible when the lower 8 bits of certain words in C are a power of 2, e.g. 128. The absence of such words can also be guaranteed by the key expansion part of the algorithm, but on the whole this variant of the algorithm does not seem to be very reliable.

Depending on the above problems is made the decision one of the indexes to be altered through a constant. In the presented algorithm the value of the constant is chosen to be 1, but it is quite possible to choose any other number, which is a 255 divisor, e.g. 3, 5, 15, 17, ..., 51, 85, 255 (the index is 8 bit wide, so 255 = 2**8 - 1).

The security requirement 4 is not widely used principle of the design of ciphers which use changing S-box, but we consider it is most important for the cipher security.

The decision, that one of the indexes will be altered through a constant makes the static S-box needless. So we can consider the following working variant of the algorithm:

V - variable 32-bit S-box with 256 entries i, j - 8-bit indexes

1.  T1 = V[j]
2.  j  = (j + T3) mod 256
3.  T2 = V[j]
4.  T3 = V[j] + V[i]
5.  V[j] = T3
6.  i  = (i + 1) mod 256
7.  j  = (j + T2) mod 256
8.  K  = T1 + T2

The critical step of this variant of the algorithm is step 4. Depending on whether the addends which take part in it are even or odd, the following cases are possible:

N:      V[j]           V[i]              T3
---------------------------------------------
1.      even    +       even    =       even
2.      odd     +       even    =       odd
3.      even    +       odd     =       odd
4.      odd     +       odd     =       even

The first two cases are not interesting, because in them the proportion between the even and the odd numbers in the S-box does not change. This is not true for the last two cases. In the third case in the S-box appears a new odd number, in the fourth - a new even number.

The disappearing of the even numbers from the keystream is not possible, because even if in a certain moment there are only odd numbers in the S-box, the next generated keystream word, as well as the next number to appear in the S-box (word T3) will be even.

The disappearing of the odd numbers from the keystream is possible yet. If in a certain moment there are only even numbers in the S-box, no odd keystream word will be generated any more. The happening of such internal state is quite possible: only one odd number is left in the S-box, and it is the word T2 for the current encryption iteration, and i is equal to j. Then this number will be added to itself and an even number will take its place.

Note that the solving of the problem with the disappearing of the odd numbers from the S-box and respectively from the keystream, will fully satisfy security requirement 4, because when two odd numbers are added (arithmetical addition), there is a carry from the lowest bits (these at position 0) to the next higher bits (these at position 1), which means that the 1s can not disappear from position 1, and by analogy, the same is true for the bits at position 2, 3, 4, ..., 30, 31.

The main conclusion that can be drawn is that the variants of the algorithm, in which only one variable S-box is used, are not quite reliable, because of the possibility of the disappearing of the odd numbers in the generated keystream. Here is implied that the change of the S-box is such that a new number, probably not present in the S-box, is created and a certain number in the S-box is replaced with it. If the change consists only in the transposition of the numbers in the S-box, the problem does not exist. Unfortunately such a technique of alteration through transposition is not quite applicable for generators that create keystream of 32-bit words, because it should be possible for each 32-bit value to appear in it, i.e. the S-box should consist of 2**32 32-bit words (16 GB).

An adequate solution for the problem is to use two S-boxes, just as it is described above at the discussion of security requirement 3. To prevent the disappearing of the odd numbers from the S-box and respectively from the keystream it is enough to have at least one odd number in the static S-box. This can be easily guaranteed by the key expansion part of the encryption algorithm.

A potential problem of such a variant of the algorithm, compared to the variant with only one variable S-box is that the word, which will be added to V[j] (T2) in step 4 will be the same after each 256 encryption iterations, because C is static and the value of index i changes in a cycle with period 256. This is probably not a serious flaw, because after 256 iterations the word that will be T2 for the current iteration will have with high probability different index and different value.

This flaw can be partially compensated if two S-boxes are used, which partially overlap, or in other words, one S-box which is conventionally divided into two parts: the first part is static, the second one changes during the encryption process. One of the indexes addresses only the variable part (this is index j in our case), while the other index addresses the static part and the first portion of the changing part (this is index i in our case).

In this case the algorithm works in two different modes, as two different algorithms: when index i addresses the static part, the algorithm works as in the case with two separate S-boxes, and when it addresses the variable part - as in the case with one variable S- box. To prevent the disappearing of the odd numbers from the keystream, it is enough to have at least one odd number in the static part of the S-box, which can be easily guaranteed by the key expansion part of the encryption algorithm.

When the static part of the S-box is of length 128 words, the algorithm performs the same number of iterations in its two different modes. But it is quite possible to choose a length different from 128. The minimum allowed length is of 1 word, the value of which should be odd.

Note that it is impossible zeros or ones that disappear from any bit- position of the values of index j, because the value of j is addition function of words of the S-box. This has direct relationship to the problem with the forming of short cycles by the values of this index.

The security requirement 4, which is fully satisfied, and the technique used for that, are some of the main features of the presented in this paper encryption algorithm.

PERFORMANCE

The inner loop of SCOP can be coded with 15 instructions on every 32-bit processor, which has at least 6 registers and which has instructions for: moving data from memory to register and from register to memory; adding and subtracting of two registers; adding register with immediate value and subtracting immediate value from register; mask register with immediate value (AND instruction). If the processor has registers (at least 4) which give direct access to their lower 8 bits, like AL for EAX, the three mask instructions become needless and the algorithm can be coded with 12 instructions. Note that these 12 instructions include two instructions for reading and writing from/to processing plaintext/ ciphertext stream and one instruction for combining plaintext with keystream. These 12 instructions can be executed for at least 6 clocks on Pentium, which is fully achieved. There are no AGI stalls, imperfect pairs or problems with loop-unrolling (of course, there will be performance penalty when the encrypting buffer or the internal table of SCOP are out of on-chip cache). This means that SCOP encrypts data at 1.5 Pentium clocks per byte. So we can say that SCOP uses 100% of the Pentium�s abilities for concurrent instructions execution.

According to the performance times of SEAL [4] and RC4 [5] presented in [6], SCOP is about 2.5 times faster than SEAL and about 4.5 times faster than RC4. Having in mind, that there is no matter on which kind of Pentium processor SCOP is running (i.e. Pentium without MMX, Pentium with MMX, Pentium Pro or Pentium II) we can say that this is the other main feature of the presented in this paper encryption algorithm.

CONCLUDING REMARKS

It is easy to modify SCOP to get a cipher optimized for 64-bit architectures. The inner table would be twice as wide. This would nearly double the cipher�s speed. It is also interesting to implement the cipher using the new advantages of the Intel�s MMX technology. It is unclear whether security would be impacted in these cases.

One thing that the presented paper shows is that it is possible to build a secure software-optimized cipher which is not closed, i.e. it can run on a wide class of processors, but which also take the full advantages of some particular hardware technology.

LEGAL ISSUES

The SCOP encryption algorithm may be used for any legal purpose without payment of royalties to the authors. The SCOP encryption algorithm is the same as the encryption algorithm presented in [2] with little changes in the key expansion part of the algorithm.

SOURCES

Demo C sources.

Optimized Assembler sources.

REFERENCES

  1. A. Bosselaers, R. Govaerts, J. Vandewalle, Fast hashing on the Pentium, ftp://ftp.esat.kuleuven.ac.be/pub/COSIC/bosselae/pentium.ps.gz
  2. S. Maltchev, Analysis and software implementation of encryption methods for communications protection, Master Thesis, Technical University of Varna, Bulgaria, Summer 1997. Language: Bulgarian.
  3. R. Merkle, Fast software encryption functions, ftp://rpub.cl.msu.edu/pub/crypt/docs/merkle-khufu-khafre-snefru.txt
  4. P. Rogaway and D. Coppersmith, A software-optimized encryption algorithm, http://www.cs.ucdavis.edu/~rogaway/papers/seal.ps
  5. B. Schneier, Applied Cryptography, Second Edition: Protocols, Algorithms, and Source Code in C, John Wiley & Sons, Inc., 1996.
  6. B. Schneier and D. Whiting, Fast software encryption: Designing encryption algorithms for optimal software speed on the Intel Pentium processor, http://www.counterpane.com/fast_software_encryption.ps.zip

APPENDIX A

The key expansion part of the algorithm is based on one-way hash function GP8.

Description of GP8:

GP8 consists of 8 polynoms of type:

Y = a*X**4 + b*X**3 + c*X**2 + d*X + 1, where:

Let us denote the first polynom with P0, the second with P1 and so on, the eight with P7.

The hash value generated by GP8 is 128-bit and is computed in the way described below. The eight 16-bit X values form the "internal state" of the function, which is also a 128-bit number.

First, the key is expanded to 384 bits (48 bytes), using the algorithm given below. The first 32 bytes of them are used for determining of all 32 polynom coefficients and the next 16 bytes determine the initial 8 16-bit X values, i.e. the initial internal state of the function. The output of the i-th polynom (32-bit Yi) is regarded as two 16-bit numbers, where the lower 16 bits form the i-th 1/8 part of the generated by GP8 hash value, and the higher 16 bits form 1/8 part of the internal state which will be used by the function to generate the next hash value, i.e. the input X for the polynom No. (i + 1) mod 8.

The initial expansion of the key to 48 bytes is done by the following algorithm:

If k is the length of the key in bytes (k < 48), the i-th byte Pi (k <= i <= 48) is calculated:

P[i] = P[i - k] + P[i - k + 1]

Next the zero bytes are removed, if there are such bytes in the first 32 bytes of these 48 bytes. To do this all bytes from the first to the 32-th byte are processed, and the first encountered 0 is replaced with 1, the second one with 2, etc.

The key expansion algorithm is:

  1. Initially 8 hash values are generated which are not used.
  2. Then 8 hash values are generated (128 bytes). After that 1 hash value is generated that is not used. These two steps are repeated 12 times and 1536 bytes are generated at all.
  3. Finally, 1 hash value is generated from which only the 4 higher bytes are used.

Totally 117 calls to GP8 are needed during the key expansion.

1