CFR2D3.DO by James Main Kenney 1997
Part 3 (of 5) documentation for encryption program CFRJMK version 2.0 (CFRT20.BA for the Tandy Radio Shack TRS-80 Model 100/102/200; CFRN20.BA for the NEC PC-8201A/8300; CFRJMK20.BAS for GW-BASIC/QBASIC, compiled as CFRJMK20.EXE for MS-DOS/Windows)
Built-in Tests: Randomness and Correlation
There are no ciphers known to be perfectly secure other than a "one-time" cipher, whose unbreakability is universally accepted. It would be nice to have a rating system for relative security, but only a government organization such as the NSA could provide such information (such as the relative time they needed to break various ciphers), but will not because they are dedicated to violating privacy rather than protecting it. Unless a non-government cryptanalyst has published the results of his attacks on a cipher, the only way for a private citizen to personally evaluate it is to try to judge the quality of the algorithm and to perform certain tests.
CFRJMK is easy to study, since the source code is in BASIC and the algorithm is explained. To further inspire confidence, it contains some built-in tests: for the correlation between input and output, for the randomness of the fontstring, and for the visible presence of non-random behavior in scatter diagrams of the output versus the input, the working key values, and the working key versus time.
The option is given of performing a calculation of the coefficient of correlation (ideally zero) between corresponding values of plaintext and ciphertext (while at the same time displaying that data on a scatter diagram, as described below). The calculation is completed and displayed whenever ENTER or any other key is pressed except ESC or the spacebar (or keys P, B, S, O, K, or T using CFRJMK20.BAS/.EXE), and at the end of encryption or decryption.
A follow-up prompt allows the complete calculation of the correlation coefficient after the encryption or decryption of each character, using the data to that point; while very slow, this gives a feel for how the correlation is converging. Convergence is inherently slow, requiring much data for an adequate test.
Using CFRJMK20.BAS/.EXE, pressing key C during the input vs. output display toggles between manual and automatic calculation of correlation coefficient. Since CFRJMK20.BAS/.EXE allows the display to be changed during encryption, a count of the data pairs used for the calculation of correlation coefficient is given, since it may not be the same as the number of plaintext characters as it is for CFRT20.BA and CFRN20.BA. Once a graphic screen has been chosen (by selecting a pixel plot), the correlation data is accumulated during any display.
To facilitate the evaluation of fontstring randomness (an important contributor to working-key randomness), each complete (95-character) initial permutation is followed by a count of the number of "runs" in the fontstring, i.e., in the number of up-and-down swings in the ASCII values of the characters taken in sequence. After the second and subsequent permutations, the mean and sample standard deviation of the number of runs for all completed permutations are also displayed. There is a statistical theorem that in a (repeated) sequence of n unique random numbers, the number of runs is a random variable with
mean = (2n-1)/3
and
variance = (16n-29)/90.
(Distribution-Free Statistical Tests, James V. Bradley, Prentice-Hall, 1968, p.279) For a 95 number sequence, therefore, if the numbers are random then the mean number of runs is 63 with a standard deviation of 4.070217.... As with correlation coefficient, the convergence is slow, requiring many permutations for an adequate test.
On the scatter diagram (pixel plot) for ciphertext vs. plaintext, the dashed border is for calibration, not decoration: the break on every fifth pixel corresponds to the unshifted ASCII values ending in 0 or 5. Because of the limited vertical screen size in the 8-line notebooks, the plots used by the Tandy and NEC versions of CFRJMK are split into two boxes: ASCII input values are plotted downward from 32 to 93 in the left box, continuing downward from 94 to 126 in the right box, with output values from 32 to 126 rightward in each box. This plot is accompanied by a calculation of correlation coefficient (described above). CFRJMK20.BAS/.EXE plots the input horizontally and the output vertically in a single box. Look for black or white streaks corresponding to poor distribution of ciphertext values, but expect streaks from non-random plaintext. Using CFRJMK20.BAS/.EXE, the appearance of the pixel plots can be altered by the choice of screen graphics.
In the key plot, all 8192 of the possible working key values (0 - 8191) are plotted at separate pixel locations, 240 on a line by CFRT20.BA and CFRN20.BA and, in three plots: 95 on a line, 128 on a line, and 64 on a line by CFRJMK20.BAS/.EXE. The working key is the number from which the ASCII value of the plaintext or ciphertext character is subtracted during encryption or decryption. Look for the randomness of the distribution of values over time, and for the generation of all values (a filled-in plot) during a lengthy encryption.
To look for possible periodicity in the working key, the working key vs. time is plotted twice by CFRT20.BA and CFRN20.BA with different (vertical) time moduli (95 and 19), the second at the bottom of the right box. For CFRJMK20.BAS/.EXE there are five plots, the top four with horizontal time scales repeating after 95, 81, 69, and 59 cycles, and the bottom (with a full screen width) which normally repeats after 320 or 640 cycles but can be set to fewer cycles (or returned to full width) by pressing ENTER or any other key except ESC, the spacebar, or keys C, P, B, S, O, K, or T. This is analogous to using different sweep rates on an oscilloscope. The working key is displayed vertically modulo 95, since encryption is done with that modulus.
While observing any of the pixel plots, the screen may be cleared at any time by pressing the spacebar, so that new data can be distinguished from old. This does not affect the encryption or decryption or the width of the lower time-key display, but does erase the in-out calibration box. It may be useful for seeing the time sequence and for verifying the repetition of any perceived pattern, such as lines or areas being favored or avoided.
Other tests can be made, useful for any encryption program, by using a large plaintext file containing only repetitions of a single character. The ciphertext will then reflect the randomness of only the working key. If the cipher produced is ASCII text, as CFRJMK provides, count the number of characters of each ASCII value (using the author's CHRCNT.BA or CHRCOUNT.BAS/.EXE, e.g.) and examine the distribution of values.
Also, search the ciphertext for repetitions of, e.g., three or four-character strings found in different parts of the ciphertext; if duplicate strings are found, compare the adjacent characters. Select strings from all through the file, since repetitions may start well below the top. Less obvious weaknesses may include linear transformations other than simple repetition: a string such as "ABCD", e.g., could conceivably be altered by a constant shift to "jklm" or "3456", or by a linear shift to "jlnp" or "3579", etc. (entering an area requiring special software for adequate analysis).
Even the best algorithm will not produce secure ciphertext if the key is not adequate or is compromised. The key must be sufficiently large to defeat a "brute-force" attack, sufficiently complex to avoid being guessed, and secure from seizure. CFRJMK makes it easy to satisfy all of these requirements.