Monday, March 24, 2014

How the Dual EC DRBG Backdoor Works

In December 2013, Reuters reported that the National Security Agency had paid RSA ten million dollars to use a random number generation algorithm that contained a backdoor.  The algorithm, Dual_EC_DRBG, is the default random number generation algorithm in RSA's BSAFE toolkit which is used by other companies to implement cryptography in their products.  These allegations have seriously damaged the reputations of RSA and of the NSA who had established itself as a partner in the security community.

A great number of articles were published about the RSA-NSA deal after Reuters first reported on the matter.  Many of these make misleading or untrue technical assertions and few of them have attempted to provide any explanation as to how this supposed backdoor works.  Some commentators have claimed that this backdoor (if true) made the products that used the BSAFE toolkit more vulnerable to attack.  This is (mostly) not true.  Because the backdoor, whose possibility was first speculated on by Dan Shumow and Niels Ferguson of Microsoft, relies on techniques from public key cryptography, it is only usable by a person that knows the key and that key cannot be discovered by another party, even if that party knows the full details of the algorithm.

This post explains the Dual_EC_DRBG algorithm and how a backdoor could be implemented.  It is meant to be accessible to non-cryptographers. We’ll begin with a really brief review of modular arithmetic and the discrete logarithm problem, discuss basic operations on elliptic curves and introduce the discrete logarithm problem over elliptic curves (ECDLP).  Then, we’ll see how Dual_EC_DRBG can be engineered to contain a backdoor and explain why the backdoor is only usable to someone who knows the key.  If all of this is new to you, I recommend reading an introductory text such as this one.

Note: I originally wrote this as a paper for a graduate class.  In revising this for my blog, I've removed all of the in-text citations and tried to replace them with links.  My original references are listed at the end.  The point of this post is to explain ideas originated by others; the original ideas are not mine.

Modular Arithmetic

Most of this paper requires us to use modular arithmetic.  Modular arithmetic, which can be thought of us "clock math"  requires each number to be reduced modulo a number p.   In other words, we divide each number by p and only keep the remainder.  A few quick examples:

17 ≡ 1 mod 4     (17 ÷ 4 = 4 remainder 1)
19 ≡ 3 mod 4     (19 ÷ 4 = 3 remainder 1)

This is the same type of math that we use when telling time.  Two numbers that have the same value modulo p are said to be congruent.  Thus, 1, 17, 21 and 25 are all congruent modulo 4 (they all have the remainder 1).

The Discrete Logarithm Problem

The security behind Dual_EC_DRBG depends on a problem in number theory called the Discrete Logarithm Problem (DLP).  Over the real or complex numbers (which we used in high school algebra), logarithms can be solved easily-- In cryptography, a problem is "easy" to solve if there is an efficient algorithm for doing so.  It does not mean that the method is simple.   Logarithms over the real numbers can be solved in real time using a calculator or worked out by hand using methods from Calculus. 

Logarithms are the reverse of exponentiation.  So:
103 = 1,000   and
log10 (1,000) = 3

We can also use logarithms to find exponents that are fractional such as:
103.3 = 1,995.26
log10 (1,995.26) 3.3

 Discrete logarithms, however, require an integral solution (i.e. a whole number) and use modular arithmetic.  Unlike logarithms taken over the real or complex numbers, discrete logarithms are not continuous (if you graph them the points jump around, seemingly at random) and cannot be solved using the traditional methods from calculus. 

In mathematical terms, we need to find x where: ax ≡ b mod p.  Consider the example:

2x ≡ 9 mod 19
Using brute force, we can quickly find:
20 ≡ 1 mod 19
21 ≡ 2 mod 19
22 ≡ 4 mod 19
23 ≡ 8 mod 19
24 ≡ 16 mod 19
25 ≡ 13 mod 19
26 ≡ 7 mod 19
27 ≡ 14 mod 19
28 ≡ 9 mod 19     <------x is 8

For small numbers, like the ones in the example above, finding x is not a problem.  But, if the numbers used are very large, the DLP is considered a hard problem.  There are algorithms such as Shanks' Baby-Step Giant-Step Method, Pollard's Rho Method, the Pohlig-Hellman Algorithm and the Index-Calculus Method that are faster than brute force, but none of them are practical for large numbers.

Elliptic Curves

 Cryptosystems based on the mathematics of elliptic curves, i.e. elliptic curve cryptography (ECC), have gained in popularity since they were discovered in the 1980s because they seem to offer comparable levels of security to other public key algorithms (e.g.Diffie-Hellman, ElGamal and RSA) with much smaller keys which allows them to run much more efficiently.  For instance, ECC using 160-bit keys offers comparable security to 1024-bit RSA.

This section will cover some of the key features of elliptic curves but, the depth is limited.  More background is provided in textbooks such as Understanding Crypotography and Introduction to Mathematical Cryptography, as well as many more advanced texts.

Elliptic curves are equations with the form: y2 = x3 + ax + b.  Over the real numbers, these equations are continuous and can be easily graphed (See figure 1).  The solutions to these equations are points (x,y) which fulfill the equation for the curve.  As with logarithms, when elliptic curves are calculated mod p, the solutions jump around, seemingly at random. 


Figure 1: graph[1] of the equation: y2 = x3 -8x +11


ECC depends on the operations of "point addition" and "point doubling".  Point addition is written as P + Q = R for points P and Q.  This is easy to understand visually, but harder to calculate.  In order to add two points on an elliptic curve, we first draw a straight line that extends through both points. This line will intercept the graph at a third point.  We then them mirror this point across the X axis (multiply the y value by -1) and get the new point R. 


Figure 2: Point Addition on an Elliptic Curve[2]: (P + Q = R)

Point doubling is similar to addition.  To perform point doubling (P + P = 2P = R) we draw the tangent line at the point P and find a new point of intersection with the graph as we did with point addition.  We then mirror this point across the X axis (multiply the y value by -1) and get a new point R.

The Elliptic Curve Discrete Logarithm Problem

The elliptic curve discrete logarithm problem (ECDLP) is similar to the ordinary discrete logarithm problem except that it involves point addition on elliptic curves instead of exponentiation.  It is also considered to be a hard problem.  Given a starting point P and an ending point T, the ECDLP challenges us to find the value x such that T = xP = P +...+ P (x times). 
             
Dual_EC_DRBG

Dual_EC_DRBG is a random number generator that uses elliptic curve operations.  A brief description of the algorithm is as follows:

Variables:
ϕ : The elliptic curve equation
P, Q: Points on the curve
S: The current internal state of the RNG
R: intermediate value (not output)
T: output value

Operation:
R = P + P + P + ...+ P (S times) = S*P
Snew = R*P
T = Q + Q + Q +...+ Q (R times) = R*Q
Truncate 16 bits from T and output T

For accuracy, it’s worth noting that T is actually just the X value from the point R*Q.  The algorithm throws away 16 bits from the X value and outputs that. 

In order to be secure, a random number generator should not allow us to predict future outputs based on past output.  If we can query the generator for one output number and use that to predict the next output(s), we will have broken the algorithm.

The strength of Dual_EC_DRBG depends on the fact that even if you know the value R*Q, it is computationally hard to find the value S=R*P.  Remember: operations in Dual_EC_DRBG are conducted over an elliptic curve ρ.  R*Q does not mean R times Q in the traditional sense, it means to perform point addition Q + ... + Q (R times) over the elliptic curve ρ.  Since the points P and Q are public and chosen in advance, the most straightforward way to find R*P would be to use T=R*Q to find R, but this requires us to solve the elliptic curve discrete logarithm problem.  If it were easy to solve the ECDLP or otherwise calculate R from R*Q, this generator would be easy to break.
           

Figure 3 - Dual_EC_DRBG

Dual_EC_DRBG Problems

The first problem with Dual_EC_DRBG is that it only throws away 16 bits of T before it is output.  This means that we only need 65,536 guesses at the most to find R*Q.  This would not be an issue if the algorithm threw away more bits (e.g. half) or if it hashed T before outputting.  It also gives the output to the generator a small bias.  For a cryptographic random number generator, even a small bias is very bad. 

The next, and much more serious problem, with Dual_EC_DRBG is the selection of the values of P and Q.  If P and Q are chosen randomly with no known relation between them, then the algorithm is reasonably secure.  But, if P and Q are chosen so that there is a mathematical relationship such as Q = D*P, it would be easy for a person who knows this relationship to also find the inverse of D so that P = E*Q and this would allow them to completely break the algorithm (see Shumow and Ferguson).  

Suppose that Q = D*P and that we know E, the inverse of D.  Since we can determine R*Q, or at least restrict ourselves to a small number of guesses at R*Q, we just need to calculate:

(R*Q)*E  =  R*(P*D)*E = R*P.

NIST did not reveal any details about the selection of P and Q.  It's possible that they were chosen at random and that neither the NIST or the NSA know of any relationship between the two.  But, the discovery that it is possible to easily backdoor the algorithm with a careful selection of P and Q, the lack of details about the selection of P and Q, and the allegation by Reuters that the NSA paid RSA ten million dollars to include Dual_EC_DRBG in the BSAFE toolkit and make it the default algorithm are enough for us to assume that the NSA can break the algorithm.  

Conclusions about Dual_EC_DRBG

The assertion that Dual_EC_DRBG put the government at risk (since the government uses BSAFE) is mostly not true.  The bias in the output mentioned earlier is concerning, but there are no known attacks against Dual_EC_DRBG unless you have pre-existing knowledge of the relationship between P and Q.  In other words, this backdoor (if true as alleged) allows the NSA to break Dual_EC_DRBG but does not make it much vulnerable to anyone else.  This is much different than a backdoor password which would be immediately usable by any adversary who discovered it (e.g. by reverse engineering the code).

Solutions

The parameters chosen for Dual_EC_DRBG are not mandatory, they are defaults.  Any implementer is free to choose his own parameters for P and Q.  Any organization worried about the NSA's selection of P and Q can pick a new P and Q to avoid the problem entirely.

If we were to redesign Dual_EC_DRBG, there are three improvements we could make that would prevent a backdoor of this type from being implemented:

  1. Select a new P and Q and document how they were generated.  We could, for instance, hash digits of pi or the verses of a poem to generate either value.  This would provide assurance that we did not conceive P and Q with a known  mathematical relationship between them.
  2. Throw away half of the bits of T before we output it.  This would make it much harder to work backward to discover R*Q.
  3. Hash the bits of T using an algorithm such as SHA-2 or SHA-3.  This would make it infeasible to discover R*Q  because we'd have to be able to invert the hash and it would destroy any mathematical relationship between P and Q.  This would also eliminate concerns about the bias in T.


References and Further Reading

Barker, E., and Kelsey, J.  (2012).  NIST Special Publication 800-90A: Recommendation for Random
Number Generation Using Deterministic Random Bit Generators.

Green, M.  (2013).  The Many Flaws of Dual_EC_DRBG. 

Menn, J.  (2013).  Exclusive: Secret Contract Tied NSA and Security Industry Pioneer.  
http://www.reuters.com/article/2013/12/20/us-usa-security-rsa-idUSBRE9BJ1C220131220

Schneier, B.  (2007).  The Strange Story of Dual_EC_DRBG.

Shumow, D., and Ferguson, N.  (2007).  On the Possibility of a Back Door in the
NIST SP800-90 Dual EC PRNG.

Winkler, I.  (2014).  The RSA Conference Boycott Is Nonsense.  

No comments:

Post a Comment

Understanding Scope in Go

As per my New Year's resolution, I've been learning to program in Go and reading  The Go Programming Language .   On page 141 of the...