Dual EC DRBG
Performance of the attacks

This page summarizes results on the computational cost of exploiting Dual EC in TLS. The results come from the 2014 research paper "On the Practical Exploitability of Dual EC in TLS Implementations" by Checkoway, Fredrikson, Niederhagen, Everspaugh, Green, Lange, Ristenpart, Bernstein, Maskiewicz, and Shacham.

Large parts of the computation are independent of each other and can be performed in any order, in particular in parallel. An attacker can therefore spread the work over a large number of processor cores, for example on multicore processors or computer clusters. The authors of the paper implemented the attacks using the standards OpenMP (to distribute the workload over threads running on different processor cores) and MPI (to distribute the workload over several processes running on different CPUs, e.g., in a computing cluster). The performance of the attacks was measured on a reference CPU and on a reference cluster.
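Because each guessed candidate can be checked on its own, the search space splits naturally into chunks that can be processed in any order. The following Python sketch illustrates only this partitioning idea; it is not the paper's implementation, which distributes the work with OpenMP and MPI. The function names and the toy `is_match` predicate are hypothetical. Threads are used so that the lambda predicate does not need to be pickled; for CPU-bound Python code, real parallel speedups would require processes (e.g., `concurrent.futures.ProcessPoolExecutor`) because of CPython's global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(lo, hi, is_match):
    """Test every candidate in [lo, hi); no chunk depends on any other."""
    return [c for c in range(lo, hi) if is_match(c)]


def parallel_search(n_bits, is_match, workers=4):
    """Split the 2**n_bits candidates into chunks and search the chunks concurrently."""
    total = 1 << n_bits
    step = -(-total // workers)  # ceiling division: size of each chunk
    chunks = [(lo, min(lo + step, total)) for lo in range(0, total, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda ch: search_chunk(ch[0], ch[1], is_match), chunks)
        # Combine the per-chunk results; which chunk ran first does not matter.
        return [c for part in parts for c in part]


# Hypothetical predicate standing in for "this guess reproduces the observed output".
print(parallel_search(16, lambda c: c % 10000 == 4242))
# [4242, 14242, 24242, 34242, 44242, 54242, 64242]
```

The same split works unchanged whether the chunks go to threads, processes, or cluster nodes, which is what makes the attack easy to scale.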
Performance optimizations

Shumow and Ferguson mentioned experiments when they announced the basic back door at Crypto 2007, but they did not publish their code. The first publicly available code came from the blog post "Dual_EC_DRBG backdoor: a proof of concept" by Aris Adamantiadis in December 2013, which implemented a proof of concept for the basic back door. Adamantiadis' code recovers the state from 30 output bytes within 12.1 seconds on the reference CPU. The authors of the 2014 paper used optimized routines for the elliptic curve computations, which perform the same computation in only 3.3 seconds on the reference CPU, a speedup of about 3.7. The biggest gain comes from computations involving the fixed elliptic curve points (P and Q in the Description of Dual EC), but the general curve arithmetic and various other functions are also much faster.

Real-world targets

The authors of the 2014 paper also showed that the number and positioning (output gaps and alignment) of the Dual EC output bytes available in TLS vary significantly between attack targets. As a consequence, the run time of the attack on the reference CPU varies widely between targets, from a few seconds to several hours. Even the fastest real-world attack requires significantly more computation than the proof of concept. To estimate the computing cost of each attack, the researchers measured its average run time on the reference CPU. Real-time surveillance requires the attack to finish within a short time, so the researchers expressed the cost of each attack as the number of reference CPUs required to finish it within one second. To demonstrate that the attack scales on a parallel computing system, they also measured its worst-case run time on the reference cluster.
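Under the simplifying assumption that an attack parallelizes perfectly, the "CPUs for one second" metric is the average single-CPU run time divided by the one-second deadline, rounded up. The sketch below shows this arithmetic together with the speedup between the two proof-of-concept timings (12.1 s and 3.3 s). The function names and the 2-hour run time are hypothetical illustration values, not figures from the paper; real clusters add communication overhead, which is why the worst-case cluster measurement matters.

```python
import math


def cpus_for_deadline(avg_runtime_s, deadline_s=1.0):
    """Reference CPUs needed to finish within deadline_s, assuming ideal parallel scaling."""
    return math.ceil(avg_runtime_s / deadline_s)


def speedup(old_runtime_s, new_runtime_s):
    """How many times faster the new run time is than the old one."""
    return old_runtime_s / new_runtime_s


print(round(speedup(12.1, 3.3), 2))  # 3.67 (proof of concept vs. optimized routines)
print(cpus_for_deadline(3.3))        # 4
print(cpus_for_deadline(2 * 3600))   # 7200 (hypothetical attack averaging 2 hours)
```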
General approach for real-world attacks

All attacks described in the research paper follow the same general approach (explained in more detail in The basic back door). For a given random output string taken from a handshake message, the attacker must guess several missing bits to reconstruct the full x coordinate of the underlying elliptic curve point (i.e., the x coordinate corresponding to the integer r in the Description of Dual EC). "Guessing bits" means trying every possible value of the missing bits until the correct one is found. For each candidate x coordinate, the corresponding y coordinate is computed. About half of the candidates do not belong to a point on the elliptic curve (there is no corresponding y coordinate), so these candidates can be rejected immediately. For candidates that do belong to a point on the curve, the back-door computation yields a candidate for the internal state, from which the subsequent random output is computed. This output is compared with the corresponding data fields in the handshake message. If it does not match, the candidate is discarded. If it matches, the candidate is correct with high probability; more outputs can then be computed and compared with the handshake message until the secret keying data is eventually recovered.

The number of bits that must be guessed and the amount of computation needed to produce the subsequent random output strings differ from attack to attack. Each attack therefore has a different total cost and a different average run time.

BSAFE

The researchers investigated two versions of the BSAFE library: RSA BSAFE Share for C/C++ ("BSAFE-C") and RSA BSAFE Share for Java ("BSAFE-Java").
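The guess-and-check loop from the general approach can be walked through end to end on a toy example. This is a simplified model under stated assumptions, not the paper's code: a hypothetical 31-bit curve (p = 2^31 - 1) instead of P-256, 8 truncated bits instead of 16, no additional input, and raw generator output instead of TLS handshake fields. The toy generator outputs the low bits of x(s*Q) and updates its state to x(s*P); the back-door secret d satisfies P = d*Q, so d*(s*Q) = s*P reveals the next state. All names (`generate`, `recover_next_state`, etc.) are hypothetical.

```python
import random

# Hypothetical toy parameters: y^2 = x^3 - 3x + B over GF(p), p % 4 == 3 (easy square roots).
p = 2**31 - 1
A, B = -3, 1234567
TRUNC = 8                  # top bits dropped per output (Dual EC drops 16 of 256)
OUT_BITS = 31 - TRUNC      # low bits of x that are revealed
MASK = (1 << OUT_BITS) - 1
INF = None                 # point at infinity


def add(P1, P2):
    """Affine point addition on the toy curve."""
    if P1 is INF:
        return P2
    if P2 is INF:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def mul(k, pt):
    """Double-and-add scalar multiplication k*pt."""
    acc = INF
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc


def lift_x(x):
    """Return a curve point with this x coordinate, or None if there is none."""
    rhs = (x**3 + A * x + B) % p
    y = pow(rhs, (p + 1) // 4, p)
    return (x, y) if y * y % p == rhs else None


def generate(s, P, Q, blocks):
    """Simplified Dual EC without additional input."""
    outputs = []
    for _ in range(blocks):
        outputs.append(mul(s, Q)[0] & MASK)  # output: truncated x(s*Q)
        s = mul(s, P)[0]                     # state update: s = x(s*P)
    return outputs, s


def recover_next_state(out0, out1, Q, d):
    """Guess the missing bits of out0, apply the back door, keep states that predict out1."""
    matches = []
    for hi in range(1 << TRUNC):
        x = (hi << OUT_BITS) | out0
        if x >= p:
            continue
        R = lift_x(x)                  # R = +/- s0*Q if the guess is right
        if R is None:
            continue                   # roughly half of all x values are not on the curve
        dR = mul(d, R)                 # d*R = +/- s0*P, whose x is the next state
        if dR is INF:
            continue
        s1 = dR[0]
        nxt = mul(s1, Q)
        if nxt is not INF and nxt[0] & MASK == out1:
            matches.append(s1)
    return matches


rng = random.Random(2014)
Q = next(pt for pt in map(lift_x, range(1, 1000)) if pt)  # any curve point as Q
d = rng.randrange(2, p)        # back-door secret, known only to whoever chose P
P = mul(d, Q)                  # P = d*Q
s0 = rng.randrange(2, p)       # victim's secret state
outs, _ = generate(s0, P, Q, 3)
for s1 in recover_next_state(outs[0], outs[1], Q, d):
    print(s1, generate(s1, P, Q, 2)[0] == outs[1:])
```

The true next state always passes the check; a false candidate matching 23 output bits by chance is unlikely but possible, which is why the real attacks compare further outputs before accepting a state.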
Windows SChannel

The paper considers two situations for attacking SChannel: in situation I, the attacker has access to RNG output from the session preceding the one being attacked; in situation II, the attacker has no previous RNG output. The attack is noticeably faster in situation I. This means that an attacker with access to all communication data has an advantage over one with access to only the target session, even when attacking that single session.
OpenSSL-fixed

As with BSAFE-C, 30 bytes of the x coordinate are known for OpenSSL. However, OpenSSL supplies additional input to Dual EC on every call, and the attacker must guess this input as well. The additional input consists of the system time in microseconds, the process ID (pid), and an incrementing counter value. The cost of the attack on OpenSSL-fixed therefore depends on how much information about the additional input is available to the attacker.
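A back-of-the-envelope sketch shows why this dependence matters: if every combination must be tried jointly (an upper-bound assumption; a smarter attack may do better), each uncertain component of the additional input multiplies the number of guesses. The 16 missing bits follow from 30 of the 32 bytes of a P-256 x coordinate being known; the time window (1024 microseconds) and pid range (2^15 values) are hypothetical illustration values, not figures from the paper, and `search_bits` is a hypothetical helper.

```python
import math


def search_bits(missing_x_bits, time_values, pid_values, counter_values):
    """log2 of the number of joint guesses, if every combination must be tried."""
    guesses = 2**missing_x_bits * time_values * pid_values * counter_values
    return math.log2(guesses)


# 30 of 32 x-coordinate bytes known -> 16 missing bits; window sizes are hypothetical.
print(search_bits(16, 1, 1, 1))         # 16.0 (additional input fully known)
print(search_bits(16, 1024, 1, 1))      # 26.0 (time known only to within 1024 microseconds)
print(search_bits(16, 1024, 2**15, 1))  # 41.0 (pid also unknown among 2**15 values)
```

Every extra bit of uncertainty about the additional input doubles the work, so knowing the pid or the exact time can change the attack cost by orders of magnitude.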
The following table summarizes the results:
Authors of this "Performance of the attacks" page (alphabetical order)
Last modified: 2014.07.07