CVE-2021-42574 and CVE-2021-42694 impact all popular programming language compilers, such as:

  • Go
  • C#
  • C, C++
  • Rust
  • Java
  • Python
  • JavaScript

“The fact that the Trojan Source vulnerability affects almost all computer languages makes it a rare opportunity for a system-wide and ecologically valid cross-platform and cross-vendor comparison of responses,” the paper [PDF] read.

For your information, compiler programs are responsible for interpreting high-level human-readable source code into their lower-level representations that the OS can execute. These include object code, assembly language, and machine code.

How is Unicode Algorithm Exploited?

The core issue lies in the Bidi (bidirectional) algorithm of Unicode. This algorithm encourages support for left-to-right and right-to-left languages, such as English and Arabic, respectively. Moreover, it also features Bidi overrides to enable writing of left-to-right words within a right-to-left sentence or vice versa. Hence, it forces the left-to-right text to be used as right-to-left.

'Trojan Source' Bug Lets Hackers Exploit Source Code

Unicode directionality formatting characters relevant to reordering attacks.

But while the compiler’s output is required to implement the source code correctly, any alterations generated by injecting Unicode Bidi override characters into strings and comments can yield a syntactically valid source code where the characters’ display order present a different logic from the actual one.

The Attack details

The source code files’ encoding is exploited to create targeted vulnerabilities instead of introducing logical bugs independently. This allows visual reordering of tokens in the source code. When rendered acceptably, the compiler is tricked into processing the code in a novel way, thus modifying the program flow. For instance, it can make a comment appear as a code.

Therefore, if Program A is anagrammed into Program B, the change in code logic would be subtle enough to remain undetected in further testing as an adversary can introduce targeted vulnerabilities, and these would remain hidden.

“You can use them in source code that appears innocuous to a human reviewer [that] can actually do something nasty. That’s bad news for projects like Linux and Webkit that accept contributions from random people, subject them to manual review, then incorporate them into critical code. This vulnerability is, as far as I know, the first one to affect almost everything,” wrote Ross Anderson.

Impact on The Supply Chain

These encodings can impact the supply chain because when invisible software vulnerabilities are injected into open-source software, it will eventually affect all users. Furthermore, researchers warned that Trojan Source attacks’ impact could be severer if an attacker uses homoglyphs to redefine pre-existing functions within an upstream package, thus, invoking them from a victim program.

“As powerful supply-chain attacks can be launched easily using these techniques, it is essential for organizations that participate in a software supply chain to implement defenses,” researchers warned.

Did you enjoy reading this article? Like our page on Facebook and follow us on Twitter.

Posted by Charlie