The Digital Insider | Boosting Faith in the Authenticity of Open-Source Software

Open source software — freely distributed software, along with its source code, so that copies, additions, or modifications can be readily made — is “everywhere,” to quote the 2023 Open Source Security and Risk Analysis Report.

Speranza – artistic interpretation. Image credit: MIT CSAIL

Ninety-six percent of the computer programs used by major industries include open-source software, and seventy-six percent of the code in those programs is open source. But the percentage of software packages “containing security vulnerabilities remains troublingly high,” the report warned.

One concern is that “the software you’ve gotten from what you believe to be a reliable developer has somehow been compromised,” says Kelsey Merrill, a software engineer who received a master’s degree earlier this year from MIT’s Department of Electrical Engineering and Computer Science.

“Suppose that somewhere in the supply chain, an attacker with malicious intent has changed the software.”

The risk of a security breach of this sort is by no means abstract. In 2020, to take a notorious example, the Texas company SolarWinds made a software update to its widely used program called Orion.

Hackers broke into the system, inserting pernicious code into the software before SolarWinds shipped the latest version of Orion to more than 18,000 customers, including Microsoft, Intel, and roughly one hundred other companies, as well as a dozen U.S. government agencies—including the Departments of State, Defense, Treasury, Commerce, and Homeland Security.

In this case, the corrupted product came from a large commercial company. But lapses may be even more likely to occur in the open-source realm, Merrill says, “where people of varying backgrounds, many of whom are hobbyists without any security training, can publish software that gets used around the world.”

She and three collaborators—her former advisor Karen Sollins, a Principal Scientist at the MIT Computer Science and Artificial Intelligence Laboratory; Santiago Torres-Arias, an assistant professor of computer science at Purdue University; and Zachary Newman, a former MIT graduate student and current research scientist at Chainguard Labs—have developed a new system called Speranza, which is aimed at reassuring software consumers that the product they are getting has not been tampered with and is coming directly from a source they trust.

“What we have done,” explains Sollins, “is to develop, prove correct, and demonstrate the viability of an approach that allows the [software] maintainers to remain anonymous.” Preserving anonymity is obviously important, given that almost everyone, software developers included, values their confidentiality.

This new approach, Sollins adds, “simultaneously allows [software] users to have confidence that the maintainers are, in fact, legitimate maintainers and, furthermore, that the code being downloaded is, in fact, the correct code of that maintainer.”

So how can users confirm the genuineness of a software package to guarantee, as Merrill puts it, “that the maintainers are who they say they are?” The classical way of doing this, which was invented more than 40 years ago, is by means of a digital signature, which is analogous to a handwritten signature—albeit with far greater built-in security through the use of various cryptographic techniques.

To create a digital signature, two “keys” are generated simultaneously. Each is a number, composed of zeros and ones, that is 256 digits long. One key is designated “private,” the other “public,” but they constitute a mathematically linked pair.

A software developer can use their private key and the contents of the document or computer program to generate a digital signature attached exclusively to that document or program. A software user can then use the public key, the developer’s signature, and the contents of the package they downloaded to verify its authenticity.

Validation comes in the form of a yes or a no, a 1 or a 0. “Getting a 1 means the authenticity has been assured,” Merrill explains. “The document is the same as when it was signed, hence unchanged. A 0 means something is amiss; you may not want to rely on that document.”
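
The sign-then-verify flow described above can be sketched in a few lines of Python. This is a minimal illustration using textbook RSA, a swapped-in scheme chosen because it needs only the standard library; the article does not say which signature algorithm developers use, and the 256-digit keys it mentions are more typical of elliptic-curve schemes. The primes, the function names `sign` and `verify`, and the sample package bytes are all assumptions, and the parameters are far too weak for real use.

```python
import hashlib

# Toy RSA key pair (assumption: illustration only, NOT secure).
# Both primes are well-known Mersenne primes.
PRIME_A = 2**127 - 1
PRIME_B = 2**89 - 1
MODULUS = PRIME_A * PRIME_B            # part of the public key
PUBLIC_EXPONENT = 65537                # part of the public key
PRIVATE_EXPONENT = pow(                # the private key (needs Python 3.8+)
    PUBLIC_EXPONENT, -1, (PRIME_A - 1) * (PRIME_B - 1)
)


def digest_as_int(content: bytes) -> int:
    """Hash the content and map the digest into the range of the modulus."""
    digest = hashlib.sha256(content).digest()
    return int.from_bytes(digest, "big") % MODULUS


def sign(content: bytes) -> int:
    """Developer side: combine the private key with the content."""
    return pow(digest_as_int(content), PRIVATE_EXPONENT, MODULUS)


def verify(content: bytes, signature: int) -> bool:
    """User side: check the signature using only public values."""
    return pow(signature, PUBLIC_EXPONENT, MODULUS) == digest_as_int(content)


package = b"print('hello from an open-source package')"
signature = sign(package)

print(verify(package, signature))                   # unchanged package
print(verify(package + b" # tampered", signature))  # modified package
```

The first check passes because the package is byte-for-byte what was signed; the second fails because even a small change produces a different hash, which no longer matches the signature.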

Although this decades-old approach is tried-and-true in a sense, it is far from perfect. Merrill notes that one problem “is that people are bad at managing cryptographic keys, which consist of very long numbers, in a secure way that prevents them from getting lost.”

People lose their passwords all the time, Merrill says. “And if a software developer were to lose the private key and then contact a user saying, ‘Hey, I have a new key,’ how would you know who that really is?”

To address those concerns, Speranza builds on “Sigstore,” a system introduced last year to enhance the security of the software supply chain. Sigstore was developed by Newman (who instigated the Speranza project), Torres-Arias, and John Speed Meyers of Chainguard Labs.

Sigstore automates and streamlines the digital signing process. Users no longer have to manage long cryptographic keys; they are instead issued ephemeral keys (an approach called “keyless signing”) that expire quickly, perhaps within minutes, and therefore don’t have to be stored.
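
The idea of a key that expires within minutes can be pictured as a validity window attached to the key. The sketch below is hypothetical: the class `ShortLivedCertificate`, the helper names, the ten-minute lifetime, and the sample identity are assumptions made for illustration and are not Sigstore's actual interface. It also assumes the verifier has a trustworthy record of when the signature was made.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ShortLivedCertificate:
    """Hypothetical record tying an ephemeral public key to a signer."""
    identity: str
    public_key: bytes
    not_before: datetime
    not_after: datetime


def issue_certificate(identity: str, public_key: bytes, now: datetime,
                      lifetime: timedelta = timedelta(minutes=10)) -> ShortLivedCertificate:
    """Issue a certificate valid only for a short window (assumed: 10 minutes)."""
    return ShortLivedCertificate(identity, public_key, now, now + lifetime)


def was_valid_at(cert: ShortLivedCertificate, signing_time: datetime) -> bool:
    """Accept a signature only if it was made inside the certificate's window."""
    return cert.not_before <= signing_time <= cert.not_after


issued = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
cert = issue_certificate("dev@example.com", b"ephemeral-public-key", issued)

print(was_valid_at(cert, issued + timedelta(minutes=3)))  # signed inside the window
print(was_valid_at(cert, issued + timedelta(hours=1)))    # signed after expiry
```

Because the key is useless once the window closes, the developer can discard it right after signing instead of guarding it for years.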

A drawback with Sigstore stems from the fact that it dispensed with long-lasting public keys, so software maintainers have to identify themselves—through a protocol called OpenID Connect (OIDC)—in a way that can be linked to their email addresses.

That feature alone may inhibit the widespread adoption of Sigstore, and it served as the motivation for Speranza. “We take Sigstore’s basic infrastructure and change it to provide privacy guarantees,” Merrill explains.

With Speranza, privacy is achieved through an original idea that she and her collaborators call “identity co-commitments.” Here, in simple terms, is how the idea works: A software developer’s identity, in the form of an email address, is converted into a so-called “commitment” that consists of a big pseudorandom number. (A pseudorandom number does not meet the technical definition of “random” but is practically about as good as random.)

Meanwhile, another big pseudorandom number—the accompanying commitment (or co-commitment)—is generated that is associated with a software package that this developer either created or was granted permission to modify.
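
One common way to build commitments of this kind is a Pedersen commitment, in which a value is hidden behind a random blinding number. The sketch below assumes that construction; the article itself does not name one. It also assumes, as one plausible reading, that the commitment and the co-commitment hide the same identity under independent blinding numbers. The group parameters are toy-sized, and the function names and email address are hypothetical.

```python
import hashlib
import secrets

# Toy group (assumption: illustration only, NOT secure).
# P = 2*Q + 1 with P and Q both prime; G and H generate the subgroup of order Q.
# A real deployment needs large parameters where nobody knows log_G(H).
P = 2039
Q = 1019
G = 4
H = 9


def identity_to_number(email: str) -> int:
    """Map an email address to a number below Q."""
    digest = hashlib.sha256(email.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % Q


def commit(value: int, blinding: int) -> int:
    """Pedersen commitment: G^value * H^blinding mod P."""
    return (pow(G, value, P) * pow(H, blinding, P)) % P


identity = identity_to_number("maintainer@example.com")

# Commitment tied to the developer's identity.
identity_blinding = secrets.randbelow(Q)
identity_commitment = commit(identity, identity_blinding)

# Co-commitment tied to the package, with its own fresh blinding number.
package_blinding = secrets.randbelow(Q)
package_commitment = commit(identity, package_blinding)

print(identity_commitment, package_commitment)
```

With realistically large parameters, the two outputs look like unrelated big numbers, and neither reveals the email address. Only someone who knows the blinding numbers can show what the commitments hide.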

To show a prospective user who created and signed a particular version of a software package, the authorized developer would publish a proof that establishes an unequivocal link between the commitment that represents their identity and the commitment attached to the software product.

The proof is of a special type, called a zero-knowledge proof, which is a way of showing, for instance, that two things have a common bound without divulging details as to what those things (such as the developer’s email address) actually are.
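
A zero-knowledge proof that two commitments hide the same value can be sketched with a standard textbook technique: a Schnorr-style proof made non-interactive with a hash. The code below is an assumption about how such a link could be shown, not a description of Speranza's actual protocol. It uses Pedersen commitments over a toy group; the parameters are insecure, and every name (`commit`, `prove_same_value`, `verify_same_value`) is hypothetical.

```python
import hashlib
import secrets

# Toy group (assumption: illustration only, NOT secure): P = 2*Q + 1.
P, Q, G, H = 2039, 1019, 4, 9


def commit(value: int, blinding: int) -> int:
    """Pedersen commitment: G^value * H^blinding mod P."""
    return (pow(G, value, P) * pow(H, blinding, P)) % P


def quotient(c1: int, c2: int) -> int:
    """c1 / c2 mod P; equals H^(r1 - r2) when both hide the same value."""
    return (c1 * pow(c2, P - 2, P)) % P


def challenge(*numbers: int) -> int:
    """Derive the challenge by hashing the public transcript (Fiat-Shamir)."""
    data = "|".join(str(n) for n in numbers).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove_same_value(c1: int, c2: int, r1: int, r2: int) -> tuple:
    """Prover: show knowledge of r1 - r2 without revealing it or the value."""
    nonce = secrets.randbelow(Q)
    announcement = pow(H, nonce, P)
    e = challenge(c1, c2, announcement)
    response = (nonce + e * (r1 - r2)) % Q
    return announcement, response


def verify_same_value(c1: int, c2: int, proof: tuple) -> bool:
    """Verifier: check H^response == announcement * (c1/c2)^e mod P."""
    announcement, response = proof
    e = challenge(c1, c2, announcement)
    expected = (announcement * pow(quotient(c1, c2), e, P)) % P
    return pow(H, response, P) == expected


identity = 321  # stands in for a hashed email address
r1, r2 = secrets.randbelow(Q), secrets.randbelow(Q)
identity_commitment = commit(identity, r1)
package_commitment = commit(identity, r2)

proof = prove_same_value(identity_commitment, package_commitment, r1, r2)
print(verify_same_value(identity_commitment, package_commitment, proof))
```

The verifier sees only the two commitments and the proof. It learns that they are linked, but not the identity value or the blinding numbers behind them. With parameters this small the guarantee is purely illustrative.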

“Speranza ensures that software comes from the correct source without requiring developers to reveal personal information like their email addresses,” comments Marina Moore, a PhD candidate at the New York University Center for Cyber Security.

“It allows verifiers to see that the same developer signed a package several times without revealing who the developer is or even other packages they work on. This provides a usability improvement over long-term signing keys, and a privacy benefit over other OIDC-based solutions like Sigstore.”

Marcela Mellara, a research scientist in the Security and Privacy Research group at Intel Labs, agrees: “This approach has the advantage of allowing software consumers to automatically verify that the package they obtain from a Speranza-enabled repository originated from an expected maintainer and gain trust that the software they are using is authentic.”

Written by Steve Nadis

Source: Massachusetts Institute of Technology