Format string

Introduction

In the C programming language, a format string is a string that specifies how a function should format its output. It is most often used with the printf() family of functions: printf() writes formatted output to standard output, fprintf() writes it to a given stream such as a file, and sprintf()/snprintf() write it to a character buffer.

The format string consists of ordinary text and placeholders for values that are inserted into the string at runtime. A placeholder starts with a percent sign (%) and ends with a conversion specifier that gives the type of the inserted value (e.g., %d for an int, %f for a double). Optional flags, a minimum field width, a precision, and a length modifier can be placed between the percent sign and the conversion specifier.

For example, the following code uses a format string to print a message with a floating-point value:

#include <stdio.h>

int main(void) {
  double value = 3.14159;
  printf("The value is %.2f\n", value);
  return 0;
}

This will output the following string: "The value is 3.14"

In this example, the format string is "The value is %.2f\n" and the placeholder is %.2f. Its conversion specifier f indicates that the inserted value is a double, and the precision .2 requests exactly two digits after the decimal point, so 3.14159 is rounded to 3.14.

Vulnerability

Format string vulnerabilities can occur when user-supplied input is used as the format string in a call to a function such as printf(), sprintf(), or fprintf(). An attacker who controls the format string may be able to read memory, crash the program, write to memory, and in some cases execute arbitrary code.

A common case occurs when user input is passed directly as the format string, which lets an attacker insert conversion specifiers that have no matching arguments. For example, consider the following code:

#include <stdio.h>

int main(int argc, char *argv[]) {
  if (argc < 2)
    return 1;
  /* VULNERABLE: user input is used as the format string */
  printf(argv[1]);
  return 0;
}

This code prints the first command-line argument by passing it to printf() as the format string. If an attacker supplies a string containing conversion specifiers, such as "%s%s%s%s%s%s%s", printf() will fetch arguments that were never supplied. The result is undefined behavior: %x or %p leak stack contents, %s treats whatever value it finds as a pointer and often crashes the program, and %n writes to memory, which can lead to arbitrary code execution.

How does it work?

Only printf() is described here to illustrate how a format string exploit works. The details may differ with other vulnerable functions, but the underlying logic is the same.

When a function such as printf() is called on 32-bit x86 (cdecl calling convention), its arguments are passed on the stack in reverse order: the last argument is pushed first, followed by the penultimate one, and so on. Once they are all pushed, the stack pointer (ESP) points to the top of the stack, where the first argument is stored. On x86-64 (System V ABI), the first six integer or pointer arguments are passed in registers instead, so the details differ; the examples below assume 32-bit x86.

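In this example, all the arguments are string literals. The strings themselves live in the program's read-only data section (.rodata), not on the heap; only pointers to them are pushed onto the stack.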

Here is an example of printf() being called with several arguments:

#include <stdio.h>

int main(void) {
  printf("%s%s%s\n", "AAAAAAA", "BBBBBBB", "CCCCCCC");
  return 0;
}

Once all the arguments are pushed, the stack looks like this:

   address     |   values
---------------+------------------------------------------------------------------
               |   +------------------- stack frame ---------------------+
               |   | +------------------ arguments -------------------+  |
   0xffffd0ac  |   | | 0x56557028  0x56557020  0x56557018  0x56557010 |  |
               |   | +------------------------------------------------+  |
               |   | +-saved ebp -+ +-saved eip -+                       |
   0xffffd0bc  |   | | 0xffffd18c | | 0x56556217 | ...                   |
               |   | +------------+ +------------+                       |

The strings stored at these addresses (in the .rodata section, not the heap) can be inspected with gdb:

(gdb) x/s 0x56557028
0x56557028:     "%s%s%s\n"

(gdb) x/s 0x56557020
0x56557020:     "AAAAAAA"

(gdb) x/s 0x56557018
0x56557018:     "BBBBBBB"

(gdb) x/s 0x56557010
0x56557010:     "CCCCCCC"

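As explained above, the arguments were pushed in reverse order, so the format string pointer sits at the lowest address (the top of the stack), followed by the pointers to "AAAAAAA", "BBBBBBB" and "CCCCCCC".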

printf() then parses its first argument, the format string. Each time it encounters a conversion specifier, it fetches the next value on the stack, just above the previous argument.

Here the format string contains three %s specifiers, so printf() successively reads the next three stack values, 0x56557020, 0x56557018 and 0x56557010, and prints the string each one points to.

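This behavior can be made visible by using the %08x format specifier instead of %s. Rather than dereferencing each stack value and printing the string it points to, %08x prints the stack value itself as eight zero-padded hexadecimal digits, so the output reveals the raw stack contents (here, the addresses of the strings).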

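What happens if there are more format specifiers than arguments?

printf() has no way of knowing how many arguments the caller actually passed: for each conversion specifier, it simply fetches the next slot on the stack. The schema below shows the normal case; in this simplified layout, a fourth %s would make printf() read the saved EBP slot and treat its value as a pointer to a string.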

printf("%s%s%s\n", "A", "B", "C");

 +------------+
 |            |      +------------------------------------------------+
 | "%s%s%s\n" | ---- | printf() interprets the conversion specifiers  |
 |            |      | and reads the following slots from the stack   |
 +------------+      +------------------------------------------------+
 |            |       |
 | Pointer A  | <-----+
 |            |       |
 +------------+       |
 | Pointer B  | <-----+
 +------------+       |
 | Pointer C  | <-----+
 +------------+
 | Saved EBP  |
 +------------+

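It is also possible to select a specific argument directly with a positional specifier of the form %N$, where N is the argument number, for example:

printf("%2$s", "A", "B"); prints "B".

An attacker can use the same form, for instance %7$x, to read one chosen stack slot without chaining many specifiers.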

How to prevent it?

If possible, make the format string a constant.

If that is not possible, never use input as the format string: keep the format string in the program and pass the input as an argument. Most format string vulnerabilities can be fixed simply by using "%s" as the format string.

#include <stdio.h>

int main(int argc, char *argv[]) {
  if (argc < 2)
    return 1;
  /* Constant format string: argv[1] is only ever treated as data */
  printf("%s", argv[1]);
  return 0;
}
