Format string
In the C programming language, a format string is a string that specifies how a particular function should format its output. It is often used with the printf() function, which writes formatted output to standard output; related functions such as fprintf() write to a file.
The format string consists of literal text and placeholders for values that will be inserted into the string at runtime. A placeholder is a percent sign (%) followed by a conversion specifier that gives the type of the value being inserted (e.g., %d for an integer, %f for a floating-point number). Additional formatting options, such as flags, field width, and precision, can be specified between the percent sign and the conversion specifier.
For example, the following code uses a format string to print a message with a floating-point value:
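The original listing is not reproduced on this page. The following is a minimal sketch of what such code could look like, assuming the value being printed is 3.14159; the helper names `format_value` and `print_value` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats `value` with two decimal places into `buf`.
 * Returns the number of characters snprintf() reports. */
int format_value(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "The value is %.2f\n", value);
}

/* Same format string, written directly to standard output.
 * Assumed call for illustration: print_value(3.14159); */
void print_value(double value)
{
    printf("The value is %.2f\n", value);
}
```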
This will output the following string: "The value is 3.14"
In this example, the format string is "The value is %.2f\n" and the placeholder is %.2f. The conversion specifier f indicates that the value being inserted is a floating-point number, and the precision .2 indicates that it should be formatted with two decimal places.
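Other options can be placed between the percent sign and the conversion specifier in the same way. The sketch below combines a few standard ones (minimum field width, the `-` left-justify flag, the `0` zero-padding flag, and a precision); the function name `format_demo` and the sample values are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes `n` three times and `x` once, each with different options:
 *   %5d   right-aligned in a field at least 5 characters wide
 *   %-5d  left-aligned in the same field
 *   %05d  padded with leading zeros instead of spaces
 *   %8.3f field width 8, three digits after the decimal point */
int format_demo(char *buf, size_t size, int n, double x)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "[%5d] [%-5d] [%05d] [%8.3f]", n, n, n, x);
}
```

With n = 42 and x = 3.14159, the buffer would hold `[   42] [42   ] [00042] [   3.142]`.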
Format string vulnerabilities can occur when user-supplied input is used as a format string in a call to a function such as printf(), sprintf(), or fprintf(). If an attacker is able to control the format string, they may be able to manipulate the output of the function in unintended ways or even execute arbitrary code.
One common type of format string vulnerability occurs when an attacker is able to supply additional placeholders in the format string that are not properly validated. For example, consider the following code:
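The original listing is not reproduced on this page. A minimal sketch of the vulnerable pattern, assuming the program hands its first command-line argument straight to printf(), might look like this. The helper name `print_user_string` is hypothetical, and the code is deliberately unsafe.

```c
#include <assert.h>
#include <stdio.h>

/* DELIBERATELY VULNERABLE: `user_input` is used as the format string,
 * so any conversion specifier it contains is interpreted by printf().
 * Assumed caller for illustration: print_user_string(argv[1]); */
int print_user_string(const char *user_input)
{
    assert(user_input != NULL);
    return printf(user_input);
}
```

GCC and Clang can flag this pattern with the `-Wformat-security` warning.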
This code prints the first command-line argument by passing it to printf() as the format string. If an attacker supplies a string with additional placeholders, such as "%s%s%s%s%s%s%s", the printf() function will attempt to access additional arguments that have not been supplied. It then treats whatever values it finds in their place as arguments, potentially leading to memory disclosure, memory corruption, or other vulnerabilities such as:
Denial of service
When a function such as printf() is called under the 32-bit x86 cdecl calling convention, arguments are pushed onto the stack in the reverse of the order in which they appear in the function call. The last argument is pushed onto the stack first, followed by the penultimate argument, and so on. The stack pointer (ESP) then points to the top of the stack, where the first argument is stored.
Here is an example of the printf() function being called with some arguments:
The stack then looks like the following once all arguments have been pushed:
The following values are stored at these addresses (string literals normally reside in the program's read-only data section rather than on the heap):
As explained, arguments are pushed in the reverse of the order in which they are declared.
What happens if there are more format specifiers than arguments following the format string?
Here is a diagram of this behavior:
printf("%s%s%s\n", "A", "B", "C");
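To make the relationship between specifiers and arguments concrete, here is a minimal sketch of a printf()-like function that understands only %s and %%. It is a hypothetical helper (`mini_format`), not how any real C library implements printf(); it only illustrates that the formatter fetches one argument per specifier, in order, using `va_arg`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mini formatter: handles only "%s" and "%%".
 * Writes at most size-1 characters plus a terminating '\0' to `out`
 * and returns the number of characters written. */
size_t mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list args;
    size_t used = 0;

    assert(out != NULL && fmt != NULL && size > 0);
    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (p[0] == '%' && p[1] == 's') {
            /* One specifier consumes exactly one argument. */
            const char *s = va_arg(args, const char *);
            size_t len = strlen(s);
            if (len > size - 1 - used)
                len = size - 1 - used;   /* truncate to fit */
            memcpy(out + used, s, len);
            used += len;
            p++;                         /* skip the 's' */
        } else {
            if (p[0] == '%' && p[1] == '%')
                p++;                     /* "%%" yields a single '%' */
            if (used < size - 1)
                out[used++] = *p;
        }
    }
    va_end(args);
    out[used] = '\0';
    return used;
}
```

Calling `mini_format(buf, sizeof buf, "%s%s%s\n", "A", "B", "C")` fills `buf` with "ABC\n". The function has no way to know how many arguments were actually passed: if the format contained a fourth %s, `va_arg` would be invoked without a matching argument, which is undefined behavior.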
If possible, make the format string a constant.
If it isn't possible, then always specify a format string as part of the program rather than as an input. Most format string vulnerabilities can be patched by simply specifying %s as the format string.
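A minimal sketch of that fix, assuming the untrusted text arrives as a C string: the format string is the constant "%s" and the input is passed as an argument, so any specifiers inside it are printed literally. The helper names `print_untrusted` and `format_untrusted` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Prints untrusted text as data: the format string is a constant. */
int print_untrusted(const char *user_input)
{
    assert(user_input != NULL);
    return printf("%s", user_input);
}

/* Buffer variant of the same idea, using snprintf(). */
int format_untrusted(char *buf, size_t size, const char *user_input)
{
    assert(buf != NULL && size > 0 && user_input != NULL);
    return snprintf(buf, size, "%s", user_input);
}
```

With this change, an input such as "%s%s%s%s%s%s%s" is simply echoed back as text.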
The first argument is then used as the format string. For each format specifier it contains, the process reads the next value from the stack.
Here the first argument contains three format specifiers, so the process successively reads the next three values from the stack: 0x56557020, 0x56557018, 0x56557010.
If there are more format specifiers than arguments, the process still reads the next value on the stack as if it were an argument. It is therefore possible to read arbitrary values from the stack!
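printf() itself never compares the number of specifiers with the number of arguments it received. As a rough illustration of that mismatch, the sketch below counts the conversions in a format string so the count can be compared with the number of arguments actually supplied. The helper `count_conversions` is hypothetical and simplified: it ignores `*` width and precision fields, which consume extra arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the conversions in `fmt`, treating "%%" as a literal percent
 * sign. Simplified sketch: '*' width/precision is not accounted for. */
size_t count_conversions(const char *fmt)
{
    size_t count = 0;

    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%')
            p++;            /* "%%" consumes no argument */
        else if (p[1] != '\0')
            count++;        /* start of a real conversion */
    }
    return count;
}
```

For example, `count_conversions("%s%s%s\n")` returns 3, so a call that passes only two arguments after that format string would make printf() read one value that was never supplied.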