# Format string

## Introduction

In the C programming language, a format string is a string that specifies how to format the output of a particular function. It is often used with the `printf()` function, which outputs a formatted string to the console or to a file.

The format string consists of text and [placeholders](/pwn/stack-exploitation/format-string/specifier.md) for values that will be inserted into the string at runtime. A placeholder starts with a percent sign (`%`), followed by a conversion specifier that gives the type of the value being inserted (e.g., `%d` for an integer, `%f` for a floating-point number). Additional formatting options can be specified between the percent sign and the conversion specifier.

For example, the following code uses a format string to print a message with a floating-point value:

```c
#include <stdio.h>

int main(void) {
  double value = 3.14159;
  printf("The value is %.2f\n", value);
  return 0;
}
```

This will output the following string: "`The value is 3.14`"

In this example, the format string is "`The value is %.2f\n`" and the placeholder is `%.2f`. The conversion specifier `f` indicates that the value being inserted is a floating-point number, and the precision `.2` requests two decimal places.

## Vulnerability?

Format string vulnerabilities can occur when user-supplied input is used as a format string in a call to a function such as `printf()`, `sprintf()`, or `fprintf()`. If an attacker is able to control the format string, they may be able to manipulate the output of the function in unintended ways or even execute arbitrary code.

One common type of format string vulnerability occurs when an attacker is able to supply additional placeholders in the format string that are not properly validated. For example, consider the following code:

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
  printf(argv[1]);
  return 0;
}
```

This code passes the first command-line argument to `printf()` as the format string. If an attacker supplies a string containing placeholders, such as "`%s%s%s%s%s%s%s`", `printf()` will attempt to read arguments that were never supplied, which can lead to invalid memory accesses and to issues such as:

* Denial of service
* [Data leak](/pwn/stack-exploitation/format-string/data-leak.md)
* [Data modification](/pwn/stack-exploitation/format-string/data-modification.md)
* [Arbitrary code execution](/pwn/stack-exploitation/arbitrary-code-execution.md)

## How it works?

{% hint style="info" %}
Only the `printf()` function will be described to highlight the behavior of a format string exploitation. This behavior may change with another vulnerable function, but the logic behind it is likely to be the same.
{% endhint %}

When a function such as `printf()` is called on 32-bit x86 with the cdecl calling convention, arguments are pushed onto the stack in the reverse order of their appearance in the function call: the last argument is pushed first, followed by the penultimate argument, and so on. Once all arguments are pushed, the stack pointer (ESP) points to the top of the stack, where the first argument is stored.

{% hint style="info" %}
Here, all the arguments are string literals: the strings themselves are stored in the program's read-only data section, and only their addresses are pushed onto the stack.
{% endhint %}

Here is an example of the `printf()` function being called with several arguments:

```c
#include <stdio.h>

int main(void) {
  printf("%s%s%s\n", "AAAAAAA", "BBBBBBB", "CCCCCCC");
  return 0;
}
```

The stack then looks like this once all arguments have been pushed:

```
   address     |   values
---------------+------------------------------------------------------------------
               |   +------------------- stack frame ---------------------+
               |   | +------------------ arguments -------------------+  |
   0xffffd0ac  |   | | 0x56557028  0x56557020  0x56557018  0x56557010 |  |
               |   | +------------------------------------------------+  |
               |   | +-saved ebp -+ +-saved eip -+                       |
   0xffffd0bc  |   | | 0xffffd18c | | 0x56556217 | ...                   |
               |   | +------------+ +------------+                       |
```

The following values are stored at these addresses:

```
$ x/s 0x56557028
0x56557028:     "%s%s%s\n"

$ x/s 0x56557020
0x56557020:     "AAAAAAA"

$ x/s 0x56557018
0x56557018:     "BBBBBBB"

$ x/s 0x56557010
0x56557010:     "CCCCCCC"
```

{% hint style="success" %}
As explained, arguments are pushed in the reverse order of their appearance in the call.
{% endhint %}

Then, the first argument is used as the format string. For each [format specifier](/pwn/stack-exploitation/format-string/specifier.md) it contains, `printf()` reads the next value from the stack.

{% hint style="info" %}
Here, the first argument contains 3 [format specifiers](/pwn/stack-exploitation/format-string/specifier.md), so `printf()` successively reads the next 3 values from the stack: `0x56557020  0x56557018  0x56557010`
{% endhint %}

{% hint style="info" %}
It's possible to highlight this behavior by using the `%08x` format specifier instead of `%s`. This specifier prints the value stored on the stack directly, instead of the string located at the address it points to.
{% endhint %}

What happens if there are more format specifiers than arguments?

:warning: `printf()` will still read the next value on the stack as if it were an argument. **It is thus possible to read arbitrary values from the stack!**

Here is a diagram of this behavior:

{% tabs %}
{% tab title="With the right number of parameters" %}
`printf("%s%s%s\n", "A", "B", "C");`

```
 +------------+
 |            |      +-----------------------------------------------+
 | "%s%s%s\n" | ---- | printf() interprets the format specifiers     |
 |            |      | and reads the following values from the stack |
 +------------+      +-----------------------------------------------+
 |            |       |
 | Pointer A  | <-----+
 |            |       |
 +------------+       |
 | Pointer B  | <-----+
 +------------+       |
 | Pointer C  | <-----+
 +------------+
 | Saved EBP  |
 +------------+
```

{% endtab %}

{% tab title="With fewer parameters than needed" %}
`printf("%s%s%s\n", "A", "B");`

```
 +------------+
 |            |      +-----------------------------------------------+
 | "%s%s%s\n" | ---- | printf() interprets the format specifiers     |
 |            |      | and reads the following values from the stack |
 +------------+      +-----------------------------------------------+
 |            |       |
 | Pointer A  | <-----+
 |            |       |
 +------------+       |
 | Pointer B  | <-----+
 +------------+       |
 | Saved EBP  | <-----+
 +------------+
```

{% endtab %}
{% endtabs %}

{% hint style="info" %}
It's also possible to select a specific argument with a positional format specifier, e.g.:

`printf("%2$s", "A", "B");` will print the "B" value.
{% endhint %}

## How to prevent?

If possible, make the format string a constant.

In all cases, the format string must come from the program itself and never from user input. Most format string vulnerabilities can be patched by simply specifying `%s` as the format string and passing the user input as its argument.

{% tabs %}
{% tab title="Vulnerable" %}

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
  /* User input is interpreted as the format string. */
  printf(argv[1]);
  return 0;
}
```

{% endtab %}

{% tab title="Patched" %}

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
  /* The format string is constant; user input is only data. */
  printf("%s\n", argv[1]);
  return 0;
}
```

{% endtab %}
{% endtabs %}


