# De Bruijn Sequences

A de Bruijn sequence of order *n* over an alphabet of *k* symbols is a cyclic sequence that contains every possible string of length *n* over that alphabet exactly once as a contiguous block. This makes finding the offset to the saved EIP much simpler. We send a de Bruijn sequence as input, read the value that ends up in EIP, and look it up in the sequence. Since every *n*-symbol block occurs only once, there is **exactly one possible match**, and its position is the offset.

{% hint style="info" %}
For example, consider an alphabet with 4 symbols:

```bash
{0, 1, 2, 3}
```

A de Bruijn sequence of order `2` over this alphabet contains every possible two-symbol substring exactly once as a contiguous block. There are 4² = 16 such substrings, so the sequence has length 16 when read cyclically.

One possible de Bruijn sequence of order 2 over this alphabet is `0010203112132233`. Read cyclically (the last symbol wraps around to the first), it contains each two-symbol substring exactly once. Written out as a plain string, the first symbol is repeated at the end to close the cycle: `00102031121322330`.

It covers all 16 two-symbol substrings:

{% code overflow="wrap" %}

```bash
"00", "01", "02", "03", "10", "11", "12", "13", "20", "21", "22", "23", "30", "31", "32", "33"
```

{% endcode %}
{% endhint %}
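The property can be checked mechanically. The following minimal Python sketch counts every length-*n* block of a candidate string, wrapping around from the end to the start, and confirms that each of the *k*ⁿ possible blocks appears exactly once. The function name `is_de_bruijn` is hypothetical and is used only for this illustration.

```python
from collections import Counter
from itertools import product


def is_de_bruijn(sequence, alphabet, n):
    """Hypothetical helper: check that every length-n string over `alphabet`
    occurs exactly once as a contiguous block of `sequence`, read cyclically."""
    k = len(alphabet)
    if len(sequence) != k ** n:
        return False
    # Append the first n-1 symbols so blocks that wrap around the end are counted
    extended = sequence + sequence[:n - 1]
    counts = Counter(extended[i:i + n] for i in range(len(sequence)))
    # Every possible length-n block over the alphabet
    expected = {"".join(p) for p in product(alphabet, repeat=n)}
    return set(counts) == expected and all(c == 1 for c in counts.values())


print(is_de_bruijn("0010203112132233", "0123", 2))  # True
print(is_de_bruijn("0123012301230123", "0123", 2))  # False ("01" appears 4 times)
```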

## Generate sequences

The pwntools `cyclic` command can be used to generate a sequence:

```bash
$ pwn cyclic -n [substring length] [sequence length]

$ pwn cyclic -n 4 50
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
```
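pwntools documents `cyclic` as a wrapper over a de Bruijn sequence generator. The following minimal pure-Python sketch illustrates the technique with the classic recursive Fredricksen-Kessler-Maiorana (Lyndon word) construction. It is not pwntools' own source, and the names `de_bruijn_symbols` and `cyclic_pattern` are local to this sketch. For the lowercase alphabet and n = 4, its first 50 symbols match the output above.

```python
from itertools import islice
from string import ascii_lowercase


def de_bruijn_symbols(alphabet, n):
    """Yield the symbols of a de Bruijn sequence of order n over `alphabet`
    using the recursive Fredricksen-Kessler-Maiorana construction."""
    k = len(alphabet)
    a = [0] * (n + 1)

    def db(t, p):
        if t > n:
            # a[1..p] is the next Lyndon word whose length divides n: emit it
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic_pattern(length, n=4, alphabet=ascii_lowercase):
    """Return the first `length` symbols of the sequence, generated lazily."""
    return "".join(islice(de_bruijn_symbols(alphabet, n), length))


print(cyclic_pattern(50))
# aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
print("".join(de_bruijn_symbols("0123", 2)))
# 0010203112132233
```

The generator is lazy, so asking for a short prefix does not build the full 26⁴-symbol sequence.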

GDB with the PEDA extension also provides a command to generate patterns:

```bash
gdb-peda$ pattern create 70
'AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3'
```

## Usage

This type of pattern is mostly used to find the offset between the start of the user input and the saved EIP. The whole pattern is sent as user input. It overflows the buffer and overwrites saved values on the stack with pattern bytes, so the program crashes as soon as it uses one of them as an address (`0x61616166` here, which is the bytes `faaa` in little-endian order):

```bash
Invalid $SP address: 0x61616166
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x56556302 in main ()
gdb-peda$
```

The offset is then determined by locating that value within the full pattern. Because of the de Bruijn property, it appears at only one position.

With pwntools, the `-l` (lookup) parameter searches for a specific subsequence in the pattern:

```bash
$ pwn cyclic -l aaam
45
```
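The lookup itself is a substring search in the generated pattern. The following minimal sketch hardcodes the 50-character pattern printed by `pwn cyclic -n 4 50`. It also shows how a crash value such as `0x61616166` is turned back into pattern bytes: on little-endian x86 the least significant byte comes first, which gives `faaa`. The helper `find_offset` is hypothetical, not a pwntools function.

```python
# Pattern printed by `pwn cyclic -n 4 50`
PATTERN = "aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama"


def find_offset(value, n=4):
    """Hypothetical helper: return the offset of `value` inside PATTERN.

    `value` is either the n-character subsequence itself or the integer
    read from a register, which is decoded as little-endian bytes.
    Returns -1 if the subsequence is not found.
    """
    if isinstance(value, int):
        value = value.to_bytes(n, "little").decode("ascii")
    return PATTERN.find(value)


print(find_offset("aaam"))      # 45, same as `pwn cyclic -l aaam`
print(find_offset(0x61616166))  # 20, the bytes "faaa"
```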

The search can also be done directly in GDB with PEDA:

```bash
gdb-peda$ pattern search $eip
Registers contain pattern buffer:
EBX+0 found at offset: 20
ESI+0 found at offset: 24
EIP+0 found at offset: 28
Registers point to pattern buffer:
[EDX] --> offset 0 - size ~70
[ESP] --> offset 32 - size ~38
[EBP] --> offset 40 - size ~30
```

**The following Python script uses pwntools' cyclic patterns to automatically find the offset between the user input and the saved instruction pointer:**

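```python
from pwn import *

# Load the target binary so pwntools uses its architecture and endianness
elf = context.binary = ELF("./chall")

# Start the target program
p = process(elf.path)

# Send a 1024-byte cyclic pattern as input
payload = cyclic(1024)
p.sendline(payload)

# Print everything the program outputs before crashing, then wait for it to exit
print(p.recvall())
p.wait()

# Load the core dump (the system must be configured to produce one)
# and look up the faulting address, i.e. the overwritten saved EIP
core = p.corefile
offset = cyclic_find(core.fault_addr)

log.success("Offset to saved EIP: {}".format(offset))
```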

