# RTF Documents Analysis

## RTF Internals

The Rich Text Format (RTF) is a text-based document format developed by Microsoft (1987–2008) for cross-platform document exchange. It encodes formatted text and graphics as readable control words, can be opened by many editors without Microsoft Office, and does not support macros, which removes one common attack vector.

RTF files, though lacking macro support, can still be weaponized via embedded objects, binary data, or parser exploits. They are commonly created using WordPad, Microsoft Word, or alternatives like LibreOffice Writer.

## Analysis of Malicious RTF Files

RTF (Rich Text Format) files can embed other files and objects, such as OLE objects, within the RTF file itself. Attackers often abuse this to embed malware or exploit payloads and send them to victims, and many malicious RTF samples carry embedded shellcode.

The sample's file name has the `.doc` extension, but inspecting it with `HxD` or `TrID` clearly shows that it is an RTF file. Threat actors commonly do this so that the file still opens in Microsoft Word, which detects the RTF content regardless of the `.doc` extension.

```bash
trid C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc
```

<figure><img src="/files/caje8O7jSrcfruUPKy0e" alt=""><figcaption></figcaption></figure>
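TrID identifies the file type from its content rather than its name. A minimal Python sketch of the same idea, assuming only the standard `{\rtf` signature; the helper names are hypothetical, and a sample with a tampered header would be missed, so a negative result is inconclusive:

```python
import sys
from pathlib import Path

RTF_MAGIC = b"{\\rtf"  # standard RTF signature: opening brace followed by \rtf


def is_rtf_header(data: bytes) -> bool:
    """Return True if the buffer starts with the standard RTF signature."""
    return data.startswith(RTF_MAGIC)


def check_extension_mismatch(path: str) -> str:
    """Compare the file extension with the content type detected from the header."""
    p = Path(path)
    with p.open("rb") as fh:
        head = fh.read(16)
    if is_rtf_header(head) and p.suffix.lower() != ".rtf":
        return f"{p.name}: RTF content disguised with a '{p.suffix}' extension"
    if is_rtf_header(head):
        return f"{p.name}: RTF content"
    return f"{p.name}: not recognised as RTF"


if __name__ == "__main__":
    # Usage: python check_rtf.py payload_1.doc [more files...]
    for arg in sys.argv[1:]:
        print(check_extension_mismatch(arg))
```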

To analyze RTF files, we can use Didier Stevens' `rtfdump.py`, which lists the groups that make up the document. Let's run it to see whether any objects are present.

```bash
python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc
```

<figure><img src="/files/aPy3OBJjGbBjnINHvKgR" alt=""><figcaption></figcaption></figure>
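The rtfdump.py listing is organized around RTF's nested `{...}` groups and the control word that opens each one. The sketch below illustrates that idea; it is not rtfdump.py's actual implementation. It tracks brace depth, skips escaped braces, and records each group's leading control word, but it does not handle `\bin` binary runs, which a real parser must.

```python
import re

# Optional "\*" destination marker, then a control word and optional numeric
# parameter, e.g. "\rtf1" or "\*\objupdate53415341".
CONTROL_WORD = re.compile(r"(?:\\\*)?\\[a-zA-Z]+-?\d*")


def list_groups(rtf: str) -> list[tuple[int, int, str]]:
    """Return (offset, nesting depth, leading control word) for every '{' group.

    Simplified: escaped characters are skipped, binary runs are not handled.
    """
    groups = []
    depth = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf) and rtf[i + 1] in "{}\\":
            i += 2  # escaped literal brace or backslash, not a group delimiter
            continue
        if ch == "{":
            depth += 1
            m = CONTROL_WORD.match(rtf, i + 1)
            groups.append((i, depth, m.group(0) if m else ""))
        elif ch == "}":
            depth -= 1
        i += 1
    return groups
```

For example, `list_groups(open("sample.rtf", encoding="latin-1").read())` (hypothetical file name) yields one tuple per group, and groups whose control word starts with `\*\objupdate`, `\object`, or `\objdata` are the ones worth selecting.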

The screenshot above shows that this RTF document contains an object. To inspect it, we select item 4 (the group listed with `\*\objupdate53415341`) using `-s 4`, and use `-H` to decode the hexadecimal digits stored in that group into bytes, which rtfdump.py then displays as a hex/ASCII dump.

```bash
python C:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc -s 4 -H | more
```

<figure><img src="/files/DV2uN514VwLqBIF5WIXz" alt=""><figcaption></figcaption></figure>
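Conceptually, hex decoding turns the ASCII hex digits stored in the group into raw bytes. A minimal sketch of that step, assuming the control words have already been stripped (letters such as `b`, `d`, and `a` in `\objdata` are themselves hex digits and would corrupt the result otherwise). Padding an odd digit count with a trailing `0` is a choice made here for illustration:

```python
import re


def hex_decode_rtf(hex_text: str) -> bytes:
    """Decode the hex-digit payload of an RTF object group into bytes.

    Every non-hex character (spaces, line breaks) is ignored; an odd number
    of digits is padded with a trailing '0' so bytes.fromhex() accepts it.
    """
    digits = re.sub(r"[^0-9a-fA-F]", "", hex_text)
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)
```

For instance, `hex_decode_rtf("4571 7561\n74696f6e2e33")` returns `b"Equation.3"`: RTF writers may break the hex data across lines, which is why whitespace is discarded.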

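The output reveals the keyword **`equAtIOn.3`** (the Equation Editor ProgID `Equation.3` written in mixed case), which points to the Microsoft Equation Editor (`EQNEDT32.EXE`). This component is vulnerable to CVE-2017-11882, a stack buffer overflow that malware commonly exploits to execute embedded shellcode when the document is opened.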
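Because the ProgID is written in mixed case, a case-sensitive string search would miss it. A small hypothetical helper that finds it regardless of ASCII case in hex-decoded object bytes:

```python
def find_progid(blob: bytes, progid: bytes = b"equation.3") -> list[int]:
    """Return every offset where progid occurs in blob, ignoring ASCII case.

    Mixed-case spellings such as b"equAtIOn.3" defeat case-sensitive matching.
    """
    haystack = blob.lower()
    needle = progid.lower()
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits
```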

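Next, use Didier Stevens' `format-bytes.py` (also located in `C:\Tools\MalDoc\Office\Tools\DidierStevensSuite`), which parses binary data according to a format specification; `-f name=eqn1` selects its predefined `eqn1` format, which parses Equation Editor data as the output below shows. Because the command calls both scripts by name, run it from the DidierStevensSuite directory: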

```bash
python rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc -s 4 -d | python format-bytes.py -f name=eqn1
```

<figure><img src="/files/PmviKk5RqgIEREprtbK5" alt=""><figcaption></figcaption></figure>

The output mixes integers and byte sequences; each line corresponds to one field that `format-bytes.py` parsed from the Equation Editor data. Key observations include:

* Line 9, labeled "Start MTEF header," marks the start of a MathType Equation Format (MTEF) header, the binary format Equation Editor uses to store equations in documents.
* "Bytes (\<class 'bytes'>)" refers to a sequence of bytes, potentially part of an embedded object or shellcode.
* "Shellcode/Command (fontname)" indicates that the font name field carries shellcode or a command. An oversized font name is what overflows the stack buffer in CVE-2017-11882, so this field is where the exploit payload sits.
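To see where the "Start MTEF header" line comes from, here is a sketch that parses the header in front of the MTEF data. The layout is an assumption taken from Design Science's MTEF documentation for the `Equation Native` stream: a 28-byte `EQNOLEFILEHDR` followed by a 5-byte MTEF header. It applies to the content of that stream, not to the whole RTF object; locating the stream inside the object is out of scope here.

```python
import struct

# Assumed EQNOLEFILEHDR layout (little-endian, 28 bytes):
# cbHdr (WORD), version (DWORD), cf (WORD), cbObject (DWORD), 4 reserved DWORDs.
EQNOLEFILEHDR = struct.Struct("<HIHI4I")


def parse_equation_native(data: bytes) -> dict:
    """Parse the OLE equation header and the 5-byte MTEF header that follows it."""
    cb_hdr, version, cf, cb_object, *_reserved = EQNOLEFILEHDR.unpack_from(data, 0)
    mtef = data[cb_hdr:cb_hdr + 5]  # MTEF header starts right after cbHdr bytes
    if len(mtef) < 5:
        raise ValueError("buffer too short for an MTEF header")
    return {
        "cbHdr": cb_hdr,
        "version": version,
        "cf": cf,
        "cbObject": cb_object,          # declared length of the MTEF data
        "mtef_version": mtef[0],
        "platform": mtef[1],
        "product": mtef[2],
        "product_version": mtef[3],
        "product_subversion": mtef[4],
    }
```

Comparing `cbObject` with the bytes actually present is one quick sanity check; a mismatch is worth a closer look.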

This object likely contains shellcode that requires further analysis.

First, let's dump this object into a file so we can analyze its shellcode later. We'll combine `--hexdecode` (decode the hex digits into bytes) with `--dump` (write the raw bytes to standard output).

```bash
python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc --select 4 --hexdecode --dump > c:\temp\agenttesla_rtf.sc
powershell "Format-Hex -Path c:\temp\agenttesla_rtf.sc | more"
```

<figure><img src="/files/VAd5BgNEUGA0sXhJW4Kh" alt=""><figcaption></figcaption></figure>
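The dumped file can also be inspected without PowerShell. A minimal hexdump sketch (hypothetical helper, similar in spirit to `Format-Hex`); reading the file in binary mode guarantees the bytes are shown unmodified:

```python
import sys


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset / hex / ASCII columns."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{off:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hexdump.py c:\temp\agenttesla_rtf.sc
    with open(sys.argv[1], "rb") as fh:
        print(hexdump(fh.read()))
```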

To analyze the shellcode, we can use a shellcode emulator such as `scdbg.exe`. If we run this directly in the shellcode emulator, it will throw an error, as shown in the screenshot below.

```bash
C:\Tools\MalDoc\Office\Tools\scdbg\scdbg.exe /f c:\temp\agenttesla_rtf.sc
```

<figure><img src="/files/RLIn64ozN4NUGo0zqp8p" alt=""><figcaption></figcaption></figure>

This error is expected: the dump contains the entire embedded object (headers, MTEF data, and so on), so the shellcode does not start at offset 0. We need to give `scdbg.exe` the shellcode's entry point; after that, it can emulate the shellcode.

To analyze the shellcode, we first need to identify its entry point. This allows us to emulate its execution and observe the API calls it makes. The easiest way to locate the entry point is by using **`XORSearch`**, a tool from the DidierStevensSuite that detects shellcode by applying built-in or custom wildcard rules (`-W` and `-w` options, respectively).

```bash
C:\Tools\MalDoc\Office\Tools\DidierStevensSuite\XORSearch.exe -W c:\temp\agenttesla_rtf.sc
```

<figure><img src="/files/tdxpjlo5ooxMcKrLOMnR" alt=""><figcaption></figcaption></figure>

XORSearch reports multiple `GetEIP` matches, each tagged with the encoding (XOR or ROT with a given key) under which it was found. `GetEIP` (also called GetPC) is a common shellcode technique for determining the current instruction pointer, which position-independent shellcode needs in order to locate its own code and data in memory.
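To illustrate what a GetEIP rule looks for, the sketch below searches for two classic x86 idioms, `call $+5; pop reg` and `fnstenv [esp-0Ch]; pop reg`, under every single-byte XOR key. These two byte patterns are illustrative assumptions, not XORSearch's exact rule set, and XORSearch also tries other encodings such as ROT that this sketch skips.

```python
import re

# Illustrative GetEIP/GetPC idioms (not XORSearch's actual -W rules):
GETPC_PATTERNS = {
    # call $+5 ; pop r32  ->  E8 00 00 00 00, then 58..5F
    "call $+5; pop": re.compile(rb"\xE8\x00\x00\x00\x00[\x58-\x5F]"),
    # fnstenv [esp-0Ch] ; pop r32  ->  D9 74 24 F4, then 58..5F
    "fnstenv; pop": re.compile(rb"\xD9\x74\x24\xF4[\x58-\x5F]"),
}


def find_getpc(data: bytes) -> list[tuple[int, str, int]]:
    """Try every single-byte XOR key and report (key, idiom, offset) matches."""
    hits = []
    for key in range(256):
        table = bytes(i ^ key for i in range(256))  # XOR lookup table for this key
        decoded = data.translate(table)
        for name, pattern in GETPC_PATTERNS.items():
            for m in pattern.finditer(decoded):
                hits.append((key, name, m.start()))
    return hits
```

A hit with key `0` means the idiom is present unencoded; its offset is a candidate entry point for the emulator.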

In this scenario, XORSearch reports many `GetEIP` positions within the shellcode. Each one is a candidate entry point, so we can pass them to the shellcode emulator one at a time until emulation succeeds. Let's try the first four distinct offsets: `00000372`, `00000376`, `000003AF`, and `00000409`.
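XORSearch prints positions as hexadecimal file offsets, and scdbg maps the shellcode file at base address `0x401000`. A small hypothetical helper that converts a reported position into the values used in the next steps, plus an optional carve that writes the bytes from the entry point onward to a new file, so that an emulator without an offset option can start at 0:

```python
SCDBG_BASE = 0x401000  # scdbg's default load address for the shellcode file


def entry_from_xorsearch(position: str) -> dict:
    """Convert an XORSearch position such as '00000372' into emulator values."""
    offset = int(position, 16)
    return {
        "file_offset_hex": f"{offset:X}",      # hex digits, as passed to scdbg /foff
        "file_offset_dec": offset,
        "scdbg_address": SCDBG_BASE + offset,  # address scdbg shows for the entry
    }


def carve(path_in: str, path_out: str, offset: int) -> int:
    """Write bytes from offset onward to path_out; return the number written."""
    with open(path_in, "rb") as src:
        data = src.read()[offset:]
    with open(path_out, "wb") as dst:
        dst.write(data)
    return len(data)
```

For `00000372` this gives file offset `0x372` (882 decimal) and scdbg address `0x401372`.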

## Shellcode Emulation using SCDBG

We'll pass these offsets to the shellcode emulator with the `/foff` flag, which takes a file offset in hexadecimal. Let's start with the first hit reported by XORSearch, offset `00000372`:

```bash
C:\Tools\MalDoc\Office\Tools\scdbg\scdbg.exe /f c:\temp\agenttesla_rtf.sc /foff 372
```

<figure><img src="/files/pVnTC1qWVQwuOwFz4dR0" alt=""><figcaption></figcaption></figure>

Offset `0x372` works: scdbg.exe successfully emulates the shellcode, which downloads a malicious file, saves it to `%APPDATA%`, and executes it to continue the attack.

Execution begins at file offset 0x372 (address 0x401372, because scdbg loads the file at base address 0x401000). The shellcode first expands `%APPDATA%\winiti.exe` to determine the drop path, then loads `UrlMon` via `LoadLibraryW` to enable file downloads.

The malware uses `GetProcAddress` to resolve `URLDownloadToFileW`, which downloads `winiti.exe` from a remote server to the AppData directory. It then loads `shell32.dll` and runs the downloaded file with `ShellExecuteW` to continue the attack.

## Shellcode Emulation using SpeakEasy

Shellcode emulation can also be done with another tool, [speakeasy](https://github.com/mandiant/speakeasy), developed by Mandiant.

We'll provide the same offset to speakeasy: `-r` enables raw (shellcode) mode, `-a x86` sets the architecture, and `--raw_offset 372` sets the entry offset. The full command is:

```bash
speakeasy -t c:\temp\agenttesla_rtf.sc -r -a x86 --raw_offset 372
```

<figure><img src="/files/La7mQXbCUOUX7ABqNLvE" alt=""><figcaption></figcaption></figure>

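To summarize, we extracted the suspicious embedded object, located the shellcode entry point with XORSearch, and emulated the shellcode with scdbg and speakeasy to identify the Indicators of Compromise (IOCs), such as the downloaded file `winiti.exe` and its drop location in `%APPDATA%`.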
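Once an emulator has run, the IOCs sit in its text output. A hypothetical helper that pulls URLs and dropped file names out of a saved scdbg or speakeasy log with regular expressions; the list of file extensions is an assumption and should be adjusted per case:

```python
import re

URL_RE = re.compile(r"https?://[^\s\"'<>),]+", re.IGNORECASE)
# Assumed extensions of interest for downloaded/dropped payloads.
FILE_RE = re.compile(r"[\w-]+\.(?:exe|vbs|js|ps1|hta)\b", re.IGNORECASE)


def extract_iocs(log_text: str) -> dict[str, list[str]]:
    """Return de-duplicated, sorted URLs and file names found in emulator output."""
    urls = sorted(set(URL_RE.findall(log_text)))
    files = sorted({name.lower() for name in FILE_RE.findall(log_text)})
    return {"urls": urls, "files": files}
```

Usage: save the emulator output (for example by redirecting scdbg's console output to a text file) and pass its contents to `extract_iocs()`.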

## Questions

Q1) Locate the malicious sample starting with "a60...rtf" in the location "C:\Tools\MalDoc\Office\Demo\Samples\RemcosRAT\rtf". Perform the analysis on this sample and find out which vbs file is being downloaded in AppData. Type the file name as your answer. Answer format is b\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*.vbs

```bash
python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py a60f72316633a40d5ab45b035ecd03b7cd0162ce161946cfa2ad86d11fbc9c13.rtf
```

<figure><img src="/files/bXloMIaQxSuzpIbwWBln" alt=""><figcaption></figcaption></figure>

```bash
python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py a60f72316633a40d5ab45b035ecd03b7cd0162ce161946cfa2ad86d11fbc9c13.rtf --select 4 --hexdecode --dump > c:\temp\agenttesla_rtf.sc
powershell "Format-Hex -Path c:\temp\agenttesla_rtf.sc | more"
```

<figure><img src="/files/yZiVmXdf6qEjLEvIn9bR" alt=""><figcaption></figcaption></figure>

```powershell
powershell "Format-Hex -Path c:\temp\agenttesla_rtf.sc | more"
```

<figure><img src="/files/tLOC91LPCabxBH9SfrwR" alt=""><figcaption></figcaption></figure>

```bash
C:\Tools\MalDoc\Office\Tools\scdbg\scdbg.exe /f c:\temp\agenttesla_rtf.sc /foff 946
```

<figure><img src="/files/A8WJnFTXxMbcLBDO1NqJ" alt=""><figcaption></figcaption></figure>

Answer: `beautifulldaykiss.vbs`

