RTF Documents Analysis

RTF Internals

The Rich Text Format (RTF) is a plain-text document format that Microsoft developed and maintained from 1987 to 2008 for cross-platform document exchange. It encodes text and graphics as readable markup, can be opened by many editors without Microsoft Office, and does not support macros, which removes one common attack vector.

RTF files, though lacking macro support, can still be weaponized via embedded objects, binary data, or parser exploits. They are commonly created using WordPad, Microsoft Word, or alternatives like LibreOffice Writer.
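To make the plain-text structure concrete, here is a minimal sketch that lists the control words in an RTF string and measures how deeply its groups nest. The sample document, the regular expression, and the helper names are illustrative assumptions, not part of any tool used in this walkthrough, and the sketch does not handle escaped braces or binary data.

```python
import re

# Hypothetical minimal RTF document, used only for illustration.
SAMPLE = r"{\rtf1\ansi\deff0 {\fonttbl{\f0 Arial;}}\f0\fs24 Hello, RTF!\par}"

# A control word is a backslash, ASCII letters, and an optional signed integer.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)?")


def control_words(rtf_text):
    """Return (name, parameter) pairs in document order; parameter is None if absent."""
    return [
        (m.group(1), int(m.group(2)) if m.group(2) else None)
        for m in CONTROL_WORD.finditer(rtf_text)
    ]


def max_group_depth(rtf_text):
    """Return the deepest brace nesting level (escaped braces are not handled)."""
    depth = deepest = 0
    for ch in rtf_text:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


if __name__ == "__main__":
    for name, param in control_words(SAMPLE):
        print(name, param)
    print("max group depth:", max_group_depth(SAMPLE))
```

Embedded objects live inside such groups as hex-encoded text, which is why a dedicated parser such as rtfdump.py is more reliable than a regular expression on real samples.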

Analysis of Malicious RTF Files

RTF files can embed other files as objects inside the document itself. Attackers often use this to embed malware and send it to victims. Many malicious RTF samples carry embedded shellcode payloads.

The file name has the extension .doc, but if we check it with HxD or TrID, we can clearly see that it is an RTF file. Threat actors commonly rename RTF files this way so that the document opens in Microsoft Word.

trid C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc
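The same check can be sketched in a few lines by comparing the first bytes of the file against well-known signatures instead of trusting the extension. The function name and the returned labels are illustrative assumptions. Matching the shortened prefix `{\rt` rather than the full `{\rtf1` is also an assumption, made to tolerate deliberately malformed headers.

```python
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .doc/.xls compound file
ZIP_MAGIC = b"PK\x03\x04"                        # OOXML (.docx) is a ZIP archive
RTF_PREFIX = b"{\\rt"                            # shortened RTF prefix (assumption)


def sniff(data: bytes) -> str:
    """Classify a document by its leading bytes, ignoring the file extension."""
    if data.startswith(RTF_PREFIX):
        return "rtf"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of the file and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(8))
```

Calling `sniff_file` on the sample path used above would be expected to report `rtf` even though the file is named `.doc`.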

To analyze RTF files, we can use rtfdump.py from the DidierStevensSuite. Let's run it against the sample to see whether any objects are present.

python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc

The screenshot above shows the presence of an object in this RTF document. We can view the object in hex using -s 4 to select object 4 (i.e., \*\objupdate53415341). The -H is used to display the output in hex format.

python C:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc -s 4 -H | more

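The output reveals the keyword equAtIOn.3, the class name of the Microsoft Equation Editor. This component is vulnerable to CVE-2017-11882, a memory-corruption flaw that malware commonly exploits to execute shellcode when the document is opened. The mixed casing is a simple trick to evade case-sensitive signatures.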
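Because the class name is written with mixed casing, a case-sensitive search would miss it. The following sketch shows a case-insensitive search over decoded object bytes. The function name and the synthetic sample bytes are assumptions for illustration; the real input would be the decoded content of object 4.

```python
def find_class_name(data: bytes, class_name: bytes = b"equation.3") -> int:
    """Return the offset of class_name in data, ignoring ASCII case, or -1."""
    # bytes.lower() only changes ASCII letters, so offsets are preserved.
    return data.lower().find(class_name.lower())


# Synthetic bytes: three little-endian 32-bit fields followed by the class name.
sample = (
    b"\x01\x05\x00\x00"
    b"\x02\x00\x00\x00"
    b"\x0b\x00\x00\x00"
    b"equAtIOn.3\x00"
)

if __name__ == "__main__":
    print(find_class_name(sample))
```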

Use Didier Stevens' format-bytes.py tool, located in C:\Tools\MalDoc\Office\Tools\DidierStevensSuite, to analyze structured binary data with format strings. Run it using the following command:

python rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc -s 4 -d | python format-bytes.py -f name=eqn1

The output includes a mix of integers and byte sequences, with each line representing a distinct stream or object component within the RTF file. Key observations include:

Line 9, labeled "Start MTEF header," indicates the presence of a MathType Equation File (MTEF) header, commonly used to embed mathematical equations in documents.
"Bytes (<class 'bytes'>)" refers to a sequence of bytes, potentially part of an embedded object or shellcode.
"Shellcode/Command (fontname)" suggests that the object may contain shellcode or a command disguised as a font name—an evasion technique seen in malicious documents.

This object likely contains shellcode that requires further analysis.
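To illustrate what a structured parse of this data looks like, here is a sketch that reads an Equation Native header and the MTEF header that follows it. It assumes the commonly documented layout: a 28-byte EQNOLEFILEHDR followed by a 5-byte MTEF header. It does not reproduce the eqn1 format string of format-bytes.py, and it skips any OLE wrapper that precedes the equation data. The sample bytes are synthetic and the clipboard-format value is arbitrary.

```python
import struct

# Assumed EQNOLEFILEHDR layout: cbHdr (WORD), version (DWORD), cf (WORD),
# cbObject (DWORD), then four reserved DWORDs. Total: 28 bytes.
EQNOLEFILEHDR = struct.Struct("<HIHIIIII")

# Assumed MTEF header: five single-byte fields.
MTEF_FIELDS = (
    "mtef_version",
    "platform",
    "product",
    "product_version",
    "product_subversion",
)


def parse_equation_native(data: bytes) -> dict:
    """Parse the assumed Equation Native and MTEF headers from raw bytes."""
    cb_hdr, version, clip_format, cb_object = EQNOLEFILEHDR.unpack_from(data)[:4]
    mtef = data[cb_hdr:cb_hdr + len(MTEF_FIELDS)]
    info = {
        "cbHdr": cb_hdr,
        "version": version,
        "cf": clip_format,
        "cbObject": cb_object,
    }
    info.update(zip(MTEF_FIELDS, mtef))  # iterating bytes yields integers
    return info


# Synthetic example data, not taken from the sample.
mtef_body = bytes([3, 1, 1, 3, 10]) + b"\x00"
sample = EQNOLEFILEHDR.pack(28, 0x00020000, 0xC000, len(mtef_body), 0, 0, 0, 0) + mtef_body

if __name__ == "__main__":
    for key, value in parse_equation_native(sample).items():
        print(f"{key}: {value:#x}")
```

In CVE-2017-11882 samples the interesting part is the font record that follows these headers, where an overlong font name carries the shellcode or command.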

First, let's dump the object containing the shellcode to a file that we can analyze later. To do this, we'll use the --dump option together with --hexdecode, which converts the object's hex text into raw bytes.

python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py C:\Tools\MalDoc\Office\Demo\Samples\AgentTesla\rtf\payload_1.doc --select 4 --hexdecode --dump > c:\temp\agenttesla_rtf.sc
powershell "Format-Hex -Path c:\temp\agenttesla_rtf.sc | more"
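The idea behind hex-decoding an object can be sketched as follows. This is an approximation for illustration, not rtfdump.py's actual implementation: it assumes the input is the object's hex text, possibly broken up by whitespace, and it drops a trailing unpaired digit.

```python
import re


def hexdecode(hex_text: str) -> bytes:
    """Convert hex text that may contain whitespace into raw bytes."""
    digits = re.sub(r"\s+", "", hex_text)
    if len(digits) % 2:
        digits = digits[:-1]  # discard an unpaired trailing nibble
    return bytes.fromhex(digits)


def dump_to_file(hex_text: str, path: str) -> int:
    """Write the decoded bytes in binary mode and return the byte count."""
    raw = hexdecode(hex_text)
    with open(path, "wb") as handle:
        handle.write(raw)
    return len(raw)
```

Writing in binary mode matters: any text-mode conversion of line endings or encodings would shift offsets and break the later entry-point search.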

To analyze the shellcode, we can use a shellcode emulator such as scdbg.exe. If we run this directly in the shellcode emulator, it will throw an error, as shown in the screenshot below.

C:\Tools\MalDoc\Office\Tools\scdbg\scdbg.exe /f c:\temp\agenttesla_rtf.sc

This error is normal because this is an object that we dumped, and the shellcode entry doesn't start from the beginning of this object. We need to provide the shellcode entry point to scdbg.exe. After that, we can emulate the shellcode.

To analyze the shellcode, we first need to identify its entry point. This allows us to emulate its execution and observe the API calls it makes. The easiest way to locate the entry point is by using XORSearch, a tool from the DidierStevensSuite that detects shellcode by applying built-in or custom wildcard rules (-W and -w options, respectively).

C:\Tools\MalDoc\Office\Tools\DidierStevensSuite\XORSearch.exe -W c:\temp\agenttesla_rtf.sc

XORSearch reports multiple GetEIP matches, found under various XOR and ROT (rotation) encodings of the file. GetEIP is a common shellcode technique for reading the current instruction pointer so that position-independent code can locate its own data, which makes it a useful marker for where shellcode begins.
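The kind of search XORSearch performs can be approximated with a short sketch. It looks for two well-known GetEIP idioms, a `call` to the next instruction followed by a `pop` into a register, and `fnstenv [esp-0xC]`, under every single-byte XOR key. This is a simplified assumption of how such rules work: XORSearch's built-in rule set is larger and also covers other encodings, and the function and label names here are made up.

```python
CALL_NEXT = b"\xE8\x00\x00\x00\x00"  # call $+5
FNSTENV = b"\xD9\x74\x24\xF4"        # fnstenv [esp-0xC]


def find_geteip(data: bytes):
    """Return (xor_key, offset, technique) for each GetEIP idiom found."""
    hits = []
    for key in range(256):
        decoded = bytes(b ^ key for b in data) if key else data
        for i in range(len(decoded)):
            if decoded.startswith(CALL_NEXT, i):
                nxt = decoded[i + 5:i + 6]
                # 0x58..0x5F are the one-byte "pop r32" opcodes
                if nxt and 0x58 <= nxt[0] <= 0x5F:
                    hits.append((key, i, "call+pop"))
            elif decoded.startswith(FNSTENV, i):
                hits.append((key, i, "fnstenv"))
    return hits


# Synthetic buffer: NOPs, call $+5 / pop ebx, a NOP, then fnstenv.
plain = b"\x90" * 4 + CALL_NEXT + b"\x5B" + b"\x90" + FNSTENV
encoded = bytes(b ^ 0x41 for b in plain)

if __name__ == "__main__":
    for key, offset, technique in find_geteip(encoded):
        print(f"key={key:#04x} offset={offset:#x} {technique}")
```

Short byte patterns like these can match by coincidence, which is one reason a real scan tends to report several candidate offsets.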

In this scenario, XORSearch reports many positions where a GetEIP pattern occurs within the dumped object. Passing one of these positions to the shellcode emulator as the starting offset should let the emulation run. Let's try the first four distinct offsets reported for this shellcode: 00000372, 00000376, 000003AF, and 00000409.

Shellcode Emulation using SCDBG

We'll try these offsets in the shellcode emulator again using the /foff offset flag. Let's start the shellcode emulator again with the first rule triggered by XORSearch, i.e., offset 00000372.

C:\Tools\MalDoc\Office\Tools\scdbg\scdbg.exe /f c:\temp\agenttesla_rtf.sc /foff 372
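When several candidate offsets need to be tried, the command lines can be generated rather than typed by hand. The sketch below only builds argument lists from the /f and /foff flags shown above and does not run anything. The helper name is hypothetical, and stripping the zero padding from XORSearch's offsets is a convenience assumption.

```python
SCDBG = r"C:\Tools\MalDoc\Office\Tools\scdbg\scdbg.exe"
SAMPLE = r"c:\temp\agenttesla_rtf.sc"


def scdbg_commands(offsets, scdbg=SCDBG, sample=SAMPLE):
    """Build one scdbg argument list per hexadecimal candidate offset."""
    commands = []
    for offset in offsets:
        # Parse as hex, then format without zero padding (00000372 -> 372).
        normalized = format(int(offset, 16), "X")
        commands.append([scdbg, "/f", sample, "/foff", normalized])
    return commands


if __name__ == "__main__":
    for argv in scdbg_commands(["00000372", "00000376", "000003AF", "00000409"]):
        print(" ".join(argv))
```

Each list could be handed to `subprocess.run` on the analysis machine; keeping the arguments as a list avoids quoting problems with Windows paths.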

The offset was effective, allowing scdbg.exe to successfully emulate the shellcode, which downloads a malicious file, saves it to %APPDATA%, and executes it to continue the attack.

Execution begins at file offset 0x372 (the /foff value is hexadecimal), which the emulator maps to virtual address 0x401372. From there the shellcode calls APIs to perform its malicious actions: it first expands %APPDATA%\winiti.exe to determine the drop path, then loads UrlMon via LoadLibraryW to enable file downloads.
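The relationship between the file offset and the address shown in the emulator output is simple arithmetic. The sketch below assumes a load base of 0x401000, which is inferred from the 372 to 0x401372 mapping above rather than taken from scdbg's documentation.

```python
ASSUMED_BASE = 0x401000  # inferred from the mapping 372 -> 0x401372


def file_offset_to_va(offset_hex: str, base: int = ASSUMED_BASE) -> int:
    """Convert a hexadecimal file offset into the emulator's virtual address."""
    return base + int(offset_hex, 16)


def va_to_file_offset(address: int, base: int = ASSUMED_BASE) -> int:
    """Convert a virtual address from the emulator log back to a file offset."""
    return address - base


if __name__ == "__main__":
    for candidate in ("00000372", "00000376", "000003AF", "00000409"):
        print(candidate, "->", hex(file_offset_to_va(candidate)))
```

The reverse mapping is handy when an address in the emulator log needs to be located in a hex editor.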

The malware uses GetProcAddress to resolve URLDownloadToFileW, which downloads winiti.exe from a remote server to the AppData directory. It then loads shell32.dll and executes the file using ShellExecuteW to continue the attack.

Shellcode Emulation using SpeakEasy

The shellcode can also be emulated with another tool: speakeasy, developed by Mandiant.

We'll provide the same offset to speakeasy using the -r (raw) option, which emulates the file as-is without parsing it, together with --raw_offset 372. This can be executed using the command below:

speakeasy -t c:\temp\agenttesla_rtf.sc -r -a x86 --raw_offset 372

To summarize, we first extracted the suspicious object containing the shellcode, then located the shellcode's entry point, and finally emulated it to identify the Indicators of Compromise (IOCs): the dropped file name, its location, and the APIs used to download and execute it.

Questions

Q1) Locate the malicious sample starting with "a60...rtf" in the location "C:\Tools\MalDoc\Office\Demo\Samples\RemcosRAT\rtf". Perform the analysis on this sample and find out which vbs file is being downloaded in AppData. Type the file name as your answer. Answer format is b****************.vbs

python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py a60f72316633a40d5ab45b035ecd03b7cd0162ce161946cfa2ad86d11fbc9c13.rtf

python c:\Tools\MalDoc\Office\Tools\DidierStevensSuite\rtfdump.py a60f72316633a40d5ab45b035ecd03b7cd0162ce161946cfa2ad86d11fbc9c13.rtf --select 4 --hexdecode --dump > c:\temp\agenttesla_rtf.sc
powershell "Format-Hex -Path c:\temp\agenttesla_rtf.sc | more"

C:\Tools\MalDoc\Office\Tools\scdbg\scdbg.exe /f c:\temp\agenttesla_rtf.sc /foff 946

Answer: beautifulldaykiss.vbs
