RTF Documents Analysis
The Rich Text Format (RTF) is a plain-text document format that Microsoft developed and maintained from 1987 to 2008 for cross-platform compatibility. It encodes text and graphics in a readable, portable form, can be opened in many editors without Microsoft Office, and does not support macros, which removes one common attack vector.
RTF files, though lacking macro support, can still be weaponized via embedded objects, binary data, or parser exploits. They are commonly created using WordPad, Microsoft Word, or alternatives like LibreOffice Writer.
RTF files can embed other files inside the document itself. Attackers often use this to embed malware and send it to victims, and many malicious RTF samples carry embedded shellcode payloads.
The file name has the extension .doc, but if we check it using HxD or TrID, we can clearly see that it is an RTF file. Threat actors commonly do this so that the file opens in MS Word.
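The same check can be scripted. The following is a minimal sketch, not a replacement for HxD or TrID: it looks only at the first bytes of a file and compares them with the RTF signature. The function names are hypothetical, and the shortened `{\rt` prefix is an assumption made so that slightly malformed headers are still recognized.

```python
from pathlib import Path

# Well-formed RTF files begin with "{\rtf1". The shorter "{\rt" prefix is an
# assumption that lets slightly malformed headers match as well.
RTF_MAGIC = b"{\\rt"


def looks_like_rtf(data: bytes) -> bool:
    """Return True if the buffer starts with the RTF signature."""
    return data.startswith(RTF_MAGIC)


def describe_file(path: str) -> str:
    """Compare a file's extension with what its leading bytes suggest."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    content = "RTF" if looks_like_rtf(head) else "not RTF"
    extension = Path(path).suffix or "(none)"
    return f"{path}: extension={extension}, content={content}"
```

Calling `describe_file` on a sample named with a .doc extension would report the mismatch between the extension and the content.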
For the analysis of RTF files, we can use rtfdump.py. Let's run it now to see whether any objects are present.
The screenshot above shows the presence of an object in this RTF document. We can inspect the object using -s 4 to select object 4 (i.e., \*\objupdate53415341). The -H option decodes the object's hexadecimal data before it is displayed.
The output reveals the keyword equAtIOn.3, which refers to the Microsoft Equation Editor, a component vulnerable to CVE-2017-11882. Malware commonly exploits this vulnerability to execute shellcode when the document is opened.
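The mixed-case spelling equAtIOn.3 is a reminder that a keyword search should ignore case. As a rough sketch, assuming the input is only the hexadecimal payload of an object (not the surrounding RTF control words), the decode-and-search step could look like this. The helper names are hypothetical and this is not how rtfdump.py is implemented.

```python
import binascii
import re


def decode_objdata(hex_text: str) -> bytes:
    """Decode hex-encoded object data that may contain spaces or newlines."""
    cleaned = re.sub(r"[^0-9a-fA-F]", "", hex_text)
    if len(cleaned) % 2:
        cleaned = cleaned[:-1]  # drop a dangling half-byte
    return binascii.unhexlify(cleaned)


def mentions_equation_editor(blob: bytes) -> bool:
    """Case-insensitive search for the Equation Editor class name."""
    return b"equation.3" in blob.lower()


# Hex for the string "equAtIOn.3", split across lines as it might be in a file.
sample = "6571 7541\n7449 4f6e 2e33"
print(decode_objdata(sample))
print(mentions_equation_editor(decode_objdata(sample)))
```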
Use Didier Stevens' format-bytes.py tool, located in C:\Tools\MalDoc\Office\Tools\DidierStevensSuite, to analyze structured binary data with format strings. Run it using the following command:
The output includes a mix of integers and byte sequences, with each line representing a distinct stream or object component within the RTF file. Key observations include:
Line 9, labeled "Start MTEF header," indicates the presence of a MathType Equation File (MTEF) header, commonly used to embed mathematical equations in documents.
"Bytes (<class 'bytes'>)" refers to a sequence of bytes, potentially part of an embedded object or shellcode.
"Shellcode/Command (fontname)" suggests that the object may contain shellcode or a command disguised as a font name, an evasion technique seen in malicious documents.
This object likely contains shellcode that requires further analysis.
First, let's dump this object with shellcode into a file that we can analyze later. To dump the shellcode, we'll use the --dump option with the --hexcode format.
To analyze the shellcode, we can use a shellcode emulator such as scdbg.exe. If we run the dumped file directly in the emulator, it throws an error, as shown in the screenshot below.
This error is expected: the file is the whole object we dumped, and the shellcode does not start at the beginning of that object. Once we give scdbg.exe the shellcode entry point, it can emulate the shellcode.
To analyze the shellcode, we first need to identify its entry point. This allows us to emulate its execution and observe the API calls it makes. The easiest way to locate the entry point is to use XORSearch, a tool from the DidierStevensSuite that detects shellcode by applying built-in or custom wildcard rules (the -W and -w options, respectively).
Running XORSearch against the dump finds multiple instances of GetEIP under various XOR and ROT (rotation) encodings. GetEIP is a common shellcode technique for determining the current instruction pointer, which position-independent code needs in order to locate its own data.
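To illustrate the idea behind such a rule, here is a simplified sketch that searches for one well-known GetEIP idiom, `call $+5` followed by `pop r32`, under every single-byte XOR key. It is not XORSearch's implementation or its actual rule set: it covers one pattern only, ignores ROT and other encodings, and the function name is hypothetical.

```python
import re

# call $+5 (E8 00 00 00 00) pushes the address of the next instruction;
# the following pop r32 (opcodes 58..5F) loads that address into a register.
CALL_POP = re.compile(rb"\xE8\x00\x00\x00\x00[\x58-\x5F]", re.DOTALL)


def find_geteip(data: bytes) -> list[tuple[int, int]]:
    """Return (xor_key, offset) pairs where the call/pop idiom appears."""
    hits = []
    for key in range(256):
        # Build a 256-byte translation table that XORs every byte with `key`.
        table = bytes(value ^ key for value in range(256))
        decoded = data.translate(table)
        for match in CALL_POP.finditer(decoded):
            hits.append((key, match.start()))
    return hits


# Tiny demonstration stub: two NOPs, call $+5, pop ebx, ret.
stub = b"\x90\x90\xE8\x00\x00\x00\x00\x5B\xC3"
encoded = bytes(value ^ 0x41 for value in stub)
print(find_geteip(stub))
print(find_geteip(encoded))
```

Each reported offset is a candidate entry point in the same sense as the positions XORSearch prints, which is why several of them can work when passed to an emulator.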
In this scenario, XORSearch reports many positions where the GetEIP method appears within the shellcode. Passing any of these positions to the shellcode emulator as the entry point should work. Let's try one of the first four offsets (i.e., 00000372, 00000376, 000003AF, and 00000409).
We'll try these offsets in the shellcode emulator using the /foff offset flag. Let's start the emulator again with the first rule triggered by XORSearch, i.e., offset 00000372.
The offset was effective, allowing scdbg.exe to successfully emulate the shellcode, which downloads a malicious file, saves it to %APPDATA%, and executes it to continue the attack.
Execution begins at file offset 372 (0x401372), where the shellcode calls APIs to perform malicious actions. It first expands %APPDATA%\winiti.exe to determine the drop path, then loads UrlMon via LoadLibraryW to enable file downloads.
The malware uses GetProcAddress to resolve URLDownloadToFileW, which downloads winiti.exe from a remote server to the AppData directory. It then loads shell32.dll and executes the file using ShellExecuteW to continue the attack.
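The relationship between the file offset 372 and the address 0x401372 is simple addition. The sketch below assumes the emulator maps the dumped file at base address 0x401000, which is inferred from those two values in this walkthrough rather than taken from the tool's documentation. The function names are hypothetical.

```python
# Assumed load address, inferred from offset 0x372 appearing as 0x401372.
ASSUMED_BASE = 0x401000


def file_offset_to_address(offset: int, base: int = ASSUMED_BASE) -> int:
    """Translate an offset in the dumped file to an emulated address."""
    return base + offset


def address_to_file_offset(address: int, base: int = ASSUMED_BASE) -> int:
    """Translate an emulated address back to an offset in the dumped file."""
    return address - base


# XORSearch prints offsets as hexadecimal strings such as 00000372.
entry = int("00000372", 16)
print(hex(file_offset_to_address(entry)))  # 0x401372
```

The reverse helper is handy when an emulator log shows an address and the matching bytes need to be found in a hex editor.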
The shellcode can also be emulated with another tool, speakeasy, developed by Mandiant.
We'll provide the same offset to speakeasy using the option -r --raw_offset 372. This can be executed using the command below:
To summarize, we first extracted the suspicious object containing the shellcode. We then emulated the shellcode to identify and extract its Indicators of Compromise (IOCs).
Q1) Locate the malicious sample starting with "a60...rtf" in the location "C:\Tools\MalDoc\Office\Demo\Samples\RemcosRAT\rtf". Perform the analysis on this sample and find out which vbs file is being downloaded in AppData. Type the file name as your answer. Answer format is b****************.vbs
Answer: beautifulldaykiss.vbs