Office Files Analysis
Analyzing malicious Office files is important because Office documents are common attack vectors due to their widespread use and support for macros and embedded objects. Understanding how these attacks work helps in developing effective defenses. Analysis can reveal specific techniques, tactics, and procedures (TTPs) used by threat actors, aiding in attribution and understanding the threat landscape.
Malicious Office documents, such as Word or Excel files, are commonly used by attackers to deliver malware. These files may contain malicious macros, embedded objects, or exploit vulnerabilities to execute code, often as part of phishing campaigns. Analyzing them requires a structured approach to uncover the following techniques:
Macro-Based Attacks: Malicious macros run when enabled by the user, often downloading further payloads.
Embedded Objects: Objects like OLE or ActiveX controls may execute code without user awareness.
Exploiting Vulnerabilities: Crafted documents can exploit flaws in Office applications, such as buffer overflows.
Phishing & Social Engineering: Documents are used to deceive users into enabling macros or clicking malicious links.
Macros in Microsoft Office automate repetitive tasks using commands written in Visual Basic for Applications (VBA), a Microsoft-supported language across all Office products.
Office Open XML (OOXML) files such as .docx, .xlsx, and .pptx cannot store macros by default. Only specific file formats can contain VBA macros, such as:
Word: .docm, .dotm
Excel: .xlsm, .xltm
PowerPoint: .pptm, .potm
These file formats end with an 'm' to indicate the presence of macros, which may contain executable code. Users can rename the extension, but if macros are present, a security warning will state: "Macros have been disabled."
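Because an extension can be renamed, checking the container contents is more reliable than trusting the file name. The following is a minimal sketch, assuming the usual OOXML layout in which the VBA project is stored in a part named vbaProject.bin (for example word/vbaProject.bin). The helper name has_vba_project is hypothetical and is not part of any tool mentioned here.

```python
import zipfile


def has_vba_project(source):
    """Return True if an OOXML package contains a vbaProject.bin part.

    `source` may be a file path or a binary file-like object. Only the part
    name is checked; the VBA project itself is not parsed or validated.
    """
    if not zipfile.is_zipfile(source):
        return False  # not a ZIP container (e.g. legacy .doc or RTF)
    with zipfile.ZipFile(source) as package:
        return any(
            name.lower().endswith("vbaproject.bin")
            for name in package.namelist()
        )
```

A True result for a file named .docx is itself a warning sign, since that extension should not carry a VBA project.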
Office documents, like PDFs, have their own scripting language—in this case, VBA (Visual Basic for Applications). VBA macros are powerful and can directly call Windows APIs, enabling actions like malware download and code execution. Attackers commonly use macros to:
Modify Files: Change or delete system files.
Execute Code: Run malicious scripts or binaries.
Deliver Payloads: Fetch and launch malware from remote sources.
Analyzing macros is essential, as they are a common attack vector.
The easiest way to detect the presence of macros inside an Office file is to run the oleid.py Python utility against the document under analysis.
Office documents can be saved in various formats, with the most common being:
The Python script oledir shows the layout of an OLE file.
Despite the lack of macro support, RTF files can still be used in attacks through embedded objects (such as OLE1 objects), binary contents, or exploits targeting vulnerabilities in RTF parsers.
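As a quick triage step, an RTF file can be scanned for the control words that usually accompany embedded objects. This is a rough sketch under stated assumptions: the control word list is illustrative rather than exhaustive, the function name is hypothetical, and heavily obfuscated RTF can evade a simple scan like this, which is why a dedicated parser is still needed.

```python
import re

# Control words that commonly accompany embedded objects in RTF.
# Illustrative list only, not a complete inventory.
OBJECT_CONTROL_WORDS = {b"object", b"objdata", b"objemb", b"objocx", b"objclass"}

# An RTF control word is a backslash followed by lowercase letters.
_CONTROL_WORD = re.compile(rb"\\([a-z]+)")


def find_rtf_object_markers(data):
    """Return a sorted list of object-related control words present in RTF bytes."""
    # Lenient header check: only the leading "{\rt" is required here.
    if not data.lstrip().startswith(b"{\\rt"):
        raise ValueError("input does not look like RTF")
    found = {match.group(1) for match in _CONTROL_WORD.finditer(data)}
    return sorted(word.decode("ascii") for word in found & OBJECT_CONTROL_WORDS)
```

An empty list does not prove the file is clean; it only means none of the listed markers appeared in plain form.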
For detailed analysis, we will use the rtfdump.py Python utility, which can be downloaded from the official GitHub repository. This utility can be executed inside the target (VM) at the following path:
Q1) Run olevba.py with the -a option on the file "C:\Tools\MalDoc\Office\Demo\Samples\QuasarRAT\QuasarRAT.docx". This will show a list of suspicious keywords. Figure out the keyword that downloads files from the Internet. Type the keyword as your answer. Answer Format is m********.*******
Answer: microsoft.xmlhttp
Let's start with the MS Office document format first. To get started, let's review the different file types that we know.
doc: Microsoft Word document before Word 2007
docm: Microsoft Word macro-enabled document
docx: Microsoft Word document (Open XML format, latest)
dot/dotx/dotm: Word template files
Initially, when we don't know a file's type, we can extract some basic information about the sample using trid.exe. This tells us what kind of sample we're dealing with.
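The idea of identifying a file by its content rather than its extension can be illustrated with a much cruder check on the leading bytes. This sketch is not how trid.exe works internally; it only assumes three well-known signatures (OLE2 compound file, ZIP as used by OOXML, and RTF), and the function name is hypothetical.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy .doc/.xls/.ppt)
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header (OOXML container)
RTF_MAGIC = b"{\\rt"                           # beginning of an RTF header


def guess_container(header):
    """Classify a file from its first bytes as 'ole', 'zip', 'rtf', or 'unknown'."""
    if header.startswith(OLE_MAGIC):
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"
```

For a file on disk, reading the first 8 bytes in binary mode and passing them to guess_container is enough for this check.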
The trid.exe output indicates that it is a DOC file and also contains an OLE object. We can use olemeta.py, which is a script to parse OLE files such as MS Office documents (e.g., Word, Excel). This script extracts all standard properties present in the OLE file.
To get the timestamp information, we can use the oletimes.py Python script.
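OLE compound files store creation and modification times as Windows FILETIME values, that is, counts of 100-nanosecond intervals since 1601-01-01 UTC. If a raw value turns up during manual analysis, a conversion along the following lines makes it readable. This is a small sketch with a hypothetical function name, not code taken from oletimes.py.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100 ns ticks from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME integer to a timezone-aware UTC datetime.

    Returns None for 0, which conventionally means the timestamp is not set.
    """
    if filetime == 0:
        return None
    # 10 ticks of 100 ns make one microsecond; sub-microsecond detail is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
```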
Next, we can use oleid.py to get more information related to the sample.
We can see that VBA macros are present. Let us check this using the olevba utility. This script opens an MS Office file, detects whether it contains VBA macros, and extracts and analyzes the VBA source code. It can also be used from your own Python applications.
Q1) Use olemeta.py to analyse the document properties. Find out who the author of this document is, and type the author's name as your answer.
Answer: Mohammed Alkuwari
In this section, we'll analyze another sample: a more complicated, heavily obfuscated malicious document that drops QuasarRAT malware on the system. The sample, renamed QuasarRAT.docx, is tagged under the malware families (signatures) QuasarRAT and xRAT. The details related to this sample are as follows:
Next, we can run the olevba Python utility to extract more details related to the macro in the document.
We can see that olevba flags an AutoExec trigger, meaning the macro code is set to run automatically when a user opens the document.
Despite obfuscation, the script's use of VBA functions like CreateObject, Open, Write, and SaveToFile reveals its role as a dropper. It downloads a QuasarRAT payload from an external source, writes it to disk, and executes it, which is typical of droppers used to deploy additional malware.
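The keyword-based reasoning above can be sketched as a small scanner over already extracted VBA source. This is a simplified stand-in for the kind of keyword analysis olevba performs, not its actual implementation: the two keyword lists are short illustrative samples, and the function name is hypothetical.

```python
import re

# Illustrative keyword lists; real tools ship far longer ones.
AUTOEXEC_KEYWORDS = ("AutoOpen", "AutoExec", "Document_Open", "Workbook_Open")
SUSPICIOUS_KEYWORDS = (
    "CreateObject", "SaveToFile", "Microsoft.XMLHTTP", "Shell", "Open", "Write",
)


def scan_vba_source(source):
    """Return {category: [keywords found]} using case-insensitive whole-word matches."""
    results = {"AutoExec": [], "Suspicious": []}
    groups = (("AutoExec", AUTOEXEC_KEYWORDS), ("Suspicious", SUSPICIOUS_KEYWORDS))
    for category, keywords in groups:
        for keyword in keywords:
            # Lookarounds keep "Open" from matching inside "AutoOpen".
            pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
            if re.search(pattern, source, re.IGNORECASE):
                results[category].append(keyword)
    return results
```

A keyword hit is a hint rather than proof: benign macros use these functions too, and obfuscated samples often build such strings at runtime so that they never appear literally in the source.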
Q1) When you extract VBA Macro code of this sample using olevba.py, there is a call to MsgBox. What is the content of this MsgBox function? Type it as your answer.
Answer: Open this Transaction Recipt Again!
Adversaries have exploited remote code execution vulnerabilities in Office documents, such as CVE-2021-40444, which leveraged a malicious ActiveX control in MSHTML to deliver Cobalt Strike Beacon loaders linked to ransomware campaigns. This section explores such malicious Office document tactics.
Microsoft states that files from external sources are usually tagged with a Mark of the Web (MoTW), which triggers Protected View and requires user action to enable active content. However, this document bypasses that protection and executes its payload automatically upon opening, without MoTW or user interaction.
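On NTFS, the Mark of the Web is stored in an alternate data stream named Zone.Identifier, whose content is a small INI-style text with a ZoneTransfer section. The sketch below parses that text once it has been read. The function name and the returned dictionary layout are assumptions made for illustration.

```python
import configparser

# Standard Windows security zone identifiers.
ZONE_NAMES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def parse_zone_identifier(text):
    """Parse Zone.Identifier stream content; return None if no ZoneTransfer section."""
    # Interpolation is disabled so '%' characters in URLs are kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("ZoneTransfer"):
        return None
    zone_id = parser.getint("ZoneTransfer", "ZoneId", fallback=None)
    return {
        "zone_id": zone_id,
        "zone_name": ZONE_NAMES.get(zone_id, "Unknown"),
        "host_url": parser.get("ZoneTransfer", "HostUrl", fallback=None),
    }
```

On a Windows analysis VM the stream can typically be read by opening the document path with ":Zone.Identifier" appended. A missing stream means the file carries no MoTW.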
This vulnerability is triggered simply by opening a document—no user interaction, such as clicking 'Enable Content', is required. It can also impact other MSHTML-based applications like Skype, Outlook, and Visual Studio.
Let's begin our analysis by examining the App-description.docx document and scrutinizing the output from oleid.py.
As suggested in the oleid.py output in the screenshot above, we can use oleobj to obtain the external relationship directly, as shown below:
Using oleobj.py, we get the external relationship and the details of the suspicious URL almost immediately. However, we should also understand the whole process, such as where the relationship is stored and how to extract it using other useful tools and scripts.
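In an OOXML package, relationships live in .rels parts (for example word/_rels/document.xml.rels), and a relationship that points outside the package carries TargetMode="External". The following is a minimal sketch of extracting those entries with the standard library. The function name is hypothetical, and this is not how oleobj.py is implemented.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Open Packaging Conventions relationship parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_relationships(source):
    """Yield (part_name, relationship_type, target) for each external relationship.

    The standard-library XML parser keeps the sketch dependency-free; a
    hardened parser is preferable for untrusted samples outside a sandbox.
    """
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    yield name, rel.get("Type"), rel.get("Target")
```

Not every external relationship is malicious, since ordinary hyperlinks are stored the same way. The relationship type and the target URL together decide whether an entry deserves a closer look.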
Zipdump has an option to dump all content of the file using the --dumpall parameter. This is really important as we can search through it.
The content reveals a wealth of information. To identify specific patterns, we'll use the re-search.py script, which applies regular expressions to search files. It supports both custom regex and predefined patterns from a built-in library, specified using the regex argument.
At the end of the output, there's a match for an external URL that is suspicious.
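The same kind of search can be approximated in a few lines: apply a URL regular expression to every member of the ZIP container and drop the well-known schema namespaces that would otherwise dominate the results. This is a sketch under stated assumptions. The URL pattern is deliberately simple, the ignore list is illustrative, and the function name is hypothetical. It does not reproduce the pattern library of re-search.py.

```python
import re
import zipfile

# Simple URL pattern over raw bytes; stops at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(rb"https?://[^\s\"'<>]+", re.IGNORECASE)

# Hosts that appear in nearly every OOXML file as XML namespaces (illustrative list).
DEFAULT_IGNORED_HOSTS = (
    "schemas.openxmlformats.org",
    "schemas.microsoft.com",
    "www.w3.org",
    "purl.org",
)


def find_urls(source, ignored_hosts=DEFAULT_IGNORED_HOSTS):
    """Return {member_name: sorted unique URLs} for members with at least one hit."""
    hits = {}
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            urls = {u.decode("latin-1") for u in URL_PATTERN.findall(package.read(name))}
            urls = sorted(u for u in urls if not any(h in u for h in ignored_hosts))
            if urls:
                hits[name] = urls
    return hits
```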
We can also perform a Yara search in the whole document. YARA, which stands for "Yet Another Recursive Acronym," is an open-source pattern-matching Swiss army knife that identifies patterns within files, making it a powerful tool for malware detection.
Zipdump supports YARA rule searches using rule files, directories, or direct strings. We'll use the YARA string search option to search for this domain using --yara "#s#pawevi.com". This should tell us which file contains this suspicious string.
The output shows that this string is present in the relationships file with index 18. Let's open this index 18 relationship file using --select 18 along with the --dumpall or -d option to show the dump file content.
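The locate-then-dump workflow can also be sketched with the standard library: first list the members whose raw bytes contain a given string, then return the content of a selected member. Both function names are hypothetical, and the 1-based indexes used here only resemble a numbered listing; they are not guaranteed to match the indexes printed by zipdump.py.

```python
import zipfile


def members_containing(source, needle):
    """Return [(index, name)] for ZIP members whose raw bytes contain `needle`."""
    with zipfile.ZipFile(source) as package:
        return [
            (index, info.filename)
            for index, info in enumerate(package.infolist(), start=1)
            if needle in package.read(info)
        ]


def dump_member(source, index):
    """Return the decompressed bytes of the member at the given 1-based index."""
    with zipfile.ZipFile(source) as package:
        return package.read(package.infolist()[index - 1])
```

Calling members_containing(path, b"pawevi.com") and then dump_member(path, index) with the reported index mirrors the two steps described above.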
Q1) Locate the sample "C:\Tools\MalDoc\Office\Demo\Samples\SnakeKeylogger\PO026037.docx" and investigate relationships with external links. Type the external link as your answer. Answer format is an HTTP URL.
Answer: http://gurl.pro/u8-drp