Using Amazon Athena to check if a password has been pwned

Author: Hilton D

Ever wondered whether a password you use has been used before or, more importantly, whether it is widely known to attackers?

Troy Hunt runs an excellent site called “Have I Been Pwned” that lets you check whether your account details have been compromised in a data breach. This works well if you want to check a dozen or so accounts, but what if you want to check a few thousand, or a few million, passwords?

Amazon Athena is a service that lets you upload a data set to Amazon S3 and then query it with SQL. The best bit is that you pay only for the data scanned by each query: cost-effective big data analysis!
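To make the pay-per-scan model concrete, here is a rough cost estimate in Python. The function name, the 5 USD per terabyte rate, and the 10 MB minimum charge per query are all assumptions for illustration; check the current Athena pricing for your region before relying on them.

```python
def estimate_query_cost_usd(
    bytes_scanned: int,
    usd_per_tb: float = 5.0,              # assumed rate, varies by region
    min_billed_bytes: int = 10_000_000,   # assumed per-query minimum
) -> float:
    """Estimate the cost of one query from the bytes it scans."""
    # Small queries are billed as if they scanned the minimum amount.
    billed = max(bytes_scanned, min_billed_bytes)
    # Decimal terabytes (10**12 bytes) are assumed here.
    return billed / 1e12 * usd_per_tb


if __name__ == "__main__":
    # A query that scans one (decimal) terabyte at the assumed rate.
    print(estimate_query_cost_usd(10**12))
```

Because the charge follows bytes scanned, anything that shrinks the stored file, such as compression, shrinks the bill too.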

Objective

Check a list of 4,000 passwords (or hashes of passwords) against the “Have I Been Pwned” password list using Amazon Athena.

Method overview

Download the password hash list

Some simple data wrangling is needed to transform the list into a format that Athena can query. Gzipping the list also reduces storage costs and, because Athena bills on bytes scanned, query costs. The trade-off may be longer query times, since a single gzip file cannot be split across parallel readers.
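A minimal sketch of that wrangling step in Python. It assumes the downloaded list has already been extracted to a plain-text file with one `HASH:COUNT` pair per line, where the hash is a 40-character hex SHA-1 digest. The function name `wrangle` and the file paths are hypothetical. It normalises line endings, drops malformed lines, and writes a gzip file.

```python
import gzip
import re

# One 40-hex-digit SHA-1 hash, a colon, then an occurrence count.
LINE_RE = re.compile(r"^([0-9A-Fa-f]{40}):(\d+)$")


def wrangle(src: str, dst: str) -> int:
    """Copy valid HASH:COUNT lines from src into a gzip file at dst.

    Returns the number of lines written.
    """
    written = 0
    with open(src, "r", encoding="ascii", errors="replace") as fin, \
            gzip.open(dst, "wt", encoding="ascii", newline="\n") as fout:
        for line in fin:
            match = LINE_RE.match(line.strip())
            if match is None:
                continue  # skip blank or malformed lines
            # Uppercase the hash so string comparisons in SQL are consistent.
            fout.write(f"{match.group(1).upper()}:{match.group(2)}\n")
            written += 1
    return written


if __name__ == "__main__":
    import sys
    # Usage: python wrangle.py input.txt output.txt.gz
    print(wrangle(sys.argv[1], sys.argv[2]), "lines written")
```

The colon delimiter is left in place because the table definition splits fields on `:`.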

Upload the list to an S3 bucket

Create the Athena HIBP table


CREATE EXTERNAL TABLE `pwndpasswords`(
  `hash` string,
  `numoccur` int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ':'
  MAP KEYS TERMINATED BY 'undefined'
WITH SERDEPROPERTIES (
  'collection.delim'='undefined')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://qwerty.slicehost.com/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'transient_lastDdlTime'='1528802289')

Depending on how many passwords you want to check, you can either query the table directly or create a second table holding the passwords you want to check and use a SQL INNER JOIN to obtain the matches:


SELECT my.hash,
       my.password
FROM leaked AS my
INNER JOIN pwndpasswords
  ON my.hash = pwndpasswords.hash
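Both options need your own passwords hashed the same way as the list. The Python sketch below is an illustration, not part of the original workflow. It assumes the list stores uppercase hex SHA-1 digests, and the helper names `sha1_upper`, `build_lookup_query`, and `write_leaked_rows` are hypothetical. The comma-separated output is also an assumption: the `leaked` table would need a matching delimiter or a CSV-aware SerDe.

```python
import csv
import hashlib
from typing import Iterable, TextIO


def sha1_upper(password: str) -> str:
    """Return the uppercase hex SHA-1 digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def build_lookup_query(password: str, table: str = "pwndpasswords") -> str:
    """Build a direct lookup for one password against the HIBP table."""
    # The digest contains only hex characters, so it is safe to embed.
    digest = sha1_upper(password)
    return f"SELECT hash, numoccur FROM {table} WHERE hash = '{digest}'"


def write_leaked_rows(passwords: Iterable[str], out: TextIO) -> int:
    """Write hash,password rows for a `leaked` table; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for password in passwords:
        writer.writerow([sha1_upper(password), password])
        count += 1
    return count


if __name__ == "__main__":
    import sys
    # Direct lookup for a single password.
    print(build_lookup_query("password"))
    # Rows to upload to S3 for the join.
    write_leaked_rows(["password", "letmein"], sys.stdout)
```

The first function suits a handful of checks; the CSV route suits the bulk join, where one query compares every row at once.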

Result

After a few minutes you will get a list (or .csv file) of the hashes that appear in both tables, which you can use to take remedial action such as changing passwords or, heaven forbid, notifying customers that they should change theirs.

While this method is probably slower than the alternatives (I surmise that GPU-accelerated Hashcat would be quicker), it is undeniably simple, and efficient in the sense that there is no infrastructure to maintain and no drivers to update.

Another benefit is that you can run multiple queries concurrently (I think Athena limits an account to twenty simultaneous queries by default), which means that several analysts can run investigations at the same time!