Using Amazon Athena to check if a password has been pwned

July 23, 2018

Ever wondered whether a password you use has been used before or, more importantly, whether it is widely known to hackers?

Troy Hunt runs an excellent site called "Have I Been Pwned" that lets you check whether your account details have been compromised in a data breach. This works well if you want to check a dozen or so accounts, but what if you want to check a couple of thousand, or a million, passwords?

Amazon Athena is a service that lets you query data stored in Amazon S3 using SQL. The best bit is that you only pay for the data scanned by each query, which makes for cost-effective big data analysis!

Objective

Check a list of 4000 passwords (or hashes of passwords) against the "Have I Been Pwned" password list using Amazon Athena.

Method overview

Download the password hash list

Some simple data wrangling is needed to transform the list into a format that Athena can query. Gzipping the list will also save on storage costs, perhaps at the expense of increased query duration.
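The post does not say exactly what wrangling was done, so the following is only a minimal Python sketch under stated assumptions: the downloaded file has one `HASH:COUNT` entry per line (which matches the ':' delimiter used in the table definition), and the only clean-up needed is dropping blank or malformed lines, normalising line endings, and upper-casing the hash. The function name and file paths are hypothetical.

```python
import gzip
import re

# One SHA-1 hex digest (40 hex characters), a colon, then an occurrence count.
LINE_RE = re.compile(r"([0-9A-Fa-f]{40}):([0-9]+)")


def normalise_and_gzip(src_path, dst_path):
    """Copy well-formed HASH:COUNT lines into a gzip file; return rows written."""
    written = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
            gzip.open(dst_path, "wt", encoding="ascii", newline="\n") as dst:
        for line in src:
            match = LINE_RE.fullmatch(line.strip())
            if match is None:
                continue  # skip blank or malformed lines
            # Upper-case the hash so string comparisons in SQL are consistent.
            dst.write(f"{match.group(1).upper()}:{match.group(2)}\n")
            written += 1
    return written
```

Calling `normalise_and_gzip("pwned-passwords.txt", "pwned-passwords.txt.gz")` (both names are placeholders) would produce a single gzip file ready to upload.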

Upload the list to an S3 bucket

Create the Athena HIBP table

CREATE EXTERNAL TABLE `pwndpasswords`(
  `hash` string,
  `numoccur` int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ':'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://qwerty.slicehost.com/'
TBLPROPERTIES (
  'has_encrypted_data'='false')

Now, depending on how many passwords you want to check, you can either query the table directly or create another table containing the passwords you want to check and use an SQL INNER JOIN to obtain the matches:

SELECT my.hash,
my.password
FROM leaked AS my
INNER JOIN pwndpasswords
ON my.hash = pwndpasswords.hash
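The join only finds matches if the `leaked` table stores hashes in the same form as the HIBP list, which uses upper-case hexadecimal SHA-1 digests. Here is a minimal Python sketch of preparing such a file. The function names and the two-column CSV layout (hash, then password) are assumptions for illustration, not the author's actual method; a table defined over this file would also need a CSV-aware row format.

```python
import csv
import hashlib


def sha1_upper(password):
    """Return the upper-case hex SHA-1 digest of a UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def write_leaked_csv(passwords, out_path):
    """Write one 'hash,password' row per password; return rows written."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for password in passwords:
            writer.writerow([sha1_upper(password), password])
            count += 1
    return count
```

If keeping plaintext passwords in S3 is undesirable, the second column can be dropped and the matching hashes mapped back to passwords locally.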

Result

After a few minutes you will get a list (or .csv file) of hashes that match between these two tables, which you can use to take remedial action such as changing passwords or, heaven forbid, notifying customers that they should change their passwords.
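To act on the .csv file, it can be read back with a few lines of Python. This is a sketch that assumes the file has a header row with `hash` and `password` columns, as the join query's SELECT list would produce. The function name and file path are hypothetical.

```python
import csv


def read_matches(csv_path):
    """Return (hash, password) tuples from a results CSV with a header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [(row["hash"], row["password"]) for row in csv.DictReader(handle)]
```

Each tuple returned by `read_matches("results.csv")` identifies one password that appears in the breach list and should be changed.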

While this method is probably slower than others (I surmise that GPU-accelerated Hashcat would be quicker), it is without question simple and efficient in the sense that there is no infrastructure to maintain and no drivers to update.

Another benefit is that you can run multiple queries concurrently (I think Athena limits an account to twenty simultaneous queries by default), which means that multiple analysts can run investigations at the same time!