Protect Your Users with HMAC Login Authentication

Thomas Randolph

2020-03-12 19:45

Logins Send Passwords

"Huuu durrr" you say, "that's the point of a login."

I disagree!

The point of a login is to authenticate a person using the application. Sending their password to the backend is not necessarily part of that process.

There are essentially four vulnerable parts of an authentication system:

Creating the account (all the sensitive data can be intercepted)
Storing the credentials (they can be stolen)
Logging in (it can be replayed or stolen by a man-in-the-middle)
The human (they can re-use their password other places, etc.)

Of these, we can do pretty well in protecting against #2 and #3. We can't do much against #1 without drastically changing the typical sign-in flow. We also can't do much about most of the human factors that are out there, but we can protect our users against their password being stolen from us and stuffed into other websites, if they re-use passwords.

Of course, ideally, our database would never be stolen and our users would never have their networks compromised (leading to their credentials being stolen). "Ideally" is the wrong way to build an authentication system, though, so we're going to look at a way to deal with the non-ideal.

Granted, we will be using the most common method of sign-in (username plus password) and the most common method of storage (username plus hash). This is intentional.
We can get significant security gains by changing some of the magic between the user entering their information and the API authenticating them without making significant architectural changes.

Of course, you can make significant architectural changes for even better security, but that's a topic for another time.

HMAC is Better

"But how?"

Thanks for asking.

HMAC stands for Hash-based Message Authentication Code.

The basic premise is this: If I (the API) know a secret key that the user can continually regenerate using a secret passphrase, then we can both hash the same message and come up with the same result.
If we both come up with the same result without sharing the secret key (after the first time, more on that later), then I (the API) can theoretically assume that the user is the same person who originally created the account, because they've successfully generated the same secret key.

Once more, this time with (paid) actors.

Alice created an account on our website.
Our API - Bianca - has the secret key from when Alice created her account: EGG.
Alice knows the secret passphrase that will recreate the secret key: PAN.
When Alice visits the login form, she enters her username and her secret passphrase: PAN.
In her browser, our application uses a hash to recreate EGG. It then uses EGG to hash some data, like her username, into a third secret: SCRAMBLE.
The browser then sends Bianca the two pieces of data: the username and SCRAMBLE.
Bianca knows the same secret key already, so she does the same computation: EGG plus Alice's username results in SCRAMBLE. Since Bianca has come up with the same result as Alice asserted, Bianca assumes that Alice is who she says she is, and logs Alice in.

For clarity (and the more visual among us), here's a sequence diagram that might help.

sequenceDiagram autonumber participant FC as Frontend Crypto participant A as Alice participant B as Bianca participant BC as Backend Crypto Note over A: Alice enters her
username (alice)
and passphrase
(PAN) into the site
login form. Note over A: Alice clicks the
Login submit
button rect rgba( 0, 0, 255, 0.3 ) A ->> FC: "alice", "PAN" Note over FC: hash(
"alice" + "PAN"
) = "EGG" FC ->> A: "EGG" rect rgba( 255, 0, 0, 0.25 ) A ->> FC: "EGG", "alice" Note over FC: hmac(
"EGG", "alice"
) = "SCRAMBLE" FC ->> A: "SCRAMBLE" end end rect rgba( 0, 255, 0, 0.25 ) Note over A: message = {
"username": "alice",
"hmac": "SCRAMBLE"
} A -->> B: message end Note over A,B: Alice's secret passphrase, "PAN"
never goes across the network! Note over B: Bianca already
knows the secret:
"EGG" from when
Alice signed up... rect rgba( 0, 0, 255, 0.3 ) rect rgba( 255, 0, 0, 0.25 ) B ->> BC: "EGG", "alice" Note over BC: hmac(
"EGG", "alice"
) = "SCRAMBLE" BC ->> B: "SCRAMBLE" end end Note over B: hmac "SCRAMBLE"
equals hmac in
message, therefore
username ("alice")
in message is
authenticated B -->> A: "alice" authenticated!

So the four parts of HMAC:

Hash-based: Alice and Bianca both perform a hash of some data (numbers 3 & 6 in the diagram)
Message: Alice (via our login form) sends a message asserting who she is to Bianca (number 5 in the diagram)
Authentication: Since the passphrase and key are not transferred, Bianca believes the authenticity of Alice's message if both Alice and Bianca come up with the same hash (number 8 in the diagram)
Code: SCRAMBLE is the code both Alice and Bianca generate. (numbers 4 & 7 in the diagram)

Interceptions: 0

Note that only Alice knows the secret passphrase PAN. It never leaves her browser.
This is how we protect Alice from having her password stolen by man-in-the-middle attacks. Even if she's on a compromised network, Alice's password never leaves her browser. If we wanted the additional security of a salt, we could change our signup and login flows to first submit the username, which would return the salt, and then the signup or login would proceed normally but using the salt to enhance the entropy of whatever passphrase is used.

We want to leave the login flow alone as much as possible, though, so we'll use Alice's username as her salt. It's not a perfect salt, but it's better than nothing. If Alice's password is password (shame on you, Alice), and her username is alice1987, at least our combined secret passphrase will be alice1987password (or passwordalice1987) instead of just password. It's an improvement, even if minimal.

Message Authentication 😍

We have one more thing we can do to protect Alice.

If someone is snooping on Alice's network while she's logging in - maybe from unsecured coffee shop wifi - they could grab her login request (remember, it's alice1987 and the authentication code SCRAMBLE).
Once someone has that info they could do anything with it. They could try to reverse the hash algorithm to figure out her secret passphrase, or they could re-use the same request to make Bianca think that Alice is trying to log in again.

The only thing we can do about the former is regularly audit our hash algorithms to make sure we're using best-in-class options so that figuring out the secret phrase will take longer than it's worth.
We can - however - protect against the latter attack, called a replay attack.

To protect against replay attacks, all we have to do is add a timestamp to the message! When Bianca checks the authentication request, she'll also check that the timestamp isn't too old - maybe a maximum of 30 seconds ago. If the request is too old, we'll consider that a replay attack and just deny the login.

The beauty of message authentication is that as long as our front end sends all of the information and it includes that information in the HMAC (SCRAMBLE), the API (Bianca) can always validate the information.

Show Me How Now, Brown Cow

The amazing thing about this is it just uses off-the-shelf tools baked into virtually every programming language. Even web browsers provide (some of) these tools by default as far back as IE11!
Sadly, one crucial piece - the TextEncoder - is not supported in Internet Explorer at all, so if you need to support IE, some polyfilling is necessary. Fortunately, everything here is polyfillable!

To get us started, we'll build a couple of helper functions to do basic cryptographic work.

function hex( buffer ){
	var hexCodes = [];
	var view = new DataView( buffer );

	for( let i = 0; i < view.byteLength; i += 4 ){
		hexCodes.push( `00000000${view.getUint32( i ).toString( 16 )}`.slice( -8 ) );
	}

	return hexCodes.join( "" );
}

export function hash( str, algo = "SHA-512" ){
	var buffer = new TextEncoder( "utf-8" ).encode( str );

	return crypto
		.subtle
		.digest( algo, buffer )
		.then( hex );
}

These two functions take a stream of data and convert it to hexadecimal (hex) and take a string and a hashing algorithm to create a hash (hash).
Note that SubtleCrypto only supports a small subset of the possible algorithms for digesting strings. For our use-case, however, SHA-512 is great, so we'll default to that.

Note that hex is synchronous, but hash returns a Promise that eventually resolves to the hexadecimal value.

Next up, we need a way to generate a special key that will be used to create the authentication code.

function makeKey( secret, algo = "SHA-512", usages = [ "sign" ] ){
	var buffer = new TextEncoder( "utf-8" ).encode( secret );

	return crypto
		.subtle
		.importKey(
			"raw",
			buffer,
			{
				"name": "HMAC",
				"hash": { "name": algo }
			},
			false,
			usages
		);
}

Here, makeKey takes Alice's secret passphrase, some algorithm (SHA-512 again, in our case), and a list of ways the key is allowed to be used.
For our intent, we only need to be able to sign messages with this key, so our usages can just stay [ "sign" ].

What we get back is a Promise that eventually resolves to a CryptoKey.

Finally we need a way to generate signed HMAC messages.

export async function hmac( secret, message, algo = "SHA-512" ){
	var buffer = new TextEncoder( "utf-8" ).encode( message );
	var key = await makeKey( secret, algo );

	return crypto
		.subtle
		.sign(
			"HMAC",
			key,
			buffer
		)
		.then( hex );
}

Here, hmac takes Alice's secret passphrase, and our message that we want to authenticate (plus our hash algorithm).
What we get back is a Promise that eventually resolves to a long hexadecimal string. That hex string is the authentication code!

When our API (Bianca) and our front end agree on the mechanism for creating messages to be authenticated, as long as the result from hmac on both sides agrees, the message has been authenticated!

Sign Up & Log In

So let's say all that code above is off in a file called Crypto.js.
Here's a file with two functions in it that create a new account for a user, and log a user in.

import { hash, hmac } from "./Crypto.js";

async function createAccount( username, passphrase ){
	var secret = await hash( `${username}${passphrase}` );

	return fetch( "/signup", {
		"method": "POST",
		"body": JSON.stringify( {
			secret,
			username
		} );
	} )
}

async function login( username, passphrase ){
	var now = new Date();
	var secret = await hash( `${username}${passphrase}` );
	var authenticationCode = await hmac( secret, `${username}${now.valueOf()}` );

	return fetch( "/login", {
		"method": "POST",
		"body": JSON.stringify( {
			"hmac": authenticationCode,
			"timestamp": now.toISOString(),
			username
		} )
	} )
}

Here's a sequence diagram for signing up.

sequenceDiagram autonumber participant FC as Frontend Crypto participant A as Alice participant B as Bianca Note over A: Alice enters her
username (alice)
and passphrase
(PAN) into the site
create account
form. Note over A: Alice clicks the
Account Create
button rect rgba( 0, 0, 255, 0.3 ) A ->> FC: "alice", "PAN" Note over FC: hash(
"alice" + "PAN"
) = "EGG" FC ->> A: "EGG" end rect rgba( 0, 255, 0, 0.25 ) Note over A: signup = {
"username": "alice",
"secret": "EGG"
} A -->> B: signup end Note over B: Bianca stores "EGG"
in the table
"authentications"
for the username
"alice" B -->> A: "alice" account created!

The sequence diagram for logging in is identical to the one at the beginning of this post.

What About The API?

Good question.
It does the exact same work you've always done, just using HMAC now.

In the case of account creation, nothing has changed. It just so happens the secret we receive from the user is a bit longer (and more random) than usual. Store it safely in your authentications database table associated to the username.

For logins, your API will be doing a bit more work.
Here's a bit of rough psuedo-code (unlikely to run, never tested) for a node login back end.

import { createHmac } from "crypto";

function login( request, response ){
	var now = Date.now();
	var oldest = now - 30000;
	// fromISO from your favorite DateTime library
	var timestamp = fromISO( request.body.timestamp ).valueOf();
	var { hmac, username } = request.body;
	// db is a Database abstraction layer here, use your favorite!
	var secretKey = db.Authentications.getSecretKeyForUser( username );

	var authenticator = createHmac( "sha512", secretKey );

	// Note the parity here with the front end, we are hashing the same message
	authenticator.update( `${username}${timestamp}` );

	// If the one we created matches the one from the front end, we're authenticated
	var isAuthenticated = authenticator.digest( "hex" ) == hmac;

	// Except.......

	if( oldest >= timestamp ){
		// HEY THIS LOOKS LIKE A REPLAY ATTACK!
	}
	else{
		return isAuthenticated;
	}
}

I Zoned Out When I Saw `crypto`, How Is This Better?

Thanks for asking.
This is better than the standard authentication process for a number of reasons:

Better protection against password reuse.
You can't protect your user from other websites stealing their password and stuffing it into your website (other than disallowing known-breached passwords, a topic for another time), but you can add a layer of protection.
By only storing a hashed value based on Alice's secret passphrase, you reduce the potential of an attacker being able to steal your data and use it elsewhere.
This is effectively the exact same policy as never storing plaintext passwords, except we're doing the hashing on Alice's computer, which means...
Better protection against man-in-the-middle attacks.
Granted, you should be running your entire website over SSL with HSTS, but just in case you're not, or just in case someone manages to perform a complicated SSL downgrade attack (only possible without HSTS!), HMAC further protects the all-important passphrase.
Since the only thing exiting Alice's web browser under most circumstances is the Authentication Code, it's a lot (years, maybe) of work to bruteforce that hash and extract the secret key.
Then, that key is only useful on your website, so it's even more work to bruteforce the key and extract the original passphrase. Generally speaking, this kind of effort is not worth it to the average website attacker.
If you run a website with government, banking, or sensitive personal secrets, I assume this is all old news to you and you're probably doing something even better... right?
Better protection against replay attacks.
This is pretty straight-forward. Because we're mixing a timestamp into the message to be authenticated, an attacker has only one option to be able to pretend they're Alice after intercepting her login request: bruteforce the hash to extract the secret key, then update the timestamp and regenerate the message authentication code.
Again, this is generally more effort than it's worth.

Is It All Rainbows?

Of course not.
Without a doubt, this is better than the standard login form that sends a username and password to an API.
However, it does still have some drawbacks.

Like almost every authentication system, it relies on strong cryptographic hashing.
Because it uses lots of hashes, it's a bit more secure because there's more bruteforcing to do, but hashes are fundamentally based on the idea that it takes a really long time to break them, not on the idea that they're infallible.
On that note, the hashing and key derivation here is computationally expensive.
Of course, this is inherent in the idea of using hashes; they are fundamentally intended to be expensive because if they were cheap and fast they would be easy to break. However, for users that are underprivileged (like in developing countries, for example) or otherwise running under-powered hardware, HMAC may cause non-trivial delays during login.
For modern, full-powered hardware, it's likely your login flow will slow down by a few milliseconds, if it slows down noticeably at all. For under-powered devices, your users could see drastic slowdowns into the one second or worse range.
Above all: know your audience. If your audience is entirely on under-powered devices, the resulting slowdown during login could be a deal-breaker. However, keep in mind that there is always a trade off between convenience and security. Perhaps the non-trivial processing time for your users will eventually be balanced out by your data being particularly secure if you ever get breached.
The secret is still sent across the internet.
Once Alice's computer has hashed together her username and passphrase during signup it takes the secret key and... sends it across the internet.
This is the only time that happens, but it only takes one snooping man-in-the-middle to ruin everything. There are ways to get around this problem, but they involve a very different signup architecture.
In an effort to make this as usable as possible for people with boring old username/password signup flows, this will have to do; we need to get that secret key to the back end somehow.
There's nothing you can do about the human element without significantly changing your signup and login flow.
Guess how hard it would be for an attacker to guess a username bill1968 and password billsmith1968.
It's likely that you should be designing systems that detect and ban or discourage this kind of flawed input on signup, but ultimately humans are the weakest part of any security system.

So Should I Do It or Nah?

Yeah, you should do it.

It takes maybe an hour to set this stuff up, and the security gains are worth far, far more than an hour of your time.

Happy Hashing!

Thanks!

This post could not be what it is without help from the following folks:

Ian Graves: For explaining HMAC in a way that finally clicked for me so that I felt like I could go learn how to do it in a browser.
David Leger: For reviewing an early version of this post (that will probably never see the light of day) and reminding me to keep it simple.
Taylor McCaslin: For reminding me not to make it easy to copy/paste untested code, and for lots of insight into the complexities of client-side hashing and key derivation performance.
Chris Moberly: For a thorough review and for reminding me to include a note about HSTS and HTTPS/SSL.
Jeremy Matos: For reminding me that visual things are easier to understand, and prompting me to include the sequence diagrams.