Checking integrity of transferred files

Updated:

Fixed a mistake with a path in the command. I am glad I suddenly decided to check this article, lol.

Two things inspired this very short blog post: some very strange advice about hashes that I read in an article a while ago, and the fact that I needed to use hashes to check my files yesterday. I will tell you why I'm still using MD5.

However, don't expect anything too technical. I won't go deep. This is just so that I can point people to it when they need help with checksums and don't know which one to use to check whether their files are corrupted.

What's MD5? What's SHA? What even is a hash?

Okay, maybe let's start with some explanations.

Hash

From TechTerms.com:
A hash is a function that converts one value to another. Hashing data is a common practice in computer science and is used for several different purposes. Examples include cryptography, compression, checksum generation, and data indexing.

MD5

This time Wikipedia is helpful:

The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. MD5 was designed by Ronald Rivest in 1991.

MD5 can be used as a checksum to verify data integrity against unintentional corruption. Historically it was widely used as a cryptographic hash function; however it has been found to suffer from extensive vulnerabilities. It remains suitable for other non-cryptographic purposes, for example for determining the partition for a particular key in a partitioned database, and may be preferred due to lower computational requirements than more recent Secure Hash Algorithms.

SHA

We go with Wikipedia again:
The Secure Hash Algorithms are a family of cryptographic hash functions published by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS), including: SHA-0, SHA-1, SHA-2, SHA-3
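
To make those definitions a bit more concrete, here is a small sketch. It assumes GNU coreutils (md5sum, sha256sum, cut) and the input string 'hello' is just my own example. It hashes the same five bytes with both tools and counts the hex characters, which is where the "128-bit" in the MD5 quote comes from (each hex character is 4 bits).

```shell
#!/bin/sh
# Hash the same five bytes with both tools (GNU coreutils assumed).
# cut drops the trailing " -" that the tools print for standard input.
md5=$(printf 'hello' | md5sum | cut -d' ' -f1)
sha=$(printf 'hello' | sha256sum | cut -d' ' -f1)

echo "MD5:     $md5 (${#md5} hex chars = $(( ${#md5} * 4 )) bits)"
echo "SHA-256: $sha (${#sha} hex chars = $(( ${#sha} * 4 )) bits)"

# Change a single character and the MD5 digest looks completely different.
printf 'hellp' | md5sum | cut -d' ' -f1
```

That last line is the whole point for file checking: even a tiny change in the data should give you a different hash.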

Use MD5. (For things where it's good enough)

The article in question suggested using SHA-256 hashes to check the integrity of files. Yes, you can do that. The problem, however, is that it's quite unnecessary.

I think most people have heard by now that MD5 is broken. It's not safe security-wise. Hopefully most websites don't use MD5 hashes for their passwords, because bad things can happen. One should use other algorithms, and with a salt. But let's say we don't store people's passwords. We aren't trying to encrypt some super secret so Eve doesn't hear about it. Can't we use MD5?

We certainly can. MD5 is still good enough in most cases when all you want to do is check that the file you just transferred from your server came in one piece. In fact, I'd argue that if we are sending several big files, MD5 would be better than SHA. It's usually noticeably faster in software, although CPUs with hardware SHA instructions can narrow the gap. You can save time and resources by using MD5 over SHA algorithms.
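
If you'd rather measure than take my word for it, here is a rough timing sketch. It assumes GNU coreutils (md5sum, sha256sum, head, and a date that supports %N for nanoseconds). The time_hash helper and the 64 MiB test size are made up for this illustration. Treat the numbers as a ballpark only, since they depend heavily on your CPU and on how your tools were built.

```shell
#!/bin/sh
# Rough timing sketch, not a proper benchmark.
# Assumes GNU coreutils: md5sum, sha256sum, head, and date with %N support.

# time_hash TOOL FILE: print how many milliseconds TOOL took to hash FILE.
# (time_hash is a hypothetical helper, not a standard command.)
time_hash() {
    start=$(date +%s%N)
    "$1" "$2" > /dev/null
    end=$(date +%s%N)
    echo "$1: $(( (end - start) / 1000000 )) ms"
}

# Throwaway 64 MiB file of random data; the size is an arbitrary choice.
testfile=$(mktemp)
head -c 67108864 /dev/urandom > "$testfile"

time_hash md5sum "$testfile"
time_hash sha256sum "$testfile"

rm -f "$testfile"
```

Run it a couple of times, because the first pass may include the cost of reading the file from disk.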

So why is SHA slower?

Short answer: Because it's built to be more secure. SHA-256 is more computationally expensive, and that extra work is part of what makes collisions (two different files with the same hash) impractical to create on purpose. With MD5, an attacker can craft such collisions cheaply. If you need anything secure, I'd stay away from MD5. But it's fine to use MD5 when you just want to check whether a file transfer was successful.

Okay, but how do you use MD5?

Oh, it couldn't be simpler. Let's say we have a directory full of files. Move to the directory.
md5sum ./* > /path/to/output/file
Let it run, boom. You got your hashes.

Now let's go to the directory where we transferred all these files. We need to check all their hashes against the output file. Easy.
md5sum --check /path/to/output/file
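
If you want to see what a failed check looks like before trusting it with real data, here is a self-contained sketch that runs entirely in throwaway directories. It assumes GNU md5sum and mktemp. The file names and contents are made up, and the "transfer" is just a cp followed by damaging one copy on purpose.

```shell
#!/bin/sh
# Demo in temporary directories; all names here are invented for illustration.
src=$(mktemp -d)
dst=$(mktemp -d)

printf 'hello\n' > "$src/a.txt"
printf 'world\n' > "$src/b.txt"

# 1. Hash everything at the source, using relative paths.
#    The list file lives outside the directory so it doesn't hash itself.
(cd "$src" && md5sum ./* > "$src.md5")

# 2. "Transfer" the files, then corrupt one copy on purpose.
cp "$src"/*.txt "$dst"/
printf 'w0rld\n' > "$dst/b.txt"

# 3. Check at the destination. md5sum exits non-zero if any file fails.
if (cd "$dst" && md5sum --check "$src.md5"); then
    result="all files match"
else
    result="at least one file is corrupted"
fi
echo "$result"

rm -rf "$src" "$dst" "$src.md5"
```

The check should report a.txt as OK and b.txt as FAILED, and the non-zero exit status is what makes the second branch run.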

That's pretty much it. I would also suggest using --quiet so it only prints when it's not a match.

Note: Make sure you are in the right directory so the paths in the output file match those that md5sum uses while checking, otherwise you'll run into an error. Speaking from yesterday's experience, when I didn't pay attention to what my working directory was.
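
One way to avoid that mistake is to let a small wrapper do the cd for you. This is only a sketch, assuming GNU md5sum. hash_dir and check_dir are names I made up, not real commands. Each one changes directory inside a subshell, so the paths in the list are always relative to the directory you pass in and your own working directory stays where it was.

```shell
#!/bin/sh
# Hypothetical helpers, assuming GNU md5sum.
# LISTFILE should be an absolute path that lives outside DIR.

# hash_dir DIR LISTFILE: write MD5 sums for the files directly inside DIR.
hash_dir() {
    (cd "$1" && md5sum ./* > "$2")
}

# check_dir DIR LISTFILE: verify DIR against LISTFILE.
# --quiet hides the per-file OK lines, so only problems are printed.
check_dir() {
    (cd "$1" && md5sum --check --quiet "$2")
}
```

On the sending side that would be hash_dir with the source directory and /path/to/output/file, and on the receiving side check_dir with the destination directory and the same list file. Like the ./* in the command above, this only covers files directly inside the directory, not subdirectories.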

19/07/23
see you, space cowboy