Renaming Files to Their Hashes With Bash
Author's Note 2022-12-29
As of the date of this edit, I'm using a new solution written in fish, which can be found in my dotfiles repo on GitLab. I'm going to leave this here as there's some cool bash scripting knowledge that I'll probably want in the future.
Preface
The way I organize my images is by throwing them all in a single folder and assigning metadata tags to them. Because I use metadata for organization, the names aren't relevant, and I usually leave them as is. However, sometimes the names contain a common word that clutters up my searches for other documents, and in rare cases two files even end up with the same name. My solution is to rename every file in my pictures folder to the `sha1sum` of its contents, so two files can only share a name if their contents are identical.
The Script
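```bash
#!/usr/bin/env bash
set -euo pipefail

# $1: directory whose files should be renamed
# $2: the literal word "true" to actually run mv; anything else is a dry run
for i in "$1"/*; do
  full_filename=$i
  filename=${full_filename##*/}   # name without the leading directories
  no_extension=${filename%%.*}    # name without anything from the first dot on
  num_chars=${#no_extension}
  # Skip non-files and names that already have a SHA-1's length (40 hex chars)
  if [[ ( -f "$i" ) && (${num_chars} != 40) ]]; then
    sum=$(shasum "$i")            # output is "<hash>  <path>"
    # Take the extension from the name, not the full path; no dot means no extension
    if [[ $filename == *.* ]]; then ext=".${filename##*.}"; else ext=""; fi
    echo "$i" "$1/${sum%% *}$ext"
    # ${2:-} avoids an "unbound variable" error under set -u when $2 is omitted
    if [[ ${2:-} == true ]]; then
      mv "$i" "$1/${sum%% *}$ext"
    fi
  fi
done
```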
Usage
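This script accepts two arguments: the directory whose files should be renamed, and a flag that decides whether the `mv` commands are actually executed. Only the literal word `true` performs the renames; anything else just prints what would happen. It doesn't matter whether you include the trailing "/" after the directory. With it, the paths simply contain a doubled slash, and both Linux and macOS treat consecutive slashes in a path the same as a single one.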
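The trailing slash is harmless because POSIX pathname resolution treats multiple successive slashes as a single slash (only a leading pair may be handled specially). A quick check with a throwaway directory, where `Pictures` and `cat.jpg` are made-up names:

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/Pictures"
touch "$tmp/Pictures/cat.jpg"

without=( "$tmp/Pictures"/* )   # as if the script were called with .../Pictures
with=( "$tmp/Pictures/"/* )     # as if it were called with .../Pictures/
printf '%s\n' "${without[@]}" "${with[@]}"

# Both entries name the same regular file
if [[ -f ${with[0]} ]]; then echo "still a regular file"; fi
```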
Breakdown
```bash
#!/usr/bin/env bash
set -euo pipefail
```
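If you've seen executable scripts before, you'll recognize the first line as the shebang line, which tells the OS which program to run the script with: in this case bash, the Bourne Again SHell. Going through `/usr/bin/env` finds bash wherever it is on your `PATH` instead of hardcoding its location.
The second line enables a "strict" mode in bash. It causes bash to fail loudly in situations that usually mean a bug instead of silently carrying on: `-e` exits when a command fails (with some exceptions, such as commands tested by an `if`), `-u` makes expanding an unset variable an error, and `-o pipefail` makes a pipeline fail if any command in it fails, not just the last one. I would strongly recommend doing this. Here's a more complete explanation: Strict Mode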
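The `-u` flag is also why a script like this needs care with optional arguments: reading `$2` when it wasn't passed aborts the script. The sketch below shows the failure (inside a subshell so the demo keeps running) and the `${2:-}` form, which expands to an empty string instead. `show_second` and `confirm_mode` are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# With set -u, expanding an unset positional parameter is fatal.
if ( show_second() { echo "$2"; }; show_second only-one-arg ) 2>/dev/null; then
  echo "no error"
else
  echo "aborted: \$2 is unbound"
fi

# ${2:-} substitutes an empty string, so it is safe under set -u.
confirm_mode() {
  local confirm=${2:-}
  if [[ $confirm == true ]]; then echo "rename"; else echo "dry run"; fi
}
confirm_mode Pictures        # prints "dry run"
confirm_mode Pictures true   # prints "rename"
```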
```bash
for i in "$1"/*; do
```
This is the start of a for loop in bash. In plain English, it says: for each thing in the directory the user supplied to me, do something. `for i` declares the variable `i`, which refers to the path being processed in each iteration of the loop. `"$1"` expands into the directory supplied by the user on the command line, and the quotes keep a directory name containing spaces in one piece. The `*` at the end is a glob, which makes the whole expression expand into the path of every entry inside that directory, in sorted order. That includes subdirectories but not hidden names starting with a dot, and if the directory is empty the pattern is left as the literal text `.../*`.
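A small demonstration of what that loop actually iterates over, using a made-up folder whose name contains a space:

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/my pics/sub" "$tmp/empty"
touch "$tmp/my pics/cat.jpg" "$tmp/my pics/dog.png" "$tmp/my pics/.hidden"

# Quoting "$dir" keeps the space intact; the unquoted * is still expanded.
dir="$tmp/my pics"
for i in "$dir"/*; do
  echo "$i"   # cat.jpg, dog.png and sub are listed; .hidden is not
done

# With no matches, the pattern itself comes through unchanged.
for i in "$tmp/empty"/*; do
  echo "$i"   # .../empty/*
done
```

The `-f` test later in the script is what keeps that leftover literal pattern (and any subdirectory) from being hashed.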
```bash
full_filename=$i
filename=${full_filename##*/}
no_extension=${filename%%.*}
num_chars=${#no_extension}
```
This is a roundabout way to figure out the number of characters in a file's name, ignoring both the path leading to the file and any extensions at the end. It's done with POSIX parameter expansions: `${full_filename##*/}` removes everything up to and including the last `/`, `${filename%%.*}` removes everything from the first `.` onward, and `${#no_extension}` is the length of what's left. The reason for doing this is to know whether a file was already renamed, since a renamed file's name is exactly 40 characters long.
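Here are the same three expansions applied to a made-up path, along with the single-character forms that remove the shortest match instead of the longest. Because `%%.*` cuts at the first dot, a name with several dots is measured only up to the first one.

```shell
#!/usr/bin/env bash
set -euo pipefail

full_filename="/home/me/Pictures/beach.day.jpg"   # hypothetical example path
filename=${full_filename##*/}   # longest prefix matching "*/" removed: beach.day.jpg
no_extension=${filename%%.*}    # longest suffix matching ".*" removed: beach
num_chars=${#no_extension}      # length of the remaining value: 5
echo "$filename $no_extension $num_chars"

# Shortest-match variants
echo "${filename%.*}"    # beach.day
echo "${filename##*.}"   # jpg
```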
A message from the future:
This is not a perfect system: if a filename happens to contain the same number of characters as a `sha1sum`, it won't be renamed. The new version of this script calculates the hash no matter what, then compares it to the current filename. While slower, it'll actually be correct, which is more important considering this script isn't run often.
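The hash-then-compare idea could be sketched in bash roughly like this. It's my own illustration of the approach described above, not the fish version from the dotfiles repo; `sha1_of` and `already_hashed` are hypothetical names, and falling back to `sha1sum` when `shasum` is missing is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print only the SHA-1 hex digest of a file.
sha1_of() {
  local out
  if command -v shasum >/dev/null 2>&1; then out=$(shasum "$1"); else out=$(sha1sum "$1"); fi
  printf '%s\n' "${out%% *}"
}

# Succeeds only if the bare name (no directories, no extensions) equals the file's hash,
# so a 40-character name that is NOT the hash still gets renamed.
already_hashed() {
  local name=${1##*/}
  name=${name%%.*}
  [[ $name == "$(sha1_of "$1")" ]]
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
fake=$(printf 'a%.0s' {1..40})   # 40 characters long, but not the file's hash
printf 'pixels\n' > "$tmp/$fake.jpg"
if already_hashed "$tmp/$fake.jpg"; then echo "skip"; else echo "rename"; fi   # rename
```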
```bash
if [[ ( -f "$i" ) && (${num_chars} != 40) ]]; then
```
This is a conditional statement in bash, where `[[ ]]` is bash's built-in test for conditional expressions and `&&` is the logical and operator inside it.
The first expression asks whether the path we're currently on in the loop is a regular file (`-f`), which rules out directories. There shouldn't ever be a directory in my pictures folder, but if one sneaks in there, it won't have anything done to it.
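The second expression checks that the length of the name, stripped of its path and extensions, is not 40 characters. That length was computed earlier as `${#no_extension}`, where a `#` prefixing the variable name in an expansion gives the number of characters. 40 characters is used because that is how long a SHA-1 hash is in hexadecimal (SHA-1 is the default algorithm for the `shasum` command used later), and I don't want to calculate the hash if a file has already been renamed. Note that `!=` inside `[[ ]]` compares strings; `-ne` would compare numbers, but for this check the result is the same.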
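To see which entries that condition lets through, here it is wrapped in a hypothetical `needs_rename` function and run against a made-up folder containing an ordinary picture, a name that is already 40 characters long, and a subdirectory:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the script's test, for demonstration only
needs_rename() {
  local name=${1##*/}
  name=${name%%.*}
  [[ ( -f "$1" ) && ( ${#name} != 40 ) ]]
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
long=$(printf 'b%.0s' {1..40})
mkdir "$tmp/folder"
touch "$tmp/IMG_0042.jpg" "$tmp/$long.jpg"

for i in "$tmp"/*; do
  if needs_rename "$i"; then echo "rename: ${i##*/}"; else echo "skip:   ${i##*/}"; fi
done
```

Only `IMG_0042.jpg` is renamed: the directory fails `-f`, and the 40-character name is skipped even though it isn't really a hash, which is the weakness mentioned in the note above.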
```bash
sum=$(shasum "$i")
if [[ $filename == *.* ]]; then ext=".${filename##*.}"; else ext=""; fi
echo "$i" "$1/${sum%% *}$ext"
```
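This sets the variable `sum` to the output of `shasum` for the file we're on in the iteration. That output looks like `<hash>  <path>`, so `${sum%% *}` removes everything from the first space onward and leaves just the hash. The extension is taken from the file's name rather than its full path, so a file with no extension stays without one; taking it from `$i` would pull in directory names instead. `echo` then prints to the terminal the original file path, followed by the location and name of the correctly renamed file, preserving its original extension (only the last one, so `.tar.gz` becomes `.gz`). The `${sum%% *}` trick is some absolute wizardry I stole from somebody on the internet.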
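The two expansions can be checked in isolation. The `shasum` line below is a sample I made up (it is not computed from a file), and `ext_of` is a hypothetical helper that applies the name-based extension rule. The last call shows why slicing the whole path with `${i##*.}` goes wrong when the dot belongs to a directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample shasum-style output: "<40 hex chars>  <path>"
sum="da39a3ee5e6b4b0d3255bfef95601890afd80709  /home/me/Pictures/cat.jpg"
echo "${sum%% *}"   # everything before the first space: just the hash

# Hypothetical helper: extension (with its dot) taken from the name only
ext_of() {
  local filename=${1##*/}
  if [[ $filename == *.* ]]; then printf '.%s\n' "${filename##*.}"; else printf '\n'; fi
}
ext_of "/home/me/Pictures/cat.jpg"   # .jpg
ext_of "/home/me/Pictures/README"    # empty
ext_of "/home/me/my.pics/README"     # empty

i="/home/me/my.pics/README"
echo "${i##*.}"   # pics/README, which is why the path itself is not used
```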
```bash
    if [[ ${2:-} == true ]]; then
      mv "$i" "$1/${sum%% *}$ext"
    fi
  fi
done
```
This checks whether the second argument passed to the script is the word `true`, and if so, it performs the same move that the `echo` line printed. The idea is to run it with anything other than `true` the first time to sanity-check the output, then run it again with `true` to actually rename all the files.
It's worth noting this script will not recursively enter directories; it skips them entirely.