You have most likely been there too. The website is nothing out of the ordinary, quite a small one, but it takes up far more disk space than its visible size suggests, which drives up the backup costs the customer pays. The most obvious reason for the discrepancy is unused files, we thought, and so we started looking for them. In this article, I tell the story of the quest for unused files on a Drupal-powered website (and the approach is not limited to Drupal, for that matter).
First off, how do the culprits, the unused files, appear? No mystery here, really. For example, your website may have had a news section. Later on, you decided to restructure the site and deleted that section, but the pictures that made the news all the merrier did not go away with it. The same applies to product catalogues, blog posts, etc. In other words, no site is immune to the plague of unused files.
Below you will find a sequence of simple actions that lets you say a final goodbye to unused files. ATTENTION! Back up before doing anything to the code or the files!
How do you define unused files?
A file is considered unused if both of the following are true:
- it is not mentioned in the DB in any way;
- there is no link or reference to it in the code (theme, CSS, JavaScript).
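The two criteria can be sketched as a single check. This is only an illustration of the definition, not the script used in this article: the helper name is_unused, the demo dump and the demo theme file are all made up, and everything is created in a temporary directory.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when a file name is
# mentioned neither in the DB dump nor anywhere in the code tree.
is_unused() {
    name=$1
    dump=$2
    code_dir=$3
    # Criterion 1: any mention in the DB dump means the file is used.
    if grep -qF -- "$name" "$dump"; then
        return 1
    fi
    # Criterion 2: any mention in the code means the file is used.
    if grep -rqF -- "$name" "$code_dir"; then
        return 1
    fi
    return 0
}

# Demo data in a throwaway directory (all names are made up).
demo=$(mktemp -d)
mkdir "$demo/theme"
echo "INSERT INTO demo_table VALUES ('public://logo.png');" > "$demo/dump.sql"
echo ".hero { background: url(images/banner.jpg); }" > "$demo/theme/style.css"

for f in logo.png banner.jpg old-news.jpg; do
    if is_unused "$f" "$demo/dump.sql" "$demo/theme"; then
        echo "$f: unused"
    else
        echo "$f: used"
    fi
done
# prints:
#   logo.png: used
#   banner.jpg: used
#   old-news.jpg: unused
```

Only the file that fails both lookups is reported as unused, which is why the real script needs both a DB dump and access to the site's code.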
Make a DB dump, name it dumpwebsite.sql (the name the script expects) and put it in the site’s root folder. Create a .sh file there, named dfindfiles.sh for example, and put the following code into that file:
#!/bin/sh
START=./sites/default/files
CURDIR=`pwd`
IG_STYLES=./styles/*
IG_JS=./js/*
IG_CSS=./css/*
dbdump=`pwd`/dumpwebsite.sql
usedfile=`pwd`/output_used.txt
notusedfile=`pwd`/output_notused.txt
notusedfile_check=`pwd`/output_notused_check.txt
cd ${START}
echo "Step 1. Checking for used and unused files to database..."
echo "$(date) $line"
for file in `find . ! -path "$IG_JS" ! -path "$IG_CSS" ! -path "$IG_STYLES" -type f -print | cut -c 3- | sed 's/ /#}/g'`
do
  file2=`echo $file | sed 's/#}/ /g'`
  file3=`basename $file2`
  result=`grep -c "$file3" $dbdump`
  if [ $result = 0 ]; then
    echo $file2 >> $notusedfile
  else
    echo $file2 >> $usedfile
  fi
done
cd ${CURDIR}
echo "Step 2. Checking files from list not used files..."
echo "$(date) $line"
for p in $(cat $notusedfile); do
  grep -rnw --include=*.{module,inc,php,js,css,html,htm,xml} ${CURDIR} -e $p > /dev/null || echo $p >> $notusedfile_check;
done
echo "Files checking done."
echo "Check the following text-file for results:"
echo "$notusedfile_check"
The script
Following the definition of unused files, the script does two things:
- searches for any mentions of the file in the DB;
- searches for links and references to the file in the source code.
#!/bin/sh
Setting the interpreter that runs the script.
START=./sites/default/files
Setting the start directory for scanning. This is where the site’s files go. By default, the path is sites/default/files. If in doubt, browse to your Drupal control panel and check what you have in Configuration – File system, Default file system path field.
CURDIR=`pwd`
Storing the current directory, the one that contains the script and the DB dump (the site’s root).
IG_STYLES=./styles/*
Ignoring the directory where images are generated.
IG_JS=./js/*
Ignoring the directory where javascript is generated.
IG_CSS=./css/*
Ignoring the directory where css is generated.
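To see what the three ignore patterns do, here is a small sketch that runs the same find call on a throwaway tree. The tree only mimics sites/default/files, and every file name in it is made up.

```shell
#!/bin/sh
# Build a throwaway tree that mimics sites/default/files (names are made up).
files_dir=$(mktemp -d)
mkdir -p "$files_dir/styles/thumbnail" "$files_dir/js" "$files_dir/css" "$files_dir/news"
touch "$files_dir/styles/thumbnail/photo.jpg" \
      "$files_dir/js/js_abc.js" \
      "$files_dir/css/css_abc.css" \
      "$files_dir/news/photo.jpg" \
      "$files_dir/report.pdf"

IG_STYLES='./styles/*'
IG_JS='./js/*'
IG_CSS='./css/*'

# Same find call as in the script. -path matches against the whole path and
# "*" also matches "/", so everything below the three directories is skipped.
# cut -c 3- strips the leading "./" from each result.
listed=$(cd "$files_dir" && find . ! -path "$IG_JS" ! -path "$IG_CSS" ! -path "$IG_STYLES" -type f -print | cut -c 3- | sort)
echo "$listed"
# prints:
#   news/photo.jpg
#   report.pdf
```

The generated copy under styles/ is skipped, while the original picture under news/ is still listed and will be checked.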
dbdump=`pwd`/dumpwebsite.sql
Specifying the DB dump.
usedfile=`pwd`/output_used.txt
Specifying the file that contains the list of used files.
notusedfile=`pwd`/output_notused.txt
Specifying the file that contains the list of files not found in the DB.
notusedfile_check=`pwd`/output_notused_check.txt
Specifying the file that will hold the final list: files found neither in the DB nor in the code, that is, the ones you can delete.
cd ${START}
Changing to the start directory, so that the paths found are relative to it.
echo "Step 1. Checking for used and unused files to database..."
Announcing commencement of the first step.
echo "$(date) $line"
Printing the time the first step started. ($line is never set in the script, so it expands to nothing.)
for file in `find . ! -path "$IG_JS" ! -path "$IG_CSS" ! -path "$IG_STYLES" -type f -print | cut -c 3- | sed 's/ /#}/g'`
do
  file2=`echo $file | sed 's/#}/ /g'`
  file3=`basename $file2`
  result=`grep -c "$file3" $dbdump`
  if [ $result = 0 ]; then
    echo $file2 >> $notusedfile
  else
    echo $file2 >> $usedfile
  fi
done
This loop looks for files that fit the first part of the definition. find skips the ignored directories, cut strips the leading “./” from each path, and sed replaces spaces in file names with “#}” so that the for loop does not split such names into pieces. Inside the loop, the file name gets its original form back and its base name is searched for in the DB dump. If the dump contains the name, the path to the file goes to output_used.txt; if it does not, the path goes to output_notused.txt.
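The “#}” placeholder protects the for loop, but unquoted expansions such as basename $file2 can still split a name that contains spaces. Below is a sketch of one possible alternative that reads the paths line by line and quotes every expansion. It is not the script from this article: the demo files and the dump line are made up, the ignore patterns are left out for brevity, and it assumes no file name contains a newline.

```shell
#!/bin/sh
# Throwaway demo data; every name here is made up.
work=$(mktemp -d)
mkdir -p "$work/files/news"
touch "$work/files/news/team photo.jpg" "$work/files/old banner.png"
echo "INSERT INTO demo_table VALUES ('public://news/team photo.jpg');" > "$work/dump.sql"

dbdump=$work/dump.sql
usedfile=$work/output_used.txt
notusedfile=$work/output_notused.txt

(
    cd "$work/files" || exit 1
    # Read one path per line, so spaces need no placeholder.
    find . -type f -print | cut -c 3- | while IFS= read -r file; do
        name=$(basename "$file")
        # -F treats the name as a literal string (a "." is just a dot),
        # -q prints nothing and only sets the exit status.
        if grep -qF -- "$name" "$dbdump"; then
            echo "$file" >> "$usedfile"
        else
            echo "$file" >> "$notusedfile"
        fi
    done
)

cat "$notusedfile"
# prints: old banner.png
```

Matching with -F also avoids false hits caused by regular expression characters in file names, at the cost of being a slightly different test from the original grep -c call.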
cd ${CURDIR}
Changing to site’s root directory.
echo "Step 2. Checking files from list not used files..."
Announcing commencement of the second step.
echo "$(date) $line"
Printing the time the second step started.
for p in $(cat $notusedfile); do grep -rnw --include=*.{module,inc,php,js,css,html,htm} ${CURDIR} -e $p > /dev/null || echo $p >> $notusedfile_check; done
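This loop takes each path listed in output_notused.txt and searches the site’s code for it. If grep finds a match, its output is discarded to /dev/null; otherwise the path gets listed in output_notused_check.txt. Note that the brace list in --include relies on brace expansion, which is a bash feature. If /bin/sh on your system is a different shell, the pattern is passed to grep literally and matches no files, so every path ends up in output_notused_check.txt; run the script with bash in that case.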
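The second step can also be sketched in a form that does not depend on brace expansion and keeps names with spaces in one piece. As before, this is an illustration on made-up data in a temporary directory rather than the article's script; it uses -q instead of the redirect to /dev/null and -F for literal matching, and it drops -n and -w.

```shell
#!/bin/sh
# Throwaway demo tree; all names are made up.
site=$(mktemp -d)
mkdir -p "$site/themes/demo"
echo '.hero { background: url(../images/old banner.png); }' > "$site/themes/demo/style.css"
echo 'draft.pdf is mentioned only in this note' > "$site/notes.txt"   # not a code file
printf '%s\n' 'old banner.png' 'draft.pdf' > "$site/output_notused.txt"

notusedfile=$site/output_notused.txt
notusedfile_check=$site/output_notused_check.txt

while IFS= read -r p; do
    # One --include per extension needs no brace expansion.
    # Files with other extensions (such as .txt) are not searched at all.
    grep -rqF --include='*.module' --include='*.inc' --include='*.php' \
         --include='*.js' --include='*.css' --include='*.html' --include='*.htm' \
         -e "$p" "$site" || echo "$p" >> "$notusedfile_check"
done < "$notusedfile"

cat "$notusedfile_check"
# prints: draft.pdf
```

The picture referenced from the stylesheet is kept off the final list, while the file mentioned only in a .txt note is still reported, because .txt is not among the searched extensions.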
echo "Files checking done."
echo "Check the following text-file for results:"
Telling the world the search is over.
echo "$notusedfile_check"
Printing the path to the final results file. output_notused_check.txt now contains the list of files that you can delete. The paths in it are relative to the directory where scanning begins (sites/default/files in our example), so copy the list to that directory, change to it and run:
xargs rm -fr < output_notused_check.txt
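By default xargs splits its input on blanks, so a listed name such as “old banner.png” would reach rm as two separate arguments. A more cautious variant, sketched here on made-up files in a temporary directory, reads the list line by line:

```shell
#!/bin/sh
# Throwaway demo; all names are made up.
files_dir=$(mktemp -d)
touch "$files_dir/old banner.png" "$files_dir/keep.jpg"
printf '%s\n' 'old banner.png' > "$files_dir/output_notused_check.txt"

(
    cd "$files_dir" || exit 1
    # Read one path per line so that names with spaces stay in one piece.
    while IFS= read -r f; do
        # "--" stops option parsing in case a name starts with "-".
        rm -f -- "$f"
    done < output_notused_check.txt
)

ls "$files_dir"
```

Replacing rm -f -- "$f" with echo "$f" turns the loop into a dry run that only prints what would be removed.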
The files listed in output_notused_check.txt will be deleted. Once done, check that all the images on the website still load and nothing is missing. Next, go to the site’s root directory and delete dumpwebsite.sql, output_used.txt, output_notused.txt and output_notused_check.txt. May your quest for unused files be a successful one!