Lawtee Blog

How to Slim Down the GitHub Repository for a Hugo Blog

My Hugo blog has been hosted on GitHub for the past year. Upgrading Hugo and refactoring the site several times left behind a lot of obsolete, fragmented files. On top of that, converting all of the blog's images to WebP produced a large number of junk files. Together these made the GitHub repository grow excessively, to the point where it was approaching the 1 GB mark GitHub recommends staying under, so today I decided to clean it up.

Official GitHub Repository Size Limits

It is recommended to keep repositories small, ideally under 1 GB, and strongly recommended to keep them under 5 GB. Smaller repositories clone faster and are easier to use and maintain. If your repository excessively impacts our infrastructure, you may receive an email from GitHub Support asking you to take corrective action. We strive to be flexible, especially for large projects with many collaborators, and will work with you to find a solution whenever possible. You can effectively manage the size and overall health of your repository to prevent it from impacting our infrastructure.
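To see where a local repository stands against these limits, Git's built-in count-objects command reports the size of the object database. The sketch below wraps it in a small helper; the function name repo_pack_size is hypothetical, and the figure it prints is the local pack size, which may differ somewhat from the size GitHub reports.

```shell
# Print the packed size of a local Git repository (defaults to the current directory).
# -v prints detailed counts and -H makes sizes human-readable; "size-pack" is the
# total size of the pack files under .git/objects/pack.
repo_pack_size() {
  git -C "${1:-.}" count-objects -vH | grep '^size-pack'
}
```

Running it before and after each cleanup step, for example repo_pack_size /d/hugo/user, makes it easy to compare results.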


Standard Slimming Methods

1. Clean Up Untracked Files

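First, make sure you have cleaned up any untracked files. Untracked files are not part of the repository's history, so this does not shrink what is stored on GitHub, but it keeps the working directory tidy and stops stray files from being committed by accident. You can preview and remove them with the following commands: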

git clean -n  # Preview which files would be deleted (dry run)
git clean -f  # Delete untracked files
git clean -fd # Delete untracked files and directories

2. Remove Unnecessary Large Files

If you have previously committed some large files to the repository, these files may occupy a significant amount of storage space. You can use git filter-repo to rewrite Git history and remove unnecessary files.
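Before rewriting anything, it helps to know which files are actually taking up space. A minimal sketch, assuming a POSIX shell: it lists every blob (file version) reachable from any ref, asks git cat-file for its size, and prints the largest ones. The helper name largest_blobs is hypothetical.

```shell
# List the largest file versions (blobs) in the entire history of the current repository.
# Output format: "<size in bytes> <path>", biggest first; the optional argument is how
# many lines to show (default 10).
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    grep '^blob ' |
    cut -d ' ' -f 3- |
    sort -n -r |
    head -n "${1:-10}"
}
```

Paths near the top of the output are the candidates for git filter-repo in the steps below.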

  1. Install git filter-repo:

    pip install git-filter-repo # Requires Python to be installed first
  2. Remove large files:

    git filter-repo --path <file-path> --invert-paths

    For example, to delete a file named largefile.zip from the entire history:

    git filter-repo --path largefile.zip --invert-paths
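For a Hugo blog, most of the history usually lives in the generated public/ and resources/ folders rather than in individual files. Since --path also accepts directories and can be repeated, both folders can be dropped from every commit in one pass. This is a sketch assuming those two folders are what you want gone; the helper name purge_hugo_build_dirs is hypothetical. git filter-repo expects to run in a fresh clone and aborts otherwise unless you pass --force.

```shell
# Remove the Hugo output folders public/ and resources/ from every commit in history.
# Hypothetical helper; extra arguments (e.g. --force) are passed through to git filter-repo.
# Run it in a fresh clone and keep a backup: this rewrites all commit IDs.
purge_hugo_build_dirs() {
  git filter-repo --path public/ --path resources/ --invert-paths "$@"
}
```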

3. Compress the Git Repository

Git provides the git gc command to compress the repository and remove unnecessary objects.

git gc --prune=now --aggressive
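git gc only deletes objects that nothing points to any more, and by default the reflog keeps old commits reachable for 30 to 90 days. If history was rewritten by some means other than git filter-repo (which normally repacks and prunes on its own), expiring the reflog first lets gc actually discard the old objects. A sketch; the helper name prune_unreachable is hypothetical, and once it runs the discarded history cannot be recovered from this clone.

```shell
# Expire every reflog entry immediately, then let gc delete all objects that are no
# longer reachable from any ref or the index.
# Warning: after this, the removed commits cannot be recovered from this clone.
prune_unreachable() {
  git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive
}
```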

4. Further Compress the Git Repository

Git provides git repack to pack objects in the repository, which can further optimize storage and performance.

git repack -a -d -f

5. Clean Up the Remote Repository

Deleting files and compressing the repository only changes your local copy. For the smaller, rewritten history to reach GitHub, you also need to update the remote repository.

  1. Force push the local repository to the remote repository:

    git push --force
  2. Prune stale remote-tracking branches in your local clone (this only tidies local references; it does not delete anything on GitHub):

    git remote prune origin
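If the history was rewritten with git filter-repo, the plain git push --force above fails at first, because filter-repo removes the origin remote. A minimal sketch of restoring it and pushing every branch and tag; the helper name push_rewritten_history is hypothetical, and the argument is your own repository's URL.

```shell
# Re-add the "origin" remote if it is missing (git filter-repo removes it), then
# force-push all local branches and tags so GitHub receives the rewritten history.
# Usage: push_rewritten_history <repository-url>
push_rewritten_history() {
  git remote get-url origin >/dev/null 2>&1 || git remote add origin "$1" || return
  git push --force --all origin &&
    git push --force --tags origin
}
```

For this blog that would be push_rewritten_history https://github.com/user/user.github.io.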

Summary

By cleaning up untracked files, removing unnecessary large files, compressing the Git repository, and cleaning up the remote repository, you can reduce the size of the Git repository to some extent.

After applying the methods above, the repository shrank only from 935 MB to 880 MB, far short of what I had hoped for, so I had to take more drastic measures.


Non-Standard Cleanup Methods

For a personal blog, commit history is not particularly valuable: it mostly records changes to text and images, and there is rarely any need to revisit old versions. It is therefore enough to simply wipe the history of the remote repository.

Delete and Recreate the Repository

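Delete the repository on GitHub and create a new one with the same name, then link your local blog folder to it and push. Start from a fresh local repository (without the old .git folder); otherwise the old history is pushed right back.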

This method is suitable for repositories that are not connected to other services. If the repository is connected to services like Vercel, Cloudflare, or other third-party services for deployment, it’s better not to use this method, as re-deploying can be troublesome. Instead, consider the reset method below.

Reset the Original Repository

  1. Back up the local repository, either with a command or by manually copying the folders you need.

    git clone /d/hugo/user /d/hugo/user-backup # Source path, then backup path (example name); use your own
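  2. Create a new empty folder somewhere, add any placeholder file, and initialize it as a Git repository.

    mkdir -p /d/hugo/new # Your file path
    cd /d/hugo/new
    echo "placeholder" > README.md # Any file will do; the name is just an example
    git init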
  3. Add the remote repository.

    git remote add origin https://github.com/user/user.github.io # Your repository URL
  4. Commit everything as a single initial commit.

    git add -A
    git commit -m "Initial commit"
  5. Force push to the original repository, replacing its entire history.

    git push -f origin master # Replace master with your branch name if it differs (e.g. main)
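A different technique, swapped in here rather than taken from the steps above, reaches the same result without copying folders around: an orphan branch starts a new branch with no parent commits from the current files. A sketch, assuming a remote named origin; the function name squash_history and the temporary branch name fresh-start are hypothetical.

```shell
# Replace the current branch's history with a single commit holding the current files,
# then force-push it to origin.
squash_history() {
  old_branch=$(git symbolic-ref --short HEAD) || return
  git checkout --orphan fresh-start &&  # new branch with no parent commits
    git rm -r -q --cached . &&          # empty the index (files stay on disk)
    git add -A &&                       # re-add everything not excluded by .gitignore
    git commit -m "Initial commit" &&
    git branch -D "$old_branch" &&      # drop the old branch and its history
    git branch -m "$old_branch" &&      # give the new branch the old name
    git push --force origin "$old_branch"
}
```

The old commits remain in the local reflog until it expires, so the local .git folder only shrinks after a reflog expiry and git gc.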

Back to a New Repository

Rebind and Upload to the Original Repository

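Go to the folder you backed up earlier and delete its hidden .git folder so the old history is not carried over. Then run git init there and repeat steps 3-5 above.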

Summary

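In Hugo, the public and resources folders are the most likely to accumulate junk files: public holds the generated site, and resources/_gen caches processed assets such as converted images. I used to deal with them by simply deleting the folders while debugging with Hugo commands, but because both were committed and contain thousands of files, each rebuild cluttered the Git history.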

This time, I chose to create a .gitignore file in the blog’s root directory to exclude these two folders. The file content is as follows:

public/
resources/
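A .gitignore only applies to files Git is not already tracking, so if public/ and resources/ were committed earlier, they stay in the repository until they are removed from the index. A sketch of doing that while keeping the files on disk; the helper name untrack_hugo_build_dirs and the commit message are hypothetical. This keeps the folders out of future commits but does not remove them from past history (git filter-repo does that).

```shell
# Stop tracking Hugo's output folders without deleting them from disk, then commit.
# --ignore-unmatch keeps git rm from failing if one of the folders was never tracked.
untrack_hugo_build_dirs() {
  git rm -r -q --cached --ignore-unmatch public resources &&
    git commit -m "Stop tracking Hugo build output"
}
```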

Tips

If the resources folder is not committed, deployments on GitHub Actions or Vercel may take longer, because the images have to be converted again on the build server. With only a few images, the impact is minimal.

#blog #hugo #github #git
