Optimizing WSL2 for AI Developers: A Step-by-Step Cleanup Guide
Empower Your AI Development Journey with Expert Tips and Cleanup Techniques

Hello everyone! In this post, I’ll share my tips for managing disk space on your laptop while working with open-source LLMs.
Many developers today are eager to dive into the exciting world of Generative AI and Machine Learning, and that learning journey usually means downloading numerous open-source models onto your machine.
I followed the same path, installing model after model on my computer. One day I noticed my system running slowly, and upon investigation I discovered that my storage was almost full, with only 54.7GB free out of 475GB. It wasn’t a RAM problem; the machine has 16GB of RAM.
I went through my folders, deleted the obvious candidates, and checked the storage again. Surprisingly, that freed up only a few gigabytes. A much larger chunk of storage was being consumed elsewhere, and it turned out to be WSL2, which I use for development: the LLM models I had installed inside it accounted for roughly 80% of my storage. To address this, I followed the steps below to clean up and reclaim the space.
Step 1: Locate files larger than 100MB.
sudo find / -type f -size +100M 2>/dev/null
The command above surfaced open-source models, cache files, and other large files, mainly from Hugging Face, Whisper, and Bark. My initial focus was the models and cache files, so the next step was to check their sizes.
/home/karthick/.cache/whisper/base.pt
/home/karthick/.cache/whisper/small.pt
/home/karthick/.cache/whisper/large-v2.pt
/home/karthick/.cache/whisper/small.en.pt
/home/karthick/.cache/huggingface/hub/models--openai--whisper-base.en/blobs/d4dd5542fd6a1d35639e21384238f3bfe6c557c849d392b5905d33ee29e71db5
/home/karthick/.cache/huggingface/hub/models--facebook--mask2former-swin-large-ade-semantic/blobs/b143c144341c15b4f20165cc6d2c9305fb1b66792f68a6e0e06d2b20dc063b14
/home/karthick/.cache/huggingface/hub/tmp4bk4gnlj
/home/karthick/.cache/huggingface/hub/models--distilbert-base-cased-distilled-squad/blobs/f198de8ef6e40aeccd6eaa86e34dde73c3bb4bf0e54003cd182a18c29a1811db
/home/karthick/.cache/huggingface/hub/models--suno--bark/blobs/ccdedd35373bc3a16845f1f1452c5c96926f5cbccab01e824f7f15add2c16a35
/home/karthick/.cache/huggingface/hub/models--timm--resnet101.a1h_in1k/blobs/1a076c0384295e7d3704f201c5711ff245507db934230bab72ff3014a95f90fe
/home/karthick/.cache/huggingface/hub/models--openai--whisper-small.en/blobs/6014ac49b506df900f66f4aca6b0801eed7245594ace97bcaf73e0ae5b863066
/home/karthick/.cache/huggingface/hub/tmpjk4u9lgk
/home/karthick/.cache/huggingface/hub/models--facebook--detr-resnet-50-panoptic/blobs/3f8024c4744402adf5ebba3be495c91e5af388d73fb32f1e0a93c053767abeca
/home/karthick/.cache/huggingface/hub/models--facebook--wav2vec2-base-960h/blobs/8aa76ab2243c81747a1f832954586bc566090c83a0ac167df6f31f0fa917d74a
/home/karthick/.cache/huggingface/hub/models--facebook--detr-resnet-50/blobs/9400d5a6a433c73bb3440f42daab69a7b728b4bce0922904ac4779cb04e08989
/home/karthick/.cache/huggingface/hub/models--facebook--detr-resnet-101/blobs/86cec5ddf787238c249f1eee18c67d403ae04bf74b062766adfcc075ae467809
/home/karthick/.cache/huggingface/hub/tmpujx6_0de
/home/karthick/.cache/huggingface/hub/models--hustvl--yolos-small/blobs/a5fbcdd7e5612fa92f586648762ccc4ac29cd121fa23eadd7f6289918f8a8bb0
/home/karthick/.cache/huggingface/hub/models--abhishek--llama-2-7b-hf-small-shards/blobs/7c7c9ffba4d673dfa401147b4377469dcbc687d7ed06196752c202229fce9f0b
/home/karthick/.cache/huggingface/hub/models--dslim--bert-base-NER/blobs/b04492186cfb45a64908487a17a9f8d6ddec3a403ef39db5bca688f0fa702a34
/home/karthick/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2/pytorch_model.bin
/home/karthick/.cache/torch/hub/checkpoints/resnet101_a1h-36d3f2aa.pth
/home/karthick/projects/Llama-2-Open-Source-LLM-CPU-Inference/models/llama-2-7b-chat.ggmlv3.q8_0.bin
/home/karthick/projects/bark_tutorial/bark/text_2.pt
/home/karthick/projects/bark_tutorial/bark/fine.pt
/home/karthick/projects/bark_tutorial/bark/coarse.pt
/home/karthick/projects/bark_tutorial/bark/text.pt
/home/karthick/projects/bark_tutorial/bark/coarse_2.pt
/home/karthick/projects/bark_tutorial/bark/pytorch_model.bin
/home/karthick/projects/bark_tutorial/bark/fine_2.pt
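If you prefer to run the scan from a script instead of the shell, the same idea can be sketched in Python. This is just an illustrative equivalent of the find command; the 100MB threshold and the home-directory starting point are arbitrary choices you can adjust:

```python
import os

def find_large_files(root, min_bytes=100 * 1024 * 1024):
    """Walk `root` and yield (path, size) for regular files of at least min_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable file or broken symlink
            if size >= min_bytes:
                yield path, size

# Example usage (can take a while on a large tree):
# for path, size in find_large_files(os.path.expanduser("~")):
#     print(f"{size / 1024**3:.1f}G\t{path}")
```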
Step 2: Review the size of each file individually.
du -ah /file-path
I checked each file with the command above, and the results were astonishing: some very large models were present, totaling around 50GB of open-source model files.
6.7G /home/karthick/projects/Llama-2-Open-Source-LLM-CPU-Inference/models/llama-2-7b-chat.ggmlv3.q8_0.bin
5.0G /home/karthick/projects/bark_tutorial/bark/text_2.pt
5.0G /home/karthick/.cache/huggingface/hub/models--suno--bark/blobs/ccdedd35373bc3a16845f1f1452c5c96926f5cbccab01e824f7f15add2c16a35
4.2G /home/karthick/projects/bark_tutorial/bark/pytorch_model.bin
3.7G /home/karthick/projects/bark_tutorial/bark/coarse_2.pt
3.5G /home/karthick/projects/bark_tutorial/bark/fine_2.pt
2.9G /home/karthick/.cache/whisper/large-v2.pt
2.8G /home/karthick/.cache/huggingface/hub/models--abhishek--llama-2-7b-hf-small-shards/blobs/7c7c9ffba4d673dfa401147b4377469dcbc687d7ed06196752c202229fce9f0b
2.2G /home/karthick/projects/bark_tutorial/bark/text.pt
1.7G /home/karthick/.cache/huggingface/hub/tmpjk4u9lgk
1.3G /home/karthick/.cache/huggingface/hub/tmpujx6_0de
1.2G /home/karthick/projects/bark_tutorial/bark/coarse.pt
1.1G /home/karthick/projects/bark_tutorial/bark/fine.pt
923M /home/karthick/.cache/huggingface/hub/models--openai--whisper-small.en/blobs/6014ac49b506df900f66f4aca6b0801eed7245594ace97bcaf73e0ae5b863066
826M /home/karthick/.cache/huggingface/hub/models--facebook--mask2former-swin-large-ade-semantic/blobs/b143c144341c15b4f20165cc6d2c9305fb1b66792f68a6e0e06d2b20dc063b14
462M /home/karthick/.cache/whisper/small.pt
462M /home/karthick/.cache/whisper/small.en.pt
418M /home/karthick/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2/pytorch_model.bin
414M /home/karthick/.cache/huggingface/hub/models--dslim--bert-base-NER/blobs/b04492186cfb45a64908487a17a9f8d6ddec3a403ef39db5bca688f0fa702a34
361M /home/karthick/.cache/huggingface/hub/models--facebook--wav2vec2-base-960h/blobs/8aa76ab2243c81747a1f832954586bc566090c83a0ac167df6f31f0fa917d74a
277M /home/karthick/.cache/huggingface/hub/models--openai--whisper-base.en/blobs/d4dd5542fd6a1d35639e21384238f3bfe6c557c849d392b5905d33ee29e71db5
250M /home/karthick/.cache/huggingface/hub/tmp4bk4gnlj
249M /home/karthick/.cache/huggingface/hub/models--distilbert-base-cased-distilled-squad/blobs/f198de8ef6e40aeccd6eaa86e34dde73c3bb4bf0e54003cd182a18c29a1811db
232M /home/karthick/.cache/huggingface/hub/models--facebook--detr-resnet-101/blobs/86cec5ddf787238c249f1eee18c67d403ae04bf74b062766adfcc075ae467809
171M /home/karthick/.cache/torch/hub/checkpoints/resnet101_a1h-36d3f2aa.pth
171M /home/karthick/.cache/huggingface/hub/models--timm--resnet101.a1h_in1k/blobs/1a076c0384295e7d3704f201c5711ff245507db934230bab72ff3014a95f90fe
165M /home/karthick/.cache/huggingface/hub/models--facebook--detr-resnet-50-panoptic/blobs/3f8024c4744402adf5ebba3be495c91e5af388d73fb32f1e0a93c053767abeca
160M /home/karthick/.cache/huggingface/hub/models--facebook--detr-resnet-50/blobs/9400d5a6a433c73bb3440f42daab69a7b728b4bce0922904ac4779cb04e08989
118M /home/karthick/.cache/huggingface/hub/models--hustvl--yolos-small/blobs/a5fbcdd7e5612fa92f586648762ccc4ac29cd121fa23eadd7f6289918f8a8bb0
139M /home/karthick/.cache/whisper/base.pt
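To see at a glance where the bulk of the space lives, you can total these sizes by path prefix instead of eyeballing the list. Here is a small sketch; the sample entries are taken from the listing above, and the grouping depth of four path components (e.g. /home/karthick/.cache/whisper) is an arbitrary choice:

```python
from collections import defaultdict

def total_by_prefix(entries, depth=4):
    """Sum (size_gb, path) pairs by the first `depth` path components,
    returning (prefix, total) pairs with the largest totals first."""
    totals = defaultdict(float)
    for size_gb, path in entries:
        # path starts with "/", so split() yields a leading "" component
        prefix = "/".join(path.split("/")[:depth + 1])
        totals[prefix] += size_gb
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

entries = [
    (6.7, "/home/karthick/projects/Llama-2-Open-Source-LLM-CPU-Inference/models/llama-2-7b-chat.ggmlv3.q8_0.bin"),
    (5.0, "/home/karthick/projects/bark_tutorial/bark/text_2.pt"),
    (2.9, "/home/karthick/.cache/whisper/large-v2.pt"),
    (0.46, "/home/karthick/.cache/whisper/small.pt"),
]
for prefix, total in total_by_prefix(entries):
    print(f"{total:.1f}G\t{prefix}")
```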
Step 3: Eliminate unnecessary open-source models.
import os

# List of file paths to delete (trimmed; these are paths found in Step 2)
files_to_delete = [
    '/home/karthick/.cache/whisper/base.pt',
    '/home/karthick/.cache/whisper/small.pt',
    '/home/karthick/projects/bark_tutorial/bark/fine.pt',
    # ... remaining paths omitted
]

# Iterate through the list and delete each file
for file_path in files_to_delete:
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"File {file_path} has been deleted.")
    else:
        print(f"The file {file_path} does not exist.")
I used the Python script above to remove the unnecessary open-source models and cache files. One caveat for the Hugging Face cache: the files under blobs are referenced by symlinks in the snapshots directories, so deleting a blob on its own leaves dangling links behind. It is cleaner to remove the whole models--org--name directory, or to use the huggingface-cli delete-cache command that ships with huggingface_hub.
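One refinement worth adding to a script like this is to tally how many bytes each deletion frees, so you can confirm the cleanup is paying off. A sketch of that idea; the function name and the example paths are my own placeholders:

```python
import os

def delete_and_report(paths):
    """Delete each existing regular file and return the total bytes freed."""
    freed = 0
    for path in paths:
        if os.path.isfile(path):
            size = os.path.getsize(path)
            os.remove(path)
            freed += size
            print(f"Deleted {path} ({size / 1024**2:.0f}M)")
        else:
            print(f"Skipped {path} (not found)")
    return freed

# Example usage:
# freed = delete_and_report(['/home/karthick/.cache/whisper/base.pt'])
# print(f"Freed {freed / 1024**3:.1f}G in total")
```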
Conclusion:
By following the steps outlined above, I successfully reclaimed my storage. One WSL2-specific note: deleting files inside the distro does not automatically shrink the ext4.vhdx virtual disk that Windows sees, so you may also need to shut WSL down (wsl --shutdown) and compact the virtual disk to get the space back on the Windows side. When diving into AI models, it’s crucial to keep an eye on your system storage to prevent slowdowns. Whether you keep large models locally or remove them depends on your project and learning requirements; it’s all about adapting to your needs.