Working with huge datasets (800K+ files) in Google Colab and Google Drive

1. Upload the entire folder containing the 800K+ images to Google Drive

2. Zip the dataset folder, upload it to Google Drive, and then unzip it

3. Create the dataset directly on Google Colab and write the files to Drive

This is a damn long time to process a single background image, given that each background (BG) produces 200*20 = 4,000 images

4. Maybe use threads?
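
For reference, the threads idea could be sketched roughly as below. This is only an illustration, not the code used for this dataset: `process_backgrounds` and `process_one` are hypothetical names, and the per-background work is left to whatever callable you pass in. Threads in Python mostly help when the work is I/O-bound (such as writing files to Drive), not when it is CPU-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_backgrounds(
    backgrounds: Sequence[T],
    process_one: Callable[[T], R],  # hypothetical per-background worker
    max_workers: int = 8,
) -> List[R]:
    """Run process_one on every background using a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(process_one, backgrounds))
```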

Hmm... so what worked?!

Create the dataset on Google Drive, written directly into a .zip/.tar file 🥳🎊
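
A minimal sketch of the idea, using Python's standard `zipfile` module: instead of writing hundreds of thousands of small files to Drive, each generated image is written as an entry in a single archive. The helper name `append_to_zip` is made up for this example, and it assumes the images are already encoded as bytes (for example JPEG or PNG data).

```python
import zipfile
from typing import Iterable, Tuple


def append_to_zip(zip_path: str, items: Iterable[Tuple[str, bytes]]) -> int:
    """Append (archive_name, raw_bytes) pairs to zip_path; return how many were written."""
    count = 0
    # Mode "a" creates the archive if it does not exist, otherwise appends to it.
    # ZIP_STORED skips recompression, which suits already-compressed image data.
    with zipfile.ZipFile(zip_path, mode="a", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in items:
            zf.writestr(name, payload)
            count += 1
    return count
```

On Colab, `zip_path` would typically point somewhere under the mounted Drive folder, for example `/content/drive/MyDrive/dataset.zip` (an example path, assuming Drive is mounted at `/content/drive`).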

  • Always work with your huge datasets in batches!
  • Save your work to Google Drive periodically; use .zip files if you work with huge datasets, and consider splitting them into parts if possible
  • You might need to call Python's garbage collector to free up memory
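
The three tips above can be combined in one rough sketch: process the items in batches, write each batch to its own .zip part, and call the garbage collector between batches. The names `batched` and `write_in_parts` and the `_partNNN.zip` naming scheme are assumptions for illustration, not the exact code used here.

```python
import gc
import zipfile
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Item = Tuple[str, bytes]  # (name inside the archive, encoded image bytes)


def batched(items: Iterable[Item], batch_size: int) -> Iterator[List[Item]]:
    """Yield lists of at most batch_size items until the input is exhausted."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def write_in_parts(items: Iterable[Item], out_prefix: str, batch_size: int) -> List[str]:
    """Write each batch to its own zip part and return the list of part paths."""
    paths = []
    for part, batch in enumerate(batched(items, batch_size)):
        path = f"{out_prefix}_part{part:03d}.zip"
        with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_STORED) as zf:
            for name, payload in batch:
                zf.writestr(name, payload)
        paths.append(path)
        # Drop the reference to the finished batch and ask Python to free memory.
        del batch
        gc.collect()
    return paths
```

Passing a generator as `items` keeps only one batch in memory at a time, and each finished part is a complete file on Drive, so a crashed session loses at most the batch in progress.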
Depth Estimation model run on my dataset

That’s all Folks!




