Streamline Your S3 Management: How to Count Small Files Locally with Python

I recently needed to count the objects in an S3 bucket that were at or below a particular size. The S3 console doesn’t offer a native way to do this, so I decided to script it using Python and the AWS SDK (Boto3). You can see all of the code on my GitHub.

The script is easy to use. The logic is as follows:

Step 1: Setup

First, the script initializes a client for the S3 service using Boto3. This requires your AWS credentials to be configured (for example, through environment variables or the shared credentials file). The script uses them to authenticate requests to your S3 buckets.
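As a rough illustration of the setup step, the client could be created along these lines. This is a minimal sketch, not the exact code from the repository: the function name make_s3_client and its region_name parameter are hypothetical, and it assumes Boto3 is installed and credentials are already configured.

```python
def make_s3_client(region_name=None):
    """Return a Boto3 S3 client built from the default credential chain.

    Hypothetical helper for illustration; the original script may simply
    call boto3.client("s3") at module level.
    """
    # Imported inside the function so this sketch can be loaded even on a
    # machine where boto3 is not installed yet.
    import boto3

    # boto3 resolves credentials on its own: environment variables, the
    # shared ~/.aws/credentials file, or an attached IAM role.
    return boto3.client("s3", region_name=region_name)
```

Passing region_name=None leaves the region to whatever your AWS configuration already specifies.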

Step 2: Input Parameters

The script accepts two command-line arguments:

  • The name of the S3 bucket.
  • The maximum file size (in bytes) for which you want to count the files.
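The two arguments above could be parsed roughly as follows. This sketch assumes argparse from the standard library; the original script may read sys.argv directly instead, and the names parse_args, bucket, and max_size are illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse the bucket name and the size limit from the command line."""
    parser = argparse.ArgumentParser(
        description="Count S3 objects at or below a size limit."
    )
    parser.add_argument("bucket", help="Name of the S3 bucket")
    # type=int makes argparse reject non-numeric sizes with a usage error.
    parser.add_argument("max_size", type=int, help="Maximum object size in bytes")
    # argv=None tells argparse to read from sys.argv[1:].
    return parser.parse_args(argv)
```

Calling parse_args(["my-example-bucket", "1048576"]) yields an object whose bucket attribute is the string "my-example-bucket" and whose max_size attribute is the integer 1048576.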

Step 3: List and Count

Using Boto3’s paginator, the script handles large buckets by fetching the object listing in batches (S3 returns at most 1,000 objects per list request). It iterates through each object in the specified bucket:

  • It checks if the object’s size is less than or equal to the size limit you specified.
  • It increments a count for each file that meets the criteria.
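The list-and-count loop described above might look something like this. It is a sketch under assumptions: it uses the list_objects_v2 paginator (the post does not say which list operation the script calls), and count_small_objects is a hypothetical function name. The client is passed in as a parameter so the function can be tried with any object that offers get_paginator.

```python
def count_small_objects(s3_client, bucket, max_size):
    """Count objects in `bucket` whose size is <= max_size bytes."""
    # The paginator follows continuation tokens for us, so buckets with
    # more objects than fit in one response are still fully covered.
    paginator = s3_client.get_paginator("list_objects_v2")

    count = 0
    for page in paginator.paginate(Bucket=bucket):
        # A page has no "Contents" key when it holds no objects
        # (for example, an empty bucket), hence the default of [].
        for obj in page.get("Contents", []):
            if obj["Size"] <= max_size:
                count += 1
    return count
```

With a real client this would be called as count_small_objects(boto3.client("s3"), "my-example-bucket", 1048576).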

Step 4: Error Handling

If the script encounters issues accessing the bucket or if an API call fails, it catches the exception and prints an error message. This helps in debugging and ensures you understand why a particular operation failed.
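One way to express this catch-and-report behavior is a small wrapper like the one below. The name run_safely is hypothetical, and catching the broad Exception class is a simplification for the sketch. With Boto3, failed API calls raise botocore.exceptions.ClientError, so a real script would more likely catch that specifically.

```python
import sys


def run_safely(operation, *args):
    """Call operation(*args); on failure, print the error and return None."""
    try:
        return operation(*args)
    except Exception as exc:
        # Assumption: a broad catch keeps the sketch free of dependencies.
        # In real use, narrow this to botocore.exceptions.ClientError.
        print(f"Error: {exc}", file=sys.stderr)
        return None
```

A caller can then check for None to decide whether to print a count or exit with a non-zero status.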

Step 5: Output

Finally, the script outputs the total count of files that meet the size criteria. This result can be used directly in reports, further automation scripts, or just for informational purposes.

Usage:

python count_small_files.py my-example-bucket 1048576

Replace my-example-bucket with your bucket name and 1048576 with the maximum file size in bytes (1 MB, counted as 1024 × 1024 bytes, in this example). This command will tell you how many files in my-example-bucket are 1 MB or smaller.
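Since the limit is given in raw bytes, a tiny helper can save some mental arithmetic when picking a value. This is only a convenience sketch, not part of the script: mb_to_bytes is a hypothetical name, and it assumes the binary convention used in the example above (1 MB = 1024 × 1024 bytes).

```python
def mb_to_bytes(megabytes):
    """Convert megabytes to bytes, using 1 MB = 1024 * 1024 bytes."""
    return int(megabytes * 1024 * 1024)


# Size limits to pass as the script's second argument.
one_mb = mb_to_bytes(1)        # 1048576
half_mb = mb_to_bytes(0.5)     # 524288
```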

This Python script is a practical tool for anyone looking to manage S3 storage more efficiently. By running this script, you can quickly gather insights into your data distribution, helping you make informed decisions about storage management and optimization.

Stay tuned for more insights on managing cloud services and enhancing your productivity through automation. Don’t forget to subscribe for more updates and tutorials!
