3 important Blob Storage tips

27 July 2022 · Azure

Azure Blob Storage is awesome. It’s very straightforward to set up and start uploading data. However, a few gotchas can cause problems and ruin your experience of the service.

So how do you get the most out of Azure Blob Storage?

Every time I use the service I follow these 3 tips. I’ve compiled them from my past mistakes. So learn from my pain and don’t get caught out with blob storage :)

1) How to keep Blob Storage Cheap

Don’t let it take all your money!

You pay for Blob Storage in two ways: the amount of data stored AND the number of operations performed.

Data stored is a straightforward metric and easy to manage. But people often get caught out by operations performed.

It’s basically a charge for every 10,000 reads and writes. The charge is minimal and 10,000 sounds like a lot. You wouldn’t think it could ever be a problem.

But all it takes is a bad design choice in your code and you can end up with a huge Azure bill.

I know this because I’ve done it myself :D

How I messed up with Blob Storage

At my day job, I was improving the performance of one of our background processing systems.

This system processes a couple of million records a day and churns out tons of transaction logs. These logs are important for auditing purposes, so we were shoving them into a SQL database.

The problem was that our SQL Server database wasn’t a good fit for ingesting that much log data. It got to the point where just saving the logs took 4 hours…

The logs upload was a huge bottleneck in this background job.

So I had the idea to save the logs to Blob Storage instead. Blob Storage is perfect for data shoving. So I made the change and it worked! Saving the logs went down from 4 hours to just 20 minutes! Huge improvement :)

All was good until a bad design choice I made came back to bite us…

I decided to store the logs for each record in its own file. I wanted to make it fast to find the logs for a specific record, which is important for anyone diagnosing an issue.

So every day the system was saving 2 million blobs to storage. That’s 2 million transactions :D

The cost of write operations for cool blobs in UK South is £0.1173 per 10,000.

So (2,000,000 / 10,000) * £0.1173 = £23.46 per day

or about £700 per month.
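If you want to run the same sum for your own workload, here is a minimal Python sketch of the arithmetic above. The function name `write_cost` and the 30-day month are assumptions for illustration, and the price is just the figure quoted above, so check the current pricing for your own region and tier.

```python
from decimal import Decimal

# Operations are priced per 10,000, as described above.
OPERATIONS_PER_BILLING_UNIT = Decimal(10_000)

# Price quoted above for cool-tier writes in UK South (may have changed since).
COOL_WRITE_PRICE_GBP = Decimal("0.1173")


def write_cost(write_operations: int, price_per_10k: Decimal) -> Decimal:
    """Cost of a number of write operations, pro-rated per 10,000."""
    return Decimal(write_operations) / OPERATIONS_PER_BILLING_UNIT * price_per_10k


daily = write_cost(2_000_000, COOL_WRITE_PRICE_GBP)
monthly = daily * 30  # assumes a 30-day month

print(f"GBP {daily:.2f} per day")      # GBP 23.46 per day
print(f"GBP {monthly:.2f} per month")  # GBP 703.80 per month
```

Using `Decimal` rather than floats keeps the money maths exact, which matters more once you start multiplying small prices by millions of operations.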

Pretty pricey for storing logs we rarely look at…

Once we realized this, we quickly changed it to combine the logs into a single file before uploading it to Blob Storage.
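To give a feel for that kind of change, here is a rough Python sketch. It is not the code we ran in production. It assumes the `azure-storage-blob` package, where `ContainerClient.upload_blob` takes a blob name, the data and an `overwrite` flag. The function names and the JSON Lines format are hypothetical choices for this example.

```python
import json
from typing import Iterable, Mapping


def build_daily_log_payload(logs_by_record: Mapping[str, Iterable[str]]) -> bytes:
    """Flatten per-record logs into one JSON Lines payload (one log entry per line)."""
    lines = []
    for record_id, messages in logs_by_record.items():
        for message in messages:
            # Keep the record id on every line so logs can still be filtered per record.
            lines.append(json.dumps({"record_id": record_id, "message": message}))
    if not lines:
        return b""
    return ("\n".join(lines) + "\n").encode("utf-8")


def upload_daily_logs(container_client, blob_name: str,
                      logs_by_record: Mapping[str, Iterable[str]]) -> int:
    """Upload all logs in a single upload call instead of one call per record.

    `container_client` is assumed to behave like azure.storage.blob.ContainerClient.
    Returns the number of bytes uploaded.
    """
    payload = build_daily_log_payload(logs_by_record)
    container_client.upload_blob(name=blob_name, data=payload, overwrite=True)
    return len(payload)
```

The trade-off is that finding the logs for one record now means scanning the file for its `record_id`, rather than opening a dedicated blob. For logs that are rarely read, that was an easy trade to make.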

Here’s the monthly cost difference:

| Method | Operations Cost | Storage Cost | Total per month |
|---|---|---|---|
| Separate Blobs | £700 | £1.80 | £701.80 |
| Single Blob | £0.1173 | £1.80 | £1.92 |

Using UK South prices for Cool blobs

2) Folders

So organized :o

Keep your blobs organized.

This is especially true if you’re storing customer data. You always want some kind of separation between customers.

You will have major issues if your customers can accidentally see each other’s information.

Folders are an easy way to do it.

Using whatever internal customer ID you have is a good place to start. And you can always include usernames, dates or even products. Which folders you use depends on your application. But making blobs easier to find now will make things easier later on.
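As a small illustration, here is one way to build blob names with a customer folder followed by date folders. This is only a sketch: the `customer_blob_name` helper and the `customer/year/month/day/file` layout are assumptions for the example, not a required convention.

```python
from datetime import date


def customer_blob_name(customer_id: str, created: date, filename: str) -> str:
    """Build a blob name like '<customer>/<yyyy>/<mm>/<dd>/<file>'."""
    for part in (customer_id, filename):
        # A stray "/" in an id or filename would silently create extra folders.
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return f"{customer_id}/{created:%Y/%m/%d}/{filename}"


print(customer_blob_name("cust-0042", date(2022, 7, 27), "invoice.pdf"))
# cust-0042/2022/07/27/invoice.pdf
```

Building every blob name through one helper like this keeps the layout consistent, so the customer ID is always the first folder.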

A benefit I’ve seen from doing this is during customer off-boarding. When a customer cancels their subscription we need to delete all of their data.

Tracking down hundreds of blobs is a nightmare. But because we’ve grouped them all into a customer folder we can just delete the folder.
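One caveat: on a standard storage account (no hierarchical namespace), folders are really just prefixes in the blob name, so "deleting the folder" means deleting every blob under that prefix. A minimal sketch of that, assuming the `azure-storage-blob` package's `ContainerClient` with its `list_blobs(name_starts_with=...)` and `delete_blob(...)` methods (the `delete_customer_folder` helper itself is hypothetical):

```python
def delete_customer_folder(container_client, customer_id: str) -> int:
    """Delete every blob under '<customer_id>/' and return how many were deleted.

    `container_client` is assumed to behave like azure.storage.blob.ContainerClient.
    """
    # The trailing slash stops "cust-1" from also matching "cust-10/...".
    prefix = f"{customer_id}/"
    deleted = 0
    for blob in container_client.list_blobs(name_starts_with=prefix):
        container_client.delete_blob(blob.name)
        deleted += 1
    return deleted
```

This only works cleanly because every blob for the customer lives under one prefix. Without that, you are back to tracking down blobs one by one.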

Good data management saves so much pain.

3) Index Tags

Easy to find everything with labels!

Index tags are a relatively new addition to Blob Storage. They let you attach metadata to each blob as key-value tags. This metadata is indexed and searchable, so finding blobs is now easy.

Metadata can include which project the blob is from, the processing status of the blob and when the blob was created. All really useful information. And when combined with folders, it makes data management a lot easier.

Previously we had to use other databases to keep track of our blobs. Storing the metadata in another location increases complexity and risks the two getting out of sync. Index tags are a much simpler solution.
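Here is a rough sketch of what tagging and searching can look like in Python. It assumes the `azure-storage-blob` package, where `ContainerClient.upload_blob` accepts a `tags` dictionary and `ContainerClient.find_blobs_by_tags` takes a filter expression such as `"project" = 'alpha'`. The helper names and the example tag keys are hypothetical.

```python
from typing import List, Mapping


def build_tag_filter(tags: Mapping[str, str]) -> str:
    """Build a filter expression like: "project" = 'alpha' AND "status" = 'processed'."""
    if not tags:
        raise ValueError("at least one tag is required")
    clauses = []
    for key, value in tags.items():
        # Quotes would break the expression, so reject them rather than guess at escaping.
        if '"' in key or "'" in value:
            raise ValueError("quotes are not supported in tag keys or values")
        clauses.append(f"\"{key}\" = '{value}'")
    return " AND ".join(clauses)


def upload_with_tags(container_client, name: str, data: bytes,
                     tags: Mapping[str, str]) -> None:
    """Upload a blob and attach its index tags in the same call."""
    container_client.upload_blob(name=name, data=data, tags=dict(tags), overwrite=True)


def find_blob_names(container_client, tags: Mapping[str, str]) -> List[str]:
    """Return the names of blobs in the container whose tags match all given values."""
    matches = container_client.find_blobs_by_tags(build_tag_filter(tags))
    return [blob.name for blob in matches]
```

For example, `find_blob_names(container_client, {"project": "alpha", "status": "processed"})` would look up every processed blob for that project without a separate tracking database.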

My tip around index tags is simple: Use them.