Using scripting to perform bulk functions in Azure Blob storage

Published On: November 15, 2022

Recently I needed to go through a big data set and find every occurrence of a certain file suffix. Specifically, I needed to check whether an archive had any files with the lockbit suffix, which would mean that it had been affected by ransomware.

The archive stored the files I needed to check in folders based on their file type, nested within date-specific folders. These folders started with a year as the parent folder, then moved to month, day, hour, and finally minute before reaching the individual folders for each file type. That meant there was no easy way to search, as there would be in a flat folder structure.

Looking around the web, I found no easy way to search using the Azure portal or the Storage Explorer application. That’s when I remembered that I had recently hacked together a script to bulk change the access tiers of files based on their current access tier. I ended up not using that script after discovering lifecycle policies (see here for a discussion on that topic). But with some changes, I was easily able to adapt it to this requirement.

Amend the following script to suit your needs and then run it in PowerShell (it uses the Az.Storage module):

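# Storage account and container to scan
$storageAccountName = "StorageAccountName"
$storageContainer = "ContainerName"
$StorageAccountKey = "StorageAccountKeyYouCanGetFromPortalOrStorageExplorer"

# Maximum number of blobs to fetch per request
$MaxReturn = 10000

# Running totals: matching blobs and blobs examined
$count = 0
$processed = 0

Write-Host "Starting script"

$ctx = New-AzStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $StorageAccountKey
$Token = $Null

do
{
    # Fetch the next section of blobs, resuming from the continuation token
    $listOfBlobs = @(Get-AzStorageBlob -Container $storageContainer -Context $ctx -MaxCount $MaxReturn -ContinuationToken $Token)

    # Nothing returned, so there is nothing left to scan
    if ($listOfBlobs.Count -le 0) { break }

    foreach ($blob in $listOfBlobs)
    {
        if ($blob.Name -like "*.SuffixYouAreLookingFor")
        {
            Write-Host "The blob" $blob.Name "has the suffix you are looking for"
            $count++
        }
    }

    $processed += $listOfBlobs.Count

    # The token is carried on the last blob of the section; it is $Null on the final section
    $Token = $listOfBlobs[$listOfBlobs.Count - 1].ContinuationToken

    Write-Host "Processed" $processed "blobs," $count "matches so far. Continuation token =" $Token.NextMarker
}
while ($Null -ne $Token)

Write-Host "Completed processing of all blobs returned"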

You can also change the -like pattern so that it matches part of the file name or the full file name instead of a suffix.
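As a rough sketch of those variations, the snippet below runs the three kinds of match against a few made-up blob names. The names and the search terms are hypothetical, chosen only to resemble the year/month/day/hour/minute/file type layout described earlier; nothing here talks to Azure.

```powershell
# Hypothetical blob names, invented for illustration only
$names = @(
    '2022/11/15/10/05/pdf/invoice.pdf',
    '2022/11/15/10/05/pdf/invoice.pdf.lockbit',
    '2022/11/15/10/06/docx/report-final.docx'
)

# Suffix match, the same kind of test the script performs
$bySuffix = @($names | Where-Object { $_ -like '*.lockbit' })

# Partial match anywhere in the blob name
$byPart = @($names | Where-Object { $_ -like '*report*' })

# Full file name: compare only the last path segment
$byFileName = @($names | Where-Object { ($_ -split '/')[-1] -eq 'invoice.pdf' })

Write-Host "Suffix matches:" $bySuffix.Count
Write-Host "Partial matches:" $byPart.Count
Write-Host "File name matches:" $byFileName.Count
```

In the script itself, the same expressions would be applied to $blob.Name inside the foreach loop. Note that -like and -eq ignore case by default in PowerShell.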

The script gets a list of blobs, loops through them, and prints a message in the shell for every matching file it finds and after every section it completes. I initially ran these types of scripts by getting a full list of blobs in one go, but with the size of this container and the number of blobs, that would fail. This version uses the continuation token to retrieve the list in sections of up to $MaxReturn blobs.

Run this in the same geographic region as your container and it should run pretty quickly.
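If you only care about a particular date range, another way to cut the run time might be to list just that part of the archive. Get-AzStorageBlob accepts a -Prefix parameter, so a date-based prefix could be passed along with the continuation token. The sketch below is a minimal example under assumptions: Get-DatePrefix is a hypothetical helper, and the zero-padded year/month/day folder naming is a guess at the layout, so adjust it to match how your folders are actually named.

```powershell
# Hypothetical helper: builds a blob name prefix such as '2022/11/15/'.
# Assumes folders are named with a four-digit year and two-digit month and day.
function Get-DatePrefix {
    param(
        [Parameter(Mandatory)] [int] $Year,
        [int] $Month,
        [int] $Day
    )

    $prefix = '{0:D4}/' -f $Year
    if ($Month) { $prefix += '{0:D2}/' -f $Month }
    if ($Month -and $Day) { $prefix += '{0:D2}/' -f $Day }
    return $prefix
}

$dayPrefix = Get-DatePrefix -Year 2022 -Month 11 -Day 15
$monthPrefix = Get-DatePrefix -Year 2022 -Month 3

Write-Host "Day prefix:" $dayPrefix
Write-Host "Month prefix:" $monthPrefix

# Inside the script's loop, the listing call could then become (not run here):
# Get-AzStorageBlob -Container $storageContainer -Context $ctx -Prefix $dayPrefix -MaxCount $MaxReturn -ContinuationToken $Token
```

Only blobs whose names start with the prefix are returned, so the loop has fewer sections to page through.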
