I was browsing the Microsoft Technet forums last week and came across a question asking whether there’s a way to back up files and folders to an Azure Storage Blob using PowerShell. I know that Microsoft introduced Azure Site Recovery (ASR) and Azure Backup, together with the Azure Backup Agent (MARS) (more information on the Microsoft site), to achieve exactly this functionality.
But thinking about it further, this seemed like a nice opportunity to create such a script and learn more about writing to Azure Storage with PowerShell. So that’s exactly what I did: I created a script which backs up your files to Azure Blob Storage. The script checks either the last write time of each file or the MD5 hash of its content (depending on the parameters you pass), and copies the files to Azure that are either newer or have a different MD5 hash. In this article I’ll describe how the script works and what the challenges were while creating it.
The PowerShell script I created is available on the Microsoft Technet Gallery: https://gallery.technet.microsoft.com/Back-up-files-to-Azure-b9e863d0
Storage Account
Before using the script, you should create a storage account in Microsoft Azure. So open the portal, and add a storage account with the following properties:
- Deployment model: Resource Manager
- Account kind: Blob storage
- Performance: Standard
- Replication: LRS
- Access tier: Cool
The other properties don’t really matter: create it in the region you want, in any Resource Group, and give it a name that is still available. With the above settings you will create a cost-optimized storage account (the cool access tier and LRS replication are cheaper than e.g. hot data and GRS replication). If you’d like the data replicated across different regions or data centers, you can select GRS or ZRS replication instead.
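If you prefer to create the storage account from PowerShell instead of the portal, something along these lines should work with the AzureRM module (the resource group, account name and location below are just placeholders, so adjust them to your environment):

# Log in and create a cost-optimized Blob storage account (names are examples)
Login-AzureRmAccount
New-AzureRmStorageAccount -ResourceGroupName "BackupRG" `
                          -Name "mybackupstorage001" `
                          -Location "West Europe" `
                          -SkuName Standard_LRS `
                          -Kind BlobStorage `
                          -AccessTier Cool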
Once created, you’ll need to update the script and enter the correct storage account name and storage account access key. The access key can be found in the properties blade of the storage account. You can either use the primary or secondary key.
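Inside the script, the account name and key are typically turned into a storage context, which the other storage cmdlets accept through their -Context parameter. A minimal sketch, assuming the Azure.Storage module is installed and using placeholder values:

# Build the storage context used by the Azure Storage cmdlets (placeholder values)
$StorageAccountName = "mybackupstorage001"
$StorageAccountKey  = "<primary or secondary access key>"
$context = New-AzureStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $StorageAccountKey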
The script
Checking if a file has changed
Now for the script itself. First of all, I needed a way to determine whether a local file has changed. The original modification time of a file is lost when it is copied to Azure: the only timestamp you’ll find on the Blob is the time the Blob itself was last modified in Azure. So if you upload a file that was changed two days ago, Azure stores the date and time at which it was uploaded or overwritten. This means I need to store the modification date of the local file in Azure myself. Luckily, Azure allows you to store metadata on a per-blob basis, which lets me keep the local file’s modification time with the Blob. How to read and write metadata on Azure Blobs is explained in the next chapter.
The other option for checking file changes is to use the MD5 hash of the file. When you upload a file to Azure, the MD5 hash of its content is stored automatically. You can see the Content MD5 property when viewing the Blob properties in the Azure portal:
To compare the hashes, the MD5 of the local file needs to be calculated first. This can be done with a few lines of code:
Function Get-MD5Hash {
    Param (
        [Parameter(Mandatory=$true)][String]$Path
    )

    If (Test-Path -Path $Path) {
        try {
            # Create the hasher and get the content
            $crypto = [System.Security.Cryptography.MD5]::Create()
            $content = Get-Content -Path $Path -Encoding byte
            $hash = [System.Convert]::ToBase64String($crypto.ComputeHash($content))
        } catch {
            $hash = $null
        }
    } Else {
        # File doesn't exist, can't calculate hash
        $hash = $null
    }

    # Return the Base64 encoded MD5 hash
    return $hash
}
This function returns the MD5 hash of the file, which can be compared to the MD5 hash on Azure. If those two values differ, you know the file has changed (either locally or on Azure). How to get the Content MD5 on the Azure side is described in the next chapter.
Blob MetaData and Properties
Retrieving Blob metadata and properties is not possible using a “simple” PowerShell cmdlet. What you need to do is retrieve the Blob object using the “Get-AzureStorageBlob” cmdlet and create a “CloudBlockBlob” object from it. This can be done with these commands (I’m omitting the blob name, container name and storage context):
$azblob = Get-AzureStorageBlob -Blob $blobname -Container $Container -Context $context
$cloudblob = [Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob]$azblob.ICloudBlob
Once the object is created, you can use the $cloudblob variable to access both the metadata and the properties. The metadata is accessible through the Metadata property of the variable, while the content MD5 is available in the Properties.ContentMD5 property (which will only be populated after you run the FetchAttributes() method):
$cloudblob.Metadata | Format-Table
$cloudblob.FetchAttributes()
$cloudblob.Properties | Format-Table
Write-Host $cloudblob.Properties.ContentMD5
These properties can then be compared to the values of the local files. If the values differ, the file can be overwritten on Azure.
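To illustrate, the comparison could look roughly like this. This is a simplified sketch rather than the exact logic of the script; it uses the Get-MD5Hash function from above and the “lastwritetime” metadata key that the upload further down writes, and it assumes $file is a FileInfo object coming from Get-ChildItem:

# Decide whether the local file needs to be uploaded again (simplified sketch)
# Make sure the blob's metadata and properties are populated
$cloudblob.FetchAttributes()
$needsUpload = $false

# Option 1: compare the last write time stored as metadata (ticks) with the local file
If ([Int64]($cloudblob.Metadata["lastwritetime"]) -lt $file.LastWriteTimeUtc.Ticks) {
    $needsUpload = $true
}

# Option 2: compare the MD5 hash of the local content with the Content MD5 on Azure
If ((Get-MD5Hash -Path $file.FullName) -ne $cloudblob.Properties.ContentMD5) {
    $needsUpload = $true
}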
Containers
Creating containers and checking their availability can be done using the “Get-AzureStorageContainer” cmdlet. The downside of this cmdlet is that its error handling is not very good. If you want nice, user-friendly error messages (like I always want in my scripts!), you’ll need to set the “-ErrorAction SilentlyContinue” parameter and wrap the call in a “try {} catch {}” block. In PowerShell you can check whether your last command returned an error by checking the built-in “$?” variable: if this variable is false, the command returned an error. Next, you can read $Error[0] (a built-in variable as well) to get the exact error message.
The “Get-AzureStorageContainer” cmdlet can throw an error for various reasons:
- The container name is not correct (the container name should contain only lowercase characters and can only contain letters, numbers and dashes, more information here)
- There is no internet connection
- The storage account name is not correct
- The storage account access key is not correct
This code executes the cmdlet and checks which error was generated:
try {
    $azcontainer = Get-AzureStorageContainer -Name $Container -Context $context -ErrorAction SilentlyContinue
} catch {}

If ($? -eq $false) {
    # Something went wrong, check the last error message
    If ($Error[0] -like "*Can not find the container*") {
        # Container doesn't exist, create a new one
        Write-Host "Container `"$Container`" does not exist, trying to create container" -ForegroundColor Yellow
        $azcontainer = New-AzureStorageContainer -Name $Container -Context $context -ErrorAction SilentlyContinue
        If ($azcontainer -eq $null) {
            # Couldn't create container
            Write-Host "ERROR: could not create container `"$Container`"" -ForegroundColor Red
            return
        } Else {
            # OK, container created
            Write-Host "Container `"$Container`" successfully created" -ForegroundColor Yellow
        }
    } ElseIf ($Error[0] -like "*Container name * is invalid*") {
        # Container name is invalid
        Write-Host "ERROR: container name `"$Container`" is invalid" -ForegroundColor Red
        return
    } ElseIf ($Error[0] -like "*(403) Forbidden*") {
        # Storage Account key incorrect
        Write-Host "ERROR: could not connect to Azure storage, please check the Azure Storage Account key" -ForegroundColor Red
        return
    } ElseIf ($Error[0] -like "*(503) Server Unavailable*") {
        # Storage Account name incorrect
        Write-Host "ERROR: could not connect to Azure storage, please check the Azure Storage Account name" -ForegroundColor Red
        return
    } ElseIf ($Error[0] -like "*Please connect to internet*") {
        # No internet connection
        Write-Host "ERROR: no internet connection found, please connect to the internet" -ForegroundColor Red
        return
    }
}
It will output a user-friendly error message.
Uploading the blob
Before uploading the blob, the full path of the local file needs to be converted to a naming convention which is allowed by Azure Blob storage. The only “folder” blob storage supports is the container. Everything after the container is considered part of the blob name. So if you want to “emulate” a folder structure in Azure Blob storage, you can use a forward slash (/). So a file “C:\Data\Documents\My Important Document.docx” could be converted to “C/Data/Documents/My Important Document.docx”. The nice thing about the Azure Portal is that if you store a Blob like this, the portal will simulate the folder structure itself (allowing you to traverse through the different folders).
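That conversion can be done with a simple string replacement. A minimal sketch (not necessarily the exact logic used in the script):

# Turn a local path into a blob name that "emulates" the folder structure (simplified)
$localPath = "C:\Data\Documents\My Important Document.docx"
$blobname  = $localPath.Replace(":", "").Replace("\", "/")   # -> "C/Data/Documents/My Important Document.docx"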
Once the blob name is generated, the file can be uploaded using the “Set-AzureStorageBlobContent” cmdlet. This cmdlet allows you to pass a hashtable to the -Metadata parameter, which will then be set as metadata on the Azure Blob. Because it’s possible the Blob already exists, you can add the “-Force” parameter to force overwriting of the existing Blob:
$output = Set-AzureStorageBlobContent -File $file.FullName -Blob $blobname -Container $Container -Context $context -Metadata @{"lastwritetime" = $file.LastWriteTimeUTC.Ticks} -Force
Note the -Metadata parameter: it takes a hashtable as its value, which can be defined on a single line. If you want multiple metadata properties to be added, you can separate the key-value pairs with a semicolon (;):
Set-AzureStorageBlobContent -File $file.FullName -Blob $blobname -Container $Container -Context $context -Metadata @{"property1" = "value1"; "property2" = "value2"; "property3" = "value3"}
MARS Agent
As stated before, Azure already offers different services to back up your files and folders to Azure. One of these is Azure Backup, which uses a locally installed agent called the “MARS” (Microsoft Azure Recovery Services) agent. So if you’re looking for a lightweight way to back up files to Azure, you can use my script. However, if you’re looking for a more robust and flexible way to back up data (including retention, etc.), take a look at the “Back up Windows Server files and folders” article on the Microsoft site.
I hope this article was useful for you. If you have any questions, please don’t hesitate to leave a comment or contact me over email.
Hi,
I really liked this script; I came across it while looking to do something slightly different, but it is still really useful.
I have one question: Why did you choose to write your own function to get the MD5 hash instead of using the Get-FileHash commandlet?
Cheerio,
Lars
Hi Lars,
Thanks for your comment; I built my own function because it’s simply faster. Get-FileHash will load the entire file into memory, which could take a while with large files.
Ahhh, that makes sense; now I’m curious, and I think I will have to test it against Get-FileHash. I frequently use that functionality.
Thanks again.