Have you ever needed to upload a huge amount of data to Azure? Here are some tips and tricks.


There are several tools that let you upload data to Azure, such as the Blob Transfer Utility or Microsoft Azure Storage Explorer. However, some of those tools have size limitations, or they simply take too long (sometimes weeks) to finish the process.

For those cases, Azure lets you create an Import/Export job in the Azure Portal to manage the process:

[Image: Uploading Big Data to Azure]

The process involves shipping physical disks, containing all the data that you want to upload to your storage account, to the Azure centre in the region you choose. However, the drives need to have a specific format and the data needs to be copied following a specific structure. To ease this process, Microsoft released a tool named the "Azure Import/Export tool".


To use this tool, your machine needs to meet these prerequisites:

  • You need a computer (the "copy machine") with Windows 7, Windows Server 2008 R2, or a newer Windows operating system installed.
  • The .NET Framework 4 must be installed on the copy machine.
  • BitLocker must be enabled on the copy machine.
  • You need one or more empty 2.5-inch or 3.5-inch SATA II or SATA III hard drives, or SSDs, connected to the copy machine.
  • The files you are planning to import must be accessible from the copy machine, whether they are on a network share or a local hard drive.
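
A quick way to sanity-check a couple of these prerequisites on the copy machine is sketched below. This assumes Windows PowerShell with the BitLocker module available; the registry key is the usual location of the .NET Framework 4.x version information.

    # Check the .NET Framework 4.x installation (the Release value only appears on 4.5 and later)
    Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' |
        Select-Object Install, Release

    # Confirm BitLocker can see the attached data drives (requires the BitLocker PowerShell module)
    Get-BitLockerVolume | Select-Object MountPoint, VolumeStatus, EncryptionMethod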

Once you meet the requirements, you execute the tool and specify some parameters describing the source, the destination and the configuration. The tool then formats the destination disk and copies all the data onto it. During the process, the tool generates a log, as well as an important file known as the "journal", which you will later need to attach to your Import/Export job in Azure. This file contains the information Microsoft uses to decrypt the data and upload it to your storage account.


The first version of this tool could cause issues if you tried to copy data from different sources or to different disks; handling those scenarios meant writing a script to manage the different cases. However, last year Microsoft released a new version of the tool that simplifies the parameters and the way you provide the information about the source and destination of the data.


An example invocation of the latest version that will work for many cases is:

.\WAImportExport.exe PrepImport /id:"$session" /j:"$JournalFile" /logdir:"$LogDirectory" /sk:"$StorageAccountKey" /InitialDriveSet:"$driveSet" /DataSet:"$data"

  • $session: the name of the copy session.
  • $JournalFile: the name of the journal file.
  • $LogDirectory: the directory where you want to store the log for the copy session.
  • $StorageAccountKey: the key for the destination storage account.
  • $driveSet: the CSV file with information about the destination drives.
  • $data: the CSV file with information about the data to be copied.

As you can see from the parameters above, two CSV files are needed to provide this information. These files allow us to copy data from different sources or to different drives without writing a complex script to handle those cases.
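
For example, a copy session could be launched from PowerShell along these lines. All of the names, paths and the key below are illustrative placeholders rather than values from the original post:

    # Illustrative values only - replace with your own session name, paths and storage account key
    $session           = "copysession1"
    $JournalFile       = "copysession1.jrn"
    $LogDirectory      = "C:\ImportExport\Logs"
    $StorageAccountKey = "<your-storage-account-key>"
    $driveSet          = "C:\ImportExport\driveSet.csv"
    $data              = "C:\ImportExport\dataSet.csv"

    .\WAImportExport.exe PrepImport /id:"$session" /j:"$JournalFile" /logdir:"$LogDirectory" `
        /sk:"$StorageAccountKey" /InitialDriveSet:"$driveSet" /DataSet:"$data"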


The driveSet.csv file contains the columns:

  • DriveLetter: the letter assigned to the NTFS volume where you want to copy the data.
  • FormatOption: Format | AlreadyFormatted options.
  • SilentOrPromptOnFormat: SilentMode | PromptOnFormat options.
  • Encryption: Encrypt | AlreadyEncrypted options.
  • ExistingBitLockerKey: only required if the drive is already encrypted.

Each row that you add to this file adds another drive onto which the data will be copied.
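
As a sketch, a driveSet.csv describing one new drive to be formatted and encrypted, and one drive that is already prepared, might look like this (the drive letters and the BitLocker key placeholder are illustrative):

    DriveLetter,FormatOption,SilentOrPromptOnFormat,Encryption,ExistingBitLockerKey
    X,Format,SilentMode,Encrypt,
    Y,AlreadyFormatted,SilentMode,AlreadyEncrypted,<existing-BitLocker-key>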

On the other hand, the dataSet.csv file contains the columns:

  • BasePath: the source path where the data is located.
  • DstItemPathOrPrefix: the destination virtual directory in your Azure storage account.
  • ItemType: block | page options.
  • Disposition: rename | no-overwrite | overwrite options.
  • MetadataFile: the name of an XML file that contains metadata for the destination blobs.
  • PropertiesFile: the name of an XML file that contains properties for the destination blobs.

By providing multiple rows in this file, you can copy data from different sources to the drives that you specified in the driveSet file.
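
For illustration, a dataSet.csv that copies a local folder and a network share to two different destination prefixes could look something like this. The paths, prefixes and option values are placeholders chosen to match the columns above, not values from the original post:

    BasePath,DstItemPathOrPrefix,ItemType,Disposition,MetadataFile,PropertiesFile
    "C:\Data\Invoices\","backups/invoices/",block,rename,"",""
    "\\fileserver\archive\projects\","backups/projects/",block,no-overwrite,"",""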


To complete the process, you send the drives with the copied data to the Azure centre and attach the journal files that the tool generated for each copy session to your Import/Export job in the Azure portal. Once Microsoft receives the drives, the data is usually processed and transferred within a few days. When the transfer finishes, all the data will be available directly in your storage account.

Author bio

Ivan Diaz
Junior Developer
I deal with our clients' support tickets related to both SharePoint and Azure. I try to satisfy their requirements in the best way possible and in a timely manner, always with a smile. There's no better sense of satisfaction than fixing a complicated issue that our clients will thank us for.
