Uploading Big Data to Azure

Posted 12 February 2018 12:00 AM by Ivan Diaz, Junior SharePoint Developer @ ClearPeople

Have you ever needed to upload a huge amount of data to Azure?

Several tools let you upload data to Azure, such as the Blob Transfer Utility or Microsoft Azure Storage Explorer. However, some of them have size limitations, or they simply take too long (sometimes weeks) to finish the process.
For those cases, Azure lets you create an Import/Export job in the Azure Portal to manage the process.


The process involves shipping physical disks to the Azure data centre in the region you choose. These disks must contain all the data you want to upload to your Storage Account. However, the drives need a specific format, and the data must be copied following a specific structure. To ease this, Microsoft released a tool named the "Azure Import/Export tool".

To use this tool, your machine must meet these prerequisites:

  • You need a computer (the "copy machine") with Windows 7, Windows Server 2008 R2, or a newer Windows operating system installed.
  • The .NET Framework 4 must be installed on the copy machine.
  • BitLocker must be enabled on the copy machine.
  • You need one or more empty 2.5-inch or 3.5-inch SATA II or SATA III hard drives or SSDs connected to the copy machine.
  • The files you are planning to import must be accessible from the copy machine, whether they are on a network share or a local hard drive.

Once these requirements are met, you run the tool with parameters that provide the source, destination and configuration information. The tool then formats the destination disk and copies all the data onto it. During the process, it generates a log and an important file called the "journal", which you will need to attach to your Import/Export job in Azure later. This file contains information that Microsoft uses to decrypt the data and upload it to your storage account.

The first version of this tool could cause issues if you tried to copy data from several sources or to several disks, so those scenarios required developing a script to handle each case. However, last year Microsoft released a new version of the tool that reduces the number of parameters and simplifies how you provide the source and destination information.

An example invocation of the latest version that works for many cases is:

.\WAImportExport.exe PrepImport /id:"$session" /j:"$JournalFile" /logdir:"$LogDirectory" /sk:"$StorageAccountKey" /InitialDriveSet:"$driveSet" /DataSet:"$data"

  • $session: contains a name for that copy session.
  • $JournalFile: contains the name for the journal file.
  • $LogDirectory: contains the directory where you want to store the log for that copy session.
  • $StorageAccountKey: contains the key for the destination storage account.
  • $driveSet: contains the path to the CSV file that describes the destination drives.
  • $data: contains the path to the CSV file that describes the data to copy.
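If you script this step, for example to run one copy session per batch of disks, the command line can be assembled programmatically. The following is a minimal Python sketch, assuming Python 3 is available on the copy machine. The function names and the sample values (session name, file names, key placeholder) are hypothetical; only the switches from the example command above are used.

```python
import subprocess


def build_prepimport_args(session, journal_file, log_dir, storage_key,
                          drive_set_csv, data_set_csv,
                          exe=r".\WAImportExport.exe"):
    """Return the PrepImport command as a list of arguments.

    Each option uses the /name:value form from the example command.
    Passing a list avoids hand-quoting paths that contain spaces.
    """
    return [
        exe,
        "PrepImport",
        f"/id:{session}",                     # name of this copy session
        f"/j:{journal_file}",                 # journal file to attach to the job later
        f"/logdir:{log_dir}",                 # where the session log is written
        f"/sk:{storage_key}",                 # destination storage account key
        f"/InitialDriveSet:{drive_set_csv}",  # CSV describing the destination drives
        f"/DataSet:{data_set_csv}",           # CSV describing the data to copy
    ]


def run_prepimport(args):
    """Start the tool on the copy machine; raises CalledProcessError on a non-zero exit."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Hypothetical values; replace them with your own.
    args = build_prepimport_args("session1", "session1.jrn", r"C:\ImportLogs",
                                 "<storage-account-key>", "driveset.csv", "dataset.csv")
    print(" ".join(args))
    # run_prepimport(args)  # uncomment on the Windows copy machine
```

The sketch only builds the argument list; on Windows, `subprocess` turns the list into a command line using standard Windows quoting, which assumes the tool parses its arguments the usual way.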

As the parameters above show, the tool needs two CSV files. These files let you copy data from several sources or to several drives without writing a complex script to handle those cases.

The driveSet.csv file contains these columns:

  • DriveLetter: contains the letter assigned to the NTFS volume where you want to copy the data.
  • FormatOption: Format | AlreadyFormatted options.
  • SilentOrPromptOnFormat: SilentMode | PromptOnFormat options.
  • Encryption: Encrypt | AlreadyEncrypted options.
  • ExistingBitLockerKey: the drive's existing BitLocker key, required only if the drive is already encrypted.

Each row you add to this file defines another drive that the data will be copied to.
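Because driveSet.csv is plain CSV, it can be generated instead of edited by hand, which also catches typos in the option values before the tool runs. A minimal Python sketch, assuming the column names and option values listed above; the helpers `drive_row` and `write_drive_set`, and the drive letters used, are hypothetical.

```python
import csv
import io

# Column order as described for driveSet.csv.
DRIVESET_COLUMNS = ["DriveLetter", "FormatOption", "SilentOrPromptOnFormat",
                    "Encryption", "ExistingBitLockerKey"]

# Allowed values per option column, taken from the list above.
ALLOWED_VALUES = {
    "FormatOption": {"Format", "AlreadyFormatted"},
    "SilentOrPromptOnFormat": {"SilentMode", "PromptOnFormat"},
    "Encryption": {"Encrypt", "AlreadyEncrypted"},
}


def drive_row(letter, format_option="Format", prompt="PromptOnFormat",
              encryption="Encrypt", bitlocker_key=""):
    """Build and validate one driveSet row (one destination drive)."""
    row = {
        "DriveLetter": letter,
        "FormatOption": format_option,
        "SilentOrPromptOnFormat": prompt,
        "Encryption": encryption,
        "ExistingBitLockerKey": bitlocker_key,
    }
    for column, allowed in ALLOWED_VALUES.items():
        if row[column] not in allowed:
            raise ValueError(f"{column} must be one of {sorted(allowed)}")
    # The existing key only applies to drives that are already encrypted.
    if encryption == "AlreadyEncrypted" and not bitlocker_key:
        raise ValueError("AlreadyEncrypted drives need an ExistingBitLockerKey")
    return row


def write_drive_set(rows, stream):
    """Write the header plus one line per drive to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=DRIVESET_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_drive_set([drive_row("H"), drive_row("I")], buffer)
    print(buffer.getvalue())
```

The sketch defaults to PromptOnFormat because formatting wipes the drive; to write the real file, pass a stream opened with `open("driveset.csv", "w", newline="")`.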

The dataSet.csv file, in turn, contains these columns:

  • BasePath: contains the source where the data is located.
  • DstItemPathOrPrefix: contains the virtual directory in your Azure storage account.
  • ItemType: block | page options.
  • Disposition: rename | no-overwrite | overwrite options.
  • MetadataFile: contains the name of an XML file with metadata for the destination blobs.
  • PropertiesFile: contains the name of an XML file with properties for the destination blobs.

By adding rows to this file, you can copy data from several sources to the drives specified in the driveSet file.
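The dataSet file can be generated the same way, with one row per source. A minimal Python sketch, assuming the column names and option values listed above; the helpers `data_row` and `write_data_set`, the source paths and the container name are hypothetical. Leaving MetadataFile and PropertiesFile empty when unused is also an assumption.

```python
import csv
import io

# Column order as described for dataSet.csv.
DATASET_COLUMNS = ["BasePath", "DstItemPathOrPrefix", "ItemType",
                   "Disposition", "MetadataFile", "PropertiesFile"]
ITEM_TYPES = {"block", "page"}
DISPOSITIONS = {"rename", "no-overwrite", "overwrite"}


def data_row(base_path, destination, item_type="block", disposition="rename",
             metadata_file="", properties_file=""):
    """Build and validate one dataSet row (one source to copy)."""
    if item_type not in ITEM_TYPES:
        raise ValueError(f"ItemType must be one of {sorted(ITEM_TYPES)}")
    if disposition not in DISPOSITIONS:
        raise ValueError(f"Disposition must be one of {sorted(DISPOSITIONS)}")
    return {
        "BasePath": base_path,
        "DstItemPathOrPrefix": destination,
        "ItemType": item_type,
        "Disposition": disposition,
        "MetadataFile": metadata_file,      # left empty when unused (assumption)
        "PropertiesFile": properties_file,  # left empty when unused (assumption)
    }


def write_data_set(rows, stream):
    """Write the header plus one line per source to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=DATASET_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sources: a local folder and a network share.
    rows = [
        data_row(r"D:\Photos", "mycontainer/photos/"),
        data_row(r"\\fileserver\share\videos", "mycontainer/videos/",
                 disposition="no-overwrite"),
    ]
    buffer = io.StringIO()
    write_data_set(rows, buffer)
    print(buffer.getvalue())
```

Mixing a local folder and a network share in one file is exactly the multi-source case described above, with no per-source scripting.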

To complete the process, send the drives containing the copied data to the Azure data centre and attach, in the Azure Portal, the journal files that the tool generated for each copy session. Once the data centre receives the drives, the data is usually processed and transferred within a few days. When the transfer has finished, the data is directly available in your storage account.

