Copy Data from Azure Storage

Cloud Journey
7 min read · Aug 8, 2021


Overview

In a hybrid, multi-cloud architecture, you may face situations where you must migrate digital content from one cloud platform to another. In this article, we will explore options for securely copying data out of Azure Blob Storage.

We will focus on managed services with out-of-the-box features; we won't cover writing your own code.

Potential designs:

  • AWS Data Pipeline or AWS Glue
  • Snowflake external stage
  • Snowflake cross-account replication
  • Azure Data Factory S3 Connector
  • Azure Data Factory REST Connector
  • Azure Data Factory SFTP Connector

AWS Data Pipeline or Glue

Based on the AWS documentation, Data Pipeline works with a handful of AWS data stores; a typical use case is shipping logs and generating reports.

https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/datapipeline-related-services.html

AWS Glue is a fully managed ETL service; it does not list Azure storage as either a source or a target.

https://docs.aws.amazon.com/glue/latest/dg/how-it-works.html

Snowflake External Stage

The Snowflake SQL commands to define an external stage support the URL and credential specifications for Azure Blob Storage.
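As a rough sketch, the stage definition looks like this in Snowflake SQL (the integration name, tenant ID, account, and container below are all hypothetical placeholders):

```sql
-- Created once by an admin, then referenced by one or more stages
CREATE STORAGE INTEGRATION azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<your-aad-tenant-id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer/');

-- The stage itself; no credential needed when it references the integration
CREATE STAGE my_azure_stage
  STORAGE_INTEGRATION = azure_int
  URL = 'azure://myaccount.blob.core.windows.net/mycontainer/load/';
```

Creating the integration generates the app registration discussed below, which an Azure admin must then consent to and grant an RBAC role.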

Per the Snowflake documentation, in terms of network security, when Snowflake is hosted on Azure you can configure the storage account firewall to allowlist the Snowflake VNet/subnet. Data transfer remains on the Azure backbone.

For IAM security, use an AAD service principal with an Azure Storage data-plane RBAC role. A storage account SAS is also supported, but it is not preferred.

When Snowflake is on AWS, it's not possible to allowlist a VNet/subnet in the Azure storage account firewall. Allowlisting IPs is possible, but Snowflake does not recommend it: the IPs are not static, so you would effectively need to allowlist every IP in an entire Azure region. Data will traverse the public network.

IAM Security

Snowflake provides two options for authentication and authorization on Azure storage.

  1. Generate a per-Snowflake-account app registration when defining the storage integration object. An Azure data-plane RBAC role (Storage Blob Data Reader or Storage Blob Data Contributor) is granted to the app registration.
  2. Create a SAS token from the storage account in your own subscription.

Option 1 is recommended by the vendor, since users don't need to provide a credential when creating or referencing an Azure storage stage. However, the per-account app registration makes it hard to achieve least privilege: one security principal will have access to all of your storage stages.

Shared access signatures are keys that grant permissions to storage resources; they should be protected in the same manner as a secret or password and used only over an HTTPS connection. In general this is not the recommended approach, and it adds operational complexity. But given the limitation of the per-account app registration, option 2 does provide better segmentation.

Azure also supports the user delegation SAS (https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas), which is less easily compromised than a storage account access key.
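To see why a SAS must be guarded like a password: the signature portion is just an HMAC-SHA256 of a canonical string-to-sign under the account key, so anyone holding the key (or a leaked token) can authorize requests. A stdlib-only sketch with a dummy key and a simplified string-to-sign (the real string-to-sign is a longer newline-joined list of permissions, expiry, canonical resource, and so on):

```python
import base64
import hashlib
import hmac

def sign_sas_string(account_key_b64: str, string_to_sign: str) -> str:
    """HMAC-SHA256 the canonical string-to-sign with the storage
    account key, as the service SAS scheme does."""
    key = base64.b64decode(account_key_b64)
    sig = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(sig).decode("utf-8")

# Dummy key; a real account key is a 64-byte base64 secret.
dummy_key = base64.b64encode(b"not-a-real-account-key").decode("utf-8")
print(sign_sas_string(dummy_key, "r\n2021-08-08T00:00:00Z\n/blob/myaccount/mycontainer"))
```

A user delegation SAS replaces the long-lived account key in this scheme with a short-lived key obtained via AAD, which is why it is harder to abuse.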

IAM Security Design Options

With option 1, you have to accept the risk that comes with the product limitation: one security principal has access to all of your storage stages, so if it is compromised, the impact scope is much broader.

You could also create multiple Snowflake accounts to achieve segmentation.

The Azure user delegation SAS is an option worth trying as well.

Snowflake Database Replication

It's also possible to replicate between a Snowflake account on Azure and one on AWS. See Introduction to Database Replication Across Multiple Accounts — Snowflake Documentation.

I couldn't locate anything regarding network security when replicating between two SaaS Snowflake account databases, so I assume we have to rely on encryption. Each replication job has a unique encryption key, and you can add further layers of protection via the Tri-Secret Secure feature.

When traversing the internet is inevitable, we may want to strengthen encryption for extra protection.

Azure Data Factory S3 Connector

Azure Data Factory is a fully managed, serverless data integration service with more than 90 built-in, maintenance-free connectors at no added cost.

While Azure storage can be both source and sink in ADF, AWS S3 can only be a source. That means you can use ADF to copy data from S3 into Azure, but not the other way around.

My best guess is that any vendor would prefer to host the data rather than send it to a competitor's platform.

Azure Data Factory REST Connector

Is it possible to use the ADF REST connector to talk to S3? Based on the supported authentication types, it seems not.

The ADF generic REST connector supports Anonymous plus API key, Basic, AadServicePrincipal, and ManagedServiceIdentity authentication. None of these options work with the AWS S3 REST API.
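The mismatch is structural: authenticated S3 REST calls require AWS Signature Version 4, a chained HMAC-SHA256 scheme that none of the generic connector's auth types can produce. A stdlib-only sketch of just the signing-key derivation step, using a dummy secret key (deriving the full request signature additionally involves a canonical request and string-to-sign):

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the AWS Signature Version 4 signing key via chained
    HMAC-SHA256, per the scheme in the AWS documentation."""
    def h(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = h(("AWS4" + secret_key).encode("utf-8"), date)   # scope: day
    k_region = h(k_date, region)                              # scope: region
    k_service = h(k_region, service)                          # scope: service
    return h(k_service, "aws4_request")                       # final signing key

key = sigv4_signing_key("dummySecretKeyEXAMPLE", "20210808", "us-east-1", "s3")
print(key.hex())
```

Nothing in the ADF generic connector lets you plug in this kind of per-request derived signature, which is why the S3 REST API is out of reach for it.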

Azure Data Factory SFTP Connector

Azure Data Factory supports the SFTP connector as both source and sink. AWS Transfer Family is a secure transfer service that lets you move files into and out of AWS storage services, so ADF can push blobs into S3 through an SFTP endpoint.

IAM Security

AWS Transfer Family provides fine-grained access control through IAM roles, IAM policies, and session policies. You create an IAM role with an IAM policy defining what S3 permissions a user has, and you also need to establish a trust relationship between AWS Transfer Family and that IAM role. Overall, AWS Transfer Family supports three types of identity providers.
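The trust relationship mentioned above is expressed as a trust policy attached to the IAM role; a minimal sketch, with the S3 permissions themselves living in a separate permissions policy on the same role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "transfer.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

This lets the Transfer Family service assume the role on behalf of the SFTP user, so the user never holds AWS credentials directly.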

For Azure Data Factory, use its managed identity, and grant storage account RBAC permissions only to that managed identity.

Private Pattern

Network traffic remains private and does not traverse the public internet, but it uses your own network bandwidth when crossing CSPs. This design works when both Azure and AWS are directly connected to your on-prem data center.

Private Pattern Network Security

The ADF self-hosted integration runtime resides in your Azure VNet; you can use either a service endpoint or a private endpoint to access the storage blob as the source.

In AWS, you create a VPC endpoint with internal access for the AWS Transfer SFTP server. Inbound control can be applied at the AWS security group.

Internet Facing Pattern

Data transfer does not use your own network bandwidth; it's PaaS to PaaS and traverses the public network.

Internet Facing Pattern Network Security

Azure storage supports a resource-level firewall rule to allow ADF as a trusted service. Most likely the traffic remains on the Azure backbone, since it runs between two Azure services.

The Azure IR also has static IP ranges, which can be used for filtering in data stores, network security groups (NSGs), and firewalls to control inbound access from the Azure integration runtime. For example, if the Azure region is Australia East, you can get the IP range list from the DataFactory.AustraliaEast service tag.

(Note: an Azure integration runtime with Managed Virtual Network enabled does not support fixed IP ranges.)
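Those per-region ranges come from the downloadable Azure service tags JSON file. A minimal sketch of pulling the DataFactory.AustraliaEast prefixes out of it, against an abbreviated in-memory sample (the prefixes below are made up; the real file lists every tag in the cloud):

```python
# Abbreviated stand-in for the downloaded service-tags JSON document
service_tags = {
    "values": [
        {"name": "DataFactory.AustraliaEast",
         "properties": {"addressPrefixes": ["13.70.64.64/28", "13.75.253.96/28"]}},
        {"name": "Storage.AustraliaEast",
         "properties": {"addressPrefixes": ["13.70.99.16/28"]}},
    ]
}

def prefixes_for(tag_name: str, tags: dict) -> list[str]:
    """Return the address prefixes registered under one service tag."""
    for tag in tags["values"]:
        if tag["name"] == tag_name:
            return tag["properties"]["addressPrefixes"]
    return []

print(prefixes_for("DataFactory.AustraliaEast", service_tags))
```

The resulting list is what you would feed into the storage firewall or NSG rules; re-download the file periodically, since the published ranges change.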

In AWS, a static Elastic IP is attached to the VPC endpoint, allowing external clients to use the SFTP transfer service. Firewall rules can be applied at the AWS security group.

Conclusion

We researched several options for copying data from Azure blob storage into SaaS or another cloud platform. Some of the options clearly do not work; others might work but carry security risks.

A POC might be the best way to learn more about each design idea and confirm whether the design will work as expected.

By the way, I used draw.io to draw the diagrams.



Written by Cloud Journey

All blogs are strictly personal and do not reflect the views of my employer. https://github.com/Ronnie-personal
