Data warehouse with Microsoft Azure & Snowflake
Faster time to market often drives companies of all sizes to move their legacy data warehouse applications to the cloud. Other factors include Scalability, Increased security, cost reduction, enhanced operations, and elasticity. However, all these benefits could be neutralized by the biggest challenge Information Technology (IT) departments face, namely, choosing the best path to moving legacy data warehouse applications to the cloud.
For many large organizations though, it is the go-to tool to utilize their data effectively. It’s not a matter of whether you go to the cloud to manage your enterprise data, but how and when are the critical questions.
In this article, we highlight some of the key benefits of building a modern Data Warehouse on Microsoft Azure using Snowflake and benefits of moving your Enterprise Data Warehouse (EDW) to the cloud and how you can get the process started.
Why Modernize Data Warehouse?
For many large organizations, Enterprise Data Warehouses (EDWs) are their lifeblood which supports a variety of workloads, including financial reporting, customer satisfaction analysis, manufacturing quality, shipping & logistics, as well as ad hoc workloads from individual business units. This ability to support so many fields makes EDW the go-to tool for any organization looking to employ their data efficiently.
Although many organizations still have their EDWs based on site, there is a growing inclination for moving this data to the cloud. As operational data volumes continue to grow at exponential rates, this gives rise to service-level expectations, and the need to integrate structured warehouse data with unstructured data in a data lake becomes greater.
Here are some critical points due to which enterprises are increasingly thinking to modernize their data warehouse:
- Build and evolve over a decade with too many legacy components
-Too much data silos
- Difficulties in upgrade and maintenance - Not design for Cloud
- Lack of SLAs
- Challenges in scaling on demand - Challenges to have global visibility, reach and monitoring
- Too many systems with custom, complex and duplicate infrastructure having a lack of engineering systems, state of art monitoring and replication mechanism
The benefits of a cloud-based data warehouse
Having a cloud-based data warehouse simplifies time-consuming and costly management, administration, and tuning activities that are typical of on-premises data warehouses. But beyond these obvious advantages, there’s more to be gained from moving to the cloud. Some of the key advantages of a cloud-based data warehouse are listed below.
- Scalability
- Start-up costs are a fraction of on-premises
- Reduce ongoing costs
- Increased security
- Simplified operations
- Allows for new capabilities
- No disruption to internal users
- Easily change user numbers
- Access to a virtual team of experts
Building a Modern Data Warehouse on Microsoft Azure using Snowflake
The first question one might get is - why build Data warehouse with Microsoft Azure & Snowflake? First, let’s get this answered.
Microsoft Azure: It is a modern intelligent cloud platform by Microsoft having a plethora of Platform and infrastructure services embracing Windows and Open Source ecosystems and capable of catering all the Public and Hybrid Cloud enterprise scenarios. It is one of the most secure, compliant with industry norms and standard and most trustworthy cloud when it comes to storing your data.
Snowflake: It is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake provides a very flexible & faster data warehouse which is more user-friendly than any other traditional data warehouse offerings.
Lifecycle of Modern Data Warehouse on Microsoft Azure
In the enterprise applications scenario, we see lot of data getting stored in different forms and features. In the era of Artificial Intelligence and predicative analysis using Machine Learning (which is a subset of AI), we need large, scalable, secure, robust and highly available, on demand data stores.
Image Source – Microsoft reference presentation of “Microsoft Azure Day Workshop”.
Above diagram represents how Modern Microsoft Azure based Data Warehouse should look like. If you see the entire warehouse cycle have 5 main steps –
- Ingest – Building a pipeline or a single endpoint to flow your data into your warehouse irrespective of whether it is structured or unstructured. Traditionally it is like ETL job
- Store – Storing massive amounts of structured/non structured data with optimized IOPS
- Preparation – Data cleansing to ensure redundancy, duplication removed and refined data available for predictive analysis
- Serve – On demand availability of data with enterprise grade SLAs and optimized IOPS for faster processing
- Visualization – Real time state of art visualization of data in different forms. Easy integration into variety of platforms and apps
Data Services in Microsoft Azure
Modern data and analytics estate can transform business with five verticals having a complementary Microsoft Azure services to it.
- On premises
- SQL Server - Lift & Shift and SaaS
- Azure SQL DB
- Azure SQL Managed Instance
- Azure MySQL
- Azure PostgreSQL
- Azure MariaDB - Globally Distributed
- Azure Cosmos DB - Analytics
- Azure SQL Datawarehouse
- Azure HD Insight
- Azure Databricks
- Azure Data Lake
- Azure Data Factory
- Azure Data Catalog
- Azure Stream Analytics - Visualization
- PowerBI
What is Snowflake and how it helps to modernize your data warehouse?
Snowflake is a data warehouse-as-a-service which provides complete relational database support for both structured data, such as CSV files and tables, and semi-structured data, including JSON, Parquet, etc., all within a single integrated solution. It does not require any management and features separate as compute, storage, and cloud services that can scale and change independently.
Around July 12, 2018, Snowflake announced its availability on Microsoft Azure Cloud Platform as DWaaS (Datawarehouse-as-a-Service). Snowflake is now offered in Azure as a cloud computing option to run Snowflake’s cloud-built data warehouse.
Here is the high-level structure of Snowflake on Azure
Image Source – Snowflake documentation here
You can also have Azure Active Directory Integration to manage your account and Sign-on for Snowflake. Read additional information here for integrating Azure AD with Snowflake.
Building Visualization over Snowflake Datawarehouse using PowerBI
Power BI is a business analytics solution (Power BI Desktop & Power BI service) that lets you visualize your data and share insights across your organization or embed them in your application or website. Through Power BI, you can connect to hundreds of data sources and bring your data to life with live dashboards and reports. Most users who work on Business Intelligence projects use Power BI Desktop to create reports, and then use the Power BI service to share their reports with others.
To connect PowerBI Desktop to Snowflake you need to install the Snowflake ODBC driver on your PowerBI Desktop machine. Once installed, you can use the built-in Snowflake data source for PowerBI. To connect to a Snowflake computing warehouse, select Get Data from the Home ribbon in Power BI Desktop. Select Database from the categories on the left, and you see Snowflake.
Image Source – Microsoft documentation here
Hope this article helps in addressing some of the key concerns relating to moving database applications to the cloud and building a modern Data Warehouse on Microsoft Azure using Snowflake. Please leave your thoughts in comments.