SaaSification for Big Data application

Written by Shailesh Kulkarni | Jul 5, 2017 11:45:00 AM

“Product as a Service” and “Software as a Service” models scale well and can be cost-effective for both service owners and consumers. They also make business operations and service management easier to run. In one such instance, we had a customer with an on-premises Big Data (Hadoop) application that they wanted to offer as a multi-tenant SaaS solution. The customer asked for a solution that would require zero to minimal changes to the existing platform’s codebase. Other requirements were a relatively low cloud infrastructure cost per customer and automated on-boarding of new tenants.

So what could be an effective solution for such a case?

When big data, SaaS and multi-tenancy have to be handled together, one candidate solution is built on the AWS Big Data stack, with S3 as the scalable storage layer. It also calls for an Automated Deployment and Verification Framework, which lets the Admin/Ops team carry out an automated deployment for each new customer. A SaaS management tool would help the same team on-board new customers. The tool should support the following functions:

  • Capture customer information and configuration
  • Trigger customer on-boarding job
  • View customer on-boarding jobs
  • View information that helps troubleshoot failed on-boarding jobs
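
The four functions above can be sketched as a small in-memory job tracker. This is only an illustration under stated assumptions: the names `OnboardingTool`, `OnboardingJob` and `JobStatus` are hypothetical, and the `deploy` callable stands in for the Automated Deployment and Verification Framework, whose real interface the article does not describe.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class JobStatus(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class OnboardingJob:
    """One on-boarding run for one tenant (hypothetical record layout)."""
    tenant_id: str
    config: Dict[str, str]
    status: JobStatus = JobStatus.PENDING
    log: List[str] = field(default_factory=list)  # kept for troubleshooting


class OnboardingTool:
    """In-memory stand-in for the SaaS management tool."""

    def __init__(self, deploy: Callable[[OnboardingJob], None]) -> None:
        # `deploy` is an assumed hook into the deployment framework;
        # it should raise an exception when deployment or verification fails.
        self._deploy = deploy
        self._jobs: List[OnboardingJob] = []

    def capture_customer(self, tenant_id: str, config: Dict[str, str]) -> OnboardingJob:
        """Capture customer information and configuration."""
        job = OnboardingJob(tenant_id=tenant_id, config=dict(config))
        self._jobs.append(job)
        return job

    def trigger(self, job: OnboardingJob) -> OnboardingJob:
        """Trigger the on-boarding job and record its outcome."""
        job.log.append("deployment started")
        try:
            self._deploy(job)
        except Exception as exc:
            job.status = JobStatus.FAILED
            job.log.append(f"deployment failed: {exc}")
        else:
            job.status = JobStatus.SUCCEEDED
            job.log.append("deployment verified")
        return job

    def list_jobs(self) -> List[OnboardingJob]:
        """View all on-boarding jobs."""
        return list(self._jobs)

    def troubleshooting_info(self, tenant_id: str) -> List[str]:
        """Return the log lines recorded for a tenant's jobs."""
        return [line for job in self._jobs
                if job.tenant_id == tenant_id
                for line in job.log]
```

An operator would call `capture_customer`, pass the returned job to `trigger`, and read `troubleshooting_info` for any job whose status ends up as `FAILED`. A real tool would persist jobs rather than hold them in memory.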

Let us now look at the technical details of the solution. AWS S3 stores all of a tenant’s data. Each tenant has a standing but lightweight (zero data node) AWS EMR cluster for running MapReduce and Hive jobs. Other specifications include:

  • Raw access to S3 via HDFS Toolsets and Java APIs
  • Hive JDBC based access for BI and Reporting Tools
  • Hive Metastore stored externally, so that Hive metadata can be recovered if the EMR cluster fails
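
To make the per-tenant layout more concrete, here is a minimal sketch that assembles the parameters for one tenant's cluster as a plain dictionary. Several things are assumptions rather than details from the project: "zero data node" is read as a cluster with a master instance group only, each tenant is given its own prefix in a shared S3 bucket, the external metastore is reached over JDBC through `hive-site` properties, and the function names, bucket layout, default instance type and role names are illustrative. The dictionary keys follow the request shape of the EMR `RunJobFlow` API; nothing here calls AWS.

```python
from typing import Any, Dict


def tenant_prefix(bucket: str, tenant_id: str) -> str:
    """S3 location holding one tenant's data (assumed layout)."""
    return f"s3://{bucket}/{tenant_id}/"


def build_tenant_cluster_request(
    tenant_id: str,
    bucket: str,
    release_label: str,
    metastore_jdbc_url: str,
    jdbc_driver: str,
    metastore_user: str,
    metastore_password: str,
    master_instance_type: str = "m4.large",  # assumption, size to the workload
) -> Dict[str, Any]:
    """Build RunJobFlow-style parameters for a standing, master-only cluster."""
    base = tenant_prefix(bucket, tenant_id)
    return {
        "Name": f"tenant-{tenant_id}",
        "ReleaseLabel": release_label,
        "LogUri": f"{base}emr-logs/",
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            # Master group only: no core nodes, so no HDFS data nodes.
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "InstanceType": master_instance_type,
                    "InstanceCount": 1,
                }
            ],
            # Keep the cluster standing between jobs.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "Configurations": [
            {
                # Point Hive at a metastore database outside the cluster,
                # so metadata survives a cluster failure.
                "Classification": "hive-site",
                "Properties": {
                    "javax.jdo.option.ConnectionURL": metastore_jdbc_url,
                    "javax.jdo.option.ConnectionDriverName": jdbc_driver,
                    "javax.jdo.option.ConnectionUserName": metastore_user,
                    "javax.jdo.option.ConnectionPassword": metastore_password,
                    "hive.metastore.warehouse.dir": f"{base}warehouse/",
                },
            }
        ],
        # Default EMR role names; a real deployment may use per-tenant roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, a dictionary like this could in principle be passed as keyword arguments to the EMR client's `run_job_flow` call. In practice the metastore password should come from a secrets store rather than a plain argument.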

Offering the big data application as a SaaS solution in this way has several advantages. First, AWS S3 is a low-cost, scalable data storage option for each customer. Second, AWS EMR and S3 are reachable through access methods already supported in the Hadoop ecosystem, so existing systems can work with the stack with minimal code changes. Finally, the approach enables automated provisioning of big data cloud resources when on-boarding new customers. This is how SaaSification of a big data application benefits both the service provider and the customer.