Amazon Web Services recently launched its low-cost big data analysis service, Redshift. Amazon Redshift is a new data warehouse for big data on the cloud. It is a specialized type of relational database, optimized for high-performance analysis and reporting of very large datasets.
Amazon also promises 10 times the performance at one-tenth the cost. You can pay by the hour, or use reserved instance pricing to lower your price to under $1,000 per TB per year, a tenth the cost of most traditional data warehousing solutions.
Before Redshift, users had to turn to Hadoop for querying over TBs of data. The problem with Hadoop, though, is that its native query mechanism is MapReduce code, which takes you completely out of the comfort zone of SQL queries. You query a Redshift database with SQL, making it easier to integrate with most applications. Amazon Redshift is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more. It has also been found to be a lot faster than Hadoop and Hive in many use cases.
Scalable - The ability to provision huge databases as needed, without going through a costly and slow procurement process to obtain the hardware and software.
Fast - Amazon Redshift delivers fast query and I/O performance for virtually any size dataset.
Fully Managed - Amazon Redshift manages all the work needed to set up, operate, and scale a data warehouse, from provisioning capacity to monitoring and backing up the cluster to applying patches and upgrades. By handling all these time-consuming, labor-intensive tasks, Amazon Redshift frees you up to focus on your data and business insights.
Compatible - Amazon Redshift is certified by Pentaho, Jaspersoft and MicroStrategy, with additional business intelligence tools coming soon. You can connect your SQL client or business intelligence tool to your Amazon Redshift data warehouse cluster using standard PostgreSQL JDBC or ODBC drivers.
Redshift is based on industry-standard PostgreSQL, so most existing applications will need minimal changes.
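Because Redshift speaks the PostgreSQL wire protocol, a standard Python PostgreSQL driver such as psycopg2 can talk to a cluster. A minimal sketch, assuming such a setup (the host, database name, and credentials below are placeholders, not a real endpoint):

```python
def redshift_dsn(host, port, dbname, user, password):
    """Build a libpq-style connection string for a Redshift cluster."""
    return ("host={} port={} dbname={} user={} password={}"
            .format(host, port, dbname, user, password))

def run_query(dsn, sql):
    """Open a connection, run one SQL statement, and return all rows."""
    import psycopg2  # third-party PostgreSQL driver, installed separately
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()

# Placeholder endpoint; a real cluster endpoint comes from the AWS console.
dsn = redshift_dsn("examplecluster.abc123.us-east-1.redshift.amazonaws.com",
                   5439, "dev", "masteruser", "secret")
# rows = run_query(dsn, "SELECT COUNT(*) FROM sales;")
```

The same connection string works with any tool built on the PostgreSQL drivers, which is what makes integration with existing SQL clients straightforward.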
Using columnar storage for database tables is an important factor in optimizing analytic query performance, because it drastically reduces overall disk I/O by cutting the amount of data that must be loaded from disk.
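The effect of a columnar layout on I/O can be illustrated with a toy sketch in plain Python (a deliberately simplified model, not Redshift's actual storage engine): scanning one column of a row-oriented table means reading every row in full, while a column-oriented layout reads only the values of that column.

```python
# Toy table: 1000 rows of (id, name, price) stored row-wise.
rows = [(i, "name{}".format(i), i * 1.5) for i in range(1000)]

def row_scan(rows, col_index):
    """Scan one column in a row-oriented layout: every field of every
    row is 'read from disk' even though only one column is needed."""
    touched, total = 0, 0.0
    for row in rows:
        touched += len(row)
        total += row[col_index]
    return total, touched

# Transpose into a column-oriented layout.
columns = list(zip(*rows))

def column_scan(columns, col_index):
    """Scan one column in a columnar layout: only that column's values
    are read."""
    col = columns[col_index]
    return sum(col), len(col)

row_total, row_touched = row_scan(rows, 2)
col_total, col_touched = column_scan(columns, 2)
assert row_total == col_total          # same query result...
assert col_touched * 3 == row_touched  # ...reading a third of the fields
```

With three columns the columnar scan touches a third of the data; real analytic tables often have dozens of columns, which is where the savings become dramatic.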
The core infrastructure component of an Amazon Redshift data warehouse is a cluster. A cluster is composed of one or more compute nodes. If a cluster is provisioned with two or more compute nodes, an additional leader node coordinates the compute nodes and handles external communication. Your SQL client communicates with the leader node, which in turn coordinates query execution with the compute nodes.
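Conceptually, the leader node parses a query, hands each compute node the slice of work for the data it stores, and merges the partial results. A toy Python sketch of that flow (illustrative only; Redshift's actual execution engine is far more sophisticated):

```python
def compute_node_count(data_slice):
    """Each compute node counts only the rows it stores locally."""
    return len(data_slice)

def leader_node_count(slices):
    """The leader node coordinates the compute nodes and combines
    their partial results into the final answer."""
    partials = [compute_node_count(s) for s in slices]
    return sum(partials)

# 100 rows distributed unevenly across three compute nodes.
node_slices = [list(range(0, 40)), list(range(40, 90)), list(range(90, 100))]
assert leader_node_count(node_slices) == 100
```

The SQL client only ever sees the leader node; the fan-out and merge happen behind it, which is why adding compute nodes scales query throughput without changing the application.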
Redshift integrates with various data loading and ETL tools and with best-in-class BI, data mining, and analytics tools. In the coming days it could be a real game changer, and it will be interesting to see how it takes on the big players in data warehousing.
e-Zest's goal is to help companies migrate their large datasets to the AWS cloud, and to help them analyze their data as efficiently as possible using tools like Amazon Redshift.