Saturday, 13 January 2018

Introduction to Azure Data Lake

Introduction
This is my first article related to Azure. It is dedicated to my late father, Subal Chandra Das, who always inspired me to do something new. I miss you a lot, Baba.


This article covers the general architecture of Azure Data Lake. I hope it will be a good foundation for getting started with Azure Data Lake. It reflects my own understanding of the service.

In the coming days we will move on to more advanced topics. I hope you find it informative.

What is a Data Lake
Before jumping into Azure Data Lake, we have to understand the concept behind a data lake.

A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
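To make the idea of a flat store with unique identifiers and metadata tags concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the concept, not how Azure Data Lake is implemented; the class name FlatDataLake, its methods, and the tag names (source, format, year) are all made up for this example.

```python
import uuid


class FlatDataLake:
    """Toy in-memory lake: a flat namespace of raw objects plus metadata tags.

    Hypothetical illustration only; not an Azure API.
    """

    def __init__(self):
        self._objects = {}  # object_id -> (raw_bytes, tags)

    def put(self, raw: bytes, **tags) -> str:
        """Store raw bytes untouched and return a generated unique identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (raw, dict(tags))
        return object_id

    def find(self, **wanted):
        """Return ids of objects whose tags match every wanted key/value pair."""
        return [
            oid
            for oid, (_, tags) in self._objects.items()
            if all(tags.get(key) == value for key, value in wanted.items())
        ]

    def get(self, object_id: str) -> bytes:
        """Return the raw bytes exactly as they were stored."""
        return self._objects[object_id][0]


lake = FlatDataLake()
csv_id = lake.put(b"id,amount\n1,9.50\n2,12.00\n", source="pos", format="csv", year=2018)
log_id = lake.put(b'{"event": "login", "user": "a"}', source="web", format="json", year=2018)
img_id = lake.put(b"\x89PNG...", source="web", format="png", year=2017)

# A "business question" narrows the lake down to a smaller, relevant set.
web_2018 = lake.find(source="web", year=2018)
```

Nothing is transformed on the way in: the CSV, JSON, and image bytes sit side by side, and only the tags are used to select the subset that will be analyzed.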

Unlike a data warehouse, a data lake keeps data in its native format and handles the three Vs of big data (volume, velocity, and variety) while providing tools for analysis, querying, and processing. It removes many of the restrictions of a typical data warehouse system by offering virtually unlimited space, no practical limit on file size, schema on read, and various ways to access data (including programming, SQL-like queries, and REST calls).
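Schema on read is easier to see in a small example. The sketch below, in plain Python, keeps two raw files in different formats and applies a schema only when the data is read. The file names, the order_id and amount fields, and the read_orders helper are assumptions invented for this illustration; they do not come from any Azure service.

```python
import csv
import io
import json

# Raw files stay in their native formats; no schema is enforced when they land.
# (Hypothetical sample data for illustration.)
RAW_FILES = {
    "sales.csv": "order_id,amount\n1,9.50\n2,12.00\n",
    "sales.json": '[{"order_id": 3, "amount": "4.25"}]',
}


def read_orders(name: str, raw: str):
    """Apply the (order_id: int, amount: float) schema while reading."""
    if name.endswith(".csv"):
        rows = csv.DictReader(io.StringIO(raw))
    elif name.endswith(".json"):
        rows = json.loads(raw)
    else:
        raise ValueError(f"no reader for {name}")
    return [
        {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
        for row in rows
    ]


# The schema is imposed here, at query time, not at load time.
orders = [order for name, raw in RAW_FILES.items() for order in read_orders(name, raw)]
total = sum(order["amount"] for order in orders)
```

A different question could read the same raw files with a different schema, which is the flexibility a schema-on-write warehouse does not give you.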



With the emergence of Hadoop (including HDFS and YARN), the benefits of a data lake, previously available only to the most resource-rich companies such as Google, Yahoo, and Facebook, became a practical reality for just about anyone. Organizations that had been generating and gathering data on a large scale but had struggled to store and process it in a meaningful way now have more options.


Features of Azure Data Lake
Azure Data Lake is a new kind of data lake offering from Microsoft Azure. Its features are listed below.

• The ability to store and analyze data of any kind and size.
• Multiple access methods including U-SQL, Spark, Hive, HBase, and Storm.
• Built on YARN and HDFS.
• Dynamic scaling to match your business priorities.
• Enterprise-grade security with Azure Active Directory.
• Managed and supported with an enterprise-grade SLA.


Parts of Azure Data Lake
Broadly, Azure Data Lake is divided into three parts:



Azure Data Lake Store



The Data Lake Store provides a single repository where organizations can upload data of virtually any volume. The store is designed for high-performance processing and analytics from HDFS applications and tools, including support for low-latency workloads. Data in the store can be shared for collaboration with enterprise-grade security.

Azure Data Lake Analytics
Data Lake Analytics is a distributed analytics service built on Apache YARN that complements the Data Lake Store. The analytics service can handle jobs of any scale instantly with on-demand processing power and a pay-as-you-go model that’s very cost-effective for short-term or on-demand jobs. It includes U-SQL, a language that unifies the benefits of SQL with the expressive power of user code and runs on a scalable distributed runtime.
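The idea of combining declarative SQL with user code can be shown with a small analogy. The sketch below is not U-SQL (U-SQL uses C# for its user code and runs on Data Lake Analytics); it uses Python's built-in sqlite3 module, where a Python function is registered and then called from inside a SQL query. The table logins, the sample e-mail addresses, and the domain_of function are invented for this example.

```python
import sqlite3


def domain_of(email: str) -> str:
    """User code: extract the lower-cased domain from an e-mail address."""
    return email.rsplit("@", 1)[-1].lower()


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("domain_of", 1, domain_of)

conn.execute("CREATE TABLE logins (email TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?)",
    [("a@Contoso.com",), ("b@contoso.com",), ("c@example.org",)],
)

# The query stays declarative; the custom logic lives in ordinary code.
rows = conn.execute(
    "SELECT domain_of(email) AS domain, COUNT(*) AS n "
    "FROM logins "
    "GROUP BY domain_of(email) "
    "ORDER BY n DESC, domain"
).fetchall()
conn.close()
```

The grouping, counting, and ordering are expressed in SQL, while the string handling that SQL is clumsy at is written as a normal function. U-SQL applies the same split at data lake scale.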

Azure HDInsight
Azure HDInsight is a full-stack Hadoop Platform as a Service offering from Azure. Built on top of the Hortonworks Data Platform (HDP), it provides Apache Hadoop, Spark, HBase, and Storm clusters.


Hope you like it.

Posted by: MR. JOYDEEP DAS



21 comments:

  1. Good article which gives clear insight about it.

  2. Joydeep da, it's a nice topic.

  3. Good intro on Data Lake to start

  4. Good introduction on Data Lake to beginners

  5. Thanks for sharing an informative blog. Keep rocking and bring more details. I like the helpful info you provide in your articles. I’ll bookmark your weblog and check again here regularly. I am quite sure I will learn much new stuff right here! Good luck for the next!


  6. Thanks for the wonderful Post

  7. As the name suggests, Data Lake is a cloud-based repository that can store an almost limitless amount of data in its raw format, whether you have it in a database or as large files. Once the data is in the Lake, you should be able to access it using multiple tools and techniques, including .NET, Java, and Python. (Note: you should be able to do this, as Data Lake is still a preview service, but it's not yet open to the public).

  8. Nice Article. Can you provide me some information on SQL Server Certification

  9. Big data technologies come into play when there is a need to collect, filter, aggregate, store, analyze, and visualize large datasets. These datasets can be of any type (text, video, audio, or image) and can be structured or unstructured. The collection can be for any reason, although it is usually about business operations and marketing. Big data technologies include data integration, data cleansing, data mining, open data, big data storage, and big data analytics.

  10. Thanks for this post.
  11. Thanks for this wonderful blog, Keep sharing your thoughts like this...
  12. This comment has been removed by the author.

  13. Great Post. Very informative. Keep Sharing!!

  14. Nice article and a great way of learning about the topic in an easy language.