Introduction

This article walks through creating a Scala program in IntelliJ and debugging it locally, then debugging the same program on an HDInsight Spark cluster, and finally submitting it to the cluster remotely as a Spark job.

Steps Involved:

From IntelliJ, create an application that reads data from Blob storage and writes the count output back to Blob storage
Run the application on HDInsight (IntelliJ HDInsight submission)
Create the JAR file
Upload the JAR file to Blob storage
Install the cURL utility for command-line submission
Submit the Spark job from the command line with cURL
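
As a rough illustration of the first step, the application could look something like the sketch below. It is not the program from the download link. The object name BlobLineCount and both default paths are hypothetical placeholders, and it assumes the Spark 2.x or later libraries are on the classpath. On a cluster whose primary storage is Blob storage, a wasb:/// path with no container or account name resolves against the cluster's default container.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and the default paths are assumptions.
object BlobLineCount {
  def main(args: Array[String]): Unit = {
    // Placeholder paths in the cluster's default Blob container.
    // Pass real paths as the first and second program arguments.
    val inputPath  = if (args.length > 0) args(0) else "wasb:///example/data/input.txt"
    val outputPath = if (args.length > 1) args(1) else "wasb:///example/output/linecount"

    // No master is set here, so spark-submit or the cluster supplies it.
    // For a local debug run, set one explicitly, e.g. .master("local[*]").
    val spark = SparkSession.builder().appName("BlobLineCount").getOrCreate()

    try {
      val sc = spark.sparkContext

      // Count the lines of the input file.
      val lineCount = sc.textFile(inputPath).count()

      // Write the count as a single-partition text output.
      // saveAsTextFile fails if the output directory already exists.
      sc.parallelize(Seq(s"lineCount=$lineCount"), numSlices = 1)
        .saveAsTextFile(outputPath)
    } finally {
      spark.stop()
    }
  }
}
```

Taking the paths as arguments keeps the same JAR usable for the local debug run, the IntelliJ submission, and the cURL submission, with only the arguments changing between them.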

The HDInsight Spark cluster must use Blob storage as its primary storage. This process does NOT work if the cluster's primary storage is Data Lake Storage (ADLS).

Download Link:

https://drive.google.com/file/d/1dMKsgH1dpYsMtgZu2tzhEinxtxGS3d2d/view?usp=sharing