Introduction
In this article we will create a Scala program in IntelliJ, debug it locally, then debug the same program against an HDInsight Spark cluster, and finally submit the Spark job to the HDInsight Spark cluster remotely.
Steps Involved:
- From IntelliJ, create an application that reads data from Blob storage and writes the word-count output back to Blob storage
- Run the application in HDInsight (IntelliJ HDInsight submission)
- Create a JAR file
- Upload the JAR file to Blob storage
- Install the cURL utility for command-line submission
- Submit the Spark job with the cURL command line
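The first step above can be sketched as a minimal word-count application. This is an illustrative sketch, assuming spark-core is on the classpath and the cluster's primary storage is Blob storage (the `wasb://` scheme); the object name, input path, and output path are placeholders, not values from the article.

```scala
// Minimal word-count sketch for an HDInsight Spark cluster whose primary
// storage is Blob storage. Paths use wasb:/// (the cluster's default
// container); the concrete file names here are placeholders.
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    // Read the input file from the cluster's default Blob container.
    val lines = sc.textFile("wasb:///example/data/input.txt")

    // Classic word count: split on whitespace, pair each word with 1,
    // then sum the counts per word.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Write the counts back to Blob storage.
    counts.saveAsTextFile("wasb:///example/output/wordcount")

    sc.stop()
  }
}
```

When run locally for debugging, `setMaster("local[*]")` can be added to the `SparkConf`; when submitted to the cluster, the master is supplied by the cluster instead.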
The HDInsight Spark cluster must use Blob storage as its primary storage. This process does NOT work if the cluster's primary storage is Azure Data Lake Storage (ADLS).
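The final cURL submission step goes through the cluster's Livy endpoint. The sketch below only builds and prints the request body; the cluster name, admin password, `wasb://` JAR path, and class name are all placeholders you would replace with your own values.

```shell
# Sketch of a Livy batch submission. The JSON body names the JAR uploaded
# to the cluster's default Blob container and the main class to run.
# SparkSimpleApp.jar and WordCount are placeholder names.
BODY='{"file":"wasb:///example/jars/SparkSimpleApp.jar","className":"WordCount"}'
echo "$BODY"

# Against a real cluster the request would look like (CLUSTERNAME and
# PASSWORD are placeholders):
# curl -k --user "admin:PASSWORD" \
#      -H "Content-Type: application/json" \
#      -H "X-Requested-By: admin" \
#      -X POST --data "$BODY" \
#      "https://CLUSTERNAME.azurehdinsight.net/livy/batches"
```

A successful POST returns a JSON description of the batch, whose `id` can then be polled at `/livy/batches/{id}` to track the job's state.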
Download Link: