Saturday 6 October 2018

Debugging and Remote Submission of a Spark Job in an HDInsight Spark Cluster


Introduction


In this article we create a Scala program in IntelliJ IDEA, debug it locally, then debug the same program against an HDInsight Spark cluster, and finally submit the Spark job to the HDInsight Spark cluster remotely.

Steps Involved:
  1. From IntelliJ, create an application that reads data from Blob storage and writes the count output back to Blob storage (see the sketch after this list)
  2. Run the application in HDInsight (IntelliJ HDInsight submission)
  3. Create the JAR file
  4. Upload the JAR file to Blob storage
  5. Install the cURL utility for command-line submission
  6. Submit the Spark job with the cURL command line.
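
For step 1, the application can be as small as a single Scala object. The following is a minimal word-count sketch, assuming a storage account named mystorageaccount, a container named mycontainer, and an input file at example/data/input.txt; the actual names in your cluster will differ.

import org.apache.spark.sql.SparkSession

// Minimal word-count sketch: reads a text file from Blob storage and writes
// the word counts back to Blob storage. The storage account, container and
// file names below are placeholders -- replace them with your own.
object BlobWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BlobWordCount")
      .getOrCreate()

    // wasbs:// paths resolve against the cluster's Blob primary storage
    val inputPath  = "wasbs://mycontainer@mystorageaccount.blob.core.windows.net/example/data/input.txt"
    val outputPath = "wasbs://mycontainer@mystorageaccount.blob.core.windows.net/example/data/wordcount-output"

    val counts = spark.sparkContext
      .textFile(inputPath)
      .flatMap(line => line.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(outputPath)
    spark.stop()
  }
}

For local debugging in IntelliJ you would typically add .master("local[*]") to the builder and supply the storage account key in the Hadoop configuration; when the job runs on the HDInsight cluster, the master and storage credentials come from the cluster configuration instead.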

Our HDInsight Spark cluster must use Blob storage as its primary storage. This process does NOT work if the HDInsight Spark cluster's primary storage is Azure Data Lake Storage (ADLS).


Download Link: