Saturday 6 October 2018

Debugging and Remote Submission of Spark Job in HDInsight Spark Cluster


Introduction


In this article we create a Scala program in IntelliJ and debug it locally, then debug the same program on an HDInsight Spark cluster, and finally submit the Spark job to the HDInsight Spark cluster remotely.

Steps Involved:
  1. From IntelliJ, create an application that reads data from Blob storage and writes the count output back to Blob storage
  2. Run the file in HDInsight (IntelliJ HDInsight submission)
  3. Create the JAR file
  4. Upload the JAR file into Blob storage
  5. Install the cURL utility for command line submission
  6. Submit the Spark job with the cURL command line.
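
As a rough illustration of step 1, the application could look something like the sketch below. It is not the program from this article: the object name, the input and output paths, and the choice to count lines are all assumptions made for the example. It uses the `wasb:///` prefix, which refers to the cluster's default Blob container.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch, assuming the job counts the lines of one text file.
object BlobLineCount {
  // Hypothetical locations in the cluster's default Blob container.
  val inputPath  = "wasb:///example/data/input.txt"
  val outputPath = "wasb:///example/output/linecount"

  def main(args: Array[String]): Unit = {
    // On the cluster the master is supplied by the submission.
    // For a local debug run, add .master("local[*]") and use local file paths.
    val spark = SparkSession.builder().appName("BlobLineCount").getOrCreate()
    val sc = spark.sparkContext

    // Read the input from Blob storage and count its lines.
    val count = sc.textFile(inputPath).count()

    // Write the count as a single-partition text output.
    // saveAsTextFile fails if the output directory already exists.
    sc.parallelize(Seq(s"count=$count"), numSlices = 1).saveAsTextFile(outputPath)

    spark.stop()
  }
}
```

The same object can be packaged into the JAR created in step 3; its name is then the class name given at submission time.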

The HDInsight Spark cluster must use Blob storage as its primary storage. This process does NOT work if the cluster's primary storage is Data Lake Storage (ADLS).
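
Because the JAR lives in Blob storage, the remote submission in step 6 only needs the JAR's `wasb://` address and the main class name. The sketch below builds the pieces of such a request in plain Scala, assuming the submission goes through the cluster's Livy batch endpoint. The cluster, container, storage account, JAR path and class name are placeholders, not values from this article.

```scala
// A minimal sketch of the data sent in a remote batch submission.
object LivySubmit {
  // Livy batch endpoint of an HDInsight cluster.
  def batchesUrl(clusterName: String): String =
    s"https://${clusterName}.azurehdinsight.net/livy/batches"

  // Fully qualified Blob storage address of an uploaded file.
  def jarUri(container: String, account: String, path: String): String =
    s"wasb://${container}@${account}.blob.core.windows.net/${path.stripPrefix("/")}"

  // JSON body naming the JAR and the main class to run.
  // No escaping is done, so the inputs must not contain quotes.
  def payload(jar: String, className: String): String =
    s"""{"file":"$jar","className":"$className"}"""

  def main(args: Array[String]): Unit = {
    // Placeholder names, for illustration only.
    val jar = jarUri("mycontainer", "mystorageaccount", "jars/BlobLineCount.jar")
    println(batchesUrl("mycluster"))
    println(payload(jar, "BlobLineCount"))
  }
}
```

With cURL, the printed JSON would typically be sent as a POST to the printed URL with a `Content-Type: application/json` header, an `X-Requested-By` header, and the cluster login credentials.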


Download Link:


