Sunday 17 June 2018

SSIS Conditional Split with SPARK SQL


Introduction
Here in this article we are trying to make SSIS conditional Split Transform by using SPARK SQL. It is called Split Data Frame using Filter Transform.
Hope it will be interesting.

What We Want to Do
We have a Flat File Named Student Marks Details mentioned bellow.

101,Joydeep Das,Math,10
102,Deepasree Das,Math,89
103,Shipra Roy Chowdhury,Math,100
104,Rajesh Roy,Math,45
105,Sunita Tendon,Math,98
106,Amit Basu,Math,20
107,Debalina Bhattacharya,Math,99
108,Pritam Das,Math,40
109,Partha Deb Das,Math,89
110,Onkita Gupta,Math,100

The file contains Student Roll, Student Name, Subject and Marks. Based on Marks we need to display records

If [ Marks ] > 50
Display Records
If [ Marks ] < 50
Display Records

Scala Code
//---------------------------------------
// Scala for SPARK to Read Flat File
// Make Conditional Split Depends on Makes (Marks>50)
// Creation Date: 06/172018
//-----------------------------------------
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
import spark.implicits._

case class Student(roll: Long, name: String, subject: String, marks: Long)

//Read FLAT File
val studentDF = spark.sparkContext.textFile("d:/spark/bin/examples/src/main/resources/studentmarksdetails.txt").map(_.split(",")).map(attributes => Student(attributes(0).trim.toInt, attributes(1).trim, attributes(2).trim, attributes(3).trim.toInt)).toDF()

studentDF.cache() // recommended to prevent repeating the calculation

val condition = col("marks") > 50 // Condition
val studentDF1 = studentDF.filter(condition)
val studentDF2 = studentDF.filter(not(condition))

//Making View for FLAT file
studentDF1.createOrReplaceTempView("studentrecord1")
studentDF2.createOrReplaceTempView("studentrecord2")

val youngstersDF1 = spark.sql("SELECT roll, name, subject, marks FROM studentrecord1")
youngstersDF1.show

val youngstersDF2 = spark.sql("SELECT roll, name, subject, marks FROM studentrecord2")
youngstersDF2.show

Output





Hope you like it.

Posted by: MR. JOYDEEP DAS

4 comments:

  1. Pengertian Correct Score Sbobet atau Tebak Skor adalah salah satu jenis taruhan yang sangat menguntungkan selain mix parlay. Dikarenakan perkalian Odds / Hadiah Taruhannya yang sangat besar sehingga banyak menarik perhatian para pecandu Judi Bola. Tak hanya itu, dari sistem bermain pun sangatlah gampang karena (Baca Selengkapnya Disini...)

    ReplyDelete
  2. Microsoft SQL Server 2019 Standard provides provides additional capability and improvements database features. like SQL Server database engine, SQL Server Analysis Services, SQL Server Machine Learning Services, SQL Server on Linux, and SQL Server Master Data Services. Microsoft SQL Server Standard can build rich content management applications

    ReplyDelete