Sunday, 17 June 2018

SSIS Conditional Split with SPARK SQL

In this article we reproduce the SSIS Conditional Split transform using Spark SQL, by splitting a DataFrame with the filter transformation.
Hope you find it interesting.

What We Want to Do
We have a flat file with student marks details (studentmarksdetails.txt), shown below.

101,Joydeep Das,Math,10
102,Deepasree Das,Math,89
103,Shipra Roy Chowdhury,Math,100
104,Rajesh Roy,Math,45
105,Sunita Tendon,Math,98
106,Amit Basu,Math,20
107,Debalina Bhattacharya,Math,99
108,Pritam Das,Math,40
109,Partha Deb Das,Math,89
110,Onkita Gupta,Math,100

The file contains Student Roll, Student Name, Subject and Marks. Based on Marks we need to split the records into two outputs:

If [ Marks ] > 50
Display Records
If [ Marks ] <= 50
Display Records
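Before turning to Spark, the split logic itself can be sketched in plain Scala with the standard library's partition method, which returns the rows matching a predicate and the rest in one pass. The Student case class and the sample rows below just mirror a few lines of the file above:

```scala
// Plain-Scala sketch of a conditional split using partition (no Spark needed).
case class Student(roll: Long, name: String, subject: String, marks: Long)

val students = Seq(
  Student(101, "Joydeep Das", "Math", 10),
  Student(102, "Deepasree Das", "Math", 89),
  Student(104, "Rajesh Roy", "Math", 45)
)

// partition splits the collection into (predicate true, predicate false)
// in a single traversal.
val (above50, upTo50) = students.partition(_.marks > 50)

println(above50.map(_.name)) // students with marks > 50
println(upTo50.map(_.name))  // students with marks <= 50
```

Spark's filter does the same job on a DataFrame, except that each output requires its own filter pass, which is why the full example below caches the source DataFrame first.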

Scala Code
// Scala for SPARK to read a flat file
// and make a conditional split depending on Marks (Marks > 50)
// Creation Date: 06/17/2018
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, not}
import spark.implicits._ // spark is the SparkSession provided by spark-shell

case class Student(roll: Long, name: String, subject: String, marks: Long)

//Read FLAT File
val studentDF = spark.sparkContext
  .textFile("d:/spark/bin/examples/src/main/resources/studentmarksdetails.txt")
  .map(_.split(","))
  .map(attributes => Student(attributes(0).trim.toLong, attributes(1).trim,
                             attributes(2).trim, attributes(3).trim.toLong))
  .toDF()

studentDF.cache() // recommended, since the DataFrame is filtered twice below

val condition = col("marks") > 50 // Condition
val studentDF1 = studentDF.filter(condition)
val studentDF2 = studentDF.filter(not(condition))

//Register each split as a temporary view, then query it with Spark SQL

studentDF1.createOrReplaceTempView("studentrecord1")
studentDF2.createOrReplaceTempView("studentrecord2")

val resultDF1 = spark.sql("SELECT roll, name, subject, marks FROM studentrecord1")
resultDF1.show()

val resultDF2 = spark.sql("SELECT roll, name, subject, marks FROM studentrecord2")
resultDF2.show()
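As a design note, the same DataFrame can also be built without the RDD detour: Spark's built-in CSV reader with an explicit schema parses the file directly. This is a sketch assuming Spark 2.x or later and the same file path as above; it needs a live SparkSession (e.g. spark-shell) to run:

```scala
import org.apache.spark.sql.types._

// Column layout matching the Student case class above.
val schema = StructType(Seq(
  StructField("roll",    LongType,   nullable = false),
  StructField("name",    StringType, nullable = false),
  StructField("subject", StringType, nullable = false),
  StructField("marks",   LongType,   nullable = false)
))

// Read the comma-separated file directly into a typed DataFrame,
// replacing the textFile/split/trim parsing step.
val studentCsvDF = spark.read
  .schema(schema)
  .csv("d:/spark/bin/examples/src/main/resources/studentmarksdetails.txt")
```

The filter-based split then works on studentCsvDF exactly as on studentDF.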


Hope you like it.

Posted by: MR. JOYDEEP DAS