Sunday, 17 June 2018

SSIS Conditional Split with SPARK SQL


Introduction
Here in this article we are trying to make SSIS conditional Split Transform by using SPARK SQL. It is called Split Data Frame using Filter Transform.
Hope it will be interesting.

What We Want to Do
We have a Flat File Named Student Marks Details mentioned bellow.

101,Joydeep Das,Math,10
102,Deepasree Das,Math,89
103,Shipra Roy Chowdhury,Math,100
104,Rajesh Roy,Math,45
105,Sunita Tendon,Math,98
106,Amit Basu,Math,20
107,Debalina Bhattacharya,Math,99
108,Pritam Das,Math,40
109,Partha Deb Das,Math,89
110,Onkita Gupta,Math,100

The file contains Student Roll, Student Name, Subject and Marks. Based on Marks we need to display records

If [ Marks ] > 50
Display Records
If [ Marks ] < 50
Display Records

Scala Code
//---------------------------------------
// Scala for SPARK to Read Flat File
// Make Conditional Split Depends on Makes (Marks>50)
// Creation Date: 06/172018
//-----------------------------------------
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
import spark.implicits._

case class Student(roll: Long, name: String, subject: String, marks: Long)

//Read FLAT File
val studentDF = spark.sparkContext.textFile("d:/spark/bin/examples/src/main/resources/studentmarksdetails.txt").map(_.split(",")).map(attributes => Student(attributes(0).trim.toInt, attributes(1).trim, attributes(2).trim, attributes(3).trim.toInt)).toDF()

studentDF.cache() // recommended to prevent repeating the calculation

val condition = col("marks") > 50 // Condition
val studentDF1 = studentDF.filter(condition)
val studentDF2 = studentDF.filter(not(condition))

//Making View for FLAT file
studentDF1.createOrReplaceTempView("studentrecord1")
studentDF2.createOrReplaceTempView("studentrecord2")

val youngstersDF1 = spark.sql("SELECT roll, name, subject, marks FROM studentrecord1")
youngstersDF1.show

val youngstersDF2 = spark.sql("SELECT roll, name, subject, marks FROM studentrecord2")
youngstersDF2.show

Output





Hope you like it.

Posted by: MR. JOYDEEP DAS

2 comments: