Sunday, 24 June 2018

Spark: Working with Unstructured Data


Introduction
In my previous articles on Spark, we worked with structured and semi-structured data sources. In this article we will work with an unstructured data source.
I hope you find it interesting.

Case Study
We have a notebook, stored as a plain text file, and we want to count the number of occurrences of each word in it.



Scala Code

// Path to the unstructured input file (a plain-text notebook)
val dfsFilename = "D:/spark/bin/examples/src/main/resources/notebook.txt"

// Read the file as an RDD of lines
val text = sc.textFile(dfsFilename)
// Split lines into words, pair each word with 1, then sum the counts per word
val counts = text.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Collect the results to the driver and print each (word, count) pair
counts.collect.foreach(println)
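The flatMap/map/reduceByKey pipeline mirrors what a plain Scala groupBy does on an in-memory collection. As a sketch for checking the logic on a small sample without a Spark cluster (the object name and sample string are my own, for illustration):

```scala
// Plain-Scala equivalent of the Spark word count, for verifying
// the counting logic locally on a small string.
object WordCountLocal {
  def wordCount(text: String): Map[String, Int] =
    text.split("\\s+")                     // split on any whitespace
      .filter(_.nonEmpty)                  // drop empty tokens
      .groupBy(identity)                   // group identical words together
      .map { case (word, occurrences) => (word, occurrences.length) }

  def main(args: Array[String]): Unit = {
    val sample = "spark makes big data simple and spark is fast"
    // Print (word, count) pairs, most frequent first
    wordCount(sample).toSeq.sortBy(-_._2).foreach(println)
  }
}
```

The Spark version distributes exactly this computation: `flatMap` plus `map` produce the (word, 1) pairs, and `reduceByKey` plays the role of `groupBy` followed by summing, but per partition and across the cluster.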

Output

(Screenshot of the console output: a list of (word, count) pairs, one pair per line.)
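Instead of printing to the console, the result can also be sorted and written back to disk. A minimal sketch, assuming the `counts` RDD from the Scala code above (the output directory path is illustrative):

```scala
// Sort the (word, count) pairs by count, descending
val sorted = counts.sortBy({ case (_, n) => n }, ascending = false)
// Write the sorted pairs out as text files (one part file per partition)
sorted.saveAsTextFile("D:/spark/bin/examples/output/wordcounts")
```

Note that `saveAsTextFile` creates a directory of part files rather than a single file, and it fails if the directory already exists.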

Posted by: MR. JOYDEEP DAS