Sunday, 24 June 2018

Spark: Working with Unstructured Data


Introduction
In my previous article on Spark, we worked with structured and semi-structured data sources. In this article, we will work with an unstructured data source.
Hope it will be interesting.

Case Study
We have a notebook text file and we want to count the number of occurrences of each word in it.



Scala Code

// Path to the unstructured text file to analyse
val dfsFilename = "D:/spark/bin/examples/src/main/resources/notebook.txt"

// Load the file as an RDD of lines
val text = sc.textFile(dfsFilename)

// Split each line into words, pair each word with 1, and sum the 1s per word
val counts = text.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

// Bring the results back to the driver and print each (word, count) pair
counts.collect.foreach(println)
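To see what the pipeline does without running Spark, the same flatMap / map / reduceByKey steps can be reproduced on an ordinary Scala collection (the sample lines below are illustrative, not from the notebook file):

```scala
// Local, Spark-free sketch of the same word-count pipeline.
// groupBy + sum plays the role of reduceByKey on a plain collection.
val lines = Seq("spark makes big data simple", "big data with spark")

val counts = lines
  .flatMap(line => line.split(" "))   // tokenize every line into words
  .map(word => (word, 1))             // pair each word with a count of 1
  .groupBy(_._1)                      // group the pairs by word
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per word

counts.toSeq.sortBy(-_._2).foreach(println)
```

The only real difference is that Spark performs the grouping and summing in a distributed way across partitions, while this version runs on one machine.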

Output

The job prints one (word, count) tuple per line to the console.
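A common follow-up step, not shown in the original post, is to sort the counts and persist them to disk. A sketch, assuming the same `sc` and `counts` from the code above (the output path is illustrative):

```scala
// Sort the (word, count) pairs by descending count
val sorted = counts.sortBy { case (_, n) => -n }

// Preview the ten most frequent words on the driver
sorted.take(10).foreach(println)

// Write the full sorted result as text files, one part file per partition
sorted.saveAsTextFile("D:/spark/bin/output/wordcounts")
```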
Posted by: MR. JOYDEEP DAS
