Saturday 10 February 2018

Azure Data Lake – U-SQL – SELECT Transformation Rowsets


Introduction

In this article we are trying to discuss about basic U-SQL SELECT query Transformation Rowsets Technique. Hope it will be informative.

 

What is U-SQL SELECT query Transformation Rowsets

In my previous article we are just retrieving the data from “SearchLog.tsv” file to “SearchLog-scalar-variables.csv”. It is just a simple file to file movement of data.

Now we have to think some extra operation on the Row set data before storing it into final destination. For example: some filtration, Grouping, aggregate function in numerical values etc.

 

When we are going to perform such kind of operation with Row set before moving it to the destination, it is called SELECT Transformation Rowsets.

Let’s Take a Simple Example to understand it

We are here taking the same example that we used previous article to simply copy data from a file named “SearchLog.tsv” and store it into “SearchLog-scalar-variables.csv”.

What we are doing in the Transformation part is, we are just filtering the Row set region wise. We mean to say we are using a Boolean expression in the WHERE clause of SELECT statement. Feeling quite comfortable ….

@searchlog =
    EXTRACT UserId          int,
            Start           DateTime,
            Region          string,
            Query           string,
            Duration        int?,
            Urls            string,
            ClickedUrls     string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
WHERE Region == "en-gb";

OUTPUT @rs1  
    TO "/output/SearchLog-transform-rowsets.csv"
    USING Outputters.Csv();

Here in U-SQL Part the transformation part is





The WHERE clause contains a C# Boolean expression. We can use the C# expression language to do our own expressions and functions. We can even perform more complex filtering by combining them with logical conjunctions (ANDs) and disjunctions (ORs).


We can use More than One Transformation Together

In the previous example, we are using a single Transformation by using a WHERE clause with region.
We can again use the previous filtered Row set for new Transformation. We can do it multiple times.

Please look at the bellow example. First we are using Transformation to filtering Region. Now we are using DateTime.Parse() method and a conjunction.

@searchlog =
    EXTRACT UserId          int,
            Start           DateTime,
            Region          string,
            Query           string,
            Duration        int?,
            Urls            string,
            ClickedUrls     string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
WHERE Region == "en-gb";

@rs1 =
    SELECT Start, Region, Duration
    FROM @rs1
    WHERE Start >= DateTime.Parse("2012/02/16")
          AND Start <= DateTime.Parse("2012/02/17");

OUTPUT @rs1  
    TO "/output/SearchLog-transform-datetime.csv"
    USING Outputters.Csv();




The second query is operating on the result of the first rowset, which creates a composite of the two filters. We can also reuse a variable name, and the names are scoped lexically.



Hope you like it.




Posted by: MR. JOYDEEP DAS


4 comments:

  1. Thanks Joydeep Das... It is very informative...

    ReplyDelete
  2. means we can overwrite the rowsets


    thanks joydeep

    ReplyDelete
  3. Thanks for providing such useful information, keep updating like this Azure Online Training

    ReplyDelete