Introduction
In this article we are trying to discuss about basic
U-SQL SELECT query Transformation Rowsets Technique. Hope it will be
informative.
What is U-SQL SELECT query Transformation Rowsets
In my previous article we are just retrieving the
data from “SearchLog.tsv” file to “SearchLog-scalar-variables.csv”.
It is just a simple file to file movement of data.
Now we have to think some extra operation on
the Row set data before storing it into final destination. For example: some filtration,
Grouping, aggregate function in numerical values etc.
When we are going to perform such kind of operation with
Row set before moving it to the destination, it is called SELECT Transformation
Rowsets.
Let’s Take a Simple Example to understand it
We are here taking the same example that we used previous
article to simply copy data from a file named “SearchLog.tsv” and store
it into “SearchLog-scalar-variables.csv”.
What we are doing in the Transformation part is, we are
just filtering the Row set region wise. We mean to say we are using a Boolean expression
in the WHERE clause of SELECT statement. Feeling quite comfortable ….
@searchlog =
EXTRACT UserId int,
Start
DateTime,
Region
string,
Query
string,
Duration
int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@rs1 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Region == "en-gb";
OUTPUT @rs1
TO "/output/SearchLog-transform-rowsets.csv"
USING Outputters.Csv();
Here in U-SQL Part the transformation part is
The WHERE clause contains a C# Boolean expression. We can use the C# expression language to do our own expressions
and functions. We can even perform more complex filtering by combining them
with logical conjunctions (ANDs) and disjunctions (ORs).
We can use More than One Transformation Together
In the previous example, we are using a single
Transformation by using a WHERE clause with region.
We can again use the previous filtered Row set for new Transformation.
We can do it multiple times.
Please look at the bellow example. First we are using
Transformation to filtering Region. Now we are using DateTime.Parse() method
and a conjunction.
@searchlog =
EXTRACT UserId int,
Start
DateTime,
Region
string,
Query
string,
Duration
int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@rs1 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Region == "en-gb";
@rs1 =
SELECT Start, Region, Duration
FROM @rs1
WHERE Start >= DateTime.Parse("2012/02/16")
AND Start <= DateTime.Parse("2012/02/17");
OUTPUT @rs1
TO "/output/SearchLog-transform-datetime.csv"
USING Outputters.Csv();
The second query is operating on the result of the first
rowset, which creates a composite of the two filters. We can also reuse a
variable name, and the names are scoped lexically.
Hope you like it.
Posted by: MR. JOYDEEP DAS
Thanks Joydeep Das... It is very informative...
ReplyDeleteThanks @ Pritam
Deletemeans we can overwrite the rowsets
ReplyDeletethanks joydeep
Thanks for providing such useful information, keep updating like this Azure Online Training
ReplyDelete