Introduction
In this article we are trying to discuss about basic
U-SQL SELECT query Aggregation of Rowsets Technique. Hope it will be
informative.
Aggregation of Rowset
Here we are using example to understand Aggregation,
ORDER BY, GROUP BY and HAVING.
The bellow query display the total duration of each
region using Aggregation SUM().
DECLARE @outpref string = "/output/Searchlog-aggregation";
DECLARE @out1
string = @outpref+"_agg.csv";
DECLARE @out2
string = @outpref+"_top5agg.csv";
@searchlog =
EXTRACT UserId int,
Start
DateTime,
Region
string,
Query
string,
Duration
int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@rs1 =
SELECT
Region,
SUM(Duration) AS TotalDuration
FROM @searchlog
GROUP BY Region;
OUTPUT @rs1
TO @out1
ORDER BY TotalDuration DESC
USING Outputters.Csv();
Now we want to display Total duration of each reason and
with this we want top 5 durations in order.
DECLARE @outpref string = "/output/Searchlog-aggregation";
DECLARE @out1
string = @outpref+"_agg.csv";
DECLARE @out2
string = @outpref+"_top5agg.csv";
@searchlog =
EXTRACT UserId int,
Start
DateTime,
Region
string,
Query
string,
Duration
int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@rs1 =
SELECT
Region,
SUM(Duration) AS TotalDuration
FROM @searchlog
GROUP BY Region;
@res =
SELECT *
FROM @rs1
ORDER BY TotalDuration DESC
FETCH 5 ROWS;
OUTPUT @rs1
TO @out1
ORDER BY TotalDuration DESC
USING Outputters.Csv();
OUTPUT @res
TO @out2
ORDER BY TotalDuration DESC
USING Outputters.Csv();
Please note that, U-SQL
Rowsets do not preserve their order for the next query. Thus, to order an
output, we need to add ORDER BY to the OUTPUT statement.
Having Clause
Here we are using HAVING clause to restrict the output
effected by GROUP BY clause.
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region
string,
Query
string,
Duration
int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@res =
SELECT
Region,
SUM(Duration) AS TotalDuration
FROM @searchlog
GROUP BY Region
HAVING SUM(Duration) > 200;
OUTPUT @res
TO "/output/Searchlog-having.csv"
ORDER BY TotalDuration DESC
USING Outputters.Csv();
Built-in Function in U-SQL
We can find all the built-in Function in U-SQL
Hope you like it.
Posted by: MR. JOYDEEP DAS
I just unearthed your glib blog and craved to say that I have truly delighted in perusing your extremely elegantly composed blog entries. I will be your regular guest, that is without a doubt.Cheap steroids online
ReplyDelete