Introduction
In this article we are trying to discuss about basic
U-SQL query construction Technique. Hope it will be informative.
First U-SQL Project Solution Explorer
View à Solution Explorer
The solution contains single U-SQL project. Here in this
example it is “Test” project. Inside that project is the empty Script.usql file.
Notice that we can expand the .usql file node. It
contains a C# file. This is called the U-SQL
Code-Behind file. U-SQL Code-behind is a convenience
feature of ADLToolsForVS and not a part of the U-SQL language itself.
Input and Output
Compiling U-SQL file required Input. U-SQL script
transforms Input into Output. There are two types of Input and Output.
·
Files
·
U-SQL
Table
Location of Input and Output
During U-SQL Cloud Execution the inputs/outputs must all
be in the cloud - typically this means Azure Data Lake Store.
During U-SQL Local Execution the inputs/outputs must all
be on your own box. There's a special name for this location: The U-SQL Local
Data Root.
We can find the local data root by going to Tools
à
Data Lake à
Options and Settings.
It in the field called DataRoot at the top.
Sample U-SQL Script
This script don’t have any transformation steps.
@searchlog =
EXTRACT UserId int,
Start
DateTime,
Region
string,
Query
string,
Duration
int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
OUTPUT @searchlog
TO "/output/SearchLog-first-u-sql.csv"
USING Outputters.Csv();
This script doesn't have any
transformation steps. It reads from the source file called SearchLog.tsv,
schematizes it, and writes the rowset back into a file called
SearchLog-first-u-sql.csv.
Notice the question mark next to the data type in the Duration field.
It means that the Duration field could be null.
Here we have to understand Rowset, Extract and Output.
·
Rowset variables: Each query expression that produces a
rowset can be assigned to a variable. U-SQL follows the T-SQL variable naming
pattern (@searchlog, for example) in the script.
·
The EXTRACT keyword reads data
from a file and defines the schema on read. Extractors.Tsv is a
built-in U-SQL extractor for tab-separated-value files. We can develop custom
extractors.
·
The OUTPUT writes data from a
rowset to a file. Outputters.Csv() is a built-in U-SQL outputter to
create a comma-separated-value file. You can develop custom outputters.
The EXTRACT and OUTPUT statements use file paths. File
paths can be absolute or relative.
Here we can use scalar variable to make the script maintenance
easier.
DECLARE @in string = "/Samples/Data/SearchLog.tsv";
DECLARE @out string = "/output/SearchLog-scalar-variables.csv";
@searchlog =
EXTRACT UserId int,
Start
DateTime,
Region
string,
Query
string,
Duration
int?,
Urls
string,
ClickedUrls string
FROM @in
USING Extractors.Tsv();
OUTPUT @searchlog
TO @out
USING Outputters.Csv();
Hope you like it.
Posted by: MR. JOYDEEP DAS
Good article that help me to understand the basic of usql .... Waiting for next in this series... Thanks
ReplyDeleteWaiting for next article. Hope it will come as early as possible.
ReplyDeleteYes... I too layman of Azure... Its help me... to know basic of U-SQl...Waiting for next in this series...
ReplyDeleteIt is a good article for understanding basic U-Sql.
ReplyDelete