Microsoft SQL Server Knowledge Bank: Azure Data Lake

Wednesday, 7 February 2018

Azure Data Lake – U-SQL Script Construction

Introduction

In this article we are trying to discuss about basic U-SQL query construction Technique. Hope it will be informative.

First U-SQL Project Solution Explorer

View à Solution Explorer

The solution contains single U-SQL project. Here in this example it is “Test” project. Inside that project is the empty Script.usql file.

Notice that we can expand the .usql file node. It contains a C# file. This is called the U-SQL

Code-Behind file. U-SQL Code-behind is a convenience feature of ADLToolsForVS and not a part of the U-SQL language itself.

Input and Output

Compiling U-SQL file required Input. U-SQL script transforms Input into Output. There are two types of Input and Output.

· Files

· U-SQL Table

Location of Input and Output

During U-SQL Cloud Execution the inputs/outputs must all be in the cloud - typically this means Azure Data Lake Store.

During U-SQL Local Execution the inputs/outputs must all be on your own box. There's a special name for this location: The U-SQL Local Data Root.

We can find the local data root by going to Tools à Data Lake à Options and Settings. It in the field called DataRoot at the top.

Sample U-SQL Script

This script don’t have any transformation steps.

@searchlog =

EXTRACT UserId int,

Start DateTime,

Region string,

Query string,

Duration int?,

Urls string,

ClickedUrls string

FROM "/Samples/Data/SearchLog.tsv"

USING Extractors.Tsv();

OUTPUT @searchlog

TO "/output/SearchLog-first-u-sql.csv"

USING Outputters.Csv();

This script doesn't have any transformation steps. It reads from the source file called SearchLog.tsv, schematizes it, and writes the rowset back into a file called SearchLog-first-u-sql.csv.

Notice the question mark next to the data type in the Duration field. It means that the Duration field could be null.

Here we have to understand Rowset, Extract and Output.

· Rowset variables: Each query expression that produces a rowset can be assigned to a variable. U-SQL follows the T-SQL variable naming pattern (@searchlog, for example) in the script.

· The EXTRACT keyword reads data from a file and defines the schema on read. Extractors.Tsv is a built-in U-SQL extractor for tab-separated-value files. We can develop custom extractors.

· The OUTPUT writes data from a rowset to a file. Outputters.Csv() is a built-in U-SQL outputter to create a comma-separated-value file. You can develop custom outputters.

The EXTRACT and OUTPUT statements use file paths. File paths can be absolute or relative.

Here we can use scalar variable to make the script maintenance easier.

DECLARE @in string = "/Samples/Data/SearchLog.tsv";

DECLARE @out string = "/output/SearchLog-scalar-variables.csv";

@searchlog =

EXTRACT UserId int,

Start DateTime,

Region string,

Query string,

Duration int?,

Urls string,

ClickedUrls string

FROM @in

USING Extractors.Tsv();

OUTPUT @searchlog

TO @out

USING Outputters.Csv();

Hope you like it.

Posted by: MR. JOYDEEP DAS

4 comments:

HealthWealth10 February 2018 at 09:52
Good article that help me to understand the basic of usql .... Waiting for next in this series... Thanks
ReplyDelete
Replies
Unknown10 February 2018 at 11:41
Waiting for next article. Hope it will come as early as possible.
ReplyDelete
Replies
Unknown11 February 2018 at 08:40
Yes... I too layman of Azure... Its help me... to know basic of U-SQl...Waiting for next in this series...
ReplyDelete
Replies
Unknown23 February 2018 at 11:22
It is a good article for understanding basic U-Sql.
ReplyDelete
Replies