Wednesday, 21 February 2018

Azure Data Lake – U-SQL – Using Wildcard or Virtual Columns


Introduction

In this article we are trying to discuss about Wildcard character and Virtual Columns.

 

Scenario

We have two CSV files named “ItemDetails_1.csv” and “ItemDetails_2.csv”. Both CSV file have Item code, Item Name and Quantity.

We want to grouping by Item Code and Item Name and what the Summation of Item Quantity.




Please note that here the input file name is mostly similar. So, we are using wildcard character or Virtual Columns.

 

Using Wildcard Character

 

@searchlog =
    EXTRACT ItemCode     int,
            ItemName     string,
            Qty          int
    FROM "C:/Users/Joydeep/AppData/Local/USQLDataRoot/Input-1/{*}.csv"
    USING Extractors.Csv();

@filtering =
    SELECT ItemCode,
           ItemName,
           SUM(Qty) AS Total
    FROM @searchlog
    GROUP BY ItemCode, ItemName;

OUTPUT @filtering 
    TO "C:/Users/Joydeep/AppData/Local/USQLDataRoot/output/Output-1/Result.csv"
USING Outputters.Csv();

It takes any CSV file as input from location
C:/Users/Joydeep/AppData/Local/USQLDataRoot/Input-1/

Job Graph:



 

Using Virtual Columns

@searchlog =
    EXTRACT ItemCode     int,
            ItemName     string,
            Qty          int,
            filename     string
    FROM "C:/Users/Joydeep/AppData/Local/USQLDataRoot/Input-1/{filename:}.csv"
    USING Extractors.Csv();

@filtering =
    SELECT ItemCode,
           ItemName,
           SUM(Qty) AS Total
    FROM @searchlog
    GROUP BY ItemCode, ItemName;

OUTPUT @filtering 
    TO "C:/Users/Joydeep/AppData/Local/USQLDataRoot/output/Output-1/Result.csv"

    USING Outputters.Csv();

 

Job Graph:



 

 

 

Hope you like it.

 

 

Posted by: MR. JOYDEEP DAS


1 comment: