Thursday, March 27, 2014

SAS in Mainframes (z/OS) Tutorial with Examples - Part 2 (Creating a CSV/Excel file from a mainframe dataset using SAS)

1. We will see how to merge two or more input files in SAS, route the result to one output dataset, and use that dataset to prepare a report in Excel format.
Creating a .csv/.xls file on z/OS
(Refer to the previous posts for the basic steps in SAS.)
Let's take two input files, RXX.TEST.FILE3 and RXX.TEST.FILE4. FILE3 contains the fields CITY, DATE, STATE and AMT; FILE4 contains CITY, DATE, STATE, AMT and BANK. We want to merge the two files so that the output contains CITY, DATE, STATE and BANK. To achieve this, we need at least one common field in both files on which we can join the two datasets; here we will join on CITY. So the steps are as follows:
Step 1. Create a SAS dataset from input file 3 and SORT it on the key field.
Step 2. Create a SAS dataset from input file 4 and SORT it on the key field.
Step 3. Create a new SAS dataset using the MERGE keyword along with the key field, and finally
Step 4. Keep only the fields we need.

//SAS01    EXEC SAS
//POLIN    DD DSN=RXX.TEST.FILE3,DISP=SHR
//POLIN2   DD DSN=RXX.TEST.FILE4,DISP=SHR
//OUTFILE  DD DSN=RXX.TEST.FILEOUT,DISP=(,CATLG),
//            SPACE=(TRK,(20,20),RLSE),LRECL=180,RECFM=FB
//WORK     DD SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD *
OPTION NOCENTER;                   
OPTION SORTLIB='';                 
  DATA POLIN;                      
   INFILE POLIN;                   
   INPUT @01 CITY   $CHAR02.       
         @06 DATE   $CHAR08.       
         @14 STATE  $CHAR02.       
         @16 AMT    COMMA9.2;      
   PROC SORT DATA=POLIN NODUPS;    
     BY CITY;                                                          
  DATA POLIN2;                     
   INFILE POLIN2;                  
   INPUT @01 CITY   $CHAR02.       
         @06 DATE   $CHAR08.       
         @14 STATE  $CHAR02.       
         @16 AMT    COMMA9.2       
         @25 BANK   $CHAR05.;
                                   
   PROC SORT DATA=POLIN2 NODUPS;   
     BY CITY;                      
  DATA COMMON;
  MERGE POLIN(IN=D1) POLIN2(IN=D2);
  BY CITY;
  IF D1 AND D2 THEN OUTPUT; /* keep only cities present in both files */
  PROC PRINT DATA=COMMON;          
     VAR CITY DATE STATE BANK;     
 RUN;                              
 DATA _NULL_;                           
    SET COMMON;                         
    FILE OUTFILE;                       
    PUT CITY ',' DATE ',' STATE ',' BANK;
RUN;                                    
Here is the output:
***********************
CA ,20130320 ,WB ,BANK1
CA ,20130120 ,TN ,BANK6
CA ,20130120 ,TN ,BANK6
CA ,20130320 ,KA ,BANK6
MI ,20130120 ,KA ,BANK1
MI ,20130320 ,AP ,BANK1
RR ,20130120 ,AP ,BANK8
************************

Why do we use DATA _NULL_ in SAS? We use it when we want to write a report or a flat file without creating a SAS dataset.
_NULL_ is a SAS keyword: the step executes normally, but no SAS dataset is created, which saves the overhead of writing one.

2. Creating an Excel report / CSV file from the mainframe dataset.
To add column headings to be used in the Excel sheet, we can again use a DATA _NULL_ step, as sketched below.
DATA _NULL_;
   SET COMMON;
   FILE OUTFILE DLM=',';
   IF _N_=1 THEN DO;
      PUT 'CITY,DATE,STATE,BANK';
   END;
   PUT CITY DATE STATE BANK;
RUN;

To create a CSV file from a mainframe dataset we can use the same code with the delimiter option DLM=','. (Important point to remember: the delimiter is the key to creating a usable .csv/.xls file.)
(If you remember, we normally have to use Excel's 'Delimited' import option to build a formatted report from a text file; the DLM option in SAS takes care of that.)
The condition _N_=1 has special significance: _N_ is the automatic variable that counts DATA step iterations, so the headings are written only once, before the first record. Try running the program without that line and see what happens. :)
We can then download the dataset from the command shell (option 6) in ISPF using the 'Receive from Host' option, and save the file in .csv format.
Alternatively, add an FTP step (pointing at wherever you want the report) after the mainframe dataset is created, and save the file as filename_youwant.csv. There is no need to create a text file and convert it into an Excel sheet: the FTP destination will contain a file ready to open in Excel!
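A minimal sketch of such an FTP step (the host name, user ID, password and target file name below are placeholders, not real values):

//FTPSTEP  EXEC PGM=FTP,PARM='(EXIT'
//SYSPRINT DD SYSOUT=*
//OUTPUT   DD SYSOUT=*
//INPUT    DD *
ftp.example.com
myuser
mypass
ascii
put 'RXX.TEST.FILEOUT' report_youwant.csv
quit
/*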

Sunday, March 23, 2014

SAS in Mainframes (z/OS) Tutorial with Examples - Part 2

We saw how the DATA step and PROC step function in Part 1 of this SAS blog. Remember that the variables created in one DATA step are defined only within that step; however, a SAS dataset created in an earlier step can be referred to by name in later steps. In the example below, we create the SAS dataset RYAN in the first step, reading the input file POLIN. Using the statement 'PROC PRINT DATA=RYAN;' we print the output to the spool.
Using the SET keyword, the step DATA RYAN2 refers to the dataset created in the first step and prints the same data; SET is the SAS keyword that reads observations from an existing dataset.
  DATA RYAN;
   INFILE POLIN;
   INPUT @01 CITY   $CHAR02.
         @06 DATE   $CHAR08.
         @14 STATE  $CHAR02.
         @16 AMT    COMMA9.2;
   PROC SORT DATA=RYAN NODUPS;
     BY CITY;
   PROC PRINT DATA=RYAN;
  DATA RYAN2;
   SET RYAN; /* this refers to the dataset created above */
   TITLE "I AM SHOWING SAME DATA OF RYAN";
   PROC PRINT DATA=RYAN2;
 RUN;
//*

Output:
Obs    CITY      DATE      STATE       AMT   
 1      CA     20130320     WB      100000.55
 2      CA     20130320     WB      200000.55
 3      CA     20130120     TN      500000.55
 4      CA     20130320     KA      600000.55
 5      MI     20130120     KA      300000.55
 6      MI     20130320     AP      400000.55
 7      RR     20130120     AP      700000.55
 8      ST     20130320     TN      800000.55


I AM SHOWING SAME DATA OF RYAN
Obs    CITY      DATE      STATE       AMT  
 1      CA     20130320     WB      100000.55
 2      CA     20130320     WB      200000.55
 3      CA     20130120     TN      500000.55
 4      CA     20130320     KA      600000.55
 5      MI     20130120     KA      300000.55
 6      MI     20130320     AP      400000.55
 7      RR     20130120     AP      700000.55
 8      ST     20130320     TN      800000.55


Continuing from the program above, we can add more datasets or more data and carry on building the reports; for example, SET with two dataset names concatenates them, as in the sketch below.
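A minimal sketch of concatenating two SAS datasets (using the dataset names from the example above):

  DATA COMBINED;
    SET RYAN RYAN2; /* appends the observations of RYAN2 after those of RYAN */
  PROC PRINT DATA=COMBINED;
  RUN;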

Friday, March 21, 2014

SAS in Mainframes (z/OS) Tutorial with Examples

To learn the SAS basics, check Chapter 1. It demonstrates the very basic working principles of SAS.
With that understanding of the basics, we will start with SAS in the mainframe (z/OS) environment.
The very first thing to know: turning raw data into information is what SAS is all about!

This is the basic principle of how SAS works. The raw data (the input file in JCL) is read into SAS through the INFILE keyword, whose fileref has the same name as the DD name in the JCL.
Once the file is read, the next step is to structure/format the data into a SAS dataset through the INPUT keyword, as explained below.
 
DATA RYAN;
  INFILE POLIN;
  INPUT @10 FNAME   $CHAR10.
        @24 ACCTNO  $CHAR05.;


Here the keyword DATA implies the start of the DATA step. RYAN is the name of the SAS dataset the step creates; it can be any valid name.
INFILE POLIN; is where the raw data is read by SAS. POLIN is the JCL DD name for the input.
INPUT @10 FNAME $CHAR10. ...; These statements structure the input data read above (here from POLIN) and create a dataset internal to SAS (commonly called the SAS dataset). Internally, SAS works with this data structure.
So here the SAS dataset contains FNAME (10 bytes), which starts at position 10 in the input file, and ACCTNO (5 bytes), which starts at position 24. It will not pick up any other fields that might be present in the input records.
After that we manipulate the data as per our needs and perform various functions. In every SAS program, two steps are of utmost importance, namely the DATA step and the PROC step.

SAS Syntax Rules: (A few very handy rules to make life simple!)
Statements can begin and end in any column.
One or more blanks or special characters can be used to separate words.
A single statement can span multiple lines.
Several statements can be on the same line.
Use /* to begin a comment and */ to end it.
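A minimal sketch illustrating these rules (the names are arbitrary):

DATA DEMO; X = 1; Y = 2;  /* several statements on one line  */
  Z = X +
      Y;                  /* one statement spanning two lines */
RUN;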
SAS generally comes with the default installation of z/OS, just like DFSORT; you only need to know the library.
Let's start with a few of the basic built-in functions and see the output in the spool. No input file is required here. First, the DATE functions in SAS:
//X15122RY  JOB (10,&SYSUID),'RYAN',CLASS=T,
//    MSGCLASS=V,NOTIFY=&SYSUID
//*                                   
//SAS01     EXEC SAS
//WORK      DD SPACE=(CYL,(50,10),RLSE)
//SYSIN     DD *                                            
OPTION NOCENTER;                                                  
OPTION SORTLIB='';
  DATA RYAN;       
    THISYEAR = YEAR(TODAY());           
    THISMONTH= MONTH(TODAY());          
    THISDAY  = DAY(TODAY());            
    LASTYEAR = YEAR(TODAY()) - 1;       
    DATE = TRIM(LEFT(THISYEAR))|| '1101';
  PROC PRINT DATA=RYAN;
   VAR THISYEAR LASTYEAR THISMONTH DATE THISDAY;
 RUN;
//*

Output:
The SAS  System
Obs    THISYEAR    LASTYEAR    THISMONTH      DATE      THISDAY
 1       2013        2012          6        20131101       20   

Looks cool!
Now, as we can see, the value of THISMONTH is 6. What if we want to display it as 06?
We need to modify the date function output to get a two-digit result, using the Z2. format:
THISMONTH = PUT(MONTH(TODAY()),Z2.);
HDR = TRIM(LEFT(THISYEAR)) || THISMONTH;
Explanation:
OPTION NOCENTER; is a SAS statement that left-aligns the output in the listing. Let's not be bothered about that here.
We can see the data step starts with DATA RYAN;
Since no input file is used here, we have no INFILE and INPUT statements.
TODAY() is a function in SAS. It can be combined in various ways to give us the dates we want.
PROC PRINT DATA=RYAN is the PROC step. We use the keyword VAR to select which variables from the data step to print.

The next step is to add an input file and print its contents.
//SAS01     EXEC SAS
//POLIN    DD DSN=I15122.TEST.FILE,DISP=SHR <== input dataset
//WORK      DD SPACE=(CYL,(50,10),RLSE)
//SYSIN     DD *                                            
OPTION NOCENTER;                                                  
OPTION SORTLIB='';

DATA RYAN;       
  INFILE POLIN;
  INPUT @01 CITY   $CHAR02.
        @06 DATE   $CHAR08.
        @14 STATE  $CHAR02.
        @16 AMT    COMMA9.2;
  PROC PRINT DATA= RYAN;

This is how the input looks:
Input File:
----+----1----+----2----+-
**************************
CA00020130320WB100000.55 
CA00020130320WB200000.55 
MI00020130120KA300000.55 
MI00020130320AP400000.55 
CA00020130120TN500000.55 
CA00020130320KA600000.55 
RR00020130120AP700000.55 
ST00020130320TN800000.55 
**************************

Output:
The SAS System
Obs    CITY      DATE      STATE       AMT  
 1      CA     20130320     WB      100000.55
 2      CA     20130320     WB      200000.55
 3      MI     20130120     KA      300000.55
 4      MI     20130320     AP      400000.55
 5      CA     20130120     TN      500000.55
 6      CA     20130320     KA      600000.55
 7      RR     20130120     AP      700000.55
 8      ST     20130320     TN      800000.55


The first column in the output, the observation (Obs) column, is inserted by SAS; it numbers the rows processed. Adding NOOBS to the PROC PRINT statement suppresses it.
If we run PROC SORT with NODUPKEYS and the key on CITY, it removes all observations with duplicate key values. The code below shows how PROC SORT can be used to sort the data and filter out duplicates. We must supply a key via a BY statement when using NODUPKEYS; if we do not, we get the error below.
ERROR: No BY statement used or no BY variables specified. A BY statement must be used with variable names to sort on.
DATA RYAN;       
  INFILE POLIN;
  INPUT @01 CITY   $CHAR02.
        @06 DATE   $CHAR08.
        @14 STATE  $CHAR02.
        @16 AMT    COMMA9.2;
  PROC SORT DATA=RYAN NODUPKEYS;
    BY CITY;
  PROC PRINT DATA=RYAN;
Output:
Obs    CITY      DATE      STATE       AMT
 1      CA     20130320     WB      100000.55
 2      MI     20130120     KA      300000.55
 3      RR     20130120     AP      700000.55
 4      ST     20130320     TN      800000.55

However, we can also use NODUPS: this checks the entire observation and filters it out only if the whole record is a duplicate, as in the sketch below.
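A minimal sketch (note that the documented spelling of this PROC SORT option is NODUPRECS, with NODUP as an alias, and it removes only duplicates that end up adjacent after sorting):

  PROC SORT DATA=RYAN NODUPRECS;
    BY CITY;
  PROC PRINT DATA=RYAN;
  RUN;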

Limiting the number of observations in SAS
Suppose you have an input of a million records, but you want to test your code with just 10. In this scenario it is advisable to use OBS=number-of-records on the INFILE statement; OBS=10 restricts processing to the first 10 records.
DATA RYAN;       
  INFILE POLIN OBS=10 ;
....

Using a simple IF condition with THEN OUTPUT in SAS
Let's add the line below before we print the SAS data using PROC PRINT.
IF CITY NE 'RR' THEN OUTPUT;
PROC PRINT DATA=RYAN;

It will print all the records except the observations with a city value of 'RR'. (The original condition ((CITY NE 'RR') OR (CITY='SS')) reduces to the same test, since any city equal to 'SS' is already not equal to 'RR'.)

Next, let's make the output dataset more meaningful.
Say we want to add one more column giving the city name in full, i.e. for observations where CITY is MI it should say MICHIGAN.
So we add one more variable, which will appear in the output.
DATA RYAN;
  INFILE POLIN;
  INPUT @01 CITY   $CHAR02.
        @06 DATE   $CHAR08.
        @14 STATE  $CHAR02.
        @16 AMT    COMMA9.2;

LENGTH CITY_NAME $20;
IF CITY = 'RR' THEN CITY_NAME ='ROCK VINE' ;
ELSE                                        
IF CITY = 'MI' THEN CITY_NAME ='MICHIGAN' ; 
ELSE                                        
IF CITY = 'CA' THEN CITY_NAME ='CALIFORNIA'; 

     PROC PRINT DATA= RYAN;
       VAR CITY DATE STATE AMT CITY_NAME; 
Here goes the output:

CITY      DATE      STATE       AMT       CITY_NAME
 CA     20130320     WB      100000.55    CALIFORNI
 CA     20130320     WB      200000.55    CALIFORNI
 MI     20130120     KA      300000.55    MICHIGAN 
 MI     20130320     AP      400000.55    MICHIGAN 
 CA     20130120     TN      500000.55    CALIFORNI
 CA     20130320     KA      600000.55    CALIFORNI
 RR     20130120     AP      700000.55    ROCK VINE
 ST     20130320     TN      800000.55             


Why did we add the line LENGTH CITY_NAME $20;? The job would have run without it, but the city name would have been truncated: without a LENGTH statement, a character variable takes the length of the first value assigned to it at compile time ('ROCK VINE', 9 bytes). Notice how CALIFORNIA is cut to CALIFORNI in the output above, which is what happens when the LENGTH statement is omitted. The LENGTH statement reserves 20 bytes so no truncation happens! A quick standalone sketch of this behaviour follows.
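A minimal standalone sketch of the truncation behaviour:

  DATA T;
    CITY_NAME = 'ROCK VINE';  /* first use fixes the length at 9 bytes */
    CITY_NAME = 'CALIFORNIA'; /* value is truncated to 'CALIFORNI'     */
  PROC PRINT DATA=T;
  RUN;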
To be continued..... SAS in Mainframes (z/OS) - Part 2!! Drop a note if you liked it.

Thursday, March 20, 2014

Working towards a better index - How does a DB2 index affect performance? DB2 PERFORMANCE TUNING - PART 2

In the last post, db2-performance-tuning, we saw how adding a column to an index can improve DB2 performance. To refresh: a DB2 index is an ordered set of key values with one or more pointers called RIDs (record IDs) that indicate where those values can be found in a table.
So if a table has 3 indexes X1, X2, X3, the optimizer will choose the index (say X3) that has all the columns needed in the query. In that case there is no need to access the table at all. This type of access is called INDEX ONLY ACCESS. But then again, if putting everything into an index solved the problem, life would be simple. :)
Indexes aren't free. SQL INSERT, DELETE and UPDATE statements cause maintenance to the index whenever an indexed column is modified. So, while determining the access paths involving indexes, the optimizer comes up with these choices:
1. Will an index be used to satisfy the query?
      a) YES: implies an index scan is selected.
      b) NO: implies only a table scan is selected.
YES does not necessarily mean DB2 will read only the index; it means the index will be involved somehow in fetching the query result.
2. Is an INDEX SCAN sufficient for the query?
     a) YES: means only an index scan is selected, with no involvement of the DB2 table.
     b) NO: means both the index and the table will be used.
With this comes the concept of matching columns (MATCHCOLS=x) in the PLAN_TABLE. In short, matching columns are those columns in the WHERE predicates of the SQL query that are matched against the index chosen by the optimizer.
So, how do we find the matching columns? The optimizer uses some basic rules to figure them out.
RULE 1
We look at the index from left to right and compare the predicates in our query, like below.
Example 1: Suppose we have defined an index on (STATE, CITY) in the EMP table. (Point to note!)
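For reference, a sketch of the index assumed in these examples (the index name is illustrative):

CREATE INDEX XEMP1 ON EMP (STATE, CITY);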

Below are the SQL queries for which the matching columns will be examined.

SELECT * FROM EMP WHERE STATE='AR' AND CITY='AUR'
Here MC will be 2, because the predicates on STATE and CITY are present in the index and match it in order from left to right.
Example 2: SELECT * FROM EMP WHERE STATE='AR' AND PIN=200
Here MC=1.
Example 3: SELECT * FROM EMP WHERE CITY='AUR' AND PIN=200
Here MC=0.

RULE 2
At most one IN-list predicate can be a matching predicate on an index (ACCESSTYPE=N).
An IN-list index scan is a special type of matching index scan.
Example 1: SELECT * FROM EMP WHERE STATE='AR' AND CITY IN ('AUR','AUX')
Here MC = 2, since at most one IN predicate is used.
Example 2: SELECT * FROM EMP WHERE STATE IN ('AR','CK') AND CITY IN ('AUR','AUX')
Here MC = 1.
As a general rule, we want the number of matching columns (MATCHCOLS) to be on the higher side.
For a better index design, we evaluate placing the query predicate columns in the index starting with the most selective (most unique) column first, followed by the second most selective column, and so on.

What is a DB2 prefetch? DB2 PERFORMANCE TUNING - PART 3

The concept of PREFETCH: what exactly is a DB2 prefetch?
We know that before executing a query, DB2 determines its access path at bind time and writes it to the plan table. However, there are cases where DB2 decides at run time, after analyzing the way the data is being accessed, that there is an even better way to access it. Prefetching is one such way to enhance the access strategy dynamically at run time. (Note: it's run time, not bind time!)
The concept is to read a set of pages, generally 32, into the buffer pool with a single asynchronous I/O operation, in the expectation that an application will need them soon. This can save a great deal of CPU and I/O cost. From the application's perspective, prefetch is about eliminating or reducing wait times in DB2. There are several ways in which prefetching can be done.
Types of prefetching : 
The PREFETCH column in PLAN_TABLE contains one of these values: S, D, L or blank.

1. Sequential Prefetch (value 'S' in PLAN_TABLE): Sequential prefetch brings consecutive data pages into the buffer pool in chunks instead of one page at a time. At bind time DB2 sets PREFETCH=S in the plan table whenever sequential prefetch is chosen for the selected access path.
Sequential prefetch is generally chosen when the data is accessed via table space scans and index scans.
Sequential prefetch is activated when a number of data or index pages are accessed in sequence. One simple way to trigger a sequential prefetch is to use no predicate in the query at all (not that this is recommended). Check the query below:


Query like 'SELECT CITY FROM TEST.EMPLOYEE' invokes sequential prefetch
 
Obviously, without a predicate, a table space scan will be selected by the optimizer and will almost certainly use sequential prefetch. Use of ORDER BY and range predicates like BETWEEN, < and > also encourages sequential prefetch.
We can define the number of pages the database manager prefetches using the PREFETCHSIZE clause when creating or altering the table space, as sketched below.
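A hedged sketch (PREFETCHSIZE is the table space clause as found in DB2 for LUW; the table space name is illustrative):

ALTER TABLESPACE TS_EMP PREFETCHSIZE 32;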
Thus reading several pages in one shot will reduce the IO overhead for our application and improve its performance. 

2. Dynamic Prefetch (value 'D' in PLAN_TABLE): DB2 V9 uses dynamic prefetch in almost all situations except table space scans, where it uses sequential prefetch. We can say DB2 performs sequential prefetch dynamically at execution time. It reduces paging and improves performance, especially for data that is arranged sequentially across some pages but scattered randomly across others. At run time, dynamic prefetch might or might not actually be used.
It uses a sequential detection algorithm to anticipate which pages will be read next: the most recent eight pages are tracked, and data access is judged sequential if more than four of the last eight pages are page-sequential.
With dynamic prefetch, DB2 can automatically switch between multi-page and single-page prefetch.
This tracking information is kept until the commit point if we used RELEASE(COMMIT) in the bind card; to keep it beyond the commit point we need RELEASE(DEALLOCATE).

3. List Prefetch (value 'L' in PLAN_TABLE): This is a way to access data pages effectively even when they are not consecutive.
It is used when the optimizer sees at bind time that skip-sequential processing would be efficient. The fetched data pages need not be contiguous.
How does list prefetch work? We know an index contains key values from data pages plus RIDs (record IDs), which hold the page numbers where the records are physically located. Rather than accessing the table rows directly from the index, DB2 sometimes decides to build a RID list in its own memory. This list is sorted (and may be merged with another RID list when multiple indexes are used).
Example:
Select Fname, Lname from EMP
where City = 'AR' ;
Now, when the optimizer decides that a RID list should be used with this query, it gathers the RIDs (page, slot) of the matching rows into memory:
4,34
1,17
2,11
This RID list is then sorted into page number and slot number order:
1,17
2,11
4,34
When DB2 accesses the physical pages, it starts with page 1, then sequentially accesses page 2,
skips page 3, then reads page 4, and so on.
Note: List Prefetch may fail when DB2 finds that the RID list is large. In that case, the access path will be changed to table scan.

4. PREFETCH = blank: the DB2 optimizer does not anticipate ever needing prefetch operations. However, at run time, sequential detection is still employed.

Friday, March 14, 2014

Report Generation from a DB2 Table on z/OS using the QMF EXPORT Command

Many times we need to run queries in QMF. The steps below show how to generate a report from QMF using the EXPORT command.
Step 1. Log on to QMF and run the query. Suppose we take a sample query like the one below:
SELECT EMPNO,FNAME,LNAME
FROM EMPLOYEE                        
WHERE SALARY >= 10000
Once we run the query we will get the output in QMF.

EMPNO  FNAME  LNAME
-----  -----  -------
0001   RYAN   ORLANDO
0002   MIKE   DUTCHER
0003   EN     BOSHER


Step 2. On the command line at the bottom, give the command 'EXPORT REPORT' as shown below:
EXPORT REPORT TO 'R0XXX.JCL.REPORT' and press ENTER,
where R0XXX.JCL.REPORT is the dataset we want the QMF data to be put in.
Once done, we get a notification indicating that the export was successful:
OK, REPORT was exported to 'R0XXX.JCL.REPORT'.

Step 3. The exported report will contain some extra heading fields; delete them in the dataset.
Then use the block commands ((16 and (( to shift the data to the left, and you are done with your report in your dataset.

Wednesday, March 12, 2014

DB2 Performance Issues and Tuning - Part 1

There are many reasons why DB2 might perform poorly. Some of the most common are:
1. Inadequate index: The index may not match the SELECT statement in the query. Below we explain how placing the proper columns in an index avoids a DB2 sort when an ORDER BY clause is present. Later we will check here how to improve an index effectively to optimize a query.
2. Lock waits: The application might wait a long time to acquire a lock on a resource currently held by other processes.
3. Wrong clustering sequence: The wrong index was chosen as the clustering sequence, causing unnecessary sorts in the application.
4. Environmental issues: The size of the buffer pool, disk cache and others can also impact query response time.
5. Out-of-date RUNSTATS: RUNSTATS gathers information about table spaces, partitions, indexes and index spaces. This information should be kept current, both because we need it when designing databases and because the optimizer uses it to build an efficient access path, as sketched after this list.
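A hedged sketch of keeping statistics current with the RUNSTATS utility (the database and table space names are illustrative):

RUNSTATS TABLESPACE MYDB.MYTS TABLE(ALL) INDEX(ALL)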

There can be many more reasons for poor DB2 performance. Let's see how DB2 performance tuning can be done step by step with simple examples.

1. An effective index can improve query response time:

Suppose we have a table CUST with the columns CUSTNO, FNAME, LNAME, CITY, and we have defined an index (X1) on the column CITY.
We run a simple query like the one below:
SELECT LNAME, CUSTNO FROM CUST
WHERE FNAME = :FNAME
  AND CITY = :CITY
ORDER BY LNAME;

While executing this, DB2 first performs a SORT and materializes the whole result when the cursor is OPENed, because of the ORDER BY on LNAME. This sort can take a long time when the data is large.
To prevent DB2 from sorting the data, we can declare a new index X2 on the columns (CITY, LNAME), or modify the current index to add the column LNAME, since the cursor contains ORDER BY LNAME. The optimizer will then see the data in the requested order without a sort; a sketch follows below.
So now DB2 materializes the query at FETCH time, i.e. the disk I/Os take place during fetch and not during OPEN, because the results are already in the requested order.
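A minimal sketch of the index change described above (the index name follows the text):

CREATE INDEX X2 ON CUST (CITY, LNAME);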

So, how long does it take to run an SQL statement? How is the response time calculated?
Many components contribute to the response time: network time, disk I/O time, CPU time, lock waits and so on.
The major components of the Local Response Time (LRT) are the time spent performing I/O operations (from disk, disk cache and buffer pool), the processing time in the CPU, and the wait times (certain other items, like package loading and authorization, are insignificant).
LRT can be estimated with the VQUBE2 method (Very Quick Upper Bound Estimate, v2). We will check that in other posts.
So here we see there are two indexes. The optimizer will choose one of them and fix the access path. We can view which access path was chosen using SQL EXPLAIN.

So, how do we use SQL EXPLAIN and PLAN_TABLE to view the access path chosen by DB2?
We wrap the EXPLAIN statement around the targeted SQL query, and the optimizer writes the access path to the PLAN_TABLE. The PLAN_TABLE can be shared, or every user can have a copy under their own schema.
Some of the important columns in the PLAN_TABLE that provide the desired information are:
1.QUERYNO
2.ACCESSTYPE
3.MATCHCOLS
4.ACCESSNAME
5.INDEXONLY
6.SORTx_ORDERBY
7.PREFETCH

Step 1. The SELECT statement below is the main query whose access path we want to know.
We wrap it with 'EXPLAIN ALL SET QUERYNO = 1 FOR', like this:

EXPLAIN ALL SET QUERYNO = 1 FOR
SELECT CUSTNO, CUSTLNAME                           
FROM CUST                                                            
WHERE CUSTNO LIKE '%0A';

Once we execute the above statement, the optimizer first writes the access path to the plan table and then returns the output. The step below shows how to get that information back from the plan table.

Step 2. The statement below queries the plan table using the QUERYNO we set in step 1.

SELECT QUERYNO,METHOD,ACCESSTYPE,MATCHCOLS,
ACCESSNAME,INDEXONLY,PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 1;


Brief explanation of the output columns of the PLAN_TABLE, with their possible values:
QUERYNO: gives the query number which we set in step 1.
ACCESSTYPE: this is of much importance. A value of 'I' indicates data is fetched using an index; a value of 'R' indicates a table space scan.
MATCHCOLS: indicates the number of matching columns in the index.
ACCESSNAME: indicates the name of the index, if index access is selected.
INDEXONLY: can have a value of 'Y' or 'N'. 'Y' indicates that all the data requested in the SELECT is present in the index; 'N' indicates the table must be accessed as well.
PREFETCH: a value of 'D' is dynamic prefetch, 'S' is sequential prefetch. An example row is sketched below.
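A purely hypothetical illustration of what the step-2 query might return if the optimizer picked a one-column matching index named X1 (none of these values come from a real run):

QUERYNO  METHOD  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
      1       0  I                   1  X1          N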

Continue reading the links below if this interests you!
2. DB2 Prefetch
3. Working towards a better index

Saturday, March 1, 2014

OUTREC BUILD and OUTREC OVERLAY in SORT

We sometimes tend to get confused between the BUILD and OVERLAY options of the OUTREC statement.
Let's see how each works and figure out when to use BUILD and when to use OVERLAY in SORT.
Scenario 1.
The input file is the same as we used before:
---+----1----+----2----+----3----+
************************  
HARLEY   123456MEXICO   
DAVID    658999CANADA    
*********************** 
We want to add the date to each record at position 35 using BUILD. The sort syntax with OUTREC BUILD would be:
SORT FIELDS=COPY                         
OUTREC BUILD=(1:1,32,35:&DATE)    
Output:
----+----1----+----2----+----3----+-          
*********************************
HARLEY   123456MEXICO   03/02/14
DAVID    658999CANADA   03/02/14 
********************************  
However, we can get the same output by using OVERLAY in OUTREC as well, like below:
SORT FIELDS=COPY                   
OUTREC OVERLAY=(35:&DATE)
Thus we can see that with BUILD we had to construct the whole output record: '1:1,32' copies input positions 1-32 to output position 1, and '35:&DATE' puts the date at position 35.
OVERLAY, however, reduces the coding effort: we specify only where the change needs to be made, and the rest of the record is kept as-is. This becomes very handy with large and complex sort conditions, since it eliminates the need to rebuild the output record field by field. A further OVERLAY sketch is below.
However, OVERLAY reduced the coding effort.We specified only where the change needs to be done, keeping the entire record structure same.This becomes very handy when handling large and complex sort conditions since it eliminates the need to build the output record by record.