Select (SQL) | Ali Tarhini

Data mining in SQL Server 2008 & Visual Studio

May 25, 2011

Creating a Project in the Business Intelligence Development Studio

Follow these steps to create a new project. To start BIDS, click the Start button and go to All Programs > Microsoft SQL Server 2008 > SQL Server Business Intelligence Development Studio. In BIDS, select File > New > Project. Under the Business Intelligence Projects project type, click the Analysis Services Project template. Type “AnalysisServices2008Tutorial” as the project name and select the directory in which you want to create this project. Click OK to create the project.

The Solution Explorer Pane

The Solution Explorer contains the following:

1) Data source objects: They contain details of a connection to a data source, which include server name, catalog or database name, and login credentials. You establish connections to relational servers by creating a data source for each one.

2) Data Source Views: When working with a large operational data store you don’t always want to see all the tables in the database. With Data Source Views (DSVs), you can limit the number of visible tables by including only the tables that are relevant to your analysis.

3) Cubes: A collection of measure groups (from the fact tables) and a collection of dimensions form a cube. Each measure group is composed of a set of measures. Despite the name, a cube is not limited to three dimensions; it can have many more.

4) Dimensions: They are the set of tables that are used for building the cube. Attributes that are needed for the analysis task are selected from each table.

5) Mining Structures: Data mining is the process of analyzing raw data using algorithms that help discover interesting patterns not typically found by ad hoc analysis. Mining structures are objects that hold information about a data set. A mining structure contains a collection of mining models. Each mining model is built using a specific data mining algorithm and can be used for analyzing patterns in existing data or predicting new data values.

The Properties Pane

If you click an object in the Solution Explorer, the properties for that object appear in the Properties pane. Items that cannot be edited are grayed out. If you click a particular property, the description of that property appears in the Description pane at the bottom of the Properties pane.

Data mining in SQL Server 2008

The data mining process is a series of steps, which include the following:

1) Creating a Data Source:

Cubes and dimensions of an Analysis Services database must retrieve their data values from tables in a relational data store. This data store, typically part of a data warehouse, must be defined as a data source.

To create a data source, follow these steps:

a) Select the Data Sources folder in the Solution Explorer.

b) Right-click the Data Sources folder and click New Data Source. This launches the Data Source Wizard.

c) In the Data Source Wizard, provide the connection information for the relational data source that contains the “Adventure Works DW 2008” database. Click the New button under Data Connection Properties to specify the connection details. Enter the server name and the database name, and choose one of the two authentication modes: SQL Server Authentication or Windows Authentication.

d) In the Impersonation Information page you need to specify the impersonation details that Analysis Services will use to connect to the relational data source. There are four options. You can provide a domain username and password to impersonate or select the Analysis Service instance’s service account for connection. The option Use the credentials of the current user is primarily used for data mining where you retrieve data from the relational server for prediction. If you use the Inherit option, Analysis Services uses the impersonation information specified for the database.

e) On the final page, the Data Source Wizard chooses the relational database name you have selected as the name for the data source object you are creating. You can choose the default name specified or specify a new name here.

2) Creating a Data Source View (DSV)

The Adventure Works DW database contains 25 tables. The cube you build in this chapter uses 10 tables. Data Source Views give you a logical view of the tables that will be used within your OLAP database.

To create a Data Source View, follow these steps:

a) Select the Data Source Views folder in the Solution Explorer.

b) Right-click Data Source Views and select New Data Source View. This launches the Data Source View Wizard.

c) In the Data Source View Wizard, select the tables and views that are needed for the Analysis Services database you are creating. Click the > button so that the tables move to the Included Objects list. We will include the following set of tables in the data source view:

FactInternetSales, FactResellerSales, DimProduct, DimReseller, DimPromotion, DimCurrency, DimEmployee, DimSalesTerritory, DimTime, DimCustomer, DimGeography.

d) On the final page of the DSV Wizard you can specify your own name for the DSV object or use the default name. Specify “Adventure Works DW” as the DSV name and click Finish.

If you open the data source view from the Solution Explorer, the Data Source View editor opens. It contains three main areas: the Diagram Organizer, the Tables view, and the Diagram view. The Diagram view shows all the added tables and the relationships among them. The Tables view lists all the tables contained in this data source view. In the Diagram Organizer, you can right-click in the pane to create a new diagram and then drag and drop the tables that you wish to add. Alternatively, add any table, right-click it, and choose Add Related Tables to add all the tables related to it. To add a new derived field to a table, right-click the table in the Diagram view and choose New Named Calculation; a dialog appears where you enter the name of the new field and the expression from which it is derived. For example, to add a new field named FullName to the DimEmployee table, enter the following expression: FirstName + ' ' + MiddleName + ' ' + LastName.
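One caveat with the FullName expression: in T-SQL, concatenating with + yields NULL when any operand is NULL (under the default CONCAT_NULL_YIELDS_NULL setting), so an employee without a middle name would end up with a NULL FullName. The sketch below illustrates the effect with Python's built-in sqlite3 module as a stand-in for SQL Server; SQLite's || operator treats NULL the same way. The sample rows are made up for illustration.

```python
import sqlite3

def full_names(rows):
    """Return (naive, null_safe) FullName lists for (first, middle, last) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DimEmployee (FirstName TEXT, MiddleName TEXT, LastName TEXT)"
    )
    conn.executemany("INSERT INTO DimEmployee VALUES (?, ?, ?)", rows)
    # Naive expression: any NULL part makes the whole result NULL.
    naive = [r[0] for r in conn.execute(
        "SELECT FirstName || ' ' || MiddleName || ' ' || LastName "
        "FROM DimEmployee ORDER BY rowid"
    )]
    # NULL-safe expression: a missing middle name contributes nothing.
    safe = [r[0] for r in conn.execute(
        "SELECT FirstName || ' ' || COALESCE(MiddleName || ' ', '') || LastName "
        "FROM DimEmployee ORDER BY rowid"
    )]
    conn.close()
    return naive, safe

# Hypothetical sample rows; the second employee has no middle name.
naive, safe = full_names([("Ann", "B", "Lee"), ("Omar", None, "Haddad")])
print(naive)  # ['Ann B Lee', None]
print(safe)   # ['Ann B Lee', 'Omar Haddad']
```

In a named calculation the equivalent T-SQL would be FirstName + ' ' + ISNULL(MiddleName + ' ', '') + LastName.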

There are different layouts in the data source view. You can switch between rectangular layout and diagonal layout in the DSV by right-clicking in the DSV Designer and selecting the layout type of your choice.

To see a sample of the data specified by your DSV, right-click a table in the DSV Designer and select Explore Data. The data presented is only a subset of the underlying table data. By default the first 5,000 rows are retrieved and shown within this window. You can change the number of rows retrieved by clicking the Sampling Options button, which launches the Data Exploration Options dialog. There you can change the sampling method, the sample count, and the number of states per chart, which is used for displaying data in the chart format.

When you click the Pivot Table tab you get an additional window called PivotTable Field List that shows all the columns of the table. You can drag and drop these columns inside the pivot table in the row, column, details, or filter areas. The values in the row and column provide you with an intersection point for which the detailed data is shown.

3) Creating New Dimensions

Dimensions help you define the structure of your cube so as to facilitate effective data analysis. Specifically, dimensions provide you with the capability of slicing data within a cube, and these dimensions can be built from one or more dimension tables.

a) Create the DimGeography dimension:

• Launch the Dimension Wizard by right-clicking Dimensions in the Solution Explorer and selecting New Dimension.

• In the Select Creation Method screen select the “Use an existing table” option and click Next.

• In the Specify Source Information page, you need to select the DSV for creating the dimension, select the main table from which the dimension is to be designed, specify the key columns for the dimension, and optionally specify a name column for the dimension key value. By default, the first DSV in your project is selected. Because the current project has only one DSV (the Adventure Works DW DSV), it is selected. Select the DimGeography table from the Main table drop-down list.

• Click the Next button to proceed to the next step in the Dimension Wizard.

• The Dimension Wizard now analyzes the DSV to detect any outward-facing relationships from the DimGeography table. An outward-facing relationship is a relationship between the DimGeography table and another table, such that a column in the DimGeography table is a foreign key related to another table. The Select Related Tables screen shows that the wizard detected an outward relationship between the DimGeography table and the DimSalesTerritory table. In this example you will be modeling the DimGeography table as a star schema table instead of a snowflake schema. Deselect the DimSalesTerritory table and click Next.

• The Select Dimension Attributes screen of the Dimension Wizard displays the columns of the main table that have been selected for the dimension you’re creating.

• Select all the attributes of the DimGeography table (all the attributes in the screen), leave their Attribute Type as Regular, allow them to be browsed, and click Next.

• The final screen of the Dimension Wizard shows the attributes that will be created for the dimension based on your choices in the wizard. Click the Finish button.

Open the DimGeography dimension by double-clicking it in the Solution Explorer. In the Dimension Structure tab you can see all the table attributes that have been added to this dimension. To build a hierarchy, drag the English Country Region Name attribute into the Hierarchies pane, followed by State Province Name, then City, and then Postal Code. Next, build the relationships among these attributes by clicking the Attribute Relationships tab and dragging the Postal Code attribute onto City, which means that the postal code value determines the city. Drag City onto State Province Name, and State Province Name onto English Country Region Name. This builds the functional dependencies among the attributes in the hierarchy. Because the same city name can occur in more than one state, you must make each City member unique: set the KeyColumns property of the City attribute to both State Province Code and City, and set its NameColumn to City. Similarly, set the KeyColumns of the Postal Code attribute to Postal Code, City, and State Province Code, and set its NameColumn to Postal Code.

Deploy the project by right-clicking the project name and choosing Deploy. After a successful deployment, you can browse the dimension by selecting the Browser tab, where you can see all the data of the DimGeography table arranged according to its hierarchical levels.

b) Create the DimTime dimension

• Launch the Dimension Wizard by right-clicking Dimensions in the Solution Explorer and selecting New Dimension. When the welcome screen of the Dimension Wizard opens up, click Next.

• In the Select Creation Method page of the wizard, select the “Use an existing table” option and click Next.

• In the Specify Source Information page, select DimTime as the main table from which the dimension is to be designed and click Next.

• In the Select Dimension Attributes page, in addition to the Date Key attribute, enable the checkboxes for the following attributes: Calendar Year, Calendar Semester, Calendar Quarter, English Month Name, and Day Number of Month.

• Set the Attribute Type for the “Calendar Year” attribute to Date Calendar Year.

• Set the Attribute Type for the “Calendar Semester” attribute to Date Calendar Half Year.

• Set the Attribute Type for the “Calendar Quarter” attribute to Date Calendar Quarter.

• Set the Attribute Type for the “English Month Name” attribute to Date Calendar Month.

• Set the Attribute Type for the “Day Number of Month” attribute to Date Calendar Day of Month.

• Create a multilevel hierarchy named Calendar Date with the levels Calendar Year, Calendar Semester, Calendar Quarter, Month (renamed from English Month Name), and Day (renamed from Day Number Of Month).

• Save the project and deploy it to the Analysis Services instance.

• Switch to the Browser pane of the DimTime dimension, where you can see that the dates are arranged according to the hierarchy that we defined above.

c) Create the DimEmployee dimension

• Launch the Dimension Wizard by right-clicking Dimensions in the Solution Explorer and selecting New Dimension. If the welcome screen of the Dimension Wizard opens up, click Next.

• Make sure the “Use an existing table” option is selected and click Next.

• In the Specify Source Information page, select DimEmployee as the main table from which the dimension is to be designed and click Next.

• On the Select Related Tables screen, uncheck the DimSalesTerritory table and click Next.

• In the Select Dimension Attributes dialog, the Dimension Wizard has detected three columns of the DimEmployee table to be included as attributes. The Dimension Wizard will select columns if they are either the primary key of the table or a foreign key of the table or another table in the DSV. The attributes suggested by the Dimension Wizard in this example are the key attribute Employee Key, the parent-child attribute Parent Employee Key, and the Sales Territory Key, which is a foreign key column to the DimSalesTerritory table.

• Select all the columns of the DimEmployee table as attributes and click Next.

• Double-click the DimEmployee dimension in the Solution Explorer to open the Dimension Designer.

• Change the NameColumn property of the key attribute Dim Employee to FullName and deploy the project to your Analysis Services instance.

When you browse the parent-child hierarchy, you will see the members of the hierarchy showing the full names of the employees.

4) Creating a Cube Using the Cube Wizard

Cubes are the principal objects of an OLAP database that help in data analysis. Cubes are multidimensional structures that are primarily composed of dimensions and facts. The data from a fact table that is stored within the cube for analysis are called measures.

To build a new cube, follow these steps:

a) Right-click the Cubes folder and select New Cube. Click Next on the introduction page to proceed.

b) In the Select Creation Method page you have the option to build a cube from existing tables, create an empty cube, or create a cube based on a template and generate new tables in the data source. Choose to build the cube from the existing tables in the Adventure Works DW data source. Click Next to proceed to the next step in the Cube Wizard.

c) The next page of the Cube Wizard is the Measure Group Tables selection page. Here you select one or more tables that will serve as fact tables for your measure groups. Click the Suggest button to have the Cube Wizard scan the DSV, detect the fact and dimension tables, and automatically select the candidate measure group tables. Any table that has an outgoing relationship is identified as a candidate fact table, whereas a table that has an incoming relationship is detected as a dimension table. Select both FactResellerSales and FactInternetSales as the fact tables, and then select the measures from these fact tables that you need for the analysis task.

d) In the Select Existing Dimensions page, the Cube Wizard displays a list of all existing dimensions defined in the project. Accept the selection of all the dimensions and click next.

e) The Cube Wizard asks you to select any new dimensions to be created from existing tables in the data source that are not already used for dimensions in the project. You can deselect dimensions that are not needed for your cube on this page. This illustration will use the Fact tables only as measure groups and not for dimensions. Deselect the Fact Reseller Sales and Fact Internet Sales dimensions on this page and click next.

f) In the final page of the Cube Wizard you can specify the name of the cube to be created and review the measure groups, measures, dimensions, attributes, and hierarchies. Use the default name Adventure Works DW suggested by the Cube Wizard and click Finish.

After creating the cube, the new dimensions are automatically created. But these dimensions will have only their primary and foreign keys selected. You have to open each created dimension and select the attributes that you need to add from each table.

g) Press F5 to deploy, build and process the cube. Deploying the cube means building the cube according to the structure that you have defined, while processing the cube means computing all the aggregation values for all the cells in the cube.

You can add a new calculated measure to the cube by right-clicking in the Script Organizer pane of the Calculations tab and entering the formula for this new measure.

Now that the cube has been deployed, switch the BIDS Cube Designer view to the Browser page. In the Browser page you will see three panes: a Measure Group pane, a Filter pane, and a Data pane. Suppose you want to analyze the Internet sales of products based on the promotions offered to customers and the marital status of those customers. First you would need to drag and drop [DimPromotion].[English Promotion Type] from the Measure Group pane to the OWC rows area. Next, drag and drop [Dim Customer].[Marital Status] from the Measure Group pane to the OWC columns area. Finally, drag and drop the measure [Sales Amount] from the Fact Internet Sales measure group to the Drop Totals or Detail Fields Here area of the OWC pane.

You can also use MDX queries to query the cube. MDX queries resemble SQL queries: just as SQL (Structured Query Language) is a query language used to retrieve data from relational databases, MDX (Multidimensional Expressions) is a query language used to retrieve data from multidimensional databases.

The format of an MDX query is shown below:

SELECT [< axis expression >, [< axis expression > …]]

FROM [< cube_expression >]

[WHERE [slicer expression]]

5) Creating a Mining Structure

Analysis Services 2008 provides nine data mining algorithms that can be utilized to solve various business problems. These algorithms can be broadly classified into five categories based on the nature of the business problem they can be applied to. They are:

1) Classification

2) Regression

3) Segmentation

4) Sequence analysis

5) Association

We aim to group customers that share similar characteristics, which is a segmentation problem.

To create a relational mining model, follow these steps:

a) Right-click the Mining Structures folder in the Solution Explorer and select New Mining Structure to launch the Data Mining Wizard, which helps you create data mining structures and models. Click the Next button.

b) Select the “From existing cube” radio button and click next.

c) Select Microsoft Clustering and click next.

d) Choose the Customer table as the primary table and enter the following attributes as inputs for building clusters:

Age, Yearly Income, Number of cars owned, Number of Children at home and Occupation.

You will now see the clustering mining model represented as several nodes with lines between these nodes. By default the clustering mining model groups the customers into ten different clusters. The number of clusters generated can be changed through a property of the clustering mining model. Each cluster is shown as a node in the cluster viewer. Darker shading on the node indicates that the cluster favors a specific input column and vice versa. If there is a similarity between two clusters, it is indicated by a line connecting the two nodes. As with the node shading, a stronger relationship between two nodes is indicated by a darker line. You can move the slider on the left of the cluster diagram from All Links to Strongest Links. As you do this, the weaker relationships between the clusters are no longer displayed. You can change the cluster name by right-clicking the cluster and selecting Rename. You can select desired input columns of the mining model from the Shading Variable drop-down to see the effect of the column on the various clusters. When you choose a specific shading variable column you need to choose one of the states of the column to be used as the shading variable for the clusters.

The Cluster Profiles view shows the relationship between the mining columns of the model and the clusters in a matrix format. The intersection cell of a specific column and a cluster shows a histogram bar of the various values of the column that are part of the cluster. The size of each bar reflects the number of items used to train the model.

The cluster Characteristics tab shows the characteristics of a single cluster and how the various states of the input columns make up the cluster.

The Cluster Discrimination tab shows the characteristics of a Cluster in comparison with the characteristics of the complement of this Cluster.
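To make the idea of segmentation more concrete, here is a minimal k-means sketch in plain Python. It is only an illustration of how clustering groups similar rows: it is not the Microsoft Clustering implementation, the customer values are invented, and the starting centroids are fixed by hand so that the run is repeatable.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recenter."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from the point to every centroid.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # New centroid = mean of its members; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(v) / len(members) for v in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (age, yearly income in thousands).
customers = [(25, 30), (27, 35), (30, 32), (52, 90), (55, 95), (58, 88)]
centroids, clusters = kmeans(customers, centroids=[(20, 20), (60, 100)])
print(clusters[0])  # [(25, 30), (27, 35), (30, 32)]
print(clusters[1])  # [(52, 90), (55, 95), (58, 88)]
```

Each resulting group plays the role of one node in the cluster viewer: the younger, lower-income customers end up in one cluster and the older, higher-income customers in the other.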

DEVELOPMENT AND CODING STANDARDS: SQL AND Database Guidelines

March 20, 2011

SQL AND DATABASE RULES
NAMING CONVENTIONS
DECLARING VARIABLES
SELECT STATEMENTS
CURSORS
WILDCARD CHARACTERS
NOT EQUAL OPERATORS
DERIVED TABLES
SQL BATCHES
ANSI-STANDARD JOIN CLAUSES
STORED PROCEDURES NAMING CONVENTION
USING VIEWS
TEXT DATA TYPES
INSERT STATEMENTS
ACCESSING TABLES
STORED PROCEDURE RETURNING VALUES
OBJECT CASE
T-SQL VARIABLES
OFFLOAD TASKS
CHECK FOR RECORD EXISTENCE
OBJECT OWNER
UPSERT STATEMENTS
DATETIME COLUMNS
MEASURE QUERY PERFORMANCE
INDEXES

Naming Conventions
All T-SQL Keywords must be upper case.
All declared variable names must be Camel Case, while all stored procedure names, function names, trigger names, table names and column names in queries must be Pascal Case.
All view names must start with the letter ‘v’ followed by the name of the view in Pascal Case.
Example:

SELECT * FROM Employee WHERE ID = 2
DECLARE @minSalary INT
CREATE PROCEDURE GetEmployees

If you are creating a table belonging to a specific module, make sure to prepend a three-character module prefix to the name of each table, for example:

LABResult
LABSpecimen
LABOrder
RADImage
RADResult

Note that all table names must be singular.
When creating columns, make sure to append ‘_F’ to the end of each column you intend to use as a flag. If there are exactly two statuses for the flag, use the ‘bit’ data type; if there are three or more statuses, use the ‘char(1)’ data type. If the column is a foreign key reference, append ‘_FK’ to the end of the column name. This makes it easy to distinguish flag and foreign key columns:

CREATE TABLE Employee(
ID INT IDENTITY NOT NULL PRIMARY KEY,
FirstName VARCHAR(MAX),
Sex_F BIT,
Person_FK INT,
Status_F CHAR(1)
)

Declaring Variables
Always declare variables at the top of your stored procedure and set their values directly after declaration. If your database runs on SQL Server 2008, you can declare and set the variable on the same line, and use compound assignment operators such as +=. In the following example, the first block shows the syntax required under SQL Server 2000/2005 and the second block shows the shorter SQL Server 2008 syntax:

-- SQL Server 2000/2005
DECLARE @i INT
SET @i = 1
SET @i = @i + 1
-- SQL Server 2008
DECLARE @i INT = 1
SET @i += 1

Select Statements
Do not use SELECT * in your queries. Always list the required column names after the SELECT keyword. This technique results in reduced disk I/O and better performance:

SELECT CustomerID, CustomerFirstName, City FROM Customer

If you need to write a SELECT statement to retrieve data from a single table, don’t SELECT the data from a view that points to multiple tables. Instead, SELECT the data from the table directly, or from a view that only contains the table you are interested in. If you SELECT the data from the multi-table view, the query will experience unnecessary overhead, and performance will be hindered.

Cursors
Try to avoid server-side cursors as much as possible. Always stick to a ‘set-based approach’ instead of a ‘procedural approach’ for accessing and manipulating data. Cursors can often be avoided by using SELECT statements instead.
If row-by-row processing is unavoidable, consider a WHILE loop instead of a cursor. A WHILE loop is often faster than a cursor, but to replace a cursor with a WHILE loop you need a column (primary key or unique key) that identifies each row uniquely.
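The difference between the two approaches can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for SQL Server, with a made-up Employee table; the row-by-row function mimics what a cursor or WHILE loop does, while the set-based function issues a single statement.

```python
import sqlite3

def make_db():
    """Create an in-memory database with three hypothetical employees."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Salary REAL)")
    conn.executemany(
        "INSERT INTO Employee (ID, Salary) VALUES (?, ?)",
        [(1, 1000.0), (2, 2000.0), (3, 3000.0)],
    )
    return conn

def raise_row_by_row(conn):
    """Procedural approach: one UPDATE per row, keyed on the primary key."""
    ids = [row[0] for row in conn.execute("SELECT ID FROM Employee").fetchall()]
    for emp_id in ids:
        conn.execute(
            "UPDATE Employee SET Salary = Salary * 1.5 WHERE ID = ?", (emp_id,)
        )

def raise_set_based(conn):
    """Set-based approach: a single statement covers every row."""
    conn.execute("UPDATE Employee SET Salary = Salary * 1.5")

def salaries(conn):
    return [row[0] for row in conn.execute("SELECT Salary FROM Employee ORDER BY ID")]

a, b = make_db(), make_db()
raise_row_by_row(a)
raise_set_based(b)
print(salaries(a) == salaries(b))  # True
```

Both versions produce the same data, but the set-based version needs one statement regardless of how many rows the table holds.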

Wildcard Characters
Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan, which defeats the purpose of an index. The first of the following statements results in an index scan, while the second results in an index seek:

SELECT EmployeeID FROM Locations WHERE FirstName LIKE '%li'
SELECT EmployeeID FROM Locations WHERE FirstName LIKE 'a%i'
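The reason a leading wildcard defeats an index can be illustrated with a sorted Python list standing in for the index. This is only an analogy, not how SQL Server implements its indexes, and the names are invented: a known prefix lets a binary search jump straight to the first candidate, whereas a known suffix says nothing about where matches sit in sorted order.

```python
from bisect import bisect_left

def prefix_seek(sorted_names, prefix):
    """Like LIKE 'prefix%': binary-search to the first match, then read neighbours."""
    start = bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted order guarantees there are no further matches
        matches.append(name)
    return matches

def suffix_scan(sorted_names, suffix):
    """Like LIKE '%suffix': sort order does not help, so every entry is examined."""
    return [name for name in sorted_names if name.endswith(suffix)]

names = sorted(["adel", "ali", "amal", "hani", "lina", "wali"])
print(prefix_seek(names, "a"))   # ['adel', 'ali', 'amal']
print(suffix_scan(names, "li"))  # ['ali', 'wali']
```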

Not Equal Operators
Avoid searching with not-equal operators (<> and NOT) where possible, as they often result in table and index scans.

Derived Tables
Use ‘derived tables’ where they help, as they often perform better. Consider the following query to find the second highest salary from the Employees table:

SELECT MIN(Salary) FROM Employees WHERE EmpID IN (SELECT TOP 2 EmpID FROM Employees ORDER BY Salary DESC)

The same query can be re-written using a derived table, as shown below, and it performs twice as fast as the above query:

SELECT MIN(Salary) FROM (SELECT TOP 2 Salary FROM Employees ORDER BY Salary DESC) AS TopSalaries

This is just an example, and your results might differ in different scenarios depending on the database design, indexes, volume of data, etc. So, test all the possible ways a query could be written and go with the most efficient one.
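As a quick way to check that both forms return the same answer, the sketch below runs them against a small in-memory table using Python's built-in sqlite3 module. This is a stand-in for SQL Server: SQLite uses LIMIT 2 where T-SQL uses TOP 2, and the salary figures are made up. It compares results only, not performance.

```python
import sqlite3

def second_highest(salaries):
    """Return the second-highest salary via the IN form and the derived-table form."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Salary INTEGER)")
    conn.executemany(
        "INSERT INTO Employees (Salary) VALUES (?)", [(s,) for s in salaries]
    )
    # Subquery form (T-SQL would write TOP 2 instead of LIMIT 2).
    via_in = conn.execute(
        "SELECT MIN(Salary) FROM Employees WHERE EmpID IN "
        "(SELECT EmpID FROM Employees ORDER BY Salary DESC LIMIT 2)"
    ).fetchone()[0]
    # Derived-table form; T-SQL requires the alias after the closing parenthesis.
    via_derived = conn.execute(
        "SELECT MIN(Salary) FROM "
        "(SELECT Salary FROM Employees ORDER BY Salary DESC LIMIT 2) AS TopSalaries"
    ).fetchone()[0]
    conn.close()
    return via_in, via_derived

print(second_highest([3000, 5000, 4000, 1000]))  # (4000, 4000)
```

Note that both forms pick the second row rather than the second distinct value, so if two employees share the top salary, that top salary is returned.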

SQL Batches
Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments.
This suppresses messages like ‘(1 row(s) affected)’ after executing INSERT, UPDATE, DELETE and SELECT statements. This improves the performance of stored procedures by reducing network traffic.

ANSI-Standard Join Clauses
Use the more readable ANSI-Standard Join clauses instead of the old style joins. With ANSI joins, the WHERE clause is used only for filtering data, whereas with older style joins, the WHERE clause handles both the join condition and filtering data. The first of the following two queries shows the old style join, while the second one shows the new ANSI join syntax:

SELECT a.au_id, t.title FROM titles t, authors a, titleauthor ta WHERE
a.au_id = ta.au_id AND
ta.title_id = t.title_id AND
t.title LIKE '%Computer%'
----------------------------------------------
SELECT a.au_id, t.title
FROM authors a
INNER JOIN titleauthor ta
ON
a.au_id = ta.au_id
INNER JOIN titles t
ON
ta.title_id = t.title_id WHERE t.title LIKE '%Computer%'
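The two styles are equivalent in what they return, which the following sketch checks on a tiny made-up data set. It uses Python's built-in sqlite3 module as a stand-in for SQL Server; the table layout mirrors the queries above, and the rows are hypothetical.

```python
import sqlite3

def computer_titles():
    """Run the old-style and ANSI-style joins on toy data and return both results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (au_id TEXT PRIMARY KEY);
        CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
        INSERT INTO authors (au_id) VALUES ('A1'), ('A2');
        INSERT INTO titles (title_id, title) VALUES
            ('T1', 'Computer Basics'), ('T2', 'Cooking at Home');
        INSERT INTO titleauthor (au_id, title_id) VALUES ('A1', 'T1'), ('A2', 'T2');
    """)
    # Old style: join conditions and the filter are mixed in the WHERE clause.
    old_style = conn.execute("""
        SELECT a.au_id, t.title
        FROM titles t, authors a, titleauthor ta
        WHERE a.au_id = ta.au_id
          AND ta.title_id = t.title_id
          AND t.title LIKE '%Computer%'
    """).fetchall()
    # ANSI style: ON carries the join conditions, WHERE only filters.
    ansi_style = conn.execute("""
        SELECT a.au_id, t.title
        FROM authors a
        INNER JOIN titleauthor ta ON a.au_id = ta.au_id
        INNER JOIN titles t ON ta.title_id = t.title_id
        WHERE t.title LIKE '%Computer%'
    """).fetchall()
    conn.close()
    return old_style, ansi_style

old_style, ansi_style = computer_titles()
print(old_style == ansi_style)  # True
```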

Stored Procedures Naming Convention
Do not prefix your stored procedure names with “sp_”. The prefix sp_ is reserved for system stored procedures that ship with SQL Server. Whenever SQL Server encounters a procedure name starting with sp_, it first tries to locate the procedure in the master database, then it looks for any qualifiers (database, owner) provided, then it tries dbo as the owner.
So you can really save time in locating the stored procedure by avoiding the “sp_” prefix.

Using Views
Views are generally used to show specific data to specific users based on their interest. Views are also used to restrict access to the base tables by granting permission only on views. Yet another significant use of views is that they simplify your queries.
Incorporate your frequently required, complicated joins and calculations into a view so that you don’t have to repeat those joins/calculations in all your queries. Instead, just select from the view.

Text Data Types
Try not to use TEXT or NTEXT data types for storing large textual data.
The TEXT data type has some inherent problems associated with it and will be removed from a future version of Microsoft SQL Server.
For example, you cannot directly write or update text data using the INSERT or UPDATE statements. Instead, you have to use special statements like READTEXT, WRITETEXT and UPDATETEXT.
There are also a lot of bugs associated with replicating tables containing text columns.
So, if you don’t have to store more than 8KB of text, use CHAR(8000) or VARCHAR(8000) data types instead.
In SQL 2005 and 2008, you can use VARCHAR(MAX) to store up to 2 GB of textual data.

Insert Statements
Always use a column list in your INSERT statements. This helps in avoiding problems when the table structure changes (like adding or dropping a column).
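The following sketch shows the kind of breakage a column list protects against. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, with a made-up Customer table: after a column is added, the INSERT without a column list fails, while the one with a column list keeps working.

```python
import sqlite3

def insert_after_schema_change():
    """Add a column, then try an INSERT with and without a column list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (CustomerID INTEGER, City TEXT)")
    # The table structure changes after the INSERT statements were written.
    conn.execute("ALTER TABLE Customer ADD COLUMN Country TEXT")

    try:
        # No column list: the value count no longer matches the table.
        conn.execute("INSERT INTO Customer VALUES (1, 'Beirut')")
        without_list = "ok"
    except sqlite3.Error:
        without_list = "failed"

    # Explicit column list: unaffected by the new column, which stays NULL.
    conn.execute("INSERT INTO Customer (CustomerID, City) VALUES (2, 'Tripoli')")
    rows = conn.execute("SELECT CustomerID, City, Country FROM Customer").fetchall()
    conn.close()
    return without_list, rows

print(insert_after_schema_change())  # ('failed', [(2, 'Tripoli', None)])
```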

Accessing Tables
Always access tables in the same order in all your stored procedures and triggers. This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are:
1. Keep your transactions as short as possible. Touch as little data as possible during a transaction.
2. Never, ever wait for user input in the middle of a transaction.
3. Do not use higher level locking hints or restrictive isolation levels unless they are absolutely needed.
4. Make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction in case the previous transaction fails with error 1205.
5. In your applications, process all the results returned by SQL Server immediately so that the locks on the processed rows are released, hence no blocking.
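Point 4 above, a deadlock-intelligent front end, could look roughly like the following sketch. The DatabaseError class and its number attribute are hypothetical stand-ins for whatever exception your database driver raises; only the error number 1205 comes from SQL Server itself.

```python
import time

DEADLOCK_VICTIM = 1205  # SQL Server error number for a deadlock victim

class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error number."""
    def __init__(self, number, message=""):
        super().__init__(message)
        self.number = number

def run_with_deadlock_retry(transaction, max_attempts=3, delay_seconds=0.0):
    """Run transaction(); resubmit it only when it fails with error 1205."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Any other error, or running out of attempts, is passed to the caller.
            if exc.number != DEADLOCK_VICTIM or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)  # brief pause before resubmitting

# Simulated transaction that is chosen as the deadlock victim twice, then succeeds.
attempts = []
def flaky_transaction():
    attempts.append(1)
    if len(attempts) < 3:
        raise DatabaseError(DEADLOCK_VICTIM, "deadlock victim")
    return "committed"

print(run_with_deadlock_retry(flaky_transaction))  # committed
```

Capping the number of attempts matters: a transaction that keeps deadlocking usually points to a table access order problem that retries alone will not fix.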

Stored Procedure Returning Values
Make sure your stored procedures always return a value indicating their status. Standardize on the return values of stored procedures for success and failures.
The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters.
If your stored procedure always returns a single row result set, consider returning the result set using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than result sets returned by SELECT statements.

Object Case
Always be consistent with the usage of case in your code. On a case insensitive server, your code might work fine, but it will fail on a case sensitive SQL Server if your code is not consistent in case.
For example, if you create a table in a SQL Server instance or database that has a case-sensitive or binary sort order, all references to the table must use the same case that was specified in the CREATE TABLE statement.
If you name the table as ‘MyTable’ in the CREATE TABLE statement and use ‘mytable’ in the SELECT statement, you get an ‘object not found’ error.

T-SQL Variables
Though T-SQL has no concept of constants (like the ones in the C language), variables can serve the same purpose. Using variables instead of constant values within your queries improves readability and maintainability of your code. Consider the following example:

SELECT OrderID, OrderDate FROM Orders WHERE OrderStatus IN (5,6)

The same query can be re-written in a more readable form as shown below:

DECLARE @ORDER_DELIVERED INT, @ORDER_PENDING INT
SELECT @ORDER_DELIVERED = 5, @ORDER_PENDING = 6
SELECT OrderID, OrderDate FROM Orders
WHERE OrderStatus IN (@ORDER_DELIVERED, @ORDER_PENDING)

Offload tasks
Offload tasks, like string manipulations, concatenations, row numbering, case conversions, type conversions etc., to the front-end applications if these operations are going to consume more CPU cycles on the database server.
Also try to do basic validations in the front-end itself during data entry. This saves unnecessary network roundtrips.

Check for record Existence
If you need to verify the existence of a record in a table, don’t use SELECT COUNT(*) in your Transact-SQL code to find out; it is inefficient and wastes server resources. Instead, use IF EXISTS to determine whether the record in question exists, which is much more efficient.
Here’s how you might use COUNT(*):

IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx') > 0

Here’s a faster way, using IF EXISTS:

IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')

IF EXISTS is faster than COUNT(*) because the query can stop as soon as the test is proven true, that is, at the first matching row, while COUNT(*) must go through every matching record, whether there is one or thousands, before the comparison can be evaluated.

Object Owner
For best performance, all objects that are called from within the same stored procedure should be owned by the same owner, preferably dbo. If they are not, SQL Server must perform name resolution on the objects when the object names are the same but the owners are different. When this happens, SQL Server cannot reuse the stored procedure’s in-memory plan; instead, it must recompile the stored procedure, which hinders performance.
Likewise, call stored procedures by their fully qualified names. There are a couple of reasons, one of which relates to performance. First, using fully qualified names helps to eliminate any confusion about which stored procedure you want to run, helping to prevent bugs and other problems. More importantly, doing so allows SQL Server to access the stored procedure’s execution plan more directly, which speeds up the stored procedure. The performance boost is very small, but if your server is running tens of thousands of stored procedure calls or more every hour, these small savings add up.
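
A small sketch of the difference, assuming the objects live in the dbo schema (the procedure and table names are the ones used in the examples elsewhere in this guide):

```sql
-- Schema-qualified: the owner is named explicitly.
EXEC dbo.GetMedicalProcedures 1, 10;

-- Unqualified: SQL Server has to resolve the owner first.
EXEC GetMedicalProcedures 1, 10;

-- The same idea applies to tables referenced inside a procedure.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderStatus = 5;
```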

Upsert Statements
SQL Server 2008 introduces the MERGE statement, which combines insert, update, and delete operations in one statement (often called an “upsert”).
Always use the MERGE statement to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table:

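MERGE table1 AS target
USING (
    SELECT ID, Name
    FROM table2
) AS source (ID, Name)
ON (target.Table2ID = source.ID)
-- Target row with no matching source row and no name: delete it.
-- (DELETE is only allowed in WHEN MATCHED or WHEN NOT MATCHED BY SOURCE.)
WHEN NOT MATCHED BY SOURCE AND target.Name IS NULL THEN
    DELETE
-- Source row with no matching target row: insert it.
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Name, Table2ID)
    VALUES (source.Name + ' not matched', source.ID)
-- Row present on both sides: update the target.
WHEN MATCHED THEN
    UPDATE SET target.Name = source.Name + ' matched'
-- Report what happened to each affected row.
OUTPUT $action, inserted.ID, deleted.ID;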

DateTime Columns
Always use the ‘datetime2’ data type in SQL Server 2008 instead of the classic ‘datetime’. At comparable precision it is more compact: datetime2(3) takes 7 bytes, versus 8 bytes for datetime. It also has a larger date range, a larger default fractional precision, and optional user-specified precision.
If a column is supposed to store only the date portion, use the ‘date’ data type; if it stores only the time portion, use the ‘time’ data type. Below are examples of what values of each date and time type look like:

time            12:35:29.1234567
date            2007-05-08
smalldatetime   2007-05-08 12:35:00
datetime        2007-05-08 12:35:29.123
datetime2       2007-05-08 12:35:29.1234567
datetimeoffset  2007-05-08 12:35:29.1234567 +12:15
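
To see these types side by side, the following sketch casts one datetime2 value to each of the other types and compares storage sizes with DATALENGTH. The variable name @moment and the column aliases are made up for this illustration.

```sql
-- One source value, viewed through each date/time type.
DECLARE @moment DATETIME2(7) = '2007-05-08 12:35:29.1234567';

SELECT
    CAST(@moment AS TIME(7))       AS TimeOnly,
    CAST(@moment AS DATE)          AS DateOnly,
    CAST(@moment AS SMALLDATETIME) AS AsSmallDateTime,   -- rounded to the minute
    CAST(@moment AS DATETIME)      AS AsDateTime,        -- rounded to about 3 ms
    CAST(@moment AS DATETIME2(3))  AS AsDateTime2_3;     -- user-specified precision

-- Compare storage sizes in bytes.
SELECT
    DATALENGTH(CAST(@moment AS DATETIME))     AS DateTimeBytes,
    DATALENGTH(CAST(@moment AS DATETIME2(3))) AS DateTime2_3Bytes;
```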

Measure Query Performance
Always use the STATISTICS TIME feature to measure the performance of your important queries and stored procedures, and use its output to guide your optimizations. Take a look at this example:

SET STATISTICS TIME ON
EXEC GetMedicalProcedures 1,10
SET STATISTICS TIME OFF

The following information will be displayed in the Messages tab:
SQL Server parse and compile time:
CPU time = 6 ms, elapsed time = 6 ms.
SQL Server Execution Times:
CPU time = 24 ms, elapsed time = 768 ms.
(10 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 125 ms.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 131 ms.
This provides a good estimate of how long the query took to execute, showing the CPU time (processing time) and the elapsed time (total wall-clock time, which also includes waits such as I/O).

Indexes
Create indexes on tables that are queried heavily with SELECT statements. Be careful about indexing tables that change constantly through insert, update, and delete operations.
An index speeds up a SELECT statement when the indexed column is used in the query, especially in the WHERE clause. However, the same index slows down inserts and deletes, as well as updates that touch the indexed columns, because SQL Server must maintain every index on the table each time its data changes. So use indexes wisely, for tables that have a high retrieval rate and a low change rate.
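
As a minimal sketch, an index supporting the earlier Orders query (filtering on OrderStatus) might look like this. The index name, the dbo schema, and the choice of included columns are assumptions for illustration; whether such an index pays off depends on how often the table changes.

```sql
-- Hypothetical index for queries that filter on OrderStatus.
-- INCLUDE lets the query be answered from the index without visiting the table.
CREATE NONCLUSTERED INDEX IX_Orders_OrderStatus
    ON dbo.Orders (OrderStatus)
    INCLUDE (OrderID, OrderDate);

-- A query that can benefit from it:
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderStatus IN (5, 6);
```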

Filed under Databases, Technology Tagged with Data type, Database, Join (SQL), Microsoft SQL Server, Select (SQL), SQL, SQL 2008, Stored procedure

Cross Product of arrays using LINQ

November 20, 2010

A cross product is usually a database operation on two tables, similar to a join; the only difference is that the cross product yields all possible combinations of rows from the two tables. Database management systems such as SQL Server and Oracle provide the cross product operation through the CROSS JOIN keyword. The problem in real-life applications appears when we want to compute a cross product within the application logic. Suppose you have two arrays of numbers and you want to find all possible combinations of the two arrays, then pick the combinations that meet your specific business needs. Fortunately, with the introduction of LINQ, a cross product has become as simple as writing any other simple query. Let’s take a look at the example below. Consider two arrays A and B having 4 values each:

        Dim A() As Integer = {1, 2, 3, 4}
        Dim b() As Integer = {5, 6, 7, 8}

Cross product is simply obtained using the following LINQ query followed by writing out the results to console:

        Dim crossProduct = From x In A, y In b Select x, y
        For Each i In crossProduct
            Console.WriteLine(i)
        Next

This will yield all 16 (x, y) combinations, from (1, 5), (1, 6), (1, 7), (1, 8) through (4, 8).
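
For comparison, the database-side CROSS JOIN mentioned at the start of this post can be sketched in T-SQL as follows. This is a minimal illustration, assuming SQL Server 2008 or later (for the VALUES row constructor); the aliases a, b, x, and y are chosen here to mirror the two arrays.

```sql
-- Every value of x paired with every value of y: 4 x 4 = 16 rows.
SELECT a.x, b.y
FROM (VALUES (1), (2), (3), (4)) AS a(x)
CROSS JOIN (VALUES (5), (6), (7), (8)) AS b(y)
ORDER BY a.x, b.y;
```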

Filed under .Net, Algorithms, Databases Tagged with Cross product, Database, Database management system, Language Integrated Query, LINQ, Microsoft SQL Server, Select (SQL), SQL Server

How to get non working devices using Vb.net

October 31, 2010

When it comes to hardware interaction, many .NET developers feel frustrated and uncertain about whether they can do what they have in mind. One common problem is enumerating the devices installed on a computer. A harder problem is finding the devices that are disabled or not working properly. If you are writing an application that requires a specific hardware device, it makes sense to test whether that device is working and enabled.

In Windows, you can view the list of devices in the Device Manager. Disabled devices appear with a down arrow next to them, and devices that are not working properly appear with a yellow exclamation mark icon next to them:

The solution to such problems in .NET Framework languages becomes simple if we make use of Windows Management Instrumentation (WMI) queries. Writing a WMI query to retrieve system information is as simple as writing an SQL query. In fact, the syntax is almost exactly the same; only the objects we are querying differ. Take a look at the WMI query below, which retrieves all installed devices on the local system:

    Select * from Win32_PnPEntity

As simple as that: the Win32_PnPEntity class stores information about installed devices. So if we want to filter the results down to only the non-working devices, we simply add a WHERE clause, just as we would when filtering data from an SQL table.

    Select * from Win32_PnPEntity WHERE ConfigManagerErrorCode <> 0

The ConfigManagerErrorCode property stores the state of the device. A value of 0 means the device is working, and any other value means it is not working or is disabled.

Let’s create a Device class to convert each retrieved device into an object that we can work with in our application. After all, we want to list all non-working devices and bind the list to a bindable control such as a DataGrid or a ListView.

Below is our base Device class having all the properties we need to show:

Public Class Device

    Private mName As String
    Private mManufacturer As String
    Private mDescription As String
    Private mService As String
    Private mDeviceID As String
    Private mPNPDeviceID As String
    Private mClassGUID As String

    Public Property Name() As String
        Get
            Return mName
        End Get
        Set(ByVal value As String)
            mName = value
        End Set
    End Property

    Public Property Manufacturer() As String
        Get
            Return mManufacturer
        End Get
        Set(ByVal value As String)
            mManufacturer = value
        End Set
    End Property

    Public Property Description() As String
        Get
            Return mDescription
        End Get
        Set(ByVal value As String)
            mDescription = value
        End Set
    End Property

    Public Property Service() As String
        Get
            Return mService
        End Get
        Set(ByVal value As String)
            mService = value
        End Set
    End Property

    Public Property DeviceID() As String
        Get
            Return mDeviceID
        End Get
        Set(ByVal value As String)
            mDeviceID = value
        End Set
    End Property

    Public Property PNPDeviceID() As String
        Get
            Return mPNPDeviceID
        End Get
        Set(ByVal value As String)
            mPNPDeviceID = value
        End Set
    End Property

    Public Property ClassGUID() As String
        Get
            Return mClassGUID
        End Get
        Set(ByVal value As String)
            mClassGUID = value
        End Set
    End Property
End Class

We are now ready to implement our two main functions, GetAllDevices and GetNonWorkingDevices, and add them to the Device class. Both will be Shared methods because they do not depend on the state of any particular Device instance. Below is the implementation of both methods. Note the use of the GetObject function, which retrieves an instance of WMI on the local computer system (the late-bound calls on Object require Option Strict Off):

Public Shared Function GetAllDevices() As List(Of Device)
    Dim pc As String = "." 'local
    Dim wmi As Object = GetObject("winmgmts:\\" & pc & "\root\cimv2")
    Dim allDevices As New List(Of Device)
    Dim devices As Object = wmi.ExecQuery("Select * from Win32_PnPEntity")
    Dim device As Device
    For Each d As Object In devices
        device = New Device
        With Device
        .mClassGUID = IIf(IsDBNull(d.ClassGuid), 0, d.ClassGuid)
        .mDescription = IIf(IsDBNull(d.Description), 0, d.Description)
        .DeviceID = IIf(IsDBNull(d.DeviceID), 0, d.DeviceID)
        .Manufacturer = IIf(IsDBNull(d.Manufacturer), 0, d.Manufacturer)
        .Name = IIf(IsDBNull(d.Name), 0, d.Name)
        .PNPDeviceID = IIf(IsDBNull(d.PNPDeviceID), 0, d.PNPDeviceID)
        .Service = IIf(IsDBNull(d.Service), 0, d.Service)
        End With
        allDevices.Add(device)
    Next
    Return allDevices
End Function

Public Shared Function GetNonWorkingDevices() As List(Of Device)
    Dim pc As String = "." 'local
    Dim wmi As Object = GetObject("winmgmts:\\" & pc & "\root\cimv2")
    Dim notWorking As New List(Of Device)
    Dim devices As Object = wmi.ExecQuery("Select * from " & _
          "Win32_PnPEntity WHERE ConfigManagerErrorCode <> 0")
    Dim device As Device
    For Each d As Object In devices
        device = New Device
        With Device
        .mClassGUID = IIf(IsDBNull(d.ClassGuid), 0, d.ClassGuid)
        .mDescription = IIf(IsDBNull(d.Description), 0, d.Description)
        .DeviceID = IIf(IsDBNull(d.DeviceID), 0, d.DeviceID)
        .Manufacturer = IIf(IsDBNull(d.Manufacturer), 0, d.Manufacturer)
        .Name = IIf(IsDBNull(d.Name), 0, d.Name)
        .PNPDeviceID = IIf(IsDBNull(d.PNPDeviceID), 0, d.PNPDeviceID)
        .Service = IIf(IsDBNull(d.Service), 0, d.Service)
        End With
        notWorking.Add(device)
    Next
    Return notWorking
End Function

Our Device class is now ready. We can now simply bind the results of either method to a DataGridView, as I have done in the figure below:

Here is the code that produces the above results after adding a DataGridView to a form:

Private Sub Form1_Load(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles MyBase.Load
    DataGridView1.DataSource = Device.GetNonWorkingDevices
End Sub

The project with its source code can be downloaded here (remove the .jpg extension).

Filed under .Net Tagged with .NET Framework, Component Frameworks, Programming, Select (SQL), Windows, Windows Management Instrumentation
