Face Recognition: Image Preprocessing

Red, green and blue lights showing secondary c...

Image via Wikipedia

1. Introduction

Human eyes have about 10:1 absolute range from fully adapted dark vision to fully adapted lighting conditions at noon on the equator. We can see about 3X10:1 range of luminance when we are adapted to a normal working range. Due to the limited dynamic ranges of current imaging and display devices, images captured in real world scenes with high dynamic ranges usually exhibit poor visibility of either over exposure or shadows and low contrast, which may make important image features lost or hard to tell by human eyes. Computer vision algorithms also have difficulty processing those images. To cope with this problem, various image processing techniques have been developed. Some of those techniques are spatially-independent methods, like gamma adjustment, logarithmic compression, histogram equalization (HE), and levels/curves methods.

Human face detection plays an important role in applications such as intelligent human computer interface (HCI), biometric identification, and face recognition. The goal of any face detection technique is to identify the face regions within a given image. The reliable detection of faces has been an ongoing research topic for decades. There are several face detection techniques proposed in the literature both in gray scale and color. The appearance based algorithms process gray scale images. A typical color based face detection system on the other hand would first do a skin color region extraction on color images based on either pixel based or a combination of pixels and shape based systems in different color spaces. The next step would in general be region merging followed by classification or application of any appearance-based method to classify the skin color regions into faces and non-faces by converting them into gray scale images.


2. Grayscale

In photography and computing, a grayscale or grayscale digital image is an image in which the value of each pixel is a single sample, that is, it carries only intensity information. Images of this sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest.

Grayscale images are distinct from one-bit black-and-white images, which in the context of computer imaging are images with only the two colors, black, and white (also called bi-level or binary images). Grayscale images have many shades of gray in between. Grayscale images are also called monochromatic, denoting the absence of any chromatic variation.

Grayscale images are often the result of measuring the intensity of light at each pixel in a single band of the electromagnetic spectrum (e.g. infrared, visible light, ultraviolet, etc.), and in such cases they are monochromatic proper when only a given frequency is captured. But also they can be synthesized from a full color image; see the section about converting to grayscale.


2.1 Numerical representations

The intensity of a pixel is expressed within a given range between a minimum and a maximum, inclusive. This range is represented in an abstract way as a range from 0 (total absence, black) and 1 (total presence, white), with any fractional values in between. This notation is used in academic papers, but it must be noted that this does not define what “black” or “white” is in terms of colorimetry.

Another convention is to employ percentages, so the scale is then from 0% to 100%. This is used for a more intuitive approach, but if only integer values are used, the range encompasses a total of only 101 intensities, which are insufficient to represent a broad gradient of grays. Also, the percentile notation is used in printing to denote how much ink is employed in halftoning, but then the scale is reversed, being 0% the paper white (no ink) and 100% a solid black (full ink).

In computing, although the grayscale can be computed through rational numbers, image pixels are stored in binary, quantized form. Some early grayscale monitors can only show up to sixteen (4-bit) different shades, but today grayscale images (as photographs) intended for visual display (both on screen and printed) are commonly stored with 8 bits per sampled pixel, which allows 256 different intensities (i.e., shades of gray) to be recorded, typically on a non-linear scale. The precision provided by this format is barely sufficient to avoid visible banding artifacts, but very convenient for programming due to a single pixel occupies then a single byte.

Technical uses (e.g. in medical imaging or remote sensing applications) often require more levels, to make full use of the sensor accuracy (typically 10 or 12 bits per sample) and to guard against round off errors in computations. Sixteen bits per sample (65,536 levels) is a convenient choice for such uses, as computers manage 16-bit words efficiently. The TIFF and the PNG (among other) image file formats supports 16-bit grayscale natively, although browsers and many imaging programs tend to ignore the low order 8 bits of each pixel.

No matter what pixel depth is used, the binary representations assume that 0 is black and the maximum value (255 at 8 bpp, 65,535 at 16 bpp, etc.) is white, if not otherwise noted.


2.2 Converting color to grayscale

Conversion of a color image to grayscale is not unique; different weighting of the color channels effectively represents the effect of shooting black-and-white film with different-colored photographic filters on the cameras. A common strategy is to match the luminance of the grayscale image to the luminance of the color image.

To convert any color to a grayscale representation of its luminance, first one must obtain the values of its red, green, and blue (RGB) primaries in linear intensity encoding, by gamma expansion. Then, add together 30% of the red value, 59% of the green value, and 11% of the blue value (these weights depend on the exact choice of the RGB primaries, but are typical). Regardless of the scale employed (0.0 to 1.0, 0 to 255, 0% to 100%, etc.), the resultant number is the desired linear luminance value; it typically needs to be gamma compressed to get back to a conventional grayscale representation.

This is not the method used to obtain the luma in the Y’UV and related color models, used in standard color TV and video systems as PAL and NTSC, as well as in the L*a*b color model. These systems directly compute a gamma-compressed luma as a linear combination of gamma-compressed primary intensities, rather than use linearization via gamma expansion and compression.

To convert a gray intensity value to RGB, simply set all the three primary color components red, green and blue to the gray value, correcting to a different gamma if necessary.

3. Histogram equalization

A histogram can represent any number of things, since its sole purpose is to graphically summarize the distribution of a single-variable set of data1. Each specific use targets some different features of histogram graphs, and when it boils down to image analysis, histograms are the de facto standard.

When viewing an image represented by a histogram, what we’re really doing is analyzing the number of pixels (vertically) with a certain frequency (horizontally). In essence, an equalized image is represented by an equalized histogram where the number of pixels is spread evenly over the available frequencies.

An overexposed image is defined as an image in which there are an excessive number of pixels with a high pixel frequency, while there is a shortage of low frequencies (Figure 3.2). The data in a histogram representing an overexposed image therefore is not spread evenly over the horizontal axis, instead skewing non-symmetrically to the absolute right edge of the graph (Figure 3.3).

Usually, when the number of pixels with very high pixel frequencies is so high – as shown in the example – it means that some image data has been lost; it is then impossible to restore detail in areas where the pixel frequencies have been cropped to a maximum value.

The same, of course, goes for underexposed images (Figure 3.4); images where the histogram is skewed non-symmetrically to the absolute left edge of the graph (Figure 3.5).

There are situations where we would want to reveal detail in an image that cannot easily be seen with the naked eye. One of the several techniques to enhance an image in such a manner is histogram equalization, which is commonly used to compare images made in entirely different circumstances. Extreme examples may include comparisons of photographs with different exposures, lighting angles and shadow casts3.

The concept of histogram equalization is to spread otherwise cluttered frequencies more evenly over the length of the histogram. Frequencies that lie close together will dramatically be stretched out. These respective areas of the image that first had little fluctuation will appear grainy and rigid, thereby revealing otherwise unseen details.

A histogram equalization algorithm will determine the ideal number of times each frequency should appear in the image and, theoretically, re-plot the histogram appropriately. However, because image data isn’t stored in an analogue manner, this is usually not possible. Instead, the image data is stored digitally and limited to n bits of color depth, thus the image cannot be requantified to meet our requirement.

Nevertheless, by ensuring that the number of times a frequency in a certain range remains as close to the ideally equalized histogram as possible, we can work around the issue of requantification. The solution is simple and elegant.

The ideal number of pixels per frequency i is the total number of pixels in the image divided by the total number of possible image frequencies N. The algorithm counts the frequencies from 0 to N and shifts as many pixel frequencies into that position as long as this number of pixels is less than or equal to a certain delimiter that increases linearly to

the frequency. If a pixel frequency doesn’t fit, it is pushed to the right along the horizontal axis until a place is found.

In the simplest scenario, histogram equalization is used on grayscale images. However, by converting a RGB image to HSV, we can equalize the Value channel without altering the Hue or Saturation. After converting the result back to RGB, a properly equalized image is produced (Figure 3.6 and Figure 3.7).

4. Dynamic Range Compression

We have used a sigmoid transfer function for increasing the dynamic range of an image. A hyperbolic tangent function is used for the reason of overcoming the natural loss in perceived lightness contrast that results when performing dynamic range compression. We have developed an enhancement strategy that will perform the range compression while maintaining the image details. The proposed solution is to develop the hyperbolic tangent functions that are tunable based on the statistical characteristics of the image. That is, the function will enhance the dark part of the image while preserving the light part of the image based on:

Where x, y the V component pixel is value in HSV color space and 0 ≤ I x,y ≤ 255 at (x, y) location of the image, ρ is the statistics of the image, and Ix,y is the enhanced pixel value which is normalized. The parameter ρ controls the curvature of the hyperbolic tangent function. This means that when the processing image is dark, ρ should be small and therefore the curvature of the hyperbolic tangent function will be steep and this will help the darker pixels to have brighter values. ρ can be expressed as:

Where x, y is the local mean of an image and k is the bias pixel intensity value. The local mean of each pixel is calculated based on the center surrounded property (k = 3) of a perceptual field and perceptual processes of human vision. The form of the surround function we used is Gaussian because it provides good dynamic range compression over a wide range of environments. Consequently, the local mean of the image is calculated by:

Where sis the standard deviation of the Gaussian distribution and c is selected so that òò x, y dxdy=1.The choice of spresents a tradeoff between the dynamic range compression and color rendition of the image

A smaller s will yield larger dynamic range compression but causes the image to lose its color. Conversely, a larger s will yield better color rendition but the shadow of the image will remain constant. Fig. 4.1 illustrates the variability of the hyperbolic tangent function based on Eq. (1) to (3). The output intensity range is converted to [0 255]. It can be observed that when the local mean of an image is small, the hyperbolic tangent function reshapes its curve towards the brighter pixel value to facilitate the rescaling of the range of the dark pixel to the brighter region. Conversely, when the local mean of an image is large, the hyperbolic tangent function compresses the brighter pixels to the darker region.


5. Contrast Enhancement

The dark shadows in images can be brightened while the local intensity contrast will be degraded using Eq. (1) – (3) because the nonlinear dynamic range compression decreases the intensity variation when darker pixels are brightened more with a larger ‘accelerate factor’ than those of lighter pixels. Fig. 5.1 illustrates the d

egradation of image contrast compared to that of original images (e.g. the clouds and the sky) due to the dynamic range compression. In order to improve the visual quality of images produced by the dynamic range compression, a contrast enhancement method is used to enhance the local contrast of these images. Therefore, after dynamic range compression and contrast enhancement, the visual quality of the original images with shadows created by high dynamic range scenes can be largely improved. In addition, enhancing the local contrast can also be useful for improving the performance of convolutional face finder, which is sensitive to local intensity variation (e.g. first and second derivative image information).

In the proposed contrast enhancement algorithm, the local intensity variation Iv is defined as in:

Where Ix, y and Iavg are the intensity enhanced image and its low-pass version, respectively. Iavg is computed using 2D convolution with a Gaussian kernel in    Eq. (3) in which 5≤ s≤10 by experiments. Iv, the difference between Ix, y and Iavg, can be either positive or negative, which accordingly represents a brighter or darker pixel compared with its neighbor pixels. The magnitude of Iv determines the local contrast of an image: larger magnitude indicates higher contrast and vice versus. Therefore, increasing the magnitude of Iv is an effective way to enhance local image contrast. The proposed contrast enhancement technique increases the magnitude of Iv using the power law operation as described in:

Β is tunable for adjusting the image contrast and β< 1. Where β has a default value of 0.75. Since β can be either an odd or even number, |Iv| instead of Iv is used to keep the result of Eq.

(5) Positive. Eq. (5) can increase low contrast (small |Iv|) while preserving high contrast (large |Iv|) because 0 ≤|Iv|≤1 Based on the result of |Iv,EN| and the sign of Iv, the enhanced local intensity variation Iv,EN can be obtained by restoring the sign no matter β is odd or even:

where sign(.) is defined as:

Finally, the intensity image (Ic,EN) with enhanced local contrast can be achieved by adding Iv,EN to Iavg as in:

Where the maximum of (Iv,EN + Iavg) is used to normalize (Iv,EN + Iavg) because (Iv,EN + Iavg) can be larger than 1.

The local contrast enhancement process can be illustrated using the images shown in Fig. 5.2. The luminance enhanced intensity image (Ix, y) and its local averaging result (Iavg) are shown in Fig. 5.1(b) and 5.1(a), respectively. Fig. 5.2(b) shows the magnitude image of |Iv|, the ‘bright’ regions represent those pixels which are either brighter or darker tha its neighboring pixels in the luminance enhanced intensity image (Ix, y). |Iv,EN|, the enhanced result of |Iv|, is shown in Fig. 5.2(c) where the edges (or features) are more obvious than those in Fig. 5.2(b), which indicates larger intensity variation compared to that represented by |Iv|. The final result of the local contrast enhancement is presented in Fig. 5.2(d) where image details are improved greatly due to the contrast enhancement algorithm defined by Eq.(4) – (8).


6. Color Remapping

For color images, a linear color remapping process based on the chromatic information of the original image is applied to the enhanced intensity image to recover the RGB color bands (r’, g’, b’) as in:

Where τ is the V component pixel value of HSV color space, which essentially is the maximum value among the original r, g and b values at each pixel location. In this case, the ratio of the original r, g and b can be maintained by applying the above linear color remapping. Hence, the color information of hue and saturation in the original image is preserved in the enhanced color image. One example of color image enhancement is presented in Fig. 6.1. The color consistency can be observed between the original image and enhanced image. Observe also that the local contrast of the enhanced image is capable of bringing out the fine details from the image.

Parallel k-Nearest Neighbor

Example of k-nearest neighbour classification

Image via Wikipedia

K-Nearest Neighbor or KNN algorithm is part of supervised learning that has been used in many applications including data mining, statistical pattern recognition, and image processing. The algorithm doesn’t build a classification model but instead it is based on values found in storage or memory. To identify the class of an input, the algorithm chooses the class to which the majority of the input’s k closest neighbors belong to. The KNN algorithm is considered as one of the simplest machine learning algorithms. However it is computationally expensive especially when the size of the training set becomes large which would cause the classification task to become very slow. Several attempts have been made to parallelize KNN on the GPU by taking advantage of the natural parallel architecture of GPU [5]. However, in this paper the KNN algorithm was parallelized on CPU by distributing the distance computations of the k nearest neighbors among different processors. The parallel implementation greatly increased the speed of the KNN algorithm by reducing its time complexity from O(D) ,where D is the number of records, to O(D/p) where p is the number of processors.

Keywords: K Nearest Neighbor, GPU, manycore, CPU, parallel processors.


The KNN algorithm is a widely applied method for classification in machine learning and pattern recognition. It was known to be computationally intensive when given large training sets, and did not gain popularity until the 1960s when increased computing power became available.

Nearest-neighbor classifiers are based on learning by analogy, that is, by comparing a given test tuple with training tuples that are similar to it. The training tuples are described by n attributes. Each tuple represents a point in an n-dimensional space. In this way, all of the training tuples are stored in an n-dimensional pattern space. When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k “nearest neighbors” of the unknown tuple. “Closeness” is defined in terms of a distance metric, such as Euclidean distance. The Euclidean distance between two points or tuples, say, X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n), is:


dist(X1,X2) = (1)


The above algorithm applies for numerical data. For categorical attributes, a simple method is to compare the corresponding values of the attributes in tuple X1 with those in tuple X2. If the two are identical, then the difference between the two is taken as 0. If the two are different, then the difference is considered to be 1. Other methods may incorporate more sophisticated schemes for differential grading.

When computed in this way on a serial machine, the time complexity is clearly linear with respect to the number of data points. Hence, there is an interest in mapping the process onto a highly parallel machine in order to further optimize the running time of the algorithm. It should be noted, however, that serial implementations of the k-NN rule employing branch and bound search algorithms[1] (a systematic method for solving optimization problems) can scale sublinearly, such that the asymptotic time complexity may be constant with respect to the number of data points. Nonetheless, a fully parallel hardware implementation should still be much faster than the most efficient serial implementations.

Many parallel methods were conducted to increase the speed of the KNN algorithm including:

1.     The method uses Neural Networks for constructing a multi-layer feed-forward network that implements exactly a 1-NN rule. The advantage of this approach is that the resulting network can be implemented efficiently. The disadvantage is that the training time can’t grow exponentially for high dimensional pattern spaces, which could make it impractical.


2.     A CUDA implementation of the “brute force” kNN search described in [6] is performed. The advantage of this method is the highly parallelizable architecture of the GPU.


In this paper, the “brute force” kNN is studied and implemented on CPU rather than a GPU where the degree of parallelism is indicated by the number of available cores or processors. The proposed algorithm is not expected to outperform state-of-the art GPU implementations but rather, to provide an equivalent performance on CPU. Hence, the benefit becomes the ability of load sharing between CPU and GPU without degradation or loss of speed upon switching between any of the two processor architectures.



The nature of the brute force kNN algorithm can be assumed to be highly parallelizable[2] by nature, since computation of the distance between the input sample and any single training sample is independent of the distance computation to any other sample. This allows for partitioning the computation work with least synchronization effort. In fact, no inter communication or message passing is required at all during the time each processor is computing the distance between samples in its local storage and the input sample. When all processors terminate the distance computation procedure, the final step is to select a master processor to collect the results from all processors, sort the distances in ascending order, and then use the first 8 measures to determine the class of the input sample.


The proposed algorithm is described in the following steps:

1.     Select 1 processor to be the master, the other N-1 processors are slaves.

2.     Master divides the training samples to N subsets, and distributes 1 subset for each processor, keeping 1 subset for local processing (Master participates in distance computation too).

3.     Each individual processor now computes the distance measures independently and storing the computes measures in a local array

4.     When each processor terminates distance calculation, the processor transmits a message to the master indicating end of processing

5.     Master then notes the end of processing for the sender processor and acquires the computes measures by copying them into its own array

6.     After the master has claimed all distance measures from all processors, the following steps are performed:

a.      Sort all distance measures in ascending order

b.     Select top k measures

c.      Count the number of classes in the top k measures

d.     The input element’s class will belong the class having the higher count among top k measures



The goal of this experiment was to study the performance of parallelizing KNN on CPU for large data sets versus parallel GPU implementations. Iris database was used to train the system which is “perhaps the best known database to be found in the pattern recognition literature” [7]. The data set contains three classes of fifty instances each, where each class refers to a type of iris plant. Five attributes are present in this database which are Sepal Length, Sepal Width, Petal Length, Petal Width, and the class label attribute which can contain three values: “Iris-setosa”, “Iris-virginica” and “Iris-versicolour”.  These datasets were preprocessed before running the algorithm by building new data structures so that they can fit in memory. However, given the small number of records in the Iris database, the experiment would not reflect solid results, thus all the 50 records were cloned and randomly appended 1000 times on to a new larger Iris database of 50,000 total records. The experiment was implemented on an Intel 8-core machine and the obtained results were compared with respective implementation on a 64-core GPU. In order to accommodate for the biased number of cores on the GPU, the degree of parallelism was set to 8 in order to properly compare with the number of cores on the CPU, so 8 cores were used out of the available 64-cores. By applying the above mentioned parallel procedures, the CPU program was able to compete with the GPU showing equivalent performance but outperforming the GPU after repeating the test for several times. This was due to cache locality because after several repeated runs, chances of cache misses became less frequent and therefore fetching time from memory incurred by continuous trips was diminished. This also improved Bus utilization and hence power consumption was reduced.  These results were expected because of the nature of KNN is that it highly parallelizable and it scales well with many-core architectures. Complexity of the parallel algorithm is O(D/p) where D is the number of records and p is the number of available cores. Hence, given p processors, complexity reduces to constant time O(1).




In this paper parallel KNN algorithm was implemented by applying a new approach to facilitate computation of the distance measures of all data points. The implementation of the parallel technique reduced the running time of the algorithm on CPU which would make the algorithm a faster, more efficient than the serial kNN and competitor to state-of-the art GPU based implementation.

Parallel Apriori algorithm for frequent pattern mining

Apriori is a frequent pattern mining algorithm for discovering association rules. It is one of the most well-known algorithms for discovering frequent patterns along with FP-Growth algorithm. However, as a result of the current advances in the area of storage of very large databases and the tremendous growth in number of transactions, sequential Apriori becomes a bottleneck because of the long running time of the algorithm. In this paper, our goal is to develop a new parallel version of Apriori that aims to reduce the overall running time of the algorithm. Although Apriori is not known to be highly parallelizable, several attempts have been made to parallelize it in various ways, either by using parallel I/O hardware to optimize database scans or by distributing the workload on multiple processors. However, many of the parallel approaches suffer from noticeable latencies in synchronizing results being collected from each individual processor after a parallel iteration terminates. Our approach focuses on trying to maximize the workload being executed in parallel and to minimize the synchronization point delays by adopting a parallel pre-computing scheme during generation of the superset. By applying our new approach, the running time of the algorithm is reduced by an order of magnitude compared to other parallel implementations of the same algorithm.


Apriori is a frequent pattern mining algorithm for discovering association rules originally developed by Rakesh Agrawal and Ramakrishnan Srikant[4]. It operates on a list of transactions containing items (for example, products in a supermarket). Frequent occurrences of items with each other are mined by Apriori to discover relationship among different items. A single transaction is called an Itemset. Apriori uses a minimum support value as the main constraint to determine whether a set of items is frequent. In the first pass of the algorithm, it constructs the candidate 1-itemsets. The algorithm then generates the frequent 1-itemsets by pruning some candidate 1-itemsets if their support values are lower than the minimum support. After the algorithm finds all the frequent 1-itemsets, it joins the frequent 1-itemsets with each other to construct the candidate 2-itemsets and prune some infrequent itemsets from the candidate 2-itemsets to create the frequent 2-itemsets. This process is repeated until no more candidate itemsets can be created.

Apriori has a bottleneck when the number of transactions is very large since multiple database scans will be performed by apriori in every iteration. In order to improve the scalability of Apriori, several data structures and methods were constructed that can boost performance. These include:

Transaction Reduction: Atransaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets. Therefore, such a transaction can be marked or removed from further consideration

Partitioning: A partitioning technique can be used that requires just two database scans to mine the frequent itemsets. It consists of two phases. In Phase I, the algorithm subdivides the transactions of D into n nonoverlapping partitions. If the minimum support threshold for transactions in D is min sup, then the minimum support count for a partition is minsup, the number of transactions in that partition. For each partition, all frequent itemsets within the partition are found. These are referred to as local frequent itemsets. In Phase II, a second scan of D is conducted in which the actual support of each candidate is assessed in order to determine the global frequent itemsets. Partition size and the number of partitions are set so that each partition can fit into main memory and therefore be read only once in each phase.

Sampling: The basic idea of the sampling approach is to pick a random sample S of the given data D, and then search for frequent itemsets in S instead of D. In this way, we trade off some degree of accuracy gainst efficiency. The sample size of S is such that the search for frequent itemsets in S can be done in main memory, and so only one scan of the transactions in S is required overall.

Dynamic itemset counting: A dynamic itemset counting technique [1] was proposed in which the database is partitioned into blocks marked by start points. In this variation, new candidate itemsets can be added at any start point, unlike in Apriori, which determines new candidate itemsets only immediately before each complete database scan. The technique is dynamic in that it estimates the support of all of the itemsets that have been counted so far, adding new candidate itemsets if all of their subsets are estimated to be frequent.

However, it seems that even with the above mentioned approaches, performance of Apriori decreases substantially when large number of transactions is involved. Therefore, many parallel approaches have been conducted to increase the speed of the Apriori algorithm including:

1. Partitioning the transactions into N partitions where N is the number of processors then for each iteration computes local support then exchange results with all N-1 processors to generate the next superset.[2]

2. Use a master/slave scheme by selecting one processor to be the master and N-1 processors to be working processors. In this approach, the input transactions are not partitioned but the master processor generates unique itemsets and divides the list of itemsets equally to the N-1 processors to compute their support. Once computations of supports complete, all N-1 processors send their results to the master in order to compute the next superset and divide the work further.[3]

In this paper, a variation of the 2nd approach is studied and implemented, where an additional data structure is used to predict itemsets of the next iteration and cache them for later retrieval thus reducing database scans. The 2nd approach is selected because it provides high flexibility with the way itemsets get computed between different cores and mainly because of the similarity of the master-slave architecture between this method and the proposed one.


Our approach focuses on the algorithm part of Apriori by trying to maximize the workload being executed in parallel and to minimize, as much as possible, the synchronization point delays by allowing a portion of the superset generation forming the candidates to be generated in parallel as well. Only the remaining part of the superset will be generated at the synchronization point.

The proposed algorithm is described in the following steps:

1.     Select one processor to be the master, the other N-1 processors are slaves

2.     Create a local Hash Table, at the master; we refer to it as G during the remaining part of this paper.

3.     Initialize K, the itemset size, to 1

4.     Generate the 1-itemsets normally using the master processor only

5.     At the master, divide the k-itemsets into N-1 groups and assign each group to exactly 1 of the available N-1 processors. Then for each processor:

a.      Lookup the support of each local itemset in G, if it’s not found, compute support.

b.     Prune itemsets that have their support less than the minimum support where minimum support is the threshold value to decide upon whether to drop or keep an itemset. It is also global to the entire system.

c.      Generate the candidates of the K+1 itemsets from the local items and find their support.

d.     Store the results of the K+1 itemsets (the itemset and its support) into G.

e.      Send the master processor a signal indicating end of processing.

6.      At the master processor, at the end of each arrival of a finalization signal:

a.      Generate the K+1 superset using itemsets in G.

7.     Increment K by 1.

8.     Go to step 2 and repeat until there are no more itemsets to generate.


The goal of this experiment was to study the performance of parallelizing Apriori on CPU for large data sets. Iris database was used to train the system which is “perhaps the best known database to be found in the pattern recognition literature” [7]. The data set contains three classes of fifty instances each, where each class refers to a type of iris plant. Five attributes are present in this database which are Sepal Length, Sepal Width, Petal Length, Petal Width, and the class label attribute which can contain three values: “Iris-setosa”, “Iris-virginica” and “Iris-versicolour”.  These datasets were preprocessed before running the algorithm by building new data structures so that they can fit in memory. However, given the small number of records in the Iris database, the experiment would not reflect solid results, thus all the 50 records were cloned and randomly appended 1000 times on to a new larger Iris database of 50,000 total records. The experiment was implemented on an Intel 8-core machine and the obtained results were recorded were compared to performance statistics of state-of-the art Apriori. By applying the above mentioned parallel procedures, the CPU program was able to compete with the state-of-the art Apriori. Results showed that the proposed method is as fast as state-of-the art method but is better regarding BUS utilization where 75% bus utilization was incurred by state-of-the art apriori and 34% bus utilization was incurred by our proposed method. This greatly reduced total power consumption of the algorithm.


In this paper parallel Apriori algorithm was implemented by applying a new approach using intermediate data structure for computing and caching k+1 superset for use in future iterations to facilitate computation of the superset in parallel and increase degree of parallelism. The implementation of the parallel technique showed competing results with state-of-the art apriori in speed but outperformed it in bus utilization by 40% the running time of the algorithm which would make the algorithm efficient for energy consumption than the current best Apriori.