Face Recognition: Image Preprocessing
February 27, 2011 Leave a comment
Human eyes have about 10:1 absolute range from fully adapted dark vision to fully adapted lighting conditions at noon on the equator. We can see about 3X10:1 range of luminance when we are adapted to a normal working range. Due to the limited dynamic ranges of current imaging and display devices, images captured in real world scenes with high dynamic ranges usually exhibit poor visibility of either over exposure or shadows and low contrast, which may make important image features lost or hard to tell by human eyes. Computer vision algorithms also have difficulty processing those images. To cope with this problem, various image processing techniques have been developed. Some of those techniques are spatially-independent methods, like gamma adjustment, logarithmic compression, histogram equalization (HE), and levels/curves methods.
Human face detection plays an important role in applications such as intelligent human computer interface (HCI), biometric identification, and face recognition. The goal of any face detection technique is to identify the face regions within a given image. The reliable detection of faces has been an ongoing research topic for decades. There are several face detection techniques proposed in the literature both in gray scale and color. The appearance based algorithms process gray scale images. A typical color based face detection system on the other hand would first do a skin color region extraction on color images based on either pixel based or a combination of pixels and shape based systems in different color spaces. The next step would in general be region merging followed by classification or application of any appearance-based method to classify the skin color regions into faces and non-faces by converting them into gray scale images.
In photography and computing, a grayscale or grayscale digital image is an image in which the value of each pixel is a single sample, that is, it carries only intensity information. Images of this sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest.
Grayscale images are distinct from one-bit black-and-white images, which in the context of computer imaging are images with only the two colors, black, and white (also called bi-level or binary images). Grayscale images have many shades of gray in between. Grayscale images are also called monochromatic, denoting the absence of any chromatic variation.
Grayscale images are often the result of measuring the intensity of light at each pixel in a single band of the electromagnetic spectrum (e.g. infrared, visible light, ultraviolet, etc.), and in such cases they are monochromatic proper when only a given frequency is captured. But also they can be synthesized from a full color image; see the section about converting to grayscale.
2.1 Numerical representations
The intensity of a pixel is expressed within a given range between a minimum and a maximum, inclusive. This range is represented in an abstract way as a range from 0 (total absence, black) and 1 (total presence, white), with any fractional values in between. This notation is used in academic papers, but it must be noted that this does not define what “black” or “white” is in terms of colorimetry.
Another convention is to employ percentages, so the scale is then from 0% to 100%. This is used for a more intuitive approach, but if only integer values are used, the range encompasses a total of only 101 intensities, which are insufficient to represent a broad gradient of grays. Also, the percentile notation is used in printing to denote how much ink is employed in halftoning, but then the scale is reversed, being 0% the paper white (no ink) and 100% a solid black (full ink).
In computing, although the grayscale can be computed through rational numbers, image pixels are stored in binary, quantized form. Some early grayscale monitors can only show up to sixteen (4-bit) different shades, but today grayscale images (as photographs) intended for visual display (both on screen and printed) are commonly stored with 8 bits per sampled pixel, which allows 256 different intensities (i.e., shades of gray) to be recorded, typically on a non-linear scale. The precision provided by this format is barely sufficient to avoid visible banding artifacts, but very convenient for programming due to a single pixel occupies then a single byte.
Technical uses (e.g. in medical imaging or remote sensing applications) often require more levels, to make full use of the sensor accuracy (typically 10 or 12 bits per sample) and to guard against round off errors in computations. Sixteen bits per sample (65,536 levels) is a convenient choice for such uses, as computers manage 16-bit words efficiently. The TIFF and the PNG (among other) image file formats supports 16-bit grayscale natively, although browsers and many imaging programs tend to ignore the low order 8 bits of each pixel.
No matter what pixel depth is used, the binary representations assume that 0 is black and the maximum value (255 at 8 bpp, 65,535 at 16 bpp, etc.) is white, if not otherwise noted.
2.2 Converting color to grayscale
Conversion of a color image to grayscale is not unique; different weighting of the color channels effectively represents the effect of shooting black-and-white film with different-colored photographic filters on the cameras. A common strategy is to match the luminance of the grayscale image to the luminance of the color image.
To convert any color to a grayscale representation of its luminance, first one must obtain the values of its red, green, and blue (RGB) primaries in linear intensity encoding, by gamma expansion. Then, add together 30% of the red value, 59% of the green value, and 11% of the blue value (these weights depend on the exact choice of the RGB primaries, but are typical). Regardless of the scale employed (0.0 to 1.0, 0 to 255, 0% to 100%, etc.), the resultant number is the desired linear luminance value; it typically needs to be gamma compressed to get back to a conventional grayscale representation.
This is not the method used to obtain the luma in the Y’UV and related color models, used in standard color TV and video systems as PAL and NTSC, as well as in the L*a*b color model. These systems directly compute a gamma-compressed luma as a linear combination of gamma-compressed primary intensities, rather than use linearization via gamma expansion and compression.
To convert a gray intensity value to RGB, simply set all the three primary color components red, green and blue to the gray value, correcting to a different gamma if necessary.
3. Histogram equalization
A histogram can represent any number of things, since its sole purpose is to graphically summarize the distribution of a single-variable set of data1. Each specific use targets some different features of histogram graphs, and when it boils down to image analysis, histograms are the de facto standard.
When viewing an image represented by a histogram, what we’re really doing is analyzing the number of pixels (vertically) with a certain frequency (horizontally). In essence, an equalized image is represented by an equalized histogram where the number of pixels is spread evenly over the available frequencies.
An overexposed image is defined as an image in which there are an excessive number of pixels with a high pixel frequency, while there is a shortage of low frequencies (Figure 3.2). The data in a histogram representing an overexposed image therefore is not spread evenly over the horizontal axis, instead skewing non-symmetrically to the absolute right edge of the graph (Figure 3.3).
Usually, when the number of pixels with very high pixel frequencies is so high – as shown in the example – it means that some image data has been lost; it is then impossible to restore detail in areas where the pixel frequencies have been cropped to a maximum value.
The same, of course, goes for underexposed images (Figure 3.4); images where the histogram is skewed non-symmetrically to the absolute left edge of the graph (Figure 3.5).
There are situations where we would want to reveal detail in an image that cannot easily be seen with the naked eye. One of the several techniques to enhance an image in such a manner is histogram equalization, which is commonly used to compare images made in entirely different circumstances. Extreme examples may include comparisons of photographs with different exposures, lighting angles and shadow casts3.
The concept of histogram equalization is to spread otherwise cluttered frequencies more evenly over the length of the histogram. Frequencies that lie close together will dramatically be stretched out. These respective areas of the image that first had little fluctuation will appear grainy and rigid, thereby revealing otherwise unseen details.
A histogram equalization algorithm will determine the ideal number of times each frequency should appear in the image and, theoretically, re-plot the histogram appropriately. However, because image data isn’t stored in an analogue manner, this is usually not possible. Instead, the image data is stored digitally and limited to n bits of color depth, thus the image cannot be requantified to meet our requirement.
Nevertheless, by ensuring that the number of times a frequency in a certain range remains as close to the ideally equalized histogram as possible, we can work around the issue of requantification. The solution is simple and elegant.
The ideal number of pixels per frequency i is the total number of pixels in the image divided by the total number of possible image frequencies N. The algorithm counts the frequencies from 0 to N and shifts as many pixel frequencies into that position as long as this number of pixels is less than or equal to a certain delimiter that increases linearly to
the frequency. If a pixel frequency doesn’t fit, it is pushed to the right along the horizontal axis until a place is found.
In the simplest scenario, histogram equalization is used on grayscale images. However, by converting a RGB image to HSV, we can equalize the Value channel without altering the Hue or Saturation. After converting the result back to RGB, a properly equalized image is produced (Figure 3.6 and Figure 3.7).
4. Dynamic Range Compression
We have used a sigmoid transfer function for increasing the dynamic range of an image. A hyperbolic tangent function is used for the reason of overcoming the natural loss in perceived lightness contrast that results when performing dynamic range compression. We have developed an enhancement strategy that will perform the range compression while maintaining the image details. The proposed solution is to develop the hyperbolic tangent functions that are tunable based on the statistical characteristics of the image. That is, the function will enhance the dark part of the image while preserving the light part of the image based on:
Where x, y the V component pixel is value in HSV color space and 0 ≤ I x,y ≤ 255 at (x, y) location of the image, ρ is the statistics of the image, and Ix,y is the enhanced pixel value which is normalized. The parameter ρ controls the curvature of the hyperbolic tangent function. This means that when the processing image is dark, ρ should be small and therefore the curvature of the hyperbolic tangent function will be steep and this will help the darker pixels to have brighter values. ρ can be expressed as:
Where x, y is the local mean of an image and k is the bias pixel intensity value. The local mean of each pixel is calculated based on the center surrounded property (k = 3) of a perceptual field and perceptual processes of human vision. The form of the surround function we used is Gaussian because it provides good dynamic range compression over a wide range of environments. Consequently, the local mean of the image is calculated by:
Where sis the standard deviation of the Gaussian distribution and c is selected so that òò x, y dxdy=1.The choice of spresents a tradeoff between the dynamic range compression and color rendition of the image
A smaller s will yield larger dynamic range compression but causes the image to lose its color. Conversely, a larger s will yield better color rendition but the shadow of the image will remain constant. Fig. 4.1 illustrates the variability of the hyperbolic tangent function based on Eq. (1) to (3). The output intensity range is converted to [0 255]. It can be observed that when the local mean of an image is small, the hyperbolic tangent function reshapes its curve towards the brighter pixel value to facilitate the rescaling of the range of the dark pixel to the brighter region. Conversely, when the local mean of an image is large, the hyperbolic tangent function compresses the brighter pixels to the darker region.
5. Contrast Enhancement
The dark shadows in images can be brightened while the local intensity contrast will be degraded using Eq. (1) – (3) because the nonlinear dynamic range compression decreases the intensity variation when darker pixels are brightened more with a larger ‘accelerate factor’ than those of lighter pixels. Fig. 5.1 illustrates the d
egradation of image contrast compared to that of original images (e.g. the clouds and the sky) due to the dynamic range compression. In order to improve the visual quality of images produced by the dynamic range compression, a contrast enhancement method is used to enhance the local contrast of these images. Therefore, after dynamic range compression and contrast enhancement, the visual quality of the original images with shadows created by high dynamic range scenes can be largely improved. In addition, enhancing the local contrast can also be useful for improving the performance of convolutional face finder, which is sensitive to local intensity variation (e.g. first and second derivative image information).
In the proposed contrast enhancement algorithm, the local intensity variation Iv is defined as in:
Where Ix, y and Iavg are the intensity enhanced image and its low-pass version, respectively. Iavg is computed using 2D convolution with a Gaussian kernel in Eq. (3) in which 5≤ s≤10 by experiments. Iv, the difference between Ix, y and Iavg, can be either positive or negative, which accordingly represents a brighter or darker pixel compared with its neighbor pixels. The magnitude of Iv determines the local contrast of an image: larger magnitude indicates higher contrast and vice versus. Therefore, increasing the magnitude of Iv is an effective way to enhance local image contrast. The proposed contrast enhancement technique increases the magnitude of Iv using the power law operation as described in:
Β is tunable for adjusting the image contrast and β< 1. Where β has a default value of 0.75. Since β can be either an odd or even number, |Iv| instead of Iv is used to keep the result of Eq.
(5) Positive. Eq. (5) can increase low contrast (small |Iv|) while preserving high contrast (large |Iv|) because 0 ≤|Iv|≤1 Based on the result of |Iv,EN| and the sign of Iv, the enhanced local intensity variation Iv,EN can be obtained by restoring the sign no matter β is odd or even:
where sign(.) is defined as:
Finally, the intensity image (Ic,EN) with enhanced local contrast can be achieved by adding Iv,EN to Iavg as in:
Where the maximum of (Iv,EN + Iavg) is used to normalize (Iv,EN + Iavg) because (Iv,EN + Iavg) can be larger than 1.
The local contrast enhancement process can be illustrated using the images shown in Fig. 5.2. The luminance enhanced intensity image (Ix, y) and its local averaging result (Iavg) are shown in Fig. 5.1(b) and 5.1(a), respectively. Fig. 5.2(b) shows the magnitude image of |Iv|, the ‘bright’ regions represent those pixels which are either brighter or darker tha its neighboring pixels in the luminance enhanced intensity image (Ix, y). |Iv,EN|, the enhanced result of |Iv|, is shown in Fig. 5.2(c) where the edges (or features) are more obvious than those in Fig. 5.2(b), which indicates larger intensity variation compared to that represented by |Iv|. The final result of the local contrast enhancement is presented in Fig. 5.2(d) where image details are improved greatly due to the contrast enhancement algorithm defined by Eq.(4) – (8).
6. Color Remapping
For color images, a linear color remapping process based on the chromatic information of the original image is applied to the enhanced intensity image to recover the RGB color bands (r’, g’, b’) as in:
Where τ is the V component pixel value of HSV color space, which essentially is the maximum value among the original r, g and b values at each pixel location. In this case, the ratio of the original r, g and b can be maintained by applying the above linear color remapping. Hence, the color information of hue and saturation in the original image is preserved in the enhanced color image. One example of color image enhancement is presented in Fig. 6.1. The color consistency can be observed between the original image and enhanced image. Observe also that the local contrast of the enhanced image is capable of bringing out the fine details from the image.