Swap 2 variables without using temporary variable!

Perhaps the oldest technique most of programmers use, whether beginners or professional coders, to swap 2 variables a and b, is to declare a new temporary variable c, copy a to c then assign b to a and finally assign c to b. The following code should look familiar:

        Dim a As Integer = 10
        Dim b As Integer = 20
        Dim c As Integer = a
        a = b
        b = c

This technique is really simple and easy to understand and perhaps this is the main reason of its widespread use. However, there is a minor problem using this technique, which is the allocation of a new variable(c) which incurs more memory usage. Although its just 1 more variable overhead, a total of 4 bytes integer it still represents a major drawback for memory-critical applications or on machines where memory space is too low or limited(such as mobile devices). The first new technique is also simple and has the advantage of not allocating space for a new temporary variable. This method uses addition and subtraction in a clever way in order to keep track of the a and b:

        Dim a As Integer = 10
        Dim b As Integer = 20
        a = a + b
        b = a - b
        a = a - b

Lets analyze this code line by line:

  1. a=10
  2. b=20
  3. a= 10 + 20 = 30
  4. b= 30 – 20 = 10
  5. a= 30 – 10 = 20
  6. Final result: a=20 and b=10

This is really a very nice and smart way of swapping. You can encapsulate it in a function that takes two arguments by reference so that you don’t have to remember this arithmetic steps each time but I will leave this as an exercise to the reader:). Now let’s take a look at the next method which involves multiplication and division instead of addition and subtraction:

        Dim a As Integer = 10
        Dim b As Integer = 20
        a = a * b
        b = a / b
        a = a / b

Again,Lets analyze this code line by line:

  1. a=10
  2. b=20
  3. a= 10 * 20 = 200
  4. b= 200 / 20 = 10
  5. a= 200 / 10 = 20
  6. Final result: a=20 and b=10

I have created a console application and added all the 3 methods then I surrounded each method with a very long loop in order to simulate a compute intensive operation in order to measure the speed of each one using a Stopwatch. The complete code looks like this:

    Sub Main()
        Dim a As Integer = 10
        Dim b As Integer = 20
        Console.WriteLine("Original values")
        Console.WriteLine("a: {0}, b: {1}", a, b)
        Dim s As New Stopwatch
        s.Start()
        For i As Integer = 0 To 10000002
            a = a + b
            b = a - b
            a = a - b
        Next
        s.Stop()
        Console.WriteLine("Swap using addition subtraction")
        Console.WriteLine("a: {0}, b: {1}, Time: {2}", a, b, s.ElapsedMilliseconds)

        a = 10
        b = 20

        s.Restart()
        For i As Integer = 0 To 10000002
            Dim c As Integer = a
            a = b
            b = c
        Next

        s.Stop()
        Console.WriteLine("Swap with temporary variable")
        Console.WriteLine("a: {0}, b: {1}, Time: {2}", a, b, s.ElapsedMilliseconds)

        a = 10
        b = 20

        s.Restart()
        For i As Integer = 0 To 10000002
            a = a * b
            b = a / b
            a = a / b
        Next
        s.Stop()
        Console.WriteLine("Swap using multiplication division")
        Console.WriteLine("a: {0}, b: {1}, Time: {2}", a, b, s.ElapsedMilliseconds)

        Console.ReadKey()
    End Sub

Running this application produced interesting measurements, 1 in particular that I did not expect:

As you can see, the method using a temporary variable outperformed the other two. I was expecting that the multiplication and division method to be the worst because of the division overhead but I thought that the addition and subtraction method will compete, but it looks like using the temporary variable is about twice faster. This can be explained because memory allocation in the .net framework is incredibly fast and the heap manager does some magic in allocation. In fact, memory allocation in .net is as fast as allocating memory in C using the malloc function and it’s even faster sometimes.

Final thoughts:
If you are using the temporary variable method for swapping variables, keep using it:) it’s the right thing to do unless you are a memory geek or developing applications for mobile devices then perhaps you should switch to the addition/subtraction method.

Performance issues in 2D array traversal

So whats so special about traversing a 2D array. From the view of a developer, it’s just a couple of nested loops over the width and height of the array and applying some computation over each element. For example, consider the problem of finding the sum of numbers in a 2D array which involves creating two nested loops to read the value of each number and adding it to a global sum. Another problem that heavily involves using 2D arrays is matrix multiplication using the naive matrix multiplication algorithm which requires nesting of 3 loops. Although faster algorithms were developed for matrix multiplication, such as the Strassen algorithm which is the fastest known algorithm until today. However, in this post, we will focus on computing the sum of a 2D arrays to simplify the problem and show the performance drawbacks that might be incured if the developer is not aware about the programming language used. Without any further due, lets take a look at the following example. As a first step we will generate a random 2D integer array of 13000 rows and 13000 rows, a total of 169000000 integers and filling the array with random values using the Rnd() function:

    Private Function GenerateRandomArray() As Integer(,)
        Dim ar(12999, 12999) As Integer
        For i As Integer = 0 To ar.GetLength(0) - 1
            For j As Integer = 0 To ar.GetLength(1) - 1
                ar(i, j) = Rnd() * 1000
            Next
        Next
        Return ar
    End Function

We proceed to compute the sum of the array traversing over the rows in the outer loop and the columns in the inner loop. The implementation is simple as shown below:

    Private Function RowMajorSum(ByVal ar(,) As Integer) As Long
        Dim sum As Long = 0
        For i As Integer = 0 To ar.GetLength(0) - 1
            For j As Integer = 0 To ar.GetLength(1) - 1
                sum += ar(i, j)
            Next
        Next
        Return sum
    End Function

Running this function on my personal laptop showed that the sum is computed correctly. I also used a stop watch to record the time it took for the function to complete:

        Dim time As New Stopwatch
        time.Start()
        Dim sum1 As Long = RowMajorSum(ar)
        time.Stop()
        Label1.Text = "Time:" & time.ElapsedMilliseconds & ", Sum:" & sum1

Label1 displayed the following:

Time:2109, Sum:84513231237

We will now create a second function which does the same as the first one, but now the order of traversal will be reversed so that the outer loop iterates over the columns and the inner loop iterated over the rows:

    Private Function ColumnMajorSum(ByVal ar(,) As Integer) As Long
        Dim sum As Long = 0
        For i As Integer = 0 To ar.GetLength(1) - 1
            For j As Integer = 0 To ar.GetLength(0) - 1
                sum += ar(j, i)
            Next
        Next
        Return sum
    End Function

Running this function on the same machine again produces the following:

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Dim time As New Stopwatch
        time.Start()
        Dim sum2 As Long = ColumnMajorSum(ar)
        time.Stop()
        Label1.Text = "Time:" & time.ElapsedMilliseconds & ", Sum:" & sum2
    End Sub
Time:3402, Sum:84513231237

As you can see, the second function took almost twice the time! although the idea of computing the sum is the same. So what really happened under the hoods of this program turns out to be quiet interesting. In fact, because we are using Vb.net, the RowMajor function was faster than the ColumnMajor one. Had we used Fortran or Matlab to do the same thing, the results would be the opposite!
The reason is related to how the numbers are stored in memory. In the .Net CLR, arrays are stored in Row-Major order, that is they are laid contiguously in linear memory. So for example, if we have the following array:

Languages that adopt the Row-Major order method will have the numbers laid out like this:

 1  2  3  4  5  6

While languages that adopt the Column-Major order method will have the same number laid out like this:

1  4  2  5  3  6

So how does this relate to the performance difference we got earlier. In the RowMajorSum method, every time we read an item from the array, the value being read and some neighboring values are retrieved into the cache. In reality, cache line size varies between 64 bytes and 128bytes. On my machine the cache line size is 64 bytes, so 16 integers are retrieved from main memory and cached so that when the next number is retrieved, no trip is required to main memory, instead it will be retrieved from the cache and we know that cache access is much faster than main memory access. So for our example, 13000*13000/16 memory accesses are required to finalize the sum computation. In the ColumnMajorSum method, the cache behaves in the same way, but unfortunately the next number being read will never exist on the cache because the layout of the numbers in memory are contiguously laid out in rows so this method will end up going every time to main memory to read the next value so the required memory accesses in this case is about 13000*13000 .

As a conclusion, I want to highlight on the issue that 2D array traversal is an important issue and choosing which vectors to be on the outer and inner loop is critical. Developers need to take a closer look on the type of application they are working on and have a deeper knowledge of the programming language they are using. In a future post, i will explain some tips and tricks to overcome cache problems and explain how to better take advantage of the cache.

Finding the Majority element in an array

In its simplest terms, the Majority element is the element that occurs the most within a given array of numbers. Here, we mean by “the most” that the number of occurrences of the element must be more than half the size of the array. In mathematical terms, given an unsorted array of integers from 1 to n, where n is the length of the array, an element is a Majority if it occurs at least (n/2) + 1 times.

The Majority element is an important problem in computer science and is used as a basis for more complex algorithms. We also encounter the majority element in our daily life. For example, the ownership of a large company is determined by shareholders having more than half the total share. Another example is the elections in your country where the winner is determined by obtaining more than half the voices.

In this article, we will look at a fast algorithm that runs in linear time, in exactly O(n) + O(n) to find the Majority element. The algorithm was discovered by Rengakrishnan Subramanian in December 2001. His official paper can be found here.

The algorithm approaches the problem in two steps, in the first step we will call the method FindElement that takes an array of integers as an input and returns an element that is “most likely” to be Majority. In the second step, the algorithm will call FindMajority which takes the same array as an input along with the most probable element we determined earlier and will return the element if it turns out to be Majority, or null if its not.

The FindElement algorithm works as follows:

1-declare two variables, we’ll call the first one count and the second one canbeMajority to store the candidate Majority.

2-assume that the first number in the array is Majority and assign its value to canbeMajority.

3-traverse the input array from start to end while applying the following conditions:

  • If the next element in the array is equal to canbeMajority, then count is incremented by 1.
  • If the next element in the array is not equal to canbeMajority, then count is decremented by 1.
  • If at the next element, the count is equal to zero, then count is set to one and this element is considered as canbeMajority

Here is the implementation of FindElement and FindMajority in vb.net:

    Private Function FindElement(ByRef arr() As Integer) As Integer
        Dim count As Integer = 0
        Dim element As Integer = 0
        For i As Integer = 0 To arr.Length - 1
            If count = 0 Then
                count = 1
                element = arr(i)
            ElseIf arr(i) = element Then
                count += 1
            Else
                count -= 1
            End If
        Next
        Return element
    End Function

    Private Function FindMajority(ByRef arr() As Integer) As Integer
        Dim element As Integer = FindElement(arr)
        Dim count As Integer = 0
        For i As Integer = 0 To arr.Length - 1
            If arr(i) = element Then count += 1
        Next
        If count > arr.Length / 2 Then Return element Else Return Nothing
    End Function

I have created a simple application that reads or generates arbitrary array of numbers and find its Majority but I will leave the implementation of the application as an exercise to the reader :). However you can compare your results with mine by looking at the picture below

MajorityElement

MajorityElement