Performance issues in 2D array traversal

So whats so special about traversing a 2D array. From the view of a developer, it’s just a couple of nested loops over the width and height of the array and applying some computation over each element. For example, consider the problem of finding the sum of numbers in a 2D array which involves creating two nested loops to read the value of each number and adding it to a global sum. Another problem that heavily involves using 2D arrays is matrix multiplication using the naive matrix multiplication algorithm which requires nesting of 3 loops. Although faster algorithms were developed for matrix multiplication, such as the Strassen algorithm which is the fastest known algorithm until today. However, in this post, we will focus on computing the sum of a 2D arrays to simplify the problem and show the performance drawbacks that might be incured if the developer is not aware about the programming language used. Without any further due, lets take a look at the following example. As a first step we will generate a random 2D integer array of 13000 rows and 13000 rows, a total of 169000000 integers and filling the array with random values using the Rnd() function:

    Private Function GenerateRandomArray() As Integer(,)
        Dim ar(12999, 12999) As Integer
        For i As Integer = 0 To ar.GetLength(0) - 1
            For j As Integer = 0 To ar.GetLength(1) - 1
                ar(i, j) = Rnd() * 1000
            Next
        Next
        Return ar
    End Function

We proceed to compute the sum of the array traversing over the rows in the outer loop and the columns in the inner loop. The implementation is simple as shown below:

    Private Function RowMajorSum(ByVal ar(,) As Integer) As Long
        Dim sum As Long = 0
        For i As Integer = 0 To ar.GetLength(0) - 1
            For j As Integer = 0 To ar.GetLength(1) - 1
                sum += ar(i, j)
            Next
        Next
        Return sum
    End Function

Running this function on my personal laptop showed that the sum is computed correctly. I also used a stop watch to record the time it took for the function to complete:

        Dim time As New Stopwatch
        time.Start()
        Dim sum1 As Long = RowMajorSum(ar)
        time.Stop()
        Label1.Text = "Time:" & time.ElapsedMilliseconds & ", Sum:" & sum1

Label1 displayed the following:

Time:2109, Sum:84513231237

We will now create a second function which does the same as the first one, but now the order of traversal will be reversed so that the outer loop iterates over the columns and the inner loop iterated over the rows:

    Private Function ColumnMajorSum(ByVal ar(,) As Integer) As Long
        Dim sum As Long = 0
        For i As Integer = 0 To ar.GetLength(1) - 1
            For j As Integer = 0 To ar.GetLength(0) - 1
                sum += ar(j, i)
            Next
        Next
        Return sum
    End Function

Running this function on the same machine again produces the following:

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Dim time As New Stopwatch
        time.Start()
        Dim sum2 As Long = ColumnMajorSum(ar)
        time.Stop()
        Label1.Text = "Time:" & time.ElapsedMilliseconds & ", Sum:" & sum2
    End Sub
Time:3402, Sum:84513231237

As you can see, the second function took almost twice the time! although the idea of computing the sum is the same. So what really happened under the hoods of this program turns out to be quiet interesting. In fact, because we are using Vb.net, the RowMajor function was faster than the ColumnMajor one. Had we used Fortran or Matlab to do the same thing, the results would be the opposite!
The reason is related to how the numbers are stored in memory. In the .Net CLR, arrays are stored in Row-Major order, that is they are laid contiguously in linear memory. So for example, if we have the following array:

Languages that adopt the Row-Major order method will have the numbers laid out like this:

 1  2  3  4  5  6

While languages that adopt the Column-Major order method will have the same number laid out like this:

1  4  2  5  3  6

So how does this relate to the performance difference we got earlier. In the RowMajorSum method, every time we read an item from the array, the value being read and some neighboring values are retrieved into the cache. In reality, cache line size varies between 64 bytes and 128bytes. On my machine the cache line size is 64 bytes, so 16 integers are retrieved from main memory and cached so that when the next number is retrieved, no trip is required to main memory, instead it will be retrieved from the cache and we know that cache access is much faster than main memory access. So for our example, 13000*13000/16 memory accesses are required to finalize the sum computation. In the ColumnMajorSum method, the cache behaves in the same way, but unfortunately the next number being read will never exist on the cache because the layout of the numbers in memory are contiguously laid out in rows so this method will end up going every time to main memory to read the next value so the required memory accesses in this case is about 13000*13000 .

As a conclusion, I want to highlight on the issue that 2D array traversal is an important issue and choosing which vectors to be on the outer and inner loop is critical. Developers need to take a closer look on the type of application they are working on and have a deeper knowledge of the programming language they are using. In a future post, i will explain some tips and tricks to overcome cache problems and explain how to better take advantage of the cache.

Wolfram Alpha

Wolfram Alpha is a “computational knowledge engine” developed by Stephen Wolfram for the purpose of finding definitive answers to questions and presenting them with maximum clarity. It is an online service that computes the answers to the questions submitted by the users in natural language by using a pre-structured database. Here “compute” word is used rather than “find” because this service doesn’t simply return documents that might contain the answers, like Google does, and it isn’t just a giant database of knowledge, like the Wikipedia. Instead, it understands and then computes answers to certain kinds of questions, like questions that have factual answers such as “What is the location of Timbuktu?” or “How many protons are in a hydrogen atom?” or  “What was the average rainfall in Boston last year?“. It computes answers instead of just looking them up.

Wolfram Alpha was announced in March 2009 by Stephen Wolfram, and was released to the public on May 15, 2009.

Wolfram Alpha perhaps represents what may be a new approach to creating an “intelligent machine”. It’s simpler than top down artificial intelligence and easier than the original vision of Semantic Web. Its goal is to build on the achievements of science to provide a single source that can be relied on by everyone for definite answers to questions.

Algorithm:

Enabling Wolfram Alpha to do its computations required the implementation of tens of thousands of algorithms. Some are as simple as the quadratic formula; others are among the most sophisticated intellectual actions of our time. These algorithms were assembled under the help of mathematica.

Mathematica is a remarkably efficient programming language in which complex algorithms can be implemented. Complex computational processes can be represented using arbitrarily structured symbolic expressions that are easy for those who don’t know programming. As a result, the five million lines of Mathematica code that make up Wolfram Alpha are equivalent to many tens of millions of lines of code in a lower-level language like C or Java.

Mathematica was the starting point for building Wolfram Alpha that made use of a comparable number of its built in algorithms that are considered as some of the most sophisticated ever developed, and then added to it additional algorithms. These algorithms cover wide areas including logical, numerical, graphical, symbolic, and other computations.

Wolfram’s team manually entered, and in some cases automatically pulled in, masses of raw factual data about various fields of knowledge, plus models and algorithms for doing computations with the data. By building all of this in a modular fashion on top of the Mathematica engine, they have built a system that is able to actually synthesize sophisticated computations from simple computations over vast data sets representing real-world knowledge.

Wolfram Alpha would have been much difficult to be made without Mathematica. In fact the easiest way to create Wolfram Alpha without mathematica is to write mathematica, and this has been done in the past 23 years.

How to use it:

Enter to this website http://www.wolframalpha.com then type in the textbox the required question to be solved. If the “Show Steps” button is pressed, then a step by step explanation of how the answer is computed would be shown.

Final thoughts:

There is no doubt that that Wolfram alpha is one of the world’s most successful data mining projects ever developed. Despite winning the world’s best project in 2009, there are still many questions to be answered. There is the question of how it will be able to keep up with all the new knowledge in the world. And there is the question of bias; is there any risk of bias in the answers the system gives because all the knowledge is entered by Wolfram’s team? There is not necessarily one right answer — there are valid alternative perspectives and there are potential biases in the answers one might come up with, depending on the data sources used to compute them.

Wolfram Alpha will continue to grow and will begin to see some interesting new intelligent applications. As of now, Wolfram Alpha contains 10+ trillion pieces of data, 50,000+ types of algorithms and models, and linguistic capabilities for 1000+ domains. Built with Mathematica—which is itself the result of more than 20 years of development at Wolfram ResearchWolfram Alpha‘s core code base now exceeds 5 million lines of symbolic Mathematica code. Running on supercomputer-class compute clusters, Wolfram Alpha makes extensive use of the latest generation of web and parallel computing technologies, including webMathematica and gridMathematica.

References: