Parallel Sudoku Solver Algorithm

Parallel Sudoku Solver Algorithm

Ali Tarhini, Hazem Hajj

Department of Electrical and Computer Engineering

American University of Beirut

Beirut, Lebanon

{aat16, hh63}



Abstract—Sudoku is a complex yet exciting game. It is a 9 by 9 matrix divided to sub grids of size 3. The goal of the game is to fill in all the cells with numbers from 1 to 9 such that no two cells on the same row, column or minigrid have the same number. This problem can be generalized for larger size and the computational complexity to find a solution for Sudoku is high. In this paper, we propose a new parallel algorithm solver for the generalized NxN Sudoku grid that exploits the manycore architecture. Our test results show that the algorithm scales well as the number of cores grow.




Sudoku is a complex yet exciting game. It is a 9 by 9 matrix divided to sub grids of size 3. The goal of the game is to fill in all the cells with numbers from 1 to 9 such that no two cells on the same row, column or minigrid have the same number. This problem can be generalized for larger size and the computational complexity to find a solution for Sudoku is high. A Sudoku puzzle usually comes with a partially filled grid. The aim is to complete the grid in the shortest amount of time.

From a mathematical perspective, it has been proven that the total number of valid Sudoku puzzles is equal to 6, 670, 903, 752, 021,072,936,960 or
approximately 6.671×1021 [1].

Trying to populate all these grids is itself a difficult problem because of the huge number of possibilities and combinations of puzzles, Assuming each solution takes 1 micro second to be found, then with a simple calculation we can determine that it takes 211,532,970,320.3 years to find all possible solutions. If that number was small, say 1000, it would be very easy to write a Soduku solver application that can solve a given puzzle in short amount of time. The program would simply enumerate all the 1000 puzzles and compares the given puzzle with the enumerated ones and we don’t have to design a parallel algorithm for finding Sudoku solutions. Unfortunately, this is not the case since the actual number of possible valid grids is extremely large so it’s not possible to enumerate all the possible solutions. This large number also directly eliminates the possibility of solving the puzzle with brute force technique in a reasonable amount of time. Therefore, a method for solving the puzzle quickly will be derived that takes advantage of manycore architecture on modern and future computer systems which will bring down the time cost of solving an NxN Sudoku to a reasonable amount? Our goal is to allow for the algorithm to take advantage of the manycore architecture to further reduce its running time. This is very important and is at the heart of our work because for large N, the number of cells per row and column in the Sudoku grid, finding a solution becomes an extremely difficult and computationally intensive problem. Hence, the parallel algorithm must also scale as the grid size increases.




Many algorithms have been developed to solve the Sudoku problem. One of these developed algorithms is the simulated annealing algorithm [1]. The Simulated annealing algorithm works by first randomly filling all the blank cells then it checks if the same number is repeated twice in the same row or same column. If there’s a repeated number then it swaps this value with another and repeats checking for repeated values. The simulated annealing algorithm consumes much time to solve the puzzle and it is only applicable to easy Sudoku problems. Another proposed algorithm to solve the Sudoku puzzle follows the linear system approach [6]. The algorithm converts the puzzle into a number of liner equations and tries to solve them. There are many rules that we should keep in mind while transforming the Sudoku puzzle into a system of linear equations. These constraints are:

  1. There shouldn’t be any repeated value in the same row of a Sudoku puzzle.
  2. There shouldn’t be any repeated value in the same column of a Sudoku puzzle.
  3. There shouldn’t be any repeated value in the same 3*3 square of a Sudoku puzzle.
  4. All the numbers should be from 1 to 9.
  5. We must fill all the blank cells.

It is hard to solve the system for hard puzzles. This algorithm is only applicable for relatively easy Sudoku puzzles.

Artificial Bee Colony algorithm was invented recently to solve the Sudoku problem [3]. This algorithm has this name because it imitates the work done by the bees. The bees are divided into two sets, one set performs the real work while the other set supervises their work and receive information from them and based on that information they make decisions. In a similar way this algorithm tries to solve the Sudoku puzzle by following a parallel search approach where multiple blank cells can be filled while other cells would wait to receive information if those chosen cells values fit in that cells, and based on that they fill their values accordingly. Another approach for solving the Sudoku problem is by implementing the SAC algorithm [4]. This algorithm works by first assigning a set of potential values for every blank cell. Each cell has 9 optional values to choose from. The algorithm starts by choosing one of these potential values and assigning them to the cell and proceeds to other cells. If any rule is broken, the algorithm rolls back deletes this value from the set of potential values of the cell and chooses another value from the cell’s optional values. This algorithm follows a parallel approach to speed up solving the Sudoku problem, by allowing two cells to work in parallel for finding their correct values.




Our algorithm follows a depth first search approach with backtracking. However, backtracking on manycore is not an easy task since each core is processing a path independent of the other cores so it’s hard to determine what was the previous step and backtrack to it. We also need to determine the granularity of a task or a work item, that is, the smallest unit of work done by any core because each core will be handling a work item at any point during the running time of the algorithm. Defining the granularity clearly shows whether each work item is dependent on other work items or that it is totally independent. In this case, the work item is not dependant of previous inputs and does not require any kind of inputs from other cores in order to be processed.


Consider a work item to be the operation of setting a value of one cell in an NxN grid. Therefore, the problem space is N2 work items if the cells that initially have values already set are excluded. We will also define the operation LOCK which denotes a locking mechanism for the cell whose value is about to be changed. However, a problem arises when two cores attempt to update two different cells on the same row, on the same column or within the same minigrid. For example, if these two cells had only two possible values, when both cores take a lock over the cells, there is a probability that both cores assign the same value to the cells. Hence, the locking operation is not capable of resolving this issue and needs to be modified. The proposed algorithm works around this problem by defining LOCK as an operation that takes a lock over the entire row, column and minigrid so no other core could lock any cell within the same area. This area is referred to as the conflict boundary of a cell. The shaded area in Figure1 shows the conflict boundary (locked area) for the cell C(2,3) or C(2nd row, 3rd column).


The algorithm starts by splitting the work items equally among available cores. Given N cores and n2 work items, where N is the number of core and n is the width of the square grid, and to distribute the loads equally across the cores, each core is assigned n2/N work items. Each core will determine the set of possible values for each of its cells and store these possible values. Then, each core will sort the work items in ascending order according to the smallest set of possible values for each cell so that the cells that have least possible values will be processed first. This is a good heuristic because most likely there will be cells that have only one possible value and that can be directly set thus reducing the problem space and eliminating unneeded computations. It is also important even if the cell had two possible options because the algorithm will randomly pick one of the two values by random and assign it to the cell. The probability of choosing the wrong value here is 0.5 which is much better than if there were, for example, 4 possible values left for a cell that the algorithm had to choose from. In that case, the probability of choosing a wrong value is ¾ assuming the cell has only one correct answer.


Figure 1 The shaded area represents the locked region which is the conflict boundary required to be locked in order to update cell C(2,3).

Once the work items are assigned, the cores are ready to start processing their work items. A core P will pull a work item from its local queue, check its set of possible value and randomly pick one value and assign it to the cell after taking a lock on the cell’s conflict boundary. If P is unable to take a lock due to another core already locking another cell on the same conflict boundary, the work item is not processed and the work item is enqueued again while another work item is dequeued and processed. This process reduces the idle time for P since it eliminates the need for waiting until the other core releases its lock on the conflict boundary and allows P to process another work item hence maximizing the overall efficiency of the algorithm.


One problem remains when a core tries to set a value for a cell but does not find a value because the cell has no more possible values, meaning that the current values in all the cells in the grid yielded an invalid Sudoku puzzle. The core has to backtrack to a previous state and take another branch. Here, the state of the puzzle is defined as a stable instance of it, that is, a puzzle that could be solved and is not yet determined as unsolvable. In that case, a message is sent to the N-1 cores to stop processing any work items while P retrieves the previous puzzle from its local stack and distributes the new work items of that puzzle to the N-1 cores. Once, all cores receive their work items, P will try to set a value for the cell that previously had no solution. When it sets a value, P sends another message to the N-1 core indicating they may proceed in processing more work items if they are waiting. The algorithm terminates when there are no more work items to be processed.




The proposed algorithm is tested on an 8 core machine with 4 GB of physical memory on a windows based operating system. First the sequential version of the algorithm is run and it utilized, as expected, only one of the available cores. The sequential algorithm was given a partially filled Sudoku grid and the running time took 0.78 seconds for the 9 by 9 Sudoku. Then the parallel version of the proposed algorithm is run on the same puzzle. Monitoring the Task Manager in windows, we noticed the full utilization of all the cores as in the figure shown below. The overall running time of the algorithm took an average of 0.11 seconds of five repeated executions.


Figure 2 Utilization of all available cores by the parallel algorithm.




In this paper, we proposed a new parallel algorithm for solving an NxN Sudoku grid. The algorithm is designed to maximize the utilization of all cores available on the system. Testing and results section that show the efficiency of the algorithm and that it utilized all the available cores on the system. Future work for this algorithm will be testing it for manycore architecture and verifying if it will scale on manycore as it did on multicore.




“An FPGA-Based Sudoku Solver based on Simulated Annealing Methods”. Pavlos Malakonakis, Miltiadis Smerdis, Euripides Sotiriades, Apostolos Dollas International Conference on Field-Programmable Technology, 2009. FPT 2009.


[2] “Linear Systems, Sparse Solutions, and Sudoku”. Prabhu Babu, Kristiaan Pelckmans, Petre Stoica, Fellow, IEEE, and Jian Li, Fellow, IEEE. Signal Processing Letters, IEEE.


[3] “Artificial Bee Colony (ABC), Harmony Search and Bees Algorithms on Numerical Optimization” .D. Karaboga, B. Akay. Erciyes University, The Dept. of Computer Engineering, 38039, Melikgazi, Kayseri, Turkiye.


[4] “Coordinating Data Parallel SAC Programs with S-Net”. Clemens Grelck, Sven-Bodo Scholz, and Alex Shafarenko. Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International


[5] Bertram Felgenhauer, “Enumerating possible Sudoku grids”.


[6] S.F.Bammel, J.Rothstein, The number of 9×9 Latin squares, Discrete Mathematics 11 (1975) 93–95.

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: