**By Daniel Lemire, University of Quebec**

Our processors benefit from “SIMD” instructions. These instructions can operate on several values at once, thus greatly accelerating some algorithms. Earlier, I reported that you can multiply the speed of common (fast) random number generators such as PCG and xorshift128+ by a factor of three or four by vectorizing them using SIMD instructions.

A reader challenged me: is this actually useful in practice?

There are problems where generating random numbers is critical to performance. That is the case in many simulations, for example. The simplest and best-known example is probably the random shuffling of arrays. The standard algorithm is quite simple:

    for (i = size; i > 1; i--) {
      var j = random_number(i); // random index in [0, i)
      switch_values(array, i - 1, j); // zero-based: the last unshuffled slot is i - 1
    }
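The loop above can be written out as a minimal C sketch, assuming zero-based arrays of 32-bit values. The generator follows the widely published minimal PCG32 routine; the names `pcg32_t`, `pcg32_next`, `pcg32_seed`, `pcg32_bounded`, and `fisher_yates_shuffle` are illustrative choices and are not taken from the code behind the benchmark. The bounded step uses a simple rejection loop so that every index in [0, i) is equally likely.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative PCG32 state: a 64-bit state and an odd increment. */
typedef struct { uint64_t state; uint64_t inc; } pcg32_t;

/* One step of the minimal PCG32 generator: returns 32 random bits. */
static uint32_t pcg32_next(pcg32_t *rng) {
    uint64_t old = rng->state;
    rng->state = old * 6364136223846793005ULL + rng->inc;
    uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
    uint32_t rot = (uint32_t)(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((0u - rot) & 31u));
}

/* Seed the generator; initseq selects the stream (the increment is forced odd). */
static void pcg32_seed(pcg32_t *rng, uint64_t initstate, uint64_t initseq) {
    rng->state = 0u;
    rng->inc = (initseq << 1u) | 1u;
    pcg32_next(rng);
    rng->state += initstate;
    pcg32_next(rng);
}

/* Unbiased random integer in [0, bound), using rejection to avoid modulo bias. */
static uint32_t pcg32_bounded(pcg32_t *rng, uint32_t bound) {
    assert(bound > 0);
    uint32_t threshold = (0u - bound) % bound; /* 2^32 mod bound */
    for (;;) {
        uint32_t r = pcg32_next(rng);
        if (r >= threshold) return r % bound;
    }
}

/* Shuffle: walk from the end, swapping slot i - 1 with a random slot j < i. */
static void fisher_yates_shuffle(uint32_t *array, uint32_t size, pcg32_t *rng) {
    for (uint32_t i = size; i > 1; i--) {
        uint32_t j = pcg32_bounded(rng, i);
        uint32_t tmp = array[i - 1];
        array[i - 1] = array[j];
        array[j] = tmp;
    }
}
```

A caller would seed a `pcg32_t` once with `pcg32_seed` and then pass it to `fisher_yates_shuffle` along with the array and its length. Each entry costs one bounded random number, which is why the speed of the generator matters so much here.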

If you are interested, O’Neill wrote a whole blog post on this specific problem.

So can I accelerate the shuffling of an array using SIMD instructions?

I threw together a vectorized shuffle and a regular (scalar) shuffle, both using O’Neill’s PCG32. The net result? I almost double the speed using SIMD instructions when the array is in cache:

| Method | Cost |
|---|---|
| SIMD shuffle | 3.5 cycles per entry |
| scalar shuffle | 6.6 cycles per entry |

My code is available. I do not do anything sophisticated, so I expect it is possible to do a lot better. My sole goal was to demonstrate that SIMD random number generators are practical.
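To give a sense of how a shuffle might consume random numbers in batches, here is a portable C sketch. It is not the benchmarked code: it uses no intrinsics, and every name in it (`LANES`, `pcg32x8_t`, `pcg32x8_next`, `pcg32x8_seed`, `batched_shuffle`) is hypothetical. It assumes eight independent PCG32 streams stored lane by lane, so that a vectorizing compiler or hand-written SIMD code could advance them together, and it maps each 32-bit value to an index with a multiply-and-shift, which is slightly biased unless a rejection step is added.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* hypothetical batch width, e.g. eight 32-bit outputs per call */

/* Eight independent PCG32 streams, laid out so the lanes can be updated together. */
typedef struct {
    uint64_t state[LANES];
    uint64_t inc[LANES];
} pcg32x8_t;

/* Advance every lane once and write LANES random 32-bit values to out. */
static void pcg32x8_next(pcg32x8_t *rng, uint32_t out[LANES]) {
    for (int k = 0; k < LANES; k++) {
        uint64_t old = rng->state[k];
        rng->state[k] = old * 6364136223846793005ULL + rng->inc[k];
        uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
        uint32_t rot = (uint32_t)(old >> 59u);
        out[k] = (xorshifted >> rot) | (xorshifted << ((0u - rot) & 31u));
    }
}

/* Seed all lanes from one value; each lane gets a distinct odd increment. */
static void pcg32x8_seed(pcg32x8_t *rng, uint64_t seed) {
    uint32_t discard[LANES];
    for (int k = 0; k < LANES; k++) {
        rng->state[k] = 0u;
        rng->inc[k] = ((uint64_t)k << 1u) | 1u;
    }
    pcg32x8_next(rng, discard);
    for (int k = 0; k < LANES; k++) rng->state[k] += seed;
    pcg32x8_next(rng, discard);
}

/* Shuffle that draws LANES random values per generator call.
   Assumes size fits in 32 bits so the multiply-and-shift stays in range. */
static void batched_shuffle(uint32_t *array, size_t size, pcg32x8_t *rng) {
    uint32_t r[LANES];
    size_t i = size;
    while (i > 1) {
        pcg32x8_next(rng, r);
        for (int k = 0; k < LANES && i > 1; k++, i--) {
            /* Map r[k] to [0, i): (r * i) / 2^32 is always below i. */
            size_t j = (size_t)(((uint64_t)r[k] * (uint64_t)i) >> 32);
            uint32_t tmp = array[i - 1];
            array[i - 1] = array[j];
            array[j] = tmp;
        }
    }
}
```

The swaps themselves remain sequential in this sketch; only the random number generation is batched, and that is the part a SIMD generator would speed up.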

**Bio: Daniel Lemire** is a computer science professor at the University of Quebec. His research is focused on software performance and indexing. He is a techno-optimist.

Original. Reposted with permission.
