Multiprocessing


R. Perry, 10 Aug. 2018

Examples using the python3 multiprocessing module and the mpi4py MPI (Message Passing Interface) package:

pinHash.py uses multiple processes on one processor node to search for a PIN corresponding to a hash.
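
A minimal sketch of how such a search might be organized with a multiprocessing Pool is shown below; the SHA-256 digest, 7-digit space, and per-CPU chunking are illustrative assumptions, not necessarily what pinHash.py actually does:

  # Sketch of a parallel exhaustive PIN search (illustrative only).
  import hashlib
  from multiprocessing import Pool

  DIGITS = 7                                        # search space is 10**DIGITS PINs
  TARGET = hashlib.sha256(b"1234567").hexdigest()   # example target hash

  def search(chunk):
      """Hash every PIN in [start, stop) and return the match, if any."""
      start, stop = chunk
      for n in range(start, stop):
          pin = str(n).zfill(DIGITS)
          if hashlib.sha256(pin.encode()).hexdigest() == TARGET:
              return pin
      return None

  if __name__ == "__main__":
      ncpu = 16
      space = 10**DIGITS
      step = space // ncpu
      chunks = [(i, min(i + step, space)) for i in range(0, space, step)]
      with Pool(ncpu) as pool:
          for result in pool.imap_unordered(search, chunks):
              if result is not None:
                  print("found:", result)
                  break      # leaving the with-block terminates the remaining workers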

pinHashMPI.py performs the search using multiple processes on multiple nodes.
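
A comparable mpi4py sketch, assuming the PIN space is split evenly across ranks and rank 0 gathers the results (the actual pinHashMPI.py may divide the leader and worker roles differently):

  # Sketch of the MPI version: each rank searches its own slice of the
  # PIN space and rank 0 collects the results (illustrative only).
  import hashlib
  from mpi4py import MPI

  DIGITS = 7
  TARGET = hashlib.sha256(b"1234567").hexdigest()   # example target hash

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()
  size = comm.Get_size()

  space = 10**DIGITS
  step = (space + size - 1) // size                 # slice size per rank
  start, stop = rank * step, min((rank + 1) * step, space)

  found = None
  for n in range(start, stop):
      pin = str(n).zfill(DIGITS)
      if hashlib.sha256(pin.encode()).hexdigest() == TARGET:
          found = pin
          break

  results = comm.gather(found, root=0)              # leader collects one result per rank
  if rank == 0:
      matches = [r for r in results if r is not None]
      print("found:", matches[0] if matches else "none")

A script like this would be launched with something like mpiexec -n 16 python3 pinhash_mpi_sketch.py (the script name here is hypothetical), with MPI placing the processes across the available nodes.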

Example hashes can be generated using genHash.py.
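
A test hash can be produced with a few lines like the following (SHA-256 is an assumption; genHash.py may use a different digest and interface):

  # Sketch of generating a test hash from a PIN given on the command line
  # (SHA-256 assumed; illustrative only).
  import hashlib, sys

  pin = sys.argv[1] if len(sys.argv) > 1 else "1234567"
  print(hashlib.sha256(pin.encode()).hexdigest())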

pinHash simulates a scenario where a cryptographic hash of the user password or PIN (Personal Identification Number) is stored on a smartcard or server. An attacker who steals the hash cannot reverse it to recover the PIN, but can instead hash all possible PINs in an exhaustive search until one matches.

Tests were run using compute nodes with 16 Intel Xeon 3.2 GHz cores, performing an exhaustive search over a space of 10^#digits possible PINs. Elapsed times are shown in seconds.

In these runs hyper-threading was disabled, so #CPUs is the number of physical cores used:

                      no MPI           with MPI
            #nodes  1        1        1        4
  #digits   #CPUs   1       16       16       64
  -------        ----     ----     ----     ----
     7           13.4     0.95     1.12     0.34
     8            134     9.15       10     2.53
     9              -       91      102       26

For each increase in #digits by 1, the run time increases by about a factor of 10, as expected. With no MPI, using 16 CPUs provides a speedup of about 14 vs. 1 CPU. Using MPI with 1 node and 16 CPUs adds some overhead and is about 10% slower. Using MPI with 4 nodes and 64 CPUs provides a speedup of about a factor of 4 compared to 16 CPUs. For MPI, #nodes is the number of worker nodes, one of which also served as the leader node, distributing the search-space parameters and collecting the results.
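
For example, from the table: 13.4 / 0.95 ≈ 14 (7 digits, no MPI, 16 vs. 1 CPU), and 102 / 26 ≈ 3.9 (9 digits, with MPI, 64 vs. 16 CPUs).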

With hyper-threading enabled, the number of CPUs used appears to be doubled, but the performance is only slightly better:

                      no MPI           with MPI
            #nodes  1        1        1        4
  #digits   #CPUs   1       32       32      128
  -------        ----     ----     ----     ----
     7           13.4     0.97     1.03     0.29
     8            136     9.17     9.32     2.37
     9              -       90       90       23

In this case hyper-threading seems to help by eliminating the overhead cost of using MPI. But the application tested here is highly parallel, with very little interprocess communication, so the result may not be typical.

Raw run logs: with and without hyper-threading.