High Performance Computing (Boids Part 1)

The next few posts will be about Boids. From the Wikipedia page:

Boids is an artificial life program, developed by Craig Reynolds in 1986, which simulates the flocking behaviour of birds. His paper on this topic was published in 1987 in the proceedings of the ACM SIGGRAPH conference. [1] The name “boid” corresponds to a shortened version of “bird-oid object”, which refers to a bird-like object.[2] Incidentally, “boid” is also a New York Metropolitan dialect pronunciation for “bird”.

The last line gets me every time. It’s a challenge to say “boid” without saying “cawfee”. With a team of 4, I built a migration simulation around the idea of boids. There is an array of alphas and a larger array of regular boids. The goal of the regular boids is to follow the alphas, and the goal of the alphas is to stay removed from all the other boids. To handle scale, my team built our simulation on a distributed system. This is the general process (a rough sketch of the loop follows the list):

  1. Boids broadcast their location
  2. Hosts receive/send information over the wire
  3. Boids perform a function
  4. Loop
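
To make that loop concrete, here is a minimal sketch of what one host does each time step. The helpers broadcast_positions, receive_positions, and step_boid are hypothetical stand-ins for our actual networking and flocking code, not the real implementation:

import time

# Rough per-host loop; the three helpers used here are hypothetical placeholders.
def run_host(local_boids, peers, dt=0.05):
    while True:
        broadcast_positions(local_boids, peers)   # 1. boids broadcast their location
        remote_boids = receive_positions(peers)   # 2. hosts receive/send over the wire
        for boid in local_boids:                  # 3. each boid performs its function
            step_boid(boid, local_boids + remote_boids)
        time.sleep(dt)                            # 4. loop on a fixed time step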

I was curious about one function in particular: the distance calculation. I want to generate 1,000 birds per host, so computing the distance between every pair of birds point-wise would be computationally expensive. Here’s my original Python code:

import math
import random
import time
from itertools import combinations

# Generate 1,000 birds at random integer coordinates on a 100 x 100 grid
random.seed(a=42)
list_birds = []
for i in range(1000):
    x = random.randint(0, 100)
    y = random.randint(0, 100)
    bird = (x, y)
    list_birds.append(bird)

# Euclidean distance between two (x, y) tuples
def distance(a, b):
    x = (a[0] - b[0]) ** 2
    y = (a[1] - b[1]) ** 2
    dist = math.sqrt(x + y)
    return dist

# Time the distance calculation over every pair of birds (~500,000 pairs)
start = time.time()
for combo in combinations(list_birds, 2):
    distance(combo[0], combo[1])
end = time.time()

Traditional Python list with the distance function: 0.962867021560669 seconds. Normally, just under a second for these ~500,000 computations would be pretty good. However, for our real-time simulation, a single time step taking almost a second would be too noticeable. I was curious whether the bottleneck was the distance calculation itself or simply iterating through the list of pairs. So I ran the following:

# Same pairwise loop, but with no work done per pair
start = time.time()
for combo in combinations(list_birds, 2):
    pass
end = time.time()

Traditional Python list with pass instead of the function: 0.03222298622131348 seconds. So it does appear that the actual calculations, not the iteration, are the expensive part. I then tried using NumPy arrays instead of Python lists:

import numpy as np
from scipy.spatial.distance import pdist  # pdist is SciPy's pairwise-distance function

array_birds = np.random.rand(1000, 2)  # 1,000 birds as a 1000 x 2 NumPy array
start = time.time()
pdist(array_birds)  # vectorized distances for all ~500,000 pairs
end = time.time()

NumPy array with SciPy’s vectorized pdist: 0.004333972930908203 seconds. Only 4 milliseconds! For a thousand birds! It’s worth noting that when I used NumPy arrays but still pushed them through my Python distance function, I actually lost about 10 milliseconds of performance. So why did NumPy arrays with a vectorized, compiled function make my code so much faster?
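
For reference, that slower mixed version looked roughly like the following: the same pairwise Python loop, just fed rows of the NumPy array instead of tuples. This is a reconstruction of that experiment rather than the exact code I ran, reusing array_birds and distance from above:

# Reconstruction: NumPy rows pushed through the pure-Python distance function
start = time.time()
for a, b in combinations(array_birds, 2):
    distance(a, b)   # each call indexes NumPy rows from Python, adding per-pair overhead
end = time.time()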

Python is an interpreted language as opposed to a compiled language. The interpreter executes the code one operation at a time, and every one of those operations carries overhead: type checks, object lookups, function calls. In a compiled language, the compiler sees all of the code up front and can translate tight numeric loops into fast machine code, including vectorized instructions, before anything runs. Luckily, CPython itself is written in C (a compiled language), and NumPy’s array operations are implemented in compiled C under the hood. The distance calculations are all independent of each other, so when I hand the whole array to a vectorized routine like pdist, the loop over all ~500,000 pairs happens inside compiled code instead of paying the interpreter’s per-pair overhead. However, if I use NumPy arrays with a plain Python function, I actually lose performance, because every element access now crosses the Python/C boundary and that switching adds overhead. And while Python has built-in threading that makes it great for many things, the global interpreter lock means high performance computing is not one of them.
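
To make “vectorize” concrete, here is one way to get the same pairwise distances without any Python-level loop, using NumPy broadcasting on the array_birds defined above. This is just a sketch of the idea, not code from our simulation, and it builds the full 1000 x 1000 matrix in memory, which pdist avoids by returning only the unique pairs:

# Pairwise Euclidean distances via broadcasting: one NumPy expression, no Python-level loop
diffs = array_birds[:, np.newaxis, :] - array_birds[np.newaxis, :, :]   # shape (1000, 1000, 2)
dist_matrix = np.sqrt((diffs ** 2).sum(axis=-1))                        # shape (1000, 1000)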