> This is equivalent to (but faster than) ..... loop
# timeit.py
import numpy as np
a = np.ones((1000, 100))
s = 'for i in range(1000):\n np.sum(a[i,:])\n'
%timeit s  # note: this times evaluating the name `s` (a string), not running the loop it contains
# 17.3 ns ± 0.0759 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
%timeit np.apply_along_axis(np.sum, 0, a)  # note: axis=0 sums the 100 columns, while the loop in `s` sums the 1000 rows
# 597 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
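
The two timings above are not directly comparable: `%timeit s` only measures looking up the string `s`, and `axis=0` reduces over columns while the loop reduces over rows. A like-for-like comparison could be sketched as follows, using the standard `timeit` module so it runs outside IPython. The helper names `python_loop`, `along_axis`, and `vectorized` are made up for this illustration, and no timings are quoted because they depend on the machine and NumPy version.

```python
import timeit

import numpy as np

a = np.ones((1000, 100))


def python_loop(arr):
    # Explicit Python loop: one np.sum call per row.
    return np.array([np.sum(arr[i, :]) for i in range(arr.shape[0])])


def along_axis(arr):
    # axis=1 passes each row (a 1-D slice of length 100) to np.sum,
    # which matches what the explicit loop computes.
    return np.apply_along_axis(np.sum, 1, arr)


def vectorized(arr):
    # Built-in reduction over axis 1, included as a baseline.
    return np.sum(arr, axis=1)


# Check that all three variants compute the same row sums before timing them.
assert np.array_equal(python_loop(a), along_axis(a))
assert np.array_equal(python_loop(a), vectorized(a))

for name, fn in [("python loop", python_loop),
                 ("apply_along_axis", along_axis),
                 ("np.sum(axis=1)", vectorized)]:
    # Best of 3 repeats, 10 calls each, reported per call.
    per_call = min(timeit.repeat(lambda: fn(a), number=10, repeat=3)) / 10
    print(f"{name}: {per_call * 1e6:.1f} µs per call")
```

In IPython the same functions can be timed with `%timeit python_loop(a)` and `%timeit along_axis(a)`.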