Cache Oblivious Transpose: https://wgropp.cs.illinois.edu/courses/cs598-s15/lectures/lecture08.pdf t=1:8; N=2*ones(1,8); N=cumprod(N); for i=1:8, for j=1:8, a(i,j)=pi/asin(sqrt(t(i)/N(j))); end; end look at real(a) exact k = M/a, so if M=2**m then a needs to be a power of 2 for k to be integer t N 2 4 8 16 32 64 128 256 - ----------------------------------------------------------------------------------------- 1 4 6 8.6936 12.433 17.678 25.067 35.497 50.233 2 2 4 6 8.6936 12.433 17.678 25.067 35.497 3 1.7011 3 4.7668 7.0151 10.096 14.395 20.44 28.964 4 1.5211 2 4 6 8.6936 12.433 17.678 25.067 5 1.3972 1.8284 3.4457 5.296 7.7307 11.09 15.791 22.406 6 1.3051 1.7011 3 4.7668 7.0151 10.096 14.395 20.44 7 1.233 1.6016 2.5976 4.3468 6.455 9.3204 13.31 18.911 8 1.1745 1.5211 2 4 6 8.6936 12.433 17.678 t N M k ./count args 1 2 4 1 1 2 1 1 2 8 2 1 3 1 2 2 2 1 1 1 2 2 4 4 1 2 2 2 4 4 2 1 2 1 4 4 8 4 1 3 2 4 8 16 4 1 4 2 8 --- using G**j,j partition: perry@rpl:~/www/qc/count$ for i in 13 14; do time ./old_count 8 $i 8 > /dev/null; done real 2m5.074s user 2m5.041s sys 0m0.020s real 8m31.839s user 8m31.755s sys 0m0.048s using cache efficient transpose: % time ./count 8 13 8 > /dev/null 17.306u 0.037s 0:17.41 99.5% 0+0k 0+0io 0pf+0w % time ./count 8 14 8 > /dev/null 68.798u 0.082s 1:09.03 99.7% 0+0k 0+0io 0pf+0w % time ./count 8 15 8 > 8-15-8.txt 277.920u 0.202s 4:40.16 99.2% 0+0k 0+3424io 0pf+0w % time ./count 8 17 8 > 8-17-8.txt 4395.506u 1.082s 1:13:39.33 99.4% 0+0k 0+13952io 0pf+0w --- Classical Monte Carlo, M trials, q=1-p, count=13*13*(sumx/M) E[sumx/M] = M*p/M = p, E[count] = 169*p Var[sumx/M] = M*p*q/(M*M) = p*q/M, Var[count] = 169*169*p*q/M, stdev[count] = 169*sqrt(p*q/M)