The test program (convBench) performs the following:
1. Generates 5000 input samples (sampling period of 0.001 s, i.e. a 1 kHz sampling rate) from a fixed function: exp(-0.3*t)*sin(3*t+3/4)
2. Performs linear convolution of this with a 21-tap lowpass equiripple FIR filter (cutoff at 150 Hz)
3. Performs circular convolution using the discrete Fourier transform
4. Calculates the MSE between the results of both types of convolution.
The code was deliberately written to be general, unoptimised, and readable (in the sense that it follows the mathematical equations).
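Times in Java are measured in milliseconds using System.currentTimeMillis(), which excludes JVM startup. Times for C are measured using the 'time' command in Linux, which includes executable startup. In the 'time' output, the first three fields are user CPU seconds, system CPU seconds, and elapsed wall-clock time.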
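If you wanted the C numbers to exclude executable startup the same way the Java ones do, one option is to time inside the program. The sketch below is only an illustration of that idea, not what was done for the results here: now_ms, workload and time_workload are hypothetical names, workload is a stand-in for the real benchmark body, and it assumes a POSIX system where clock_gettime with CLOCK_MONOTONIC is available.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Current reading of a monotonic clock, in milliseconds. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Hypothetical stand-in for the benchmark body: sums 1/i for i = 1..n. */
static double workload(int n) {
    volatile double acc = 0.0;  /* volatile so the loop is not optimised away */
    for (int i = 1; i <= n; i++) acc += 1.0 / i;
    return acc;
}

/* Runs workload(n) once, stores its result, returns elapsed milliseconds. */
static double time_workload(int n, double *result) {
    double t0 = now_ms();
    *result = workload(n);
    return now_ms() - t0;
}
```

The returned value can be printed in the same style as the Java output, e.g. printf("Time elapsed: %.0f\n", ms). On older glibc versions you may need to link with -lrt.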
Three trials were performed.
Results
IcedTea JVM:
MSE = 2.4512546428693822E-14
Time elapsed: 32893
MSE = 2.4512546428693822E-14
Time elapsed: 32792
MSE = 2.4512546428693822E-14
Time elapsed: 32875
Sun Hotspot (1.5.0):
MSE = 2.4512546427320803E-14
Time elapsed: 30883
MSE = 2.4512546427320803E-14
Time elapsed: 31142
MSE = 2.4512546427320803E-14
Time elapsed: 30897
IBM JDK:
MSE = 2.4512546428761572E-14
Time elapsed: 52430
MSE = 2.4512546428761572E-14
Time elapsed: 52474
MSE = 2.4512546428761572E-14
Time elapsed: 52637
GCC (no optimisation switches):
MSE = 0.000000
16.717u 0.043s 0:16.77 99.8% 0+0k 0+0io 0pf+0w
MSE = 0.000000
16.651u 0.036s 0:16.69 99.9% 0+0k 0+0io 0pf+0w
MSE = 0.000000
16.682u 0.036s 0:16.77 99.6% 0+0k 0+0io 0pf+0w
GCC (with optimisation -O3):
MSE = 0.000000
11.910u 0.038s 0:12.02 99.3% 0+0k 0+0io 0pf+0w
MSE = 0.000000
11.898u 0.016s 0:11.92 99.8% 0+0k 0+0io 0pf+0w
MSE = 0.000000
11.871u 0.012s 0:11.90 99.8% 0+0k 0+0io 0pf+0w
GCC (with all optimisations I can think of):
% gcc convBench.c -O3 -march=core2 -ffast-math -funroll-loops -malign-double -mmmx -msse -msse2 -mfpmath=387,sse -lm -o convBench
MSE = 0.000000
7.276u 0.017s 0:07.30 99.7% 0+0k 0+0io 0pf+0w
MSE = 0.000000
7.258u 0.015s 0:07.28 99.7% 0+0k 0+0io 0pf+0w
MSE = 0.000000
7.278u 0.012s 0:07.30 99.7% 0+0k 0+0io 0pf+0w
Intel C++ compiler 10.1 (-O0):
MSE = 0.000000
12.310u 0.036s 0:12.35 99.9% 0+0k 0+0io 0pf+0w
MSE = 0.000000
12.318u 0.021s 0:12.38 99.5% 0+0k 0+0io 0pf+0w
MSE = 0.000000
12.333u 0.021s 0:12.36 99.9% 0+0k 0+0io 0pf+0w
Intel C++ compiler 10.1 (-O3):
MSE = 0.000000
6.095u 0.004s 0:06.10 99.8% 0+0k 0+0io 0pf+0w
MSE = 0.000000
6.097u 0.020s 0:06.12 99.8% 0+0k 0+0io 0pf+0w
MSE = 0.000000
6.111u 0.005s 0:06.12 99.8% 0+0k 0+0io 0pf+0w
Intel C++ Compiler 10.1 (all optimisation switches turned on):
% icc convBench.c -O3 -march=core2 -unroll -ip -lm -o convBench
convBench.c(35): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(39): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(39): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(45): (col. 8) remark: LOOP WAS VECTORIZED.
convBench.c(131): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(137): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(140): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(143): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(143): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(144): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(144): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(146): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(151): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(169): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(106): (col. 3) remark: LOOP WAS VECTORIZED.
convBench.c(89): (col. 3) remark: LOOP WAS VECTORIZED.
convBench.c(62): (col. 2) remark: LOOP WAS VECTORIZED.
convBench.c(71): (col. 3) remark: LOOP WAS VECTORIZED.
MSE = 0.000000
2.528u 0.007s 0:02.54 99.2% 0+0k 0+0io 0pf+0w
MSE = 0.000000
2.528u 0.005s 0:02.59 97.2% 0+0k 0+0io 0pf+0w
MSE = 0.000000
2.525u 0.002s 0:02.56 98.4% 0+0k 0+0io 0pf+0w
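And the winner is...
C by a long shot. The fastest C build (Intel compiler with all switches, about 2.5 s) is roughly 12 times faster than the fastest JVM (Sun Hotspot, about 31 s), and even GCC with no optimisation switches (about 16.7 s) is nearly twice as fast, despite the C times including executable startup.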

