This wiki is no longer maintained because the repository has moved; a new wiki replaces it.

The old wiki is provided below for reference.


Getting CALDGEMM up and running


CALDGEMM requires a GPU based on the AMD Cypress chip. Examples are:
  • Radeon HD 5870
  • Radeon HD 5970
  • FirePro V8800
  • Firestream 9350
Compiling CALDGEMM requires, among other things, a patched GotoBLAS and the headers of the ATI Stream SDK; both are covered in the steps below.

Retrieving the source

Using git you can pull the most recent source from our git repository:

git clone git://

Once you have retrieved the code, you can receive updates by running git pull in the generated directory.


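The build consists of three steps: patching GotoBLAS, installing the Stream SDK and building CALDGEMM itself. The first two steps should only be required once. The last step is the actual build of CALDGEMM and needs to be repeated after every update.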

Patch GotoBLAS

The CALDGEMM code contains a patch for GotoBLAS that is not yet included upstream, so you need to build a patched version of GotoBLAS. By default CALDGEMM expects to find GotoBLAS in a neighboring directory, so you might want to unpack GotoBLAS there. Afterwards, apply the patch to GotoBLAS and build it. In addition, CALDGEMM manages the memory policy itself, and GotoBLAS should not tamper with it; compile GotoBLAS with NO_MEMPOLICY=1 to ensure this.

tar -xzf GotoBLAS2-1.13.tar.gz
patch -p0 < caldgemm/gotoblas_patch/gotoblas.patch
cd GotoBLAS2
make NO_MEMPOLICY=1 -j
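
Before patching, it can help to confirm that the directory layout matches what the commands above expect. The following is a minimal sketch, not part of CALDGEMM: check_layout is a hypothetical helper, and it assumes the directory names caldgemm and GotoBLAS2 used in the commands above.

```shell
# Hypothetical helper: verify that caldgemm and GotoBLAS2 sit side by side
# under the given parent directory and that the GotoBLAS patch file exists.
# The directory names are taken from the commands above; adjust if yours differ.
check_layout() {
    parent="${1:-.}"
    for d in caldgemm GotoBLAS2; do
        if [ ! -d "$parent/$d" ]; then
            echo "missing directory: $parent/$d" >&2
            return 1
        fi
    done
    if [ ! -f "$parent/caldgemm/gotoblas_patch/gotoblas.patch" ]; then
        echo "patch file not found under $parent/caldgemm" >&2
        return 1
    fi
    echo "layout ok"
}
```

Run check_layout with the directory that contains both checkouts as its argument; it prints "layout ok" on success and returns a non-zero status otherwise.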

Install the Stream SDK

Simply unpack the Stream SDK to a convenient location. CALDGEMM will only need the headers provided by the SDK. You will need to make the environment variable ATISTREAMSDKROOT point to the unpacked Stream SDK. You might want to add export ATISTREAMSDKROOT=/path/to/ati-stream-sdk to your .bashrc or other shell setup file to avoid having to redo this step for every build.
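
A quick pre-build check of the variable might look like the sketch below. check_stream_sdk is a hypothetical helper, and the include subdirectory it looks for is an assumption about the SDK layout, so verify it against your unpacked copy.

```shell
# Hypothetical pre-build check for the Stream SDK location.
# Assumption: the SDK headers live in an "include" directory below
# $ATISTREAMSDKROOT; verify this against your unpacked SDK.
check_stream_sdk() {
    if [ -z "${ATISTREAMSDKROOT:-}" ]; then
        echo "ATISTREAMSDKROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$ATISTREAMSDKROOT/include" ]; then
        echo "no include directory under $ATISTREAMSDKROOT" >&2
        return 1
    fi
    echo "Using Stream SDK at $ATISTREAMSDKROOT"
}
```

Calling check_stream_sdk before make gives an early, readable error instead of a missing-header failure in the middle of the build.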


By default CALDGEMM uses some AMD-specific CPU operations. If you are on an Intel CPU, you have to make two modifications to the makefile:
  1. Add -D_NO_AMD_CPU to CXX_OPTS
  2. Change -march=barcelona to -march=native or some Intel architecture like core2.

Then simply run make.
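
The two Intel modifications can also be scripted. The sketch below is only an illustration: patch_makefile_for_intel is a hypothetical helper, and it assumes that CXX_OPTS is assigned on a single line beginning with CXX_OPTS (no backslash continuation) and that -march=barcelona appears literally in the makefile. Check the result by hand before building.

```shell
# Hypothetical helper: adapt the CALDGEMM makefile for an Intel CPU.
# Assumptions: CXX_OPTS is set on one line starting with "CXX_OPTS",
# and the AMD setting appears literally as -march=barcelona.
patch_makefile_for_intel() {
    mf="$1"
    # Do nothing if the define is already present.
    if grep -q -- '-D_NO_AMD_CPU' "$mf"; then
        echo "already patched: $mf"
        return 0
    fi
    # Keep a backup, then rewrite the makefile from it.
    cp "$mf" "$mf.orig" || return 1
    # 1. Append -D_NO_AMD_CPU to the CXX_OPTS line.
    # 2. Replace -march=barcelona with -march=native.
    sed -e '/^CXX_OPTS/ s/$/ -D_NO_AMD_CPU/' \
        -e 's/-march=barcelona/-march=native/g' \
        "$mf.orig" > "$mf"
}
```

Invoke it as patch_makefile_for_intel makefile in the CALDGEMM directory; the untouched file is kept as makefile.orig so the change is easy to revert.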

If the build fails, make sure that the paths in the makefile to GotoBLAS, the ATI Stream SDK and, if needed, the ATI driver are correct.


The CALDGEMM source contains a binary patch for the Catalyst driver that can improve performance. You might want to apply this patch to your driver installation. In addition, on Intel systems you may see performance improvements by switching off hyper-threading.


CALDGEMM includes a small benchmark utility that lets you benchmark the implementation and your system, as well as stress-test your GPUs. For more details, see source:README.


This part is still to be written. As this is a wiki, feel free to add content. You will find the interface to program against in source:caldgemm.h.

Getting Help

Join us in ##caldgemm on IRC or use the CALDGEMM mailing list if you have questions.


CALDGEMM is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CALDGEMM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.