Cache Simulator

This tool is a very fast and flexible cache simulator, which we developed for internal use and then decided to make available to the general public. We present the answer of the common "how-to-use" question with a simple example of counting cache misses on MMM (Matrix-Matrix Multiply). The file simulate.cpp in the current distribution contains this example:

        #include <iostream>
        #include <sstream>

        using namespace std;

        #include "array2D.h"
        #include "array1D.h"
        #include "cachet.h"

        #define S (2 * 1024)       // cache capacity
        #define L (32)             // cache line size
        #define A (4)              // cache associativity
        #define E (sizeof(double)) // matrix element size
        #define N (100)            // matrix size

        void main ()
        {
                Cache *cache = new CacheT<S, L, A>();
                cache->Invalidate();

                Array2D a(N, N, E, cache);
                Array2D b(N, N, E, cache);
                Array2D c(N, N, E, cache);

                for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                c(i, j, 3) += a(i, k, 1) * b(k, j, 2);

                stringstream f;

                f << N << "\t" << "ijk";

                cout << cache->Statistics(f.str());
        }

Although this source is pretty close to the source of a real MMM, there are several notable differences. First you need to initialize the cache and invalidate it (empty it). Then declaring arrays (matrices) is now “Array2D <name>(<rows>, <cols>, sizeof(<type>), cache)” instead of just “<type> <name>[<rows>][<cols>]”. This permits the cache simulator to track array accesses in the following algorithm. One other difference is the array indexing: instead of just saying “<name>[<row>][<col>]” we use “<name>(<row>, <col>, <label>)”. The label argument is optional, but if specified, it allows the cache simulator to track the origin of all misses (i.e. tie a particular miss to a particular matrix). Finally we print the statistics gathered by the simulator, which in this case are:

        100     ijk     1       0       0       1000000
        100     ijk     1       4       0       250000
        100     ijk     2       0       0       1000000
        100     ijk     2       4       0       1000000
        100     ijk     3       0       0       1000000
        100     ijk     3       4       0       2500
        100     ijk     3       0       1       1000000
        100     ijk     1       1       0       2500
        100     ijk     2       1       0       2500
        100     ijk     3       1       0       2500
        100     ijk     1       2       0       247500
        100     ijk     2       2       0       997500

The first two columns were specified by us as a parameter to the Statistics function of the cache. Next column is the label (as specified as an additional parameter in the array references) – remember that we used 1 for A, 2 for B and 3 for C. If no label is specified in the references, expect to see only 0 here. Next column is the type of miss: 0=none, 1=cold, 2=capacity, 3=collision, 4=miss if the cache was fully associative. In summary, 0 in this column represents number of references; 1, 2, and 3 collectively represent the number of misses given the chosen cache organization; 4 represents the number of misses of a fully associative cache of the same size and same line size. Finally the last but one column represents the access type (0=read, 1=write) and the last column represents the count of references/misses.

The cache simulator simulates a write-allocate cache! Of course you can alternatively use a different cache simulator. We recommend using this one, as it is rather short, you have the whole implementation and it is quite fast as it does not generate memory reference traces, but does the simulation in real-time. When doing your experiments, plug-in the L1 data cache parameters that you found in Problem 1 for the cache capacity, line (block) size and associativity.

The simulator can be complied with Visual C++ or GCC (we have tried version 3.2). Remember to enable all optimizations as it makes use of some STL code, which is very slow if optimizations are not used:

        g++ -O3 simulate.cpp

If you C++ library does not have a hash_map header, you might want to try -Dusemap parameter also.

Support

The cache simulator has been designed and developed by Kamen Yotov. Direct your questions, suggestions and concerns to kamen@yotov.org.