IA-32lib
Document Revision 1.01.030609
This is intended to be a short description of the ia32lib library. You can use the links below to jump to the different sections. Use the HOME key on your keyboard to return to the top. If you want to get the big picture without reading too much, I would suggest skipping the Reference section. If you don't feel like reading this at all, scan through the Examples section.
The IA32Lib toolkit has been designed and developed by Kamen Yotov. Direct your questions, suggestions and concerns to kamen@yotov.org.
We will appreciate any possible feedback you might have at every stage (usage, design, source, documentation...). Enjoy!
[ Introduction | Overview | Requirements | Installation | Reference | Examples | Future Work ]
Introduction
Modern processors based on IA-32 have performance counter registers that allow programmers to collect various statistics about their running applications. Programming these counters to count exactly what a programmer wants, and reading their values, requires access to the so-called Model Specific Registers (MSRs) of the processor. There are several instructions in the IA-32 ISA that provide access to these special registers (e.g. RDMSR, WRMSR, RDPMC), but they are either privileged (restricted to kernel-mode, ring-0 execution) or subject to other restrictions which, combined with the security of the operating system, do not allow the application programmer to use them.
The main purpose of the ia32lib library is to export a user-programmable interface to these crucial performance measurement facilities and to provide appropriate ways for detailed processor detection (family, model, cache configuration, ...).
Both the library and its full source code are free for personal use and can be freely downloaded. I have not yet figured out the policy for commercial uses, but who knows...
Overview
ia32lib consists of two main parts plus some examples:
- ia32.sys - a Windows NT/2K/XP Kernel-Mode Driver that provides access to IA-32's MSRs;
- ia32.lib + ia32.h - a static library which provides an easy interface to ia32.sys and some other nice features like CPU model and cache configuration detection;
- ia32detect.cpp and ia32p6.cpp are two examples that use ia32lib. I will pay more attention to them later (See examples).
The sources to build ia32.sys are provided for completeness and educational purposes only. It is not advisable to try rebuilding ia32.sys unless you really know what you are doing. Furthermore, you will need the Microsoft Windows NT Driver Development Kit (freely available from Microsoft's site).
Requirements
- A PC running Microsoft Windows NT / 2K / XP;
- Microsoft Visual C++ 6.0 or Microsoft Visual Studio.NET (7.0)
- Optional: Intel C++ Optimizing Compiler 5.0.
NOTES: The compilers ia32lib has been tested with so far are Microsoft C++ 6.0 and 7.0 and Intel C++ Optimizing Compiler 5.0. Every effort has been made to make the code portable to other compilers, but no tests have been performed so far. The next compilers to look at will probably be Watcom C++ 11.0c and Borland C++ 5.5. Compatibility with the first three mentioned above is guaranteed as long as the development effort continues.
Installation
- Download the distribution (If you have not already done so);
- Start ia32lib.exe - this will unpack all files to a directory of your choice;
- Install the ia32.sys Kernel-Mode Driver on your Windows system (step-by-step instructions);
- Open ia32lib.dsw workspace or ia32lib.sln solution with the appropriate version of Microsoft Visual C++ (6.0 or 7.0.NET respectively);
- You are ready to go! Try building the two sample programs (ia32detect and ia32p6).
NOTES: Installation of the ia32.sys driver does not require system restart on Windows XP. The installation instructions are also for Windows XP, but the steps should be isomorphic to the steps for Windows NT and Windows 2000. Note that at this point I have not tried the driver on Windows NT and Windows 2000, but it should work :). If you have any problems, mail me!
The directory structure of the distribution is as follows:
- ia32 - root directory of the distribution
  - doc - documentation directory
    - steps - images for step-by-step driver installation
      - step0.png
      - ...
      - step15.png
    - generic.css - style sheet for the documentation
    - install.htm - step-by-step driver installation instructions
    - index.htm - main documentation file (similar to this one, if not the same :)
  - drv - NT Kernel-Mode Driver directory
    - source - driver sources
      - makefile - required part of the NT DDK environment build process
      - sources - required part of the NT DDK environment build process
      - ia32.c - main driver source, based on the portio example in the NT DDK
      - ring0.c - IA-32 assembly support routines
      - ring0.h - IA-32 assembly support routines header
      - ia32.rc - driver version info resource
    - ia32.inf - driver installation information file (needed by "Add Hardware Wizard" to install the driver)
    - ia32.sys - compiled binary of the driver itself
  - inc - host for all header files of the ia32lib library
    - ia32.h - main header file, includes all others. This is the only one you need to include
    - ia32cache.h - describes possible cache configurations for IA-32
    - ia32counter.h - defines abstract base class for performance counters
    - ia32def.h - defines basic types
    - ia32detect.h - defines the IA-32 CPU detection class
    - ia32driver.h - provides interface constants for use with the driver. Also included by the driver
    - ia32error.h - defines error exception class
    - ia32ring0.h - defines class to expose driver API to the application
    - ia32size.h - defines auxiliary class for managing memory sizes
    - p6counter.h - specializes ia32counter for the Intel P6 processor family (Pentium Pro, II and III)
  - lib - source files needed to build ia32.lib
    - ia32.cpp - used solely for pre-compiled header generation (Microsoft Visual C++ feature)
    - ia32cache.cpp - initializers for known cache configurations... needs additions, so keep an eye on it
    - ia32counter.cpp - initializes ia32counter's static variable "counter"
  - out - here all output files of the build process are placed
    - ia32.lib - pre-built release version of the library
    - ia32detect.exe - pre-built release version of the ia32detect sample application
    - ia32p6.exe - pre-built release version of the ia32p6 sample application (requires the ia32.sys driver to be installed)
  - prj - support files for Microsoft Visual C++
    - vc.6 - support files for Microsoft Visual C++ 6.0
      - ia32detect - ia32detect.exe sample application project directory
        - ia32detect.dsp - ia32detect sample application project file
      - ia32lib - ia32.lib library project directory
        - ia32lib.dsp - ia32.lib library project file
      - ia32p6 - ia32p6.exe sample application project directory
        - ia32p6.dsp - ia32p6.exe sample application project file
      - ia32lib.dsw - Microsoft Visual C++ 6.0 Project Workspace (open this thing inside the environment)
    - vc.net - support files for Microsoft Visual C++ .NET
      - ia32detect - ia32detect.exe sample application project directory
        - ia32detect.vcproj - ia32detect sample application project file
      - ia32lib - ia32.lib library project directory
        - ia32lib.vcproj - ia32.lib library project file
      - ia32p6 - ia32p6.exe sample application project directory
        - ia32p6.vcproj - ia32p6.exe sample application project file
      - ia32lib.sln - Microsoft Visual C++ .NET Solution (open this thing inside the environment)
  - src - examples source directory
    - ia32detect - source directory for the ia32detect.exe example
      - ia32detect.cpp - source for the ia32detect.exe example
    - ia32p6 - source directory for the ia32p6.exe example
      - ia32p6.cpp - source for the ia32p6.exe example
Reference
[ ia32def.h | ia32size.h | ia32error.h | ia32driver.h | ia32ring0.h | ia32cache.h | ia32detect.h | ia32counter.h | p6counter.h ]
This part is a mostly top-down description of all features in the library. Each header file is discussed separately and in detail. If you don't feel like reading, this is the part to skip :).
ia32def.h
Types

| Name | Equivalent |
|---|---|
| byte | unsigned char |
| word | unsigned |
| bit | unsigned |
| uint8 | unsigned __int8 |
| uint16 | unsigned __int16 |
| uint32 | unsigned __int32 |
| uint64 | unsigned __int64 |
Notes:
- bit is used in structured bit-fields (see ia32detect.h for examples).
ia32size.h
Constants

| Name | Value |
|---|---|
| B | (uint64)1 |
| KB | (1024 * B) |
| MB | (1024 * KB) |
| GB | (1024 * MB) |
| TB | (1024 * GB) |

Classes

ia32size

    class ia32size
    {
        uint64 size;
    public:
        ia32size (uint64);
        operator const string () const;
        operator const uint64 () const;
    };
Notes:
- ia32size's purpose is to convey memory sizes in easy to read textual format;
- ia32size::ia32size(uint64) constructs an instance for a specific capacity value;
- ia32size::operator string () const is used to convert the encapsulated value to a string (see example below);
- ia32size::operator uint64 () const is used to return the encapsulated value in native integer format.
Example:
    #include <stdio.h>
    #include "ia32size.h"

    int main ()
    {
        // each size is printed using the largest unit that represents it exactly
        printf("%8s\n", ((string)ia32size(16)).c_str());
        printf("%8s\n", ((string)ia32size(1024)).c_str());
        printf("%8s\n", ((string)ia32size(4096)).c_str());
        printf("%8s\n", ((string)ia32size(3 * 1024 * 1024)).c_str());
        printf("%8s\n", ((string)ia32size((uint64)13 * 1024 * 1024 * 1024 * 1024)).c_str());
        printf("%8s\n", ((string)ia32size((uint64)13 * 1024 * 1024 * 1024 * 1024 + (uint64)7 * 1024 * 1024 * 1024)).c_str());
        // the value is 64 bits wide, so it needs the I64 size prefix
        printf("%8I64d\n", (uint64)ia32size(12345678));
        return 0;
    }
Output:
        16 B
         1KB
         4KB
         3MB
        13TB
     13319GB
    12345678
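The output above suggests that a size is printed in the largest binary unit that divides it exactly. A minimal sketch of that rule follows, assuming this is indeed how the conversion behaves; format_size is a hypothetical helper written for illustration and is not part of ia32lib.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ia32lib): formats a byte count using the
// largest binary unit that divides it exactly, mimicking the output above.
std::string format_size(std::uint64_t bytes)
{
    static const char* const units[] = {" B", "KB", "MB", "GB", "TB"};
    int unit = 0;
    // Move up one unit while the value is a non-zero multiple of 1024.
    while (bytes != 0 && bytes % 1024 == 0 && unit < 4) {
        bytes /= 1024;
        ++unit;
    }
    return std::to_string(bytes) + units[unit];
}
```

Under this rule, 13 TB plus 7 GB is not a whole number of terabytes, so it falls back to gigabytes: 13 * 1024 + 7 = 13319.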
ia32error.h
Classes

ia32error

    class ia32error
    {
    public:
        enum err_
        {
            err_generic,
            err_ring0_cpu,
            err_ring0_create,
            err_ring0_ioctl,
            err_ring0_size,
            err_ring0_close,
            err_counter_overflow,
            err_counter_family,
            err_counter_MMX,
            err_counter_SSE,
            err_counter_counter,
            err_invalid
        };
        ia32error (err_);
        operator const char * () const;
    protected:
        err_ v;
    };
Notes:
- ia32error is a class whose instances are thrown as exceptions;
- enum ia32error::err_ enumerates the different error values;
- ia32error::ia32error (err_) initializes an instance to a particular error value;
- ia32error::operator const char * () const converts the encapsulated error value to a string (suitable for error display). For the list of specific string values, look inside ia32error.h;
- most ring-0 routines throw ia32errors as exceptions. For specific examples see the ia32p6 sample.
ia32driver.h
Constants

| Name | Value |
|---|---|
| IA32CPU_TYPE | 40000 |
| IOCTL_IA32CPU_READ_MSR | CTL_CODE(IA32CPU_TYPE, 0x900, METHOD_BUFFERED, FILE_READ_ACCESS) |
| IOCTL_IA32CPU_WRITE_MSR | CTL_CODE(IA32CPU_TYPE, 0x901, METHOD_BUFFERED, FILE_WRITE_ACCESS) |
Notes:
- Constants defined in this header are used both by the ia32.sys kernel-mode driver and by the ia32ring0.h driver interface header;
- IOCTL_IA32CPU_XXX_MSR are needed to complete DeviceIoControl system calls to the driver.
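To see which numeric control codes the two constants stand for, the CTL_CODE bit layout can be reproduced without the DDK headers. This is only a sketch: ctl_code and the k-prefixed names are made up for illustration, and the values of METHOD_BUFFERED, FILE_READ_ACCESS and FILE_WRITE_ACCESS are restated here as assumptions taken from the Windows headers.

```cpp
#include <cstdint>

// Assumed values, restated from the Windows headers so the sketch builds
// without them.
constexpr std::uint32_t kMethodBuffered  = 0;  // METHOD_BUFFERED
constexpr std::uint32_t kFileReadAccess  = 1;  // FILE_READ_ACCESS
constexpr std::uint32_t kFileWriteAccess = 2;  // FILE_WRITE_ACCESS

// Same bit layout as the CTL_CODE macro: device type in bits 31..16,
// access in bits 15..14, function in bits 13..2, method in bits 1..0.
constexpr std::uint32_t ctl_code(std::uint32_t type, std::uint32_t function,
                                 std::uint32_t method, std::uint32_t access)
{
    return (type << 16) | (access << 14) | (function << 2) | method;
}

constexpr std::uint32_t kIa32CpuType = 40000;  // IA32CPU_TYPE
constexpr std::uint32_t kReadMsr =
    ctl_code(kIa32CpuType, 0x900, kMethodBuffered, kFileReadAccess);
constexpr std::uint32_t kWriteMsr =
    ctl_code(kIa32CpuType, 0x901, kMethodBuffered, kFileWriteAccess);
```

With these assumptions the read code evaluates to 0x9C406400 and the write code to 0x9C40A404.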
ia32ring0.h
Classes

ia32ring0

    class ia32ring0
    {
        HANDLE h;
    public:
        ia32ring0 ();
        uint64 rdmsr (uint32 i) const;
        void wrmsr (uint32 i, uint64 d) const;
        ~ia32ring0 ();
    };
Notes:
- ia32ring0 is the exported user-level API to the ia32.sys driver, used to read and write IA-32 Model Specific Registers (MSRs);
- ia32ring0::ia32ring0 () initializes a connection to the driver;
- uint64 ia32ring0::rdmsr (uint32 i) const uses the driver to read the i-th MSR and returns its value;
- void ia32ring0::wrmsr (uint32 i, uint64 d) const uses the driver to write the i-th MSR with the value contained in d;
- ia32ring0::~ia32ring0 () closes the connection to the driver.
ia32cache.h
Classes

ia32cache

    class ia32cache
    {
    public:
        enum type_
        {
            type_reserved,
            type_unified,
            type_instruction,
            type_trace,
            type_data,
            type_invalid
        };
        enum _
        {
            level_TLB = -1,
            associativity_Full = -1,
            block_AnySize = 0
        };
        const byte descriptor;
        const type_ type;
        const int level;
        const ia32size capacity;
        const ia32size block;
        const int associativity;
        ia32cache (byte, type_, int, ia32size, ia32size, int);
        operator const string () const;
    protected:
        const char * type_text () const;
        const string associativity_text () const;
    };

Variables

| Name | Declaration |
|---|---|
| ia32caches | extern const ia32cache ia32caches[]; |

Functions

| Name | Prototype |
|---|---|
| _ia32cache | const ia32cache &_ia32cache (byte); |
Notes:
- ia32cache is a class describing cache memory parameters. For now, a number of predefined instances exist (see ia32cache.cpp for the complete listing), but in the future the class will also be used to describe caches detected empirically by software;
- enum ia32cache::type_ enumerates the different types of caches supported;
- enum ia32cache::_ enumerates some special values for otherwise integer fields like block size and associativity;
- const byte ia32cache::descriptor contains the IA-32 defined byte descriptor of the cache;
- const type_ ia32cache::type contains the type of the cache;
- const int ia32cache::level contains the cache level (-1 means TLB cache);
- const ia32size ia32cache::capacity contains the size of the cache;
- const ia32size ia32cache::block contains the block size of the cache (0 means "Any Size" for page sizes in TLB caches);
- const int ia32cache::associativity contains the associativity of the cache (-1 means "Fully-Associative");
- ia32cache::ia32cache (byte, type_, int, ia32size, ia32size, int) initializes a cache instance;
- ia32cache::operator const string () converts the cache instance to a nice looking string representation (see the ia32detect sample for detailed examples);
- const char * ia32cache::type_text () const returns a text representation of the current value of the type field;
- const string ia32cache::associativity_text () const returns a text representation of the current value of the associativity field;
- const ia32cache ia32caches[] contains pre-initialized cache instances for all descriptors known so far;
- const ia32cache &_ia32cache (byte) searches a cache instance by descriptor in the above array.
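The descriptor lookup described in the last note can be pictured as a linear search over a table that ends in a catch-all entry. The sketch below is not the library's implementation: cache_entry, cache_table and find_cache are hypothetical, the struct is reduced to a few fields, the four real entries are copied from the ia32detect sample output, and the sentinel entry for unknown descriptors is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, reduced stand-in for ia32cache.
struct cache_entry {
    std::uint8_t  descriptor;   // IA-32 cache descriptor byte
    int           level;        // -1 denotes a TLB, as in ia32cache
    std::uint32_t capacity_kb;  // cache size in KB (0 for TLBs in this sketch)
    const char*   text;         // short description
};

// A few descriptors taken from the ia32detect sample output.
// The last entry is a sentinel returned for unknown descriptors (assumption).
const cache_entry cache_table[] = {
    {0x01, -1,   0, "TLB instruction"},
    {0x08,  1,  16, "L1 instruction"},
    {0x0c,  1,  16, "L1 data"},
    {0x83,  2, 512, "L2 unified"},
    {0x00,  0,   0, "unknown"},
};

// Linear search by descriptor; falls back to the sentinel entry.
const cache_entry& find_cache(std::uint8_t descriptor)
{
    const std::size_t n = sizeof(cache_table) / sizeof(cache_table[0]);
    for (std::size_t i = 0; i + 1 < n; ++i)
        if (cache_table[i].descriptor == descriptor)
            return cache_table[i];
    return cache_table[n - 1];
}
```

A linear scan is enough here because the descriptor table is small and is consulted only a handful of times per detection run.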
ia32detect.h
Classes

ia32detect

    class ia32detect
    {
    public:
        enum type_
        {
            type_OEM,
            type_OverDrive,
            type_Dual,
            type_reserved
        };
        enum brand_
        {
            brand_na,
            brand_Celeron,
            brand_PentiumIII,
            brand_PentiumIIIXeon,
            brand_reserved1,
            brand_reserved2,
            brand_PentiumIIIMobile,
            brand_reserved3,
            brand_Pentium4,
            brand_invalid
        };
        struct version_
        {
            bit Stepping  : 4;
            bit Model     : 4;
            bit Family    : 4;
            bit Type      : 2;
            bit Reserved1 : 2;
            bit XModel    : 4;
            bit XFamily   : 8;
            bit Reserved2 : 4;
        };
        struct misc_
        {
            byte Brand;
            byte CLFLUSH;
            byte Reserved;
            byte APICId;
        };
        struct feature_
        {
            bit FPU       : 1; // Floating Point Unit On-Chip
            bit VME       : 1; // Virtual 8086 Mode Enhancements
            bit DE        : 1; // Debugging Extensions
            bit PSE       : 1; // Page Size Extensions
            bit TSC       : 1; // Time Stamp Counter
            bit MSR       : 1; // Model Specific Registers
            bit PAE       : 1; // Physical Address Extension
            bit MCE       : 1; // Machine Check Exception
            bit CX8       : 1; // CMPXCHG8 Instruction
            bit APIC      : 1; // APIC On-Chip
            bit Reserved1 : 1;
            bit SEP       : 1; // SYSENTER and SYSEXIT instructions
            bit MTRR      : 1; // Memory Type Range Registers
            bit PGE       : 1; // PTE Global Bit
            bit MCA       : 1; // Machine Check Architecture
            bit CMOV      : 1; // Conditional Move Instructions
            bit PAT       : 1; // Page Attribute Table
            bit PSE36     : 1; // 32-bit Page Size Extension
            bit PSN       : 1; // Processor Serial Number
            bit CLFSH     : 1; // CLFLUSH Instruction
            bit Reserved2 : 1;
            bit DS        : 1; // Debug Store
            bit ACPI      : 1; // Thermal Monitor and Software Controlled Clock Facilities
            bit MMX       : 1; // Intel MMX Technology
            bit FXSR      : 1; // FXSAVE and FXRSTOR Instructions
            bit SSE       : 1; // Intel SSE Technology
            bit SSE2      : 1; // Intel SSE2 Technology
            bit SS        : 1; // Self Snoop
            bit Reserved3 : 1;
            bit TM        : 1; // Thermal Monitor
            bit Reserved4 : 2;
        };
        string vendor;
        string brand;
        version_ version;
        misc_ misc;
        feature_ feature;
        byte *cache;
        ia32detect ();
        const string version_text () const;
    protected:
        const char * type_text () const;
        const string brand_text () const;
    private:
        uint32 init0 ();
        void init1 (uint32 *d);
        void process2 (uint32 d, bool c[]);
        void init2 (byte count);
        void init0x80000000 ();
    };
Notes:
- enum ia32detect::type_ enumerates CPU types for the version.Type field;
- enum ia32detect::brand_ enumerates CPU brands for the misc.Brand field;
- struct ia32detect::version_ (version field) describes CPU version information as returned by the CPUID instruction;
- struct ia32detect::misc_ (misc field) describes CPU miscellaneous information as returned by the CPUID instruction;
- struct ia32detect::feature_ (feature field) describes CPU feature information as returned by the CPUID instruction;
- string ia32detect::vendor specifies the CPU vendor ("GenuineIntel" for Intel CPUs);
- string ia32detect::brand specifies the CPU brand string, when supported;
- byte *ia32detect::cache specifies a null terminated stream of cache descriptors;
- ia32detect::ia32detect () initializes an instance of the class by (multiple) use of CPUID instruction;
- const string ia32detect::version_text () returns a string representation of the version field;
- const char *ia32detect::type_text () returns a string representation of the type field;
- const string ia32detect::brand_text () returns a string representation of the misc.Brand field;
- all the private members are auxiliary routines to simplify the work of the constructor.
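The version_ bit-field mirrors the layout of the EAX register returned by CPUID function 1. A portable way to picture the same decoding is with shifts and masks, as in the sketch below. The names cpu_version, decode_version and version_string are hypothetical, and the value 0x6B1 used in the note after the code is constructed by hand to match the "6.11.1" of the sample output further down; it was not read from a real CPU.

```cpp
#include <cstdint>
#include <string>

// Hypothetical plain-struct counterpart of ia32detect::version_.
struct cpu_version {
    unsigned stepping, model, family, type, xmodel, xfamily;
};

// Extracts the fields in the same order and with the same widths as version_.
cpu_version decode_version(std::uint32_t eax)
{
    cpu_version v;
    v.stepping = eax & 0xF;           // bits 3..0
    v.model    = (eax >> 4) & 0xF;    // bits 7..4
    v.family   = (eax >> 8) & 0xF;    // bits 11..8
    v.type     = (eax >> 12) & 0x3;   // bits 13..12
    v.xmodel   = (eax >> 16) & 0xF;   // bits 19..16
    v.xfamily  = (eax >> 20) & 0xFF;  // bits 27..20
    return v;
}

// Formats "family.model.stepping", the leading part of the version text.
std::string version_string(const cpu_version& v)
{
    return std::to_string(v.family) + "." + std::to_string(v.model) + "." +
           std::to_string(v.stepping);
}
```

For example, decode_version(0x6B1) yields family 6, model 11 and stepping 1. Shifts are used instead of bit-fields because the order in which a compiler allocates bit-fields is implementation-defined.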
ia32counter.h
Classes

ia32counter

    class ia32counter
    {
    protected:
        static uint32 count;
        uint32 index;
    public:
        ia32counter (uint32 counters);
    };
Notes:
- ia32counter is an abstract base class for performance monitoring hardware counters;
- static uint32 ia32counter::count accumulates the number of instances created;
- uint32 ia32counter::index contains the hardware index of this instance;
- ia32counter::ia32counter (uint32 counters) initializes the index and checks for structural hazards (enough hardware counters).
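One way to read the notes above: each new instance takes the next free hardware index, and construction fails once all hardware counters are in use. The sketch below illustrates that reading only. counter_slot and hw_index are hypothetical names, and std::overflow_error stands in for whatever the library actually throws (presumably an ia32error such as err_counter_overflow, which is an assumption).

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for ia32counter's bookkeeping.
class counter_slot {
public:
    // 'counters' is the number of hardware counters the CPU provides.
    explicit counter_slot(std::uint32_t counters)
    {
        if (count >= counters)  // structural hazard: no free counter left
            throw std::overflow_error("no free hardware counter");
        index = count++;        // claim the next hardware index
    }
    std::uint32_t hw_index() const { return index; }
protected:
    static std::uint32_t count;  // number of instances created so far
    std::uint32_t index;         // hardware index of this instance
};

std::uint32_t counter_slot::count = 0;
```

On a P6 processor, which has two counters, the first two instances would get indices 0 and 1, and a third construction would throw.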
p6counter.h
Classes

p6counter

    class p6counter: public ia32counter
    {
    public:
        enum event_
        {
            // Data Cache Unit (DCU)
            DCU_MEMORY_REFERENCE = 0x43, // DATA_MEM_REFS
            DCU_LINES_IN = 0x45,
            DCU_M_LINES_IN = 0x46,
            DCU_M_LINES_OUT = 0x47,
            DCU_MISS_OUTSTANDING = 0x48,
            // Instruction Fetch Unit (IFU)
            IFU_IFETCH = 0x80,
            IFU_IFETCH_MISS = 0x81,
            IFU_TLB_MISS = 0x85, // ITLB_MISS
            IFU_MEMORY_STALL = 0x86,
            IFU_ILD_STALL = 0x87, // ILD_STALL
            // L2 Cache
            L2_IFETCH = 0x28,
            L2_LOADS = 0x29, // L2_LD
            L2_STORES = 0x2A, // L2_ST
            L2_LINES_IN = 0x24,
            L2_LINES_OUT = 0x26,
            L2_M_LINES_IN = 0x25,
            L2_M_LINES_OUT = 0x27,
            L2_REQUEST = 0x2E, // L2_RQSTS
            L2_ADDRESS_STROBE = 0x21, // L2_ADS
            L2_DATA_BUS_BUSY = 0x22, // L2_DBUS_BUSY
            L2_DATA_BUS_BUSY_READ = 0x23, // L2_DBUS_BUSY_RD
            // External Bus Logic (EBL)
            EBL_DATA_READY = 0x62, // BUS_DRDY_CLOCKS
            EBL_LOCK = 0x63, // BUS_LOCK_CLOCKS
            EBL_REQ_OUTSTANDING = 0x60, // BUS_REQ_OUTSTANDING
            EBL_TRANS_BURST_READ = 0x65, // BUS_TRAN_BRD
            EBL_TRANS_READ_OWNER = 0x66, // BUS_TRAN_RFO
            EBL_TRANS_WRITEBACK = 0x67, // BUS_TRANS_WB
            EBL_TRANS_IFETCH = 0x68, // BUS_TRAN_IFETCH
            EBL_TRANS_INVALIDATE = 0x69, // BUS_TRAN_INVAL
            EBL_TRANS_PARTIAL_WRITE = 0x6A, // BUS_TRAN_PWR
            EBL_TRANS_PARTIAL = 0x6B, // BUS_TRANS_P
            EBL_TRANS_IO = 0x6C, // BUS_TRANS_IO
            EBL_TRANS_DEFERRED = 0x6D, // BUS_TRAN_DEF
            EBL_TRANS_BURST = 0x6E, // BUS_TRAN_BURST
            EBL_TRANS_ANY = 0x70, // BUS_TRAN_ANY
            EBL_TRANS_MEMORY = 0x6F, // BUS_TRAN_MEM
            EBL_DATA_RECEIVE = 0x64, // BUS_DATA_RCV
            EBL_DRIVE_BNR = 0x61, // BUS_BNR_DRV
            EBL_DRIVE_HIT = 0x7A, // BUS_HIT_DRV
            EBL_DRIVE_HITM = 0x7B, // BUS_HITM_DRV
            EBL_SNOOP_STALL = 0x7E, // BUS_SNOOP_STALL
            // Floating-Point Unit (FPU)
            FPU_FLOPS_RETIRED = 0xC1, // FLOPS, Counter 0 only
            FPU_FLOPS_EXECUTED = 0x10, // FP_COMP_OPS_EXE, Counter 0 only
            FPU_ASSIST = 0x11, // FP_ASSIST, Counter 1 only
            FPU_MUL = 0x12, // MUL, Counter 1 only
            FPU_DIV = 0x13, // DIV, Counter 1 only
            FPU_DIV_BUSY = 0x14, // CYCLES_DIV_BUSY, Counter 0 only
            // Memory Ordering (MO)
            MO_LOAD_BLOCKED = 0x03, // LD_BLOCKS
            MO_STORE_BUFFER_DRAIN = 0x04, // SB_DRAINS
            MO_MISALLIGNMENT = 0x05, // MISALIGN_MEM_REF
            SSE_PREFETCH_DISPATCHED = 0x07, // EMON_KNI_PREF_DISPATCHED
            SSE_PREFETCH_MISS = 0x4B, // EMON_KNI_PREF_MISS
            // Instruction Decoding and Retirement (IDR)
            IDR_INSTRUCTION_RETIRED = 0xC0, // INST_RETIRED
            IDR_UOP_RETIRED = 0xC2, // UOPS_RETIRED
            IDR_INSTRUCTION_DECODED = 0xD0, // INST_DECODED
            SSE_INSTRUCTION_RETIRED = 0xD8, // EMON_KNI_INST_RETIRED
            SSE_COMPUTATION_RETIRED = 0xD9, // EMON_KNI_COMP_INST_RET
            // Interrupts (INT)
            INT_HW_RECEIVED = 0xC8, // HW_INT_RX
            INT_MASKED = 0xC6, // CYCLES_INT_MASKED
            INT_PENDING_AND_MASKED = 0xC7, // CYCLES_INT_PENDING_AND_MASKED
            // Branches (BR)
            BR_INSTRUCTION_RETIRED = 0xC4, // BR_INST_RETIRED
            BR_MISSPREDICT_RETIRED = 0xC5, // BR_MISS_PRED_RETIRED
            BR_TAKEN_RETIRED = 0xC9,
            BR_MISSPREDICT_TAKEN_RETIRED = 0xCA, // BR_MISS_PRED_TAKEN_RET
            BR_INSTRUCTION_DECODED = 0xE0, // BR_INST_DECODED
            BR_BTB_MISS = 0xE2, // BTB_MISSES
            BR_BOGUS = 0xE4,
            BR_BACLEAR = 0xE6, // BACLEARS
            // Stalls (STALL)
            STALL_RESOURCE = 0xA2, // RESOURCE_STALLS
            STALL_PARTIAL = 0xD2, // PARTIAL_RAT_STALLS
            // Multimedia Extensions (MMX)
            MMX_INSTRUCTION_EXECUTE = 0xB0, // MMX_INSTR_EXEC
            MMX_SATURATING_EXECUTE = 0xB1, // MMX_SAT_INSTR_EXEC
            MMX_UOP_EXECUTE = 0xB2, // MMX_UOPS_EXEC
            MMX_TYPE_EXECUTE = 0xB3, // MMX_INSTR_TYPE_EXEC
            MMX_FPU_TRANSITION = 0xCC, // FP_MMX_TRANS
            MMX_ASSIST = 0xCD,
            MMX_INSTRUCTION_RETIRED = 0xCE, // MMX_INSTR_RET
            // Segment Register Renaming (SRR)
            SRR_STALL = 0xD4, // SEG_RENAME_STALLS
            SRR_COUNT = 0xD5, // SEG_REG_RENAMES
            SRR_COUNT_RETIRED = 0xD6, // RET_SEG_RENAMES
            SEGMENT_REGISTER_LOADS = 0x06, // SEGMENT_REG_LOADS
            CPU_CLOCKS_UNHALTED = 0x79 // CPU_CLK_UNHALTED
        };
        enum mask_
        {
            NONE = 0x0,
            L2_M = 0x8,
            L2_E = 0x4,
            L2_S = 0x2,
            L2_I = 0x1,
            L2_MESI = 0xF,
            EBL_SELF = 0x00,
            EBL_ANY = 0x20,
            SSE_PREFETCH_NTA = 0x00,
            SSE_PREFETCH_T1 = 0x01,
            SSE_PREFETCH_T2 = 0x02,
            SSE_WEAKLY_ORDERED_STORES = 0x03,
            SSE_PACKED_AND_SCALAR = 0x00,
            SSE_SCALAR = 0x01,
            MMX_PACKED_MULTIPLY = 0x01,
            MMX_PACKED_SHIFT = 0x02,
            MMX_PACK = 0x04,
            MMX_UNPACK = 0x08,
            MMX_PACKED_LOGICAL = 0x10,
            MMX_PACKED_ARITHMETIC = 0x20,
            MMX_ANY = 0x3F,
            MMX_TO_FPU = 0x0,
            MMX_FROM_FPU = 0x1,
            SRR_ES = 0x1,
            SRR_DS = 0x2,
            SRR_FS = 0x4,
            SRR_GS = 0x8,
            SRR_ANY = 0xF
        };
        struct
        {
            bit event    : 8;
            bit mask     : 8;
            bit ring123  : 1;
            bit ring0    : 1;
            bit edge     : 1;
            bit pin      : 1;
            bit int_     : 1;
            bit reserved : 1;
            bit enable   : 1;
            bit invert   : 1;
            bit count    : 8;
        } config;
        p6counter (event_ event, mask_ mask = NONE, byte count = 0, bool invert = false);
        operator const uint64 () const;
    protected:
        ia32ring0 r0;
    };
Notes:
- p6counter is a derived class of ia32counter for the performance monitoring counters on the Intel P6 Family of CPUs (Pentium Pro, II and III);
- enum p6counter::event_ enumerates the different events this counter can be programmed to count;
- enum p6counter::mask_ enumerates the different values for the mask field in the counter programming register;
- struct p6counter::config represents the counter's programming register;
- p6counter::p6counter (event_, mask_, byte, bool) initializes the hardware counter and starts it;
- p6counter::operator uint64 () const reads the current value of the counter;
- ia32ring0 p6counter::r0 is used for communication with the kernel-mode driver.
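The config structure describes a 32-bit programming register, so a particular setup can be pictured as a single number. The sketch below packs the fields with shifts in the order they are declared, assuming the first field occupies the lowest bits (which is how Microsoft's compiler lays out bit-fields on x86). pack_config is a hypothetical helper, and which flags the library really sets when it starts a counter is not stated in this document, so the flag combination used in the note after the code is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper: packs the fields of p6counter::config into one
// 32-bit value. The edge, pin and int_ bits (18..20) are left at zero.
std::uint32_t pack_config(std::uint8_t event, std::uint8_t mask,
                          bool ring123, bool ring0,
                          bool enable, bool invert, std::uint8_t count)
{
    return  static_cast<std::uint32_t>(event)             // bits 7..0
         | (static_cast<std::uint32_t>(mask)    << 8)     // bits 15..8
         | (static_cast<std::uint32_t>(ring123) << 16)    // count in rings 1-3
         | (static_cast<std::uint32_t>(ring0)   << 17)    // count in ring 0
         | (static_cast<std::uint32_t>(enable)  << 22)    // enable counting
         | (static_cast<std::uint32_t>(invert)  << 23)    // invert comparison
         | (static_cast<std::uint32_t>(count)   << 24);   // bits 31..24
}
```

For instance, counting L2_REQUEST (0x2E) with mask L2_MESI (0xF) in all rings with the counter enabled gives pack_config(0x2E, 0xF, true, true, true, false, 0) == 0x430F2E.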
Examples
ia32detect
This example fully exploits the features for CPU detection and demonstrates all of the supported ones. Provided below is the complete source code (not much).
    #include <stdio.h>
    #include "ia32.h"

    int main ()
    {
        // the constructor runs CPUID and fills in all fields
        ia32detect ia32;
        printf("Vendor = %s\n\n", ia32.vendor.c_str());
        printf("Brand = %s\n\n", ia32.brand.c_str());
        printf("Version = %s\n\n", ia32.version_text().c_str());
        printf("Cache: \n\n");
        // ia32.cache is a null terminated stream of cache descriptors
        for (int i = 0; ia32.cache[i]; i++)
            printf("%s\n", ((string)_ia32cache(ia32.cache[i])).c_str());
        printf("\nFeatures:\n\n");
        printf("%c %s\n", ia32.feature.FPU ? '+' : '-', "Floating Point Unit On-Chip");
        printf("%c %s\n", ia32.feature.VME ? '+' : '-', "Virtual 8086 Mode Enhancements");
        printf("%c %s\n", ia32.feature.DE ? '+' : '-', "Debugging Extensions");
        printf("%c %s\n", ia32.feature.PSE ? '+' : '-', "Page Size Extensions");
        printf("%c %s\n", ia32.feature.TSC ? '+' : '-', "Time Stamp Counter");
        printf("%c %s\n", ia32.feature.MSR ? '+' : '-', "Model Specific Registers");
        printf("%c %s\n", ia32.feature.PAE ? '+' : '-', "Physical Address Extension");
        printf("%c %s\n", ia32.feature.MCE ? '+' : '-', "Machine Check Exception");
        printf("%c %s\n", ia32.feature.CX8 ? '+' : '-', "CMPXCHG8 Instruction");
        printf("%c %s\n", ia32.feature.APIC ? '+' : '-', "APIC On-Chip");
        printf("%c %s\n", ia32.feature.SEP ? '+' : '-', "SYSENTER and SYSEXIT instructions");
        printf("%c %s\n", ia32.feature.MTRR ? '+' : '-', "Memory Type Range Registers");
        printf("%c %s\n", ia32.feature.PGE ? '+' : '-', "PTE Global Bit");
        printf("%c %s\n", ia32.feature.MCA ? '+' : '-', "Machine Check Architecture");
        printf("%c %s\n", ia32.feature.CMOV ? '+' : '-', "Conditional Move Instructions");
        printf("%c %s\n", ia32.feature.PAT ? '+' : '-', "Page Attribute Table");
        printf("%c %s\n", ia32.feature.PSE36 ? '+' : '-', "32-bit Page Size Extension");
        printf("%c %s\n", ia32.feature.PSN ? '+' : '-', "Processor Serial Number");
        printf("%c %s\n", ia32.feature.CLFSH ? '+' : '-', "CLFLUSH Instruction");
        printf("%c %s\n", ia32.feature.DS ? '+' : '-', "Debug Store");
        printf("%c %s\n", ia32.feature.ACPI ? '+' : '-', "Thermal Monitor and Software Controlled Clock Facilities");
        printf("%c %s\n", ia32.feature.MMX ? '+' : '-', "Intel MMX Technology");
        printf("%c %s\n", ia32.feature.FXSR ? '+' : '-', "FXSAVE and FXRSTOR Instructions");
        printf("%c %s\n", ia32.feature.SSE ? '+' : '-', "Intel SSE Technology");
        printf("%c %s\n", ia32.feature.SSE2 ? '+' : '-', "Intel SSE2 Technology");
        printf("%c %s\n", ia32.feature.SS ? '+' : '-', "Self Snoop");
        printf("%c %s\n", ia32.feature.TM ? '+' : '-', "Thermal Monitor");
        return 0;
    }
Below is the output from my laptop machine. Please, if you decide to install the package, run this small program and e-mail me the results.
    Vendor = GenuineIntel

    Brand = Intel(R) Pentium(R) III Mobile CPU 1000MHz

    Version = 6.11.1 Intel OEM Processor XVersion(0.0)

    Cache:

    0x01: TLB instruction, Entries( 32), PageSize(4KB), Associativity(4-way)
    0x02: TLB instruction, Entries( 2), PageSize(4MB), Associativity( Full)
    0x03: TLB data, Entries( 64), PageSize(4KB), Associativity(4-way)
    0x04: TLB data, Entries( 8), PageSize(4MB), Associativity(4-way)
    0x08: L1 instruction$, Size( 16KB), Block( 32 B), Associativity(4-way)
    0x0c: L1 data$, Size( 16KB), Block( 32 B), Associativity(4-way)
    0x83: L2 unified$, Size( 512KB), Block( 32 B), Associativity(8-way)

    Features:

    + Floating Point Unit On-Chip
    + Virtual 8086 Mode Enhancements
    + Debugging Extensions
    + Page Size Extensions
    + Time Stamp Counter
    + Model Specific Registers
    + Physical Address Extension
    + Machine Check Exception
    + CMPXCHG8 Instruction
    - APIC On-Chip
    + SYSENTER and SYSEXIT instructions
    + Memory Type Range Registers
    + PTE Global Bit
    + Machine Check Architecture
    + Conditional Move Instructions
    + Page Attribute Table
    + 32-bit Page Size Extension
    - Processor Serial Number
    - CLFLUSH Instruction
    - Debug Store
    - Thermal Monitor and Software Controlled Clock Facilities
    + Intel MMX Technology
    + FXSAVE and FXRSTOR Instructions
    + Intel SSE Technology
    - Intel SSE2 Technology
    - Self Snoop
    - Thermal Monitor
ia32p6
This example demonstrates the usage of Intel P6 Hardware Performance Monitoring Counters. Processors from this family have two almost identical counters. In the source below, one of them is set up to count memory references and the other to count requests to the L2 cache (which are actually nothing else but L1 misses!).
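    #include <stdio.h>
    #include "ia32.h"
    #include "p6counter.h"

    int main ()
    {
        // c1 counts requests to the L2 cache (L1 misses), c2 counts memory references
        p6counter c1(p6counter::L2_REQUEST, p6counter::L2_MESI);
        p6counter c2(p6counter::DCU_MEMORY_REFERENCE);
        const int c = 10000000;
        static int a[c];
        // walk the array in advance so that all pages are present in memory
        for (int ai1 = 0; ai1 < c; ai1++)
            a[ai1]++;
        // boost the priority to reduce noise from other processes
        SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
        // starting values of both counters
        uint64 t1 = c1;
        uint64 t2 = c2;
        // the measured loop: one load and one store per element
        for (int ai2 = 0; ai2 < c; ai2++)
            a[ai2] *= 13;
        // both counters are read within a single printf call
        printf("L1 misses = %I64d\nL1 accesses = %I64d\n", c1 - t1, c2 - t2);
        return 0;
    }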
We walk an array of 10000000 integers, multiplying each element by 13 (a load access followed by a store access, i.e. 2 accesses per element). Because the L1 line size is 32 bytes and an integer occupies 4 bytes, there are 8 elements per line, so about 1250000 cache lines are accessed (all misses). This totals 20000000 memory accesses and 1250000 L1 misses. The excess of 1495 misses and 10728 memory accesses in the results below is due to OS noise, the amount of which (<<1%) is quite acceptable.
    L1 misses = 1251495
    L1 accesses = 20010728
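The expected figures and the relative noise can be double-checked with a few lines of arithmetic. The sketch below is only a worked calculation; expected_counts, expect and noise_percent are hypothetical names, and it assumes 4-byte integers and a measured value that is not smaller than the expected one.

```cpp
#include <cstdint>

struct expected_counts {
    std::uint64_t accesses;  // loads plus stores
    std::uint64_t misses;    // one miss per cache line touched
};

// Expected counts for a single read-modify-write pass over an array.
expected_counts expect(std::uint64_t elements, std::uint64_t element_bytes,
                       std::uint64_t line_bytes)
{
    expected_counts e;
    e.accesses = 2 * elements;                         // one load and one store each
    e.misses   = elements * element_bytes / line_bytes;
    return e;
}

// Excess of the measured value over the expected one, in percent.
// Assumes measured >= expected.
double noise_percent(std::uint64_t measured, std::uint64_t expected)
{
    return 100.0 * static_cast<double>(measured - expected) /
           static_cast<double>(expected);
}
```

With 10000000 elements of 4 bytes and 32-byte lines this gives 20000000 accesses and 1250000 misses. The measured excess is then about 0.12% for the misses and about 0.05% for the accesses.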
The code of the example employs many techniques to reduce the noise during measurements. Here are the most important things you need to keep in mind when monitoring performance in this setting:
- Microsoft Windows NT / 2K / XP does not allocate all the memory your process requested instantly after the request. Rather pages are allocated when they are first accessed. This means that when you access a memory page for the first time, a page fault occurs and the OS takes over. The instructions executed by the OS exception handler can be millions, resulting in excessive noise in the measurements. For this reason the code above walks the array in advance to make sure all pages are present in memory when the counting starts.
- Because Microsoft Windows NT / 2K / XP is a preemptive multitasking operating system, our program is not the only thing running on the machine. Performance counters are in the CPU and they count for all processes simultaneously. In order to reduce foreign code noise, it is advisable to boost the priority of your process to maximum level (real-time priority). This setting will reserve the machine almost exclusively to your application and the overall responsiveness might seem jerky until the program terminates. The code above achieves the priority boost by the SetPriorityClass Windows system call:
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
- Last but not least, make sure you avoid obvious counting overlaps. An example would be to split the final printf statement in the example above into two different function calls. Note that the current counter value is read when the '-' operator is evaluated. Thus if you print the delta of the first counter (cache misses in this case) in a separate call to printf, the second counter (memory references in this case) will count the data accesses performed during that call as well.
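The last point can also be addressed by reading both counters back to back into local variables before any output is produced. The sketch below shows the pattern with a stand-in counter so that it runs without the driver: fake_counter, deltas and snapshot are hypothetical names, and the fake counter simply reports the current value of a variable.

```cpp
#include <cstdint>

// Hypothetical stand-in for a hardware counter: converting it to an integer
// reads the current value of the event tally it points to.
struct fake_counter {
    const std::uint64_t* source;
    operator std::uint64_t() const { return *source; }
};

struct deltas {
    std::uint64_t first;
    std::uint64_t second;
};

// Reads both counters back to back, before any printing takes place.
// t1 and t2 are the values saved when the measurement started.
deltas snapshot(const fake_counter& c1, std::uint64_t t1,
                const fake_counter& c2, std::uint64_t t2)
{
    deltas d;
    d.first  = static_cast<std::uint64_t>(c1) - t1;
    d.second = static_cast<std::uint64_t>(c2) - t2;
    return d;
}
```

Once the two deltas are stored, they can be printed in as many separate calls as needed, because events caused by the printing itself no longer affect the saved values.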
Future Work
Although this document seems quite long, it is more of a draft than something completed.
There are many (orthogonal) directions in which this work can be extended.
First priority is of course implementing ia32counter subclasses (like p6counter) for other processor families, like Intel Pentium 4, Intel Itanium and different models of AMD. I believe it is important to understand the specifics of Intel P4, as it is the first processor ever to provide precise event-based sampling performance monitoring. What this means is that one can get the processor state when an event (e.g. cache miss) occurs, so the exact instruction causing the miss is known. This can further improve the precision of research methods in this area.
Another direction is to extend the CPU detection procedure with empirical measurements that can detect the memory hierarchy in conventional software (a la HW1 cs612). As processors become more and more sophisticated from a hardware point of view, this task becomes harder and harder, but I believe it is still doable. This is a very important step if we want to build compilers that dynamically tune themselves to the current CPU (possibly a CPU that did not exist when the compiler was released!)
Last, I am not sure how important this is, but this document is way too long and needs better structure and probably some factoring. If the library grows bigger, better documentation will be needed, or it will be yet another one of those public domain things for which you need to read all the headers before starting to use them. I said this before, and I will repeat it again: If you ever plan to use this thing, please, please give feedback. Contributions are also more than welcome, but if you have an idea, I would suggest coordinating it with me, as there is a good chance it is already under way...
So far I am not worried if this piece of software is useful or not. For sure it is useful for me. I bet it would also be useful for cs612... I hope it is useful for you too. Good luck!
References
- Intel IA-32 Developer Manuals v.1 - 3, http://www.intel.com
- http://www.sandpile.org