Product Overview
Intel® Architecture Code Analyzer helps you statically analyze the data dependency, throughput and latency of code snippets on Intel® microarchitectures. The term kernel is used throughout the rest of this document instead of code snippet.
Features and Benefits
For a given binary, Intel® Architecture Code Analyzer:
- Performs static analysis of kernel throughput and latency under ideal front-end, out-of-order engine and memory hierarchy conditions.
- Identifies the binding of the kernel instructions to the processor ports.
- Identifies the kernel's critical path.
The Intel® Architecture Code Analyzer enables you to make a first-order estimate of relative kernel performance on different microarchitectures. It does not provide absolute performance numbers.
Intel® Architecture Code Analyzer is a command-line tool with ASCII output. It handles one or more kernels that are marked for analysis within an executable, a shared library, or an object file.
Throughput Analysis
The Throughput Analysis treats the kernel as a body of an infinite loop. It computes the kernel throughput and highlights its bottlenecks.
The Throughput Analysis report contains the following whole kernel information:
- Throughput of the analyzed kernel, counted in cycles.
- The kernel bottleneck: front-end, port #, divider unit or inter-iteration dependency.
- Total number of cycles each processor port was bound with micro-ops.
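To make the relationship between these whole-kernel figures concrete, here is a minimal sketch in C. It assumes a deliberately simplified model in which the cycles per iteration are set by whichever resource is busiest: the front-end, one of the ports, the divider unit, or an inter-iteration dependency chain. The types, field names, and the `estimate_throughput` function are hypothetical and only illustrate the idea; they are not the tool's actual algorithm or output format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-iteration summary of a kernel; all fields are illustrative. */
typedef struct {
    const double *port_cycles;       /* cycles each port is bound with micro-ops */
    size_t        num_ports;
    double        front_end_cycles;  /* cycles the front-end needs to deliver the micro-ops */
    double        divider_cycles;    /* cycles the divider unit is busy */
    double        dependency_cycles; /* longest inter-iteration dependency chain */
} kernel_summary;

typedef enum { BOUND_FRONT_END, BOUND_PORT, BOUND_DIVIDER, BOUND_DEPENDENCY } bound_kind;

typedef struct {
    double     cycles; /* estimated cycles per iteration */
    bound_kind kind;   /* which resource sets the bound */
    size_t     port;   /* meaningful only when kind == BOUND_PORT */
} throughput_estimate;

/* Simplified model (an assumption): the busiest resource determines the
   cycles per iteration. Ties keep the resource that was checked first. */
throughput_estimate estimate_throughput(const kernel_summary *k)
{
    assert(k != NULL && (k->num_ports == 0 || k->port_cycles != NULL));

    throughput_estimate est = { k->front_end_cycles, BOUND_FRONT_END, 0 };

    for (size_t p = 0; p < k->num_ports; ++p) {
        if (k->port_cycles[p] > est.cycles) {
            est.cycles = k->port_cycles[p];
            est.kind = BOUND_PORT;
            est.port = p;
        }
    }
    if (k->divider_cycles > est.cycles) {
        est.cycles = k->divider_cycles;
        est.kind = BOUND_DIVIDER;
    }
    if (k->dependency_cycles > est.cycles) {
        est.cycles = k->dependency_cycles;
        est.kind = BOUND_DEPENDENCY;
    }
    return est;
}
```

In this model, relieving pressure on the reported bottleneck only helps until the next-busiest resource becomes the limit, which is why the report lists the cycle count of every port and not just the worst one.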
The Throughput Analysis also provides the following information per instruction:
- Number of instruction micro-ops.
- Average number of cycles the instruction was bound to each processor port, per loop iteration.
- An indication whether the instruction is on the critical path of the analyzed kernel.
- Instruction disassembly in Intel® Software Developer’s Manual (MASM) style.
Latency Analysis
The Latency Analysis treats the kernel as a sequence of instructions that is executed once. It computes the latency of the kernel execution from its first to its last instruction and identifies all resource conflicts within the kernel.
The Latency Analysis report contains the following information:
- Latency of the analyzed kernel, counted in cycles.
- Instructions that were delayed due to resource conflicts.
- The instructions on the critical path (CP) resulting from data dependencies and resource conflicts.
- Total resource conflict delay for each execution unit.
- Dependencies between instructions due to resource conflicts.
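The data-dependency part of this analysis can be pictured as a longest-path computation over the kernel's instructions. The sketch below is a minimal illustration in C under stated assumptions: each instruction has a fixed latency, depends only on earlier instructions, and never waits for an execution unit, so resource conflicts are ignored. The `instr` type, the `kernel_latency` function, and any latency values fed to it are hypothetical, not the tool's data model.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 3

/* Hypothetical instruction record; illustrative only. */
typedef struct {
    int    latency;         /* assumed latency in cycles */
    size_t num_deps;        /* number of valid entries in deps[] */
    size_t deps[MAX_DEPS];  /* indices of earlier instructions whose results are read */
} instr;

/* Returns the kernel latency in cycles and stores in finish[i] the cycle at
   which instruction i completes. Only data dependencies are modelled. */
int kernel_latency(const instr *code, size_t n, int *finish)
{
    int total = 0;

    for (size_t i = 0; i < n; ++i) {
        int start = 0;

        assert(code[i].num_deps <= MAX_DEPS);
        for (size_t d = 0; d < code[i].num_deps; ++d) {
            size_t src = code[i].deps[d];
            assert(src < i); /* dependencies must point backwards in program order */
            if (finish[src] > start) {
                start = finish[src]; /* wait for the slowest producer */
            }
        }
        finish[i] = start + code[i].latency;
        if (finish[i] > total) {
            total = finish[i];
        }
    }
    return total;
}
```

The instructions whose completion times determine `total` form the critical path in this model. A fuller model would also delay an instruction's start when its execution unit is busy, which is where the resource-conflict entries of the report come from.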
Technical Requirements
Intel® Architecture Code Analyzer is a command-line utility that can analyze a kernel, contained in a binary file, that is delimited with special markers. The tool is capable of analyzing both IA-32 and Intel® 64 code, including Intel® AVX and AVX2 instructions.
Intel® Architecture Code Analyzer is available on Windows*, Linux*, and Mac OS X* operating systems. Both IA-32 and Intel® 64 operating systems are supported. Intel® 64 code can be analyzed on IA-32 operating systems and vice versa.
Release Notes for 2.1
- Added support for Intel® microarchitecture codenamed Haswell.
- Added support for MSVS64 compiler.
- Added 64-bit binaries.
Release Notes for 2.0.1
- Fixed a bug where the -graph option failed to produce a graph file.
Release Notes for 2.0
- Added support for Intel® microarchitecture codenamed Sandy Bridge. This replaces the Intel® AVX microarchitecture previously in Intel® Architecture Code Analyzer.
- Added support for Intel® microarchitecture codenamed Ivy Bridge.
- Added support for Mac OS X.
- Improved analyzer algorithm for throughput analysis (new analysis output, see more details in the User Manual).
- Improved analyzer algorithm for latency analysis; output also includes microarchitecture events that will affect the latency (new analysis output, see more details in the User Manual).
- Added support for graphic output of the dependency graph.
Release Notes for 1.1.3
- Fixed a bug where using the -o option produced truncated output.
- Fixed IACA_UD_BYTES definition in iacaMarks.h to include {}.
Release Notes for 1.1.2
- Intel® Architecture Code Analyzer now supports adding START and END marks in code compiled with the Visual C++ compiler (64-bit). See iacaMarks.h.
- Intel® Architecture Code Analyzer now supports multiple block analysis. You can direct the tool to analyze the n'th block that is delimited with analyzer marks. When used with n=0, all surrounded blocks in the file are analyzed and the output contains separate reports per block.
Release Notes for 1.1.1
- Fixed incorrect identification of Intel® AVX zero-idiom instructions.
- Fixed a crash on empty code blocks (blocks containing only zero-idiom instructions or unsupported instructions).
- Fixed the analyzer's arch nehalem option to treat AES and PCLMUL instructions as illegal. These instructions are not supported on the Intel® microarchitecture codenamed Nehalem.
- Changed analyzer marks to abort if the binary is executed. To deactivate the marks when building for execution, #define IACA_MARKS_OFF or pass the -DIACA_MARKS_OFF option on the compiler command line. Binaries with active marks should be used for analysis only.
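As an illustration of how a marked kernel might look in C, the sketch below wraps a small loop in analyzer marks. The macro names `IACA_START` and `IACA_END`, and their placement at the top of the loop body and directly after the loop, are recalled from iacaMarks.h and its documentation and should be checked against the header shipped with your release. `HAVE_IACA_MARKS_H` is a hypothetical switch introduced here so that the example also compiles without the header; in that case the marks expand to nothing and no markers are emitted.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_IACA_MARKS_H is a hypothetical macro for this sketch: define it when
   iacaMarks.h is on the include path. Otherwise empty stand-ins are used. */
#ifdef HAVE_IACA_MARKS_H
#  include "iacaMarks.h"
#else
#  define IACA_START
#  define IACA_END
#endif

/* Kernel to analyze: y[i] += a * x[i] for i in [0, n). */
void saxpy(float a, const float *x, float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; ++i) {
        IACA_START /* start mark at the top of the loop body (assumed placement) */
        y[i] += a * x[i];
    }
    IACA_END /* end mark directly after the loop (assumed placement) */
}
```

With the real header included, build once with the marks active and hand that binary to the analyzer only. For a binary you intend to run, add -DIACA_MARKS_OFF to the compiler command line so the marks are deactivated.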
Release Notes for 1.1
- Intel® Architecture Code Analyzer is now hosted on Linux* operating systems, in addition to Windows* operating systems. Both IA-32 and Intel® 64 operating systems are supported.
- Intel® Architecture Code Analyzer now supports two existing Intel® microarchitectures, codenamed Nehalem and Westmere.
- Two critical path types are detected:
  - DATA_DEPENDENCY critical path (similar to previous releases; reflects instruction data dependencies only)
  - PERFORMANCE critical path (new; reflects port conflicts and front-end pressure as well)
Release Notes for 1.0.2
- The analyzer now ignores the pop ebx / push ebx instructions that the Intel® Architecture Code Analyzer markers add to IA-32 code.
- Fixed misclassification of rcp / rsqrt as divider operations.
Release Notes for 1.0.1
- Graceful handling of unsupported instructions: they are quietly ignored in the analysis of the block and do not affect the throughput and latency calculations.
- A few previously unsupported instructions are now supported, for example the CMOV instruction family.
- Detection of switches from Intel® AVX code to Intel® SSE code. The performance penalty associated with such a switch is noted but not accounted for.