### Following the Guide--An Odd Problem with Linking MKL

Some days ago, I asked a question on Stack Overflow about linking MKL.

Surprisingly, nobody else seems to have encountered a similar problem, so I am writing up the scenario in this blog.

Here's the link.

## How Do I Discover the Problem?

In blitz, my deep learning framework, I have to mix OpenMP, BLAS, and CUDA. However, my icc is so new that the current nvcc 7.5 does not support it, so I use g++, MKL, and nvcc instead.

I linked against the *single dynamic library*, which replaces many linking options with a single one, using the following flags: `-fopenmp -lmkl_rt`

But my simple MNIST model could not produce a correct result! After a long period of debugging, I found that multi-threaded sgemm did not behave as expected. It really shocked me, because I never imagined that a product from a famous vendor could be wrong.

## Following the Guide

After reading many documents, I found that with the single dynamic library, MKL does not use GNU threading internally:

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/657060

Hence, some problems might occur because I used g++. But where exactly does the problem lie? I could not read the source code of MKL to find out.

Thankfully, Intel provides a linking guide, the Link Line Advisor.

The problem could be solved by configuring the following options rather than relying on the single dynamic library as is:

`export MKL_INTERFACE_LAYER=GNU` plus `export MKL_THREADING_LAYER=GNU`

`-lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lpthread -ldl`
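To make the explicit link line above concrete, here is a minimal sketch of how it might be assembled into a build command. It is an illustration under assumptions, not my actual build script: `sgemm_test.cpp` is a hypothetical source file, and the fallback path `/opt/intel/mkl` and the `lib/intel64` subdirectory are guesses about the installation layout that you should adapt to your own `MKLROOT`.

```shell
#!/bin/sh
# Assumption: MKLROOT points at the MKL installation.
# /opt/intel/mkl is only a fallback guess for illustration.
MKLROOT="${MKLROOT:-/opt/intel/mkl}"

# Explicit layered linking instead of -lmkl_rt:
# LP64 interface layer, GNU OpenMP threading layer, and the core library.
MKL_LIBS="-lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lpthread -ldl"

# -fopenmp makes g++ link its own OpenMP runtime (libgomp),
# which is the runtime the GNU threading layer is built for.
# sgemm_test.cpp is a hypothetical file name.
BUILD_CMD="g++ -fopenmp -I${MKLROOT}/include sgemm_test.cpp -o sgemm_test -L${MKLROOT}/lib/intel64 ${MKL_LIBS}"

# Print the command for inspection; run it with: eval "$BUILD_CMD"
echo "$BUILD_CMD"
```

After building, running `ldd ./sgemm_test` and looking for `libgomp` and `libmkl_gnu_thread` in the output can help confirm which threading libraries the binary was linked against.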

## More Problems

With the g++ compiler, I found that a convolution layer following a pooling layer runs much slower than when compiled with icc. This is another mysterious problem for me. Fortunately, my boss does not force me to solve it; he only wants one of the compilers to work efficiently for each backend, whether GPU or CPU.