To overcome the drawback of traditional visual saliency detection methods, which rely solely on either the information in the currently viewed image or prior knowledge, this paper proposes an information-theoretic algorithm that combines long-term features, which encode prior knowledge, with short-term features, which encode the information in the currently viewed image. First, a long-term sparse dictionary and a short-term sparse dictionary are trained on eye-tracking data and on the currently viewed image, respectively; the corresponding sparse codes serve as the long-term and short-term features. Second, to mitigate a limitation of existing methods, which derive features over the entire image or over a local neighborhood of fixed size, an entropy-based method for estimating the probability distribution of the features is proposed. This method adaptively infers an optimal region size from the characteristics of the currently viewed image, and the probabilities of occurrence of the long-term and short-term features are computed over that region. Finally, the saliency map is formulated using Shannon self-information. Subjective and quantitative comparisons with 8 state-of-the-art methods on publicly available eye-tracking databases demonstrate the effectiveness of the proposed method.
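The final step, saliency as Shannon self-information, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single scalar feature per pixel with values in [0, 1], estimates the feature probability with a plain histogram, and uses the whole map as the region instead of the adaptively chosen one. The names `self_information_saliency`, `n_bins`, and `value_range` are hypothetical.

```python
import numpy as np


def self_information_saliency(feature_map, n_bins=8, value_range=(0.0, 1.0)):
    """Return -log2 p(f) per pixel, with p(f) estimated by a histogram.

    Assumptions (not from the paper): one scalar feature per pixel, all
    values inside value_range, and the whole map used as the region.
    """
    feature_map = np.asarray(feature_map, dtype=float)

    # Histogram of feature values over the region (np.histogram flattens).
    counts, edges = np.histogram(feature_map, bins=n_bins, range=value_range)
    probs = counts / counts.sum()

    # Bin index of every pixel; the interior edges give indices 0..n_bins-1.
    bin_idx = np.digitize(feature_map, edges[1:-1])

    # Rare feature values have low probability and therefore high saliency.
    return -np.log2(probs[bin_idx])


# Toy example: a uniform map with a single outlier pixel.
toy = np.zeros((8, 8))
toy[3, 4] = 1.0
saliency = self_information_saliency(toy)
peak = np.unravel_index(np.argmax(saliency), saliency.shape)
print(peak)
```

In this toy case the outlier pixel is the only one with a rare feature value, so it receives the largest self-information. Restricting the histogram to a region around each pixel would make the estimate local; how that region size is chosen is the subject of the entropy-based step and is not reproduced here.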