Professor (retired, living in Canada)
Liaoning Engineering And Technology University
The factor space is also the feature space of instances or true classes. We use the term “factor” because observed factors, rather than arbitrary properties, serve as features. In detection, estimation, and classification, either a signal X is mixed with noise to become Z, or a factor Z of class X is observed. We need a classifier Y = f(Z) that predicts X by Y. Shannon and others use minimum average distortion rather than Maximum Mutual Information (MMI) as the optimization criterion because the MMI problem is very hard to solve: without f(Z), we cannot express the mutual information I(X;Y), and without that expression, we cannot optimize f(Z). An expedient method is first to construct likelihood functions or Shannon channels with parameters, and then to search the parameter space by gradient descent or Newton's method.

According to the semantic information G theory, a simple iterative algorithm can resolve this problem. The semantic information measure is $I_{ij}=\log[T(\theta_j|x_i)/T(\theta_j)]$, where $T(\theta_j|x_i)$ is a truth function and $T(\theta_j)$ is its average. The average G of $I_{ij}$ over $P(x_i, y_j)$ is the Semantic Mutual Information (SMI). For a given f(Z), there is also the Shannon information $I^*_{ij}=\log[P(y_j|x_i)/P(y_j)]$. In this algorithm, Step I sets $I_{ij}=I^*_{ij}$, and Step II optimizes f(Z) to maximize G by the KL formula. Our experiments show that, in most cases, 2-3 iterations make I(X;Y) reach 99% of the MMI.

Convergence can be proved with the R(G) function, an improved version of the rate-distortion function R(D). The R(G) function is a bowl-like curve with a matching point at which G = R; R ≥ G always holds. Step I makes G = R and produces a new R(G) curve with a higher matching point. Step II makes G climb to the upper-right end of the new R(G) curve. Repeating the two steps achieves the MMI.

Reference: https://arxiv.org/a/lu_c_3.html
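Why Step I makes the two information measures coincide can be checked in a few lines. The derivation below assumes (our assumption; the normalization is not stated above) that Step I sets each truth function proportional to the current channel. The average $T(\theta_j)$ then becomes proportional to $P(y_j)$, so $I_{ij}$ collapses to $I^*_{ij}$ and G equals I(X;Y). The last line is our reading of the Step II classifier "by the KL formula", not a formula quoted from the source:

$$
\begin{aligned}
G &= \sum_i \sum_j P(x_i, y_j)\, I_{ij}, \qquad
T(\theta_j) = \sum_i P(x_i)\, T(\theta_j \mid x_i),\\
T(\theta_j \mid x_i) &= \frac{P(y_j \mid x_i)}{\max_{i'} P(y_j \mid x_{i'})}
\;\Longrightarrow\;
T(\theta_j) = \frac{P(y_j)}{\max_{i'} P(y_j \mid x_{i'})}
\;\Longrightarrow\;
I_{ij} = \log\frac{P(y_j \mid x_i)}{P(y_j)} = I^*_{ij},\\
\text{after Step I: } G &= \sum_i \sum_j P(x_i, y_j) \log\frac{P(y_j \mid x_i)}{P(y_j)} = I(X;Y),\\
\text{Step II: } f(z) &= \arg\max_{j} \sum_i P(x_i \mid z)\, I_{ij}.
\end{aligned}
$$

Because Step II maximizes G with the $I_{ij}$ held fixed, and I(X;Y) is never below G for the resulting channel, I(X;Y) cannot decrease from one round to the next, which agrees with the R ≥ G argument.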
Fig. 1. Maximum Mutual Information (MMI) classification of a factor space. The initial partition of the Z-space is made by two vertical lines. After two iterations, the partition is made by three curves, and the mutual information reaches 99.99% of the MMI.
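The two-step iteration that produces partitions like those in Fig. 1 can be sketched in plain Python. This is a minimal sketch under several assumptions. The Z-space is finite, for example a discretized grid of cells. The priors P(x_i) are strictly positive, the likelihoods P(z|x_i) are known, and there is one label y_j per class. Step I uses the truth-function normalization T(θj|xi) = P(yj|xi)/max P(yj|xi). Step II assigns each cell z the label with the largest Σi P(xi|z) Iij. Both of these are our readings, not details given in the source. All function names and the toy data are hypothetical. A brute-force search over every classifier supplies the true MMI for comparison and is feasible only for very small Z.

```python
import math
from itertools import product

TINY = 1e-300  # floor that keeps log2() finite when a truth value is 0


def channel(lik, f, k):
    """Shannon channel P(y_j|x_i) induced by classifier f (cell z -> label f[z])."""
    return [[sum(row[z] for z in range(len(f)) if f[z] == j) for j in range(k)]
            for row in lik]


def mutual_information(prior, ch):
    """I(X;Y) in bits for prior P(x_i) and channel P(y_j|x_i)."""
    n, k = len(prior), len(ch[0])
    py = [sum(prior[i] * ch[i][j] for i in range(n)) for j in range(k)]
    return sum(prior[i] * ch[i][j] * math.log2(ch[i][j] / py[j])
               for i in range(n) for j in range(k) if ch[i][j] > 0)


def step_one(prior, ch):
    """Step I: match truth functions to the channel so that I_ij = I*_ij (bits)."""
    n, k = len(prior), len(ch[0])
    info = [[0.0] * k for _ in range(n)]
    for j in range(k):
        top = max(ch[i][j] for i in range(n))
        # Assumed normalization T(theta_j|x_i) = P(y_j|x_i) / max_i P(y_j|x_i);
        # an unused label gets the tautology T = 1, i.e. zero information.
        truth = [ch[i][j] / top if top > 0 else 1.0 for i in range(n)]
        t_avg = sum(prior[i] * truth[i] for i in range(n))  # T(theta_j)
        for i in range(n):
            info[i][j] = math.log2(max(truth[i], TINY) / t_avg)
    return info


def step_two(prior, lik, info):
    """Step II (assumed reading): label each cell z with the y_j that maximizes
    sum_i P(x_i|z) I_ij, which maximizes G for the fixed I_ij."""
    n, k = len(prior), len(info[0])
    f = []
    for z in range(len(lik[0])):
        # P(x_i|z) is proportional to P(x_i) P(z|x_i); dropping P(z) keeps the argmax.
        w = [prior[i] * lik[i][z] for i in range(n)]
        scores = [sum(w[i] * info[i][j] for i in range(n)) for j in range(k)]
        f.append(max(range(k), key=scores.__getitem__))
    return f


def mmi_iterate(prior, lik, f0, k, rounds=5):
    """Alternate Step I and Step II; return the final f and I(X;Y) after each round."""
    f = list(f0)
    history = [mutual_information(prior, channel(lik, f, k))]
    for _ in range(rounds):
        f = step_two(prior, lik, step_one(prior, channel(lik, f, k)))
        history.append(mutual_information(prior, channel(lik, f, k)))
    return f, history


def brute_force_mmi(prior, lik, k):
    """Exact MMI by trying all k**m classifiers; only feasible for tiny Z."""
    return max(mutual_information(prior, channel(lik, f, k))
               for f in product(range(k), repeat=len(lik[0])))


if __name__ == "__main__":
    # Hypothetical toy data: three classes, Z discretized into six cells.
    prior = [0.5, 0.3, 0.2]
    lik = [[0.40, 0.30, 0.15, 0.10, 0.05, 0.00],
           [0.05, 0.15, 0.30, 0.30, 0.15, 0.05],
           [0.00, 0.05, 0.10, 0.15, 0.30, 0.40]]
    f, history = mmi_iterate(prior, lik, f0=[0, 0, 0, 1, 1, 2], k=3)
    best = brute_force_mmi(prior, lik, k=3)
    print("final partition:", f)
    print("I(X;Y) per round (bits):", [round(h, 4) for h in history])
    print("fraction of MMI:", round(history[-1] / best, 4))
```

The printed history should be non-decreasing and bounded by the brute-force MMI. Ties in Step II can stall the sketch. For instance, if the initial partition puts every cell under one label, all Iij are 0 and nothing changes, so a start with several nonempty labels (like the two vertical lines in Fig. 1) is advisable.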