**Subjective Information Measure and Rate Fidelity Theory**

Chen-Guang Lu

Independent Researcher

email: survival99@hotmail.com

*Abstract***--With the help of a fish-covering model, this paper intuitively explains how to extend Hartley's information formula, step by step, into a generalized information formula for measuring subjective information: metrical information (such as that conveyed by thermometers), sensory information (such as that conveyed by color vision), and semantic information (such as that conveyed by weather forecasts). The pivotal step is to distinguish the conditional probability of a message from its logical conditional probability. The paper illustrates the rationality of the formula and discusses the coherence of the generalized information formula with Popper's theory of knowledge evolution. For optimizing data compression, the paper discusses the rate-of-limiting-errors function and its similarity to the complexity-distortion function based on Kolmogorov's complexity theory, and improves the rate-distortion theory into the rate-fidelity theory by replacing Shannon's distortion with subjective mutual information. It is proved that both the rate-distortion function and the rate-fidelity function are equivalent to a rate-of-limiting-errors function with a group of fuzzy sets as the limiting condition, and can be expressed by a formula of generalized mutual information for lossy coding, or by a formula of generalized entropy for lossless coding. By analyzing the rate-fidelity function related to visual discrimination and the quantizing bits of pixels of images, the paper concludes that subjective information is less than or equal to objective (Shannon's) information; that there is an optimal matching point at which the two kinds of information are equal; that the matching information increases as visual discrimination (defined by a confusing probability) rises; and that, for given visual discrimination, too high a resolution of images, or too much objective information, is wasteful.**

*Index Terms***--Shannon’s
theory, generalized information theory, subjective information, metrical
information, sensory information, semantic information, Popper’s theory,
complexity-distortion, rate-distortion, rate-fidelity.**

To measure sensory information and semantic information, I set up a generalized information theory thirteen years ago [4-8] and published a monograph focusing on this theory in 1993 [5]. However, my research is still rarely known to English-speaking researchers of information theory. Recently, I read some papers on complexity distortion theory [2], [9] based on Kolmogorov's complexity theory. I found that, actually, I had already discussed the complexity-distortion function, proved that the generalized entropy in my theory is just such a function, and concluded that the complexity-distortion function with size-unequal fuzzy error-limiting balls can be expressed by a formula of generalized mutual information. I also found that some researchers have made efforts [9] similar to mine to improve Shannon's rate-distortion theory.

This paper first explains how to extend Hartley’s
information formula to the generalized information formula, and then discusses the generalized mutual
information and some questions related to Popper’s theory, complexity
distortion theory, and rate-distortion theory.

Hartley’s information formula is [3]

*I* = log *N*, (1)

where *I* denotes the information
conveyed by the occurrence of one of
*N* events with
equal probability. If a message *y* tells that uncertain extension changes
from *N*_{1 }to *N*_{2},
then information conveyed by *y* is

*I _{r}* = log(*N*_{1}/*N*_{2}). (2)
We call (2) the relative information formula. Before discussing its properties, let me tell a story about covering fish with fish covers.

Figure 1. Fish-covering model for relative information*
I _{r}*

Fish covers are made of bamboo. A fish cover looks like a hemisphere with a round hole at the top through which a human hand can reach to catch the fish. Fish covers are suitable for catching fish in shallow ponds. When I was a teenager, after seeing peasants catch fish with fish covers, I found a basket with a hole in its bottom and followed those peasants to catch fish. Fortunately, I successfully caught some fish. Comparing my basket with the much bigger fish covers, I concluded: the bigger a fish cover is, the easier it is to cover a fish, yet the more difficult it is to catch the fish by hand; if a fish cover is big enough to cover the whole pond, it is certain to cover the fish, yet it is useless, because catching the fish by hand is as difficult as before; when one uses the basket or a smaller fish cover, though covering the fish is more difficult, catching it by hand is much easier.

An uncertain event is like a fish at a random position in a pond. Let a sentence be *y* = “The fish is covered”; *y* will convey information about the position of the fish. Let *N*_{1} be the area of the pond and *N*_{2} be the area covered by the fish cover; then the information conveyed by *y* is *I _{r}* = log(*N*_{1}/*N*_{2}).
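As a numerical illustration of the relative information formula (2), here is a minimal Python sketch; the pond and cover areas are assumed, illustrative numbers:

```python
import math

def relative_information(n1: float, n2: float) -> float:
    """Relative information (2): I_r = log2(N1/N2) in bits, where a message
    shrinks the uncertain extension from N1 to N2."""
    return math.log2(n1 / n2)

# Assumed, illustrative numbers: a 100 m^2 pond and a 0.5 m^2 fish cover.
print(relative_information(100, 0.5))   # log2(200) ≈ 7.64 bits
```

A cover as big as the pond gives log(1) = 0 bits, matching the story: covering is certain but conveys nothing.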

Hartley’s information formula requires *N*
events with equal probability *P*=1/*N*. Yet, the probabilities of
events are unequal in general. For example, the fish stays in deep water in
bigger probability and in shallow water in smaller probability. In these cases,
we need to replace 1/*N* with probability *P* so that

*I*=log(1/*P*)
(3)

and

*I _{r}* = log(*P*_{2}/*P*_{1}), (4)

where *P*_{1} and *P*_{2} are the probabilities before and after the message *y*.

** **Let *X* denote the random variable taking values from the set *A* = {*x*_{1}, *x*_{2}, …} of events, and *Y* denote the random variable taking values from the set *B* = {*y*_{1}, *y*_{2}, …} of sentences or messages. For each *y _{j}*, there is a subset *A _{j}* ⊂ *A*, and *y _{j}* means “*X* is in *A _{j}*”, so that the information conveyed by *y _{j}* about *x _{i}* is

*I*(*x _{i}*; *y _{j}*) = log[*P*(*x _{i}*|*A _{j}*)/*P*(*x _{i}*)]. (5)

For convenience, we call this formula the fish-covering information formula.

Note that, most importantly, in general *P*(*x _{i}*|*A _{j}*) ≠ *P*(*x _{i}*|*y _{j}*). Here

*P*(*x _{i}*|*A _{j}*) = *P*(*x _{i}*|*X* ∈ *A _{j}*)

is a logical conditional probability; yet

*P*(*x _{i}*|*y _{j}*) = *P*(*x _{i}*|*Y* = *y _{j}*),

where *y _{j}* may be an incorrect reading datum, a wrong message, or a lie, is the conditional probability in the classical information formula

*I*(*x _{i}*; *y _{j}*) = log[*P*(*x _{i}*|*y _{j}*)/*P*(*x _{i}*)], (6)

whose average is just the Shannon mutual information [11].

Let the feature function of the set *A _{j}* be *Q*(*A _{j}*|*x _{i}*) ∈ {0, 1}, which equals 1 if *x _{i}* ∈ *A _{j}* and 0 otherwise. By the Bayesian formula,

*P*(*x _{i}*|*A _{j}*) = *P*(*x _{i}*)*Q*(*A _{j}*|*x _{i}*)/*Q*(*A _{j}*), (7)

where *Q*(*A _{j}*) = ∑_{i} *P*(*x _{i}*)*Q*(*A _{j}*|*x _{i}*) is the logical probability of *A _{j}*. From (5) and (7), we have

*I*(*x _{i}*; *y _{j}*) = log[*Q*(*A _{j}*|*x _{i}*)/*Q*(*A _{j}*)], (8)

which (illustrated by Figure 2) is the transition from the classical information formula to the generalized information formula.

Figure 2. Illustration of fish-covering information
formula related to Bayesian formula
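As a sanity check of the fish-covering formula and its Bayesian transition, here is a Python sketch; the positions, prior probabilities, and the covered set are all assumed toy numbers:

```python
import math

# Assumed toy example: four fish positions with unequal prior probabilities;
# the fish cover covers the clear set A_j = {2, 3}.
P = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}
A_j = {2, 3}
Q_feature = {x: 1.0 if x in A_j else 0.0 for x in P}   # feature function of A_j

Q_Aj = sum(P[x] * Q_feature[x] for x in P)             # logical probability Q(A_j)

def info_feature(x):
    """Information via the feature function (8): log2[Q(A_j|x)/Q(A_j)]."""
    return math.log2(Q_feature[x] / Q_Aj)

def info_bayes(x):
    """The same amount via conditional probabilities (5): log2[P(x|A_j)/P(x)]."""
    return math.log2((P[x] / Q_Aj) / P[x])             # P(x|A_j) = P(x)/Q(A_j) inside A_j

# Both routes agree, illustrating the Bayesian transition from (5) to (8).
print(info_feature(2), info_bayes(2))   # 1.0 1.0  (since Q(A_j) = 0.5)
```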

The reading datum of a thermometer may be considered as a reporting sentence *y _{j}* ∈ *B* meaning “*X* is in *A _{j}*”, where *A _{j}* is an interval around the reported value *x _{j}*.

The information conveyed by a reading datum of a thermometer and the information conveyed by a forecast “The rainfall will be about 10 mm” are the same in essence. Using a clear set as the condition, as above, is not good enough, because the information amount should change continuously with *x _{i}*. We wish that the bigger the error (i.e., the farther *x _{i}* is from *x _{j}*), the less the information.

Now we consider *y _{j}* as the sentence “*X* is about *x _{j}*”, whose condition is a fuzzy set *A _{j}* with the membership function *Q*(*A _{j}*|*x _{i}*).

Actually, the confusing probability *Q*(*A _{j}*|*x _{i}*) can be understood in several equivalent ways:

*Q*(*A _{j}*|*x _{i}*)

= the confusing probability, or similarity, of *x _{i}* with *x _{j}*;

= the membership grade of *x _{i}* in the fuzzy set *A _{j}*;

= the logical probability, or creditability, of the proposition *y _{j}*(*x _{i}*).

The discrimination of human sense organs, such as visual discrimination for the gray levels of pixels of images, can also be described by confusing probability functions. In these cases, a sensation can be considered as a reading datum *y _{j}* = “*X* is about *x _{j}*”.

Figure 3. Confusing probability function from clear sets

First, we do experiments many times to get clear sets *s _{jk}*, *k* = 1, 2, …, *n*; then the confusing probability function is obtained as the average of their feature functions (the falling shadow of random sets [12]):

*Q*(*A _{j}*|*x _{i}*) = (1/*n*) ∑_{k} *Q*(*s _{jk}*|*x _{i}*). (9)

Now, replacing a clear set with a fuzzy set as the condition, we get the generalized information formula:

*I*(*x _{i}*; *y _{j}*) = log[*Q*(*A _{j}*|*x _{i}*)/*Q*(*A _{j}*)], *Q*(*A _{j}*) = ∑_{i} *P*(*x _{i}*)*Q*(*A _{j}*|*x _{i}*). (10)

It looks the same as the fish-covering information formula (8), but *Q*(*A _{j}*|*x _{i}*) now takes values between 0 and 1 rather than only 0 or 1.

Figure 4. Generalized information formula for measuring metrical information, sensory information, and number-forecasting information

Figure 4 tells us that when a reading datum or a sensation *y _{j}* = “*X* is about *x _{j}*” occurs, the smaller the error between *x _{i}* and *x _{j}*, the more the information; when the error is big enough, the information becomes negative.
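To see how the generalized information formula (10) behaves with a fuzzy condition, here is a Python sketch; the uniform source, the Gaussian-shaped confusing probability, and all parameter values are assumptions for illustration:

```python
import math

# Assumed setup: 64 gray levels, uniform prior, Gaussian-shaped confusing
# probability with discrimination parameter d; y_j says "X is about 30".
xs = range(64)
P = [1 / 64] * 64
d, xj = 3.0, 30

Q_cond = [math.exp(-(x - xj) ** 2 / (2 * d * d)) for x in xs]   # Q(A_j|x_i)
Q_Aj = sum(p * q for p, q in zip(P, Q_cond))                    # logical probability Q(A_j)

def gen_info(i):
    """Generalized information formula (10): log2[Q(A_j|x_i)/Q(A_j)]."""
    return math.log2(Q_cond[i] / Q_Aj)

# Information is largest when the fact matches the reading, shrinks as the
# error grows, and turns negative for large errors.
print(gen_info(30), gen_info(33), gen_info(45))
```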

The generalized information formula can also be used to measure semantic information in general, such as the information from the weather forecast “Tomorrow will be rainy or heavy rainy”. We may assume that for any proposition *y _{j}* there is a Plato's idea *x _{j}*, so that the confusing probability function *Q*(*A _{j}*|*x _{i}*) describes how well a fact *x _{i}* accords with *y _{j}*.

From my viewpoint, forecasting information is more general than descriptive information. If a forecast is always correct, then the forecasting information becomes descriptive information.

About the criterion of the advance of scientific theories, the philosopher Karl Popper wrote:

*“The criterion of relative potential satisfactoriness… characterizes as preferable the theory which tells us more; that is to say, the theory which contains the greater amount of empirical information or content; which is logically stronger; which has the greater explanatory and predictive power; and which can therefore be more severely tested by comparing predicted facts with observations. In short, we prefer an interesting, daring, and highly informative theory to a dull one.”* ([10], p. 250)

Clearly, Popper used information as the criterion to evaluate the advance of scientific theories. According to Popper's theory, the more easily a proposition is falsified logically and the more it can survive tests by facts (in my words, the less the prior logical probability *Q*(*A _{j}*) is, and the bigger the posterior logical probability *Q*(*A _{j}*|*x _{i}*) is), the more information the proposition conveys. This conclusion accords with the generalized information formula (10).
Calculating the average of *I*(*x _{i}*; *y _{j}*) over different *x _{i}* for a given *y _{j}*, we have the Kullback information formula

*I*(*X*; *y _{j}*) = ∑_{i} *P*(*x _{i}*|*y _{j}*) log[*P*(*x _{i}*|*y _{j}*)/*P*(*x _{i}*)]. (11)

Actually, the probabilities inside the log should be prior probabilities or logical probabilities, while the probability outside the log should be the posterior probability. Since we now differentiate the two kinds of probabilities, we use *Q*(.) for the probabilities inside the log. Hence the above formula becomes the generalized Kullback formula

*I*(*X*; *y _{j}*) = ∑_{i} *P*(*x _{i}*|*y _{j}*) log[*Q*(*x _{i}*|*A _{j}*)/*Q*(*x _{i}*)]. (12)

We can prove that when *Q*(*X*|*A _{j}*) = *P*(*X*|*y _{j}*) and *Q*(*X*) = *P*(*X*), (12) reaches its maximum, which is just (11).
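The matching property of the generalized Kullback formula (12) can be checked numerically; in this Python sketch the prior and posterior distributions are assumed toy numbers:

```python
import math

P_prior = [0.5, 0.3, 0.2]          # P(X), assumed
P_post = [0.1, 0.3, 0.6]           # P(X|y_j), assumed

def gen_kl(Q_post, Q_prior=P_prior):
    """Generalized Kullback formula (12): sum_i P(x_i|y_j) log2[Q(x_i|A_j)/Q(x_i)],
    here with Q(X) taken equal to P(X)."""
    return sum(p * math.log2(qp / q)
               for p, qp, q in zip(P_post, Q_post, Q_prior))

matched = gen_kl(P_post)                 # Q(X|A_j) = P(X|y_j): the Kullback information (11)
mismatched = gen_kl([0.3, 0.4, 0.3])     # some other assumed logical posterior
print(matched, mismatched)               # the matched value is the larger one
```

The gap between the two values is exactly the Kullback-Leibler divergence between *P*(*X*|*y _{j}*) and the mismatched *Q*(*X*|*A _{j}*), which is never negative.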

Further, we have the generalized mutual information formula

*I*(*X*; *Y*) = ∑_{j} ∑_{i} *P*(*x _{i}*, *y _{j}*) log[*Q*(*x _{i}*|*A _{j}*)/*Q*(*x _{i}*)] = *H*(*X*) − *H*(*X*|*Y*) = *H*(*Y*) − *H*(*Y*|*X*), (13)

where

*H*(*X*) = −∑_{i} *P*(*x _{i}*) log *Q*(*x _{i}*), (14)

*H*(*X*|*Y*) = −∑_{j} ∑_{i} *P*(*x _{i}*, *y _{j}*) log *Q*(*x _{i}*|*A _{j}*), (15)

*H*(*Y*) = −∑_{j} *P*(*y _{j}*) log *Q*(*A _{j}*), (16)

*H*(*Y*|*X*) = −∑_{i} ∑_{j} *P*(*x _{i}*, *y _{j}*) log *Q*(*A _{j}*|*x _{i}*). (17)

I call *H*(*X*) the forecasting entropy, which reflects the average coding length when we economically encode *X* according to *Q*(*X*) while the real source is *P*(*X*); it reaches its minimum when *Q*(*X*) = *P*(*X*). I call *H*(*X*|*Y*) the posterior forecasting entropy, *H*(*Y*) the generalized entropy, and *H*(*Y*|*X*) the generalized conditional entropy or fuzzy entropy [6].

I think the generalized information is subjective information, whereas Shannon's information is objective information. Suppose two weather forecasters always provide opposite forecasts, one always correct and the other always incorrect. They convey the same objective information, but different subjective information. If *Q*(*X*) = *P*(*X*) and *Q*(*X*|*A _{j}*) = *P*(*X*|*y _{j}*) for all *j*, the subjective information reaches its maximum, the objective (Shannon) information.
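The claim that subjective information does not exceed objective information (when *Q*(*X*) = *P*(*X*)) can be checked with a toy channel; all distributions and membership values below are assumptions for illustration:

```python
import math

# Assumed toy setup: X in {0,1,2}, Y in {0,1}; Q(X) is taken equal to P(X).
P_x = [0.5, 0.3, 0.2]
P_y_given_x = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
P_xy = [[P_x[i] * P_y_given_x[i][j] for j in range(2)] for i in range(3)]
P_y = [sum(P_xy[i][j] for i in range(3)) for j in range(2)]

# Assumed membership (confusing probability) functions Q(A_j|x_i).
Q_A_given_x = [[0.9, 0.2], [0.5, 0.5], [0.1, 0.8]]
Q_A = [sum(P_x[i] * Q_A_given_x[i][j] for i in range(3)) for j in range(2)]

objective = sum(P_xy[i][j] * math.log2(P_xy[i][j] / (P_x[i] * P_y[j]))
                for i in range(3) for j in range(2))      # Shannon mutual information
subjective = sum(P_xy[i][j] * math.log2(Q_A_given_x[i][j] / Q_A[j])
                 for i in range(3) for j in range(2))     # generalized mutual information (13)

print(objective, subjective)   # subjective <= objective
```

Whatever membership values are chosen, the gap is an average Kullback-Leibler divergence, so the inequality holds as long as *Q*(*X*) = *P*(*X*).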

In [5], I defined the rate-of-limiting-errors, which is similar to the complexity distortion [2]. The difference is that the error-limiting condition for the rate-of-limiting-errors is a group of sets or fuzzy sets *A _{J}* = {*A _{j}*, *j* = 1, 2, …}, which may be of unequal sizes, instead of error-limiting balls of equal size.

We know that the color space of digital images is visually non-uniform and that human eyes' discrimination is fuzzy. So, in some cases, such as coding digital images, using size-unequal balls or fuzzy balls as the limiting condition is more reasonable.

Assume *P*(*Y*) is a source; encode *Y* into *X*; allow *y _{j}* to be encoded into any *x _{i}* ∈ *A _{j}*, i.e., *P*(*x _{i}*|*y _{j}*) > 0 only if *x _{i}* ∈ *A _{j}*. The rate-of-limiting-errors *R*(*A _{J}*) is the minimum of *I*(*X*; *Y*) under this condition.

I had proved that *R*(*A _{J}*) can be expressed by the formula of generalized mutual information for lossy coding, and that it becomes the generalized entropy *H*(*Y*) for lossless coding.

Furthermore, when the limiting sets are fuzzy, i.e., *P*(*x _{i}*|*y _{j}*) ≤ *P*(*x _{i}*)*Q*(*A _{j}*|*x _{i}*)/*Q*(*A _{j}*) for all *i* and *j*, we have

*R*(*A _{J}*) = ∑_{j} ∑_{i} *P*(*x _{i}*, *y _{j}*) log[*Q*(*A _{j}*|*x _{i}*)/*Q*(*A _{j}*)] = *H*(*Y*) − *H*(*Y*|*X*). (18)

To realize this rate, there must be *P*(*X*) = *Q*(*X*) and *P*(*X*|*y _{j}*) = *Q*(*X*|*A _{j}*) for all *j*.

Now, from the viewpoint of complexity distortion theory, the generalized entropy *H*(*Y*) is just the prior complexity, the fuzzy entropy *H*(*Y*|*X*) is just the posterior complexity, and *I*(*X*; *Y*) is the reduced complexity.

Actually, Shannon mentioned a fidelity criterion for lossy coding. He used distortion as the criterion for optimizing lossy coding because a fidelity criterion is hard to formulate. However, distortion is not a good criterion in most cases.

How do we evaluate a person? We evaluate him according not only to his errors but also to his contributions. For this reason, I replace the error function *d _{ij}* = *d*(*x _{i}*, *y _{j}*) in the rate-distortion theory with the subjective information *I _{ij}* = *I*(*x _{i}*; *y _{j}*) given by (10), and replace the distortion limit *D* with a fidelity (subjective information) limit *G*, so that the rate-distortion function *R*(*D*) becomes the rate-fidelity function *R*(*G*).

In a way similar to that in the classical information theory [1], we can obtain the expression of the function *R*(*G*) with the parameter *s*:

*R*(*G*) = *sG* − ∑_{j} *P*(*y _{j}*) log λ_{j}, (19)

where *s* = *dR*/*dG* indicates the slope of the function *R*(*G*) (see Figure 5) and

λ_{j} = ∑_{i} *P*(*x _{i}*) exp(*sI _{ij}*).

We define a group of fuzzy sets *B _{I}* = {*B _{j}*, *j* = 1, 2, …} with the membership functions

*Q*(*B _{j}*|*x _{i}*) = exp(*sI _{ij}*)/*m*, (20)

where *m* is the maximum of exp(*sI _{ij}*); then from (19) and (20) we have

*R*(*G*) = ∑_{j} ∑_{i} *P*(*x _{i}*, *y _{j}*) log[*Q*(*B _{j}*|*x _{i}*)/*Q*(*B _{j}*)]. (21)

This function is just the rate-of-limiting-errors with the group of fuzzy sets *B _{I}* = {*B _{j}*} as the limiting condition. Hence both the rate-distortion function and the rate-fidelity function are equivalent to a rate-of-limiting-errors function.
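A parametric point of the rate-fidelity function can be computed numerically by a Blahut-style alternating iteration. The following Python sketch is my illustrative reading of the parametric solution, under an assumed toy source and an assumed fixed subjective-information matrix *I _{ij}*:

```python
import math

# Assumed toy rate-fidelity setup: source Y with 4 equiprobable values,
# reproduction X with 4 values, and a fixed subjective-information matrix
# I_ij = I(x_i; y_j) in bits (3 bits for a match, -1 bit for a mismatch).
P_y = [0.25] * 4
I_mat = [[3.0 if i == j else -1.0 for i in range(4)] for j in range(4)]

def rate_fidelity_point(s, iters=200):
    """One parametric point (G(s), R(s)) of R(G), s being the slope dR/dG,
    computed by a Blahut-style alternating iteration."""
    q = [0.25] * 4                                  # reproduction marginal P(x_i)
    for _ in range(iters):
        chan = []
        for j in range(4):                          # optimal test channel:
            w = [q[i] * 2 ** (s * I_mat[j][i]) for i in range(4)]
            z = sum(w)                              # P(x_i|y_j) proportional to P(x_i) 2^(s I_ij)
            chan.append([v / z for v in w])
        q = [sum(P_y[j] * chan[j][i] for j in range(4)) for i in range(4)]
    G = sum(P_y[j] * chan[j][i] * I_mat[j][i] for j in range(4) for i in range(4))
    R = sum(P_y[j] * chan[j][i] * math.log2(chan[j][i] / q[i])
            for j in range(4) for i in range(4) if chan[j][i] > 0)
    return G, R

for s in (0.5, 1.0, 2.0):
    G, R = rate_fidelity_point(s)
    print(f"s={s}: G={G:.3f}, R={R:.3f}")   # both G and R rise with the slope s
```

Sweeping *s* traces the whole *R*(*G*) curve from low-rate, low-fidelity points toward the maximum achievable fidelity.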

In [7], I defined the information value *V* by the increment of the growing speed of a fund due to information, and suggested using the information value as the criterion to optimize communication in some cases, obtaining the rate-value function *R*(*V*), which is also meaningful.

For simplicity, we consider how subjective visual information is related to visual discrimination and to the quantizing grades of the gray levels of pixels of images.

Let the gray level of a quantized pixel be the source, with gray levels *x _{i}* = *i*, *i* = 0, 1, …, *b*. Let the confusing probability function be

*Q*(*A _{j}*|*x _{i}*) = exp[−(*x _{i}* − *x _{j}*)²/(2*d*²)], (22)

where *d* is the discrimination parameter. The smaller the *d*, the higher the discrimination.

Figure 5. Relationship between *d* and *R*(*G*) for *b* = 63

Figure 5 indicates that when *R* = 0, *G* < 0, which means that if a coded image has nothing to do with the original image but we still believe it reflects the original image, the information is negative. When *G* = −2, *R* > 0, which means that a certain amount of objective information is necessary when one uses lies to deceive an enemy to some extent; or, say, lies based on facts are more terrible than lies based on nothing. Each curve of the function *R*(*G*) is tangent to the line *R* = *G*, which means there is a matching point at which the objective information is equal to the subjective information, and the higher the discrimination (the smaller the *d*), the bigger the matching information amount. The slope of *R*(*G*) becomes bigger and bigger as *G* increases, which tells us that for given discrimination, the subjective information that can be obtained is limited.

Figure 6 tells us that for given discrimination, there exists an optimal quantizing bit number *k*′ at which the matching value of *G* and *R* reaches its maximum. If *k* < *k*′, the matching information increases with *k*; if *k* > *k*′, the matching information no longer increases with *k*. This means that too high a resolution of images is unnecessary or uneconomical for given visual discrimination.

Figure 6. Relationship between the matching value of *R* with *G*, the discrimination parameter *d*, and the quantizing bit *k*

[1] T. Berger, *Rate Distortion Theory*,
Englewood Cliffs, N.J.: Prentice-Hall, 1971.

[2] D. Sow and A. Eleftheriadis, “Complexity distortion theory”, *IEEE Trans. on Information Theory*, Vol. 49, No. 3, 604-609, 2003.

[3] R. V. L. Hartley, “Transmission of information”, *Bell System Technical Journal*, Vol. 7, 535-563, 1928.

[4] C.-G. Lu, “Coherence between the
generalized mutual information formula and Popper's theory of scientific
evolution”(in Chinese),* J. of Changsha University*, No.2, 41-46, 1991.

[5] C.-G. Lu, *A
Generalized Information Theory* (in Chinese), China Science and Technology
University Press, 1993

[6] C.-G. Lu, “Coding meaning of generalized
entropy and generalized mutual information” (in Chinese), *J. of China
Institute of Communications*, Vol.15, No.6, 38-44, 1995.

[7] C.-G. Lu, *Portfolio’s Entropy Theory
and Information Value*, (in Chinese),
China Science and Technology University Press, 1997

[8] C.-G. Lu, “A generalization of Shannon's
information theory”, *Int. J. of General Systems*, Vol. 28, No.6, 453-490,
1999.

[9] P. Grünwald and P. Vitányi, “Shannon information and Kolmogorov complexity”, *IEEE Trans. on Information Theory*, submitted, http://homepages.cwi.nl/~paulv/papers/info.pdf

[10] K. Popper, *Conjectures and Refutations: The Growth of Scientific Knowledge*, Routledge, London and New York, 2002.

[11] C. E. Shannon, “A mathematical theory of communication”, *Bell System Technical Journal*, Vol. 27, pt. I, pp. 379-429; pt. II, pp. 623-656, 1948.

[12] P. Z. Wang, *Fuzzy Sets and Random Sets Shadow
*(in Chinese), Beijing Normal University Press, 1985.