Categorical Proportional Difference (CPD): a measure of the degree to which a word contributes to differentiating a particular category from other categories.
CPD(w) = max over all categories c of CPD(w,c)
CPD(w,A) = (Aw - Bw)/(Aw + Bw)
where
Aw : number of documents of class A that contain the word w
Bw : number of documents not of class A that contain the word w
The CPD of a word is therefore its maximum CPD over all classes (categories).
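A minimal Python sketch of the definition above, assuming each document is given as a (category, list of tokens) pair. The function names `cpd_per_category` and `cpd` and the four-document corpus are hypothetical, for illustration only. Following the definitions of Aw and Bw, a document that contains w several times is counted once.

```python
from collections import Counter


def cpd_per_category(word, docs):
    """Return {category: CPD(word, category)} for a labeled corpus.

    docs is a list of (category, tokens) pairs (an assumed input format).
    """
    # Number of documents of each category that contain the word at least once.
    df = Counter(cat for cat, tokens in docs if word in set(tokens))
    total = sum(df.values())  # all documents containing the word
    if total == 0:
        raise ValueError(f"{word!r} does not occur in any document")
    categories = {cat for cat, _ in docs}
    scores = {}
    for cat in categories:
        a = df[cat]      # Aw: documents of this category with the word
        b = total - a    # Bw: documents of the other categories with the word
        scores[cat] = (a - b) / (a + b)
    return scores


def cpd(word, docs):
    """CPD(w) is the maximum of CPD(w, c) over all categories c."""
    return max(cpd_per_category(word, docs).values())


# Hypothetical toy corpus.
docs = [
    ("grain", ["wheat", "harvest"]),
    ("grain", ["wheat", "prices"]),
    ("trade", ["quotas", "prices"]),
    ("trade", ["quotas", "tariffs", "quotas"]),
]
print(cpd("wheat", docs))   # 1.0
print(cpd("prices", docs))  # 0.0
print(cpd("quotas", docs))  # 1.0
```

Note that "quotas" scores 1.0 even though it occurs twice in one document: only document presence is counted.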
Example (four categories; each count is the number of documents of that category containing the word)
Word | Grain | Trade | Interest | Agriculture | CPD(grain) | CPD(trade) | CPD(interest) | CPD(agriculture) | CPD |
wheat | 25 docs | 0 | 0 | 0 | (25-0)/(25+0)=1 | (0-25)/(0+25)=-1 | (0-25)/(0+25)=-1 | (0-25)/(0+25)=-1 | 1 |
economy | 15 docs | 15 docs | 15 docs | 15 docs | (15-45)/(15+45)=-0.5 | (15-45)/(15+45)=-0.5 | (15-45)/(15+45)=-0.5 | (15-45)/(15+45)=-0.5 | -0.5 |
quotas | 1 doc | 50 docs | 1 doc | 1 doc | (1-52)/(1+52)=-0.96 | (50-3)/(50+3)=0.89 | (1-52)/(1+52)=-0.96 | (1-52)/(1+52)=-0.96 | 0.89 |
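The table can be reproduced from its document counts with the short Python sketch below. `cpd_from_counts` is a hypothetical helper name, and each count is taken to be the number of documents of that category containing the word.

```python
def cpd_from_counts(counts):
    """Return ({category: CPD(w, c)}, CPD(w)) from per-category document counts.

    Assumes the word occurs in at least one document (total > 0).
    """
    total = sum(counts.values())
    per_cat = {}
    for cat, a in counts.items():
        b = total - a  # documents of the other categories containing the word
        per_cat[cat] = (a - b) / (a + b)
    return per_cat, max(per_cat.values())


# Document counts copied from the example table.
table = {
    "wheat": {"grain": 25, "trade": 0, "interest": 0, "agriculture": 0},
    "economy": {"grain": 15, "trade": 15, "interest": 15, "agriculture": 15},
    "quotas": {"grain": 1, "trade": 50, "interest": 1, "agriculture": 1},
}

for word, counts in table.items():
    per_cat, score = cpd_from_counts(counts)
    cells = " | ".join(f"{c}={v:.2f}" for c, v in per_cat.items())
    print(f"{word}: {cells} | CPD={score:.2f}")
```

Rounded to two decimals, the printed rows match the CPD columns of the table (for example, quotas gives -0.96 for grain and 0.89 for trade).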
Intuitively, we can say that
With a CPD(grain) of 1, wheat is associated exclusively with the grain category
With a CPD of -0.5 in every category, economy is not associated with any particular category
With a CPD(trade) of 0.89, quotas is strongly linked with the trade category
The CPD of a word equals 1 when the word appears in only one category, and approaches -1 when the word appears in the same number of documents of every category and the number of categories increases.
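The limiting behaviour can be checked directly from the formula. Suppose, as in the economy row, a word appears in n documents of each of k categories. Then for any category c, Aw = n and Bw = (k-1)n:

```latex
\mathrm{CPD}(w,c) = \frac{A_w - B_w}{A_w + B_w}
  = \frac{n - (k-1)n}{n + (k-1)n}
  = \frac{2-k}{k}
  \;\longrightarrow\; -1 \quad \text{as } k \to \infty
```

Every category gives the same value, so CPD(w) = (2-k)/k as well; with k = 4 this is -0.5, matching the economy row. Conversely, if the word occurs only in category c, then Bw = 0 and CPD(w,c) = Aw/Aw = 1 exactly.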
Reference: Mondelle Simeon and Robert Hilderman, "Categorical Proportional Difference: A Feature Selection Method for Text Categorization"