CPD - Categorical Proportional Difference

A measure of the degree to which a word helps to differentiate a particular category from the other categories.


CPD(w) = max over all classes c of CPD(w,c)
CPD(w,c) = (Aw-Bw)/(Aw+Bw)

Where,
Aw : number of documents of the class c that contain the word w
Bw : number of documents not of the class c that contain the word w

The CPD of a word is the maximum of its per-class CPD values.

Example


Documents containing each word, per category:

Word      Grain     Trade     Interest  Agriculture
wheat     25 docs   0         0         0
economy   15 docs   15 docs   15 docs   15 docs
quotas    1 doc     50 docs   1 doc     1 doc

Per-category CPD and resulting CPD of each word:

Word      CPD(grain)            CPD(trade)            CPD(interest)         CPD(agriculture)      CPD
wheat     (25-0)/(25+0)=1       (0-25)/(0+25)=-1      (0-25)/(0+25)=-1      (0-25)/(0+25)=-1      1
economy   (15-45)/(15+45)=-0.5  (15-45)/(15+45)=-0.5  (15-45)/(15+45)=-0.5  (15-45)/(15+45)=-0.5  -0.5
quotas    (1-52)/(1+52)=-0.96   (50-3)/(50+3)=0.89    (1-52)/(1+52)=-0.96   (1-52)/(1+52)=-0.96   0.89
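
The figures in this example can be recomputed with a short Python sketch. It assumes the only input is, for one word, the number of documents in each category that contain it; the function names cpd_per_class and cpd and the counts dictionary are illustrative and do not come from the reference.

```python
from typing import Dict


def cpd_per_class(doc_counts: Dict[str, int]) -> Dict[str, float]:
    """Return CPD(w, c) for every class c.

    doc_counts maps each class to the number of its documents
    that contain the word w.
    """
    total = sum(doc_counts.values())
    if total == 0:
        raise ValueError("the word does not occur in any document")
    scores = {}
    for cls, a_w in doc_counts.items():
        b_w = total - a_w  # documents of the other classes containing w
        scores[cls] = (a_w - b_w) / (a_w + b_w)
    return scores


def cpd(doc_counts: Dict[str, int]) -> float:
    """CPD(w): the maximum of the per-class values."""
    return max(cpd_per_class(doc_counts).values())


# Document counts taken from the example.
counts = {
    "wheat": {"grain": 25, "trade": 0, "interest": 0, "agriculture": 0},
    "economy": {"grain": 15, "trade": 15, "interest": 15, "agriculture": 15},
    "quotas": {"grain": 1, "trade": 50, "interest": 1, "agriculture": 1},
}

for word, per_class in counts.items():
    print(word, round(cpd(per_class), 2))
# wheat 1.0
# economy -0.5
# quotas 0.89
```

Since Aw+Bw is the total number of documents containing the word, the denominator is the same for every class and the class with the largest Aw always yields the maximum.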


Intuitively, we can say that:

With a CPD(grain) of 1, wheat is entirely linked with the grain category
With a CPD of -0.5 in every category, economy is not linked with any category
With a CPD(trade) of 0.89, quotas is strongly linked with the trade category

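The CPD of a word is 1 when the word appears in only one category. It approaches -1 when the word appears an equal number of times in every category and the number of categories increases: with k categories, such a word has a CPD of (2-k)/k, which gives -0.5 for the four categories of the example.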
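
The lower limit can be checked numerically. The sketch below assumes a word that is contained in n documents of each of k categories; the function name cpd_uniform and the default n=10 are illustrative choices, not part of the reference.

```python
def cpd_uniform(k: int, n: int = 10) -> float:
    """CPD of a word found in n documents of each of k categories."""
    a_w = n            # documents of one category containing the word
    b_w = (k - 1) * n  # documents of the other k - 1 categories
    return (a_w - b_w) / (a_w + b_w)


# The value moves towards -1 as the number of categories grows,
# but never reaches it for a finite k.
for k in (2, 4, 10, 100, 1000):
    print(k, cpd_uniform(k))
```

The result does not depend on n, because n cancels out of the ratio; every category gives the same value, so this is also the CPD of the word.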



Reference: Mondelle Simeon and Robert Hilderman, "Categorical Proportional Difference: A Feature Selection Method for Text Categorization"