Abstract
An important way to improve the performance of naive Bayesian classifiers (NBCs) is to remove or relax the fundamental assumption of independence among the attributes, which usually requires estimating a joint probability density function (p.d.f.) instead of the marginal p.d.f.s used in NBC design. This paper proposes a non-naive Bayesian classifier (NNBC) in which the independence assumption is removed and the marginal p.d.f. estimation is replaced by joint p.d.f. estimation. A new technique for estimating the class-conditional p.d.f., based on optimal bandwidth selection and constituting the crucial part of the joint p.d.f. estimation, is applied in our NNBC. Three well-known indexes for measuring the performance of Bayesian classifiers, namely classification accuracy, area under the receiver operating characteristic curve, and probability mean square error, are adopted to compare four Bayesian models: normal naive Bayesian, flexible naive Bayesian (FNB), the homologous model of FNB (FNB-ROT), and our proposed NNBC. The comparative results show that NNBC is statistically superior to the other three models on all three indexes. Furthermore, in a comparison with support vector machines and four boosting-based classification methods, NNBC achieves relatively favorable classification accuracy while significantly reducing the training time.
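To make the core idea concrete, the sketch below shows a minimal non-naive Bayesian classifier in which each class-conditional joint p.d.f. is estimated by multivariate kernel density estimation rather than a product of marginals, and prediction takes the class maximizing prior times joint density. This is an illustrative assumption-laden sketch, not the authors' implementation: the class name `NonNaiveBayes` is hypothetical, and scipy's `gaussian_kde` with its default Scott's-rule bandwidth stands in for the paper's optimal bandwidth selection technique, which the abstract does not detail.

```python
# Minimal sketch of a non-naive Bayesian classifier via joint p.d.f. estimation.
# NOTE: scipy's default Scott's-rule bandwidth is a stand-in for the paper's
# optimal bandwidth selection; this is not the authors' implementation.
import numpy as np
from scipy.stats import gaussian_kde

class NonNaiveBayes:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {}
        self.kdes_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[c] = len(Xc) / len(X)
            # Estimate the class-conditional JOINT density over all attributes
            # (gaussian_kde expects data with shape (n_features, n_samples)).
            self.kdes_[c] = gaussian_kde(Xc.T)
        return self

    def predict(self, X):
        # Posterior is proportional to prior * class-conditional joint density.
        scores = np.column_stack(
            [self.priors_[c] * self.kdes_[c](X.T) for c in self.classes_]
        )
        return self.classes_[np.argmax(scores, axis=1)]

# Usage on a toy two-class problem:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(NonNaiveBayes().fit(X, y).predict(X[:5]))
```

Unlike an NBC, which would multiply two univariate density estimates per class, the single multivariate estimate above can capture correlations between the attributes; the quality of that estimate hinges on the bandwidth choice, which is why the paper focuses on optimal bandwidth selection.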
Original language | English |
---|---|
Article number | 6471192 |
Pages (from-to) | 21-39 |
Number of pages | 19 |
Journal | IEEE Transactions on Cybernetics |
Volume | 44 |
Issue number | 1 |
DOIs | |
Publication status | Published - 1 Jan 2014 |
Externally published | Yes |
Keywords
- joint probability density estimation
- kernel function
- naive Bayesian classifier (NBC)
- optimal bandwidth
- probability mean square error