Abstract
This paper shows that model validation is essential to ensuring the predictive accuracy of a statistical model. Extending the use of logistic regression analysis, it also demonstrates the value of logistic modelling for non-discrete linguistic categories in language performance. The technique is illustrated with a corpus-based study of the theory of the grammatical factors governing the variation between use and omission of the definite article before multi-word organization names (e.g. the Foreign Office, Mansfield College) in English. Validating the preliminary model on fresh corpora allows the final logistic model to capture more precisely the gradience in the grammatical factors that affect article usage before such names. Because the logistic model is a model of language in use rather than a purely statistical model, the paper also translates the regression coefficients into statements of the probability that a name favours the definite article.
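The translation from regression coefficients to probability statements can be sketched as follows. This is a minimal illustration of the standard logistic transformation, not the paper's fitted model: the intercept, the factor names (`factor_a`, `factor_b`), and all coefficient values are hypothetical placeholders, since the abstract does not report them.

```python
import math


def logistic(z: float) -> float:
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def article_probability(intercept: float,
                        coefficients: dict[str, float],
                        features: dict[str, float]) -> float:
    """Probability that a name takes the definite article.

    The linear predictor is the intercept plus the sum of each
    coefficient multiplied by the value of its factor.
    """
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return logistic(z)


# Hypothetical values for illustration only; NOT taken from the paper.
intercept = -0.5
coefficients = {"factor_a": 1.2, "factor_b": -0.8}

# A name for which factor_a is present and factor_b is absent.
p = article_probability(intercept, coefficients,
                        {"factor_a": 1, "factor_b": 0})

# exp(coefficient) is the multiplicative change in the odds of article
# use when the factor is present, holding the other factors fixed.
odds_ratio_a = math.exp(coefficients["factor_a"])

print(f"P(definite article) = {p:.3f}")
print(f"odds ratio for factor_a = {odds_ratio_a:.3f}")
```

Under these assumed values, a positive coefficient raises the probability of article use and a negative one lowers it. Validation on fresh corpora would then compare such predicted probabilities with the article usage actually observed in data that played no part in fitting the model.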
Original language | English |
---|---|
Pages (from-to) | 287-313 |
Number of pages | 27 |
Journal | Literary and Linguistic Computing |
Volume | 18 |
Issue number | 3 |
DOIs | |
Publication status | Published - Sept 2003 |