TY - JOUR
T1 - Leveraging statistical information in fine-grained financial sentiment analysis
AU - Zhang, Han
AU - Li, Zongxi
AU - Xie, Haoran
AU - Lau, Raymond Y.K.
AU - Cheng, Gary
AU - Li, Qing
AU - Zhang, Dian
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022/3
Y1 - 2022/3
N2 - The recent development of deep learning-based natural language processing (NLP) methods has fostered many downstream applications in various fields. One such application in the financial industry, fine-grained financial sentiment analysis (FSA), aims to understand the sentiment orientation, i.e., bullish or bearish, of financial texts by predicting a polarity score, and has been widely applied to stock-related opinion mining. FSA is challenging because large-scale labeled datasets are lacking and the task is domain-dependent. Previous works mainly focus on constructing and exploiting handcrafted lexicons that encode expert knowledge to enhance the semantic features used in decision making; these lexicons yield improvements but are expensive to acquire. This paper proposes a lightweight regression model that incorporates the statistical distribution of a term over the polarity range, e.g., between −1 and 1, to address the fine-grained FSA task. More concretely, we first count each word’s appearances in different polarity intervals and produce a statistic-based representation for each text, which is then encoded as a corpus-level statistical feature vector by an autoencoder. Subsequently, the obtained feature vector is integrated with the semantic feature vector in the regression model. Our experiments show that such a model can produce significant improvements over the baseline models on two FSA subsets, i.e., news headlines and microblogs, without computational overhead. Furthermore, we observe that signs, which lexicon-based approaches have neglected, can play an important role in FSA.
AB - The recent development of deep learning-based natural language processing (NLP) methods has fostered many downstream applications in various fields. One such application in the financial industry, fine-grained financial sentiment analysis (FSA), aims to understand the sentiment orientation, i.e., bullish or bearish, of financial texts by predicting a polarity score, and has been widely applied to stock-related opinion mining. FSA is challenging because large-scale labeled datasets are lacking and the task is domain-dependent. Previous works mainly focus on constructing and exploiting handcrafted lexicons that encode expert knowledge to enhance the semantic features used in decision making; these lexicons yield improvements but are expensive to acquire. This paper proposes a lightweight regression model that incorporates the statistical distribution of a term over the polarity range, e.g., between −1 and 1, to address the fine-grained FSA task. More concretely, we first count each word’s appearances in different polarity intervals and produce a statistic-based representation for each text, which is then encoded as a corpus-level statistical feature vector by an autoencoder. Subsequently, the obtained feature vector is integrated with the semantic feature vector in the regression model. Our experiments show that such a model can produce significant improvements over the baseline models on two FSA subsets, i.e., news headlines and microblogs, without computational overhead. Furthermore, we observe that signs, which lexicon-based approaches have neglected, can play an important role in FSA.
KW - Financial sentiment analysis
KW - Information retrieval
KW - Natural language processing
KW - Sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85124272402&partnerID=8YFLogxK
U2 - 10.1007/s11280-021-00993-1
DO - 10.1007/s11280-021-00993-1
M3 - Article
AN - SCOPUS:85124272402
SN - 1386-145X
VL - 25
SP - 513
EP - 531
JO - World Wide Web
JF - World Wide Web
IS - 2
ER -