TY - JOUR
T1 - Two-dimensional data partitioning for non-negative matrix tri-factorization
AU - Yan, Jiaxing
AU - Liu, Hai
AU - Lei, Zhiqi
AU - Rao, Yanghui
AU - Liu, Guan
AU - Xie, Haoran
AU - Tao, Xiaohui
AU - Wang, Fu Lee
N1 - Publisher Copyright: © 2024 Elsevier Inc.
PY - 2024/8/28
Y1 - 2024/8/28
N2 - As a two-sided clustering and dimensionality reduction paradigm, Non-negative Matrix Tri-Factorization (NMTF) has attracted much attention from machine learning and data mining researchers due to its excellent performance and reliable theoretical support. Unlike Non-negative Matrix Factorization (NMF) methods, which are applicable to one-sided clustering only, NMTF introduces an additional factor matrix and uses the inherent duality of data to realize the mutual promotion of sample clustering and feature clustering, thus showing great advantages in many scenarios (e.g., text co-clustering). However, existing methods for solving NMTF usually involve intensive matrix multiplication, which incurs high time and space complexity; that is, they are limited by slow convergence of the multiplicative update rules and high memory overhead. To address these problems, this paper develops a distributed parallel algorithm with a 2-dimensional data partition scheme for NMTF (i.e., PNMTF-2D). Experiments on multiple text datasets show that the proposed PNMTF-2D can substantially improve the computational efficiency of NMTF (e.g., the average iteration time is reduced by up to 99.7% on Amazon) while preserving the effectiveness of convergence and co-clustering.
AB - As a two-sided clustering and dimensionality reduction paradigm, Non-negative Matrix Tri-Factorization (NMTF) has attracted much attention from machine learning and data mining researchers due to its excellent performance and reliable theoretical support. Unlike Non-negative Matrix Factorization (NMF) methods, which are applicable to one-sided clustering only, NMTF introduces an additional factor matrix and uses the inherent duality of data to realize the mutual promotion of sample clustering and feature clustering, thus showing great advantages in many scenarios (e.g., text co-clustering). However, existing methods for solving NMTF usually involve intensive matrix multiplication, which incurs high time and space complexity; that is, they are limited by slow convergence of the multiplicative update rules and high memory overhead. To address these problems, this paper develops a distributed parallel algorithm with a 2-dimensional data partition scheme for NMTF (i.e., PNMTF-2D). Experiments on multiple text datasets show that the proposed PNMTF-2D can substantially improve the computational efficiency of NMTF (e.g., the average iteration time is reduced by up to 99.7% on Amazon) while preserving the effectiveness of convergence and co-clustering.
KW - 2-Dimensional data partitioning
KW - Non-negative matrix tri-factorization
KW - Text co-clustering
UR - http://www.scopus.com/inward/record.url?scp=85196941038&partnerID=8YFLogxK
U2 - 10.1016/j.bdr.2024.100473
DO - 10.1016/j.bdr.2024.100473
M3 - Article
AN - SCOPUS:85196941038
VL - 37
JO - Big Data Research
JF - Big Data Research
M1 - 100473
ER -