Concept Factorization (CF) does not simultaneously exploit the label information available in the samples and the multivariate geometric structure of the data. To address this, this paper proposes a Hyper-graph regularized Constrained Concept Factorization (HCCF) algorithm. HCCF constructs an undirected, weighted hyper-graph Laplacian regularization term to capture the multivariate geometric structure among data points, which overcomes the limitation that traditional graph models can express only pairwise relationships. At the same time, it imposes hard constraints so that the label information of the samples stays consistent in the low-dimensional space, making full use of the label information of the labeled samples. The HCCF objective function is solved with iterative multiplicative updates, and the convergence of these updates is proved. Experiments on the TDT2, Reuters, and PIE data sets show that HCCF improves clustering accuracy and normalized mutual information, which verifies the effectiveness of the algorithm.
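The abstract names two building blocks, a weighted hyper-graph Laplacian and hard label constraints, without giving their construction. The sketch below illustrates one common way to build each and should not be read as the authors' exact formulation. It rests on several assumptions: each hyperedge consists of a sample and its k nearest neighbours, hyperedge weights come from a heat kernel, the Laplacian is the unnormalized form L = D_v - H W D_e^{-1} H^T, and the hard constraint takes the form of a label matrix A with V = A Z, as in constrained factorization models. The function names `hypergraph_laplacian` and `label_constraint_matrix` and the parameter `k` are hypothetical.

```python
import numpy as np

def hypergraph_laplacian(X, k=3):
    """Unnormalized hyper-graph Laplacian for samples X of shape (n, d).

    Assumption: hyperedge j = sample j plus its k nearest neighbours,
    weighted by a heat kernel. The source does not specify either choice.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances, clipped against round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    sigma2 = d2.mean()
    if sigma2 == 0:
        sigma2 = 1.0  # all samples identical; any positive scale works

    H = np.zeros((n, n))   # incidence matrix: rows = vertices, cols = hyperedges
    w = np.zeros(n)        # hyperedge weights
    for j in range(n):
        row = d2[j].copy()
        row[j] = -1.0      # force sample j into its own hyperedge
        members = np.argsort(row)[:k + 1]
        H[members, j] = 1.0
        w[j] = np.exp(-d2[members, j] / sigma2).sum()

    De = H.sum(axis=0)     # hyperedge degrees (number of vertices per edge)
    Dv = H @ w             # vertex degrees (sum of incident edge weights)
    S = (H * (w / De)) @ H.T   # H W De^{-1} H^T
    return np.diag(Dv) - S

def label_constraint_matrix(labels, n_samples):
    """Hard-constraint matrix A of shape (n, c + n - l).

    Assumption: the first len(labels) samples are the labeled ones.
    With V = A @ Z, samples sharing a label get identical rows in V.
    """
    labels = np.asarray(labels)
    classes, idx = np.unique(labels, return_inverse=True)
    l, c = labels.size, classes.size
    A = np.zeros((n_samples, c + n_samples - l))
    A[np.arange(l), idx] = 1.0                      # class indicator block
    A[np.arange(l, n_samples), np.arange(c, c + n_samples - l)] = 1.0  # identity block
    return A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 4))
    L = hypergraph_laplacian(X, k=3)
    A = label_constraint_matrix([0, 0, 1], n_samples=8)
    print(L.shape, A.shape)
```

Under these assumptions, a regularizer of the form Tr(Z^T A^T L A Z) would penalize low-dimensional representations that vary strongly within a hyperedge, while the matrix A forces equally labeled samples to share one representation. How the paper weights and combines these terms is not stated in the abstract.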