Journal of Shenzhen University Science and Engineering

In online lending, fraud detection means deciding whether a user is a fraudster or a normal user from collected information such as the user's historical transaction data. Existing methods treat each user as an independent node and ignore the relationships among users. Fraud, however, is increasingly a group behavior: within a fraud network, links between fraud nodes and non-fraud nodes are sparse, while links among fraud nodes are dense. Based on this observation, we propose a collective classification fraud detection method built on label propagation. Using call records collected from a real online lending company, we construct a call association network among users and apply the label propagation algorithm to spread the label information of fraud nodes, which determines whether each unlabeled node is a fraudulent user. We also improve the initialization of the transition probability matrix in the label propagation algorithm by raising the weights to a power, so that the algorithm adapts to the imbalance between positive and negative samples in fraud scenarios. Seven rounds of tests were run on a real lending data set with a very low proportion of labeled samples and an unbalanced training sample distribution. The proposed method detects fraudulent users with a precision of up to 17%, and both its F1 score and its precision are better than those of the classic WvRn algorithm.
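The idea of propagating fraud labels over a call network with powered weights can be illustrated with a small sketch. The abstract does not give the exact formula for the improved initialization, so the code below assumes one plausible form: each weight w_ij is raised to a power n and the result is row-normalized into a transition probability. The function names, the toy graph, the initial score of 0.5 for unlabeled nodes, and the clamping of known labels after each step are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def transition_matrix(weights, power=1.0):
    """Row-normalize powered weights into transition probabilities.

    Assumed form: T[i, j] = w[i, j]**power / sum_k w[i, k]**power.
    Weights are assumed non-negative and power is assumed positive.
    """
    powered = np.asarray(weights, dtype=float) ** power
    row_sums = powered.sum(axis=1, keepdims=True)
    # Nodes without any neighbor keep an all-zero row.
    return np.divide(powered, row_sums,
                     out=np.zeros_like(powered), where=row_sums > 0)


def propagate_labels(weights, labels, power=1.0, iterations=50):
    """Return a fraud score per node.

    labels: 1 = known fraud, 0 = known normal, -1 = unknown.
    """
    labels = np.asarray(labels)
    known = labels >= 0
    t = transition_matrix(weights, power)
    # Unknown nodes start at 0.5 (an arbitrary neutral value).
    scores = np.where(known, labels, 0.5).astype(float)
    for _ in range(iterations):
        scores = t @ scores              # each node averages its neighbors
        scores[known] = labels[known]    # clamp the labeled nodes
    return scores


# Toy call network: symmetric matrix of hypothetical call counts.
# Node 0 is a known fraudster, node 3 a known normal user,
# nodes 1 and 2 are unlabeled.
weights = np.array([
    [0, 4, 0, 0],
    [4, 0, 1, 2],
    [0, 1, 0, 0],
    [0, 2, 0, 0],
])
labels = [1, -1, -1, 0]

plain = propagate_labels(weights, labels, power=1.0)
powered = propagate_labels(weights, labels, power=2.0)
print(plain, powered)
```

In this toy graph, node 1 calls the fraudster more often than the normal user. With power 1 its score converges to about 0.67, and with power 2 to 0.8, because powering amplifies the stronger link. This shows how such an operation could keep a small number of fraud labels from being diluted by many weak links to normal users.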

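Introduction
1 Research background
2 Construction and analysis of the association network
3 Algorithm and improvements
4 Experiments and validation
5 Conclusion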

Fig.1 An illustrative example of propagation

Fig.2 Steps of the ICA algorithm

Fig.3 Steps of Gibbs sampling

Table 1 Format of the call relationship data

Fig.4 Framework of the label propagation algorithm (LPA)

Table 2 Experimental results for different values of n

Table 3 User label distribution and final prediction results (Pfraud) over multiple rounds of testing

Fig.5 Pfraud and F1 score of ULPA-based fraud detection at different numbers of iterations

Fig.6 ROC curve and AUC of ULPA-based fraud detection

Fig.7 Comparison of F1 scores between the WvRn algorithm and ULPA over multiple rounds of testing

Fig.8 Comparison of Pfraud between the WvRn algorithm and ULPA over multiple rounds of testing

