Journal of Shenzhen University Science and Engineering

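Erasure codes are widely deployed in distributed key-value storage systems to ensure data reliability. By encoding user data and storing it across multiple storage nodes, an erasure-coded storage system can recover the original data when some of the nodes fail. As the number of storage nodes grows, the load across storage nodes often becomes unbalanced, which limits the use of such systems in university cloud computing and informatization scenarios. To address this problem, we propose a load-balancing scheme for large-scale erasure-coded key-value storage systems. By separating logical control from the storage function, the erasure-coded storage system can efficiently determine the load state of the storage nodes. To make full use of the network bandwidth between nodes, we propose a multi-shard coded data transmission scheme. Based on the volume of data written by users, we design a hybrid data write mechanism to improve the performance of write operations. On this basis, we design a prototype erasure-coded key-value storage system, and tests on the actual prototype verify the effectiveness of the load-balancing algorithm proposed in this study.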

Erasure codes are widely used in distributed key-value (KV) storage systems to ensure data reliability. However, balancing the load across storage nodes is a well-known challenge when deploying such erasure-coded storage systems in cloud computing and information service scenarios. To address this problem, we propose a load-balancing scheme for large-scale erasure-coded KV storage systems. By adding control nodes to the storage system, our scheme efficiently obtains the current state of the storage nodes. To improve the utilization of network bandwidth, we design a multi-shard coded transmission scheme. Based on the data volume of write requests, we further provide an efficient hybrid write scheme. Finally, we implement a prototype erasure-coded storage system and conduct extensive experiments to verify the efficiency of our scheme.
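
The abstract describes two ideas that a small example can make concrete: a value is encoded into shards stored on different nodes so that it survives a node failure, and a control component tracks node load to decide where the shards go. The Python sketch below illustrates both under stated assumptions. It uses a single XOR parity shard as a stand-in for the erasure code (the abstract does not name the code used), and `pick_nodes` with its least-loaded policy is a hypothetical placeholder, not the load-balancing algorithm of the paper. All function names, node names, and load values are illustrative.

```python
from functools import reduce
from typing import Dict, List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(value: bytes, k: int) -> List[bytes]:
    """Split value into k equal data shards and append one XOR parity shard."""
    shard_len = -(-len(value) // k)  # ceiling division
    padded = value.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    shards.append(reduce(xor_bytes, shards))  # parity = XOR of all data shards
    return shards


def decode(shards: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the value when at most one of the k + 1 shards is lost (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard tolerates only one lost shard")
    if missing:
        present = [s for s in shards if s is not None]
        shards = list(shards)
        # XOR of the surviving shards equals the lost shard.
        shards[missing[0]] = reduce(xor_bytes, present)
    return b"".join(shards[:-1])[:length]  # drop parity, strip padding


def pick_nodes(load: Dict[str, int], count: int) -> List[str]:
    """Assumed controller policy: choose the `count` least-loaded nodes."""
    return sorted(load, key=lambda n: (load[n], n))[:count]


# Usage: encode a value, place shards on lightly loaded nodes, lose one node.
value = b"erasure-coded value"
shards = encode(value, k=3)
load = {"n1": 40, "n2": 10, "n3": 25, "n4": 5, "n5": 30}  # made-up load figures
placement = dict(zip(pick_nodes(load, len(shards)), shards))

survivors: List[Optional[bytes]] = list(shards)
survivors[1] = None  # simulate the failure of the node holding shard 1
recovered = decode(survivors, len(value))
```

A production system would typically use a code that tolerates several failures, such as Reed-Solomon, and would base placement on richer load metrics than a single integer per node.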

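Introduction
1 Overview of research on erasure-coded storage systems
2 Read/write performance measurement and research motivation
3 Optimization of the erasure-coded storage architecture
4 Load-balancing algorithm for the data write process
5 Experimental evaluation
6 Conclusion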

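Figure 1 Data write process in an erasure-coded key-value storage system

Figure 2 Read/write test results for storage nodes

Figure 3 Software architecture separating logical control from storage space

Figure 4 Two-phase data write operation

Figure 5 Object data read process

Figure 6 Multi-shard coded data transmission process

Figure 7 Hybrid data write scheme based on data volume

Figure 8 Framework of the prototype distributed key-value storage system

Figure 9 Read and write performance for small data volumes

Figure 10 Read and write performance for large data volumes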

Figure 11 Read and write performance under different logical control node data volumes

