Design of an Erasure-Code-Based Data Update Scheme for Distributed Memory (Assignment Brief, Proposal Report, Thesis; 25,000 characters)
Abstract
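With the arrival of the “Internet+” era, the amount of data newly generated in networked environments every eighteen months is roughly equal to the total amount generated in all preceding time. This explosive growth places higher demands on storage systems in terms of I/O performance, data fault tolerance, and storage security. Traditional centralized storage systems can no longer meet the need to store such massive amounts of data, and distributed storage systems have emerged in response.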
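To improve the access performance of distributed storage systems, one technical trend is to cache more and more data in memory, forming a distributed memory system. As distributed memory grows in scale, so does the amount of data resident in memory. However, incidents such as power outages, fires, hardware faults, and virus attacks can cause storage nodes to fail, which increases the risk of data loss across the whole system. To make in-memory data fault tolerant, this thesis proposes a “memory parity + external memory single copy” scheme. Taking the storage of data blocks D0 and D1 as an example, the in-memory parity block P is computed as the XOR of the source blocks D0 and D1, while external storage holds one copy of D0 and D1. This placement not only supports fault tolerance for in-memory data but also improves memory space utilization.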
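The XOR relationship between D0, D1, and P can be illustrated with a small sketch. It assumes blocks are equal-length byte strings, and the helper names (`xor_blocks`, `encode_parity`, `recover_block`, `update_parity`) are hypothetical, not taken from the thesis. The parity update shown is the generic small-write rule P' = P XOR D_old XOR D_new, included as an assumption about how P would be kept consistent when a data block changes.

```python
# Illustrative sketch only: helper names are hypothetical, blocks are assumed
# to be equal-length byte strings.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Return the bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(d0: bytes, d1: bytes) -> bytes:
    """Compute the in-memory parity block P = D0 XOR D1."""
    return xor_blocks(d0, d1)


def recover_block(surviving: bytes, parity: bytes) -> bytes:
    """Rebuild a lost data block from the surviving one and P,
    because D0 = P XOR D1 and D1 = P XOR D0."""
    return xor_blocks(surviving, parity)


def update_parity(parity: bytes, old: bytes, new: bytes) -> bytes:
    """Assumed small-write update: P' = P XOR D_old XOR D_new,
    so the other data block need not be read."""
    return xor_blocks(parity, xor_blocks(old, new))


if __name__ == "__main__":
    d0, d1 = b"\x0f\xf0", b"\x33\x55"
    p = encode_parity(d0, d1)
    assert recover_block(d1, p) == d0  # the node holding D0 failed
    assert recover_block(d0, p) == d1  # the node holding D1 failed
    new_d0 = b"\xaa\x01"
    # Incremental update must match re-encoding from scratch.
    assert update_parity(p, d0, new_d0) == encode_parity(new_d0, d1)
    print(p.hex())  # prints 3ca5
```

The incremental update is why a parity scheme need not touch every data block on each write. Only the changed block and P are involved.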
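This thesis compares the proposed scheme with another in-memory fault-tolerance scheme, “memory double copy + external memory single copy”. An analysis of time and space overhead verifies the advantages of the “memory parity + external memory single copy” scheme for in-memory data consistency. In the experiments, the two schemes show roughly the same in-memory data update time, while memory space utilization rises from 50% to 66.7%, showing that this fault-tolerance scheme effectively improves overall performance.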
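The 50% and 66.7% figures follow from counting blocks held in memory. In the sketch below, utilization is defined as useful data blocks divided by total blocks in memory. Two points are assumptions: “memory double copy” is taken to mean that every data block is held twice in memory, and copies in external storage are excluded. The generalization to k data blocks with one parity block goes beyond the thesis, which states only the D0/D1 case. Function names are hypothetical.

```python
# Hypothetical space-utilization model for k in-memory data blocks.
# Utilization = useful data blocks / total blocks held in memory;
# external-storage copies are not counted (assumption).
from fractions import Fraction


def utilization_double_copy(k: int) -> Fraction:
    """Each data block is stored twice in memory (original + replica)."""
    return Fraction(k, 2 * k)


def utilization_parity(k: int) -> Fraction:
    """k data blocks plus one XOR parity block are stored in memory."""
    return Fraction(k, k + 1)


if __name__ == "__main__":
    k = 2  # D0 and D1, the example used in the abstract
    print(f"{float(utilization_double_copy(k)):.1%}")  # prints 50.0%
    print(f"{float(utilization_parity(k)):.1%}")       # prints 66.7%
```

Under this model the parity scheme's advantage grows with k, because one parity block is amortized over more data blocks. Fault tolerance stays at one lost in-memory block per parity group.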
Keywords: erasure code; distributed memory; data fault tolerance; data consistency
Abstract
With the advent of the “Internet+” era, the amount of data newly generated in networked environments every eighteen months is approximately equal to the total amount of data generated in all previous time. This explosive growth places higher demands on storage systems in terms of I/O performance, data fault tolerance, storage security, and so on. Traditional centralized storage systems can no longer meet the need to store massive amounts of data, and distributed storage systems have emerged in response.
To improve the access performance of distributed storage systems, a growing technical trend is to cache more and more data in memory, thereby forming a distributed memory system. As distributed memory continues to grow in scale, so does the amount of data held in memory. However, incidents such as power failures, fires, hardware faults, and virus attacks can cause storage nodes to fail, increasing the risk of data loss across the entire system. To provide fault tolerance for in-memory data, this thesis proposes a scheme called “memory parity + external memory single copy”. Taking the storage of data blocks D0 and D1 as an example, the in-memory parity block P is obtained by XORing the source blocks D0 and D1, while one copy of D0 and D1 is stored in external memory. This scheme not only supports fault tolerance for in-memory data but also improves memory space utilization.
The proposed scheme is compared with another in-memory fault-tolerance scheme, “memory double copy + external memory single copy”. An analysis of time and space overhead demonstrates its advantages for in-memory data consistency. In the experiments, the two schemes have essentially the same in-memory data update time, but memory space utilization improves from 50% to 66.7%, showing that the proposed scheme effectively improves overall performance.
Keywords: Erasure code; Distributed memory; Data fault tolerance; Data consistency
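Contents
Chapter 1 Introduction
1.1 Research Background and Significance
1.1.1 Research Background
1.1.2 Research Significance
1.1.3 State of Research in China and Abroad
1.2 Contents of This Thesis
1.3 Organization of This Thesis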
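Chapter 2 Technical Background and Choice of Technical Approach
2.1 Overview of Memory Management Mechanisms in Storage Clusters
2.1.1 Analysis of Typical Memory Management Mechanisms
2.1.2 Overview of Common Memory Managers
2.2 Overview of Erasure Codes
2.2.1 Erasure Codes versus Full Replication
2.2.2 Principles of Erasure Codes
2.2.3 Typical Erasure Codes
2.2.4 Classification of RS Erasure Codes
2.3 In-Memory Data Update Schemes
2.3.1 Small-Write Updates
2.3.2 The DUM Update Scheme
2.3.3 The PUM Update Scheme
2.3.4 Analysis and Comparison of the Two Schemes
2.4 Overview of Unix Network Programming
2.4.1 File I/O
2.4.2 Threads
2.4.3 Socket Principles and Analysis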
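Chapter 3 Design and Implementation of Fault-Tolerance Models for Distributed Memory Data
3.1 The “Memory Double Copy + External Memory Single Copy” Fault-Tolerance Model
3.1.1 System Initialization
3.1.2 In-Memory Data Update Process
3.2 The “Memory Parity + External Memory Single Copy” Fault-Tolerance Model
3.2.1 System Initialization
3.2.2 In-Memory Data Update Process
3.3 Comparative Analysis of the Schemes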
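Chapter 4 Experimental Evaluation
4.1 Experimental Tools
4.2 Experimental Platform
4.3 Experimental Parameters
4.4 Functional Testing
4.5 Performance Testing
4.6 Analysis of Results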
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Acknowledgments