Set replication in Hadoop

I was experimenting with loading a file into HDFS using the Hadoop API.

Since this is only an experiment, I want to set the replication factor to the minimum. I first tried this with FileSystem.setReplication():

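import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration config = new Configuration();
config.set("fs.defaultFS", "hdfs://192.168.248.166:8020");
FileSystem dfs2 = FileSystem.get(config);
Path src2 = new Path("C:\\Users\\abc\\Desktop\\testfile.txt");
Path dst2 = new Path(dfs2.getWorkingDirectory(), "tempdir");
// The copy uses the client's default replication factor.
dfs2.copyFromLocalFile(src2, dst2);
// Change the replication factor of the file that now exists on HDFS.
dfs2.setReplication(dst2, (short) 1);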

The replication factor was reported as 1, but the file was still available on 3 datanodes.

When I tried it with Configuration.set():

Configuration config = new Configuration();
config.set("fs.defaultFS", "hdfs://192.168.248.166:8020");
// Set the replication factor before any file is written.
config.set("dfs.replication", "1");
FileSystem dfs2 = FileSystem.get(config);
Path src2 = new Path("C:\\Users\\abc\\Desktop\\testfile.txt");
Path dst2 = new Path(dfs2.getWorkingDirectory(), "tempdir");
// Copy as in the first attempt; the file is now written with replication 1.
dfs2.copyFromLocalFile(src2, dst2);

This gave the desired outcome: 1 replica, available on 1 datanode.

Why are there two APIs for the same thing? What is the difference between them?

The difference is that FileSystem.setReplication() changes the replication factor of a file that already exists on HDFS. In your case, you first copy the local file testfile.txt to HDFS using the default replication factor (3), and only then change the replication factor of that file to 1. After this call, it takes a while until the excess replicas of the over-replicated blocks are deleted, which is why the file was still present on 3 datanodes when you checked.

On the other hand, when you call config.set("dfs.replication", "1") before obtaining the FileSystem, the setting is already in place when you copy the local file, so each block is written only once from the start.

In other words, I believe (but I might be wrong) that both approaches have the same final result, but with the first one you have to wait a little until the extra replicas are removed.
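
To make the timing difference concrete, here is a minimal toy model in plain Java. It is not Hadoop code and does not reflect how the NameNode is implemented: the class ToyNameNode and its methods write, setReplication and reconcile are hypothetical names invented for this sketch. It only assumes what is described above, namely that changing the replication factor of an existing file updates the target immediately while the stored replicas are adjusted later by a background step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only (hypothetical, not Hadoop code): target vs. stored replica counts per file. */
class ToyNameNode {
    private final Map<String, Integer> target = new LinkedHashMap<>(); // requested replication factor
    private final Map<String, Integer> actual = new LinkedHashMap<>(); // replicas currently stored

    /** Writing a file creates its replicas right away at the given factor. */
    void write(String path, int replication) {
        target.put(path, replication);
        actual.put(path, replication);
    }

    /** Changes only the target; stored replicas stay as they are until reconcile() runs. */
    void setReplication(String path, int replication) {
        if (!target.containsKey(path)) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        target.put(path, replication);
    }

    /** Stand-in for the background step that brings stored replicas in line with the target. */
    void reconcile() {
        actual.replaceAll((path, count) -> target.get(path));
    }

    int targetReplication(String path) { return target.get(path); }

    int actualReplicas(String path) { return actual.get(path); }
}

class ReplicationDemo {
    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();

        // Approach 1: write with the default factor (3), then lower it.
        nn.write("/tempdir", 3);
        nn.setReplication("/tempdir", 1);
        System.out.println(nn.targetReplication("/tempdir") + " requested, "
                + nn.actualReplicas("/tempdir") + " stored"); // 1 requested, 3 stored
        nn.reconcile();
        System.out.println(nn.actualReplicas("/tempdir") + " stored after cleanup"); // 1 stored after cleanup

        // Approach 2: the factor is already 1 when the file is written.
        nn.write("/tempdir2", 1);
        System.out.println(nn.actualReplicas("/tempdir2") + " stored"); // 1 stored
    }
}
```

On a real cluster, the stored replicas of a file can be inspected with FileSystem.getFileBlockLocations() or with hdfs fsck and its -files -blocks -locations options, which is one way to see when the extra copies have actually gone.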
