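The HDP 2.4 installation series covered creating an HBase cluster with Ambari. My day-to-day work, however, is on the .NET stack, so how can .NET code access the Java-based HBase? HBase offers a native Java API and also exposes web-accessible APIs through Thrift and REST. I therefore decided to build a .NET SDK that talks to HBase through its REST web API, using protobuf as the data exchange format between C# and Java.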

Contents:

  • References
  • How it works
  • Data exchange between C# and Java
  • HBase filter entities
  • WebRequester
  • HBaseClient

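References:

How it works:

  • HBase REST is a web service layered on top of the HBase Java client: HTTP requests arrive at the REST server, which forwards them to the cluster through the Java client API.
  • The HBase REST server is started and stopped with the start/stop commands, for example:
    1. Command: hbase rest start   (starts the REST service the default way, on port 8080)
    2. Command: hbase rest start -p 9000 (starts it on port 9000)
    3. Command: hbase-daemon.sh start rest -p 9000 (starts it as a background daemon on port 9000)
  • When the service starts, the embedded Jetty servlet container starts and deploys the servlet. The service listens on port 8080 by default; a different port can be set in the HBase configuration file (the hbase.rest.port property in hbase-site.xml).
  • The requirement in brief: wrap the REST URLs and the parameters they carry in C#. A request URL such as http://192.168.2.21:8080/yourTable/schema has the following parts:
    1. http://192.168.2.21: the IP address of the HBase master (the host running the REST server)
    2. 8080: the port of the HBase REST server
    3. yourTable: the name of the HBase table to operate on
    4. schema/regions/scanner: reserved keywords that select the operation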
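
As a minimal sketch of how these parts could be combined in C#, the snippet below builds a request URL from a base address, a table name and a keyword. RestUrl and Build are hypothetical names. The EndPointType constants mirror identifiers used by the client code later in this post, whose definitions the post does not show; their values are assumed from the keywords listed above.

```csharp
using System;

// Keyword constants; the values are assumed from the URL keywords listed above.
public static class EndPointType
{
    public const string Schema = "schema";
    public const string Regions = "regions";
    public const string Scanner = "scanner";
    public const string Version = "version";
}

// Hypothetical helper, not part of HBase or of the SDK described in this post.
public static class RestUrl
{
    // Joins base URL, table name and keyword with single slashes.
    public static string Build(string baseUrl, string table, string keyword)
    {
        if (string.IsNullOrEmpty(baseUrl))
            throw new ArgumentException("baseUrl is required", "baseUrl");
        if (string.IsNullOrEmpty(table))
            throw new ArgumentException("table is required", "table");

        // TrimEnd avoids a double slash when the base URL already ends with one.
        return string.Format("{0}/{1}/{2}", baseUrl.TrimEnd('/'), table, keyword);
    }
}
```

Calling RestUrl.Build("http://192.168.2.21:8080", "yourTable", EndPointType.Schema) yields the schema URL for yourTable. Keeping the keywords in one place means the client methods never hard-code path fragments.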

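Data exchange between C# and Java via protobuf:

  • HBase defines a fixed set of data structures for protobuf-based exchange between Java and other languages (see the REST Protobufs Schema description on the HBase website). Many tools can generate Java or C# source from the protobuf schema files. Because both sides follow the same schema, data can be exchanged and shared across platforms: the tooling adapts a platform-neutral format to platform- and language-specific data objects, much as XML parsers do.
  • Schema files use the .proto extension and look like this:
    package org.apache.hadoop.hbase.rest.protobuf.generated;
    
    message TableInfo {
      required string name = 1;
      message Region {
        required string name = 1;
        optional bytes startKey = 2;
        optional bytes endKey = 3;
        optional int64 id = 4;
        optional string location = 5;
      }
      repeated Region regions = 2;
    }
    1. package: the package name of the file in Java, and the namespace of the file in C#
    2. message: defines a class
    3. required: the field must be set
    4. optional: the field may be omitted, and a default value can be specified for it
  • Download the Windows build of the conversion tool from GitHub, unzip it, and copy ProtoGen.exe.config, protoc.exe, ProtoGen.exe and Google.ProtocolBuffers.dll into a new folder (for example c:\zhu)
  • Copy the HBase schema files into the same folder (they are in the HBase source tree under \hbase\hbase-rest\src\main\resources\org\apache\hadoop\hbase\rest\protobuf)
  • Taking TableInfoMessage.proto as the example, open a command prompt on Windows and change to the c:\zhu folder
  • Run: protoc --descriptor_set_out=TableInfoMessage.protobin --include_imports TableInfoMessage.proto
  • After this command, c:\zhu contains a new file named TableInfoMessage.protobin
  • Run: protogen TableInfoMessage.protobin  (this generates TableInfoMessage.cs in the folder, which is the C# source)
  • You can of course put these commands in a batch file. When it finishes, add the 9 generated files to your Visual Studio project and they are ready to use.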
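
The batch step above can be sketched in C# as follows. This is an illustration only: ProtoBatch and BuildCommands are hypothetical names, and the helper only produces the command lines for each schema file (the same two commands shown above); it does not run them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper that builds the protoc and protogen command lines
// for every .proto file it is given. It does not execute anything.
public static class ProtoBatch
{
    public static List<string> BuildCommands(IEnumerable<string> protoFiles)
    {
        var commands = new List<string>();
        foreach (string file in protoFiles)
        {
            // "TableInfoMessage.proto" -> "TableInfoMessage"
            string name = Path.GetFileNameWithoutExtension(file);

            // Step 1: compile the schema into a binary descriptor set.
            commands.Add(string.Format(
                "protoc --descriptor_set_out={0}.protobin --include_imports {0}.proto", name));

            // Step 2: generate the C# source from the descriptor set.
            commands.Add(string.Format("protogen {0}.protobin", name));
        }
        return commands;
    }
}
```

One way to use it is to pass Directory.GetFiles(@"c:\zhu", "*.proto") and write the returned lines to a .bat file with File.WriteAllLines, then run that file from the c:\zhu folder.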

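HBase filter entities:

  • These are the filter parameters set when reading data from HBase. Port them to C# by following the Java sources under hbase\hbase-client\src\main\java\org\apache\hadoop\hbase\filter; the result is one C# class per Java filter.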
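
To give an idea of what one ported filter might look like, here is a minimal sketch of a C# counterpart to the Java PrefixFilter. It is not the author's code. It assumes the REST scanner takes the filter as a JSON string containing the filter type name and the Base64-encoded prefix; verify that format against the HBase REST documentation for the version in use. ToEncodedString is a hypothetical method name.

```csharp
using System;
using System.Text;

// Hypothetical C# counterpart of org.apache.hadoop.hbase.filter.PrefixFilter.
public sealed class PrefixFilter
{
    private readonly byte[] _prefix;

    public PrefixFilter(byte[] prefix)
    {
        if (prefix == null) throw new ArgumentNullException("prefix");
        // Copy the bytes so later changes by the caller do not affect the filter.
        _prefix = (byte[])prefix.Clone();
    }

    // Assumed wire form: a JSON object with the filter type name and
    // the prefix bytes encoded as Base64.
    public string ToEncodedString()
    {
        return string.Format(
            "{{\"type\":\"PrefixFilter\",\"value\":\"{0}\"}}",
            Convert.ToBase64String(_prefix));
    }
}
```

A filter built as new PrefixFilter(Encoding.UTF8.GetBytes("row")) would then be assigned, in its encoded string form, to the filter field of the Scanner message before the scanner is created.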

WebRequester:

  • The WebRequester class wraps the HTTP requests

    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using System.Threading.Tasks;

    public class WebRequester
    {
        private string url = string.Empty;

        /// <summary>
        /// Creates a requester bound to the base URL of the HBase REST server.
        /// </summary>
        /// <param name="urlString">Base URL, for example http://192.168.2.21:8080.</param>
        public WebRequester(string urlString)
        {
            this.url = urlString;
        }

        /// <summary>
        /// Issues the web request and blocks until the response arrives.
        /// </summary>
        /// <param name="endpoint">The endpoint.</param>
        /// <param name="method">The method.</param>
        /// <param name="input">The input.</param>
        /// <param name="options">request options</param>
        /// <returns>The HTTP response.</returns>
        public HttpWebResponse IssueWebRequest(string endpoint, string method, Stream input, RequestOptions options)
        {
            return IssueWebRequestAsync(endpoint, method, input, options).Result;
        }

        /// <summary>
        /// Issues the web request asynchronously.
        /// </summary>
        /// <param name="endpoint">The endpoint.</param>
        /// <param name="method">The method.</param>
        /// <param name="input">The input.</param>
        /// <param name="options">request options</param>
        /// <returns>The HTTP response.</returns>
        public async Task<HttpWebResponse> IssueWebRequestAsync(string endpoint, string method, Stream input, RequestOptions options)
        {
            string uri = string.Format("{0}/{1}", this.url, endpoint);
            HttpWebRequest httpWebRequest = HttpWebRequest.CreateHttp(uri);
            httpWebRequest.Timeout = options.TimeoutMillis;
            httpWebRequest.PreAuthenticate = true;
            httpWebRequest.Method = method;
            httpWebRequest.ContentType = options.ContentType;

            if (options.AdditionalHeaders != null)
            {
                foreach (var kv in options.AdditionalHeaders)
                {
                    httpWebRequest.Headers.Add(kv.Key, kv.Value);
                }
            }

            // Copy the serialized request body, if any, into the request stream.
            if (input != null)
            {
                using (Stream req = await httpWebRequest.GetRequestStreamAsync())
                {
                    await input.CopyToAsync(req);
                }
            }

            return (await httpWebRequest.GetResponseAsync()) as HttpWebResponse;
        }
    }
  • The entity class that holds the HTTP request options
    public class RequestOptions
    {
        public string AlternativeEndpoint { get; set; }
        public bool KeepAlive { get; set; }
        public int TimeoutMillis { get; set; }
        public int SerializationBufferSize { get; set; }
        public int ReceiveBufferSize { get; set; }
        public bool UseNagle { get; set; }
        public int Port { get; set; }
        public Dictionary<string, string> AdditionalHeaders { get; set; }
        public string AlternativeHost { get; set; }
        public string ContentType { get; set; }

        public static RequestOptions GetDefaultOptions()
        {
            // The numeric literals below are assumed values; adjust them to your environment.
            return new RequestOptions()
            {
                KeepAlive = true,
                TimeoutMillis = 30000,                      // assumed: 30 seconds
                ReceiveBufferSize = 1024 * 1024 * 1,        // assumed: 1 MB
                SerializationBufferSize = 1024 * 1024 * 1,  // assumed: 1 MB
                UseNagle = false,
                //AlternativeEndpoint = Constants.RestEndpointBase,
                //Port = 443,
                AlternativeEndpoint = string.Empty,
                Port = 8080,                                // assumed: default REST server port
                AlternativeHost = null,
                ContentType = "application/x-protobuf"
            };
        }
    }

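HBaseClient:

  • Define IHBaseClient, the interface for common HBase operations (table operations plus reading and writing data), for example:

    public interface IHBaseClient
    {
        /// <summary>
        /// Gets the version information reported by the REST server.
        /// </summary>
        /// <param name="options">Request options.</param>
        /// <returns>The version message.</returns>
        Task<org.apache.hadoop.hbase.rest.protobuf.generated.Version> GetVersionAsync(RequestOptions options = null);

        /// <summary>
        /// Creates a table from the given schema.
        /// </summary>
        /// <param name="schema">Schema of the table to create.</param>
        /// <param name="options">Request options.</param>
        /// <returns>true if the table was created, false if it already existed.</returns>
        Task<bool> CreateTableAsync(TableSchema schema, RequestOptions options = null);

        /// <summary>
        /// Deletes a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="options">Request options.</param>
        /// <returns>A task that completes when the table has been deleted.</returns>
        Task DeleteTableAsync(string table, RequestOptions options = null);

        /// <summary>
        /// Gets the region information of a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="options">Request options.</param>
        /// <returns>The table info message.</returns>
        Task<TableInfo> GetTableInfoAsync(string table, RequestOptions options = null);

        /// <summary>
        /// Gets the schema of a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="options">Request options.</param>
        /// <returns>The table schema message.</returns>
        Task<TableSchema> GetTableSchemaAsync(string table, RequestOptions options = null);

        /// <summary>
        /// Lists the tables.
        /// </summary>
        /// <param name="options">Request options.</param>
        /// <returns>The table list message.</returns>
        Task<TableList> ListTablesAsync(RequestOptions options = null);

        /// <summary>
        /// Creates a scanner on a table.
        /// </summary>
        /// <param name="tableName">Name of the table to scan.</param>
        /// <param name="scannerSettings">Scanner settings such as batch size and filter.</param>
        /// <param name="options">Request options.</param>
        /// <returns>Information identifying the created scanner.</returns>
        Task<ScannerInformation> CreateScannerAsync(string tableName, Scanner scannerSettings, RequestOptions options);

        /// <summary>
        /// Reads the next batch of cells from a scanner.
        /// </summary>
        /// <param name="scannerInfo">The scanner returned by CreateScannerAsync.</param>
        /// <param name="options">Request options.</param>
        /// <returns>The next cells, or null when the response status is not 200.</returns>
        Task<CellSet> ScannerGetNextAsync(ScannerInformation scannerInfo, RequestOptions options);

        /// <summary>
        /// Writes cells to a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="cells">The cells to store.</param>
        /// <param name="options">Request options.</param>
        /// <returns>true if the cells were stored, false if the server answered 304 Not Modified.</returns>
        Task<bool> StoreCellsAsync(string table, CellSet cells, RequestOptions options = null);
    }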
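  • Implement the interface in the HBaseClient class
    using System;
    using System.IO;
    using System.Net;
    using System.Threading.Tasks;
    using org.apache.hadoop.hbase.rest.protobuf.generated;
    // Assumption: Serializer used below is protobuf-net's ProtoBuf.Serializer.
    using ProtoBuf;

    public class HBaseClient : IHBaseClient
    {
        private WebRequester _requester;
        private readonly RequestOptions _globalRequestOptions;

        /// <summary>
        /// Creates a client for the HBase REST server at the given URL.
        /// </summary>
        /// <param name="url">Base URL of the REST server.</param>
        /// <param name="globalRequestOptions">Options used when a call passes none; defaults to RequestOptions.GetDefaultOptions().</param>
        public HBaseClient(string url, RequestOptions globalRequestOptions = null)
        {
            _globalRequestOptions = globalRequestOptions ?? RequestOptions.GetDefaultOptions();
            _requester = new WebRequester(url);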
        }

        /// <summary>
        /// Gets the version information reported by the REST server.
        /// </summary>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>The version message.</returns>
        public async Task<org.apache.hadoop.hbase.rest.protobuf.generated.Version> GetVersionAsync(RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            return await GetRequestAndDeserializeAsync<org.apache.hadoop.hbase.rest.protobuf.generated.Version>(EndPointType.Version, optionToUse);
        }

        /// <summary>
        /// Creates a table from the given schema.
        /// </summary>
        /// <param name="schema">Schema of the table to create; schema.name must be set.</param>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>true if the table was created (201), false if it already existed (200).</returns>
        public async Task<bool> CreateTableAsync(TableSchema schema, RequestOptions options = null)
        {
            if (string.IsNullOrEmpty(schema.name))
                throw new ArgumentException("schema.name was either null or empty!", "schema");

            var optionToUse = options ?? _globalRequestOptions;
            string endpoint = string.Format("{0}/{1}", schema.name, EndPointType.Schema);
            using (HttpWebResponse webResponse = await PutRequestAsync(endpoint, schema, optionToUse))
            {
                if (webResponse.StatusCode == HttpStatusCode.Created)
                {
                    return true;
                }

                // table already exists
                if (webResponse.StatusCode == HttpStatusCode.OK)
                {
                    return false;
                }

                // throw the exception otherwise
                using (var output = new StreamReader(webResponse.GetResponseStream()))
                {
                    string message = output.ReadToEnd();
                    throw new WebException(
                        string.Format("Couldn't create table {0}! Response code was: {1}, expected either 200 or 201! Response body was: {2}",
                            schema.name, webResponse.StatusCode, message));
                }
            }
        }

        /// <summary>
        /// Deletes a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>A task that completes when the table has been deleted.</returns>
        public async Task DeleteTableAsync(string table, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", table, EndPointType.Schema);

            using (HttpWebResponse webResponse = await ExecuteMethodAsync<HttpWebResponse>(WebMethod.Delete, endPoint, null, optionToUse))
            {
                if (webResponse.StatusCode != HttpStatusCode.OK)
                {
                    using (var output = new StreamReader(webResponse.GetResponseStream()))
                    {
                        string message = output.ReadToEnd();
                        throw new WebException(
                            string.Format("Couldn't delete table {0}! Response code was: {1}, expected 200! Response body was: {2}",
                                table, webResponse.StatusCode, message));
                    }
                }
            }
        }

        /// <summary>
        /// Gets the region information of a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>The table info message.</returns>
        public async Task<TableInfo> GetTableInfoAsync(string table, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", table, EndPointType.Regions);
            return await GetRequestAndDeserializeAsync<TableInfo>(endPoint, optionToUse);
        }

        /// <summary>
        /// Gets the schema of a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>The table schema message.</returns>
        public async Task<TableSchema> GetTableSchemaAsync(string table, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", table, EndPointType.Schema);
            return await GetRequestAndDeserializeAsync<TableSchema>(endPoint, optionToUse);
        }

        /// <summary>
        /// Lists the tables.
        /// </summary>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>The table list message.</returns>
        public async Task<TableList> ListTablesAsync(RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            return await GetRequestAndDeserializeAsync<TableList>("", optionToUse);
        }

        /// <summary>
        /// Creates a scanner on a table.
        /// </summary>
        /// <param name="tableName">Name of the table to scan.</param>
        /// <param name="scannerSettings">Scanner settings such as batch size and filter.</param>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>Information identifying the created scanner.</returns>
        public async Task<ScannerInformation> CreateScannerAsync(string tableName, Scanner scannerSettings, RequestOptions options)
        {
            // Fall back to the client-wide options so a null argument does not cause a NullReferenceException.
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", tableName, EndPointType.Scanner);

            using (HttpWebResponse response = await ExecuteMethodAsync(WebMethod.Post, endPoint, scannerSettings, optionToUse))
            {
                if (response.StatusCode != HttpStatusCode.Created)
                {
                    using (var output = new StreamReader(response.GetResponseStream()))
                    {
                        string message = output.ReadToEnd();
                        throw new WebException(
                            string.Format("Couldn't create a scanner for table {0}! Response code was: {1}, expected 201! Response body was: {2}",
                                tableName, response.StatusCode, message));
                    }
                }

                // The server returns the URL of the new scanner in the Location header.
                string location = response.Headers.Get("Location");
                if (location == null)
                {
                    throw new ArgumentException("Couldn't find header 'Location' in the response!");
                }

                return new ScannerInformation(new Uri(location), tableName, response.Headers);
            }
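        }

        /// <summary>
        /// Reads the next batch of cells from a scanner.
        /// </summary>
        /// <param name="scannerInfo">The scanner returned by CreateScannerAsync.</param>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>The next cells, or null when the response status is not 200.</returns>
        public async Task<CellSet> ScannerGetNextAsync(ScannerInformation scannerInfo, RequestOptions options)
        {
            // Fall back to the client-wide options so a null argument does not cause a NullReferenceException.
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}/{2}", scannerInfo.TableName, EndPointType.Scanner, scannerInfo.ScannerId);
            using (HttpWebResponse webResponse = await GetRequestAsync(endPoint, optionToUse))
            {
                if (webResponse.StatusCode == HttpStatusCode.OK)
                {
                    return Serializer.Deserialize<CellSet>(webResponse.GetResponseStream());
                }

                return null;
            }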
        }

        /// <summary>
        /// Writes cells to a table.
        /// </summary>
        /// <param name="table">Name of the table.</param>
        /// <param name="cells">The cells to store.</param>
        /// <param name="options">Request options; the client-wide options are used when null.</param>
        /// <returns>true if the cells were stored, false if the server answered 304 Not Modified.</returns>
        public async Task<bool> StoreCellsAsync(string table, CellSet cells, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            // The row key in the path is a placeholder; the row keys come from the CellSet itself.
            string path = table + "/somefalsekey";
            // Pass optionToUse (not options), otherwise a null argument reaches the request code.
            using (HttpWebResponse webResponse = await PutRequestAsync(path, cells, optionToUse))
            {
                if (webResponse.StatusCode == HttpStatusCode.NotModified)
                {
                    return false;
                }

                if (webResponse.StatusCode != HttpStatusCode.OK)
                {
                    using (var output = new StreamReader(webResponse.GetResponseStream()))
                    {
                        string message = output.ReadToEnd();
                        throw new WebException(
                            string.Format("Couldn't insert into table {0}! Response code was: {1}, expected 200! Response body was: {2}",
                                table, webResponse.StatusCode, message));
                    }
                }
            }
            return true;
        }

        /// <summary>
        /// Issues a GET request and deserializes the protobuf response body.
        /// </summary>
        /// <typeparam name="T">Type of the protobuf message to deserialize.</typeparam>
        /// <param name="endpoint">Path relative to the base URL.</param>
        /// <param name="options">Request options.</param>
        /// <returns>The deserialized message.</returns>
        private async Task<T> GetRequestAndDeserializeAsync<T>(string endpoint, RequestOptions options)
        {
            using (WebResponse response = await _requester.IssueWebRequestAsync(endpoint, WebMethod.Get, null, options))
            {
                using (Stream responseStream = response.GetResponseStream())
                {
                    return Serializer.Deserialize<T>(responseStream);
                }
            }
        }

        /// <summary>
        /// Sends a protobuf message to the given endpoint. Note that the request is issued with WebMethod.Post.
        /// </summary>
        /// <typeparam name="TReq">Type of the protobuf message to send.</typeparam>
        /// <param name="endpoint">Path relative to the base URL.</param>
        /// <param name="request">The message to serialize into the request body.</param>
        /// <param name="options">Request options.</param>
        /// <returns>The HTTP response.</returns>
        private async Task<HttpWebResponse> PutRequestAsync<TReq>(string endpoint, TReq request, RequestOptions options)
            where TReq : class
        {
            return await ExecuteMethodAsync(WebMethod.Post, endpoint, request, options);
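        }

        /// <summary>
        /// Serializes the request message, if any, and issues the HTTP request.
        /// </summary>
        /// <typeparam name="TReq">Type of the protobuf message to send.</typeparam>
        /// <param name="method">HTTP method.</param>
        /// <param name="endpoint">Path relative to the base URL.</param>
        /// <param name="request">The message to serialize into the request body; may be null.</param>
        /// <param name="options">Request options.</param>
        /// <returns>The HTTP response.</returns>
        private async Task<HttpWebResponse> ExecuteMethodAsync<TReq>(string method, string endpoint, TReq request, RequestOptions options) where TReq : class
        {
            using (var input = new MemoryStream(options.SerializationBufferSize))
            {
                if (request != null)
                {
                    Serializer.Serialize(input, request);
                }
                // Rewind so the requester copies the body from the beginning of the stream.
                input.Seek(0, SeekOrigin.Begin);
                return await _requester.IssueWebRequestAsync(endpoint, method, input, options);
            }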
        }

        /// <summary>
        /// Issues a GET request and returns the raw response.
        /// </summary>
        /// <param name="endpoint">Path relative to the base URL.</param>
        /// <param name="options">Request options.</param>
        /// <returns>The HTTP response.</returns>
        private async Task<HttpWebResponse> GetRequestAsync(string endpoint, RequestOptions options)
        {
            return await _requester.IssueWebRequestAsync(endpoint, WebMethod.Get, null, options);
        }
    }
  • Work through the code above step by step; once it compiles, the SDK is done. The next post tests and verifies it.
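
The client above uses a ScannerInformation type that this post does not show. The sketch below is one possible shape for it, not the author's implementation. It assumes the scanner ID is the last path segment of the URL in the Location header; the property name ResponseHeaderCollection is hypothetical, while TableName and ScannerId are the members the client code reads.

```csharp
using System;
using System.Net;

// Hypothetical shape of the ScannerInformation type used by HBaseClient.
public class ScannerInformation
{
    public Uri Location { get; private set; }
    public string TableName { get; private set; }
    public WebHeaderCollection ResponseHeaderCollection { get; private set; }

    public ScannerInformation(Uri location, string tableName, WebHeaderCollection responseHeaders)
    {
        if (location == null) throw new ArgumentNullException("location");
        Location = location;
        TableName = tableName;
        ResponseHeaderCollection = responseHeaders;
    }

    // Assumption: the scanner ID is the last path segment of the Location URL.
    public string ScannerId
    {
        get
        {
            string path = Location.AbsolutePath.TrimEnd('/');
            return path.Substring(path.LastIndexOf('/') + 1);
        }
    }
}
```

Deriving the ID from the stored Location keeps the type a plain value holder: CreateScannerAsync fills it from the response, and ScannerGetNextAsync only needs TableName and ScannerId to rebuild the scanner URL.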
