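The HDP 2.4 installation series described how to create an HBase cluster with Ambari. My work, however, has always been on the .NET stack, so how do you access HBase, which is built on Java? HBase offers a native Java API and also exposes web-accessible APIs through Thrift and REST. So I decided to develop a .NET SDK that talks to HBase through its REST web API, using protobuf as the data exchange format between C# and Java.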

Contents:

  • References
  • Basic principles
  • C#/Java data exchange
  • HBase filter entities
  • WebRequester
  • hbaseClient

References:

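Basic principles:

  • HBase REST is a web service built on top of the HBase Java client. [Figure: HBase REST architecture diagram]
  • The HBase REST server can be started and stopped with the start/stop commands, for example:
    1. Command: hbase rest start   (starts the REST service the default way, on port 8080)
    2. Command: hbase rest start -p 9000 (starts it on port 9000)
    3. Command: hbase-daemon.sh start rest -p 9000
  • When the service starts, the embedded Jetty servlet container starts and deploys the servlet. The default listening port is 8080; it can be changed in the HBase configuration file (the hbase.rest.port property in hbase-site.xml).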
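  • The requirement in brief: wrap the following URL parts and request parameters in C#
    1. http://192.168.2.21: the IP address of the HBase master
    2. 8080: the port of the HBase REST server
    3. yourTable: the name of the HBase table to operate on
    4. schema/regions/scanner: reserved keywords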
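
The four parts listed above combine into a single request URL. The following is a minimal sketch of that composition; the `RestUrl` class and its `Build` method are hypothetical names introduced here for illustration and are not part of the SDK shown later.

```csharp
using System;

public static class RestUrl
{
    // Joins base address, port, table name and keyword into one REST URL.
    // Hypothetical helper, shown only to illustrate the URL layout.
    public static string Build(string host, int port, string table, string keyword)
    {
        // Escape the table name so unusual characters cannot break the path.
        return string.Format("{0}:{1}/{2}/{3}",
            host.TrimEnd('/'), port, Uri.EscapeDataString(table), keyword);
    }
}
```

For example, `RestUrl.Build("http://192.168.2.21", 8080, "yourTable", "schema")` yields the address used to read or create the schema of `yourTable`.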

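C#/Java data exchange via protobuf:

  • HBase defines a specific set of data structures for exchanging data between Java and other languages over protobuf (see the REST Protobufs Schema description on the HBase website). Plenty of tools can generate Java or C# source code from the protobuf schema files. Both sides follow the same data format, which makes cross-platform data exchange and sharing possible. In effect it is an adapter between a platform-neutral format and platform- and language-specific data objects, the same principle many XML parsers follow.
  • Protocol files have the .proto extension and look like the following example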
    package org.apache.hadoop.hbase.rest.protobuf.generated;

    message TableInfo {
      required string name = 1;
      message Region {
        required string name = 1;
        optional bytes startKey = 2;
        optional bytes endKey = 3;
        optional int64 id = 4;
        optional string location = 5;
      }
      repeated Region regions = 2;
    }
    1. package: the package name in Java; in C# it becomes the namespace
    2. message: represents a class
    3. required: the field must be set
    4. optional: the field may be omitted, and a default value can be declared for it
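  • Download the Windows build of the conversion tool from GitHub, unpack it, and put ProtoGen.exe.config, protoc.exe, ProtoGen.exe and Google.ProtocolBuffers.dll into a new folder (for example c:\zhu)
  • Copy the protocol files defined by HBase into the same folder (the files under \hbase\hbase-rest\src\main\resources\org\apache\hadoop\hbase\rest\protobuf in the HBase source package)
  • Taking TableInfoMessage.proto as the example, open a command prompt on Windows and change to the c:\zhu folder
  • Run: protoc --descriptor_set_out=TableInfoMessage.protobin --include_imports TableInfoMessage.proto
  • After this command, a TableInfoMessage.protobin file appears in c:\zhu
  • Run: protogen TableInfoMessage.protobin  (a file named TableInfoMessage.cs is generated in the folder; this is the generated C# source)
  • You can of course write a batch script to do this. Add the 9 generated files to your Visual Studio project and they are ready to use.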
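
To make the "platform-neutral format" idea concrete, here is a hand-written sketch of what ends up on the wire for a `TableInfo` that carries only its `name`. It assumes `name` has field number 1, as in HBase's TableInfoMessage.proto; the `TableInfoWire` class is hypothetical and only illustrates the encoding. In practice the generated classes do this work for you.

```csharp
using System;
using System.IO;
using System.Text;

public static class TableInfoWire
{
    // Writes an unsigned integer as a protobuf base-128 varint.
    private static void WriteVarint(Stream s, ulong value)
    {
        while (value >= 0x80)
        {
            s.WriteByte((byte)((value & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            value >>= 7;
        }
        s.WriteByte((byte)value);
    }

    // Encodes a TableInfo that carries only `name` (assumed to be field 1).
    public static byte[] EncodeName(string name)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(name);
        using (var ms = new MemoryStream())
        {
            WriteVarint(ms, (1 << 3) | 2);        // tag: field number 1, wire type 2 (length-delimited)
            WriteVarint(ms, (ulong)utf8.Length);  // payload length in bytes
            ms.Write(utf8, 0, utf8.Length);       // the UTF-8 bytes of the name
            return ms.ToArray();
        }
    }
}
```

A Java reader that follows the same .proto file reads the tag, sees field 1 with a length-delimited payload, and restores the string. Neither side needs to know which language produced the bytes.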

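HBase filter entities:

  • These are the filter parameters set when reading data from HBase. I translated them into C# by following the Java source under hbase\hbase-client\src\main\java\org\apache\hadoop\hbase\filter
  • [Figure: the resulting C# filter classes]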
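
As an illustration of what one translated filter might look like, here is a minimal sketch of a C# counterpart to the Java `PrefixFilter`. The class shape and the `ToEncodedString` method name are assumptions made for this example. The JSON layout (`type` plus a Base64-encoded `value`) is my understanding of what the REST scanner's filter field expects, so verify it against your HBase version.

```csharp
using System;
using System.Text;

// Hypothetical C# counterpart of org.apache.hadoop.hbase.filter.PrefixFilter.
public sealed class PrefixFilter
{
    public byte[] Prefix { get; private set; }

    public PrefixFilter(byte[] prefix)
    {
        if (prefix == null) throw new ArgumentNullException("prefix");
        Prefix = prefix;
    }

    // Renders the filter as JSON text for the scanner's filter field.
    // Assumption: byte values are sent Base64-encoded.
    public string ToEncodedString()
    {
        return string.Format("{{\"type\":\"PrefixFilter\",\"value\":\"{0}\"}}",
            Convert.ToBase64String(Prefix));
    }
}
```

The other filters in the Java package can follow the same pattern: a small immutable class holding the filter's parameters and one method that renders them.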

WebRequester:

  • The WebRequester class wraps the HTTP requests

    using System.IO;
    using System.Net;
    using System.Threading.Tasks;

    public class WebRequester
    {
        private readonly string url;

        /// <summary>Creates a requester bound to the REST server base URL.</summary>
        /// <param name="urlString">Base URL of the REST server.</param>
        public WebRequester(string urlString)
        {
            this.url = urlString;
        }

        /// <summary>
        /// Issues the web request and blocks until the response arrives.
        /// </summary>
        /// <param name="endpoint">The endpoint.</param>
        /// <param name="method">The method.</param>
        /// <param name="input">The input.</param>
        /// <param name="options">request options</param>
        /// <returns>The HTTP response.</returns>
        public HttpWebResponse IssueWebRequest(string endpoint, string method, Stream input, RequestOptions options)
        {
            return IssueWebRequestAsync(endpoint, method, input, options).Result;
        }

        /// <summary>
        /// Issues the web request asynchronous.
        /// </summary>
        /// <param name="endpoint">The endpoint.</param>
        /// <param name="method">The method.</param>
        /// <param name="input">The input.</param>
        /// <param name="options">request options</param>
        /// <returns>The HTTP response.</returns>
        public async Task<HttpWebResponse> IssueWebRequestAsync(string endpoint, string method, Stream input, RequestOptions options)
        {
            string uri = string.Format("{0}/{1}", this.url, endpoint);
            HttpWebRequest httpWebRequest = HttpWebRequest.CreateHttp(uri);
            httpWebRequest.Timeout = options.TimeoutMillis;
            httpWebRequest.PreAuthenticate = true;
            httpWebRequest.Method = method;
            httpWebRequest.ContentType = options.ContentType;

            if (options.AdditionalHeaders != null)
            {
                foreach (var kv in options.AdditionalHeaders)
                {
                    httpWebRequest.Headers.Add(kv.Key, kv.Value);
                }
            }

            // Copy the serialized request body, if any, into the request stream.
            if (input != null)
            {
                using (Stream req = await httpWebRequest.GetRequestStreamAsync())
                {
                    await input.CopyToAsync(req);
                }
            }

            return (await httpWebRequest.GetResponseAsync()) as HttpWebResponse;
        }
    }
  • The entity class holding the HTTP request options
    using System.Collections.Generic;

    public class RequestOptions
    {
        public string AlternativeEndpoint { get; set; }
        public bool KeepAlive { get; set; }
        public int TimeoutMillis { get; set; }
        public int SerializationBufferSize { get; set; }
        public int ReceiveBufferSize { get; set; }
        public bool UseNagle { get; set; }
        public int Port { get; set; }
        public Dictionary<string, string> AdditionalHeaders { get; set; }
        public string AlternativeHost { get; set; }
        public string ContentType { get; set; }

        public static RequestOptions GetDefaultOptions()
        {
            return new RequestOptions()
            {
                KeepAlive = true,
                // Assumed values: 30 s timeout, 1 MB buffers, default REST port 8080.
                // Adjust them for your environment.
                TimeoutMillis = 30000,
                ReceiveBufferSize = 1024 * 1024 * 1,
                SerializationBufferSize = 1024 * 1024 * 1,
                UseNagle = false,
                //AlternativeEndpoint = Constants.RestEndpointBase,
                //Port = 443,
                AlternativeEndpoint = string.Empty,
                Port = 8080,
                AlternativeHost = null,
                ContentType = "application/x-protobuf"
            };
        }
    }

hbaseClient:

  • Define the IHBaseClient interface for common HBase operations (table operations plus reading and writing data), for example

    using System.Threading.Tasks;
    using org.apache.hadoop.hbase.rest.protobuf.generated; // the generated message classes

    public interface IHBaseClient
    {
        /// <summary>Gets the version information of the REST server.</summary>
        Task<org.apache.hadoop.hbase.rest.protobuf.generated.Version> GetVersionAsync(RequestOptions options = null);

        /// <summary>Creates a table; returns false if it already exists.</summary>
        Task<bool> CreateTableAsync(TableSchema schema, RequestOptions options = null);

        /// <summary>Deletes a table.</summary>
        Task DeleteTableAsync(string table, RequestOptions options = null);

        /// <summary>Gets the region information of a table.</summary>
        Task<TableInfo> GetTableInfoAsync(string table, RequestOptions options = null);

        /// <summary>Gets the schema of a table.</summary>
        Task<TableSchema> GetTableSchemaAsync(string table, RequestOptions options = null);

        /// <summary>Lists all tables.</summary>
        Task<TableList> ListTablesAsync(RequestOptions options = null);

        /// <summary>Creates a scanner on a table.</summary>
        Task<ScannerInformation> CreateScannerAsync(string tableName, Scanner scannerSettings, RequestOptions options);

        /// <summary>Reads the next batch of cells from a scanner.</summary>
        Task<CellSet> ScannerGetNextAsync(ScannerInformation scannerInfo, RequestOptions options);

        /// <summary>Stores a set of cells into a table.</summary>
        Task<bool> StoreCellsAsync(string table, CellSet cells, RequestOptions options = null);
    }
  • The HBaseClient class implements the interface
    using System;
    using System.IO;
    using System.Net;
    using System.Threading.Tasks;
    using ProtoBuf; // assumption: Serializer comes from protobuf-net
    using org.apache.hadoop.hbase.rest.protobuf.generated; // the generated message classes

    public class HBaseClient : IHBaseClient
    {
        private readonly WebRequester _requester;
        private readonly RequestOptions _globalRequestOptions;

        /// <summary>Creates a client for the REST server at the given base URL.</summary>
        /// <param name="url">Base URL of the HBase REST server.</param>
        /// <param name="globalRequestOptions">Options used when a call passes none.</param>
        public HBaseClient(string url, RequestOptions globalRequestOptions = null)
        {
            _globalRequestOptions = globalRequestOptions ?? RequestOptions.GetDefaultOptions();
            _requester = new WebRequester(url);
        }

        /// <summary>Gets the version information of the REST server.</summary>
        public async Task<org.apache.hadoop.hbase.rest.protobuf.generated.Version> GetVersionAsync(RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            return await GetRequestAndDeserializeAsync<org.apache.hadoop.hbase.rest.protobuf.generated.Version>(EndPointType.Version, optionToUse);
        }

        /// <summary>Creates a table; returns false if it already exists.</summary>
        public async Task<bool> CreateTableAsync(TableSchema schema, RequestOptions options = null)
        {
            if (string.IsNullOrEmpty(schema.name))
                throw new ArgumentException("schema.name was either null or empty!", "schema");

            var optionToUse = options ?? _globalRequestOptions;
            string endpoint = string.Format("{0}/{1}", schema.name, EndPointType.Schema);
            using (HttpWebResponse webResponse = await PutRequestAsync(endpoint, schema, optionToUse))
            {
                if (webResponse.StatusCode == HttpStatusCode.Created)
                {
                    return true;
                }

                // table already exists
                if (webResponse.StatusCode == HttpStatusCode.OK)
                {
                    return false;
                }

                // throw the exception otherwise
                using (var output = new StreamReader(webResponse.GetResponseStream()))
                {
                    string message = output.ReadToEnd();
                    throw new WebException(
                        string.Format("Couldn't create table {0}! Response code was: {1}, expected either 200 or 201! Response body was: {2}",
                            schema.name, webResponse.StatusCode, message));
                }
            }
        }

        /// <summary>Deletes a table.</summary>
        public async Task DeleteTableAsync(string table, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", table, EndPointType.Schema);

            using (HttpWebResponse webResponse = await ExecuteMethodAsync<HttpWebResponse>(WebMethod.Delete, endPoint, null, optionToUse))
            {
                if (webResponse.StatusCode != HttpStatusCode.OK)
                {
                    using (var output = new StreamReader(webResponse.GetResponseStream()))
                    {
                        string message = output.ReadToEnd();
                        throw new WebException(
                            string.Format("Couldn't delete table {0}! Response code was: {1}, expected 200! Response body was: {2}",
                                table, webResponse.StatusCode, message));
                    }
                }
            }
        }

        /// <summary>Gets the region information of a table.</summary>
        public async Task<TableInfo> GetTableInfoAsync(string table, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", table, EndPointType.Regions);
            return await GetRequestAndDeserializeAsync<TableInfo>(endPoint, optionToUse);
        }

        /// <summary>Gets the schema of a table.</summary>
        public async Task<TableSchema> GetTableSchemaAsync(string table, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", table, EndPointType.Schema);
            return await GetRequestAndDeserializeAsync<TableSchema>(endPoint, optionToUse);
        }

        /// <summary>Lists all tables.</summary>
        public async Task<TableList> ListTablesAsync(RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            return await GetRequestAndDeserializeAsync<TableList>("", optionToUse);
        }

        /// <summary>Creates a scanner on a table.</summary>
        public async Task<ScannerInformation> CreateScannerAsync(string tableName, Scanner scannerSettings, RequestOptions options)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}", tableName, EndPointType.Scanner);

            using (HttpWebResponse response = await ExecuteMethodAsync(WebMethod.Post, endPoint, scannerSettings, optionToUse))
            {
                if (response.StatusCode != HttpStatusCode.Created)
                {
                    using (var output = new StreamReader(response.GetResponseStream()))
                    {
                        string message = output.ReadToEnd();
                        throw new WebException(
                            string.Format("Couldn't create a scanner for table {0}! Response code was: {1}, expected 201! Response body was: {2}",
                                tableName, response.StatusCode, message));
                    }
                }

                // The server returns the address of the new scanner in the Location header.
                string location = response.Headers.Get("Location");
                if (location == null)
                {
                    throw new ArgumentException("Couldn't find header 'Location' in the response!");
                }

                return new ScannerInformation(new Uri(location), tableName, response.Headers);
            }
        }

        /// <summary>Reads the next batch of cells; returns null when the response is not 200.</summary>
        public async Task<CellSet> ScannerGetNextAsync(ScannerInformation scannerInfo, RequestOptions options)
        {
            var optionToUse = options ?? _globalRequestOptions;
            string endPoint = string.Format("{0}/{1}/{2}", scannerInfo.TableName, EndPointType.Scanner, scannerInfo.ScannerId);
            using (HttpWebResponse webResponse = await GetRequestAsync(endPoint, optionToUse))
            {
                if (webResponse.StatusCode == HttpStatusCode.OK)
                {
                    return Serializer.Deserialize<CellSet>(webResponse.GetResponseStream());
                }

                return null;
            }
        }

        /// <summary>Stores a set of cells into a table.</summary>
        public async Task<bool> StoreCellsAsync(string table, CellSet cells, RequestOptions options = null)
        {
            var optionToUse = options ?? _globalRequestOptions;
            // The path needs a row key segment; the real keys travel inside the CellSet.
            string path = table + "/somefalsekey";
            // Pass optionToUse, not options, which may be null here.
            using (HttpWebResponse webResponse = await PutRequestAsync(path, cells, optionToUse))
            {
                if (webResponse.StatusCode == HttpStatusCode.NotModified)
                {
                    return false;
                }

                if (webResponse.StatusCode != HttpStatusCode.OK)
                {
                    using (var output = new StreamReader(webResponse.GetResponseStream()))
                    {
                        string message = output.ReadToEnd();
                        throw new WebException(
                            string.Format("Couldn't insert into table {0}! Response code was: {1}, expected 200! Response body was: {2}",
                                table, webResponse.StatusCode, message));
                    }
                }
            }
            return true;
        }

        /// <summary>Issues a GET request and deserializes the protobuf response body.</summary>
        private async Task<T> GetRequestAndDeserializeAsync<T>(string endpoint, RequestOptions options)
        {
            using (WebResponse response = await _requester.IssueWebRequestAsync(endpoint, WebMethod.Get, null, options))
            {
                using (Stream responseStream = response.GetResponseStream())
                {
                    return Serializer.Deserialize<T>(responseStream);
                }
            }
        }

        /// <summary>Sends a serialized request body to the endpoint.</summary>
        private async Task<HttpWebResponse> PutRequestAsync<TReq>(string endpoint, TReq request, RequestOptions options)
            where TReq : class
        {
            return await ExecuteMethodAsync(WebMethod.Post, endpoint, request, options);
        }

        /// <summary>Serializes the request, if any, and issues it with the given HTTP method.</summary>
        private async Task<HttpWebResponse> ExecuteMethodAsync<TReq>(string method, string endpoint, TReq request, RequestOptions options)
            where TReq : class
        {
            using (var input = new MemoryStream(options.SerializationBufferSize))
            {
                if (request != null)
                {
                    Serializer.Serialize(input, request);
                }

                // Rewind so the requester copies the body from the beginning.
                input.Seek(0, SeekOrigin.Begin);
                return await _requester.IssueWebRequestAsync(endpoint, method, input, options);
            }
        }

        /// <summary>Issues a GET request and returns the raw response.</summary>
        private async Task<HttpWebResponse> GetRequestAsync(string endpoint, RequestOptions options)
        {
            return await _requester.IssueWebRequestAsync(endpoint, WebMethod.Get, null, options);
        }
    }
  • Work through the code above step by step; once it compiles you are done. The next post tests and validates the SDK.
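
The client code builds a `ScannerInformation` from the `Location` header and later reads its `ScannerId`, but that class is not listed in this post. Below is a minimal sketch of how the id could be derived, assuming the scanner id is the last path segment of the `Location` URI. The `ScannerLocation` helper is a hypothetical name used only for this example.

```csharp
using System;

public static class ScannerLocation
{
    // Returns the last path segment of the Location URI,
    // which is assumed here to be the scanner id.
    public static string ExtractScannerId(Uri location)
    {
        string path = location.AbsolutePath.TrimEnd('/');
        return path.Substring(path.LastIndexOf('/') + 1);
    }
}
```

With a `Location` such as `http://192.168.2.21:8080/yourTable/scanner/12345abc`, the helper returns `12345abc`, which is the value `ScannerGetNextAsync` appends to the table name and the scanner keyword.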
