ELK Study Notes (4-2): Importing Data

Batch-inserting through the REST API's _bulk endpoint can reach 50,000 to 100,000 documents per second.

The approach is to write the data into JSON files and then run a batch script that submits each file:

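1. First, generate JSON files in the bulk format. A file must not be too large: Elasticsearch rejects a request body larger than http.max_content_length (100 MB by default).

2. Then use curl to send each file to Elasticsearch's _bulk endpoint.

I recommend generating files of about 10 MB each and submitting these smaller files one at a time.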
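If the data already exists as one large bulk file, it can also be split afterwards. The following C# sketch is one way to do that and is not part of the original tool: the class `BulkFileSplitter` and the `chunk_NNNN.json` naming are hypothetical. It assumes every action line is an `index` (or `create`) action followed by exactly one document line, so it never separates a pair; blank lines are skipped, and each chunk ends with a newline, as the bulk API expects.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BulkFileSplitter
{
    // Splits a bulk file into chunk files of at most maxBytes each (a single pair larger
    // than maxBytes still gets its own chunk). Assumes every action line is followed by
    // exactly one document line: true for index/create actions, not for delete actions.
    public static List<string> Split(string path, string outDir, long maxBytes = 10L * 1024 * 1024)
    {
        Directory.CreateDirectory(outDir);
        var utf8 = new UTF8Encoding(false); // write without a byte-order mark
        var written = new List<string>();
        StreamWriter writer = null;
        long currentBytes = 0;
        try
        {
            using (var reader = new StreamReader(path, Encoding.UTF8))
            {
                string action;
                while ((action = NextNonBlank(reader)) != null)
                {
                    string source = NextNonBlank(reader);
                    if (source == null)
                        throw new InvalidDataException("Action line without a document line: " + action);

                    string pair = action + "\n" + source + "\n";
                    long pairBytes = utf8.GetByteCount(pair);

                    // Open a new chunk when this pair would push the current one past the limit
                    if (writer == null || (currentBytes > 0 && currentBytes + pairBytes > maxBytes))
                    {
                        writer?.Dispose();
                        string target = Path.Combine(outDir, "chunk_" + written.Count.ToString("D4") + ".json");
                        writer = new StreamWriter(target, false, utf8);
                        written.Add(target);
                        currentBytes = 0;
                    }
                    writer.Write(pair);
                    currentBytes += pairBytes;
                }
            }
        }
        finally
        {
            writer?.Dispose();
        }
        return written;
    }

    // Returns the next line that is not empty or whitespace, or null at end of file.
    private static string NextNonBlank(StreamReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null && line.Trim().Length == 0) { }
        return line;
    }
}
```

For example, `BulkFileSplitter.Split(@"E:\data\all.json", @"E:\data\chunks")` (hypothetical paths) writes chunks of at most 10 MB each and returns their paths, which can then be listed in the batch file.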

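Format of the JSON data file (newline-delimited: each action line is followed by its document on the next line, and the file must end with a newline):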

{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:03:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:04:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:05:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:06:00"}
{"index":{"_index":"meterdata","_type":"autoData"}}
{"Mfid ":1,"TData":172170,"TMoney":209,"HTime":"2016-05-17T08:07:00"}
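To produce this format programmatically, a small helper can build one action/document pair at a time. This is a sketch in C# (the language of the generator tool below), and the class and method names are hypothetical. It keeps the field names exactly as in the sample, including the trailing space in `"Mfid "`; that space is easy to miss when querying, so a name without it is probably preferable for new data. Every line ends with `\n`, and the timestamp is formatted with the invariant culture so the output does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class MeterBulkLine
{
    // Action line copied from the sample (_type only applies to Elasticsearch
    // versions that still support mapping types).
    private const string ActionLine = "{\"index\":{\"_index\":\"meterdata\",\"_type\":\"autoData\"}}";

    // Builds one action/document pair; each line ends with "\n",
    // which the bulk API requires, including after the last line.
    public static string IndexPair(int mfid, int tData, int tMoney, DateTime hTime)
    {
        var sb = new StringBuilder();
        sb.Append(ActionLine).Append('\n');
        // Field names match the sample, including the trailing space in "Mfid ".
        sb.Append("{\"Mfid \":").Append(mfid.ToString(CultureInfo.InvariantCulture))
          .Append(",\"TData\":").Append(tData.ToString(CultureInfo.InvariantCulture))
          .Append(",\"TMoney\":").Append(tMoney.ToString(CultureInfo.InvariantCulture))
          .Append(",\"HTime\":\"")
          // Invariant culture keeps ':' from becoming a culture-specific time separator
          .Append(hTime.ToString("yyyy-MM-dd'T'HH:mm:ss", CultureInfo.InvariantCulture))
          .Append("\"}\n");
        return sb.ToString();
    }
}
```

`MeterBulkLine.IndexPair(1, 172170, 209, new DateTime(2016, 5, 17, 8, 3, 0))` reproduces the first two lines of the sample; concatenating pairs gives a complete bulk file.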

Contents of the batch file (order.bat)

cd E:\curl-7.50.3-win64-mingw\bin
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\437714060.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\743719428.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\281679894.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\146257480.json
curl 172.17.1.15:9200/_bulk?pretty --data-binary @E:\Bin\Debug\testdata\892018760.json
pause
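The batch file can also be replaced by a small C# sender. The sketch below assumes .NET Core 3.0 or later (for `System.Text.Json`), and `BulkSender` and its methods are hypothetical names. It sets the `application/x-ndjson` content type: with `--data-binary`, curl sends `application/x-www-form-urlencoded` by default, which Elasticsearch 6.0 and later reject, so on those versions the curl commands above would also need `-H "Content-Type: application/x-ndjson"`. It also checks the response's top-level `errors` flag, because a bulk request can return HTTP 200 while individual documents failed.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

public static class BulkSender
{
    // POSTs one bulk file to {baseUri}/_bulk and returns the raw JSON response.
    public static async Task<string> SendFileAsync(HttpClient client, Uri baseUri, string path)
    {
        using (var content = new ByteArrayContent(File.ReadAllBytes(path)))
        {
            // Newline-delimited JSON content type for bulk bodies
            content.Headers.ContentType = new MediaTypeHeaderValue("application/x-ndjson");
            using (HttpResponseMessage response = await client.PostAsync(new Uri(baseUri, "_bulk"), content))
            {
                response.EnsureSuccessStatusCode(); // fails only for request-level errors
                return await response.Content.ReadAsStringAsync();
            }
        }
    }

    // True if any item in the bulk request failed (top-level "errors": true).
    public static bool HasItemErrors(string bulkResponseJson)
    {
        using (JsonDocument doc = JsonDocument.Parse(bulkResponseJson))
        {
            return doc.RootElement.TryGetProperty("errors", out JsonElement errors)
                && errors.ValueKind == JsonValueKind.True;
        }
    }
}
```

Usage: for each generated file, call `await BulkSender.SendFileAsync(client, new Uri("http://172.17.1.15:9200/"), path)` and record the file name whenever `HasItemErrors` returns true, so that only the failed files need to be resent.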

Tool code (C# WinForms) that generates the data files and the batch file

// Members of the WinForms Form (label1 is a Label on the form). Needs:
// using System; using System.Globalization; using System.IO; using System.Text;
// using System.Threading.Tasks; using System.Windows.Forms;

private const int RecordsPerFile = 100000;     // start a new file every 100,000 documents (about 10 MB)
private readonly Random random = new Random(); // one shared instance: new Random() in a tight loop repeats values

private void button1_Click(object sender, EventArgs e)
{
    // Generate on a background thread so the UI stays responsive
    Task.Run(() => { CreateDataToFile(); });
}

public void CreateDataToFile()
{
    string outDir = Path.Combine(Application.StartupPath, "testdata");
    Directory.CreateDirectory(outDir);
    StringBuilder sb = new StringBuilder();      // body of the current bulk file
    StringBuilder sborder = new StringBuilder(); // contents of order.bat
    int count = 0;
    sborder.Append(@"cd E:\curl-7.50.3-win64-mingw\bin" + Environment.NewLine);
    DateTime endDate = new DateTime(2016, 10, 22);
    for (int i = 1; i <= 10000; i++) // 10,000 meter points
    {
        DateTime startDate = endDate.AddYears(-1);
        this.Invoke(new Action(() => { label1.Text = "Generating point " + i; }));

        while (startDate <= endDate) // one record per minute for one year per point
        {
            // Bulk bodies are "\n"-delimited: action line, then document line
            sb.Append("{\"index\":{\"_index\":\"meterdata\",\"_type\":\"autoData\"}}\n");
            sb.Append("{\"Mfid \":" + i + ",\"TData\":" + random.Next(1067500) + ",\"TMoney\":" + random.Next(1300)
                + ",\"HTime\":\"" + startDate.ToString("yyyy-MM-dd'T'HH:mm:ss", CultureInfo.InvariantCulture) + "\"}\n");
            if (++count >= RecordsPerFile) // write the record first, then split, so none is skipped
            {
                FlushToFile(outDir, sb, sborder);
                count = 0;
            }
            startDate = startDate.AddMinutes(1);
        }
    }
    if (sb.Length > 0) // write the last, partially filled file
        FlushToFile(outDir, sb, sborder);

    sborder.Append("pause" + Environment.NewLine);
    // GBK matches the default code page (936) of a Chinese-locale cmd.exe
    File.WriteAllText(Path.Combine(outDir, "order.bat"), sborder.ToString(), Encoding.GetEncoding("GBK"));
    MessageBox.Show("Generation complete");
}

private static void FlushToFile(string outDir, StringBuilder sb, StringBuilder sborder)
{
    // Pick a random file name that is not already taken
    string path;
    do
    {
        path = Path.Combine(outDir, new Random(GetRandomSeed()).Next(900000000) + ".json");
    } while (File.Exists(path));

    // UTF-8 without BOM; the body already ends with "\n", as the bulk API requires
    File.WriteAllText(path, sb.ToString(), new UTF8Encoding(false));
    sb.Clear();
    sborder.Append("curl 172.17.1.15:9200/_bulk?pretty --data-binary @" + path + Environment.NewLine);
}

static int GetRandomSeed()
{
    // Cryptographically random seed, so successive Random instances give different file names
    byte[] bytes = new byte[4];
    using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
    {
        rng.GetBytes(bytes);
    }
    return BitConverter.ToInt32(bytes, 0);
}
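The loop bounds in the tool fix the size of the generated data set. The range from 2015-10-22 to 2016-10-22 contains 29 February 2016, so it spans 366 days, and both endpoints are included. With 100,000 documents per file, and assuming the 50,000 to 100,000 documents per second quoted at the start held for the whole import:

```latex
\begin{aligned}
N_{\text{point}} &= 366 \times 1440 + 1 = 527\,041 \\
N &= 10\,000 \times 527\,041 = 5\,270\,410\,000 \approx 5.27 \times 10^{9} \\
\text{files} &= \lceil N / 100\,000 \rceil = 52\,705 \\
T &= \frac{N}{10^{5}\ \text{s}^{-1}} \approx 14.6\ \text{h} \quad\text{to}\quad \frac{N}{5 \times 10^{4}\ \text{s}^{-1}} \approx 29.3\ \text{h}
\end{aligned}
```

This agrees with the roughly 5.2 billion documents reported in the summary.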

Summary

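Testing showed that Elasticsearch searches quite fast. While the data was still being generated, at about 1.7 billion documents, I ran a query filtering by Mfid and a time range of a few months; fetching ten documents took just over two seconds.

Running the same query repeatedly made it progressively faster, presumably because Elasticsearch caches results.

The 5.2 billion documents take up roughly 500 GB, which is quite large: about three times the size of the same data stored with Protocol Buffers. Search speed, however, was satisfactory.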
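The kind of query described above (one meter point, a time range of a few months, ten hits) could look like the following. This C# sketch only builds the request body, assuming .NET Core 3.0 or later for `System.Text.Json`; `MeterQuery.Build` is a hypothetical helper, and the body would be POSTed to `/meterdata/_search`. Both conditions go in a `bool` query's `filter` clause, which skips relevance scoring and whose frequently used filters Elasticsearch can cache. The field name `"Mfid "` keeps the trailing space it was indexed with.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.Json;

public static class MeterQuery
{
    // Builds a _search body: documents for one meter point with from <= HTime < to.
    public static string Build(int mfid, DateTime from, DateTime to, int size = 10)
    {
        using (var stream = new MemoryStream())
        {
            using (var w = new Utf8JsonWriter(stream))
            {
                w.WriteStartObject();
                w.WriteNumber("size", size);
                w.WriteStartObject("query");
                w.WriteStartObject("bool");
                w.WriteStartArray("filter");

                // Exact match on the meter point id
                w.WriteStartObject();
                w.WriteStartObject("term");
                w.WriteNumber("Mfid ", mfid); // field name as indexed, trailing space included
                w.WriteEndObject();
                w.WriteEndObject();

                // Half-open time range on HTime
                w.WriteStartObject();
                w.WriteStartObject("range");
                w.WriteStartObject("HTime");
                w.WriteString("gte", from.ToString("yyyy-MM-dd'T'HH:mm:ss", CultureInfo.InvariantCulture));
                w.WriteString("lt", to.ToString("yyyy-MM-dd'T'HH:mm:ss", CultureInfo.InvariantCulture));
                w.WriteEndObject();
                w.WriteEndObject();
                w.WriteEndObject();

                w.WriteEndArray();
                w.WriteEndObject(); // bool
                w.WriteEndObject(); // query
                w.WriteEndObject();
            } // disposing the writer flushes it into the stream
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

For example, `MeterQuery.Build(1, new DateTime(2016, 1, 1), new DateTime(2016, 4, 1))` selects three months of readings for point 1.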