Wednesday 6 August 2014

OpenTSDB and HBase rough performance test


To see what technological choices we have for implementing a charting solution for hundreds of millions of points, we decided to try OpenTSDB and check the results against its underlying HBase.

The point of this test is to get a rough idea of whether this technology would be appropriate for our needs. We planned the following tests:

  • fastest data retrieval to get 5000 points out of 10 million points,
  • fastest data retrieval to get 5000 points out of 200 million points.
We use these points to generate JS charts. This benchmark did not test scalability; we only used 1-8 threads to gather data, to see how this impacts performance.

OpenTSDB v2.0 Benchmark

From the OpenTSDB site, the description is:
OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.
Retrieval of 5000 out of 10 million points
System, configuration and data retrieval procedure
The benchmark machine is Linux (Ubuntu 12.04 64-bit) with 4 Cores and 10GB of RAM.
With OpenTSDB v2.0 we used HBase version 0.94.5. We disabled compression (COMPRESSION = NONE) because we had problems with COMPRESSION=lzo on Ubuntu: from time to time we received errors on database creation.
We put the data into OpenTSDB through its socket interface and retrieved it through the OpenTSDB HTTP API.

We generated an OpenTSDB database with 10 million records by inserting into a single metric: a long int date, an int value and a string tag named “sel”.
The insert operation is done with one thread. For data retrieval we used 1, 2, 4 and 8 threads per run; in our test case every thread runs the same operation, as in the sketch below.
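As an illustration of the threading setup, here is a minimal sketch of how such a multi-threaded retrieval run can be timed, assuming a local TSD on port 4242, the /q ASCII endpoint and a placeholder metric test.metric (not the exact benchmark code):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelReadBenchmark {
        // Hypothetical query URL; the real benchmark used its own metric and time range.
        static final String QUERY =
                "http://localhost:4242/q?start=2013/01/01-00:00:00&m=sum:test.metric{sel=a}&ascii";

        public static void main(String[] args) throws Exception {
            int threads = args.length > 0 ? Integer.parseInt(args[0]) : 4;   // 1, 2, 4 or 8
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long start = System.currentTimeMillis();
            for (int i = 0; i < threads; i++) {
                pool.submit(new Runnable() {                // every thread runs the same query
                    public void run() {
                        try (BufferedReader in = new BufferedReader(
                                new InputStreamReader(new URL(QUERY).openStream()))) {
                            while (in.readLine() != null) { /* drain the ASCII response */ }
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            System.out.println("select+fetch took " + (System.currentTimeMillis() - start) + " ms");
        }
    }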

5000 rows selected out of 10 million rows
Database | Operation | Total rows | Threads | No. of selected rows | Run time (Select + Fetch)
OpenTSDB | Insert | 10000000 | 1 | 0 | 127305 ms
OpenTSDB | Select+Fetch | 10000000 | 1 | 5000 | 2224 ms (no cache)
OpenTSDB | Select+Fetch | 10000000 | 1 | 5000 | 161 ms
OpenTSDB | Select+Fetch | 10000000 | 2 | 5000 | 146 ms
OpenTSDB | Select+Fetch | 10000000 | 4 | 5000 | 237 ms
OpenTSDB | Select+Fetch | 10000000 | 8 | 5000 | 228 ms

threads – the number of threads that ran at the same time
first run – the first run is without any cache; the other runs use the cache
run time – the total run time of select+fetch
Problems encountered
While using OpenTSDB we encountered the following problems:
  • At the moment the retrieved data can only be returned as ASCII (raw data) or as a PNG image; the JSON option is not yet implemented,
  • We failed to run the test case with 200 million points inserted into a metric: even when running the OpenTSDB Java instance with 10GB of RAM (-Xmx10240m -Xms10240m -XX:MaxPermSize=10g) we always received an OutOfMemory error. The error was reported in the OpenTSDB logs, not by HBase or our Java process,
  • In OpenTSDB, if you insert 2 points with the same date (in seconds) in the same metric, every query that includes the duplicate date fails with an exception (net.opentsdb.core.IllegalDataException: Found out of order or duplicate data),
  • The connection to the HBase server dropped suddenly several times,
  • Not an error but maybe a limitation: when we tried inserting 10 million metrics we got “[New I/O worker #1] UniqueId: Failed to lock the `MAXID_ROW’ row”.
Conclusions for this test case
OpenTSDB beats MySQL and MongoDB in every test: it is 2-4x faster than MySQL with or without cache, and 7-328x faster than MongoDB.
The problems encountered with the current version show that it can't be used in production yet; it needs fixes.
Retrieval of 5000 out of 200 million points
As stated in the “Problems encountered” section of the previous test, it was not possible to measure performance for 200 million points: even when running the OpenTSDB Java instance with 10GB of RAM (-Xmx10240m -Xms10240m -XX:MaxPermSize=10g) we always received an OutOfMemory error, reported in the OpenTSDB logs rather than by HBase or our Java process.

Code used for tests

Insert Code:
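A minimal sketch of this kind of insert, assuming the telnet-style put interface on localhost:4242 and a placeholder metric test.metric with the “sel” tag (not the exact benchmark code):

    import java.io.PrintWriter;
    import java.net.Socket;

    public class TsdbInsert {
        public static void main(String[] args) throws Exception {
            // One telnet-style "put" line per data point: put <metric> <timestamp> <value> <tag>=<value>
            try (Socket socket = new Socket("localhost", 4242);
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), false)) {
                long ts = 1325376000L;                      // starting timestamp in seconds (placeholder)
                for (int i = 0; i < 10000000; i++) {
                    // timestamps increase by one second, so no duplicate dates are created
                    out.print("put test.metric " + (ts + i) + " " + i + " sel=a\n");
                    if (i % 10000 == 0) {
                        out.flush();                        // flush in batches so the writer buffer stays small
                    }
                }
                out.flush();
            }
        }
    }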
Get Code, no cache (for the no-cache runs we used the current date as the end time, so OpenTSDB doesn't use its cache):
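A minimal sketch of the no-cache retrieval, assuming the /q HTTP endpoint with ASCII output and the same placeholder metric; the end time is set to the current date on every run:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class TsdbGetNoCache {
        public static void main(String[] args) throws Exception {
            // End time = the current date, so every run looks like a new query and bypasses the cache.
            String end = new SimpleDateFormat("yyyy/MM/dd-HH:mm:ss").format(new Date());
            String query = "http://localhost:4242/q?start=2013/01/01-00:00:00&end=" + end
                    + "&m=sum:test.metric{sel=a}&ascii";    // metric and tag are placeholders
            long start = System.currentTimeMillis();
            int lines = 0;
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(query).openStream()))) {
                while (in.readLine() != null) {
                    lines++;                                // one ASCII line per returned data point
                }
            }
            System.out.println(lines + " lines in " + (System.currentTimeMillis() - start) + " ms");
        }
    }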
Get code with cache (for the cached runs we ran the same query several times):
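A minimal sketch of the cached retrieval, under the same assumptions; the query is fixed and simply repeated, so runs after the first one can hit the cache:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class TsdbGetCached {
        // Fixed start and end times, so repeated runs hit OpenTSDB's cache (URL is a placeholder).
        static final String QUERY = "http://localhost:4242/q?start=2013/01/01-00:00:00"
                + "&end=2013/06/01-00:00:00&m=sum:test.metric{sel=a}&ascii";

        public static void main(String[] args) throws Exception {
            for (int run = 0; run < 5; run++) {             // first run is cold, the rest are cached
                long start = System.currentTimeMillis();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(new URL(QUERY).openStream()))) {
                    while (in.readLine() != null) { /* drain the response */ }
                }
                System.out.println("run " + run + ": " + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }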

HBase Test

We made this test mostly as a sanity check on the previous OpenTSDB performance results.
In HBase we inserted, under a row key (in this case “ubuntu” or “another”), a String family, a String qualifier and a String value.
Database | Operation | Total rows | Threads | No. of selected rows | Run time (Select + Fetch)
HBase | Insert | 10000000 | 8 | 0 | 4285229 ms
HBase | Select+Fetch | 10000000 | 1 | 5000 | ms (no cache)
HBase | Select+Fetch | 10000000 | 1 | 5000 | 134 ms
HBase | Select+Fetch | 10000000 | 2 | 5000 | 184 ms
HBase | Select+Fetch | 10000000 | 4 | 5000 | 337 ms
HBase | Select+Fetch | 10000000 | 8 | 5000 | 257 ms
Insert Code:
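A minimal sketch of the HBase insert, using the HBase 0.94 client API and assuming a pre-created table named test with a column family named data (both placeholders, not the exact benchmark code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseInsert {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test");        // table name is a placeholder
            table.setAutoFlush(false);                      // buffer puts client-side for speed
            byte[] row = Bytes.toBytes("ubuntu");
            byte[] family = Bytes.toBytes("data");          // column family is a placeholder
            for (int i = 0; i < 10000000; i++) {
                Put put = new Put(row);
                put.add(family, Bytes.toBytes(String.valueOf(i)),   // qualifier
                        Bytes.toBytes(String.valueOf(i)));          // value
                table.put(put);
            }
            table.flushCommits();
            table.close();
        }
    }

Disabling auto-flush batches the puts on the client side; with auto-flush on, each put would be a separate RPC and the insert would be even slower.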
Get code:
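A minimal sketch of the retrieval, scanning the wide “ubuntu” row in batches and stopping after 5000 cells, under the same table-name assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseGet {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test");        // table name is a placeholder
            Scan scan = new Scan(Bytes.toBytes("ubuntu"));  // start the scan at the wide "ubuntu" row
            scan.setBatch(1000);                            // return the row in chunks of 1000 cells
            long start = System.currentTimeMillis();
            int cells = 0;
            ResultScanner scanner = table.getScanner(scan);
            outer:
            for (Result result : scanner) {
                for (KeyValue kv : result.raw()) {          // each KeyValue is one family:qualifier=value cell
                    if (++cells >= 5000) {
                        break outer;                        // stop after 5000 cells
                    }
                }
            }
            scanner.close();
            table.close();
            System.out.println(cells + " cells in " + (System.currentTimeMillis() - start) + " ms");
        }
    }

setBatch keeps the very wide row from being returned as one huge Result, so memory on the client stays bounded.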

OpenTSDB v1.1.0 Benchmark

Retrieval of 5000 out of 10 million points
System, configuration and data retrieval procedure
The benchmark machine is Linux (Ubuntu 12.04 64-bit) with 4 Cores and 10GB of RAM.
For LZO compression we built hadoop-lzo and copied the library to the HBase instance. We also created the tsdb tables with COMPRESSION=lzo.

We put the data into OpenTSDB through its socket interface and retrieved it through the OpenTSDB HTTP API.
5000 rows selected out of 10 million rows with LZO
Database | Operation | Total rows | Threads | No. of selected rows | Run time (Select + Fetch)
OpenTSDB 1.1.0 | Insert | 10000000 | 1 | 0 | 113651 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 1 | 5000 | 2895 ms (no cache)
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 1 | 5000 | 97 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 2 | 5000 | 140 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 4 | 5000 | 128 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 8 | 5000 | 207 ms

threads – the number of threads that ran at the same time
run time – the total run time of select+fetch
5000 rows selected out of 10 million rows with Date Range

We inserted 10 million points (5000 points in 2013 and the rest before 2013) and made a query for the 5000 points with the max value, from 2013/01/03-12:00:00 to the current date.
Database | Operation | Total rows | Threads | No. of selected rows | Run time (Select + Fetch)
OpenTSDB 1.1.0 | Insert | 10000000 | 1 | 0 | 124185 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 1 | 5000 | 136 ms (no cache)
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 1 | 5000 | 126 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 2 | 5000 | 170 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 4 | 5000 | 179 ms
OpenTSDB 1.1.0 | Select+Fetch | 10000000 | 8 | 5000 | 227 ms
Problems encountered
While using OpenTSDB 1.1.0 we encountered the following problems:
  • When we tried to insert 200 million points with the default parameters (./tsdb tsd --port=4242 --staticroot=staticroot --cachedir="$tsdtmp") we got an OutOfMemory exception after a while,
  • When we tried to insert 200 million points by modifying the tsdb startup to have access to more RAM (java -Xmx10240m -Xms10240m -XX:MaxPermSize=10g -enableassertions -enablesystemassertions -classpath /root/opentsdb-1.1.0/third_party/hbase/asynchbase-1.4.1.jar:/root/opentsdb-1.1.0/third_party/guava/guava-13.0.1.jar:/root/opentsdb-1.1.0/third_party/slf4j/log4j-over-slf4j-1.7.2.jar:/root/opentsdb-1.1.0/third_party/logback/logback-classic-1.0.9.jar:/root/opentsdb-1.1.0/third_party/logback/logback-core-1.0.9.jar:/root/opentsdb-1.1.0/third_party/netty/netty-3.6.2.Final.jar:/root/opentsdb-1.1.0/third_party/slf4j/slf4j-api-1.7.2.jar:/root/opentsdb-1.1.0/third_party/suasync/suasync-1.3.1.jar:/root/opentsdb-1.1.0/third_party/zookeeper/zookeeper-3.3.6.jar:/root/opentsdb-1.1.0/tsdb-1.1.0.jar:/root/opentsdb-1.1.0/src net.opentsdb.tools.TSDMain --port=4242 --staticroot=staticroot --cachedir=/tmp/tsd/tsd) we often got the following exception:
    2013-04-03 18:07:18,965 ERROR [New I/O worker #10] CompactionQueue: Failed to write a row to re-compact
    org.hbase.async.RemoteException: org.apache.hadoop.hbase.RegionTooBusyException: region is flushing
    at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2592)
    …,
  • If we tried to query the data from point 2, we got the following exception: java.lang.ArrayIndexOutOfBoundsException: null, while the opentsdb process reported:
    java.lang.OutOfMemoryError: GC overhead limit exceeded,
  • If in the same metric there are two identical dates, we got the following exception (and didn't get any data from OpenTSDB):
    Error 2
    Duplicate data cell:
    Request failed: Internal Server Error
    net.opentsdb.core.IllegalDataException: Found out of order or duplicate data: cell=Cell([56, 87], [0, 0, 0, 0, 0, 1, 12, -112]), delta=901.
Conclusions for this test case
Version 1.1.0 is more stable than 2.0: it didn't crash, but it gave the above exceptions.
It seems that the LZO option doesn't make the insert and the no-cache retrieval faster for this test case. Some improvement can be seen on data retrieval with cache.
We got a surprisingly good result on the date range query with no cache.
Code used for tests
Insert Code:
Get Code (for one thread):
Get Code (with cache for one thread):
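The insert and retrieval code for 1.1.0 presumably followed the same pattern as the v2.0 sketches above; as one extra illustration, here is a minimal sketch of the date-range query starting at 2013/01/03-12:00:00, assuming the /q endpoint, a max aggregator and the same placeholder metric:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class TsdbDateRangeGet {
        public static void main(String[] args) throws Exception {
            // Start at 2013/01/03-12:00:00 with no explicit end time, so OpenTSDB uses "now";
            // the metric name and the max aggregator are placeholders for the real query.
            String query = "http://localhost:4242/q?start=2013/01/03-12:00:00"
                    + "&m=max:test.metric{sel=a}&ascii";
            long begin = System.currentTimeMillis();
            int points = 0;
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(query).openStream()))) {
                while (in.readLine() != null) {
                    points++;                               // one ASCII line per returned data point
                }
            }
            System.out.println(points + " points in " + (System.currentTimeMillis() - begin) + " ms");
        }
    }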
