Skip to main content

Row Count : HBase Aggregation example

With the coprocessors HBase 0.92 introduces a new way to process data directly on a region server. As a user this is definitively a very exciting feature : now you can easily define your own distributed data services.

This post is not intended to help you how to define them (i highly recommend you to watch this presentation if you want to do so) but to quickly presents the new aggregation service shipped with HBase 0.92 that is built upon the endpoint coprocessor framework.

1. Enable AggregationClient coprocessor

You have two choices :

You can enable aggregation coprocessor on all your tables by adding the following lines to hbase-site.xml :
or can enable coprocessor only on a table throught the HBase shell :

1. disable the table
hbase> disable 'mytable'

2. add the coprocessor
hbase> alter 'mytable', METHOD => 'table_att','coprocessor'=>'|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'

3. re-enable the table

hbase> enable 'mytable'

2. Increase RPC timeout (opt)
In some cases the following exception could be raised : Call to node2/ failed on socket timeout exception
On large table or if you are expecting slow I/O this might be necessary:
n.b. : you can also specify this property directly in the client source code, see next section
3. ''RowCount' Snippet
The following snippet uses the "row count" feature of the aggregation service.

public class MyAggregationClient {

    private static final byte[] TABLE_NAME = Bytes.toBytes("mytable");
    private static final byte[] CF = Bytes.toBytes("d");

    public static void main(String[] args) throws Throwable {

        Configuration customConf = new Configuration();

        // Increase RPC timeout, in case of a slow computation
        customConf.setLong("hbase.rpc.timeout", 600000);
        // Default is 1, set to a higher value for faster
        customConf.setLong("hbase.client.scanner.caching", 1000);
        Configuration configuration = HBaseConfiguration.create(customConf);
        AggregationClient aggregationClient = new AggregationClient(
        Scan scan = new Scan();
        long rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
        System.out.println("row count is " + rowCount);


4. What else ?

The full list of functions provided by the aggregation client can be found here :

See also for discussion about forthcoming functions.

Note that this version is designed to work on data whose type is Long. If you plan to use other data types you should provide your own column interpreter. See

Credits : thanks to Gary Helmling for his contribution on the HBase mailing list


Popular posts from this blog

HBase + Subversion + Eclipse + Windows

HBase + Subversion + Eclipse + Windows (it should be easy to adapt for Linux)

Update : please note that sinceHBase-4336 / HBase 0.96 the source tree is split in more than one Maven module this post is no more relevant, i have created a new post on this subject :
This is a simple setup in order to play with the source code of HBase under Microsoft Windows.
Since HBase use some Unix specific commands like chmod the only requirements here are Cygwin and a working Maven 3 environment. (It is obvious that you need Java and Eclipse, but you DON'T need anything else like the Eclipse Maven plugin or any SSH configuration)

1. Checkout the source code

The first step is to check out the source code from the Subversion repository. I did it under my cygwin home repository. In this example i want to play with the 0.90 branch :

svn co hbase-0.90

2. Generate the Eclipse configura…

Analyse d'un "thread dump" d'une JVM IBM sous AIX

Dans quels cas le thread dump est utile ?
Le thread dump est un instantané de l'activité des threads de la JVM. Leur analyse est intéressante dans les cas où l'activité de la JVM ne semble pas normale :
Activité suspendue (deadlock/interblocage) ou partiellement suspendue (starvation/famine)Activité existante mais le "débit" est en deçà de ce qui est attendu (Goulot d'étranglement / Bottleneck)Activité existante mais le "débit" reste nul (Boucle infinie / Infinite Loop)Comment avoir un thread dump ?
Nous nous limitons ici à la machine virtuel IBM sous AIX. Dans ce cas là il est extrêmement simple de déclencher la création d'un thread dump : il suffit de faire un kill -3 sur le processus Java.

Un fichier dont le nom est javacore.[date].[numero_processus].[compteur].txt est produit. Sur la sortie standard du processus vous devriez voir la ligne suivante s'afficher :
JVMDUMP010I Java Dump written to .....
En général le dump est produit dans le réper…