#HBase TIdx
An update-time secondary index solution for HBase, based on Apache Phoenix.
#Why To Use
If your HBase rowkey is some kind of id, but you also need to scan the table by each record's update time, you can use HBase TIdx.
#Feature
- No concurrent-write problem with the Phoenix local secondary index
- Automatically updates the index table via an HBase coprocessor
- Native scan support without Phoenix SQL
- Hive Integration
#Build
```
git clone ...
cd ...
mvn clean package
```
If you use HDP, you can build with the HDP profile:
```
mvn clean package -Phdpxxx
```
If you use other Phoenix/HBase/Hive versions, edit the versions in pom.xml.
#How To Use
##Prepare
- install Apache Phoenix, see details
- install the Phoenix secondary index, see details
- install HBase TIdx: build it first, then put hbase-tidx-core-xxx.jar into the HBase lib directory
Create the data table and its local index on the update-time column in Phoenix:
```sql
create table t1 (key varchar primary key, t unsigned_long, a varchar) VERSIONS=1;
create local index t1_local_index_0 on t1(t);
```
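If you would rather run the DDL from code than from sqlline, a minimal sketch over the Phoenix JDBC driver might look like this (the ZooKeeper quorum localhost:2181 is a placeholder):
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateTidxTables {
    public static void main(String[] args) throws Exception {
        // "localhost:2181" is a placeholder ZooKeeper quorum - replace with yours.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            stmt.execute("create table t1 (key varchar primary key,"
                    + " t unsigned_long, a varchar) VERSIONS=1");
            stmt.execute("create local index t1_local_index_0 on t1(t)");
            conn.commit();
        }
    }
}
```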
Get the Phoenix index id:
```
hbase com.github.dryangkun.hbase.tidx.tool.GetPhoenixIndexId --jdbc-url ... --data-table t1 --index-name t1_local_index_0
```
Add the region observers to the data and index HBase tables:
```
hbase shell
# -------- add data update region observer --------
disable 'T1'
alter 'T1', 'coprocessor'=>'|com.github.dryangkun.hbase.tidx.TxDataRegionObserver|1001|tx.time.col=0:T,tx.phoenix.index.id=-32768'
enable 'T1'
# -------- add index scan region observer --------
disable '_LOCAL_IDX_T1'
alter '_LOCAL_IDX_T1', 'coprocessor'=>'|com.github.dryangkun.hbase.tidx.TxRegionObserver|1001|tx.time.col=0:T,tx.phoenix.index.id=-32768'
enable '_LOCAL_IDX_T1'
```
or use the bundled tool:
```
hbase com.github.dryangkun.hbase.tidx.tool.AddRegionObservers --jdbc-url ... --data-table t1 --index-name t1_local_index_0
```
observer arguments:
- tx.time.col: the update-time column as family:qualifier, e.g. 0:T
- tx.pidx.id: the Phoenix local index id
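The shell commands above can also be scripted. A sketch with the HBase 1.x Admin API, covering only the data table (the index table is analogous; priority and arguments mirror the shell example):
```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class AddDataObserver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            Map<String, String> kvs = new HashMap<>();
            kvs.put("tx.time.col", "0:T");
            kvs.put("tx.phoenix.index.id", "-32768");

            TableName data = TableName.valueOf("T1");
            admin.disableTable(data);
            HTableDescriptor desc = admin.getTableDescriptor(data);
            // null jar path: hbase-tidx-core-xxx.jar must already be in HBase's lib dir
            desc.addCoprocessor("com.github.dryangkun.hbase.tidx.TxDataRegionObserver",
                    null, 1001, kvs);
            admin.modifyTable(data, desc);
            admin.enableTable(data);
        }
    }
}
```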
##Update Data Table
See TxConcurrencyTest; there is no difference from directly putting to the data table.
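For illustration, a plain client Put against T1 might look like this (row key and values are made up; TxDataRegionObserver maintains the index transparently):
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("T1"))) {
            Put put = new Put(Bytes.toBytes("row-1"));
            // 0:T is the update-time column; TxDataRegionObserver picks it up
            // and writes the matching row into the local index table.
            put.addColumn(Bytes.toBytes("0"), Bytes.toBytes("T"),
                    Bytes.toBytes(System.currentTimeMillis()));
            put.addColumn(Bytes.toBytes("0"), Bytes.toBytes("A"), Bytes.toBytes("value"));
            table.put(put);
        }
    }
}
```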
##Scan Index Table
See TxScanExample.
You scan the index table, but the returned results come from the data table.
time-check: if the index update-time differs from the data update-time, the record is not returned.
When time-check is set to false, the returned data-table result contains a special virtual family:qualifier (0:^T).
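The authoritative client-side setup is in TxScanExample; the sketch below only shows the rough shape of a scan against the index table, and the time-range handling is deliberately omitted because the exact mechanism (row-key encoding / scan attributes) is defined by TxScanExample:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ScanIndexSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table index = conn.getTable(TableName.valueOf("_LOCAL_IDX_T1"))) {
            Scan scan = new Scan();
            // The update-time range restriction (start/stop rows or scan
            // attributes) must be set up as TxScanExample does; not shown here.
            try (ResultScanner scanner = index.getScanner(scan)) {
                for (Result result : scanner) {
                    // Each Result is a rewritten data-table row, not a raw
                    // index row, because TxRegionObserver intercepts the scan.
                }
            }
        }
    }
}
```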
##MapReduce
See TxJobExample. You can run the example:
```
java -cp `hadoop classpath`:`hbase mapredcp`:hbase-tidx-core-xxx.jar com.github.dryangkun.hbase.tidx.mapreduce.TxJobExample
```
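The real wiring lives in TxJobExample; the following is only a generic sketch of a scan-driven job over the index table using the stock HBase MapReduce helpers (mapper, job name, and scan settings are placeholders):
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class IndexScanJob {
    // Placeholder mapper: receives data-table rows rewritten by TxRegionObserver.
    static class MyMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
            // process the data-table row here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "tidx-index-scan");
        job.setJarByClass(IndexScanJob.class);
        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false); // recommended for MR scans
        TableMapReduceUtil.initTableMapperJob("_LOCAL_IDX_T1", scan,
                MyMapper.class, NullWritable.class, NullWritable.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```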
##Hive Integration
Configure hive.aux.jars.path in hive-site.xml:
```xml
<property>
  <name>hive.aux.jars.path</name>
  <value>
    file:///.../hbase-tidx-hive-xxx.jar,
    file:///.../hbase-tidx-core-xxx.jar,
    file:///phoenix-path/phoenix-core-xxx.jar,
    hbase-mapred-jars <!-- get from the output of `hbase mapredcp` -->
  </value>
</property>
```
Then create the Hive external table:
```sql
create external table hbase_t1(
  key string,
  t bigint,
  a string)
stored by 'com.github.dryangkun.hbase.tidx.hive.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,0:T#b,0:A")
tblproperties("hbase.table.name"="T1","tx.hive.time.col"="0:T","tx.hive.pidx.id"="-32768");
```
Other properties are the same as in Hive HBaseIntegration.
Usage:
```sql
select * from hbase_t1 where t >= ... and t < ...
```
"between ... and ..." is not supported yet.
#Limitation
##Consistency between data table and index table is not guaranteed
The index table is updated only after a data-table put succeeds, so if the index-table update fails, the two tables end up inconsistent.
You can retry the put operation.
When a batch contains several puts for the same rowkey, only a successful put with the max timestamp triggers the index-table update.
When a rowkey is successfully deleted from the data table, the corresponding index row is deleted too.
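A naive retry helper could look like this (attempt count and backoff are arbitrary illustration values):
```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public final class RetryPut {
    // Retry a put a few times; on repeated failure the last error is rethrown
    // and the caller decides what to do. Count and sleep are illustrative.
    static void putWithRetry(Table table, Put put, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                table.put(put);
                return;
            } catch (IOException e) {
                last = e;
                try {
                    Thread.sleep(100L * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted during retry", ie);
                }
            }
        }
        throw last;
    }
}
```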
##A single batch must not contain a put and a delete for the same rowkey
Handling that case is too complex.
##Scan.setBatch is not supported when scanning the index table
#Todo
- automatically get the Phoenix index id
- Hive integration: support "between ... and ..."