#Hbase TIdx

A secondary-index solution for HBase record update-time, based on Apache Phoenix.

#Why To Use

If you use an ID as the HBase rowkey but also need to scan the table by each record's update-time, you can use Hbase TIdx.

#Feature

  • No Concurrent-Write Problem With The Phoenix Local Secondary Index
  • Automatically Update Index Table With HBase Coprocessor
  • Native Scan Support Without The Phoenix SQL
  • Hive Integration

#Build

git clone ...
cd ...
mvn clean package

If you use HDP, you can build with the HDP profile:

mvn clean package -Phdpxxx

If you use other Phoenix/HBase/Hive versions, edit the pom.xml accordingly.

#How To Use

##Prepare

Install

  • install Apache Phoenix, see details
  • install the Phoenix secondary index, see details
  • install Hbase TIdx: build it first, then put hbase-tidx-core-xxx.jar into the HBase lib directory

Create Table And Local Index In Phoenix

create table t1 (key varchar primary key, t unsigned_long, a varchar) VERSIONS=1;
create local index t1_local_index_0 on t1(t);

Add RegionObserver To The DataTable And IndexTable

Get the Phoenix index id:

hbase com.github.dryangkun.hbase.tidx.tool.GetPhoenixIndexId --jdbc-url ... --data-table t1 --index-name t1_local_index_0

Add the region observers to the data and index HBase tables:

hbase shell
# --------add data update region observer--------
disable 'T1'
alter 'T1', 'coprocessor'=>'|com.github.dryangkun.hbase.tidx.TxDataRegionObserver|1001|tx.time.col=0:T,tx.phoenix.index.id=-32768'
enable 'T1'
# --------add index scan region observer --------
disable '_LOCAL_IDX_T1'
alter '_LOCAL_IDX_T1', 'coprocessor'=>'|com.github.dryangkun.hbase.tidx.TxRegionObserver|1001|tx.time.col=0:T,tx.phoenix.index.id=-32768'
enable '_LOCAL_IDX_T1'

or

hbase com.github.dryangkun.hbase.tidx.tool.AddRegionObservers --jdbc-url ... --data-table t1 --index-name t1_local_index_0

observer arguments:

  • tx.time.col: the update-time's family:qualifier, e.g. 0:T
  • tx.pidx.id: the phoenix local index id

##Update Data Table

See TxConcurrencyTest. Updating is no different from putting directly to the data table.

##Scan Index Table

See TxScanExample.

The scan runs against the index table but returns the data table result.

time-check: if the index update-time and the data update-time differ, the record is not returned.

When time-check is set to false, the returned data table result contains a special virtual family:qualifier (0:^T).
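
The time-check rule can be illustrated with a small, self-contained sketch. It models the two tables as plain maps and does not use the Hbase TIdx or HBase client APIs; the class name `TimeCheckSketch`, the method `scan`, and the choice to skip index entries that have no data row are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models the time-check rule with plain maps,
// not the real Hbase TIdx scan implementation.
class TimeCheckSketch {

    /**
     * indexTimes: rowkey -> update-time stored in the index table (iteration order = scan order).
     * dataTimes:  rowkey -> update-time currently stored in the data table.
     * Returns the rowkeys whose data record would be returned by the scan.
     */
    static List<String> scan(Map<String, Long> indexTimes,
                             Map<String, Long> dataTimes,
                             boolean timeCheck) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : indexTimes.entrySet()) {
            Long dataTime = dataTimes.get(entry.getKey());
            if (dataTime == null) {
                continue; // assumption: an index entry without a data row yields nothing
            }
            if (timeCheck && !dataTime.equals(entry.getValue())) {
                continue; // index and data update-times differ: skip the record
            }
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> index = new LinkedHashMap<>();
        index.put("a", 100L);
        index.put("b", 200L); // stale: the data table holds a newer update-time for "b"
        Map<String, Long> data = new LinkedHashMap<>();
        data.put("a", 100L);
        data.put("b", 250L);

        System.out.println(scan(index, data, true));  // prints [a]
        System.out.println(scan(index, data, false)); // prints [a, b]
    }
}
```

With time-check enabled, the stale entry for `b` is filtered out; with it disabled, both records come back and the caller has to decide what to do with the mismatch.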

##Mapreduce

See TxJobExample.

You can run the example:

java -cp `hadoop classpath`:`hbase mapredcp`:hbase-tidx-core-xxx.jar com.github.dryangkun.hbase.tidx.mapreduce.TxJobExample

##Hive Integration

Install

Configure 'hive.aux.jars.path' in hive-site.xml:

<property>
    <name>hive.aux.jars.path</name>
    <value>
        file:///.../hbase-tidx-hive-xxx.jar,
        file:///.../hbase-tidx-core-xxx.jar,
        file:///phoenix-path/phoenix-core-xxx.jar,
        hbase-mapred-jars # get from shell `hbase mapredcp`
    </value>
</property>

Create Hive Table

create external table hbase_t1(
key string, 
t bigint, 
a string) 
stored by 'com.github.dryangkun.hbase.tidx.hive.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,0:T#b,0:A")
tblproperties("hbase.table.name"="T1","tx.hive.time.col"="0:T","tx.hive.pidx.id"="-32768");

Other properties are the same as in Hive HBaseIntegration.

usage:

select * from hbase_t1 where t >= ... and t < ...

"between ... and ..." is not supported yet.
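
Until "between ... and ..." is supported, an inclusive range on the bigint time column can be written in the `>= ... and < ...` form from the usage line. The helper below is a minimal sketch of that rewrite; `BetweenRewriteSketch` and `halfOpenRange` are hypothetical names and are not part of Hbase TIdx. It assumes an integer column, so that `<= high` is equivalent to `< high + 1`.

```java
// Illustrative helper, not part of Hbase TIdx: builds a HiveQL predicate string.
class BetweenRewriteSketch {

    /**
     * Rewrites "column BETWEEN low AND high" (inclusive on both ends) into
     * "column >= low and column < high + 1".
     * Only valid for integer (e.g. bigint) columns.
     */
    static String halfOpenRange(String column, long lowInclusive, long highInclusive) {
        // addExact throws ArithmeticException instead of silently overflowing
        long highExclusive = Math.addExact(highInclusive, 1L);
        return column + " >= " + lowInclusive + " and " + column + " < " + highExclusive;
    }

    public static void main(String[] args) {
        System.out.println("select * from hbase_t1 where " + halfOpenRange("t", 1000L, 1999L));
        // prints: select * from hbase_t1 where t >= 1000 and t < 2000
    }
}
```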

#Limitation

##Consistency between data-table and index-table is not guaranteed

The index-table is updated only after the put to the data-table succeeds, so if the index-table update fails, the two tables become inconsistent.

You can retry the put operation.

When there are several puts for the same rowkey, the index-table is updated only if the put with the max timestamp succeeds.

After a rowkey is successfully deleted from the data-table, its entry in the index-table is deleted.
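
The max-timestamp rule for puts on the same rowkey can be pictured with the following sketch. It is a plain-Java model under stated assumptions, not the coprocessor's actual code: `MaxTimestampSketch`, `PutOp`, `winners`, and `updatesIndex` are hypothetical names, and the puts are represented only by their rowkey and timestamp.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: shows which of several puts to one rowkey
// would drive the index-table update under the max-timestamp rule.
class MaxTimestampSketch {

    /** Hypothetical stand-in for a put: just a rowkey and a timestamp. */
    static final class PutOp {
        final String rowkey;
        final long timestamp;

        PutOp(String rowkey, long timestamp) {
            this.rowkey = rowkey;
            this.timestamp = timestamp;
        }
    }

    /** Returns rowkey -> largest timestamp among the puts for that rowkey. */
    static Map<String, Long> winners(List<PutOp> puts) {
        Map<String, Long> max = new LinkedHashMap<>();
        for (PutOp put : puts) {
            max.merge(put.rowkey, put.timestamp, Long::max);
        }
        return max;
    }

    /** True if this put carries the max timestamp for its rowkey. */
    static boolean updatesIndex(PutOp put, Map<String, Long> winners) {
        return Long.valueOf(put.timestamp).equals(winners.get(put.rowkey));
    }

    public static void main(String[] args) {
        PutOp older = new PutOp("a", 1L);
        PutOp newer = new PutOp("a", 5L);
        Map<String, Long> winners = winners(Arrays.asList(older, newer, new PutOp("b", 3L)));

        System.out.println(winners);                      // prints {a=5, b=3}
        System.out.println(updatesIndex(older, winners)); // prints false
        System.out.println(updatesIndex(newer, winners)); // prints true
    }
}
```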

##Don't put and delete the same rowkey in one batch operation (that case is too complex to handle)
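
A client can guard against this before submitting a batch. The sketch below is one possible client-side check, written against hypothetical types (`BatchConflictSketch`, `Op`, `Kind`) rather than the HBase client classes; it only reports the rowkeys that appear in both a put and a delete.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a pre-submit check on a batch, using stand-in types
// instead of the HBase client's Put and Delete.
class BatchConflictSketch {

    enum Kind { PUT, DELETE }

    /** Hypothetical stand-in for one batch operation. */
    static final class Op {
        final Kind kind;
        final String rowkey;

        Op(Kind kind, String rowkey) {
            this.kind = kind;
            this.rowkey = rowkey;
        }
    }

    /** Returns the rowkeys that occur in at least one put and at least one delete. */
    static Set<String> conflictingRowkeys(List<Op> batch) {
        Set<String> puts = new HashSet<>();
        Set<String> deletes = new HashSet<>();
        for (Op op : batch) {
            (op.kind == Kind.PUT ? puts : deletes).add(op.rowkey);
        }
        Set<String> conflicts = new TreeSet<>(puts); // sorted for stable output
        conflicts.retainAll(deletes);
        return conflicts;
    }

    public static void main(String[] args) {
        List<Op> batch = Arrays.asList(
                new Op(Kind.PUT, "r1"),
                new Op(Kind.DELETE, "r2"),
                new Op(Kind.DELETE, "r1"),
                new Op(Kind.PUT, "r3"));
        System.out.println(conflictingRowkeys(batch)); // prints [r1]
    }
}
```

If the returned set is not empty, the batch can be split so that the puts and deletes for those rowkeys go out in separate operations.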

##Scan.setBatch is not supported when scanning the index-table

#Todo

  • automatically get phoenix index id
  • hive-integration: support "between ... and ..."