Java: getting the columns of a column family. Can we get all the column names from an HBase table?

红太狼 2022-08-29 15:45


Setup:

I have an HBase table with 100M+ rows and 1 million+ columns. Every row has data for only 2 to 5 columns. There is just one column family.

Problem:

I want to find out all the distinct qualifiers (columns) in this column family. Is there a quick way to do that?

I can think of scanning the whole table, getting the familyMap for each row, and adding each qualifier to a Set<>. But that would be awfully slow, as there are 100M+ rows.

Can we do any better?

Solution

You can use a MapReduce job for this. Unlike a coprocessor, it does not require installing custom libraries on the HBase cluster.

Below is the code for creating the MapReduce job.

Job setup

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    Job job = Job.getInstance(config);
    job.setJobName("Distinct columns");

    Scan scan = new Scan();
    scan.setBatch(500);
    scan.addFamily(YOU_COLUMN_FAMILY_NAME);
    scan.setFilter(new KeyOnlyFilter()); // scan only the key part of each KeyValue (row, column family, column)
    scan.setCacheBlocks(false); // don't set to true for MR jobs

    TableMapReduceUtil.initTableMapperJob(
        YOU_TABLE_NAME,
        scan,
        OnlyColumnNameMapper.class, // mapper
        Text.class, // mapper output key
        Text.class, // mapper output value
        job);

    job.setNumReduceTasks(1);
    job.setReducerClass(OnlyColumnNameReducer.class);

Mapper

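    import java.io.IOException;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellScanner;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    public class OnlyColumnNameMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, final Context context) throws IOException, InterruptedException {
            CellScanner cellScanner = value.cellScanner();
            while (cellScanner.advance()) {
                Cell cell = cellScanner.current();
                // Copy only the qualifier bytes out of the cell's backing array.
                byte[] q = Bytes.copy(cell.getQualifierArray(),
                        cell.getQualifierOffset(),
                        cell.getQualifierLength());
                // Emit the qualifier as the key; the value is unused.
                context.write(new Text(q), new Text());
            }
        }
    }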

Reducer

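    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class OnlyColumnNameReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // reduce() runs once per distinct key, so each qualifier is written exactly once.
            context.write(new Text(key), new Text());
        }
    }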
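
To see why this mapper and reducer pair yields each qualifier exactly once, the data flow can be modeled in plain Java without a cluster. This is only a rough sketch under stated assumptions: the class `DistinctQualifiersSketch`, its method names, and the sample rows are hypothetical, and in-memory collections stand in for HBase and Hadoop. It is not how the framework is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the job; no HBase or Hadoop classes involved.
class DistinctQualifiersSketch {

    /** Map phase: emit one qualifier per cell, so duplicates across rows are kept. */
    public static List<String> mapPhase(List<Map<String, String>> rows) {
        List<String> emitted = new ArrayList<>();
        for (Map<String, String> row : rows) {
            emitted.addAll(row.keySet()); // a row is modeled as qualifier -> value
        }
        return emitted;
    }

    /** Shuffle: group the emitted keys, sorted, counting occurrences per key. */
    public static SortedMap<String, Integer> shuffle(List<String> emitted) {
        SortedMap<String, Integer> grouped = new TreeMap<>();
        for (String qualifier : emitted) {
            grouped.merge(qualifier, 1, Integer::sum);
        }
        return grouped;
    }

    /** Reduce phase: invoked once per distinct key; the grouped values are ignored. */
    public static List<String> reducePhase(SortedMap<String, Integer> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    public static List<String> distinctQualifiers(List<Map<String, String>> rows) {
        return reducePhase(shuffle(mapPhase(rows)));
    }

    public static void main(String[] args) {
        // Made-up sample rows, each with only a few populated columns.
        List<Map<String, String>> rows = List.of(
                Map.of("q1", "a", "q7", "b"),
                Map.of("q7", "c", "q3", "d"),
                Map.of("q1", "e"));
        System.out.println(distinctQualifiers(rows)); // prints [q1, q3, q7]
    }
}
```

The map phase emits five qualifiers for these three rows, and grouping by key collapses them to three. In the real job the same collapse happens in the shuffle, which is why the reducer can ignore its values.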
