Getting the columns of a column family in Java: can we get all the column names from an HBase table?
Setup:
I have an HBase table with 100M+ rows and 1 million+ columns. Every row has data for only 2 to 5 columns. There is just 1 column family.
Problem:
I want to find out all the distinct qualifiers (columns) in this column family. Is there a quick way to do that?
I can think of scanning the whole table, getting the familyMap for each row, and adding each qualifier to a Set<>. But that would be awfully slow, as there are 100M+ rows.
Can we do any better?
Solution
You can use a MapReduce job for this. Unlike a coprocessor, it does not require installing custom libraries on the HBase cluster.
Below is the code for creating the MapReduce job.
Job setup
Job job = Job.getInstance(config);
job.setJobName("Distinct columns");
Scan scan = new Scan();
scan.setBatch(500);
scan.addFamily(YOU_COLUMN_FAMILY_NAME);
scan.setFilter(new KeyOnlyFilter()); // return only the key part of each KeyValue (row, column family, qualifier), not the value
scan.setCacheBlocks(false); // don't set to true for MR jobs
TableMapReduceUtil.initTableMapperJob(
YOU_TABLE_NAME,
scan,
OnlyColumnNameMapper.class, // mapper
Text.class, // mapper output key
Text.class, // mapper output value
job);
job.setNumReduceTasks(1);
job.setReducerClass(OnlyColumnNameReducer.class);
Mapper
public class OnlyColumnNameMapper extends TableMapper<Text, Text> {
@Override
protected void map(ImmutableBytesWritable key, Result value, final Context context) throws IOException, InterruptedException {
CellScanner cellScanner = value.cellScanner();
while (cellScanner.advance()) {
Cell cell = cellScanner.current();
byte[] q = Bytes.copy(cell.getQualifierArray(),
cell.getQualifierOffset(),
cell.getQualifierLength());
context.write(new Text(q),new Text());
}
}
}
Reducer
public class OnlyColumnNameReducer extends Reducer<Text, Text, Text, Text> {
@Override
protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
context.write(new Text(key), new Text());
}
}
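To see why this job writes each qualifier exactly once, the map, shuffle and reduce steps can be simulated in plain Java without a cluster. This is only a rough sketch: the class name `DistinctQualifiersSketch`, the helper `distinctQualifiers` and the sample rows are hypothetical and not part of HBase or Hadoop. A sorted map stands in for the framework's group-by-key step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for the MapReduce job: no HBase or Hadoop classes are used.
class DistinctQualifiersSketch {

    // Each inner list holds the qualifiers present in one row.
    static List<String> distinctQualifiers(List<List<String>> rows) {
        // "Map" phase: emit (qualifier, "") for every cell. The TreeMap plays the
        // role of the shuffle, grouping emitted values by key in sorted order.
        TreeMap<String, List<String>> shuffled = new TreeMap<>();
        for (List<String> row : rows) {
            for (String qualifier : row) {
                shuffled.computeIfAbsent(qualifier, k -> new ArrayList<>()).add("");
            }
        }
        // "Reduce" phase: invoked once per key, ignores the values, writes the key.
        List<String> output = new ArrayList<>();
        for (String qualifier : shuffled.keySet()) {
            output.add(qualifier);
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("c1", "c7"),
                Arrays.asList("c7", "c3", "c1"),
                Arrays.asList("c3"));
        System.out.println(distinctQualifiers(rows)); // prints [c1, c3, c7]
    }
}
```

In the real job the grouping happens during the shuffle, so the reducer never holds the full set of qualifiers in memory. It only forwards the key it is handed.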