Writing to Hive from Flume with Hive Streaming
迈不过友情╰ 2021-06-24 15:58

1. Create the table in Hive:

```sql
create table customers (
  id string,
  name string,
  email string,
  street_address string,
  company string
)
partitioned by (time string)
clustered by (id) into 5 buckets
stored as orc
location '/user/iteblog/salescust'
tblproperties ('transactional'='true');
```

Note: the table is stored as ORC and is bucketed (clustered by), both of which Hive Streaming requires. To enable transactions in Hive, the following configuration is also needed:

```
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
```

2. Configure Flume:

```
$ vi flumetohive.conf
```

```
flumeagent1.sources = source_from_kafka
flumeagent1.channels = mem_channel
flumeagent1.sinks = hive_sink

# Define / configure the source
flumeagent1.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumeagent1.sources.source_from_kafka.zookeeperConnect = sandbox.hortonworks.com:2181
flumeagent1.sources.source_from_kafka.topic = SalesDBTransactions
flumeagent1.sources.source_from_kafka.groupID = flume
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sources.source_from_kafka.interceptors = i1
flumeagent1.sources.source_from_kafka.interceptors.i1.type = timestamp
flumeagent1.sources.source_from_kafka.consumer.timeout.ms = 1000

# Hive sink
flumeagent1.sinks.hive_sink.type = hive
flumeagent1.sinks.hive_sink.hive.metastore = thrift://sandbox.hortonworks.com:9083
flumeagent1.sinks.hive_sink.hive.database = raj
flumeagent1.sinks.hive_sink.hive.table = customers
flumeagent1.sinks.hive_sink.hive.txnsPerBatchAsk = 2
flumeagent1.sinks.hive_sink.hive.partition = %y-%m-%d-%H-%M
flumeagent1.sinks.hive_sink.batchSize = 10
flumeagent1.sinks.hive_sink.serializer = DELIMITED
flumeagent1.sinks.hive_sink.serializer.delimiter = ,
flumeagent1.sinks.hive_sink.serializer.fieldnames = id,name,email,street_address,company

# Use a channel which buffers events in memory
flumeagent1.channels.mem_channel.type = memory
flumeagent1.channels.mem_channel.capacity = 10000
flumeagent1.channels.mem_channel.transactionCapacity = 100

# Bind the source and sink to the channel
```
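Events reaching the Hive sink must match the DELIMITED serializer settings above: comma-separated fields in the order given by `serializer.fieldnames`, with the timestamp interceptor supplying the value resolved by the `%y-%m-%d-%H-%M` partition escape. A minimal Python sketch of how a producer might format such a record (the `make_event` helper and the sample values are hypothetical, not part of the Flume config):

```python
from datetime import datetime

# Field order must match serializer.fieldnames in flumetohive.conf.
FIELD_NAMES = ["id", "name", "email", "street_address", "company"]

def make_event(record: dict) -> str:
    # Join fields with serializer.delimiter (",") in the configured order.
    return ",".join(record[f] for f in FIELD_NAMES)

def partition_value(ts: datetime) -> str:
    # The sink's partition escape %y-%m-%d-%H-%M resolves the event's
    # timestamp header to a string like "21-06-24-15-58".
    return ts.strftime("%y-%m-%d-%H-%M")

event = make_event({
    "id": "1001",
    "name": "Alice",
    "email": "alice@example.com",
    "street_address": "1 Main St",
    "company": "Acme",
})
print(event)                                           # 1001,Alice,alice@example.com,1 Main St,Acme
print(partition_value(datetime(2021, 6, 24, 15, 58)))  # 21-06-24-15-58
```

One new partition is created per distinct minute, so a steady stream of events spreads across many small partitions; coarser escapes such as `%y-%m-%d` reduce that if minute-level partitioning is not needed.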
```
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sinks.hive_sink.channel = mem_channel
```

Two points worth noting:

1) The source is Kafka.
2) The sink is Hive. Rather than writing files to HDFS and creating a Hive external table over them, the Flume Hive sink is used directly: internally it relies on Hive Streaming to append rows to the table's ORC files, the advantage being that the files stay small and writes are efficient.

Reference: https://www.iteblog.com/archives/1771.html
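The two batching knobs in the sink config interact: `batchSize` events are pulled from the channel per Hive transaction, and `hive.txnsPerBatchAsk` controls how many transactions the sink requests from the metastore in one go. A rough Python illustration of that grouping (purely conceptual; Flume's actual Java implementation differs):

```python
BATCH_SIZE = 10      # flumeagent1.sinks.hive_sink.batchSize
TXNS_PER_BATCH = 2   # flumeagent1.sinks.hive_sink.hive.txnsPerBatchAsk

def plan_commits(num_events: int):
    """Group events into per-transaction batches, then group those
    transactions into the blocks requested from the metastore."""
    # One Hive transaction commits up to BATCH_SIZE events.
    txns = [min(BATCH_SIZE, num_events - i)
            for i in range(0, num_events, BATCH_SIZE)]
    # Each metastore round trip asks for TXNS_PER_BATCH transactions.
    return [txns[i:i + TXNS_PER_BATCH]
            for i in range(0, len(txns), TXNS_PER_BATCH)]

print(plan_commits(35))  # [[10, 10], [10, 5]]
```

With 35 pending events this yields four transactions of 10, 10, 10, and 5 events, requested from the metastore in blocks of two; raising `txnsPerBatchAsk` trades fewer metastore round trips for longer-held open transactions.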