Hive Operators and User-Defined Functions (UDF, UDAF, UDTF)

Table of Contents

Built-in Operators

Relational Operators

Arithmetic Operators

Logical Operators

String Operators

Complex Type Constructors

Operators on Complex Types

Built-in Functions

Built-in Aggregate Functions (UDAF)

Built-in Table-Generating Functions (UDTF)

Usage Examples

Built-in Operators

Relational Operators

The following operators compare the passed operands and generate a TRUE or FALSE value depending on whether the comparison between the operands holds.

Operator	Operand types	Description
A = B	All primitive types	TRUE if expression A is equal to expression B, otherwise FALSE.
A == B	All primitive types	Synonym for the = operator.
A <=> B	All primitive types	Returns the same result as the equality (=) operator for non-NULL operands, but returns TRUE if both are NULL and FALSE if one of them is NULL. (As of version 0.9.0.)
A <> B	All primitive types	NULL if A or B is NULL, TRUE if expression A is NOT equal to expression B, otherwise FALSE.
A != B	All primitive types	Synonym for the <> operator.
A < B	All primitive types	NULL if A or B is NULL, TRUE if expression A is less than expression B, otherwise FALSE.
A <= B	All primitive types	NULL if A or B is NULL, TRUE if expression A is less than or equal to expression B, otherwise FALSE.
A > B	All primitive types	NULL if A or B is NULL, TRUE if expression A is greater than expression B, otherwise FALSE.
A >= B	All primitive types	NULL if A or B is NULL, TRUE if expression A is greater than or equal to expression B, otherwise FALSE.
A [NOT] BETWEEN B AND C	All primitive types	NULL if A, B or C is NULL, TRUE if A is greater than or equal to B AND A is less than or equal to C, otherwise FALSE. This can be inverted by using the NOT keyword. (As of version 0.9.0.)
A IS NULL	All types	TRUE if expression A evaluates to NULL, otherwise FALSE.
A IS NOT NULL	All types	FALSE if expression A evaluates to NULL, otherwise TRUE.
A IS [NOT] (TRUE|FALSE)	Boolean types	Evaluates to TRUE only if A meets the condition. (As of version 3.0.0.) Note: NULL is UNKNOWN, and because of that (UNKNOWN IS TRUE) and (UNKNOWN IS FALSE) both evaluate to FALSE.
A [NOT] LIKE B	strings	NULL if A or B is NULL, TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A (similar to . in POSIX regular expressions) while the % character in B matches an arbitrary number of characters in A (similar to .* in POSIX regular expressions). For example, 'foobar' like 'foo' evaluates to FALSE whereas 'foobar' like 'foo___' evaluates to TRUE and so does 'foobar' like 'foo%'.
A RLIKE B	strings	NULL if A or B is NULL, TRUE if any (possibly empty) substring of A matches the Java regular expression B, otherwise FALSE. For example, 'foobar' RLIKE 'foo' evaluates to TRUE and so does 'foobar' RLIKE '^f.*r$'.
A REGEXP B	strings	Same as RLIKE.
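
As a minimal sketch of the NULL handling and pattern-matching rules in the table above, the queries below assume a Hive session that accepts SELECT without a FROM clause (Hive 0.13 or later). The column aliases are illustrative only, and the values in the comments are the ones the operator descriptions predict.

```sql
-- Ordinary equality yields NULL when either side is NULL; <=> never does.
SELECT
  CAST(NULL AS INT) = 1                   AS eq_with_null,        -- NULL
  CAST(NULL AS INT) <=> 1                 AS nullsafe_one_null,   -- FALSE
  CAST(NULL AS INT) <=> CAST(NULL AS INT) AS nullsafe_both_null;  -- TRUE

-- LIKE must match the whole string; RLIKE matches any substring.
SELECT
  'foobar' LIKE 'foo'     AS like_prefix_only,   -- FALSE
  'foobar' LIKE 'foo___'  AS like_underscores,   -- TRUE
  'foobar' LIKE 'foo%'    AS like_percent,       -- TRUE
  'foobar' RLIKE 'foo'    AS rlike_substring,    -- TRUE
  'foobar' RLIKE '^f.*r$' AS rlike_anchored;     -- TRUE
```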

Arithmetic Operators

The following operators support various common arithmetic operations on the operands. All return number types; if any of the operands is NULL, then the result is also NULL.

Operator	Operand types	Description
A + B	All number types	Gives the result of adding A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands. For example, since every integer is a float, float is a containing type of integer, so the + operator on a float and an int will result in a float.
A - B	All number types	Gives the result of subtracting B from A. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.
A * B	All number types	Gives the result of multiplying A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands. Note that if the multiplication causes an overflow, you will have to cast one of the operands to a type higher in the type hierarchy.
A / B	All number types	Gives the result of dividing A by B. The result is a double type in most cases. When A and B are both integers, the result is a double type except when the hive.compat configuration parameter is set to "0.13" or "latest", in which case the result is a decimal type.
A DIV B	Integer types	Gives the integer part resulting from dividing A by B. E.g. 17 div 3 results in 5.
A % B	All number types	Gives the remainder resulting from dividing A by B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.
A & B	All number types	Gives the result of bitwise AND of A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.
A | B	All number types	Gives the result of bitwise OR of A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.
A ^ B	All number types	Gives the result of bitwise XOR of A and B. The type of the result is the same as the common parent (in the type hierarchy) of the types of the operands.
~A	All number types	Gives the result of bitwise NOT of A. The type of the result is the same as the type of A.
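
The difference between /, DIV and %, and the bitwise operators, is easiest to see side by side. The query below is a small sketch with illustrative aliases; it assumes default settings (so / on two integers gives a DOUBLE) and a Hive version that allows SELECT without FROM.

```sql
SELECT
  17 / 3   AS quotient,      -- a DOUBLE, roughly 5.67
  17 DIV 3 AS integer_part,  -- 5
  17 % 3   AS remainder,     -- 2
  5 & 3    AS bitwise_and,   -- 1  (101 AND 011 = 001)
  5 | 3    AS bitwise_or,    -- 7  (101 OR  011 = 111)
  5 ^ 3    AS bitwise_xor,   -- 6  (101 XOR 011 = 110)
  ~5       AS bitwise_not;   -- -6 (two's complement)
```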

Logical Operators

The following operators provide support for creating logical expressions. All of them return boolean TRUE, FALSE, or NULL depending on the boolean values of the operands. NULL behaves as an "unknown" flag, so if the result depends on the state of an unknown, the result itself is unknown.

Operator	Operand types	Description
A AND B	boolean	TRUE if both A and B are TRUE, otherwise FALSE. NULL if A or B is NULL.
A OR B	boolean	TRUE if either A or B or both are TRUE, FALSE OR NULL is NULL, otherwise FALSE.
NOT A	boolean	TRUE if A is FALSE or NULL if A is NULL. Otherwise FALSE.
! A	boolean	Same as NOT A.
A IN (val1, val2, …)	boolean	TRUE if A is equal to any of the values. As of Hive 0.13 subqueries are supported in IN statements.
A NOT IN (val1, val2, …)	boolean	TRUE if A is not equal to any of the values. As of Hive 0.13 subqueries are supported in NOT IN statements.
[NOT] EXISTS (subquery)		TRUE if the subquery returns at least one row. Supported as of Hive 0.13.
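
A short sketch of how NULL acts as "unknown" in logical expressions, following the rows above. The aliases are illustrative; the typed NULL is written as CAST(NULL AS BOOLEAN) only to keep the operand type explicit.

```sql
SELECT
  (TRUE AND CAST(NULL AS BOOLEAN)) AS and_unknown,  -- NULL: depends on the unknown operand
  (FALSE OR CAST(NULL AS BOOLEAN)) AS or_unknown,   -- NULL: depends on the unknown operand
  (TRUE OR CAST(NULL AS BOOLEAN))  AS or_decided,   -- TRUE: already decided by the first operand
  (NOT CAST(NULL AS BOOLEAN))      AS not_unknown,  -- NULL
  (2 IN (1, 2, 3))                 AS in_list,      -- TRUE
  (4 NOT IN (1, 2, 3))             AS not_in_list;  -- TRUE
```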

String Operators

Operator	Operand types	Description
A || B	strings	Concatenates the operands; shorthand for concat(A,B). Supported as of Hive 2.2.0.

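Complex Type Constructors

The following functions construct instances of complex types.

Constructor Function	Operands	Description
map	(key1, value1, key2, value2, …)	Creates a map with the given key/value pairs.
struct	(val1, val2, val3, …)	Creates a struct with the given field values. Struct field names will be col1, col2, ….
named_struct	(name1, val1, name2, val2, …)	Creates a struct with the given field names and values. (As of Hive 0.8.0.)
array	(val1, val2, …)	Creates an array with the given elements.
create_union	(tag, val1, val2, …)	Creates a union type with the value that is being pointed to by the tag parameter.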

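Operators on Complex Types

The following operators provide mechanisms to access elements in complex types.

Operator	Operand types	Description
A[n]	A is an Array and n is an int	Returns the nth element in the array A. The first element has index 0. For example, if A is an array comprising ['foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar'.
M[key]	M is a Map<K, V> and key has type K	Returns the value corresponding to the key in the map. For example, if M is a map comprising {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar'.
S.x	S is a struct	Returns the x field of S. For example, for the struct foobar {int foo, int bar}, foobar.foo returns the integer stored in the foo field of the struct.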
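
The constructors and accessors from the two tables above can be combined as in the sketch below. The subquery alias t and the column names a, m and s are hypothetical, chosen only for this illustration.

```sql
SELECT
  a[0]     AS first_element,   -- 'foo'
  a[1]     AS second_element,  -- 'bar'
  m['all'] AS map_lookup,      -- 'foobar'
  s.foo    AS struct_field     -- 1
FROM (
  SELECT
    array('foo', 'bar')                           AS a,
    map('f', 'foo', 'b', 'bar', 'all', 'foobar')  AS m,
    named_struct('foo', 1, 'bar', 2)              AS s
) t;
```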

Built-in Functions

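Mathematical Functions

Return Type	Name (Signature)	Description
DOUBLE	round(DOUBLE a)	Returns a rounded to the nearest integer value (half-up rounding).
DOUBLE	round(DOUBLE a, INT d)	Returns a rounded to d decimal places.
DOUBLE	bround(DOUBLE a)	Returns a rounded using HALF_EVEN (banker's) rounding. Example: bround(2.5) = 2, bround(3.5) = 4.
DOUBLE	bround(DOUBLE a, INT d)	Returns a rounded to d decimal places using HALF_EVEN rounding. Example: bround(8.25, 1) = 8.2, bround(8.35, 1) = 8.4.
BIGINT	floor(DOUBLE a)	Returns the maximum BIGINT value that is equal to or less than a.
BIGINT	ceil(DOUBLE a), ceiling(DOUBLE a)	Returns the minimum BIGINT value that is equal to or greater than a.
DOUBLE	rand(), rand(INT seed)	Returns a random number (that changes from row to row) distributed uniformly from 0 to 1. Specifying the seed makes the generated random number sequence deterministic.
DOUBLE	exp(DOUBLE a), exp(DECIMAL a)	Returns e^a, where e is the base of the natural logarithm. Decimal version added in Hive 0.13.0.
DOUBLE	ln(DOUBLE a), ln(DECIMAL a)	Returns the natural logarithm of the argument a. Decimal version added in Hive 0.13.0.
DOUBLE	log10(DOUBLE a), log10(DECIMAL a)	Returns the base-10 logarithm of the argument a. Decimal version added in Hive 0.13.0.
DOUBLE	log2(DOUBLE a), log2(DECIMAL a)	Returns the base-2 logarithm of the argument a. Decimal version added in Hive 0.13.0.
DOUBLE	log(DOUBLE base, DOUBLE a), log(DECIMAL base, DECIMAL a)	Returns the logarithm of the argument a to the given base. Decimal versions added in Hive 0.13.0.
DOUBLE	pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p)	Returns a^p (a raised to the power p).
DOUBLE	sqrt(DOUBLE a), sqrt(DECIMAL a)	Returns the square root of a. Decimal version added in Hive 0.13.0.
STRING	bin(BIGINT a)	Returns the number in binary format (see http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_bin).
STRING	hex(BIGINT a), hex(STRING a), hex(BINARY a)	If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format. Otherwise, if the argument is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING. (See http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex, BINARY version as of Hive 0.12.0.)
BINARY	unhex(STRING a)	Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the byte representation of that number. (BINARY version as of Hive 0.12.0, used to return a string.)
STRING	conv(BIGINT num, INT from_base, INT to_base), conv(STRING num, INT from_base, INT to_base)	Converts a number from a given base to another (see http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_conv).
DOUBLE	abs(DOUBLE a)	Returns the absolute value.
INT or DOUBLE	pmod(INT a, INT b), pmod(DOUBLE a, DOUBLE b)	Returns the positive value of a mod b.
DOUBLE	sin(DOUBLE a), sin(DECIMAL a)	Returns the sine of a (a is in radians). Decimal version added in Hive 0.13.0.
DOUBLE	asin(DOUBLE a), asin(DECIMAL a)	Returns the arc sine of a if -1<=a<=1, or NULL otherwise. Decimal version added in Hive 0.13.0.
DOUBLE	cos(DOUBLE a), cos(DECIMAL a)	Returns the cosine of a (a is in radians). Decimal version added in Hive 0.13.0.
DOUBLE	acos(DOUBLE a), acos(DECIMAL a)	Returns the arc cosine of a if -1<=a<=1, or NULL otherwise. Decimal version added in Hive 0.13.0.
DOUBLE	tan(DOUBLE a), tan(DECIMAL a)	Returns the tangent of a (a is in radians). Decimal version added in Hive 0.13.0.
DOUBLE	atan(DOUBLE a), atan(DECIMAL a)	Returns the arc tangent of a. Decimal version added in Hive 0.13.0.
DOUBLE	degrees(DOUBLE a), degrees(DECIMAL a)	Converts the value of a from radians to degrees. Decimal version added in Hive 0.13.0.
DOUBLE	radians(DOUBLE a), radians(DECIMAL a)	Converts the value of a from degrees to radians. Decimal version added in Hive 0.13.0.
INT or DOUBLE	positive(INT a), positive(DOUBLE a)	Returns a.
INT or DOUBLE	negative(INT a), negative(DOUBLE a)	Returns -a.
DOUBLE or INT	sign(DOUBLE a), sign(DECIMAL a)	Returns the sign of a as '1.0' (if a is positive) or '-1.0' (if a is negative), '0.0' otherwise. The decimal version returns INT instead of DOUBLE. Decimal version added in Hive 0.13.0.
DOUBLE	e()	Returns the value of e.
DOUBLE	pi()	Returns the value of pi.
BIGINT	factorial(INT a)	Returns the factorial of a (as of Hive 1.2.0). Valid a is [0..20].
DOUBLE	cbrt(DOUBLE a)	Returns the cube root of a double value (as of Hive 1.2.0).
INT BIGINT	shiftleft(TINYINT|SMALLINT|INT a, INT b), shiftleft(BIGINT a, INT b)	Bitwise left shift (as of Hive 1.2.0). Shifts a b positions to the left. Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
INT BIGINT	shiftright(TINYINT|SMALLINT|INT a, INT b), shiftright(BIGINT a, INT b)	Bitwise right shift (as of Hive 1.2.0). Shifts a b positions to the right. Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
INT BIGINT	shiftrightunsigned(TINYINT|SMALLINT|INT a, INT b), shiftrightunsigned(BIGINT a, INT b)	Bitwise unsigned right shift (as of Hive 1.2.0). Shifts a b positions to the right. Returns int for tinyint, smallint and int a. Returns bigint for bigint a.
T	greatest(T v1, T v2, …)	Returns the greatest value of the list of values (as of Hive 1.1.0). Fixed to return NULL when one or more arguments are NULL, and the strict type restriction was relaxed, consistent with the ">" operator (as of Hive 2.0.0).
T	least(T v1, T v2, …)	Returns the least value of the list of values (as of Hive 1.1.0). Fixed to return NULL when one or more arguments are NULL, and the strict type restriction was relaxed, consistent with the "<" operator (as of Hive 2.0.0).
INT	width_bucket(NUMERIC expr, NUMERIC min_value, NUMERIC max_value, INT num_buckets)	Returns an integer between 0 and num_buckets+1 by mapping expr into the ith equally sized bucket. Buckets are made by dividing [min_value, max_value] into equally sized regions. If expr < min_value, returns 1; if expr > max_value, returns num_buckets+1. See https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions214.htm (as of Hive 3.0.0).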
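
To make the rounding and modulus rows concrete, here is a small sketch. It assumes a Hive version that provides bround (Hive 1.3.0 / 2.0.0 or later) and greatest/least (Hive 1.1.0 or later); the aliases are illustrative, and the bround values are the ones quoted in the table.

```sql
SELECT
  round(2.5)        AS half_up_rounding,     -- 3
  bround(2.5)       AS half_even_low,        -- 2
  bround(3.5)       AS half_even_high,       -- 4
  bround(8.25, 1)   AS half_even_one_digit,  -- 8.2
  -7 % 3            AS signed_remainder,     -- -1 (sign follows the dividend)
  pmod(-7, 3)       AS positive_modulus,     -- 2
  greatest(1, 5, 3) AS largest_value,        -- 5
  least(1, 5, 3)    AS smallest_value;       -- 1
```

pmod is the usual choice when the result is used as a bucket index, since % can return a negative number for negative inputs.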

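Collection Functions

Return Type	Name (Signature)	Description
int	size(Map<K.V>)	Returns the number of elements in the map type.
int	size(Array<T>)	Returns the number of elements in the array type.
array<K>	map_keys(Map<K.V>)	Returns an unordered array containing the keys of the input map.
array<V>	map_values(Map<K.V>)	Returns an unordered array containing the values of the input map.
boolean	array_contains(Array<T>, value)	Returns TRUE if the array contains value.
array<t>	sort_array(Array<T>)	Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0).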

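Type Conversion Functions

Return Type	Name (Signature)	Description
binary	binary(string|binary)	Casts the parameter into a binary.
<type>	cast(expr as <type>)	Converts the result of the expression expr to <type>. For example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. For cast(expr as boolean), Hive returns true for a non-empty string.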

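Date Functions

Return Type	Name (Signature)	Description
string	from_unixtime(bigint unixtime[, string format])	Converts a Unix timestamp (seconds since the epoch) to a timestamp string in the session time zone. Example: select from_unixtime(1576217619); returns 2019-12-12 22:13:39 (this output corresponds to a UTC-8 session time zone).
bigint	unix_timestamp()	Gets the current Unix timestamp in seconds. This function is not deterministic and its value is not fixed for the scope of a query execution, so it prevents proper optimization of queries. It has been deprecated since 2.0 in favour of the CURRENT_TIMESTAMP constant.
bigint	unix_timestamp(string date)	Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), using the default time zone and the default locale; returns 0 if the conversion fails: unix_timestamp('2009-03-20 11:30:01') = 1237573801.
bigint	unix_timestamp(string date, string pattern)	Converts a time string with the given pattern (see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) to a Unix timestamp (in seconds); returns 0 if the conversion fails: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400.
pre 2.1.0: string; 2.1.0 on: date	to_date(string timestamp)	Returns the date part of a timestamp string (pre-Hive 2.1.0): to_date("1970-01-01 00:00:00") = "1970-01-01". As of Hive 2.1.0, returns a date object. Prior to Hive 2.1.0 (HIVE-13248) the return type was a string because no date type existed when the method was created.
int	year(string date)	Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970.
int	quarter(date/timestamp/string)	Returns the quarter of the year for a date, timestamp, or string, in the range 1 to 4 (as of Hive 1.3.0). Example: quarter('2015-04-08') = 2.
int	month(string date)	Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11.
int	day(string date), dayofmonth(date)	Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1.
int	hour(string date)	Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.
int	minute(string date)	Returns the minute of the timestamp.
int	second(string date)	Returns the second of the timestamp.
int	weekofyear(string date)	Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44.
int	extract(field FROM source)	Retrieves fields such as days or hours from source (as of Hive 2.2.0). Source must be a date, timestamp, interval, or a string that can be converted into either a date or a timestamp. Supported fields include: day, dayofweek, hour, minute, month, quarter, second, week and year. Examples: select extract(month from "2016-10-20") results in 10. select extract(hour from "2016-10-20 05:06:07") results in 5. select extract(dayofweek from "2016-10-20 05:06:07") results in 5. select extract(month from interval '1-3' year to month) results in 3. select extract(minute from interval '3 12:20:30' day to second) results in 20.
int	datediff(string enddate, string startdate)	Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2.
pre 2.1.0: string; 2.1.0 on: date	date_add(date/timestamp/string startdate, tinyint/smallint/int days)	Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'. Prior to Hive 2.1.0 (HIVE-13248) the return type was a string because no date type existed when the method was created.
pre 2.1.0: string; 2.1.0 on: date	date_sub(date/timestamp/string startdate, tinyint/smallint/int days)	Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'. Prior to Hive 2.1.0 (HIVE-13248) the return type was a string because no date type existed when the method was created.
timestamp	from_utc_timestamp({any primitive type} ts, string timezone)	Converts a timestamp in UTC to the given time zone (as of Hive 0.8.0). The ts argument is a primitive type, including timestamp/date, tinyint/smallint/int/bigint, float/double and decimal. Fractional values are considered as seconds. Integer values are considered as milliseconds. For example, from_utc_timestamp(2592000.0,'PST'), from_utc_timestamp(2592000000,'PST') and from_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all return the timestamp 1970-01-30 08:00:00.
timestamp	to_utc_timestamp({any primitive type} ts, string timezone)	Converts a timestamp in the given time zone to UTC (as of Hive 0.8.0). The ts argument is a primitive type, including timestamp/date, tinyint/smallint/int/bigint, float/double and decimal. Fractional values are considered as seconds. Integer values are considered as milliseconds. For example, to_utc_timestamp(2592000.0,'PST'), to_utc_timestamp(2592000000,'PST') and to_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all return the timestamp 1970-01-31 00:00:00.
date	current_date	Returns the current date at the start of query evaluation (as of Hive 1.2.0). All calls of current_date within the same query return the same value.
timestamp	current_timestamp	Returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value.
string	add_months(string start_date, int num_months, output_date_format)	Returns the date that is num_months after start_date (as of Hive 1.1.0). start_date is a string, date or timestamp. num_months is an integer. If start_date is the last day of the month, or if the resulting month has fewer days than the day component of start_date, then the result is the last day of the resulting month. Otherwise, the result has the same day component as start_date. The default output format is 'yyyy-MM-dd'. Before Hive 4.0.0, the time part of the date is ignored. As of Hive 4.0.0, add_months supports an optional argument output_date_format, which accepts a string that represents the output date format. For example: add_months('2009-08-31', 1) returns '2009-09-30'. add_months('2017-12-31 14:15:16', 2, 'YYYY-MM-dd HH:mm:ss') returns '2018-02-28 14:15:16'.
string	last_day(string date)	Returns the last day of the month which the date belongs to (as of Hive 1.1.0). date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time part of date is ignored.
string	next_day(string start_date, string day_of_week)	Returns the first date which is later than start_date and named as day_of_week (as of Hive 1.2.0). start_date is a string/date/timestamp. day_of_week is 2 letters, 3 letters or the full name of the day of the week (e.g. Mo, tue, FRIDAY). The time part of start_date is ignored. Example: next_day('2015-01-14', 'TU') = 2015-01-20.
string	trunc(string date, string format)	Returns date truncated to the unit specified by the format (as of Hive 1.2.0). Supported formats: MONTH/MON/MM, YEAR/YYYY/YY. Example: trunc('2015-03-17', 'MM') = 2015-03-01.
double	months_between(date1, date2)	Returns the number of months between dates date1 and date2 (as of Hive 1.2.0). If date1 is later than date2, the result is positive. If date1 is earlier than date2, the result is negative. If date1 and date2 are either the same day of the month or both the last day of their months, the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in the time components of date1 and date2. date1 and date2 can be a date, a timestamp, or a string in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal places. Example: months_between('1997-02-28 10:30:00', '1996-10-30') = 3.94959677.
string	date_format(date/timestamp/string ts, string fmt)	Converts a date/timestamp/string to a string value in the format specified by the date format fmt (as of Hive 1.2.0). Supported formats are Java SimpleDateFormat formats (https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html). The second argument fmt should be constant. Example: date_format('2015-04-08', 'y') = '2015'. date_format can be used to implement other UDFs, e.g. dayname(date) is date_format(date, 'EEEE') and dayofyear(date) is date_format(date, 'D').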
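
The date arithmetic rows can be tried together in one query. This sketch reuses the example values quoted in the table, so the comments repeat the documented results; it assumes Hive 1.3.0 or later (for quarter), and the aliases are illustrative.

```sql
SELECT
  datediff('2009-03-01', '2009-02-27') AS days_between,     -- 2
  date_add('2008-12-31', 1)            AS next_calendar_day, -- 2009-01-01
  date_sub('2008-12-31', 1)            AS previous_day,      -- 2008-12-30
  add_months('2009-08-31', 1)          AS one_month_later,   -- 2009-09-30 (clamped to month end)
  next_day('2015-01-14', 'TU')         AS following_tuesday, -- 2015-01-20
  trunc('2015-03-17', 'MM')            AS month_start,       -- 2015-03-01
  quarter('2015-04-08')                AS quarter_of_year,   -- 2
  date_format('2015-04-08', 'y')       AS year_text;         -- 2015
```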

Conditional Functions

Return Type	Name (Signature)	Description
T	if(boolean testCondition, T valueTrue, T valueFalseOrNull)	Returns valueTrue when testCondition is true, returns valueFalseOrNull otherwise.
boolean	isnull( a )	Returns true if a is NULL and false otherwise.
boolean	isnotnull ( a )	Returns true if a is not NULL and false otherwise.
T	nvl(T value, T default_value)	Returns default_value if value is NULL, otherwise returns value (as of Hive 0.11).
T	COALESCE(T v1, T v2, …)	Returns the first v that is not NULL, or NULL if all v's are NULL.
T	CASE a WHEN b THEN c [WHEN d THEN e] [ELSE f] END	When a = b, returns c; when a = d, returns e; else returns f.
T	CASE WHEN a THEN b [WHEN c THEN d] [ELSE e] END	When a = true, returns b; when c = true, returns d; else returns e.
T	nullif( a, b )	Returns NULL if a=b; otherwise returns a (as of Hive 2.3.0). Shorthand for: CASE WHEN a = b THEN NULL ELSE a END.
void	assert_true(boolean condition)	Throws an exception if 'condition' is not true, otherwise returns null (as of Hive 0.8.0). For example, select assert_true(2<1).
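
A brief sketch of the NULL-handling helpers above. It assumes Hive 2.3.0 or later because of nullif; the aliases and literal values are made up for illustration.

```sql
SELECT
  if(1 < 2, 'yes', 'no')                          AS if_result,        -- 'yes'
  nvl(CAST(NULL AS STRING), 'fallback')           AS nvl_result,       -- 'fallback'
  coalesce(CAST(NULL AS STRING), 'b', 'c')        AS coalesce_result,  -- 'b' (first non-NULL)
  nullif(1, 1)                                    AS nullif_equal,     -- NULL
  nullif(1, 2)                                    AS nullif_different, -- 1
  CASE 2 WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END AS simple_case,    -- 'two'
  CASE WHEN 1 > 2 THEN 'gt' WHEN 1 < 2 THEN 'lt' ELSE 'eq' END AS searched_case; -- 'lt'
```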

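String Functions

Return Type	Name (Signature)	Description
int	ascii(string str)	Returns the numeric value of the first character of str.
string	base64(binary bin)	Converts the argument from binary to a base 64 string (as of Hive 0.12.0).
int	character_length(string str)	Returns the number of UTF-8 characters contained in str (as of Hive 2.2.0). The function char_length is shorthand for this function.
string	chr(bigint|double A)	Returns the ASCII character having the binary equivalent to A (as of Hive 1.3.0 and 2.1.0). If A is larger than 256 the result is equivalent to chr(A % 256). Example: select chr(88); returns "X".
string	concat(string|binary A, string|binary B…)	Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
array<struct<string,double>>	context_ngrams(array<array<string>>, array<string>, int K, int pf)	Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See StatisticsAndDataMining for more information.
string	concat_ws(string SEP, string A, string B…)	Like concat() above, but with a custom separator SEP.
string	concat_ws(string SEP, array<string>)	Like concat_ws() above, but taking an array of strings. (As of Hive 0.9.0.)
string	decode(binary bin, string charset)	Decodes the first argument into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null. (As of Hive 0.12.0.)
string	elt(N int,str1 string,str2 string,str3 string,…)	Returns the string at index number N. For example, elt(2,'hello','world') returns 'world'. Returns NULL if N is less than 1 or greater than the number of arguments. (See https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_elt)
binary	encode(string src, string charset)	Encodes the first argument into a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null. (As of Hive 0.12.0.)
int	field(val T,val1 T,val2 T,val3 T,…)	Returns the index of val in the val1,val2,val3,… list, or 0 if not found. For example, field('world','say','hello','world') returns 3. All primitive types are supported; arguments are compared using str.equals(x). If val is NULL, the return value is 0. (See https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_field)
int	find_in_set(string str, string strList)	Returns the first occurrence of str in strList, where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3.
string	format_number(number x, int d)	Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. (As of Hive 0.10.0; bug with float types fixed in Hive 0.14.0, decimal type support added in Hive 0.14.0.)
string	get_json_object(string json_string, string path)	Extracts a JSON object from a JSON string based on the specified JSON path, and returns the JSON string of the extracted object. Returns null if the input JSON string is invalid. NOTE: the JSON path can only contain the characters [0-9a-z_], i.e., no upper-case or special characters. Also, the keys cannot start with numbers. This is due to restrictions on Hive column names.
boolean	in_file(string str, string filename)	Returns true if the string str appears as an entire line in filename.
int	instr(string str, string substr)	Returns the position of the first occurrence of substr in str. Returns null if either of the arguments is null and returns 0 if substr could not be found in str. Be aware that this is not zero based: the first character in str has index 1.
int	length(string A)	Returns the length of the string.
int	locate(string substr, string str[, int pos])	Returns the position of the first occurrence of substr in str after position pos.
string	lower(string A), lcase(string A)	Returns the string resulting from converting all characters of A to lower case. For example, lower('fOoBaR') results in 'foobar'.
string	lpad(string str, int len, string pad)	Returns str, left-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In case of an empty pad string, the return value is null.
string	ltrim(string A)	Returns the string resulting from trimming spaces from the beginning (left hand side) of A. For example, ltrim(' foobar ') results in 'foobar '.
array<struct<string,double>>	ngrams(array<array<string>>, int N, int K, int pf)	Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See StatisticsAndDataMining for more information.
int	octet_length(string str)	Returns the number of octets required to hold the string str in UTF-8 encoding (as of Hive 2.2.0). Note that octet_length(str) can be larger than character_length(str).
string	parse_url(string urlString, string partToExtract [, string keyToExtract])	Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. The value of a particular key in QUERY can also be extracted by providing the key as the third argument, for example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.
string	printf(String format, Obj… args)	Returns the input formatted according to printf-style format strings (as of Hive 0.9.0).
string	quote(String text)	Returns the quoted string, including an escape character for any single quotes (as of Hive 4.0.0). Examples (input → output): NULL → NULL; DONT → 'DONT'; DON'T → 'DON\'T'.
string	regexp_extract(string subject, string pattern, int index)	Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.
string	regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)	Returns the string resulting from replacing all substrings in INITIAL_STRING that match the Java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.
string	repeat(string str, int n)	Repeats str n times.
string	replace(string A, string OLD, string NEW)	Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace("ababab", "abab", "Z"); returns "Zab".
string	reverse(string A)	Returns the reversed string.
string	rpad(string str, int len, string pad)	Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In case of an empty pad string, the return value is null.
string	rtrim(string A)	Returns the string resulting from trimming spaces from the end (right hand side) of A. For example, rtrim(' foobar ') results in ' foobar'.
array<array<string>>	sentences(string str, string lang, string locale)	Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The 'lang' and 'locale' arguments are optional. For example, sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", "are", "you") ).
string	space(int n)	Returns a string of n spaces.
array	split(string str, string pat)	Splits str around pat (pat is a regular expression).
map<string,string>	str_to_map(text[, delimiter1, delimiter2])	Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair into key and value. Default delimiters are ',' for delimiter1 and ':' for delimiter2.
string	substr(string|binary A, int start), substring(string|binary A, int start)	Returns the substring or slice of the byte array of A starting from the start position till the end of A. For example, substr('foobar', 4) results in 'bar' (see http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr).
string	substr(string|binary A, int start, int len), substring(string|binary A, int start, int len)	Returns the substring or slice of the byte array of A starting from the start position with length len. For example, substr('foobar', 4, 1) results in 'b' (see http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substr).
string	substring_index(string A, string delim, int count)	Returns the substring from string A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'.
string	translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)	Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types.) Char/varchar support added as of Hive 0.14.0.
string	trim(string A)	Returns the string resulting from trimming spaces from both ends of A. For example, trim(' foobar ') results in 'foobar'.
binary	unbase64(string str)	Converts the argument from a base 64 string to binary. (As of Hive 0.12.0.)
string	upper(string A), ucase(string A)	Returns the string resulting from converting all characters of A to upper case. For example, upper('fOoBaR') results in 'FOOBAR'.
string	initcap(string A)	Returns the string with the first letter of each word in uppercase and all other letters in lowercase. Words are delimited by whitespace. (As of Hive 1.1.0.)
int	levenshtein(string A, string B)	Returns the Levenshtein distance between two strings (as of Hive 1.2.0). For example, levenshtein('kitten', 'sitting') results in 3.
string	soundex(string A)	Returns the soundex code of the string (as of Hive 1.2.0). For example, soundex('Miller') results in M460.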
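
Several of the string functions above are often combined in practice. The query below is a sketch that mostly reuses the examples quoted in the table (so the comments restate documented results); it assumes Hive 1.3.0 or later for substring_index, and the aliases are illustrative.

```sql
SELECT
  concat('foo', 'bar')                                AS joined,          -- 'foobar'
  concat_ws('-', 'a', 'b', 'c')                       AS joined_with_sep, -- 'a-b-c'
  substr('foobar', 4)                                 AS tail,            -- 'bar'
  substr('foobar', 4, 1)                              AS one_char,        -- 'b'
  instr('foobar', 'bar')                              AS one_based_pos,   -- 4
  find_in_set('ab', 'abc,b,ab,c,def')                 AS list_position,   -- 3
  regexp_replace('foobar', 'oo|ar', '')               AS stripped,        -- 'fb'
  regexp_extract('foothebar', 'foo(.*?)(bar)', 2)     AS second_group,    -- 'bar'
  substring_index('www.apache.org', '.', 2)           AS before_2nd_dot,  -- 'www.apache'
  levenshtein('kitten', 'sitting')                    AS edit_distance;   -- 3
```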

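Data Masking Functions

Return Type	Name (Signature)	Description
string	mask(string str[, string upper[, string lower[, string number]]])	Returns a masked version of str (as of Hive 2.1.0). By default, upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask("abcd-EFGH-8765-4321") results in xxxx-XXXX-nnnn-nnnn. You can override the characters used in the mask by supplying additional arguments: the second argument controls the mask character for upper case letters, the third argument for lower case letters and the fourth argument for numbers. For example, mask("abcd-EFGH-8765-4321", "U", "l", "#") results in llll-UUUU-####-####.
string	mask_first_n(string str[, int n])	Returns a masked version of str with the first n values masked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_first_n("1234-5678-8765-4321", 4) results in nnnn-5678-8765-4321.
string	mask_last_n(string str[, int n])	Returns a masked version of str with the last n values masked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_last_n("1234-5678-8765-4321", 4) results in 1234-5678-8765-nnnn.
string	mask_show_first_n(string str[, int n])	Returns a masked version of str, showing the first n characters unmasked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_show_first_n("1234-5678-8765-4321", 4) results in 1234-nnnn-nnnn-nnnn.
string	mask_show_last_n(string str[, int n])	Returns a masked version of str, showing the last n characters unmasked (as of Hive 2.1.0). Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_show_last_n("1234-5678-8765-4321", 4) results in nnnn-nnnn-nnnn-4321.
string	mask_hash(string|char|varchar str)	Returns a hashed value based on str (as of Hive 2.1.0). The hash is consistent and can be used to join masked values together across tables. This function returns null for non-string types.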

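Misc. Functions

Return Type	Name (Signature)	Description
varies	java_method(class, method[, arg1[, arg2..]])	Synonym for reflect. (As of Hive 0.9.0.)
varies	reflect(class, method[, arg1[, arg2..]])	Calls a Java method by matching the argument signature, using reflection. (As of Hive 0.7.0.) See Reflect (Generic) UDF for examples.
int	hash(a1[, a2…])	Returns a hash value of the arguments. (As of Hive 0.4.)
string	current_user()	Returns the current user name from the configured authenticator manager (as of Hive 1.2.0). This could be the same as the user provided when connecting, but with some authentication managers (for example HadoopDefaultAuthenticator) it could be different.
string	logged_in_user()	Returns the current user name from the session state (as of Hive 2.2.0). This is the user name provided when connecting to Hive.
string	current_database()	Returns the current database name (as of Hive 0.13.0).
string	md5(string/binary)	Calculates an MD5 128-bit checksum for the string or binary (as of Hive 1.3.0). The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Example: md5('ABC') = '902fbdd2b1df0c4f70b4a5d23525e932'.
string	sha1(string/binary), sha(string/binary)	Calculates the SHA-1 digest for the string or binary and returns the value as a hex string (as of Hive 1.3.0). Example: sha1('ABC') = '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'.
bigint	crc32(string/binary)	Computes a cyclic redundancy check value for the string or binary argument and returns a bigint value (as of Hive 1.3.0). Example: crc32('ABC') = 2743272264.
string	sha2(string/binary, int)	Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512) (as of Hive 1.3.0). The first argument is the string or binary to be hashed. The second argument indicates the desired bit length of the result, which must have a value of 224, 256, 384, 512, or 0 (which is equivalent to 256). SHA-224 is supported starting from Java 8. If either argument is NULL or the hash length is not one of the permitted values, the return value is NULL. Example: sha2('ABC', 256) = 'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78'.
binary	aes_encrypt(input string/binary, key string/binary)	Encrypts input using AES (as of Hive 1.3.0). Key lengths of 128, 192 or 256 bits can be used. 192 and 256 bit keys can be used if the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. If either argument is NULL or the key length is not one of the permitted values, the return value is NULL. Example: base64(aes_encrypt('ABC', '1234567890123456')) = 'y6Ss+zCYObpCbgfWfyNWTw=='.
binary	aes_decrypt(input binary, key string/binary)	Decrypts input using AES (as of Hive 1.3.0). Key lengths of 128, 192 or 256 bits can be used. 192 and 256 bit keys can be used if the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. If either argument is NULL or the key length is not one of the permitted values, the return value is NULL. Example: aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='), '1234567890123456') = 'ABC'.
string	version()	Returns the Hive version (as of Hive 2.1.0). The string contains 2 fields, the first being a build number and the second being a build hash. Example: "select version();" might return "2.1.0.2.5.0.0-1245 r027527b9c5ce1a3d7d0b6d2e6de2378fb0c39232". Actual results will depend on your build.
bigint	surrogate_key([write_id_bits, task_id_bits])	Automatically generates numerical IDs for rows as data is entered into a table. Can only be used as a default value for ACID or insert-only tables.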

Built-in Aggregate Functions (UDAF)

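Return Type	Name (Signature)	Description
BIGINT	count(*), count(expr), count(DISTINCT expr[, expr…])	count(*) - Returns the total number of retrieved rows, including rows containing NULL values. count(expr) - Returns the number of rows for which the supplied expression is non-NULL. count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. Execution of this can be optimized with hive.optimize.distinct.rewrite.
DOUBLE	sum(col), sum(DISTINCT col)	Returns the sum of the elements in the group or the sum of the distinct values of the column in the group.
DOUBLE	avg(col), avg(DISTINCT col)	Returns the average of the elements in the group or the average of the distinct values of the column in the group.
DOUBLE	min(col)	Returns the minimum of the column in the group.
DOUBLE	max(col)	Returns the maximum value of the column in the group.
DOUBLE	variance(col), var_pop(col)	Returns the variance of a numeric column in the group.
DOUBLE	var_samp(col)	Returns the unbiased sample variance of a numeric column in the group.
DOUBLE	stddev_pop(col)	Returns the standard deviation of a numeric column in the group.
DOUBLE	stddev_samp(col)	Returns the unbiased sample standard deviation of a numeric column in the group.
DOUBLE	covar_pop(col1, col2)	Returns the population covariance of a pair of numeric columns in the group.
DOUBLE	covar_samp(col1, col2)	Returns the sample covariance of a pair of numeric columns in the group.
DOUBLE	corr(col1, col2)	Returns the Pearson coefficient of correlation of a pair of numeric columns in the group.
DOUBLE	percentile(BIGINT col, p)	Returns the exact pth percentile of a column in the group (does not work with floating point types). p must be between 0 and 1. NOTE: a true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.
array<double>	percentile(BIGINT col, array(p1 [, p2]…))	Returns the exact percentiles p1, p2, … of a column in the group (does not work with floating point types). Each p must be between 0 and 1. NOTE: a true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.
DOUBLE	percentile_approx(DOUBLE col, p [, B])	Returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value.
array<double>	percentile_approx(DOUBLE col, array(p1 [, p2]…) [, B])	Same as above, but accepts and returns an array of percentile values instead of a single one.
double	regr_avgx(independent, dependent)	Equivalent to avg(dependent). As of Hive 2.2.0.
double	regr_avgy(independent, dependent)	Equivalent to avg(independent). As of Hive 2.2.0.
double	regr_count(independent, dependent)	Returns the number of non-null pairs used to fit the linear regression line. As of Hive 2.2.0.
double	regr_intercept(independent, dependent)	Returns the y-intercept of the linear regression line, i.e. the value of b in the equation dependent = a * independent + b. As of Hive 2.2.0.
double	regr_r2(independent, dependent)	Returns the coefficient of determination for the regression. As of Hive 2.2.0.
double	regr_slope(independent, dependent)	Returns the slope of the linear regression line, i.e. the value of a in the equation dependent = a * independent + b. As of Hive 2.2.0.
double	regr_sxx(independent, dependent)	Equivalent to regr_count(independent, dependent) * var_pop(dependent). As of Hive 2.2.0.
double	regr_sxy(independent, dependent)	Equivalent to regr_count(independent, dependent) * covar_pop(independent, dependent). As of Hive 2.2.0.
double	regr_syy(independent, dependent)	Equivalent to regr_count(independent, dependent) * var_pop(independent). As of Hive 2.2.0.
array<struct {'x','y'}>	histogram_numeric(col, b)	Computes a histogram of a numeric column in the group using b non-uniformly spaced bins. The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and heights.
array	collect_set(col)	Returns a set of objects with duplicate elements eliminated.
array	collect_list(col)	Returns a list of objects with duplicates. (As of Hive 0.13.0.)
INTEGER	ntile(INTEGER x)	Divides an ordered partition into x groups called buckets and assigns a bucket number to each row in the partition. This allows easy calculation of tertiles, quartiles, deciles, percentiles and other common summary statistics. (As of Hive 0.11.0.)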
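
The difference between count(*), count(expr) and the collecting aggregates shows up as soon as a group contains NULLs or duplicates. The sketch below builds a tiny inline data set; the names dept, salary, bonus and t are hypothetical and exist only for this example. It assumes a Hive version that supports UNION ALL in a subquery and SELECT without FROM (0.13 or later).

```sql
SELECT
  dept,
  count(*)              AS all_rows,         -- eng: 2, ops: 1
  count(bonus)          AS non_null_bonus,   -- eng: 1, ops: 1 (NULL bonus is skipped)
  sum(salary)           AS total_salary,     -- eng: 20, ops: 30
  avg(salary)           AS average_salary,   -- eng: 10.0, ops: 30.0
  collect_set(salary)   AS distinct_salary,  -- eng: [10], ops: [30]
  collect_list(salary)  AS every_salary      -- eng: [10,10], ops: [30]
FROM (
  SELECT 'eng' AS dept, 10 AS salary, 1 AS bonus
  UNION ALL
  SELECT 'eng' AS dept, 10 AS salary, CAST(NULL AS INT) AS bonus
  UNION ALL
  SELECT 'ops' AS dept, 30 AS salary, 3 AS bonus
) t
GROUP BY dept;
```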

Built-in Table-Generating Functions (UDTF)

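Row-set columns types	Name (Signature)	Description
T	explode(ARRAY<T> a)	Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.
Tkey,Tvalue	explode(MAP<Tkey,Tvalue> m)	Explodes a map to multiple rows. Returns a row-set with two columns (key, value), one row for each key-value pair from the input map. (As of Hive 0.8.0.)
int,T	posexplode(ARRAY<T> a)	Explodes an array to multiple rows with an additional positional column of int type (position of items in the original array, starting with 0).
T1,…,Tn	inline(ARRAY<STRUCT<f1:T1,…,fn:Tn>> a)	Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.)
T1,…,Tn/r	stack(int r,T1 V1,…,Tn/r Vn)	Breaks up n values V1,…,Vn into r rows. Each row will have n/r columns. r must be constant.
string1,…,stringn	json_tuple(string jsonStr,string k1,…,string kn)	Takes a JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call.
string1,…,stringn	parse_url_tuple(string urlStr,string p1,…,string pn)	Takes a URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY>.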
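
json_tuple and parse_url_tuple are normally used through LATERAL VIEW, in the same way as the explode samples in this article. The sketch below is illustrative only: the JSON document, the column names js and url, and the aliases j and p are made up, while the URL is the one used in the parse_url description.

```sql
-- Pull two keys out of a JSON string with a single json_tuple call.
SELECT j.name, j.age               -- alice, 30
FROM (SELECT '{"name":"alice","age":"30"}' AS js) t
LATERAL VIEW json_tuple(t.js, 'name', 'age') j AS name, age;

-- Extract the host and one query parameter from a URL in one pass.
SELECT p.host, p.k1                -- facebook.com, v1
FROM (SELECT 'http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1' AS url) t
LATERAL VIEW parse_url_tuple(t.url, 'HOST', 'QUERY:k1') p AS host, k1;
```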

Usage Examples

explode (array)

select explode(array('A','B','C'));
select explode(array('A','B','C')) as col;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;

explode (map)

select explode(map('A',10,'B',20,'C',30));
select explode(map('A',10,'B',20,'C',30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;

A	10
B	20
C	30

posexplode (array)

select posexplode(array('A','B','C'));
select posexplode(array('A','B','C')) as (pos,val);
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;

0	A
1	B
2	C

inline (array of structs)

select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;

A	10	2015-01-01
B	20	2016-02-02

stack (values)

select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');
select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;