Hive Built-in Functions
Hive Built-in Functions are pre-defined operations that we can use directly in our Hive queries to:
- Perform mathematical calculations
- Manipulate strings (text)
- Work with dates and times
- Check conditions using logical operators
Mathematical Functions -
Hive provides several mathematical functions to help you perform calculations directly in your query.
Function Name (Signature) | Description |
---|---|
round(double a) | returns the rounded BIGINT value of the double |
floor(double a) | returns the maximum BIGINT value that is equal or less than the double |
ceil(double a) | returns the minimum BIGINT value that is equal or greater than the double |
rand(), rand(int seed) | returns a random number that changes from row to row |
Aggregate built-in functions -
The below are the list of aggregate functions:
Aggregation Function Name (Signature) | Description |
---|---|
count(*), count(expr) | Returns the total number of retrieved rows, including rows containing NULL values; |
sum(col), sum(DISTINCT col) | returns the sum of the elements in the group or the sum of the distinct values of the column |
avg(col), avg(DISTINCT col) | returns the average of the elements in the group or the average of the distinct values of the column |
min(col) | returns the minimum value of the column |
max(col) | returns the maximum value of the column |
String Functions -
In Hive, String Functions help you work with text. You can change case, extract parts of text, or find out how long a string is.
Function Name (Signature) | Description |
---|---|
concat(string A, string B,...) | returns the string resulting from concatenating B after A |
substr(string A, int start) | returns the substring of A starting from start position till the end of string A |
substr(string A, int start, int length) | returns the substring of A starting from start position with the given length |
upper(string A) | returns the string resulting from converting all characters of A to upper case |
ucase(string A) | Same as upper |
lower(string A) | returns the string resulting from converting all characters of B to lower case |
lcase(string A) | Same as lower |
trim(string A) | returns the string resulting from trimming spaces from both ends of A |
ltrim(string A) | returns the string resulting from trimming spaces from the beginning(left hand side) of A |
rtrim(string A) | returns the string resulting from trimming spaces from the end(right hand side) of A |
regexp_replace(string A, string B, string C) | returns the string resulting from replacing all substrings in B that match the Java regular expression syntax with C |
Date Functions -
Date and time handling is very common in data processing. Hive provides easy-to-use functions to work with dates, extract parts of dates, or calculate differences.
Function Name (Signature) | Description |
---|---|
to_date(string timestamp) | Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01" |
year(string date) | Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 |
month(string date) | Return the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11 |
day(string date) | Return the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 |
get_json_object(string json_string, string path) | "Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid." |
Complex Functions -
Function Name (Signature) | Description |
---|---|
size(Map | returns the number of elements in the map type |
size(Array | returns the number of elements in the array type |
cast(<expr> as <type>) | "converts the results of the expression expr to <type>. A null is returned if the conversion does not succeed." |
from_unixtime(int unixtime) | convert the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00" |
Examples -
Scenario1: Round a number
SELECT ROUND(15.678);
Result: 16
Scenario2: Convert text to uppercase
SELECT UPPER('hadoop');
Result: HADOOP
Scenario3: Get first 4 letters of a string
SELECT SUBSTR('DataEngineering', 1, 4);
Result: Data
Scenario4: Get today’s date
SELECT CURRENT_DATE;
Result: 2025-07-04 (for example)
Scenario5: Find the difference between two dates
SELECT DATEDIFF('2025-12-31', '2025-01-01');
Result: 364