Developing Hive custom UDF, UDAF, and UDTF functions: temporary registration and permanent registration in the Metastore

Hive · admin · 2019-01-08

User-defined functions come in three shapes:
UDF: one row in, one value out
UDAF: many rows in, one value out
UDTF: one row in, many rows out
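For intuition, Hive's built-in functions already cover all three shapes (a sketch; `src` stands for any existing table):

```sql
-- UDF: one row in, one value out
SELECT upper('abc');

-- UDAF: many rows in, one value out
SELECT count(*) FROM src;

-- UDTF: one row in, many rows out
SELECT explode(array(1, 2, 3));
```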

1. A minimal custom UDF

The pom file, using the CDH 5.12.0 artifacts:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.post.bigdata</groupId>
  <artifactId>spark</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <scala.version>2.11.8</scala.version>
    <spark.version>2.3.0</spark.version>
    <hadoop.version>2.6.0-cdh5.12.0</hadoop.version>
    <scalikejdbc.version>2.5.2</scalikejdbc.version>
    <mysql.version>5.1.38</mysql.version>
    <hive-exec.version>1.1.0-cdh5.12.0</hive-exec.version>
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>

    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>

  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <!--Scala dependency-->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <!--Spark core dependency-->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!--hadoop dependency-->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive-exec.version}</version>
    </dependency>
    <!--Test dependency-->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
    <!--scalikejdbc dependency-->
    <dependency>
      <groupId>org.scalikejdbc</groupId>
      <artifactId>scalikejdbc_2.11</artifactId>
      <version>${scalikejdbc.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scalikejdbc</groupId>
      <artifactId>scalikejdbc-config_2.11</artifactId>
      <version>${scalikejdbc.version}</version>
    </dependency>
    <!--mysql dependency-->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>${mysql.version}</version>
    </dependency>

    <!--日志-->
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.2.3</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-repl_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

  </dependencies>

  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>

Build the minimal demo and package it.
Note: this project was originally a Scala project, with <sourceDirectory>src/main/scala</sourceDirectory> as the source root. The UDF was written under a new Java source root, and no test would pass until the source root was changed to <sourceDirectory>src/main/java</sourceDirectory>.

/**
 * A UDF's methods must be named evaluate; overloads are resolved by argument
 * types. Examples from the Hive UDF javadoc:
 * <li>{@code public int evaluate();}</li>
 * <li>{@code public int evaluate(int a);}</li>
 * <li>{@code public double evaluate(int a, double b);}</li>
 * <li>{@code public String evaluate(String a, int b, Text c);}</li>
 * <li>{@code public Text evaluate(String a);}</li>
 * <li>{@code public String evaluate(List<Integer> a);} (note that Hive arrays are represented as Java Lists)</li>
 */
package com.post.bigdata;

import org.apache.hadoop.hive.ql.exec.UDF;

public class BigdataUDF extends UDF {

    public String evaluate(String name){
        return "Bigdata:" + name;
    }

    public static void main(String[] args) {
        BigdataUDF udf = new BigdataUDF();
        String e = udf.evaluate("蜗牛");
        System.out.println(e);
    }
}
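The old-style `UDF` base class locates `evaluate` by reflection, which is why the method name is fixed while the signature is free. A rough plain-Java sketch of that dispatch (no hive-exec dependency; `EvaluateDispatch` and `DemoUdf` are illustrative names, and Hive's real resolver handles type coercion far more carefully):

```java
import java.lang.reflect.Method;

public class EvaluateDispatch {

    // A stand-in for a UDF class with overloaded evaluate() methods.
    public static class DemoUdf {
        public String evaluate(String name) { return "Bigdata:" + name; }
        public String evaluate(String name, int times) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < times; i++) sb.append(name);
            return sb.toString();
        }
    }

    // Very simplified resolver: the method name is hard-wired to "evaluate",
    // and the overload is picked by the runtime classes of the arguments.
    public static Object call(Object udf, Object... args) {
        try {
            Class<?>[] types = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                types[i] = (args[i] instanceof Integer) ? int.class : args[i].getClass();
            }
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        DemoUdf udf = new DemoUdf();
        System.out.println(call(udf, "snail")); // prints Bigdata:snail
        System.out.println(call(udf, "ab", 2)); // prints abab
    }
}
```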

1.1 Registering with Hive

1) Temporary registration: the function is only visible in the current session

add jar /Users/wn/ide/spark/target/spark-1.3-SNAPSHOT.jar;
CREATE temporary function bigdata_udf as 'com.post.bigdata.BigdataUDF';
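Once registered, the temporary function can be used and dropped like any built-in (a sketch; `test2` is the sample table used later in this post):

```sql
SELECT bigdata_udf(name) FROM test2;

-- remove it from the current session
DROP TEMPORARY FUNCTION IF EXISTS bigdata_udf;
```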

2) Permanent registration in the metastore

Since Hive 0.13, functions can be registered permanently in the metastore.

Upload the jar to HDFS:
hadoop000:package wn$ hdfs dfs -put ~/ide/spark/target/spark-1.3-SNAPSHOT.jar /package/
19/03/06 13:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop000:package wn$ hdfs dfs -ls /package
19/03/06 13:36:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 wn supergroup       3308 2019-03-06 13:35 /package/spark-1.3-SNAPSHOT.jar

Register it in Hive:

hive (bigdata)> create function bigdata_udf as "com.post.bigdata.BigdataUDF" using jar "hdfs:///package/spark-1.3-SNAPSHOT.jar";
converting to local hdfs:///package/spark-1.3-SNAPSHOT.jar
Added [/private/var/folders/62/lwnjjc294qv12csy0sm3g93m0000gp/T/b8aa0797-b4ec-43a7-8d1c-eee83c5da3a0_resources/spark-1.3-SNAPSHOT.jar] to class path
Added resources: [hdfs:///package/spark-1.3-SNAPSHOT.jar]
OK
Time taken: 0.442 seconds
Use the function:
hive (bigdata)> select bigdata_udf(name) as new_name from test2;
OK
new_name
Bigdata:zhangsan
Bigdata:zhangsan
Bigdata:zhangsan
Time taken: 0.146 seconds, Fetched: 3 row(s)

Check it in the metastore (MySQL):

mysql> select * from funcs;
+---------+-----------------------------+-------------+-------+-------------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME                  | CREATE_TIME | DB_ID | FUNC_NAME   | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+-----------------------------+-------------+-------+-------------+-----------+------------+------------+
|       1 | com.post.bigdata.BigdataUDF |  1551850837 |     6 | bigdata_udf |         1 | NULL       | USER       |
+---------+-----------------------------+-------------+-------+-------------+-----------+------------+------------+
1 row in set (0.00 sec)
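A permanent function registered this way can later be inspected and removed with standard DDL (a sketch; dropping it also deletes the corresponding FUNCS row from the metastore):

```sql
DESCRIBE FUNCTION bigdata_udf;

DROP FUNCTION IF EXISTS bigdata_udf;
```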

codeobj · Unless otherwise noted, this is original content licensed under BY-NC-SA.
Please credit the original link when reposting: Developing Hive custom UDF, UDAF, and UDTF functions: temporary registration and permanent registration in the Metastore