Tuesday 18 June 2013

Pig UDF basic tutorial

In this blog post we will discuss how to write a UDF for a Pig script. UDF stands for User Defined Function; it works like a normal function in any program.
You can write a UDF in Java and in some other languages. There are a few steps to keep in mind when writing your UDF:

The steps are as follows:

1) Download the following JAR files:

    commons-lang3-3.1.jar
    commons-logging-1.1.3.jar
    Pig.jar

You can find all of the above files by searching on Google.

2) Create a new class for your UDF in Eclipse or any other IDE you like. In this tutorial I am using Eclipse. After creating the class, put the following code in it:

Code:

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

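public class UPPER extends EvalFunc<String> {
    // Pig calls exec() once for every input record.
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            // The first field of the tuple holds the string to convert.
            String str = (String) input.get(0);
            // A null field gives a null result instead of failing the job.
            if (str == null)
                return null;
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}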

In the above code, EvalFunc<String> is the type of UDF we want to create: an eval function whose return type is String. There are four kinds of UDF we can create; the video linked at the bottom of this tutorial gives more detail. In a UDF every input comes in the form of a tuple, and the function returns one value of the declared type (here a String) for each input tuple it gets.
exec is the function that Pig calls by default for each input record.
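If you want to try the logic of exec without having Pig on the classpath, a minimal plain-Java sketch could look like the one below. It is only a stand-in under stated assumptions: a List<Object> plays the role of Pig's Tuple, and the names UpperSketch and upper are made up for this illustration and are not part of Pig. It treats a null field as null output and uses Locale.ROOT so the result does not depend on the machine's default locale.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the UDF above: a List<Object> plays the role of Pig's Tuple.
class UpperSketch {

    // Mirrors exec(): first field in, upper-cased string out, null for missing input.
    static String upper(List<Object> input) throws IOException {
        if (input == null || input.isEmpty()) {
            return null;
        }
        try {
            String str = (String) input.get(0);
            // Locale.ROOT is a choice made for this sketch only.
            return (str == null) ? null : str.toUpperCase(Locale.ROOT);
        } catch (ClassCastException e) {
            // The first field was not a string.
            throw new IOException("Caught exception processing input row", e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(upper(Arrays.<Object>asList("simmant")));
        System.out.println(upper(Collections.<Object>emptyList()));
    }
}
```

Running main prints the upper-cased name for the first call and null for the empty input, which is the same contract the UDF follows: one input tuple in, one value (or null) out.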

3) Create a JAR of your class and save it to any path you want. In my case I am using Ubuntu, so I save it in my home folder. In Eclipse you can create a JAR by right-clicking the project -> Export -> Java -> JAR file.

4) Now it is time to add the JAR to Pig. My previous blog post describes how to start Pig from the Ubuntu terminal; you can follow it for the Pig basics.
After entering the Pig grunt shell, we write the following command to add our JAR:

grunt> register udf1.jar;

Here register is the keyword used to add a JAR to a Pig script. In my case the JAR is in my home folder, so I do not give its path, but you can edit the command for your own path, like:

register your_path_of_jar;

5) Next we have to load the file that holds the data we want to work with. We can load a file in a Pig script with the following statement:

  grunt> A = LOAD 'stu1.txt' as (name:chararray);

In the above code, stu1.txt is my file on Hadoop HDFS. Pig uses HDFS by default; you can replace the file name with the path of your own file.

6) Next we have to call our UDF from the Pig script. We can do that with the following code:

grunt> B = FOREACH A GENERATE UPPER(name);

In the above code, UPPER is the name of the class in our JAR; the UDF has the same name as its class. We pass name as the argument.

7) Then, to see the data, we use a simple DUMP:

grunt> DUMP B;

If the data in our stu1.txt is:

simmant
mohan
rohan

then the output is:

(SIMMANT)
(MOHAN)
(ROHAN)
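To make the flow from input file to output easier to follow, here is a rough plain-Java imitation of steps 5 to 7. It is not how Pig works internally; it only reproduces the visible result for this small example. The names PipelineSketch and run are hypothetical, and Locale.ROOT is an assumption made so the output does not depend on the machine's locale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java imitation of LOAD -> FOREACH ... GENERATE UPPER(name) -> DUMP.
class PipelineSketch {

    // Takes the lines of the input file and returns the rows that would be shown, one per record.
    static List<String> run(List<String> lines) {
        List<String> dumped = new ArrayList<>();
        for (String name : lines) {
            // Each line is one record with a single field called name.
            String upper = name.toUpperCase(Locale.ROOT);
            // Each output tuple is shown in parentheses, as in the DUMP output above.
            dumped.add("(" + upper + ")");
        }
        return dumped;
    }

    public static void main(String[] args) {
        for (String row : run(Arrays.asList("simmant", "mohan", "rohan"))) {
            System.out.println(row);
        }
    }
}
```

With the three names from stu1.txt as input, main prints the same three rows shown above.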

References for today's blog:

For UDF details
1) https://www.youtube.com/watch?v=OoFNQDpcWR0

For Pig details
2) http://wiki.apache.org/pig/PigBuiltins

For the UDF manual
3) http://wiki.apache.org/pig/UDFManual

4 comments:

  1. I have some knowledge of Hadoop. Recently I visited www.hadooponlinetutor.com. Those guys are offering the videos for only $20. Can you please suggest whether I should go for that or not?

    Replies
    1. Hi Harika
      There are lots of online tutors available for Big Data, and they are well qualified as well. So, to answer your question, the best way to find out whether a tutor is good or bad is to ask them for a demo class.
      In the demo session you can easily judge both the trainer and the quality of the course. So I recommend you ask for a demo session before starting the full course.

    2. Hello Harika, don't purchase online videos; they are a waste. I highly recommend practice, practice, practice. Practice makes perfect; watching videos does not. Those videos are also a waste, but do follow a few blogs like this one to learn Hadoop.
      I highly recommend one thing: install Hortonworks and practice on your own system. Hortonworks already provides many materials for practice.
      If you want material, just mail me and I'll forward a few books; that's enough. All the best. Mail me at venu@bigdataanalyst with "Requesting for hadoop material".

  2. Hi,
    How can I help you with that?
