Home Forums Hive / HCatalog Insert Into Table

This topic contains 17 replies, has 4 voices, and was last updated by  Yi Zhang 1 year, 4 months ago.

  • Creator
    Topic
  • #18309

    Does the any version of Hortonworks Hive support “Insert into” operation?

    I am using the Sandbox to do some research for our needs cant get to append data into an existing table. It overwrites the data even when I don’t use the keyword “overwrite”.

    Any help is much appreciated.

Viewing 17 replies - 1 through 17 (of 17 total)

You must be logged in to reply to this topic.

  • Author
    Replies
  • #18924

    Yi Zhang
    Moderator

    Hi Swapna,

    It is strange that system:java.class.path returns empty, at least when hive is started, it picks up hadoop class path and other paths defined by hive’s configuration files too. Are other system: environment variables set when you run hive>set; That may give a clue.

    How was hive started?

    Thanks,
    Yi.

    Collapse
    #18908

    Hello YI,

    I just ran the insert into now and it works. Do you know what might have helped it? Because I have been testing this on a local copy of the sandbox.. and we need this to be working on the sandbox on our server for our prototyping.

    My guess is that running “hive> set system:java.class.path” command set the java class path which was probably not set properly before? Can you please shed some light so that we can fix our other environment?

    Thanks
    Swapna

    Collapse
    #18907

    select * from test2;

    id name role
    0 2 Name2 Role2

    Collapse
    #18906

    Describe extended test2:
    col_name data_type comment
    col_name data_type comment
    0 id int
    1 name string
    2 role string
    3
    4 Detailed Table Information Table(tableName:test2, dbName:default, owner:sandbox, createTime:1363587451, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:role, type:string, comment:null)], location:hdfs://sandbox:8020/apps/hive/warehouse/test2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{numPartitions=0, numFiles=1, transient_lastDdlTime=1363589787, totalSize=14, numRows=0, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

    Collapse
    #18905

    select * from Test1:

    id name role
    0 1 Name1 Role1

    Collapse
    #18904

    Describe extended test1:

    col_name data_type comment
    col_name data_type comment
    0 id int
    1 name string
    2 role string
    3
    4 Detailed Table Information Table(tableName:test1, dbName:default, owner:sandbox, createTime:1363517637, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:role, type:string, comment:null)], location:hdfs://sandbox:8020/apps/hive/warehouse/test1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{numPartitions=0, numFiles=1, transient_lastDdlTime=1363590247, totalSize=14, numRows=0, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

    Collapse
    #18903

    Hello Yi,

    I have checked the jar files and they are all 0.10.0.21 version. I tried to check the class path and it doesn’t return anything.

    I tried hive> set system:java.class.path and it returns nothing. Can you let me know how can I check the class path?

    Thanks,
    Swapna

    Collapse
    #18899

    Yi Zhang
    Moderator

    Hi Swapna,

    The explain plan looks good, where ‘replace’ is set to false.

    In hive version earlier than 0.10.0 there were some bugs

    https://issues.apache.org/jira/browse/HIVE-3062

    https://issues.apache.org/jira/browse/HIVE-3465

    However, the sandbox on our website uses hive 0.10.0.21 where those bugs are fixed.

    Is it possible that the hive on your sandbox is somehow using older version? Can you check into /usr/lib/hive/lib and see what versions are the jar files? for example, hive-common-0.10.0.21.jar, etc

    Also can you check the hive java class path
    hive> set system:java.class.path

    Do you see hive jar file versions to be different than 0.10.0.21?

    Can you post the details of your tables and the query you run verbatim?

    hive> describe extended test1;
    hive>select * from test1;
    hive>describe extended test2;
    hive> select * from test2;

    Thanks,
    Yi.

    Collapse
    #18892

    Below is the explain output: I can’t interpret this but if you can help me understand this, it would be great.

    ABSTRACT SYNTAX TREE:
    (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test2))) (TOK_INSERT (TOK_INSERT_INTO (TOK_TAB (TOK_TABNAME test1))) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))))

    STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-5 depends on stages: Stage-1 , consists of Stage-4, Stage-3
    Stage-4
    Stage-0 depends on stages: Stage-4, Stage-3
    Stage-2 depends on stages: Stage-0
    Stage-3

    STAGE PLANS:
    Stage: Stage-1
    Map Reduce
    Alias -> Map Operator Tree:
    test2
    TableScan
    alias: test2
    Select Operator
    expressions:
    expr: id
    type: int
    expr: name
    type: string
    expr: role
    type: string
    outputColumnNames: _col0, _col1, _col2
    File Output Operator
    compressed: false
    GlobalTableId: 1
    table:
    input format: org.apache.hadoop.mapred.TextInputFormat
    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    name: default.test1

    Stage: Stage-5
    Conditional Operator

    Stage: Stage-4
    Move Operator
    files:
    hdfs directory: true
    destination: hdfs://sandbox:8020/tmp/hive-beeswax-sandbox/hive_2013-03-18_02-28-43_318_775037256059233952/-ext-10000

    Stage: Stage-0
    Move Operator
    tables:
    replace: false
    table:
    input format: org.apache.hadoop.mapred.TextInputFormat
    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    name: default.test1

    Stage: Stage-2
    Stats-Aggr Operator

    Stage: Stage-3
    Map Reduce
    Alias -> Map Operator Tree:
    hdfs://sandbox:8020/tmp/hive-beeswax-sandbox/hive_2013-03-18_02-28-43_318_775037256059233952/-ext-10002
    File Output Operator
    compressed: false
    GlobalTableId: 0
    table:
    input format: org.apache.hadoop.mapred.TextInputFormat
    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    name: default.test1

    Collapse
    #18890

    Yi Zhang
    Moderator

    Hi Swapna,

    Can you run the ‘explain’ to see the hive plan for your simple sample insert?

    For example,
    hive> explain ‘insert into table test1 select * from test2′;

    Thanks,
    Yi.

    Collapse
    #18840

    Hi Robert,

    I have tried the simple insert into as you suggested and I still can’t get it to work. i.e., it is overwriting the existing data. Just to reiterate, I am using the current release of Sandbox (available on the website) to test this.

    THanks
    Swapna

    Collapse
    #18580

    Robert
    Participant

    Hi Swapna,
    I did quick test with the sandbox and I was able to use the INSERT INTO statement and was able to successfully append data to an existing table. Here is my simple statement:
    INSERT INTO TABLE test2 SELECT * FROM test3

    Basically test2 initially had the following data:

    horton yi lee product

    After the above statement was executed, it displayed the following:

    horton yi lee product
    google john doe dev

    Hence, original data was retained and new data was appended.

    In your case, maybe you can try to simplify and isolate it. Maybe you can remove all the regex syntax initially and see what results you get without them in the query.

    Hope that helps.
    Regards,
    Robert

    Collapse
    #18477

    Robert
    Participant

    Hi Swapna,
    Will give this a try and see if I can reproduce.

    Regards,
    Robert

    Collapse
    #18385

    Hi Ted,

    The command overwrote the Orders table and it wiped out the old data and put the new data from the OrderData table.

    Collapse
    #18315

    tedr
    Moderator

    Hi Swapna,

    Just to make sure I am understanding you, the command overwrote the ‘OrderData’ table or did it overwrite the ‘Orders’ Table? and there was data in the table that was wiped out by the overwriting, correct?

    Thanks,
    Ted.

    Collapse
    #18311

    insert into table Orders
    SELECT
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){1}’, 1) SiteID,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){2}’, 1) ReportID,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){3}’, 1) OrderID,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){4}’, 1) Procedures,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){7}’, 1) ProcedureDesc,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){8}’, 1) Anatomy,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){9}’, 1) Modality,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){12}’, 1) Accession,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){13}’, 1) OrderStatus,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){21}’, 1) CreateDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){23}’, 1) LastCorrectedDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){24}’, 1) LastSignDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){26}’, 1) LastModifiedDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){20}’, 1) PatientAge
    from OrderData

    The above is the query I ran and it overwrote the table with the new “orderData” table..

    Collapse
    #18310

    tedr
    Moderator

    Hi Swapna,

    The version of Hive we include is supposed to support ‘insert into.’ Can you give us an example of the Hive query you are trying?

    Thanks,
    Ted.

    Collapse
Viewing 17 replies - 1 through 17 (of 17 total)