Hive / HCatalog Forum

Insert Into Table

  • #18309

    Does the any version of Hortonworks Hive support “Insert into” operation?

    I am using the Sandbox to do some research for our needs cant get to append data into an existing table. It overwrites the data even when I don’t use the keyword “overwrite”.

    Any help is much appreciated.

to create new topics or reply. | New User Registration

  • Author
    Replies
  • #18310
    tedr
    Moderator

    Hi Swapna,

    The version of Hive we include is supposed to support ‘insert into.’ Can you give us an example of the Hive query you are trying?

    Thanks,
    Ted.

    #18311

    insert into table Orders
    SELECT
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){1}’, 1) SiteID,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){2}’, 1) ReportID,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){3}’, 1) OrderID,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){4}’, 1) Procedures,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){7}’, 1) ProcedureDesc,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){8}’, 1) Anatomy,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){9}’, 1) Modality,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){12}’, 1) Accession,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){13}’, 1) OrderStatus,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){21}’, 1) CreateDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){23}’, 1) LastCorrectedDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){24}’, 1) LastSignDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){26}’, 1) LastModifiedDate,
    regexp_extract(col_value, ‘^(?:([^,]*)\,?){20}’, 1) PatientAge
    from OrderData

    The above is the query I ran and it overwrote the table with the new “orderData” table..

    #18315
    tedr
    Moderator

    Hi Swapna,

    Just to make sure I am understanding you, the command overwrote the ‘OrderData’ table or did it overwrite the ‘Orders’ Table? and there was data in the table that was wiped out by the overwriting, correct?

    Thanks,
    Ted.

    #18385

    Hi Ted,

    The command overwrote the Orders table and it wiped out the old data and put the new data from the OrderData table.

    #18477
    Robert
    Participant

    Hi Swapna,
    Will give this a try and see if I can reproduce.

    Regards,
    Robert

    #18580
    Robert
    Participant

    Hi Swapna,
    I did quick test with the sandbox and I was able to use the INSERT INTO statement and was able to successfully append data to an existing table. Here is my simple statement:
    INSERT INTO TABLE test2 SELECT * FROM test3

    Basically test2 initially had the following data:

    horton yi lee product

    After the above statement was executed, it displayed the following:

    horton yi lee product
    google john doe dev

    Hence, original data was retained and new data was appended.

    In your case, maybe you can try to simplify and isolate it. Maybe you can remove all the regex syntax initially and see what results you get without them in the query.

    Hope that helps.
    Regards,
    Robert

    #18840

    Hi Robert,

    I have tried the simple insert into as you suggested and I still can’t get it to work. i.e., it is overwriting the existing data. Just to reiterate, I am using the current release of Sandbox (available on the website) to test this.

    THanks
    Swapna

    #18890
    Yi Zhang
    Moderator

    Hi Swapna,

    Can you run the ‘explain’ to see the hive plan for your simple sample insert?

    For example,
    hive> explain ‘insert into table test1 select * from test2′;

    Thanks,
    Yi.

    #18892

    Below is the explain output: I can’t interpret this but if you can help me understand this, it would be great.

    ABSTRACT SYNTAX TREE:
    (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test2))) (TOK_INSERT (TOK_INSERT_INTO (TOK_TAB (TOK_TABNAME test1))) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))))

    STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-5 depends on stages: Stage-1 , consists of Stage-4, Stage-3
    Stage-4
    Stage-0 depends on stages: Stage-4, Stage-3
    Stage-2 depends on stages: Stage-0
    Stage-3

    STAGE PLANS:
    Stage: Stage-1
    Map Reduce
    Alias -> Map Operator Tree:
    test2
    TableScan
    alias: test2
    Select Operator
    expressions:
    expr: id
    type: int
    expr: name
    type: string
    expr: role
    type: string
    outputColumnNames: _col0, _col1, _col2
    File Output Operator
    compressed: false
    GlobalTableId: 1
    table:
    input format: org.apache.hadoop.mapred.TextInputFormat
    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    name: default.test1

    Stage: Stage-5
    Conditional Operator

    Stage: Stage-4
    Move Operator
    files:
    hdfs directory: true
    destination: hdfs://sandbox:8020/tmp/hive-beeswax-sandbox/hive_2013-03-18_02-28-43_318_775037256059233952/-ext-10000

    Stage: Stage-0
    Move Operator
    tables:
    replace: false
    table:
    input format: org.apache.hadoop.mapred.TextInputFormat
    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    name: default.test1

    Stage: Stage-2
    Stats-Aggr Operator

    Stage: Stage-3
    Map Reduce
    Alias -> Map Operator Tree:
    hdfs://sandbox:8020/tmp/hive-beeswax-sandbox/hive_2013-03-18_02-28-43_318_775037256059233952/-ext-10002
    File Output Operator
    compressed: false
    GlobalTableId: 0
    table:
    input format: org.apache.hadoop.mapred.TextInputFormat
    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    name: default.test1

    #18899
    Yi Zhang
    Moderator

    Hi Swapna,

    The explain plan looks good, where ‘replace’ is set to false.

    In hive version earlier than 0.10.0 there were some bugs
    https://issues.apache.org/jira/browse/HIVE-3062
    https://issues.apache.org/jira/browse/HIVE-3465
    However, the sandbox on our website uses hive 0.10.0.21 where those bugs are fixed.

    Is it possible that the hive on your sandbox is somehow using older version? Can you check into /usr/lib/hive/lib and see what versions are the jar files? for example, hive-common-0.10.0.21.jar, etc

    Also can you check the hive java class path
    hive> set system:java.class.path

    Do you see hive jar file versions to be different than 0.10.0.21?

    Can you post the details of your tables and the query you run verbatim?

    hive> describe extended test1;
    hive>select * from test1;
    hive>describe extended test2;
    hive> select * from test2;

    Thanks,
    Yi.

    #18903

    Hello Yi,

    I have checked the jar files and they are all 0.10.0.21 version. I tried to check the class path and it doesn’t return anything.

    I tried hive> set system:java.class.path and it returns nothing. Can you let me know how can I check the class path?

    Thanks,
    Swapna

    #18904

    Describe extended test1:

    col_name data_type comment
    col_name data_type comment
    0 id int
    1 name string
    2 role string
    3
    4 Detailed Table Information Table(tableName:test1, dbName:default, owner:sandbox, createTime:1363517637, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:role, type:string, comment:null)], location:hdfs://sandbox:8020/apps/hive/warehouse/test1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{numPartitions=0, numFiles=1, transient_lastDdlTime=1363590247, totalSize=14, numRows=0, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

    #18905

    select * from Test1:

    id name role
    0 1 Name1 Role1

    #18906

    Describe extended test2:
    col_name data_type comment
    col_name data_type comment
    0 id int
    1 name string
    2 role string
    3
    4 Detailed Table Information Table(tableName:test2, dbName:default, owner:sandbox, createTime:1363587451, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:role, type:string, comment:null)], location:hdfs://sandbox:8020/apps/hive/warehouse/test2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{numPartitions=0, numFiles=1, transient_lastDdlTime=1363589787, totalSize=14, numRows=0, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

    #18907

    select * from test2;

    id name role
    0 2 Name2 Role2

    #18908

    Hello YI,

    I just ran the insert into now and it works. Do you know what might have helped it? Because I have been testing this on a local copy of the sandbox.. and we need this to be working on the sandbox on our server for our prototyping.

    My guess is that running “hive> set system:java.class.path” command set the java class path which was probably not set properly before? Can you please shed some light so that we can fix our other environment?

    Thanks
    Swapna

    #18924
    Yi Zhang
    Moderator

    Hi Swapna,

    It is strange that system:java.class.path returns empty, at least when hive is started, it picks up hadoop class path and other paths defined by hive’s configuration files too. Are other system: environment variables set when you run hive>set; That may give a clue.

    How was hive started?

    Thanks,
    Yi.

    #71892
    tushar pacharne
    Participant

    Hi,
    I am using Hortonworks V2.2.4-2. I have created below table and want to insert values in same table. But by using insert into.. command I am unable to insert value in the table. Please tell me if there is any other way to insert value into the table.
    Regards,
    Tushar

You must be to reply to this topic. | Create Account

Support from the Experts

A HDP Support Subscription connects you experts with deep experience running Apache Hadoop in production, at-scale on the most demanding workloads.

Enterprise Support »

Become HDP Certified

Real world training designed by the core architects of Hadoop. Scenario-based training courses are available in-classroom or online from anywhere in the world

Training »

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade having been built, tested and hardened with enterprise rigor.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower cost, higher capacity infrastructure.