Importing Data From Sqoop to Hive



    #55838

    Vijay Kumar
    Participant

    Suppose we have a table in SQL with 2 columns.
    (The source table has no created_date, updated_date, or flag columns, and we are not allowed to modify it.)

    id is the primary key:
    id name
    1 AAAAA
    2 BBBBB
    3 CCCCC
    4 ADAEAB
    5 GGAGAG
    I pull the data into Hive as a main table using Sqoop, and that works fine.
    But then the source data is updated, like below:
    id name
    1 ACACA
    2 BASBA
    3 CCHAH
    4 AASDA1
    5 GGAGAG

    Problem:
    —————–

    My issue is that I need to use Sqoop to pull only the updated, inserted, or deleted rows and apply them to the Hive main table at the same time, without affecting the existing data in it.
    I have tried the --incremental options and related properties, but with no result.

    Result should be:
    ——————————–

    The main table should have 5 records, but the output main table has all 10 (the day-1 and day-2 copies of every row).

    Requirement:
    ——————————
    On day 1, I have 1 million records.
    On day 2, I have those 1 million plus the current day's inserted and updated records, say 2 million in total.
    On day 2, I have to pull only the updated and newly inserted data rather than the whole table.
    Also, can anyone help me combine the day-1 Hive data with the day-2 updated data?
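One way to picture the combine step: treat id as the key, so a changed row replaces the old version instead of being appended next to it (appending is what produces 10 rows instead of 5). The sketch below is plain Python, not Hive or Sqoop code. It assumes the day-2 changes are already known as a set of upserted rows plus a set of deleted ids; the names `merge_snapshot`, `upserts` and `deleted_ids` are hypothetical.

```python
from typing import Dict, Iterable


def merge_snapshot(base: Dict[int, str],
                   upserts: Dict[int, str],
                   deleted_ids: Iterable[int]) -> Dict[int, str]:
    """Apply one day's changes to the previous day's table, keyed by primary key.

    Hypothetical helper: assumes the changes were obtained somehow beforehand.
    """
    merged = dict(base)          # copy so the day-1 data itself is left untouched
    merged.update(upserts)       # existing ids are overwritten, new ids are added
    for row_id in deleted_ids:   # drop rows that no longer exist in the source
        merged.pop(row_id, None)
    return merged


day1 = {1: "AAAAA", 2: "BBBBB", 3: "CCCCC", 4: "ADAEAB", 5: "GGAGAG"}
day2_changes = {1: "ACACA", 2: "BASBA", 3: "CCHAH", 4: "AASDA1"}

merged = merge_snapshot(day1, day2_changes, deleted_ids=set())
print(len(merged))  # 5
print(merged[1])    # ACACA
```

The day-1 mapping is copied rather than modified, which mirrors the requirement that the existing main table is not disturbed until the merged result is ready.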

    If anyone has another solution or alternative, please explain it clearly, because I am new to Hadoop.



    #55957

    Tom Hanlon
    Participant

    Synchronizing a SQL source table with a Hive table using Sqoop will be challenging.

    Can you simply load the complete current state of the table once a day? If so, that is the simplest solution.

    That is, a daily Sqoop import of all the records into a new or empty Hive table?

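    As far as I recall, the incremental option only tracks the largest auto_increment primary key and imports the rows whose key is larger than that. Basically: select * from table where primary_key > (max primary key from the last import). In Sqoop terms this is --incremental append with --check-column and --last-value; the other mode, lastmodified, relies on a timestamp column that is updated whenever a row changes, and your table has none.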
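To see why this misses updates, here is a small Python simulation of the selection rule described above (import only rows whose key exceeds the last imported value), run against the sample table from the question. It is not Sqoop code; the function name `append_mode_rows` is hypothetical.

```python
from typing import Dict, List, Tuple


def append_mode_rows(source: Dict[int, str], last_value: int) -> List[Tuple[int, str]]:
    """Return only rows whose key is greater than the last imported key,
    mimicking a 'where id > last_value' incremental import (hypothetical helper)."""
    return sorted((k, v) for k, v in source.items() if k > last_value)


# Day 1: full import, remember the largest id seen.
day1 = {1: "AAAAA", 2: "BBBBB", 3: "CCCCC", 4: "ADAEAB", 5: "GGAGAG"}
last_value = max(day1)  # 5

# Day 2: rows 1-4 were updated in place, nothing new was inserted.
day2 = {1: "ACACA", 2: "BASBA", 3: "CCHAH", 4: "AASDA1", 5: "GGAGAG"}

print(append_mode_rows(day2, last_value))  # [] : none of the four updates is picked up
```

A new row with id 6 would be returned, but an in-place update to an existing id never would, which is exactly the gap for this table.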

    So Sqoop's incremental import (unless it has changed in ways I am not aware of) is not going to help here.

    Without knowing more about your system, it is hard to advise.

    I will try to monitor this forum, and if you provide more information, perhaps I can advise.

    If you can clarify this question, perhaps I can help:
    “can Anyone Help me how to combine day1 hive data with day2 updated data…”

    Are you saying that on day 1 you pull all of the data, on day 2 you pull all of the data again, and you want a result set of the rows that have changed?
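If both days are pulled in full, the changed rows can be found by comparing the two snapshots on the primary key, with no timestamp or flag column needed. A minimal Python sketch, assuming each snapshot fits in memory as an id-to-row mapping; `diff_snapshots` and `Changes` are hypothetical names. For a table of 1 to 2 million rows, the same comparison could instead be expressed in Hive as a FULL OUTER JOIN of the two snapshot tables on id.

```python
from typing import Dict, NamedTuple, Set


class Changes(NamedTuple):
    inserted: Set[int]
    updated: Set[int]
    deleted: Set[int]


def diff_snapshots(old: Dict[int, str], new: Dict[int, str]) -> Changes:
    """Classify ids by comparing two full snapshots keyed by primary key."""
    old_ids, new_ids = set(old), set(new)
    common = old_ids & new_ids
    return Changes(
        inserted=new_ids - old_ids,                        # present only in the new pull
        updated={i for i in common if old[i] != new[i]},   # same id, different row
        deleted=old_ids - new_ids,                         # gone from the source
    )


day1 = {1: "AAAAA", 2: "BBBBB", 3: "CCCCC", 4: "ADAEAB", 5: "GGAGAG"}
day2 = {1: "ACACA", 2: "BASBA", 3: "CCHAH", 4: "AASDA1", 5: "GGAGAG"}

changes = diff_snapshots(day1, day2)
print(changes)  # inserted and deleted are empty; updated holds ids 1 to 4
```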

    In general, I think the best starting point would be to pull all the data at once and repeat that once a day. Each day's import would then have the up-to-date records. Why do you need old versions? The database itself is not keeping them.
