Importing Data Using Sqoop

This topic contains 5 replies, has 3 voices, and was last updated by  Mahesh Balakrishnan 6 months ago.

  • Creator
    Topic
  • #55837

    Vijay Kumar
    Participant

    Suppose we have a table with two columns in SQL.
    (We don't have any created_date, updated_date, or flag columns in the SQL source table, and we are not allowed to modify the source table.)

    id is the primary key.
    id name
    1 AAAAA
    2 BBBBB
    3 CCCCC
    4 ADAEAB
    5 GGAGAG
    I pulled the data into Hive using Sqoop as a main table, and that works fine.
    But suppose the source data is then updated as below:
    id name
    1 ACACA
    2 BASBA
    3 CCHAH
    4 AASDA1
    5 GGAGAG

    Problem :
    —————–

    My issue is that, without affecting the existing data in the Hive main table, I need to pull only the updated, inserted, or deleted data using Sqoop and at the same time apply those changes to the Hive main table.
    I have tried using the --incremental option and its related properties, but with no result.

    Result Should be:
    ——————————–

    The output main table currently has all 10 records; it should have 5 records.

    Requirement:
    ——————————
    On day 1 I have 1 million records.
    On day 2 I have the 1 million records plus the current day's new and updated records, say 2 million in total.
    On day 2 I have to pull only the updated and newly inserted data rather than the whole data set.
    Also, can anyone help me with how to combine the day 1 Hive data with the day 2 updated data?

    If anyone has any other solution or alternative, please explain it clearly, because I am new to Hadoop.

    Thanks.
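
The "combine day 1 with day 2" step asked about above can be sketched in plain Python, outside of Hive or Sqoop. This is only an illustration of the latest-row-wins-per-primary-key idea, using the sample rows from the post; the function name `merge_by_key` and the `deleted_ids` parameter are hypothetical, and the sketch assumes the changed rows have already been identified somehow.

```python
def merge_by_key(base_rows, delta_rows, deleted_ids=()):
    """Apply delta_rows to base_rows, keyed on id; the newest row per id wins."""
    merged = {row_id: name for row_id, name in base_rows}
    for row_id, name in delta_rows:
        merged[row_id] = name          # update an existing id or insert a new one
    for row_id in deleted_ids:
        merged.pop(row_id, None)       # drop ids reported as deleted (hypothetical input)
    return sorted(merged.items())


# Day 1 snapshot and the rows that changed on day 2 (sample data from the post).
day1 = [(1, "AAAAA"), (2, "BBBBB"), (3, "CCCCC"), (4, "ADAEAB"), (5, "GGAGAG")]
day2_delta = [(1, "ACACA"), (2, "BASBA"), (3, "CCHAH"), (4, "AASDA1")]

main_table = merge_by_key(day1, day2_delta)
print(len(main_table))  # 5 rows, not 10
```

Appending the day 2 rows without a key-based merge is what produces 10 records. Sqoop's own `sqoop merge` tool with `--merge-key` applies the same idea to two imported data sets.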

Viewing 5 replies - 1 through 5 (of 5 total)

The topic ‘Importing Data Using Sqoop’ is closed to new replies.

  • Author
    Replies
  • #56012

    Hi Vijay,

    Based on the information provided, the only approach I can think of is to create a view that selects the most recent changes from the actual table; you can then point Sqoop at this view as if it were a table to load the data into HDFS.

    -Mahesh

    #55849

    Vijay Kumar
    Participant

    Yes, that's true, but my source data is coming from SQL Server; that was my mistake.

    Please tell me the best approach to use.

    #55848

    MC Brown
    Participant

    Hi,

    This is difficult to do with Sqoop, since it expects either to take everything or to identify only the changes from the source table itself, using either a unique ID or an update timestamp to decide which rows to move.

    For the type of movement you are looking for, you need some form of replication that captures the changes. Since you are using MySQL, have you looked at Tungsten Replicator (see https://mcslp.wordpress.com/2014/03/31/continuent-replication-to-hadoop-now-in-stereo/ for more info)? That might suit your needs better, since it gives you the live stream of changes.

    MC
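
The two ways of identifying changes mentioned above (a unique ID or an update timestamp) correspond to Sqoop's `--incremental append` and `--incremental lastmodified` modes, driven by `--check-column` and `--last-value`. The Python sketch below only imitates that selection rule to show why in-place updates are invisible without a timestamp. The `updated_at` column, its dates, and the row with id 6 are hypothetical additions for illustration, and Sqoop's exact boundary handling may differ.

```python
from datetime import datetime


def rows_newer_than(rows, check_column, last_value):
    """Keep rows whose check column is strictly greater than last_value."""
    return [row for row in rows if row[check_column] > last_value]


# Source table after the day 2 changes. "updated_at" and row 6 are assumptions;
# the table in the question has only id and name.
source = [
    {"id": 1, "name": "ACACA", "updated_at": datetime(2014, 6, 2)},
    {"id": 2, "name": "BASBA", "updated_at": datetime(2014, 6, 2)},
    {"id": 5, "name": "GGAGAG", "updated_at": datetime(2014, 6, 1)},
    {"id": 6, "name": "NEWROW", "updated_at": datetime(2014, 6, 2)},
]

# Append-style check on id: only the newly inserted row is seen.
by_id = rows_newer_than(source, "id", 5)

# Timestamp-style check: updated rows and the new row are both seen.
by_time = rows_newer_than(source, "updated_at", datetime(2014, 6, 1))

print([row["id"] for row in by_id])
print([row["id"] for row in by_time])
```

With only an id to check, the updates to rows 1 and 2 are never picked up, which matches the behaviour described in the question. Deleted rows are not detected by either check.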

    #55847

    Vijay Kumar
    Participant

    Thank you for your reply.

    I am using telecom data, which is structured and stored in SQL Server, MySQL, or some other database, but the volume is large, for example 10 million records daily.
    So no modification of the source data, such as adding a trigger, is possible.
    The source data is modified daily (insertions, deletions, updates).
    In our scenario, take two fields, id and name, where id is the primary key and name is modified daily.
    I have to take only the modified data. Is using Hive a good idea, or is HBase better? Does HBase directly support this kind of situation? Can HBase be linked with Sqoop for daily modifications, or can Hive-HBase integration solve this problem?
    I am confused; please help me.

    #55846

    MC Brown
    Participant

    Hi,

    For incremental imports to work with Sqoop, you must update the table to contain an identifier so that you can pull the changes. Take a look at this article for more information:

    http://www.ibm.com/developerworks/library/bd-sqltohadoop3/

    What data are you using for the source data?

    MC
