Synchronizing a source SQL table with a Hive table using Sqoop will be challenging.
Can you simply load the complete current state of the table once a day? If so, that is the simplest solution: a daily Sqoop import of all the records into a new or empty Hive table.
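As far as I recall, the incremental option (in append mode) only tracks the largest value of a check column, typically an auto-increment primary key, and imports rows with keys larger than that. Basically: select * from table where primary_key > largest primary key imported last time.
So Sqoop's incremental import, unless it has changed in ways I am not aware of, will pick up newly inserted rows but not updates to existing rows, and is not going to help here.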
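To illustrate why a key-based increment misses updates, here is a small sketch in Python. It does not use Sqoop at all; it only simulates the "keys larger than the last one" query against an in-memory SQLite database. The table name `customers`, its columns, and the variable `last_value` are made up for the example.

```python
import sqlite3

# Hypothetical source table, held in an in-memory SQLite database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)

# Day 1: the first import copies everything and remembers the largest key seen.
last_value = conn.execute("SELECT MAX(id) FROM customers").fetchone()[0]

# Between imports: one existing row is updated and one new row is inserted.
conn.execute("UPDATE customers SET name = 'robert' WHERE id = 2")
conn.execute("INSERT INTO customers (id, name) VALUES (3, 'carol')")

# Day 2: a key-based increment only asks for keys above last_value.
day2_rows = conn.execute(
    "SELECT id, name FROM customers WHERE id > ? ORDER BY id",
    (last_value,),
).fetchall()

print(day2_rows)  # [(3, 'carol')]
```

The new row with id 3 is picked up, but the change to row 2 never reaches the target, because its key is not larger than the remembered value.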
Without knowing more about your system it is hard to advise.
I can try to monitor this forum, and if you provide more information perhaps I can help.
In particular, can you clarify this question?
“can Anyone Help me how to combine day1 hive data with day2 updated data…”
Are you saying that on day1 you pull all of the data, on day2 you pull all of the data again, and you want a result set of the rows that have changed?
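If that is what you mean, comparing two full snapshots by primary key could look roughly like the sketch below. It is plain Python rather than Hive, the function name `changed_rows` and the sample data are invented for illustration, and it assumes each snapshot fits in memory as a dict keyed by primary key.

```python
def changed_rows(day1, day2):
    """Return the rows of day2 that are new or differ from day1.

    Both arguments are dicts mapping primary key -> row tuple.
    Rows deleted between day1 and day2 are not reported.
    """
    return {key: row for key, row in day2.items() if day1.get(key) != row}


# Hypothetical snapshots: primary key -> (name, city)
day1 = {1: ("alice", "NY"), 2: ("bob", "LA")}
day2 = {1: ("alice", "NY"), 2: ("bob", "SF"), 3: ("carol", "TX")}

print(changed_rows(day1, day2))  # {2: ('bob', 'SF'), 3: ('carol', 'TX')}
```

On real Hive tables the same idea would be a join of the two daily tables on the primary key, keeping rows that are missing from day1 or differ in any column.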
In general I think the best start would be to pull all the data at once, and repeat once a day. Each day’s import would then have the up-to-date records. Why do you need old versions? The database is not keeping old versions.