Suppose we have a table with 2 columns in a SQL database.
(The source table has no created_date, updated_date, or flag columns, and I am not allowed to modify it.)
id is the primary key.
I pull the data into Hive with Sqoop as a main table, and that works fine.
But then the source data is updated, like below:
My issue: without affecting the existing data in the Hive main table, I need to pull only the updated, inserted, or deleted rows using Sqoop and apply them to the Hive main table at the same time, leaving the unchanged rows as they are.
I have tried using --incremental …. and the related properties, but with no result.
Result should be:
Right now the output main table ends up with all 10 records, but it should have only 5 records.
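To make the expected result concrete, here is a minimal Python sketch of the behaviour I am after. It is not Sqoop or Hive code: the function name merge_by_id, the (id, value) row shape, and the sample values are all made up for illustration. It only shows why appending a second import gives 10 rows while a merge keyed on id gives 5.

```python
def merge_by_id(main_rows, changed_rows, deleted_ids=()):
    """Apply inserts, updates, and deletes to main_rows, keyed on id.

    Hypothetical helper for illustration only; rows are (id, value) tuples.
    """
    # Start from the existing main table, one entry per primary key.
    merged = {row_id: value for row_id, value in main_rows}
    # A changed row replaces the old version; a new id is simply added.
    for row_id, value in changed_rows:
        merged[row_id] = value
    # Rows deleted at the source are dropped from the result.
    for row_id in deleted_ids:
        merged.pop(row_id, None)
    return sorted(merged.items())


# Made-up sample data: 5 rows on day 1, the same 5 ids on day 2 with two updates.
day1_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
day2_rows = [(1, "a"), (2, "B"), (3, "c"), (4, "D"), (5, "e")]

appended = day1_rows + day2_rows            # what I get now: old and new versions together
merged = merge_by_id(day1_rows, day2_rows)  # what I want: one row per id, latest version

print(len(appended), len(merged))
```

The sketch assumes the changed and deleted ids are already known. Finding them without a timestamp or flag column in the source table is exactly the part I am stuck on.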
On day 1 I have 1 million records.
On day 2 I have the 1 million from day 1 plus the current day's new and updated records, let's say 2 million in total.
On day 2 I want to pull only the updated and newly inserted data rather than the whole table.
Can anyone help me combine the day 1 Hive data with the day 2 updated data?
If anyone has an alternative solution, please explain it clearly, because I am new to Hadoop.