Assets table is having numeric sequential ID and a one number out of (1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say 10 rows only.
hmmm.. after some more surfing, i came to know that we have to manually denormalize a relational DB. there are no preset rules for denormalization.
today I was doing a denormalization.. can you please confirm if rdbms to hbase change i have done is correct or not:
These tables are from stock trading application:
1. I have two tables…
AssetID | assetName
second table is:
InstrumentID | assetID | Symbol | name | created | modifiedAt
While converting them into Hbase two questions were there in my mind:
1. Will there be two tables or one? If we have a web GUI and have two screens, one for assets management and one for instuments management. In instrument management, there would be a combo having assets. An asset can be assigned to multiple instruments.
So if user will first populate all assets thru assets screen. So there should be one master table for Assets. because it is not dependent on any other entity. Am I right here?
And now user will populate instruments thru instrument management screen. Here another table would come into picture. “INSTRUMENTS”. this table would be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID properties, all assets information should be duplicated here too. Now question here, if I delete or update some asset in future, and there are corresponding records in INSTRUMENTS table. what if asset is deletion or updation goes successful for assets table but fails for instruments table? How we can maintain consistancy and integrity here?
HBASE table conversion here:
row key – AssetID
Columns: CF1 – AssetName
row key– instrumentID(a sequential numeric value)
Please confirm, two tables would be required or one only? If one only, how assets population and after that instruments population would be possible? If two, how we can maintain integrity and consistancy during assets deletion\updation?
Please confirm if my approach is correct?
Yes, as read everywhere, in hbase, we must think about purpose of data in our application before creating tables. So purpose here is, to populate\delete\update assets and instruments thru web UI and then run a job which extract instruments by asset name. Also, assets by instrument name.
After above conversion, I got stuck on one more point. “SELF-JOIN”
Lets say, I have a table InstrumentsStock having 10 columns in it, out of it one primary key stockID and other normal column BaseCurrencyID, both points to an instrumentID.
StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID is ID of an instrument whose assetID is 3.
stockID | AssetID | BaseCurrencyID | 10 more columns here
where StockID = InstrumentID of Instrument with AssetID 1
and BaseCurrencyID = InstrumentID of Instrument with AssetID 3
StockID is primary key
E.g. I have below data in tables:
InstrumentID | assetsID | Symbol | Name | Comment
22 | 1 | WMT | Ins1| createdByRashmi
23 | 1 | HOG | Ins2| createdByRashmi
40 | 3 | HPQ | Ins3| createdByRashmi
41 | 3 | KO | Ins4| createdByRashmi
StockID | assetsID | BasecurrencyID | x | y | z
22 | 1 | 41 | stock1 | y1 | z1
23 | 1 | 40 | stock2 | y2 | z2
24 | 1 | 41 | stock3 | y3 | z3
25 | 1 | 40 | stock4 | y4 | z4
Relationship between Instruments and InstrumentsStock is one to one.
Question: retrieve all Stocks where baseCurrencyID is 40.
Change I made: Combined all three tables, Instruments, Assets, InstrumentsStock and created one table “INstruments_Asset_Stocks” with three families (AssetInfo, InstrumentInfo, StockINfo)
rowKey : | | |
InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo
| assetID | assetName | x, y, z, BaseCurrencyID | symbol, name
22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1
23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2
24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3
25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4
40 | 3 | assetTwo | | HPQ, Ins5
41 | 3 | assetTwo | | KO, Ins6
On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all Stock relation detail from this table. On basis of AssetInfo family, assets related stocks, on basis of IntrumentInfo, related stock we can retrieve…
Is my understanding correct, or I am so far very wrong.
Here SELF-JOIn was used in relational schema… how do we achieve same JOIn in hbase, as above or some other way?
What is nested entities and how do we create nested entities using “Hbase Shell” and using hbase java API. There is API to create family, column… is there API to create Nested Entities?
There is one more table:
Same StockID and localCurrencyID are populated using InstrumentID only. Data of this table is like:
ID | stockID | LocalCurrencyID | location
L1| 22 | 51 | India
L1| 22 | 52 | Japan
L1| 22 | 40 | London
L1| 23 | 57 | USA
L1| 23 | 41 | Africa
Here again self-join is used and many-many relationship…
How do I integrate this table with above INstruments_Asset_Stocks table? I want to get All stocks details for given locaCurrency.
What I did:
rowKey – instrumentID_ListingID
Cf1: AssetINnfo (assetID, assetName)
Cf2: InstrumentInfo(symbol, name, comment)
cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID, nested_enties_for_LocalCurrency )
Nested entities for local currency would be having localCurrencyID as key and location_listingID as value.
Is this change correct to get all Stocks for given localCurrency. As read somewhere, I have to use map\reduce to get this data from nested entities? is it?
How do we implement nested entities? How(from which API or shell command) client scans, searched these entities? Can I search it like I do for column qualifier BaseCurrencyID?
Your inputs would really help me in understanding this Hbase design.