Home Forums HBase RDBMS to HBASE schema design

This topic contains 2 replies, has 2 voices, and was last updated by  Sasha J 2 years, 1 month ago.

  • Creator
    Topic
  • #6494

    Hi,

    I want to change my RDBMS to HBASE schema, to be used with Hadoop platform.

    I have changed two RDBMS tables into HBASE tables. I have ignored constraints, indexes and foreign key relationship. Because I dont know how to convert these relationships in Hbase schema.

    Please confirm if the change I have made is correct?

    Relational Schema of tables are:
    ==========================

    TABLE : ASSTES
    ——————————————————————

    CREATE TABLE [dbo].[Assets](
    [AssetId] [int] NOT NULL,
    [AssetName] [varchar](50) NOT NULL,
    CONSTRAINT [PK_Assets] PRIMARY KEY CLUSTERED
    (
    [AssetId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    ——————————————————————————————————————————-

    ——————————–
    TABLE: Instruments
    ———————————-
    CREATE TABLE [dbo].[Instruments](
    [InstrumentId] [int] IDENTITY(1,1) NOT NULL,
    [AssetId] [int] NOT NULL,
    [Symbol] [varchar](50) NOT NULL,
    [Name] [varchar](250) NOT NULL,
    [Created] [datetime] NOT NULL,
    [Modified] [datetime] NULL,
    [Comments] [varchar](250) NULL,
    CONSTRAINT [PK_Instruments_InstrumentId] PRIMARY KEY CLUSTERED
    (
    [InstrumentId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
    CONSTRAINT [PK_Instruments_InstrumentIdAssetId] UNIQUE NONCLUSTERED
    (
    [InstrumentId] ASC,
    [AssetId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING ON
    GO

    ALTER TABLE [dbo].[Instruments] WITH CHECK ADD CONSTRAINT [FK_Instruments_Assets] FOREIGN KEY([AssetId])
    REFERENCES [dbo].[Assets] ([AssetId])
    ON UPDATE CASCADE
    GO

    ALTER TABLE [dbo].[Instruments] CHECK CONSTRAINT [FK_Instruments_Assets]
    GO

    ALTER TABLE [dbo].[Instruments] ADD CONSTRAINT [DF_Instruments_Created_1] DEFAULT (getdate()) FOR [Created]
    GO

    ====================================

    HBASE CONVERSION OF ASSTES AND INSTRUMENTS tables:
    ====================================================================

    Assets table
    ===============
    RowKey – AssetID
    ColumnFamilies (AssetName ) —- ColumnName – (Name)

    Instruments table
    ==============
    RowKey - InstrumentID
    ColumnFamilies – (Content) — Columns: Symbol, Name, Created, Modified, Comments
    (Assets) — Columns: AssetID

    Please confirm if given conversion is proper?

    Also, how do i convert constraints and indexes and foreign key relationship?

    Thanks in advance

    Regards,
    rashmi

Viewing 2 replies - 1 through 2 (of 2 total)

You must be logged in to reply to this topic.

  • Author
    Replies
  • #6717

    Sasha J
    Moderator

    Hello Rashmi,

    It sounds like you may want to familiarize yourself more with the basic design principals and concepts of a NoSQL Database.

    If you have not already done so, you may want to start with this very basic primer:

    http://hbase.apache.org/book/schema.html

    Rather then going into a detailed response to your questions, lets start with:

    “why are you moving this particular data set into HBase, what is the requirement, goal, advantage?”

    thanks in advance,

    Sasha

    Collapse
    #6495

    Assets table is having numeric sequential ID and a one number out of (1,2,3,4,5,6,7, 8, 9, 10) for AssetName. This is a master table with say 10 rows only.

    hmmm.. after some more surfing, i came to know that we have to manually denormalize a relational DB. there are no preset rules for denormalization.

    today I was doing a denormalization.. can you please confirm if rdbms to hbase change i have done is correct or not:
    These tables are from stock trading application:

    1. I have two tables…
    Assets
    =========
    AssetID | assetName

    second table is:

    INSTRUMENTS
    ============
    InstrumentID | assetID | Symbol | name | created | modifiedAt

    While converting them into Hbase two questions were there in my mind:
    1. Will there be two tables or one? If we have a web GUI and have two screens, one for assets management and one for instuments management. In instrument management, there would be a combo having assets. An asset can be assigned to multiple instruments.

    So if user will first populate all assets thru assets screen. So there should be one master table for Assets. because it is not dependent on any other entity. Am I right here?

    And now user will populate instruments thru instrument management screen. Here another table would come into picture. “INSTRUMENTS”. this table would be denormalized, as hbase doesnt support JOIN. Also, to maintain ACID properties, all assets information should be duplicated here too. Now question here, if I delete or update some asset in future, and there are corresponding records in INSTRUMENTS table. what if asset is deletion or updation goes successful for assets table but fails for instruments table? How we can maintain consistancy and integrity here?

    HBASE table conversion here:

    ASSETS
    ==================
    row key – AssetID
    ================
    Columns: CF1 – AssetName
    ================================================

    INSTRUMENTS
    =====================================
    row key– instrumentID(a sequential numeric value)
    ====================================
    InstrumentInfo(family) –
    Symbol
    name
    AssetsInfo(family2) –
    AssetID
    AssetName
    =====================================

    Please confirm, two tables would be required or one only? If one only, how assets population and after that instruments population would be possible? If two, how we can maintain integrity and consistancy during assets deletion\updation?

    Please confirm if my approach is correct?

    Yes, as read everywhere, in hbase, we must think about purpose of data in our application before creating tables. So purpose here is, to populate\delete\update assets and instruments thru web UI and then run a job which extract instruments by asset name. Also, assets by instrument name.

    ======================================

    After above conversion, I got stuck on one more point. “SELF-JOIN”

    Lets say, I have a table InstrumentsStock having 10 columns in it, out of it one primary key stockID and other normal column BaseCurrencyID, both points to an instrumentID.

    StockID is an ID of an instrument whose assetID is 1. and BaseCurrencyID is ID of an instrument whose assetID is 3.

    InstrumentsStock
    =======================
    stockID | AssetID | BaseCurrencyID | 10 more columns here
    ========================================

    where StockID = InstrumentID of Instrument with AssetID 1
    and BaseCurrencyID = InstrumentID of Instrument with AssetID 3

    StockID is primary key

    E.g. I have below data in tables:
    Instruments:
    ======================================
    InstrumentID | assetsID | Symbol | Name | Comment
    ======================================
    22 | 1 | WMT | Ins1| createdByRashmi
    23 | 1 | HOG | Ins2| createdByRashmi
    ….
    ….
    ….
    40 | 3 | HPQ | Ins3| createdByRashmi
    41 | 3 | KO | Ins4| createdByRashmi

    InstrumentsStock
    ==========================================
    StockID | assetsID | BasecurrencyID | x | y | z
    ======================================
    22 | 1 | 41 | stock1 | y1 | z1
    23 | 1 | 40 | stock2 | y2 | z2
    24 | 1 | 41 | stock3 | y3 | z3
    25 | 1 | 40 | stock4 | y4 | z4

    Relationship between Instruments and InstrumentsStock is one to one.

    Question: retrieve all Stocks where baseCurrencyID is 40.

    Change I made: Combined all three tables, Instruments, Assets, InstrumentsStock and created one table “INstruments_Asset_Stocks” with three families (AssetInfo, InstrumentInfo, StockINfo)

    INstruments_Asset_Stocks
    ==============================================================
    rowKey : | | |
    InstrumentID | CF1: AssetInfo | CF2: StockInfo |cf3: InstrumentInfo
    ===============================================================
    | assetID | assetName | x, y, z, BaseCurrencyID | symbol, name
    ===============================================================
    22 | 1 | assetOne |stock1 | y1 | z1 | 41 | WMT, Ins1
    23 | 1 | assetOne |stock2 | y2 | z2 | 40 | HOG, Ins2
    24 | 1 | assetOne |stock3 | y3 | z3 | 41 | WMT1, Ins3
    25 | 1 | assetOne |stock4 | y4 | z4 | 40 | WMT2, Ins4
    40 | 3 | assetTwo | | HPQ, Ins5
    41 | 3 | assetTwo | | KO, Ins6

    On basis of StockInfo and qualifier BaseCurrencyID, we can retrieve all Stock relation detail from this table. On basis of AssetInfo family, assets related stocks, on basis of IntrumentInfo, related stock we can retrieve…

    Is my understanding correct, or I am so far very wrong.

    Here SELF-JOIn was used in relational schema… how do we achieve same JOIn in hbase, as above or some other way?

    What is nested entities and how do we create nested entities using “Hbase Shell” and using hbase java API. There is API to create family, column… is there API to create Nested Entities?

    There is one more table:

    Same StockID and localCurrencyID are populated using InstrumentID only. Data of this table is like:
    Listing
    ==============================
    ID | stockID | LocalCurrencyID | location
    ==============================
    L1| 22 | 51 | India
    L1| 22 | 52 | Japan
    L1| 22 | 40 | London
    L1| 23 | 57 | USA
    L1| 23 | 41 | Africa

    Here again self-join is used and many-many relationship…

    How do I integrate this table with above INstruments_Asset_Stocks table? I want to get All stocks details for given locaCurrency.

    What I did:
    INstruments_Asset_Stocks_Listing
    ===============================================
    rowKey – instrumentID_ListingID
    ===============================================
    Cf1: AssetINnfo (assetID, assetName)
    Cf2: InstrumentInfo(symbol, name, comment)
    cf3: StockInfo (X1, Y1, Z1, BaseCurrencyID, nested_enties_for_LocalCurrency )

    Nested entities for local currency would be having localCurrencyID as key and location_listingID as value.

    Is this change correct to get all Stocks for given localCurrency. As read somewhere, I have to use map\reduce to get this data from nested entities? is it?

    How do we implement nested entities? How(from which API or shell command) client scans, searched these entities? Can I search it like I do for column qualifier BaseCurrencyID?

    Your inputs would really help me in understanding this Hbase design.

    Regards,
    Rashmi

    Collapse
Viewing 2 replies - 1 through 2 (of 2 total)