Home Forums Hive / HCatalog Hive and ORDER BY

This topic contains 8 replies, has 3 voices, and was last updated by  Mark Robinson 11 months, 1 week ago.

  • Creator
    Topic
  • #32056

    I would very much appreciate some help as to what is the problem with the HQL I am using in Hive in the Sandbox.

    The following two queries work and return the correct results:

    select taccode, count(*)
    from tac_code
    group by taccode
    order by taccode;

    select substring(taccode, 1, 4), count(*)
    from tac_code
    group by substring(taccode, 1, 4);

    The following two queries do not work, with error: Error occurred executing hive query: Unknown exception.

    select substring(taccode, 1, 4), count(*)
    from tac_code
    group by taccode
    order by taccode;

    select substring(taccode, 1, 4), count(*)
    from tac_code
    group by substring(taccode, 1, 4)
    order by taccode;

    select substring(taccode, 1, 4), count(*)
    from tac_code
    group by substring(taccode, 1, 4)
    order by substring(taccode, 1, 4);

    As you can see there is not much difference between the queries but I can only assume that the substring and order by combined is related to the problem.

    Can anybody please help with this? Thank you!

Viewing 8 replies - 1 through 8 (of 8 total)

You must be logged in to reply to this topic.

  • Author
    Replies
  • #33055

    Hi Yi,

    That does work – but if I try and group by the subtaccode (which is the query I really want to run) it fails again with the unknown exception. But thank you for the help!

    I have got around the issue by using a subquery for the group by and then use order by in the main query.

    Collapse
    #32989

    Yi
    Member

    Hi Mark,

    Will this work for you:
    select substring(taccode, 1, 4) as subtaccode, count(*)
    from tac_code
    group by taccode
    order by subtaccode;

    Thanks,
    Yi

    Collapse
    #32807

    Thanks again for your help, Carter.

    I will use the subquery approach to solve the problem.

    Any advice you can give me on how to turn error messages on in the Sandbox for Hive so I get a sensible error rather than just the ‘Unknown exception’ would be much appreciated – I get the same error message for any problem with the query and it can take a long time to work out what is causing the problem (for example, forgetting to name the subquery gives the same error).

    Collapse
    #32762

    Carter Shanklin
    Participant

    So turns out my code was not correct. What I forgot is that Hive will group before applying the UDF. In a lot of SQL systems you can do it either way.

    So this works:
    select substring(taccode, 1, 4) as tsub, count(*)
    from tac_code
    group by taccode
    order by tsub;

    But may not be the result you’re looking for if you really want to group on the 4 character string which I’m guessing you do.

    To get the other sort of grouping you’ll need to use a nested select or a view like you ended up doing.

    select tsub, count(*)
    from
    ( select substring(taccode, 1, 4) as tsub from tac_code ) sub
    group by tsub
    order by tsub;

    This should give you the same results as your view.

    Oddly enough I got semanticerrors for the problems you reported but not “unknown error”.

    Collapse
    #32698

    Thank you Carter,

    The data is a CSV file – sample rows below (I am happy to share the entire dataset but don’t know how to attach it!):

    12024009,Apple,iPhone 3G
    12040009,Apple,iPhone 3G S
    01296400,Apple,iPhone 4
    01294300,Apple,iPhone 4S
    35574604,HTC,ChaCha
    85383304,HTC,Desire
    35798802,HTC,G2
    35681204,HTC,Desire HD
    35896704,HTC,Desire S
    35911603,HTC,Desire Z
    35467204,HTC,Flyer
    35495703,HTC,Google Nexus One
    35412003,HTC,HD2
    35954603,HTC,HTC Touch Diamond2
    35530104,HTC,Incredible S
    35831304,HTC,Sensation
    35644004,HTC,Sensation/Pyramid
    35902803,HTC,Wildfire
    35870104,HTC,Wildfire S

    So it’s formatted, and the table create statement is:

    CREATE TABLE TAC_CODE
    (
    TacCode STRING,
    Manufacturer STRING,
    Handset STRING
    );

    I have found a workaround – first I define a view:

    CREATE VIEW V_TAC_CODE
    AS
    SELECT *,
    SUBSTRING(TacCode, 1, 4) AS TacRoot
    FROM TAC_CODE;

    Then I can run the query below:

    select TacRoot, count(*)
    from TAC_CODE
    group by TacRoot,
    order by TacRoot;

    Not sure why that should work but it does! Obviously it’s not ideal as I have to change the view if I change the query

    Collapse
    #32670

    Carter Shanklin
    Participant

    Can you share your dataset or other info about how it is stored (e.g. format, SerDe, etc.)?

    Collapse
    #32599

    Thank you for the reply, Carter.

    Unfortunately your suggestion still terminates with the same error: Error occurred executing hive query: Unknown exception – though I did expect this as the following query also fails as per my original post:

    select substring(taccode, 1, 4), count(*)
    from tac_code
    group by substring(taccode, 1, 4)
    order by substring(taccode, 1, 4);

    Any further help would be very much appreciated! Thank you…

    Collapse
    #32573

    Carter Shanklin
    Participant

    You didn’t select taccode so the first query can’t work. Try

    select substring(taccode, 1, 4) as tsub, count(*)
    from tac_code
    group by tsub
    order by tsub;

    Collapse
Viewing 8 replies - 1 through 8 (of 8 total)