The legacy Hortonworks Forum is now closed. You can view a read-only version of the former site by clicking here. The site will be taken offline on January 31, 2016.

Hortonworks Sandbox Forum

select * from nyse_stocks where stock_symbol="IBM" fails

  • #16269
    Brian Feeny

    I am going through tutorial 1, and the query select * from nyse_stocks where stock_symbol="IBM" returns no rows. If I view the file via the File Browser, I can see that all the data is there. But when I view the nyse_stocks table in HCatalog, it only contains stocks that begin with the letter A. So it's as though creating the table from the file produced a truncated table. All I did was enter the table name and description and select the file (which I verified has all of the stocks in it). On the next page I just accepted the defaults (tab delimiter), and on the final page for HCatalog I set my columns to the appropriate data types. My "describe nyse_stocks" output matches the tutorial exactly.

    Does anyone have any idea what may be causing this strange behavior? I have dropped the table in HCatalog and tried to re-create it, but I get the same results. I even dropped the file and re-uploaded it. Same result.

  • #16279
    Brian Feeny

    I just reinstalled and tested this again. Same outcome, so I think there is an issue with the latest Sandbox build. I am using the Fusion version. It appears that perhaps the file is partitioned into multiple parts and HCatalog is only importing/using one of those parts (stock symbols that begin with "A").

    The file itself is correct, but when it is moved into HCatalog it's truncated, so the select on stock_symbol "IBM" returns nothing. And of course SELECT count(*) returns fewer rows than the file contains.
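One way to put numbers on "fewer rows than the file contains" is to count the rows in the local file and compare them with what Hive reports for the table. The following is only a minimal sketch: it assumes the file is tab-delimited (as in the post above), that its first line is a header row, and that the header names the symbol column stock_symbol. The function name count_rows_by_symbol is made up for this example.

```python
import csv
import sys
from collections import Counter


def count_rows_by_symbol(path, column="stock_symbol"):
    """Count data rows per symbol in a tab-delimited file.

    Assumption: the first line is a header row containing `column`.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            counts[row[column]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = count_rows_by_symbol(sys.argv[1])
    # Compare these figures with SELECT count(*) and the IBM query in Hive.
    print("total rows:", sum(counts.values()))
    print("IBM rows:", counts.get("IBM", 0))
    # If the table only holds symbols starting with "A" while this prints
    # many more letters, the import was truncated.
    first_letters = sorted({symbol[0] for symbol in counts if symbol})
    print("first letters present:", "".join(first_letters))
```

If the local totals are much larger than the table's count(*), the data was lost during upload or import rather than in the source file.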

    Yi Zhang

    Hi Brian,

    Can you browse into /apps/hive/warehouse/nyse_stocks and see if it has the complete data? There should be many blocks there.

    If the data is not complete there, can you try another upload and watch the logs in /var/log/webhcat? Any clue there?



    Brian Feeny

    OK, so here is the update. If I download the NYSE stock data and have Safari automatically unzip it (with Preferences->General->Open "safe" files after downloading checked), then the situation is as I explained.

    If I have Safari not automatically decompress the file, then it works fine! This is concerning, because it should not matter, should it?

    As a test, I uploaded the file uncompressed and called the file and table "test", and then I uploaded the file compressed and called it "nyse_stocks". You can see the difference: the test file is 1048446 in size and the nyse_stocks file is 44005832. Someone from Hortonworks may wish to replicate this; just have the file decompressed before uploading. It's almost as if Hive clips the data after a certain size, so if you upload it compressed you're good, but if not, you will lose data. It doesn't make sense to me, but what I can tell you is that this is repeatable. All you have to do is have Safari set to automatically decompress files on download, and then upload the decompressed stock data.
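Before uploading, it can help to confirm which form of the file is actually on disk, since a browser may have decompressed it silently. This is a rough sketch rather than anything from the Sandbox itself: it checks for the two-byte gzip signature (0x1f 0x8b) and reports the on-disk size and line count either way. The helper names is_gzip and describe_file are hypothetical.

```python
import gzip
import os
import sys

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def describe_file(path):
    """Report whether the file is gzip-compressed, its size on disk,
    and how many lines it holds once decompressed (if needed)."""
    compressed = is_gzip(path)
    opener = gzip.open if compressed else open
    with opener(path, "rb") as handle:
        lines = sum(1 for _ in handle)
    return {
        "gzip": compressed,
        "bytes_on_disk": os.path.getsize(path),
        "lines": lines,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe_file(sys.argv[1]))
```

The line count gives a baseline to compare against the table after import, whichever form of the file ends up being uploaded.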

    Yi Zhang

    Hi Brian,

    This is a known issue stated here:

    The Hortonworks Sandbox is built on the Hortonworks Data Platform 1.2. However, excluded from this are:

    Third party tools and downloads (like Talend)
    The Hortonworks Management Console (Apache Ambari)
    Data sets uncompressed by Safari from .gz extension to .tsv extensions may not fully import. To solve this issue, using Safari on a Mac, please ensure that the following configuration is set in Preferences: General->uncheck “Open “safe” files after downloading”.



    Brian Feeny

    Thanks for letting me know, I must have skipped past reading that in my excitement to get started. I appreciate all the help, and things are working well now!

    Bill Blaney

    FWIW, the problem will also occur if you unzip the file in Windows and upload the unzipped version.


    Hi Brian,

    Thanks for the information.

