Sqoop Forum

sqoop import error

  • #13872
    Trang Nguyen
    Member

    When I try to run sqoop with a query clause, I always get the error:
    At minimum, you must specify –connect and –table

    sqoop import –connect jdbc:mysql://trangnlx.sea.openmarket.com:3306/wmpfe –verbose –table tMessageAchive -e “SELECT MessageId, ParentId, CarrierId, OPTIONS, TypeId, TicketId FROM tMessageArchive where TsUpdated > ‘2012-11-01′ and TsUpdated < '2012-11-02'" –username trangn -P –outdir sms_message_mysql_20121101 –hive-import –hive-table sms_message_20121102

    I've tried ommited and adding the "–table" option but it doesn't seem to make a difference. Anything that I'm doing wrong?

    Thanks,
    Trang

to create new topics or reply. | New User Registration

  • Author
    Replies
  • #13916
    tedr
    Member

    Hi Trang,

    Thanks for using HDP. We are looking into this issue, to help us better understand what is going on could you post the relevant bits of the hive and sqoop logs?

    Thanks,
    Ted.

    #13929
    Trang Nguyen
    Member

    Hi Ted,

    I was able to run the sqoop import if I structure the command as:
    sqoop import –connect “jdbc:mysql://$DB_HOST:3306/$SERVER_ID” –verbose –table tMessageArchive –columns “MessageId, ParentId, CarrierId, Options, TypeId, TicketId, SubscriberId, InterfaceId, AppTypeId, StateId, ReasonId, SourceTon, SourceAddr, DestTon, DestAddr, UdhId, NotifyId, NotifyStateId, NotifyReasonId, DcsId, Data, BillingId, BillingPrice, ChargeType, ChargePrice, UserAgent, UserIpAddr, TsSubmitted, TsScheduled, TsExpire, TsTransferred, TsDelivered, TsUpdated, ReferenceId, ProgramId, TsDbUpdatedUtc, message_purpose_id, ad_requirement_id, advice_of_charge_message, sms_state_id, sms_error_code, initiate_billing, tmobile_content_delivery_type_id, note, retry_of_ticket_id, carrier_response_code, currency_code, destination_address_country_id” –where “TsDbUpdatedUtc >= ‘2012-12-${from} 00:00:00′ and TsDbUpdatedUtc = ‘2012-12-01 00:00:00′ and TsDbUpdatedUtc < '2012-12-02 00:00:00'" –username trangn -P –outdir outdir/sms_message_test –hive-import –hive-table sms_message_mysql_test
    13/01/21 12:34:48 DEBUG tool.BaseSqoopTool: Enabled debug logging.
    Enter password:
    13/01/21 12:34:51 DEBUG sqoop.Sqoop: Must specify destination with –target-dir.
    Try –help for usage instructions.
    Must specify destination with –target-dir.

    If I specify the –target-dir (why is it needed?), I get a different error:
    When importing query results in parallel, you must specify –split-by.

    The original query did not require either one of these parameters so I'm not sure why it sqoop is asking for them.

    Thanks,
    Trang

    #13930
    Trang Nguyen
    Member

    Somehow the blog posting got clobbered.
    This query works:
    sqoop import –connect “jdbc:mysql://$DB_HOST:3306/$SERVER_ID” –verbose –table tMessageArchive –columns “MessageId, ParentId, CarrierId, Options, TypeId, TicketId, SubscriberId, InterfaceId, AppTypeId, StateId, ReasonId, SourceTon, SourceAddr, DestTon, DestAddr, UdhId, NotifyId, NotifyStateId, NotifyReasonId, DcsId, Data, BillingId, BillingPrice, ChargeType, ChargePrice, UserAgent, UserIpAddr, TsSubmitted, TsScheduled, TsExpire, TsTransferred, TsDelivered, TsUpdated, ReferenceId, ProgramId, TsDbUpdatedUtc, message_purpose_id, ad_requirement_id, advice_of_charge_message, sms_state_id, sms_error_code, initiate_billing, tmobile_content_delivery_type_id, note, retry_of_ticket_id, carrier_response_code, currency_code, destination_address_country_id” –where “TsDbUpdatedUtc >= ‘2012-12-${from} 00:00:00′ and TsDbUpdatedUtc = ‘2012-12-01 00:00:00′ and TsDbUpdatedUtc < '2012-12-02 00:00:00'" –username trangn -P –outdir outdir/sms_message_test –hive-import –hive-table sms_message_mysql_test –target-dir test

    Not sure why sqoop is not accepting the –query syntax and asking for additional parameters…

    Thanks,
    Trang

    #13931
    Trang Nguyen
    Member

    Somehow the posting gets clobbered when it’s stored. I’ll try to repost the successful import statement as two separate postings.

    Sucessfull:
    sqoop import –connect “jdbc:mysql://$DB_HOST:3306/$SERVER_ID” –verbose –table tMessageArchive –columns “MessageId, ParentId, CarrierId, Options, TypeId, TicketId, SubscriberId, InterfaceId, AppTypeId, StateId, ReasonId, SourceTon, SourceAddr, DestTon, DestAddr, UdhId, NotifyId, NotifyStateId, NotifyReasonId, DcsId, Data, BillingId, BillingPrice, ChargeType, ChargePrice, UserAgent, UserIpAddr, TsSubmitted, TsScheduled, TsExpire, TsTransferred, TsDelivered, TsUpdated, ReferenceId, ProgramId, TsDbUpdatedUtc, message_purpose_id, ad_requirement_id, advice_of_charge_message, sms_state_id, sms_error_code, initiate_billing, tmobile_content_delivery_type_id, note, retry_of_ticket_id, carrier_response_code, currency_code, destination_address_country_id” –where “TsDbUpdatedUtc >= ‘2012-12-${from} 00:00:00′ and TsDbUpdatedUtc < '2012-12-${to} 00:00:00'" –username trangn -P –outdir outdir/sms_message_${SERVER_ID}_2012_12_${from}_${to} –hive-import –hive-table sms_message_mysql_2012_12_${from}_${to}

    #13932
    Trang Nguyen
    Member

    Failing import:
    sqoop import –connect “jdbc:mysql://tdc-prd-rdb-044:3306/wmp44″ –verbose –query “select MessageId, ParentId, CarrierId, Options, TypeId, TicketId, SubscriberId, InterfaceId, AppTypeId, StateId, ReasonId, SourceTon, SourceAddr, DestTon, DestAddr, UdhId, NotifyId, NotifyStateId, NotifyReasonId, DcsId, Data, BillingId, BillingPrice, ChargeType, ChargePrice, UserAgent, UserIpAddr, TsSubmitted, TsScheduled, TsExpire, TsTransferred, TsDelivered, TsUpdated, ReferenceId, ProgramId, TsDbUpdatedUtc, message_purpose_id, ad_requirement_id, advice_of_charge_message, sms_state_id, sms_error_code, initiate_billing, tmobile_content_delivery_type_id, note, retry_of_ticket_id, carrier_response_code, currency_code, destination_address_country_id from tMessageArchive where TsDbUpdatedUtc >= ‘2012-12-01 00:00:00′ and TsDbUpdatedUtc < '2012-12-02 00:00:00'" –username trangn -P –outdir outdir/sms_message_test –hive-import –hive-table sms_message_mysql_test

    #13933
    Trang Nguyen
    Member

    On the failing import, sqoop is rejecting the syntax and asking for additional paratemeters (target-dir, split-by,…) which was not required in the successful import.

    Thanks,
    Trang

    #14500
    tedr
    Member

    Hi Trang,

    Here’s a couple of things I noticed when reading the Sqoop documentation about the freeform queries (using the –query):
    1) When importing a free-form query, you must specify a destination directory with –target-dir
    2) You must also select a splitting column with –split-by
    Also these docs mention that if you want to run this type of query in parallel (as in using more than one mapper) you must include the token $CONDITIONS (these conditions are inferred by sqoop and if you double quoter the query, as you did, you must escape the token like this \$CONDITIONS.

    So from this it looks like when using the –query format these specific parameters are required by the sqoop language definition. It’s not an issue, it is just the way sqoop was written.

    Thanks,
    Ted.

The topic ‘sqoop import error’ is closed to new replies.

Support from the Experts

A HDP Support Subscription connects you experts with deep experience running Apache Hadoop in production, at-scale on the most demanding workloads.

Enterprise Support »

Become HDP Certified

Real world training designed by the core architects of Hadoop. Scenario-based training courses are available in-classroom or online from anywhere in the world

Training »

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade having been built, tested and hardened with enterprise rigor.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower cost, higher capacity infrastructure.