The legacy Hortonworks Forum is now closed. You can view a read-only version of the former site by clicking here. The site will be taken offline on January 31, 2016.

Sqoop Forum

Import Twitter data via Flume into HDP1.3/2 sandbox

  • #44001
    Manuel
    Participant

    Hello!

    For an internal presentation I want to demo tutorial #13, “How To Refine and Visualize Sentiment Data”, live.
    I would also like to show the import of the Twitter data.
    Setting up the conf file from scratch is still beyond my knowledge, so I searched for tutorials on this.
    Unfortunately, so far I haven’t found anything specific that covers all three topics: Twitter, Flume and Hortonworks (preferably the HDP2 sandbox).

    I would be very happy if someone could point me to a source on this.

    Thanks!

  • #44094
    #44477
    Manuel
    Participant

    Hi Dave,

    many thanks for your feedback!

    I managed to get as far as the following output on the HDP 2 sandbox (VirtualBox).
    My OS is Win7 64bit.

    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 250 milliseconds
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 500 milliseconds
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 1000 milliseconds
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: Waiting for 2000 milliseconds
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: Waiting for 4000 milliseconds
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: Waiting for 8000 milliseconds
    13/11/27 03:38:43 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:39:03 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:39:03 INFO twitter4j.TwitterStreamImpl: Waiting for 16000 milliseconds
    13/11/27 03:39:19 INFO twitter4j.TwitterStreamImpl: Establishing connection.

    Any hint on how to solve this?

    Many thanks for your support!

    Best,

    Manuel

    #44775
    Robert Molina
    Keymaster

    Hi Manuel,
    Can you clarify what is occurring? Are you getting an error? I would probably start a new thread and provide an explanation of the problem you are having.

    Regards,
    Robert

    #46286
    Rahul Dhond
    Participant

    Hi Manuel,
    Were you able to import the Twitter data as well? I would be very interested in knowing how to do it.

    Most of the examples on the net are for the CDH3 (Cloudera) version, but I haven’t found any for HDP.

    I have tried to run the agent after making the relevant changes, but it stalls 2 to 3 seconds into the run.

    Could you or anyone else please let me know? Thanks.

    regards,
    Rahul

    #46291
    Rahul Dhond
    Participant

    I am getting an “HDFS IO error java.io.exception callable timed out after 10000 ms”. Any idea how this could be fixed? Thanks.
    regards,
    Rahul
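
    The 10000 ms in that message matches the default value of the Flume HDFS sink's hdfs.callTimeout setting. A minimal sketch of raising it is below. It assumes the agent name TwitterAgent and a sink named HDFS, as in the flume.conf line quoted further down this thread, and the value 60000 is purely illustrative. Note that in this thread the problem turned out to be the NameNode hostname in hdfs.path, so a longer timeout may only hide an unreachable HDFS address rather than fix it.

    ```properties
    # Assumption: agent "TwitterAgent" with an HDFS sink named "HDFS".
    # hdfs.callTimeout is the time in ms allowed for HDFS operations
    # (open, write, flush, close); the Flume default is 10000.
    TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000
    ```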

    #46306
    Rahul Dhond
    Participant

    I got the tweets flowing on my screen after I changed the flume.conf parameters (batchSize, rollCount, capacity, transactionCapacity) to match the ones in this link: http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html

    However, no files were created for the raw tweets, and I am wondering why. Here is what I have in flume.conf:
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/

    The directory /tweets is owned by hue (owner) : hdfs (group). My TwitterAgent runs as “root”. Do I need to change anything there?

    thanks & regards,
    Rahul Dhond

    #46308
    Rahul Dhond
    Participant

    Found the solution. I had to replace “localhost” in the flume.conf file with “sandbox”. It works fine now. Thanks.
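
    For anyone assembling the file from scratch, here is a rough sketch of what a complete flume.conf along these lines might look like. Only the agent name, the sink name, the hdfs.path (with “sandbox” in place of “localhost”) and the four tuned parameter names come from this thread. Everything else is an assumption: the source and channel names (Twitter, MemChannel), the source class (taken from the widely used Cloudera Twitter example, whose jar must be on Flume's classpath), the keywords, and all numeric values, which are illustrative and not the settings from the linked article.

    ```properties
    # Sketch only. Names and values marked "assumed" are not from the thread.
    TwitterAgent.sources  = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks    = HDFS

    # Assumed: custom source class from the Cloudera Twitter example.
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    # Fill in your own Twitter application credentials.
    TwitterAgent.sources.Twitter.consumerKey = <your consumer key>
    TwitterAgent.sources.Twitter.consumerSecret = <your consumer secret>
    TwitterAgent.sources.Twitter.accessToken = <your access token>
    TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>
    # Assumed example keywords.
    TwitterAgent.sources.Twitter.keywords = hadoop, big data

    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.channel = MemChannel
    # From the thread: use the sandbox hostname, not localhost.
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://sandbox:8020/user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    # Illustrative values for the parameters the thread says were tuned.
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100
    ```

    Two constraints are worth keeping in mind when adjusting the numbers: the channel's transactionCapacity must not exceed its capacity, and the sink's hdfs.batchSize should not exceed the channel's transactionCapacity, since the sink takes each batch from the channel in a single transaction.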

    #46474
    Koelli Mungee
    Moderator

    Thanks for the information!

