Import Twitter data via Flume into HDP1.3/2 sandbox

  • #44001
    Manuel
    Participant

    Hello!

    For an internal presentation I want to give a live demo of tutorial #13, “How To Refine and Visualize Sentiment Data”.
    I would also like to show the import of the Twitter data.
    Setting up the conf file from scratch is still beyond my knowledge, so I searched for tutorials on this.
    Unfortunately, so far I haven’t found anything specific that covers all three topics: Twitter, Flume and Hortonworks (preferably sandbox HDP2).

    I would be very happy if someone could point me to a source on this.

    Thanks!

  • #44094
    #44477
    Manuel
    Participant

    Hi Dave,

    many thanks for your feedback!

    I managed to get the following output on the HDP 2 sandbox (VirtualBox).
    My OS is Win7 64bit.

    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 250 milliseconds
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 500 milliseconds
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 1000 milliseconds
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: Waiting for 2000 milliseconds
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: Waiting for 4000 milliseconds
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: Waiting for 8000 milliseconds
    13/11/27 03:38:43 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:39:03 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:39:03 INFO twitter4j.TwitterStreamImpl: Waiting for 16000 milliseconds
    13/11/27 03:39:19 INFO twitter4j.TwitterStreamImpl: Establishing connection.

    Any hints on how to solve this?

    Many thanks for your support!

    Best,

    Manuel

    #44775
    Robert Molina
    Moderator

    Hi Manuel,
    Can you clarify what is occurring? Are you getting an error? I would probably start a new thread and provide an explanation of the problem you are having.

    Regards,
    Robert

    #46286
    Rahul Dhond
    Participant

    Hi Manuel,
    Were you able to import the Twitter data as well? I would be very interested to know how.

    Most of the examples on the net are for the CDH3 (Cloudera) version, but I haven’t found any for HDP.

    I have tried to run the agent after making the relevant changes, but it stalls 2 to 3 seconds into the run.

    Could you or anyone else please let me know? Thanks.

    regards,
    Rahul

    #46291
    Rahul Dhond
    Participant

    I am getting an “HDFS IO error java.io.exception callable timed out after 10000 ms” error. Any idea how this could be fixed? Thanks.
    regards,
    Rahul

    #46306
    Rahul Dhond
    Participant

    I got the tweets flowing on my screen after I changed the flume.conf parameters (batchSize, rollCount, capacity, transactionCapacity) to match the ones in this link: http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html

    However, no files were created for the raw tweets, and I am wondering why. Here is what I have in flume.conf:
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/

    The /tweets directory is owned by hue (owner) and hdfs (group). My TwitterAgent runs as “root”. Do I need to change anything there?

    thanks & regards,
    Rahul Dhond

    #46308
    Rahul Dhond
    Participant

    Found the solution. I had to replace “localhost” in the flume.conf file with “sandbox”. It works fine now. Thanks.
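
    For reference, the fix described above can be sketched as the following flume.conf fragment. This is a minimal sketch and not the poster's actual file: the agent name TwitterAgent, the sink name HDFS, the port and the path come from the thread, while the channel name MemChannel and all numeric values are illustrative assumptions (they are not the values from the linked article). The source definition is omitted, so the fragment is not a complete agent configuration on its own.

    ```properties
    # Sketch only: MemChannel and every numeric value below are assumptions.
    TwitterAgent.sinks = HDFS
    TwitterAgent.channels = MemChannel

    # HDFS sink: point at the sandbox hostname instead of localhost
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://sandbox:8020/user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    # Events written per flush; kept no larger than the channel's transactionCapacity
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
    # Roll files by event count only (0 disables size-based rolling)
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

    # Memory channel buffering events between the source and the sink
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000
    ```

    The only change the thread confirms as necessary is the hostname in hdfs.path; the remaining values would need to be tuned for the actual setup.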

    #46474
    Koelli Mungee
    Moderator

    Thanks for the information!
