Import Twitter data via Flume into HDP1.3/2 sandbox

This topic contains 8 replies, has 5 voices, and was last updated by Koelli Mungee 6 months, 2 weeks ago.

    #44001

    Manuel
    Member

    Hello!

    For an internal presentation I want to show tutorial #13, “How To Refine and Visualize Sentiment Data”, live.
    I would also like to show the import of the Twitter data.
    Setting up the conf file from scratch is still beyond my knowledge, so I searched for tutorials on this.
    Unfortunately, so far I haven’t found anything specific that covers all three topics: Twitter, Flume and Hortonworks (preferably the HDP2 sandbox).

    I would be very happy if someone could point me to a source on this.

    Thanks!

Viewing 8 replies - 1 through 8 (of 8 total)

The topic ‘Import Twitter data via Flume into HDP1.3/2 sandbox’ is closed to new replies.

    #46474

    Koelli Mungee
    Moderator

    Thanks for the information!

    #46308

    Rahul Dhond
    Participant

    Found the solution. I had to replace “localhost” in the flume.conf file with “sandbox”. It works fine now. Thanks.
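
    For reference, a minimal sketch of that change, assuming the sink is configured as in the hdfs.path line quoted in post #46306 (same agent name, sink name, port and path; only the host differs):

```properties
# Before: the host "localhost" did not work for the HDFS sink
# TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/

# After: use the sandbox host name instead
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://sandbox:8020/user/flume/tweets/
```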

    #46306

    Rahul Dhond
    Participant

    I could get the tweets flowing on my screen after I changed the flume.conf parameters (batchSize, rollCount, capacity, transactionCapacity) to match the ones in this link: http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html

    However, I didn’t get any files created for the raw tweets, and I am wondering why. Here is what I have in flume.conf:
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/

    The directory /tweets is owned by hue (owner) and hdfs (group). My TwitterAgent runs as “root”. Do I need to change anything there?

    thanks & regards,
    Rahul Dhond
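
    As a rough sketch, the four parameters mentioned above correspond to the following flume.conf keys. The channel name MemChannel is an assumption (the thread only shows the sink name HDFS), and the values are illustrative placeholders, not the ones from the linked article.

```properties
# Memory channel (the name "MemChannel" is hypothetical)
TwitterAgent.channels.MemChannel.type = memory
# Maximum number of events held in the channel
TwitterAgent.channels.MemChannel.capacity = 10000
# Maximum number of events per transaction; must not exceed capacity
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

# HDFS sink
# Events written per flush to HDFS; keep this at or below transactionCapacity
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
# Roll to a new file after this many events (0 disables count-based rolling)
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
```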

    #46291

    Rahul Dhond
    Participant

    I am getting an “HDFS IO error java.io.IOException: Callable timed out after 10000 ms”. Any idea how this could be fixed? Thanks.
    regards,
    Rahul
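
    The 10000 ms in this message matches the default of the HDFS sink's hdfs.callTimeout setting. A hedged sketch of raising it follows; the value 60000 is an arbitrary illustration. A longer timeout only helps when HDFS is reachable but slow, not when the host in hdfs.path is wrong (in this thread the fix turned out to be the host name, see #46308).

```properties
# Time in milliseconds allowed for HDFS operations (open, write, flush, close)
# Default is 10000; 60000 is an illustrative value only
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000
```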

    #46286

    Rahul Dhond
    Participant

    Hi Manuel,
    Were you able to import the Twitter data as well? I would be very interested in knowing how.

    Most of the examples on the net are for the CDH3 (Cloudera) version, but I haven’t found any for HDP.

    I have tried to run the agent after making the relevant changes, but it stalls 2 to 3 seconds into the run.

    Could you or anyone else please let me know? Thanks.

    regards,
    Rahul

    #44775

    Robert Molina
    Moderator

    Hi Manuel,
    Can you clarify what is occurring? Are you getting an error? I would probably start a new thread and provide an explanation of the problem you are having.

    Regards,
    Robert

    #44477

    Manuel
    Member

    Hi Dave,

    many thanks for your feedback!

    I managed to get the following output on the HDP 2 sandbox (VirtualBox).
    My OS is Windows 7 64-bit.

    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 250 milliseconds
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 500 milliseconds
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:28 INFO twitter4j.TwitterStreamImpl: Waiting for 1000 milliseconds
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:29 INFO twitter4j.TwitterStreamImpl: Waiting for 2000 milliseconds
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:31 INFO twitter4j.TwitterStreamImpl: Waiting for 4000 milliseconds
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:38:35 INFO twitter4j.TwitterStreamImpl: Waiting for 8000 milliseconds
    13/11/27 03:38:43 INFO twitter4j.TwitterStreamImpl: Establishing connection.
    13/11/27 03:39:03 INFO twitter4j.TwitterStreamImpl: stream.twitter.com
    13/11/27 03:39:03 INFO twitter4j.TwitterStreamImpl: Waiting for 16000 milliseconds
    13/11/27 03:39:19 INFO twitter4j.TwitterStreamImpl: Establishing connection.

    Any hints on how to solve this?

    Many thanks for your support!

    Best,

    Manuel

    #44094