Hortonworks Sandbox Forum

Twitter Flume Configuration

  • #20290
    Mur Raguthu
    Participant

    Hi Team, I installed flume-ng successfully, but when I try to stream Twitter feeds into HDFS I get the errors below. Can someone help me here?

    Thanks.
    *******************flume.conf************
    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS

    TwitterAgent.sources.Twitter.type = exec
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = values from twitter
    TwitterAgent.sources.Twitter.consumerSecret = values from twitter
    TwitterAgent.sources.Twitter.accessToken = values from twitter
    TwitterAgent.sources.Twitter.accessTokenSecret = values from twitter

    TwitterAgent.sources.Twitter.keywords = hadoop, big data

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.168.52.128:8020/user/flume
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100

    ***************flume.log*************
    02 Apr 2013 22:47:07,904 INFO [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133) – Reloading configuration file:conf/flume.conf
    02 Apr 2013 22:47:07,914 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,914 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:loggerSink
    02 Apr 2013 22:47:07,915 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,915 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930) – Added sinks: HDFS Agent: TwitterAgent
    02 Apr 2013 22:47:07,915 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,915 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,916 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,916 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,916 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,916 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:HDFS
    02 Apr 2013 22:47:07,917 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016) – Processing:loggerSink
    02 Apr 2013 22:47:07,937 WARN [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid:319) – Agent configuration for ‘agent’ does not contain any channels. Marking it as invalid.
    02 Apr 2013 22:47:07,939 WARN [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:127) – Agent configuration invalid for agent ‘agent’. It will be removed.
    02 Apr 2013 22:47:07,939 INFO [conf-file-poller-0] (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140) – Post-validation flume configuration contains configuration for agents: [TwitterAgent]
    02 Apr 2013 22:47:07,941 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150) – Creating channels
    02 Apr 2013 22:47:07,955 INFO [conf-file-poller-0] (org.apache.flume.channel.DefaultChannelFactory.create:40) – Creating instance of channel MemChannel type memory
    02 Apr 2013 22:47:07,960 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205) – Created channel MemChannel
    02 Apr 2013 22:47:07,962 INFO [conf-file-poller-0] (org.apache.flume.source.DefaultSourceFactory.create:39) – Creating instance of source Twitter, type exec
    02 Apr 2013 22:47:07,965 ERROR [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.loadSources:366) – Source Twitter has been removed due to an error during configuration
    java.lang.IllegalStateException: The parameter command must be specified
    at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
    at org.apache.flume.source.ExecSource.configure(ExecSource.java:215)

    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
    02 Apr 2013 22:47:07,967 INFO [conf-file-poller-0] (org.apache.flume.sink.DefaultSinkFactory.create:40) – Creating instance of sink: HDFS, type: hdfs
    02 Apr 2013 22:47:08,237 INFO [conf-file-poller-0] (org.apache.flume.sink.hdfs.HDFSEventSink.authenticate:492) – Hadoop Security enabled: false
    02 Apr 2013 22:47:08,240 INFO [conf-file-poller-0] (org.apache.flume.node.AbstractConfigurationProvider.getConfiguration:119) – Channel MemChannel connected to [HDFS]
    02 Apr 2013 22:47:08,246 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:138) – Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@148238f4 counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
    02 Apr 2013 22:47:08,260 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:145) – Starting Channel MemChannel
    02 Apr 2013 22:47:08,297 INFO [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:89) – Monitoried counter group for type: CHANNEL, name: MemChannel, registered successfully.
    02 Apr 2013 22:47:08,298 INFO [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:73) – Component type: CHANNEL, name: MemChannel started
    02 Apr 2013 22:47:08,298 INFO [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:173) – Starting Sink HDFS
    02 Apr 2013 22:47:08,301 INFO [lifecycleSupervisor-1-2] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:89) – Monitoried counter group for type: SINK, name: HDFS, registered successfully.
    02 Apr 2013 22:47:08,301 INFO [lifecycleSupervisor-1-2] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:73) – Component type: SINK, name: HDFS started

  • #20424
    Robert
    Participant

    Hi Mur,
    How did you install Flume? Was it from the HDP repo? The Sandbox does not have Flume installed by default. Also, where did you get the Twitter source? I believe the HDP Flume distribution does not include it.

    Regards,
    Robert

    #20426
    Robert
    Participant

    Hi Mur,
    One thing I see is that this line is incorrect:
    TwitterAgent.sources.Twitter.type = exec

    Look at the Twitter source's documentation to find the type that needs to be specified here. The exec type loads Flume's built-in Exec source, not the Twitter source you intended.
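    For reference, the "IllegalStateException: The parameter command must be specified" in the log is thrown by Flume's ExecSource, which runs the shell command given in its `command` property and has no use for the Twitter credentials. The `keywords` property in this config appears to match the custom TwitterSource from Cloudera's cdh-twitter-example project, so a corrected source section might look like the sketch below. It assumes that project's flume-sources jar, which provides `com.cloudera.flume.source.TwitterSource`, is on the Flume classpath; the angle-bracket values are placeholders for your own app's keys.

    ```properties
    # Assumption: Cloudera's cdh-twitter-example flume-sources jar is on the classpath.
    # The type must name that jar's TwitterSource class instead of "exec".
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    # Placeholders: replace with the keys generated for your app at dev.twitter.com
    TwitterAgent.sources.Twitter.consumerKey = <your consumer key>
    TwitterAgent.sources.Twitter.consumerSecret = <your consumer secret>
    TwitterAgent.sources.Twitter.accessToken = <your access token>
    TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>
    # Comma-separated terms the custom source filters the stream on
    TwitterAgent.sources.Twitter.keywords = hadoop, big data
    ```

    One way to put the jar on the classpath is the FLUME_CLASSPATH variable in conf/flume-env.sh. Newer Apache Flume releases also ship an experimental org.apache.flume.source.twitter.TwitterSource, but check its documented properties before reusing this config with it.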

    Hope that helps.

    Regards,
    Robert

    #25460

    TwitterAgent.sources.Twitter.type = exec
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = values from twitter
    TwitterAgent.sources.Twitter.consumerSecret = values from twitter
    TwitterAgent.sources.Twitter.accessToken = values from twitter
    TwitterAgent.sources.Twitter.accessTokenSecret = values from twitter
    Try getting the consumerKey, consumerSecret, accessToken and accessTokenSecret values from dev.twitter.com, put them into those lines and try again. It should work; I did the same.

    #25461

    Did you copy those values correctly from your Twitter app configuration? Please double-check them.

    #25550
    Mur Raguthu
    Participant

    Thanks, Sampath. I'll check it out and let you know.
