
Flume multiple collectors

  • #43981
    JuanFra Rodriguez
    Participant

    Hi:
    I’m wondering which nodes (roles) in my Hadoop cluster are most suitable for hosting Flume collectors.
    That is, are several dedicated nodes needed, or could the DataNodes take on this role as well?

    Thanks a lot!

    Regards,
    JuanFra.


  • #44084
    Robert Molina
    Moderator

    Hi JuanFra,
    What source are you trying to fetch from? Could you explain a bit more about your use case and what you plan to do?

    Regards,
    Robert

    #44093
    JuanFra Rodriguez
    Participant

    Ok, this is our use case:
    We are planning to set up a cluster with the following machines:
    – 1 Management Server
    – 1 NameNode Server
    – 1 ResourceManager
    – 20 DataNodes
    Flume agents will be running on 5 nodes, reading system logs.
    Our question is about the collector tier: where is it more suitable to run the collectors, on separate nodes or on the DataNodes?
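    For context, the two-tier layout we have in mind would run a lightweight agent on each of the 5 log nodes, forwarding over Avro to a collector tier that writes to HDFS. A rough sketch of the properties files (the host names, ports and paths below are placeholders, not our real ones):

        # agent tier: one of these on each of the 5 log-reading nodes
        agent.sources = syslog
        agent.channels = mem
        agent.sinks = fwd

        # tail the system log (path is a placeholder)
        agent.sources.syslog.type = exec
        agent.sources.syslog.command = tail -F /var/log/messages
        agent.sources.syslog.channels = mem

        agent.channels.mem.type = memory
        agent.channels.mem.capacity = 10000

        # forward events to a collector over Avro
        agent.sinks.fwd.type = avro
        agent.sinks.fwd.hostname = collector01.example.com
        agent.sinks.fwd.port = 4141
        agent.sinks.fwd.channel = mem

        # collector tier: aggregates the Avro streams and writes to HDFS
        collector.sources = avro-in
        collector.channels = fc
        collector.sinks = hdfs-out

        collector.sources.avro-in.type = avro
        collector.sources.avro-in.bind = 0.0.0.0
        collector.sources.avro-in.port = 4141
        collector.sources.avro-in.channels = fc

        # file channel for durability at the collector
        collector.channels.fc.type = file

        collector.sinks.hdfs-out.type = hdfs
        collector.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/flume/syslog/%Y-%m-%d
        collector.sinks.hdfs-out.hdfs.fileType = DataStream
        collector.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
        collector.sinks.hdfs-out.channel = fc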

    Thanks again!

    #44776
    Robert Molina
    Moderator

    Hi JuanFra,
    It all depends on how much data you are fetching and sending to Hadoop. If you have a test cluster up, I would suggest running the Flume agent on it and mimicking the amount of data you would be pushing, then watching the resources on the box. Use that data as a baseline; it will help you decide whether that particular Flume agent can coexist with other services on the machine while it is running jobs, or whether Flume needs dedicated machines.
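    If you want to mimic the load without replaying real logs, Flume ships a StressSource that generates synthetic events; a minimal sketch pointed at the collector under test follows, where the host name, payload size and event count are placeholders you would tune to your expected volume:

        # load generator pointed at the collector under test
        stress.sources = gen
        stress.channels = mem
        stress.sinks = out

        stress.sources.gen.type = org.apache.flume.source.StressSource
        # payload bytes per synthetic event
        stress.sources.gen.size = 500
        stress.sources.gen.maxTotalEvents = 1000000
        stress.sources.gen.channels = mem

        stress.channels.mem.type = memory
        stress.channels.mem.capacity = 10000

        # send the synthetic events to the collector over Avro
        stress.sinks.out.type = avro
        stress.sinks.out.hostname = collector01.example.com
        stress.sinks.out.port = 4141
        stress.sinks.out.channel = mem

    Launch it with the usual flume-ng command and watch CPU, memory, disk and network on the collector box while it runs:

        flume-ng agent --conf conf --conf-file stress.properties --name stress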

    I hope that helps.

    Regards,
    Robert

    #44814
    JuanFra Rodriguez
    Participant

    Got it Robert!
    As soon as I reach a conclusion, I’ll let you know.

    Thanks for your support!

    Regards,
    JuanFra.

