Ganglia on IBM Power systems: Best Practices

On this page I describe my view of how Ganglia can best be used on IBM POWER5/6/7 systems.

Some things you should consider before you start:

  • Hostnames
    • To Ganglia, a new hostname is a new machine
    • Ganglia has to resolve IP addresses to hostnames, so please use DNS
  • Stable IP addresses
    • Make sure you are not going to change IP addresses
  • Time and date
    • Make sure the time zone, time and date are consistent on all machines in a cluster
    • Use of NTP is highly recommended
  • In short
    • these are normal requirements on production machines
    • for prototype and test systems, get this right before starting Ganglia
  • Read the simple Ganglia How-To available for people setting up their first Ganglia system at:

Preferred setup

  • Define each System p machine with all its LPARs as a separate cluster
  • Use Unicast for network communication
  • Define at least two LPARs per System p machine as gmond hosts for gmetad
    • One would be sufficient; however, two are better for high availability
  • Define those two LPARs in <confdir>/gmetad.conf as the information brokers for that machine
  • Do not let gmetad poll the gmond hosts more often than every 15 seconds
  • Decide up front which sampling intervals to use ("RRAs" stanza in <confdir>/gmetad.conf); see below
  • Use my extensions for
    • Ethernet adapters (including Shared Ethernet Adapters)
    • Fibre Channel adapters
    • Web interface
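The cluster-per-machine layout above maps onto one `data_source` line per System p machine in <confdir>/gmetad.conf. The following is a minimal Python sketch, assuming the standard `data_source "name" [polling interval] host:port ...` syntax; the machine names and LPAR hostnames are hypothetical placeholders, and `data_source_line` is an illustrative helper, not part of Ganglia.

```python
import warnings
from typing import Dict, List

GMOND_PORT = 8649        # default gmond tcp_accept_channel port
MIN_POLL_SECONDS = 15    # do not poll the gmond hosts more often than this


def data_source_line(cluster: str, brokers: List[str],
                     poll_seconds: int = MIN_POLL_SECONDS,
                     port: int = GMOND_PORT) -> str:
    """Build one gmetad.conf data_source line for one System p machine."""
    if not brokers:
        raise ValueError(f"{cluster}: at least one gmond host is required")
    if len(brokers) < 2:
        # One host works, but there is nothing to fall back on if it is down.
        warnings.warn(f"{cluster}: only one gmond host; two are recommended")
    if poll_seconds < MIN_POLL_SECONDS:
        raise ValueError(f"{cluster}: polling interval {poll_seconds} s is "
                         f"below {MIN_POLL_SECONDS} s")
    addresses = " ".join(f"{host}:{port}" for host in brokers)
    return f'data_source "{cluster}" {poll_seconds} {addresses}'


# Hypothetical inventory: one cluster per machine, two broker LPARs each.
MACHINES: Dict[str, List[str]] = {
    "p570-A": ["lpar-a1.example.com", "lpar-a2.example.com"],
    "p750-B": ["lpar-b1.example.com", "lpar-b2.example.com"],
}

if __name__ == "__main__":
    for name, lpars in MACHINES.items():
        print(data_source_line(name, lpars))
```

gmetad tries the listed addresses in order, so the second LPAR only matters when the first one does not answer.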

Ganglia sampling intervals

Important to know:

  • The sampling interval is defined in <confdir>/gmetad.conf.
  • The "RRAs" stanza is used to define custom settings.
  • The sampling settings are global.
  • If no "RRAs" stanza is defined, a default configuration is used.
  • For historic reasons, the step count in each RRA is given in units of 15 seconds.

Example: Default settings in Ganglia

     RRAs "RRA:AVERAGE:0.5:1:240"    \
          "RRA:AVERAGE:0.5:24:240"   \
          "RRA:AVERAGE:0.5:168:240"  \
          "RRA:AVERAGE:0.5:672:240"  \
          "RRA:AVERAGE:0.5:5760:370"

Translation:

  • Take 240 samples at 1 × 15 seconds intervals (used for display of hour)
  • Take 240 samples at 24 × 15 seconds (= 6 minutes) intervals (used for display of day)
  • Take 240 samples at 168 × 15 seconds (= 42 minutes) intervals (used for display of week)
  • Take 240 samples at 672 × 15 seconds (= 168 minutes) intervals (used for display of month)
  • Take 370 samples at 5760 × 15 seconds (= 24 hours) intervals (used for display of year)
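To check an RRA definition without doing the arithmetic by hand, a small Python sketch can decode it. It assumes the RRA:CF:xff:steps:rows field order used in these examples and the 15-second step unit described on this page; `RraSpec` and `parse_rra` are illustrative names.

```python
from dataclasses import dataclass

BASE_STEP_SECONDS = 15  # step unit assumed throughout this page


@dataclass(frozen=True)
class RraSpec:
    cf: str      # consolidation function, e.g. "AVERAGE"
    xff: float   # fraction of unknown inputs tolerated per stored sample
    steps: int   # base steps consolidated into one stored sample
    rows: int    # number of stored samples

    @property
    def interval_seconds(self) -> int:
        return self.steps * BASE_STEP_SECONDS

    @property
    def span_seconds(self) -> int:
        return self.interval_seconds * self.rows


def parse_rra(text: str) -> RraSpec:
    """Decode a definition such as "RRA:AVERAGE:0.5:24:240"."""
    parts = text.split(":")
    if len(parts) != 5 or parts[0] != "RRA":
        raise ValueError(f"not an RRA definition: {text!r}")
    _, cf, xff, steps, rows = parts
    return RraSpec(cf, float(xff), int(steps), int(rows))


DEFAULT_RRAS = (
    "RRA:AVERAGE:0.5:1:240",
    "RRA:AVERAGE:0.5:24:240",
    "RRA:AVERAGE:0.5:168:240",
    "RRA:AVERAGE:0.5:672:240",
    "RRA:AVERAGE:0.5:5760:370",
)

if __name__ == "__main__":
    for spec in map(parse_rra, DEFAULT_RRAS):
        print(f"{spec.interval_seconds:>6} s per sample, "
              f"{spec.span_seconds / 3600:g} hours kept")
```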

Example: 1-minute sampling for one year

     RRAs "RRA:AVERAGE:0.5:4:525600"

Translation:

  • Take 525600 samples at 4 × 15 seconds (= 1 minute) intervals
    • 525600 = 60 (samples/hour) × 24 (hours) × 365 (days) × 1 (year)

Example: 1-minute sampling for 6 months, 5-minute sampling for 2 years

     RRAs "RRA:AVERAGE:0.5:4:259200"  \
          "RRA:AVERAGE:0.5:20:210240"

Translation:

  • Take 259200 samples at 4 × 15 seconds (= 1 minute) intervals
    • 259200 = 60 (samples/hour) × 24 (hours) × 30 (days) × 6 (months)
  • Take 210240 samples at 20 × 15 seconds (= 5 minutes) intervals
    • 210240 = 12 (samples/hour) × 24 (hours) × 365 (days) × 2 (years)

Example: 15-second sampling for 1 day, 1-minute sampling for 2 months, 10-minute sampling for 1 year

     RRAs "RRA:AVERAGE:0.5:1:5760"   \
          "RRA:AVERAGE:0.5:4:86400"  \
          "RRA:AVERAGE:0.5:40:52560"

Translation:

  • Take 5760 samples at 1 × 15 seconds intervals
    • 5760 = 4 (samples/minute) × 60 (minutes/hour) × 24 (hours)
  • Take 86400 samples at 4 × 15 seconds (= 1 minute) intervals
    • 86400 = 60 (samples/hour) × 24 (hours) × 30 (days) × 2 (months)
  • Take 52560 samples at 40 × 15 seconds (= 10 minutes) intervals
    • 52560 = 6 (samples/hour) × 24 (hours) × 365 (days) × 1 (year)

Example: 1-minute sampling for 2 months, 5-minute sampling for 6 months, 15-minute sampling for 3 years

     RRAs "RRA:AVERAGE:0.5:4:86400"   \
          "RRA:AVERAGE:0.5:20:51840"  \
          "RRA:AVERAGE:0.5:60:105120"

Translation:

  • Take 86400 samples at 4 × 15 seconds (= 1 minute) intervals
    • 86400 = 60 (samples/hour) × 24 (hours) × 30 (days) × 2 (months)
  • Take 51840 samples at 20 × 15 seconds (= 5 minutes) intervals
    • 51840 = 12 (samples/hour) × 24 (hours) × 30 (days) × 6 (months)
  • Take 105120 samples at 60 × 15 seconds (= 15 minutes) intervals
    • 105120 = 4 (samples/hour) × 24 (hours) × 365 (days) × 3 (years)
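The same arithmetic can be run in the other direction: start from the desired resolution and retention and derive the steps and rows. This is a minimal Python sketch, assuming the 15-second step unit and the 30-day month used in the examples above; `make_rra` and `rras_stanza` are hypothetical helpers, not Ganglia tools.

```python
DAY = 86400
MONTH = 30 * DAY          # the examples above count a month as 30 days
YEAR = 365 * DAY
BASE_STEP_SECONDS = 15    # step unit assumed throughout this page


def make_rra(resolution_s: int, retention_s: int,
             cf: str = "AVERAGE", xff: float = 0.5) -> str:
    """Return an RRA keeping retention_s seconds at resolution_s granularity."""
    if resolution_s % BASE_STEP_SECONDS:
        raise ValueError("resolution must be a multiple of 15 seconds")
    if retention_s % resolution_s:
        raise ValueError("retention must be a whole number of samples")
    steps = resolution_s // BASE_STEP_SECONDS
    rows = retention_s // resolution_s
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


def rras_stanza(*rras: str) -> str:
    """Format an RRAs stanza, one quoted RRA per backslash-continued line."""
    return "RRAs " + " \\\n     ".join(f'"{r}"' for r in rras)


if __name__ == "__main__":
    # 1-minute for 2 months, 5-minute for 6 months, 15-minute for 3 years
    print(rras_stanza(make_rra(60, 2 * MONTH),
                      make_rra(5 * 60, 6 * MONTH),
                      make_rra(15 * 60, 3 * YEAR)))
```

Existing RRD files keep the archives they were created with, so a changed stanza generally only takes effect for newly created files; that is one more reason to settle the intervals before collecting data.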

Default ports

Ganglia by default uses the following ports:

  • 8649
    • The port gmond uses for
      • Sending to other gmonds via UDP (udp_send_channel in <confdir>/gmond.conf)
      • Receiving from other gmonds via UDP (udp_receive_channel in <confdir>/gmond.conf)
      • Sending an XML description of the state of the cluster to TCP clients such as gmetad (tcp_accept_channel in <confdir>/gmond.conf)
  • 8651
    • The port on which gmetad answers requests for XML (xml_port in <confdir>/gmetad.conf).
  • 8652
    • The port on which gmetad answers interactive queries for XML (interactive_port in <confdir>/gmetad.conf).
    • This facility allows simple subtree and summation views of the XML tree.
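Both gmond on port 8649 and gmetad on port 8651 write their complete XML to any TCP client and then close the connection, which makes it easy to check what a broker LPAR actually reports. A minimal standard-library Python sketch follows; `fetch_xml` and `metric_values` are illustrative names, and any host name you pass in is your own assumption.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Dict


def fetch_xml(host: str, port: int = 8649, timeout: float = 10.0) -> bytes:
    """Read the whole XML dump sent by gmond (8649) or gmetad (8651)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:          # the daemon closes the socket when done
                break
            chunks.append(data)
    return b"".join(chunks)


def metric_values(xml_bytes: bytes, metric: str) -> Dict[str, str]:
    """Map every HOST NAME to the VAL of the given METRIC, if reported."""
    root = ET.fromstring(xml_bytes)
    values: Dict[str, str] = {}
    for host in root.iter("HOST"):   # also finds hosts nested in GRID/CLUSTER
        element = host.find(f"METRIC[@NAME='{metric}']")
        if element is not None:
            values[host.get("NAME")] = element.get("VAL")
    return values
```

For example, `metric_values(fetch_xml("lpar-a1.example.com"), "load_one")` would list the one-minute load of every LPAR that this (hypothetical) broker knows about.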

Shared Ethernet Adapter statistics

Question: How can SEA statistics be monitored on the VIO server?

  • The AIX libperfstat library does not appear to report any statistics for an Ethernet adapter that has no interface defined on it.
  • Interfaces are seldom defined on SEAs.
  • The AIX command 'entstat', however, does provide these statistics.

Solution: Extension through gmetric via a shell script
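The extension itself is a shell script; the idea can be sketched in Python as well: sample the SEA's byte counters with entstat, turn two samples into a rate, and hand the rate to gmetric. This is not the author's script. It assumes that `entstat -d <adapter>` prints transmit and receive "Bytes:" counters side by side and that the first such line belongs to the SEA itself; verify this against your VIO server. The adapter name passed to `run` and the metric names are hypothetical.

```python
import re
import subprocess
import time
from typing import Optional, Tuple

# Assumed entstat layout (transmit column left, receive column right):
#   Bytes: 123456789                        Bytes: 987654321
BYTES_LINE = re.compile(r"^[ \t]*Bytes:[ \t]*(\d+)[ \t]+Bytes:[ \t]*(\d+)",
                        re.MULTILINE)


def parse_entstat_bytes(text: str) -> Tuple[int, int]:
    """Return (transmit_bytes, receive_bytes) from the first Bytes line."""
    match = BYTES_LINE.search(text)
    if match is None:
        raise ValueError("no 'Bytes: ... Bytes: ...' line in entstat output")
    return int(match.group(1)), int(match.group(2))


def byte_rate(prev: int, curr: int, seconds: float) -> Optional[float]:
    """Bytes per second between two samples; None if the counter was reset."""
    if seconds <= 0 or curr < prev:
        return None
    return (curr - prev) / seconds


def read_counters(adapter: str) -> Tuple[int, int]:
    out = subprocess.run(["entstat", "-d", adapter], capture_output=True,
                         text=True, check=True).stdout
    return parse_entstat_bytes(out)


def publish(name: str, value: float) -> None:
    # gmetric sends the value over the send channel(s) defined in gmond.conf.
    subprocess.run(["gmetric", "--name", name, "--value", f"{value:.1f}",
                    "--type", "float", "--units", "bytes/sec"], check=True)


def run(adapter: str, interval: float = 15.0) -> None:
    """Sample the SEA forever and publish transmit/receive byte rates."""
    prev, prev_t = read_counters(adapter), time.monotonic()
    while True:
        time.sleep(interval)
        curr, now = read_counters(adapter), time.monotonic()
        for label, p, c in (("tx", prev[0], curr[0]), ("rx", prev[1], curr[1])):
            rate = byte_rate(p, c, now - prev_t)
            if rate is not None:
                publish(f"sea_{adapter}_{label}_bytes", rate)
        prev, prev_t = curr, now
```

Computing the rate in the script keeps the published value a plain gauge, so nothing depends on how gmetad stores counter-style metrics.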

Fibre Channel statistics

Question: How can Fibre Channel statistics be monitored on the VIO server?

  • The AIX libperfstat library does not appear to report any statistics for a Fibre Channel adapter that has no disks attached to it.
  • Adapters used only for tapes, for instance, would be left out.
  • The AIX command 'fcstat', however, does provide these statistics.

Solution: Extension through gmetric via a shell script
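As with the SEAs, the author's extension is a shell script. A Python sketch of the same approach parses fcstat output into counters, computes per-second rates for a few of them, and builds gmetric calls. It assumes fcstat prints the counters of interest as one "Label: number" pair per line and that labels such as "Input Bytes" exist on your AIX level; the selection in `WANTED`, the adapter name in the example and the metric names are hypothetical.

```python
import re
import subprocess
from typing import Dict, List

# Assumed layout: one "Label: number" pair per line; two-column lines are skipped.
FIELD = re.compile(r"^[ \t]*([A-Za-z][A-Za-z ]*?):[ \t]+(\d+)[ \t]*$",
                   re.MULTILINE)

# Hypothetical selection of counters; adjust to your fcstat output.
WANTED = ("Input Requests", "Output Requests", "Input Bytes", "Output Bytes")


def parse_fcstat(text: str) -> Dict[str, int]:
    """Map each single-value 'Label: number' line to its first value."""
    fields: Dict[str, int] = {}
    for label, value in FIELD.findall(text):
        fields.setdefault(label.strip(), int(value))
    return fields


def rates(prev: Dict[str, int], curr: Dict[str, int],
          seconds: float) -> Dict[str, float]:
    """Per-second rates for WANTED counters present in both samples."""
    result: Dict[str, float] = {}
    for key in WANTED:
        if seconds > 0 and key in prev and key in curr and curr[key] >= prev[key]:
            result[key] = (curr[key] - prev[key]) / seconds
    return result


def gmetric_command(adapter: str, label: str, value: float) -> List[str]:
    """Build a gmetric call, e.g. metric fc_fcs0_input_bytes in bytes/sec."""
    name = f"fc_{adapter}_" + label.lower().replace(" ", "_")
    units = label.split()[-1].lower() + "/sec"
    return ["gmetric", "--name", name, "--value", f"{value:.1f}",
            "--type", "float", "--units", units]


def sample(adapter: str) -> Dict[str, int]:
    out = subprocess.run(["fcstat", adapter], capture_output=True,
                         text=True, check=True).stdout
    return parse_fcstat(out)
```

Two calls to `sample("fcs0")` taken 15 seconds apart, passed through `rates`, give the values to publish; each list returned by `gmetric_command` can then be executed with `subprocess.run(cmd, check=True)`.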

Enhanced web interface

Please take a look at http://www.perzl.org/ganglia/webinterface.html.

Page last modified on November 22, 2011, at 09:24 AM