Mantis - orca_web
Viewing Issue Advanced Details
ID: 4546
Category: regular use
Severity: major
Reproducibility: always
Date Submitted: 2010-09-08 16:59
Last Update: 2011-11-22 21:26
Reporter: GlenG
Assigned To: dam
Priority: normal
Status: acknowledged
Resolution: open
Projection: none
ETA: none
0004546: network interface with greater than 1 Gbit of bandwidth does not plot correctly
(This is a copy of my post to 'orca-users@orcaware.com')

I'm trying to add a plot for a network interface with more than 1 Gbit of bandwidth. I'm getting a plot, but the y-axis tops out at 1000 Mb/s, and when the interface is receiving > 1000 Mb/s that variable is not plotted (the text portion's Max. does not report the larger values either). Here's my plot configuration. I copied a working 1 Gbit plot and changed data_max from 1000000000 to 2000000000. I also tried deleting data_max.
 
# Interface bits per second for > 1 Gbit interfaces.
plot {
title %g Interface Bits Per Second: $1
source orcallator
data 1024 * 8 * ((?:(?:aggr))\d+)InKB/s
data 1024 * 8 * $1OuKB/s
line_type area
line_type line1
legend Input
legend Output
y_legend Bits/s
data_min 0
data_max 2000000000
plot_width 800
href http://www.orcaware.com/orca/docs/orcallator.html#interface_bits_per_second
}
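For reference, a minimal shell sketch of the unit conversion the data lines perform (the sample KB/s reading is an assumed value, not taken from the attached file): multiplying an orcallator KB/s figure by 1024 * 8 gives bits per second, so an aggregate running near 2 Gbit/s overshoots a data_max of 1000000000.

# Assumed sample aggr1InKB/s reading; 1024 * 8 mirrors the data expressions above.
kbps=244140
echo "$(( kbps * 1024 * 8 )) bits/s"   # prints "1999994880 bits/s", just under 2 Gbit/s and well above 1000000000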
I've uploaded an orcallator collection file containing data with bandwidth utilization > 1Gbit.
Attached Files:
beaker.saved.orcallator-2010-09-02-000 (974,098 bytes) 2010-09-08 16:59
o_beaker_gauge_1024_X_8_X_aggr1InKB_per_s,__1024_X_8_X_aggr1OuKB_per_s-daily.png (29,060 bytes) 2010-09-08 17:01
B.o_beaker_gauge_1024_X_8_X_aggr1InKB_per_s,__1024_X_8_X_aggr1OuKB_per_s-daily.png (28,241 bytes) 2010-09-13 20:49
Issue History
2010-09-08 16:59 GlenG New Issue
2010-09-08 16:59 GlenG File Added: beaker.saved.orcallator-2010-09-02-000
2010-09-08 17:01 GlenG File Added: o_beaker_gauge_1024_X_8_X_aggr1InKB_per_s,__1024_X_8_X_aggr1OuKB_per_s-daily.png
2010-09-08 17:03 GlenG Note Added: 0008278
2010-09-13 20:49 GlenG File Added: B.o_beaker_gauge_1024_X_8_X_aggr1InKB_per_s,__1024_X_8_X_aggr1OuKB_per_s-daily.png
2010-09-13 21:00 GlenG Note Added: 0008286
2011-11-21 16:09 dam Status new => assigned
2011-11-21 16:09 dam Assigned To => dam
2011-11-21 16:23 GlenG Note Added: 0009428
2011-11-21 17:14 dam Note Added: 0009429
2011-11-22 21:03 GlenG Note Added: 0009431
2011-11-22 21:26 dam Note Added: 0009432
2011-11-22 21:26 dam Status assigned => acknowledged

Notes
(0008278)
GlenG   
2010-09-08 17:03   
The PNG file was not generated by plotting the attached orcallator file.
(0008286)
GlenG   
2010-09-13 21:00   
I got the plot to work by:

1. removing data_max from orcallator.cfg

# Interface bits per second for > 1 Gbit interfaces.
# data_max 2000000000
plot {
title %g Interface Bits Per Second: $1
source orcallator
data 1024 * 8 * ((?:(?:aggr))\d+)InKB/s
data 1024 * 8 * $1OuKB/s
line_type area
line_type line1
legend Input
legend Output
y_legend Bits/s
data_min 0
plot_width 800
href http://www.orcaware.com/orca/docs/orcallator.html#interface_bits_per_second
}

2. killing Orca master:

pkill orca

3. deleting rrd files:

ls -ltrh /var/opt/csw/orca/rrd/orcallator/o_beaker | grep aggr
-rw-r--r-- 1 root root 49K Sep 2 14:43 gauge_1024_X_18_X_aggr1OuKB_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 2 14:43 gauge_1024_X_18_X_aggr1InKB_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:23 gauge_1024_X_8_X_aggr1InKB_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:23 gauge_1024_X_8_X_aggr1OuKB_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1Coll_pct.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1Defr_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1IErr_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1InDtSz_per_p.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1InOvH_pct_per_p.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1Ipkt_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1NoCP_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1OErr_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1Opkt_per_s.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1OuDtSz_per_p.rrd
-rw-r--r-- 1 root root 49K Sep 13 11:50 gauge_aggr1OuOvH_pct_per_p.rrd

cd /var/opt/csw/orca/rrd/orcallator/o_beaker
rm gauge_1024_X_18_X_aggr1OuKB_per_s.rrd gauge_1024_X_18_X_aggr1InKB_per_s.rrd gauge_1024_X_8_X_aggr1InKB_per_s.rrd gauge_1024_X_8_X_aggr1OuKB_per_s.rrd

4. restarting the master:

/opt/csw/bin/orca -d /opt/csw/etc/orcallator.cfg
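
For convenience, the four steps above could be wrapped in one script roughly like this (a sketch only; the host-specific path and the file glob are taken from this report and would need adjusting elsewhere):

#!/bin/sh
# Stop the Orca master, remove the RRD files that were created with the old cap,
# and restart so they are rebuilt from the orcallator data.
pkill orca
cd /var/opt/csw/orca/rrd/orcallator/o_beaker || exit 1
rm -f gauge_*aggr1*KB_per_s.rrd    # matches only the four aggr1 KB/s gauges listed above
/opt/csw/bin/orca -d /opt/csw/etc/orcallator.cfg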


I attached an "after" plot file.


If I should have done something else, please let me know.
(0009428)
GlenG   
2011-11-21 16:23   
This appears to be working for me. Did I fail to update this issue correctly?

Thanks,
GlenG
(PS: I am still having trouble(?) with the Orca master crashing after about 10 days. I believe this is the result of a memory leak. SMF restarts it, so I guess in some ways the trouble is minimal.)
(0009429)
dam   
2011-11-21 17:14   
I happen to have a machine with nxge in the lab, which allows me to fix other issues while I am at it.
How high should aggr be? If you bundle four gigabit interfaces it should be even higher. 10 GbE? Or 20 GbE for trunking 10 GbE interfaces?

Regarding the crash: I can't promise when I will have a reasonable amount of time to look into this. Nonetheless, I am working on fixing all SE/orcallator issues in one go now (apart from the leak). IIRC, what was left was splitting off orcallator to limit dependencies on server machines, plus the new nxge interfaces. Any other issues you have for a new release?

Best regards

  -- Dago
(0009431)
GlenG   
2011-11-22 21:03   
(Sorry for the delay in replying; major application software upgrades over the weekend.)

>>How high should aggr be?
As the number of NICs in an aggr can change dynamically, I think the best choice is to let the max float.

The things that I have run into:
1. the single-threadedness of the master means my graphs tend to update less frequently than I would like
2. a memory leak leading to an abend/dump (at this point /var is a little too small and it fills up until a cron task moves the dump elsewhere). Note: the suggestion from the orca list is that this is a Perl-on-Solaris 10 problem.
3. a dynamic change in the number of CPUs causes collection to fail (T5220 with LDoms)
4. restarting csworca puts the service into the maintenance state, although clearing it allows startup. The console output from today follows:

ex=0 11:30:27 fozzie ~ gunselmg $sudo /usr/sbin/svcadm -v restart svc:/network/csworca:default
Action restart set for svc:/network/csworca:default.
ex=1 11:30:50 fozzie ~ gunselmg $sudo svcs -l svc:/network/csworca
fmri svc:/network/csworca:default
enabled true
state online
next_state offline
state_time Tue Nov 22 11:30:30 2011
logfile /var/svc/log/network-csworca:default.log
restarter svc:/system/svc/restarter:default
contract_id 908171
dependency require_all/none svc:/system/filesystem/local (online)
dependency require_all/none svc:/network/loopback (online)
ex=0 11:31:04 fozzie ~ gunselmg $sudo svcs -l svc:/network/csworca
fmri svc:/network/csworca:default
enabled true
state maintenance
next_state none
state_time Tue Nov 22 11:31:31 2011
logfile /var/svc/log/network-csworca:default.log
restarter svc:/system/svc/restarter:default
contract_id 908171
dependency require_all/none svc:/system/filesystem/local (online)
dependency require_all/none svc:/network/loopback (online)
ex=1 11:32:06 fozzie ~ gunselmg $sudo vi /var/svc/log/network-csworca:default.log
"/var/svc/log/network-csworca:default.log" 749 lines, 56006 characters
...
Version string '1.05 ' contains invalid data; ignoring: ' ' at /opt/csw/bin/orca line 66.
[ Nov 22 11:30:30 Stopping because service restarting. ]
[ Nov 22 11:30:30 Executing stop method ("/var/opt/csw/svc/method/svc-csworca stop") ]
/var/opt/csw/svc/method/svc-csworca: kill: no such process
[ Nov 22 11:30:30 Method "stop" exited with status 0 ]
[ Nov 22 11:31:30 Method or service exit timed out. Killing contract 908171 ]
[ Nov 22 11:31:31 Method or service exit timed out. Killing contract 908171 ]
:q
ex=2 11:32:22 fozzie ~ gunselmg $sudo /usr/sbin/svcadm -v clear svc:/network/csworca:default
Action maint_off set for svc:/network/csworca:default.
ex=0 11:32:29 fozzie ~ gunselmg $sudo svcs -l svc:/network/csworca
fmri svc:/network/csworca:default
enabled true
state online
next_state none
state_time Tue Nov 22 11:32:29 2011
logfile /var/svc/log/network-csworca:default.log
restarter svc:/system/svc/restarter:default
contract_id 955499
dependency require_all/none svc:/system/filesystem/local (online)
dependency require_all/none svc:/network/loopback (online)
ex=0 11:32:36 fozzie ~ gunselmg $
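
The log above suggests the stop method's kill misses the running orca process, so SMF waits for the contract to empty, times out after 60 seconds, and kills contract 908171 itself. One untested idea, assuming the stop exec_method is defined on the service as is typical for SMF manifests, would be to let the restarter signal the whole contract directly:

# Sketch only: replace the custom stop script with SMF's built-in ":kill" token,
# then refresh and restart the instance. Whether this fits the actual csworca
# manifest layout is an assumption.
svccfg -s svc:/network/csworca setprop stop/exec = astring: '":kill"'
svcadm refresh svc:/network/csworca:default
svcadm restart svc:/network/csworca:default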
(0009432)
dam   
2011-11-22 21:26   
Hi Glen,

> 1. the single-threadedness of the master means my graphs tend to update less frequently than I would like

This is not easy to overcome. While there are solutions like Parallel::ForkManager, it would probably mean restructuring large chunks of the code. As a workaround, I usually partition the monitored machines and run multiple instances of orcaweb at the same time.
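
For illustration, the partitioning workaround might look like this, mirroring the invocation used earlier in this report (the per-group config names are made up; each config lists only a subset of the monitored hosts):

# Hypothetical split across two Orca masters, one config per group of hosts.
/opt/csw/bin/orca -d /opt/csw/etc/orcallator-groupA.cfg
/opt/csw/bin/orca -d /opt/csw/etc/orcallator-groupB.cfg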

> 2. a memory leak leading to an abend/dump (at this point /var is a little too small and it fills up until a cron task moves the dump elsewhere). Note: the suggestion from the orca list is that this is a Perl-on-Solaris 10 problem.

If you happen to have a core it would be nice if you could link that. Maybe I can get something out of it.
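
If a future crash does leave a core, one possible option (untested; the target directory is an assumption) for keeping it out of the cramped /var is to repath global core dumps to a larger filesystem with the standard Solaris coreadm:

# Store global core dumps on a larger filesystem (directory assumed to exist)
# and log each dump via syslog so it is easy to find and link here.
coreadm -e global -e log -g /export/cores/core.%f.%p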

> 3. dynamic change in the number of CPUs causes collection to fail (T5220 with LDoms)

Why should this happen? Is this related to the code or a general restriction of the reconfiguration?

> 4. restarting csworca puts the service into the maintenance state, although clearing it allows startup. The console output from today follows:

I thought this was fixed in 0004505?