Mantis - squid
Viewing Issue Advanced Details
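ID: 5163
Category: regular use
Severity: crash
Reproducibility: have not tried
Date Submitted: 2014-04-11 23:13
Last Update: 2016-04-04 15:10
Reporter: hudesd
Assigned To: dam
Priority: normal
Status: closed
Resolution: fixed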
none    
none  
0005163: squid 3.4.4 crashes on Solaris 10
I have been using Squid 3.1 for quite a while with no problems. I recently upgraded all my CSW packages and Squid 3.4.4 came with the upgrade; there was no other option, as it is what stable, unstable and testing all carry.
The problem is that it is NOT stable: it exits after a while.
It's running as a service (cswsquid) as per the package.
This is on a T2000 running Solaris 10 148888-05 with 8GB RAM and about 600GB of available disk space.
I made no changes to the configuration between 3.1 and 3.4. I have since tried both aufs and my original ufs (diskd isn't available), to no avail.
I increased the size of the disk and memory caches, also to no avail.
Squid will run happily as long as users are only tunneling through it; once some caching gets going with regular http it exits.

I'm not finding any core dumps in /var/opt/csw/squid/cache or the 00 directory under that.
I can provide squid config files and log files.
Attached file: squid.conf (4,220 bytes) 2014-04-11 23:22
Issue History
2014-04-11 23:13 hudesd New Issue
2014-04-11 23:22 hudesd File Added: squid.conf
2014-04-12 03:16 dam Status new => assigned
2014-04-12 03:16 dam Assigned To => dam
2014-04-12 03:21 dam Note Added: 0010799
2014-04-12 03:21 dam Status assigned => feedback
2014-04-23 19:09 hudesd Note Added: 0010806
2014-04-24 00:09 hudesd Note Added: 0010809
2014-05-02 15:52 dam Note Added: 0010815
2014-05-02 16:32 dam Note Added: 0010816
2014-05-02 16:32 dam Status feedback => confirmed
2014-05-05 17:49 hudesd Note Added: 0010819
2014-05-23 09:41 dam Note Added: 0010838
2014-05-23 15:30 hudesd Note Added: 0010839
2014-09-12 19:07 maciej Note Added: 0010908
2014-09-27 22:49 dam Note Added: 0010925
2014-12-11 11:19 dam Note Added: 0010996
2014-12-11 11:19 dam Status confirmed => feedback
2015-05-04 13:21 dam Note Added: 0011034
2016-04-04 15:10 dam Note Added: 0011129
2016-04-04 15:10 dam Status feedback => closed
2016-04-04 15:10 dam Resolution open => fixed

Notes
(0010799)
dam   
2014-04-12 03:21   
I am using 3.4.4,REV=2014.03.14 with no problems so far. Please check
  /var/svc/log/network-cswsquid:default.log
for messages.

However, 3.4.4 is only in testing and unstable, whereas stable still has 3.1:

root@web [web]:/var/opt/csw/squid/cache > ls -l /export/mirror/opencsw-official/*/sparc/5.10/squid-*
-rw-r--r-- 3 web web 2619826 Mar 14 14:10 /export/mirror/opencsw-official/bratislava/sparc/5.10/squid-3.4.4,REV=2014.03.14-SunOS5.10-sparc-CSW.pkg.gz
lrwxrwxrwx 1 web web 65 Mar 15 03:16 /export/mirror/opencsw-official/dublin/sparc/5.10/squid-2.7,REV=2010.10.05_STABLE9-SunOS5.9-sparc-CSW.pkg.gz -> ../5.9/squid-2.7,REV=2010.10.05_STABLE9-SunOS5.9-sparc-CSW.pkg.gz
-rw-r--r-- 2 web web 2483888 Sep 25 2012 /export/mirror/opencsw-official/kiel/sparc/5.10/squid-3.1,REV=2012.06.15_20-SunOS5.10-sparc-CSW.pkg.gz
-rw-r--r-- 3 web web 714982 Oct 8 2009 /export/mirror/opencsw-official/legacy/sparc/5.10/squid-2.6,REV=2007.09.02_STABLE15-SunOS5.8-sparc-CSW.pkg.gz
-rw-r--r-- 2 web web 2483888 Sep 25 2012 /export/mirror/opencsw-official/stable/sparc/5.10/squid-3.1,REV=2012.06.15_20-SunOS5.10-sparc-CSW.pkg.gz
-rw-r--r-- 3 web web 2619826 Mar 14 14:10 /export/mirror/opencsw-official/testing/sparc/5.10/squid-3.4.4,REV=2014.03.14-SunOS5.10-sparc-CSW.pkg.gz
-rw-r--r-- 3 web web 2619826 Mar 14 14:10 /export/mirror/opencsw-official/unstable/sparc/5.10/squid-3.4.4,REV=2014.03.14-SunOS5.10-sparc-CSW.pkg.gz
(0010806)
hudesd   
2014-04-23 19:09   
The service method script needs to be updated: it is using -D which is deprecated and slated to be removed.
(0010809)
hudesd   
2014-04-24 00:09   
[ Apr 23 13:08:23 Leaving maintenance because clear requested. ]
[ Apr 23 13:08:23 Enabled. ]
[ Apr 23 13:08:23 Executing start method ("/var/opt/csw/svc/method/svc-cswsquid start") ]
starting squid server.
[ Apr 23 13:08:23 Method "start" exited with status 0 ]
2014/04/23 13:08:23| WARNING: -D command-line option is obsolete.
[ Apr 23 16:08:25 Stopping because process dumped core. ]
[ Apr 23 16:08:26 Executing stop method ("/var/opt/csw/svc/method/svc-cswsquid stop") ]
squid server is already down
[ Apr 23 16:08:26 Method "stop" exited with status 0 ]
[ Apr 23 16:09:27 Method or service exit timed out. Killing contract 8579326 ]
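How long squid survived before dumping core can be read off a log like the one above. A minimal Python sketch, assuming SMF log lines of the form `[ Mon DD HH:MM:SS message ]` with English month names; the helper names `stamp` and `uptimes` are made up for illustration, and the year has to be supplied because the log does not record it:

```python
from datetime import datetime

# Two lines copied from the service log above.
LOG = """\
[ Apr 23 13:08:23 Method "start" exited with status 0 ]
[ Apr 23 16:08:25 Stopping because process dumped core. ]
"""


def stamp(line, year=2014):
    """Parse the leading 'Mon DD HH:MM:SS' of an SMF log line."""
    parts = line.strip("[] \n").split()
    text = f"{year} {' '.join(parts[:3])}"
    return datetime.strptime(text, "%Y %b %d %H:%M:%S")


def uptimes(log_text, year=2014):
    """Yield the time between each successful start and the next core dump."""
    started = None
    for line in log_text.splitlines():
        if 'Method "start" exited with status 0' in line:
            started = stamp(line, year)
        elif "dumped core" in line and started is not None:
            yield stamp(line, year) - started
            started = None


if __name__ == "__main__":
    for delta in uptimes(LOG):
        print("squid ran for", delta, "before dumping core")
```

Run against the full /var/svc/log/network-cswsquid:default.log, this would show whether the crashes follow a pattern in time.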
(0010815)
dam   
2014-05-02 15:52   
I now have an idea of what goes wrong: when retrieving something via FTP, squid dumps core. Here is the stack trace:


pstack core.squid.8044

core 'core.squid.8044' of 8044: (squid-1) -D
 fe6c8e07 _lwp_kill (1, 6, feffe248, fe670ff1) + 7
 fe670ffd raise (6, 0, feffe298, fe6487ad) + 25
 fe6487cd abort (0, 1, 2b, 8647430, fe766c80, fe762000) + f5
 082a2b2d _Z5deathi (b, 0, feffe3e0, fe69e537, fdf72a40, fe762000) + 1cd
 fe6c4b05 __sighndlr (b, 0, feffe3e0, 82a2960) + 15
 fe6b7eae call_user_handler (b) + 2d2
 fe6b8346 sigacthandler (b, 0, feffe3e0) + ee
 --- called from signal handler with signal 11 (SIGSEGV) ---
 083434c5 _ZNK2Ip7Address4portEv (4, fe762000, feffe998, fe661667, 965adf0, fe762000) + 15
 081c7c01 ???????? (913fdac, 845f909, feffea08, fe6bd29c, 97bb4a0, fe762000)
 081c97bf ???????? (913bca0, 25, 913fda8, feffeabc, b9, 25)
 081c8bd9 _ZN12FtpStateData18handleControlReplyEv (913bca0, 94c2048, 832f25b, 94c2040, 94c2020) + 149
 081ce2a2 _ZN13CommCbMemFunTI12FtpStateData14CommIoCbParamsE6doDialEv (94c203c, 94c2020, feffeb28, 81cdc03, 94c2040, fe762000) + 32
 081ce013 _ZN9JobDialerI12FtpStateDataE4dialER9AsyncCall (94c203c, 94c2020, feffeb58, fe661c5a, fe763098, 84a4020) + 33
 081ce178 _ZN10AsyncCallTI13CommCbMemFunTI12FtpStateData14CommIoCbParamsEE4fireEv (94c2020, 84a137f, feff1b7f, 81a9daf, 88000000, 4056e1fc) + 18
 0832d42d _ZN9AsyncCall4makeEv (94c2020, feffecc4, e650d871, fe76930c, cf5d3200, 8) + 3bd
 083317b6 _ZN14AsyncCallQueue8fireNextEv (86b1a70, feffecdc, feffec08, 82a0d4a, 86086f0, 0) + 1f6
 08331ba0 _ZN14AsyncCallQueue4fireEv (86b1a70, feffecb0, 1, 842e5f9, 40, 402e0000) + 30
 081aadd4 _ZN9EventLoop7runOnceEv (feffecdc, 402e0000, 1, feffecb0, 0, feffecb4) + 104
 081aaf70 _ZN9EventLoop3runEv (feffecdc, feffecb4, 0, 0, 402e0000, 1) + 20
 0822b2d4 _Z9SquidMainiPPc (2, feffed60, 84a4020, feffed1c, feffed3c, fe7fa8bc) + 14b4
 08430c7d main (2, feffed60, feffed6c) + 1d
 0811f2e0 _start (2, feffee38, feffee42, 0, 86b60b0, feffee5d) + 80

dam@unstable10s [unstable10s]:/home/dam/tmp > cat yyy | /opt/SUNWspro/bin/c++filt
core 'core.squid.8044' of 8044: (squid-1) -D
 fe6c8e07 _lwp_kill (1, 6, feffe248, fe670ff1) + 7
 fe670ffd raise (6, 0, feffe298, fe6487ad) + 25
 fe6487cd abort (0, 1, 2b, 8647430, fe766c80, fe762000) + f5
 082a2b2d death(int) (b, 0, feffe3e0, fe69e537, fdf72a40, fe762000) + 1cd
 fe6c4b05 __sighndlr (b, 0, feffe3e0, 82a2960) + 15
 fe6b7eae call_user_handler (b) + 2d2
 fe6b8346 sigacthandler (b, 0, feffe3e0) + ee
 --- called from signal handler with signal 11 (SIGSEGV) ---
 083434c5 Ip::Address::port() const (4, fe762000, feffe998, fe661667, 965adf0, fe762000) + 15
 081c7c01 ???????? (913fdac, 845f909, feffea08, fe6bd29c, 97bb4a0, fe762000)
 081c97bf ???????? (913bca0, 25, 913fda8, feffeabc, b9, 25)
 081c8bd9 FtpStateData::handleControlReply() (913bca0, 94c2048, 832f25b, 94c2040, 94c2020) + 149
 081ce2a2 CommCbMemFunT<FtpStateData, CommIoCbParams>::doDial() (94c203c, 94c2020, feffeb28, 81cdc03, 94c2040, fe762000) + 32
 081ce013 JobDialer<FtpStateData>::dial(AsyncCall&) (94c203c, 94c2020, feffeb58, fe661c5a, fe763098, 84a4020) + 33
 081ce178 AsyncCallT<CommCbMemFunT<FtpStateData, CommIoCbParams> >::fire() (94c2020, 84a137f, feff1b7f, 81a9daf, 88000000, 4056e1fc) + 18
 0832d42d AsyncCall::make() (94c2020, feffecc4, e650d871, fe76930c, cf5d3200, 8) + 3bd
 083317b6 AsyncCallQueue::fireNext() (86b1a70, feffecdc, feffec08, 82a0d4a, 86086f0, 0) + 1f6
 08331ba0 AsyncCallQueue::fire() (86b1a70, feffecb0, 1, 842e5f9, 40, 402e0000) + 30
 081aadd4 EventLoop::runOnce() (feffecdc, 402e0000, 1, feffecb0, 0, feffecb4) + 104
 081aaf70 EventLoop::run() (feffecdc, feffecb4, 0, 0, 402e0000, 1) + 20
 0822b2d4 SquidMain(int, char**) (2, feffed60, 84a4020, feffed1c, feffed3c, fe7fa8bc) + 14b4
 08430c7d main (2, feffed60, feffed6c) + 1d
 0811f2e0 _start (2, feffee38, feffee42, 0, 86b60b0, feffee5d) + 80

Digging further.
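With many core files to go through, the frame that received the signal can be picked out of `pstack` output automatically. A minimal Python sketch, assuming the layout shown above (a hex address, the function name, then the argument list in parentheses); the function name `faulting_frame` is made up for illustration:

```python
import re

# A frame line: hex address, function name, then "(args...)".
FRAME = re.compile(r"^\s*[0-9a-f]+\s+(.+?)\s+\(")

# Lines copied from the demangled trace above.
SAMPLE = """\
 fe6b8346 sigacthandler (b, 0, feffe3e0) + ee
 --- called from signal handler with signal 11 (SIGSEGV) ---
 083434c5 Ip::Address::port() const (4, fe762000, feffe998, fe661667, 965adf0, fe762000) + 15
 081c7c01 ???????? (913fdac, 845f909, feffea08, fe6bd29c, 97bb4a0, fe762000)
"""


def faulting_frame(pstack_text):
    """Return the function name of the first frame after the signal marker."""
    lines = pstack_text.splitlines()
    for i, line in enumerate(lines):
        if "called from signal handler" in line:
            for later in lines[i + 1:]:
                match = FRAME.match(later)
                if match:
                    return match.group(1)
    return None  # no signal marker, or no frame after it


if __name__ == "__main__":
    print(faulting_frame(SAMPLE))
```

For the trace above this reports `Ip::Address::port() const`, which was reached from `FtpStateData::handleControlReply()`.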
(0010816)
dam   
2014-05-02 16:32   
I am pretty confident this is the same bug as this one:
  http://bugs.squid-cache.org/show_bug.cgi?id=4004
(0010819)
hudesd   
2014-05-05 17:49   
While it is interesting that you found a bug with FTP tunneling, I don't have much of that in my organization. What I do have a LOT of is HTTPS tunneling. The HP SAN equipment likes to "phone home" a LOT -- of the last 80 entries in access.log, 68 are CONNECT requests to trilogy2.3pardata.com, and 141 of the last 180 requests are CONNECT (the other major CONNECT source is Oracle Ops Center).
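Counts like these can be reproduced with a few lines of Python. This is only a sketch: it assumes Squid's native access.log layout, in which the request method is the sixth whitespace-separated field, and the sample lines (timestamps, addresses, sizes) are made up for illustration rather than taken from the reporter's log. The helper name `method_counts` is also hypothetical.

```python
from collections import Counter

# Made-up sample lines in the native access.log layout:
# time elapsed client code/status bytes method URL user hierarchy/peer type
SAMPLE_LOG = [
    "1398272901.101    350 192.0.2.10 TCP_MISS/200 4210 CONNECT trilogy2.3pardata.com:443 - HIER_DIRECT/198.51.100.7 -",
    "1398272902.215     92 192.0.2.11 TCP_MISS/200 1532 GET http://example.com/ - HIER_DIRECT/198.51.100.8 text/html",
    "1398272903.007    410 192.0.2.10 TCP_MISS/200 3984 CONNECT trilogy2.3pardata.com:443 - HIER_DIRECT/198.51.100.7 -",
]


def method_counts(log_lines, last_n):
    """Count request methods in the last `last_n` non-empty log lines."""
    if last_n <= 0:
        return Counter()
    recent = [line for line in log_lines if line.strip()][-last_n:]
    counts = Counter()
    for line in recent:
        fields = line.split()
        if len(fields) > 5:  # skip truncated lines
            counts[fields[5]] += 1
    return counts


if __name__ == "__main__":
    for method, count in method_counts(SAMPLE_LOG, 80).most_common():
        print(method, count)
```

On a real proxy the lines would come from the access.log file itself, for example `open(path).readlines()`.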
(0010838)
dam   
2014-05-23 09:41   
I pushed 3.4.5 in the meantime which still crashes from time to time. What is interesting is that the crashes vanish completely when squidguard is not used (that means no URL filtering is used at all).
(0010839)
hudesd   
2014-05-23 15:30   
I found a problem. The configuration file from 3.1 had been overwritten by the default 3.4 file. I had changed the memory cache to 1G but had not increased the disk cache from the default 100 16 256. I thought that was 100*16*256, but I reread the docs and found it is 100M. So the memory cache presumably fills, but because it is bigger than the disk cache it can't swap everything out. So squid exits, silently.
When I tried something similar on squid 3.1 as delivered with Solaris 11.1, squid complained about the configuration and refused to start (appropriate behavior).
Changing the disk cache to 1024 16 256 resolved that issue, but squid 3.4 should complain just as 3.1 did, not gamely try to work and then fail.
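A mismatch like this can be spotted before squid is started. A minimal Python sketch, assuming the directive layouts `cache_mem <size> <unit>` and `cache_dir <type> <path> <Mbytes> <L1> <L2>`; the helper names, the unit table and the sample configuration are made up for illustration, and nothing here is shipped with Squid or the CSW package. Whether the mismatch really caused the silent exit remains a hypothesis.

```python
# Size units handled by this sketch (an assumption; squid.conf may
# accept other spellings, which would raise KeyError here).
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# Hypothetical configuration matching the values described above.
SAMPLE = """\
cache_mem 1024 MB
cache_dir ufs /var/opt/csw/squid/cache 100 16 256
"""


def parse_cache_sizes(conf_text):
    """Return (cache_mem_bytes or None, total_cache_dir_bytes)."""
    cache_mem = None
    cache_dir_total = 0
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0] == "cache_mem" and len(fields) >= 3:
            cache_mem = int(fields[1]) * UNITS[fields[2]]
        elif fields[0] == "cache_dir" and len(fields) >= 4:
            # The first number after the path is the size in megabytes.
            cache_dir_total += int(fields[3]) * UNITS["MB"]
    return cache_mem, cache_dir_total


def check_cache_sizes(conf_text):
    """Return a warning if cache_mem exceeds the disk cache, else None."""
    mem, disk = parse_cache_sizes(conf_text)
    if mem is not None and disk > 0 and mem > disk:
        return f"cache_mem ({mem} bytes) exceeds total cache_dir size ({disk} bytes)"
    return None


if __name__ == "__main__":
    print(check_cache_sizes(SAMPLE))
```

With the disk cache raised to 1024 16 256 the check returns None, since the two sizes are then equal.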

I'll see about 3.4.5 on a test machine.
(0010908)
maciej   
2014-09-12 19:07   
FYI this bug blocks the integration of squid from unstable to testing.
(0010925)
dam   
2014-09-27 22:49   
@Maciej: I think it is good that it blocks integration, as this is probably an important issue. I'll keep an eye on all open issues; however, this one seems hard to fix.
(0010996)
dam   
2014-12-11 11:19   
Meanwhile I released 3.4.10 and adjusted some build flags as reported in another bug. Can you please retry and see if the error is still present?
(0011034)
dam   
2015-05-04 13:21   
Meanwhile 3.5.4 has been released to experimental:
  http://buildfarm.opencsw.org/experimental.html#squid
Please give it a try.
(0011129)
dam   
2016-04-04 15:10   
I am not sure if this is fixed in the latest release, but this is definitely an upstream issue, so I'll close the issue for now here.