Bug 130 - Segfault when deleting a TCP socket in the "connection fail callback"
Segfault when deleting a TCP socket in the "connection fail callback"
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: internet
pre-release
All All
: P3 normal
Assigned To: ns-bugs
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-02-04 13:47 UTC by Gustavo J. A. M. Carneiro
Modified: 2008-02-06 12:36 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gustavo J. A. M. Carneiro 2008-02-04 13:47:49 UTC
I have a script that removes the last reference to a TCP socket during the "connection failed callback".  That causes a subsequent segmentation fault.  Valgrind trace follows:

TCP connection failed.
==10582== Invalid read of size 8
==10582==    at 0x522792D: ns3::TcpSocket::ReTxTimeout() (tcp-socket.cc:992)
==10582==    by 0x52247E5: ns3::Ptr<ns3::EventImpl> ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void (ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify() (simulator.h:661)
==10582==    by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39)
==10582==    by 0x516C159: ns3::SimulatorPrivate::ProcessOneEvent() (simulator.cc:155)
==10582==    by 0x516C1A1: ns3::SimulatorPrivate::Run() (simulator.cc:183)
==10582==    by 0x516C2E2: ns3::Simulator::Run() (simulator.cc:418)
==10582==    by 0x40C442: main (bench-olsr.cc:750)
==10582==  Address 0x67fb660 is 0 bytes inside a block of size 408 free'd
==10582==    at 0x4C2162D: operator delete(void*) (vg_replace_malloc.c:336)
==10582==    by 0x522D253: ns3::TcpSocket::~TcpSocket() (tcp-socket.cc:101)
==10582==    by 0x5109A49: ns3::Object::MaybeDelete() const (object.cc:512)
==10582==    by 0x41107A: ns3::Object::Unref() const (object.h:437)
==10582==    by 0x4110A4: ns3::Ptr<ns3::Socket>::~Ptr() (ptr.h:402)
==10582==    by 0x51E6BA9: ns3::Socket::NotifyConnectionFailed() (socket.cc:143)
==10582==    by 0x5227240: ns3::TcpSocket::Retransmit() (tcp-socket.cc:1030)
==10582==    by 0x52279CB: ns3::TcpSocket::ReTxTimeout() (tcp-socket.cc:999)
==10582==    by 0x52247E5: ns3::Ptr<ns3::EventImpl> ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void (ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify() (simulator.h:661)
==10582==    by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39)
==10582== 
==10582== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==10582==  Bad permissions for mapped region at address 0x61D9F0
==10582==    at 0x61D9F0: ???
==10582==    by 0x52247E5: ns3::Ptr<ns3::EventImpl> ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void (ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify() (simulator.h:661)
==10582==    by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39)
==10582==    by 0x516C159: ns3::SimulatorPrivate::ProcessOneEvent() (simulator.cc:155)
==10582==    by 0x516C1A1: ns3::SimulatorPrivate::Run() (simulator.cc:183)
==10582==    by 0x516C2E2: ns3::Simulator::Run() (simulator.cc:418)
==10582==    by 0x40C442: main (bench-olsr.cc:750)
==10582==
Comment 1 Gustavo J. A. M. Carneiro 2008-02-04 13:50:59 UTC
I see some lines like:

       m_retxEvent = Simulator::Schedule (rto,&TcpSocket::ReTxTimeout,this);

Most likely there's a scheduled event that was not properly cancelled in the socket destructor.

*sigh* Why doesn't anyone believe me when I say that EventGarbageCollector is needed? :-(
Comment 2 Gustavo J. A. M. Carneiro 2008-02-04 13:54:58 UTC
Patch to fix the bug:

diff -r a1edfbcca81d src/internet-node/tcp-socket.cc
--- a/src/internet-node/tcp-socket.cc   Mon Feb 04 14:15:18 2008 +0000
+++ b/src/internet-node/tcp-socket.cc   Mon Feb 04 18:53:38 2008 +0000
@@ -121,6 +121,7 @@ TcpSocket::Destroy (void)
   m_node = 0;
   m_endPoint = 0;
   m_tcp = 0;
+  m_retxEvent.Cancel ();
 }
 int
 TcpSocket::FinishBind (void)
Comment 3 Rajib Bhattacharjea 2008-02-05 10:02:23 UTC
Pushed Gustavo's fix as revision 2315 in ns-3-dev (changeset df5e3f1d2a1a)
Comment 4 Gustavo J. A. M. Carneiro 2008-02-06 12:09:55 UTC
I found another similar segfault today.  I am not sure how to reproduce, though.
Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to ~TcpSocket()?  I tried it locally, and it seems to fix the issue for me...
Comment 5 Rajib Bhattacharjea 2008-02-06 12:25:32 UTC
(In reply to comment #4)
> I found another similar segfault today.  I am not sure how to reproduce,
> though.
> Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to
> ~TcpSocket()?  I tried it locally, and it seems to fix the issue for me...
> 

Please feel free to push your change...I'm not sure I understand the issue of moving the location of the canceling of the event, but if it fixes a segfault then please do.
Comment 6 Gustavo J. A. M. Carneiro 2008-02-06 12:36:24 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > I found another similar segfault today.  I am not sure how to reproduce,
> > though.
> > Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to
> > ~TcpSocket()?  I tried it locally, and it seems to fix the issue for me...
> > 
> 
> Please feel free to push your change...I'm not sure I understand the issue of
> moving the location of the canceling of the event, but if it fixes a segfault
> then please do.
> 

The issue is that, if you look at ~TcpSocket, you'll see that not all code paths lead to Destroy() being called, i.e. it appears (on the surface) possible for a TcpSocket to be freed without calling the Destroy() method first.

Fix pushed as 2318:d4217d7ba4ba.