Bugzilla – Bug 130
Segfault when deleting a TCP socket in the "connection fail callback"
Last modified: 2008-02-06 12:36:24 UTC
I have a script that removes the last reference to a TCP socket during the "connection failed callback". That causes a subsequent segmentation fault. Valgrind trace follows: TCP connection failed. ==10582== Invalid read of size 8 ==10582== at 0x522792D: ns3::TcpSocket::ReTxTimeout() (tcp-socket.cc:992) ==10582== by 0x52247E5: ns3::Ptr<ns3::EventImpl> ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void (ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify() (simulator.h:661) ==10582== by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39) ==10582== by 0x516C159: ns3::SimulatorPrivate::ProcessOneEvent() (simulator.cc:155) ==10582== by 0x516C1A1: ns3::SimulatorPrivate::Run() (simulator.cc:183) ==10582== by 0x516C2E2: ns3::Simulator::Run() (simulator.cc:418) ==10582== by 0x40C442: main (bench-olsr.cc:750) ==10582== Address 0x67fb660 is 0 bytes inside a block of size 408 free'd ==10582== at 0x4C2162D: operator delete(void*) (vg_replace_malloc.c:336) ==10582== by 0x522D253: ns3::TcpSocket::~TcpSocket() (tcp-socket.cc:101) ==10582== by 0x5109A49: ns3::Object::MaybeDelete() const (object.cc:512) ==10582== by 0x41107A: ns3::Object::Unref() const (object.h:437) ==10582== by 0x4110A4: ns3::Ptr<ns3::Socket>::~Ptr() (ptr.h:402) ==10582== by 0x51E6BA9: ns3::Socket::NotifyConnectionFailed() (socket.cc:143) ==10582== by 0x5227240: ns3::TcpSocket::Retransmit() (tcp-socket.cc:1030) ==10582== by 0x52279CB: ns3::TcpSocket::ReTxTimeout() (tcp-socket.cc:999) ==10582== by 0x52247E5: ns3::Ptr<ns3::EventImpl> ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void (ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify() (simulator.h:661) ==10582== by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39) ==10582== ==10582== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==10582== Bad permissions for mapped region at address 0x61D9F0 ==10582== at 0x61D9F0: ??? ==10582== by 0x52247E5: ns3::Ptr<ns3::EventImpl> ns3::Simulator::MakeEvent<void (ns3::TcpSocket::*)(), ns3::TcpSocket*>(void (ns3::TcpSocket::*)(), ns3::TcpSocket*)::EventMemberImpl0::Notify() (simulator.h:661) ==10582== by 0x5164346: ns3::EventImpl::Invoke() (event-impl.cc:39) ==10582== by 0x516C159: ns3::SimulatorPrivate::ProcessOneEvent() (simulator.cc:155) ==10582== by 0x516C1A1: ns3::SimulatorPrivate::Run() (simulator.cc:183) ==10582== by 0x516C2E2: ns3::Simulator::Run() (simulator.cc:418) ==10582== by 0x40C442: main (bench-olsr.cc:750) ==10582==
I see some lines like: m_retxEvent = Simulator::Schedule (rto,&TcpSocket::ReTxTimeout,this); Most likely there's a scheduled event that was not properly cancelled in the socket destructor. *sigh* Why doesn't anyone believe me when I say that EventGarbageCollector is needed? :-(
Patch to fix the bug: diff -r a1edfbcca81d src/internet-node/tcp-socket.cc --- a/src/internet-node/tcp-socket.cc Mon Feb 04 14:15:18 2008 +0000 +++ b/src/internet-node/tcp-socket.cc Mon Feb 04 18:53:38 2008 +0000 @@ -121,6 +121,7 @@ TcpSocket::Destroy (void) m_node = 0; m_endPoint = 0; m_tcp = 0; + m_retxEvent.Cancel (); } int TcpSocket::FinishBind (void)
Pushed Gustavo's fix as revision 2315 in ns-3-dev (changeset df5e3f1d2a1a)
I found another similar segfault today. I am not sure how to reproduce, though. Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to ~TcpSocket()? I tried it locally, and it seems to fix the issue for me...
(In reply to comment #4) > I found another similar segfault today. I am not sure how to reproduce, > though. > Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to > ~TcpSocket()? I tried it locally, and it seems to fix the issue for me... > Please feel free to push your change...I'm not sure I understand the issue of moving the location of the canceling of the event, but if it fixes a segfault then please do.
(In reply to comment #5) > (In reply to comment #4) > > I found another similar segfault today. I am not sure how to reproduce, > > though. > > Raj, do you mind moving the "m_retxEvent.Cancel ();" from Destroy() to > > ~TcpSocket()? I tried it locally, and it seems to fix the issue for me... > > > > Please feel free to push your change...I'm not sure I understand the issue of > moving the location of the canceling of the event, but if it fixes a segfault > then please do. > The issue is that, if you look at ~TcpSocket, you'll see that not all code paths lead to Destroy() being called, i.e. it appears (on the surface) possible for a TcpSocket to be freed without calling the Destroy() method first. Fix pushed as 2318:d4217d7ba4ba.