Bug 154 - Attach sockets to nodes
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: network
Version: pre-release
Hardware: All
Importance: P3 normal
Assigned To: ns-bugs
Depends on:
Blocks:
Reported: 2008-03-27 12:18 UTC by Gustavo J. A. M. Carneiro
Modified: 2008-07-01 13:32 UTC

See Also:


Attachments
This is Gustavo's test case worked out in full. (2.64 KB, patch)
2008-03-27 20:48 UTC, Rajib Bhattacharjea
Details | Diff
Proposed solution (628 bytes, patch)
2008-03-27 20:58 UTC, Rajib Bhattacharjea
Details | Diff
raj's patch updated to ns-3-dev (673 bytes, text/x-diff)
2008-05-28 12:04 UTC, Gustavo J. A. M. Carneiro
Details

Description Gustavo J. A. M. Carneiro 2008-03-27 12:18:19 UTC
I just spent about an hour debugging a problem in a student's NS-3 program.  And the student spent probably a day :-)  But I think this problem can be easily prevented...

The student's code had roughly this structure:

using namespace ns3;
class Test
{
public:

  void ReceivePacket (Ptr<Socket> socket, Ptr<Packet> packet, const Address &from);
  bool run(void);
};

void Test::ReceivePacket (Ptr<Socket> socket, Ptr<Packet> packet, const Address &from)
{
  NS_LOG_INFO("receive packet");
  [...]
}



bool
Test::run(void)
{
[...] creates nodes [...]
    NS_LOG_INFO ("create socket  receive");
    Ptr<SocketFactory> socketFactory = rxNode->GetObject<Udp> ();
    Ptr<Socket> rxSocket = socketFactory->CreateSocket ();
    rxSocket->Bind (InetSocketAddress (Ipv4Address ("10.1.4.2"), 1234));
    rxSocket->SetRecvCallback (MakeCallback (&Test::ReceivePacket, this));

    NS_LOG_INFO("send socket ");
    Ptr<SocketFactory> n0SocketFactory = n0->GetObject<Udp> ();
    Ptr<Socket> n0socket = n0SocketFactory->CreateSocket ();
    Ptr<Packet> packet = Create<Packet> ();
    NS_LOG_INFO ("....");
    n0socket->SendTo (InetSocketAddress (Ipv4Address ("10.1.4.2"), 1234), packet);
    
    return true;
}

int 
main (int argc, char *argv[])
{
  Test teste;
  teste.run();
[...]
  NS_LOG_INFO ("Run Simulation.");
  Simulator::StopAt (Seconds (3.0));
  Simulator::Run ();    
  Simulator::Destroy ();
  NS_LOG_INFO ("Done.");
}

This code does not work because the "Ptr<Socket> rxSocket" inside Test::run() goes out of scope before the simulation even begins, and so the socket is destroyed.

I would propose that nodes keep a list of pointers to sockets.  They would release those references when the socket is Close()d, or when the node is disposed.  This is more or less what is done with Applications already...
Comment 1 Mathieu Lacage 2008-03-27 12:26:55 UTC
The expectation is that sockets are held by Applications, which are held by Nodes. So, if you want to avoid these problems, the best way is to provide new subclasses of the Application base class which hold your sockets.

Comment 2 Gustavo J. A. M. Carneiro 2008-03-27 12:37:33 UTC
Holding sockets in some class, be it Application or otherwise, is more work than needed.  NS-3 can save the programmer this work by doing it itself, at little cost.

Actually the real cost here is _discovering_ the problem.  It wasn't at all obvious, even for me, and only through debugging with NS_LOG was I able to track it down.  The solution to the problem is easy; diagnosing the problem was hard.

Eliminating nasty surprises in the simulator is part of our job.  So I strongly recommend either making it work (preferably) or detecting the problem automatically and giving an error message.  I only know how to make it work, not how to detect it.
Comment 3 Mathieu Lacage 2008-03-27 12:55:37 UTC
(In reply to comment #2)
> Holding sockets in some class, be it Application or otherwise, is more work
> than needed.  NS-3 can save the programmer work by doing this work by itself,

I understand you feel pretty strongly that this is a user bug which can be avoided but,

> with little cost.

The memory cost is not "little" so, I am very uneasy about this proposal, especially since there is another fix, as outlined in my previous comment. I am also pretty uneasy about this proposed change because I really think that it is a bug to create a socket, release it, and expect it to stay around while you run the simulation. 

Furthermore, I have a very hard time seeing how your proposal would gracefully deal with appearing and disappearing sockets: how can you (the user) make sure that a socket is really released? Do you have to go into the array attached to the node and remove it from there? _that_ sounds really bad.

Comment 4 Gustavo J. A. M. Carneiro 2008-03-27 13:13:23 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Holding sockets in some class, be it Application or otherwise, is more work
> > than needed.  NS-3 can save the programmer work by doing this work by itself,
> 
> I understand you feel pretty strongly that this is a user bug which can be
> avoided but,
> 
> > with little cost.
> 
> The memory cost is not "little" so, I am very uneasy about this proposal,
> especially since there is another fix, as outlined in my previous comment. I am
> also pretty uneasy about this proposed change because I really think that it is
> a bug to create a socket, release it, and expect it to stay around while you
> run the simulation. 

The memory cost is one pointer per socket.  Surely that is negligible; it is the same cost as keeping a list of applications in the Node.

> 
> Furthermore, I have a very hard time to see how your proposal would gracefully
> deal with appearing and disappearing sockets: how can you (user) make sure that
> a socket is really released ? Do you have to go into the array attached to the
> node and remove it from there ? _that_ sounds really bad.

Like I said, I would envision socket->Close() to take care of it.  Or even if a socket is closed from the remote side, the Node could then release its reference to it (since a closed socket cannot be used for communication anymore).
Comment 5 Tom Henderson 2008-03-27 14:38:26 UTC
I agree that the way the code now works is not really intuitive, as exemplified by this test case.  I as a user would expect that if I used a socket and passed data to it, I could delete my local reference to the socket afterwards, just as I could lose the reference to the descriptor in real code without destroying the underlying socket.

I would suggest trying to fix this somehow so that the provided test case works, rather than suggesting that users subclass Application; otherwise the problem is bound to recur, since I have now seen three ns-3 users write code like this that avoids the use of the base class Application.
Comment 6 Rajib Bhattacharjea 2008-03-27 15:22:24 UTC
The bottom line is that SOMEONE has to retain a reference to the socket throughout its lifetime.  So far we have two alternatives: one is that the node retains references to its sockets (Gustavo's suggestion); the other is that the user of sockets (Applications or end users) keeps a reference (the current expectation).

An approach we took in Tcp for a similar problem was simply to count the references to the socket that live in the Ipv4Endpoint demux:
http://code.nsnam.org/ns-3-dev/rev/4956586bd798

In UdpSocket::FinishBind, note that "this" is effectively copied into the endpoint by passing the callback.  It is never Ref'd, however.  Our fix was to pass a smart pointer into the callback, Ref'ing it in the process.

This way, when the endpoint is deallocated, the smart pointer goes out of scope and Unrefs the socket.  This ensures that the lifetime of the socket is tied to the lifetime of its entry in the demux, which seems more correct to me.

So a change similar to what we did in TcpSocket should do the trick.
Comment 7 Mathieu Lacage 2008-03-27 16:46:20 UTC
(In reply to comment #6)
> The bottom line is that SOMEONE has to retain a reference to the socket
> throughout its lifetime.  So far we have two alternatives, one being that the
> node retains references to its sockets (Gustavo's suggestion), the other being
> that the user of sockets (Applications or end-users) keeps a reference (current
> expectation).
> 
> An approach we took in Tcp for a similar problem was simply to count the
> references to the socket that live in the Ipv4Endpoint demux:
> http://code.nsnam.org/ns-3-dev/rev/4956586bd798
> 
> In UdpSocket::FinishBind, Note that "this" effectively copied into into the
> endpoint by passing the callback.  It is never Ref'd however.  Our fix was to
> pass a smart pointer into the callback, Ref'ing it in the process.
> 
> This way, when the endpoint is deallocated, the smart pointer goes out of scope
> and Unrefs the socket.  The assures that the lifetime of the socket is tied to
> the lifetime of its entry in the demux, which seems more correct to me.

What happens if a user releases the reference to the socket and never calls Close?

> 
> So a change similar to what we did in TcpSocket should do the trick.
> 


I was thinking along similar lines but, for UDP, the lifetime of the socket is much more problematic: it is not bound to an underlying stream that the remote side can close.  So, if the user forgets to call Close, the socket will live forever, even when it no longer has data to send and cannot forward its incoming data to anyone.

I might be worried about a non-issue though.
Comment 8 Mathieu Lacage 2008-03-27 16:52:16 UTC
(In reply to comment #5)
> I agree that it is not really intuitive the way the code now works, as
> exemplified by this test case.  I as a user would expect that if I used a
> socket and passed data to it, I could delete my local reference to the socket
> afterwards, just as I could lose reference to the descriptor in real code
> without destroying the underlying packet.

Sure; however, what happens in the real world is that your application keeps track of open file descriptors, and the kernel frees them when the application dies. The problem here is that you have to associate your socket with a user context to be able to destroy it when that user context disappears, and the user context in ns-3 is the Application class.

Comment 9 Rajib Bhattacharjea 2008-03-27 19:36:56 UTC
(In reply to comment #7)
[snip]
> What happens if a user releases the reference to the socket and he never calls
> Close ?
> 
> I was thinking along similar lines but, for UDP, the lifetime of the socket is
> much more problematic: i.e., it is not bound to an underlying stream which can
> be closed or not by the remote side. So, if the user forgets to call Close, the
> socket will live forever, even though it has data to send and cannot forward
> its incoming data to anyone.
> 
> I might be worried about a non-issue though.

I think that the situation you are describing would become an error on the user's part under my proposal.  In the real world, if you don't call close on a socket, doesn't it persist indefinitely?  From this perspective, this is the correct behavior.

The Socket WILL continue to deliver data up to the application, however, via the callback mechanism.  The Socket persists and holds a pointer to the application upcall.

And luckily for us, Simulator::Destroy will eventually lead to UdpL4Protocol::DoDispose, which will delete its demux, whose destructor will delete each endpoint, whose destructor will release the last reference to the Socket (the ones that live in the endpoint destroy and rx callbacks).  So this shouldn't leak memory from a valgrind perspective either.
Comment 10 Gustavo J. A. M. Carneiro 2008-03-27 19:49:57 UTC
(In reply to comment #9)
> (In reply to comment #7)
> [snip]
> > What happens if a user releases the reference to the socket and he never calls
> > Close ?
> > 
> > I was thinking along similar lines but, for UDP, the lifetime of the socket is
> > much more problematic: i.e., it is not bound to an underlying stream which can
> > be closed or not by the remote side. So, if the user forgets to call Close, the
> > socket will live forever, even though it has data to send and cannot forward
> > its incoming data to anyone.
> > 
> > I might be worried about a non-issue though.
> 
> I think that the situation you are describing would become an error on the
> user's part under my proposal.

Hey, your proposal is the same as mine :-)

I just did not understand the implementation very well...

>  In the real world, if you don't call close on a
> socket, doesn't it persist indefinitely?  From this perspective, this is the
> correct behavior.

Agreed, a socket should live until told to die.  A smart pointer going out of scope should not be enough to make a socket die.

> 
> The Socket WILL continue to deliver data up to the application however via the
> callback mechanism.  The Socket persists and has a pointer to the application
> upcall.
> 
> And luckily for us, Simulator::Destroy will eventually lead to
> UdpL4Protocol::DoDispose, which will delete it's demux, the destructor of which
> will delete each endpoint, the destructor of which will release the last
> reference to the Socket (the ones that live in the endpoint destroy and rx
> callbacks).  So this is shouldn't leak memory from a valgrind perspective
> either.

Indeed.

IMHO a socket, once created, should remain in the custody of its factory.  The factory should release its reference to the socket when either:
 1. the factory dies (and the factory dies when the node is disposed, at the end of the simulation)
 2. it is explicitly closed, by calling Socket::Close ()
 3. the remote end point closes the connection, for connection oriented sockets.

I don't think this costs much in terms of memory (at most 16 bytes per socket; that's less than 1.6 MiB for 10^5 sockets).  And like Raj said, having to close() sockets is already what happens in the real world.
Comment 11 Rajib Bhattacharjea 2008-03-27 20:48:20 UTC
Created attachment 119 [details]
This is Gustavo's test case worked out in full.
Comment 12 Rajib Bhattacharjea 2008-03-27 20:49:49 UTC
(In reply to comment #11)
> Created an attachment (id=119) [edit]
> This is gustavo
> 

Should read "This is Gustavo's test case worked out in full"
Comment 13 Rajib Bhattacharjea 2008-03-27 20:58:21 UTC
Created attachment 120 [details]
Proposed solution

To clarify to Gustavo:

The implementation is as follows:  Instead of having the creator of the UdpSocket (what you call the factory) maintain a reference to it, consuming an extra amount of memory, this approach effectively Refs the socket one more time to account for the pointer to the socket that exists in the endpoint callbacks (Ipv4EndPoint::m_rxCallback and m_destroyCallback).  From a reference-counting perspective, I contend this is the right thing to do anyway.

This fix, while somewhat orthogonal to the issue at hand, also happens to solve the problem.
Comment 14 Tom Henderson 2008-03-28 00:12:14 UTC
> IMHO a socket, once created, should remain under custodian of its factory.  The
> factory should remove its reference to the socket when either:
>  1. the factory dies (and the factory dies when the node is disposed, at the
> end of the simulation)
>  2. it is explicitly closed, by calling Socket::Close ()
>  3. the remote end point closes the connection, for connection oriented
> sockets.
> 
> I don't think this costs much in terms of memory (at most 16 bytes per socket;
> that's less than 1.6 MiB for 10^5 sockets).  And like Raj said, having to
> close() sockets is already what happens in the real world.
> 

Yes, but I think it is not completely clear cut.  We are debating the issue of what happens if the user forgets to close the socket.  In your proposal, we could leak sockets (descriptors) in such a case.  As presently implemented, we would delete the socket instead of closing it.  Neither really captures what happens in real code, in which close() is called when the process goes away before the user closed it.

Your and Raj's proposal would err on the side of keeping sockets around, and I would be fine with that, because I think it would be less intuitive to a user to have sockets disappear without even an implicit close() ever having occurred.  This is probably more relevant to consider with TCP but might be a problem with UDP if we ever model the case where the UDP socket didn't immediately send the datagram for some reason. 

There is the separate issue of whether ns-3 should do something when methods like Test::Run() are called before Simulator::Run() is called.  Clearly, using parts of the API outside of a running simulation context will produce undefined behavior.  Do we want to try to guard against this misuse somehow?

Comment 15 Gustavo J. A. M. Carneiro 2008-03-28 06:15:00 UTC
(In reply to comment #13)
> Created an attachment (id=120) [edit]
> Proposed solution
> 
> To clarify to Gustavo:
> 
> The implementation is as follows:  Instead of having the creator of the
> UdpSocket (what you call the factory) maintain a reference to it and consuming
> an extra amount of memory, this approach effectively Ref's the socket one more
> time to account for the pointer that exists to the socket in the endpoint
> callbacks (Ipv4EndPoint::m_rxCallback and m_destroyCallback).  From a reference
> counting perspective, I contend this is the right thing to do anyway.
> 
> This fix, while somewhat orthogonal to the issue at hand, also happens to solve
> the problem.

That sounds like a good solution as well, although it has slightly different semantics from what I proposed.  Under your solution, a reference is kept to the socket as long as there is an (rx, tx, destroy) callback associated with the socket.  That makes sense.  A socket with no callbacks which goes out of scope should just be freed, as it will not prevent anything from working.
Comment 16 Gustavo J. A. M. Carneiro 2008-03-28 06:34:46 UTC
(In reply to comment #14)
> Yes, but I think it is not completely clear cut.  We are debating the issue of
> what happens if the user forgets to close the socket.  In your proposal, we
> could leak sockets (descriptors) in such a case.  As presently implemented, we
> would delete the socket instead of closing it.  Neither really captures what
> happens in real code, in which close() is called when the process goes away
> before the user closed it.

If you wanted to capture what _really_ happens in the real world, then sockets would always have to belong to applications, and be conceptually attached to them:

  Ptr<Socket> SocketFactory::CreateSocket (Ptr<Application> application);

In the real world, sockets are created by _processes_ making a system call into the kernel; the kernel implicitly receives the process ID as a "parameter" and associates the newly created socket with the process.

In NS-3, because users do not want to have to write new application classes (ugh!), we would need a 'generic' application.

Ptr<GenericApplication> Node::CreateApplication ();

Applications would then release their own sockets when they are destroyed.

I think the above model would best mimic what happens in the real world.  However, if applications are only destroyed when a node is destroyed, or when the programmer explicitly destroys them, their sockets will consequently also only be destroyed at those times.  And so this ultimately realistic API reduces to my proposed solution, where sockets are destroyed explicitly or when the Node is destroyed.

I am in favour of making the Application "middle man" optional (personally I tend never to use it), and just allowing sockets to be used directly.

> 
> Your and Raj's proposal would err on the side of keeping sockets around, and I
> would be fine with that, because I think it would be less intuitive to a user
> to have sockets disappear without even an implicit close() ever having
> occurred.  This is probably more relevant to consider with TCP but might be a
> problem with UDP if we ever model the case where the UDP socket didn't
> immediately send the datagram for some reason. 

What if I create an OnOff application and forget to dispose of it?  It will keep sending traffic forever...  same problem.

> 
> there is the separate issue of whether ns-3 should do something when methods
> like Test::Run() are called before Simulation::Run() is called.  Clearly, it
> will provide undefined behavior to use parts of the API outside of a running
> simulation context.  Do we want to try to guard against this misuse somehow?
> 

The logical solution would be:
1. Most APIs, like socket creation, would fail an assertion when called while the simulator is not running;
2. this would force programmers to do the simulation initialization in a callback scheduled with Simulator::ScheduleNow.

But I don't think that would help.  The simulation initialization function would still create a local smart-pointer variable, create the socket, and it would still go out of scope and destroy the socket too early.
Comment 17 Gustavo J. A. M. Carneiro 2008-04-24 10:46:51 UTC
I just wanted to say that today I ran into the same socket error from the same student again, even after I explained the problem a few weeks ago.

But I give up trying to convince anyone anymore :-(
Comment 18 Gustavo J. A. M. Carneiro 2008-04-24 16:11:30 UTC
Raj, after today's conference call, you have the green light to commit your patch.
Comment 19 Gustavo J. A. M. Carneiro 2008-05-28 07:24:26 UTC
Is there anything blocking the patch?
Comment 20 Mathieu Lacage 2008-05-28 11:14:37 UTC
(In reply to comment #19)
> Is there anything blocking the patch?

I will happily apply it if someone can confirm that we have no valgrind-reported memory-leak regressions.

Comment 21 Gustavo J. A. M. Carneiro 2008-05-28 12:04:01 UTC
Created attachment 138 [details]
raj's patch updated to ns-3-dev

Regarding valgrind, waf check --valgrind does not report any problem, with or without the patch.  waf regression --valgrind reports errors related to packet tags, with or without the patch.
Comment 22 Mathieu Lacage 2008-05-28 12:13:59 UTC
(In reply to comment #21)

> Regarding valgrind, waf check --valgrind does not report any problem, with or
> without the patch.  waf regression --valgrind reports errors related to packet
> tags, with or without the patch.

Please, file a separate bug about the latter problem and merge this patch.



Comment 23 Gustavo J. A. M. Carneiro 2008-05-28 12:53:31 UTC
Filed bug #192 about the regressions.
Patch pushed.