Bugzilla – Bug 780
OLSR doesn't converge
Last modified: 2009-12-21 11:10:43 UTC
Created attachment 702 [details] How to reproduce OLSR is observed to not converge in simple 3-nodes varying topology. Description by Abdul Jabbar (a little bit modified): We place 3 nodes in a linear fashion. 1 ----------------- 2 ----------------- 3 (0,0) (250,0) (500,0) d=250 m d=250m After an initial settling time of 50 seconds (just to be safe), we start both the traffic and mobility (constant velocity mobility model via SetVelocity method). The transmission range is set to 300-350 meters and we send ping traffic from node 1 to node 2 staring from t = 50 seconds. We then move nodes 2 and 3 such that they end up interchanging their positions... First we bring node 3 very close to node 2. In fact, we bring 2 to the same node coordinates as 3. 1 ----------------- 2 -3 And then move the node 2 away until it reaches the original position of node 3. 1 ----------------- 3 ----------------- 2 (0,0) (250,0) (500,0) At this point, we set the velocity of all the nodes to zero, till the end of the simulation. We let the simulation run for couple hundred seconds after this point. And OLSR still doesn't converge on the routes! Note that throughout the simulation, there are no network partitions. There is always a path between every node pair. Test program is attached. Current PING summary: --- 10.1.1.2 ping statistics --- 150 packets transmitted, 69 received, 54% packet loss, time 149999ms rtt min/avg/max/mdev = 2/2.058/6/0.4815 ms Expected result: 0% packet loss. Note, that the expected result is achieved when OLSR is changed to AODV.
Created attachment 703 [details] patch to enable more debug logging for OLSR Here is a start of a patch to improve OLSR debugging capability, which adds a public method RoutingProtocol::Dump() which dumps to the logging output at log level DEBUG. This method can be scheduled in the test program by including #include "ns3/olsr-routing-protocol.h" and by adding statements such as: Ptr<olsr::RoutingProtocol> rp0 = adhocNodes.Get (0)->GetObject <olsr::RoutingProtocol> (); NS_ASSERT (rp0); Ptr<olsr::RoutingProtocol> rp1 = adhocNodes.Get (1)->GetObject <olsr::RoutingProtocol> (); NS_ASSERT (rp1); Ptr<olsr::RoutingProtocol> rp2 = adhocNodes.Get (2)->GetObject <olsr::RoutingProtocol> (); NS_ASSERT (rp2); Simulator::Schedule (Seconds (100.0), &olsr::RoutingProtocol::Dump, rp0); Simulator::Schedule (Seconds (100.0), &olsr::RoutingProtocol::Dump, rp1); Simulator::Schedule (Seconds (100.0), &olsr::RoutingProtocol::Dump, rp2); then try: NS_LOG="OlsrRoutingProtocol" ./waf --run abdul
I haven't looked into the cause yet, but here are the symptoms. The pings are not leaving the leftmost node (numbered 1 in the above ascii art) due to a route lookup failure. Before the mobility occurs, this is what a dump on node 1 looks like: 100s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1 100s -1 [node 0] OlsrRoutingProtocol:Dump(): Neighbor set 100s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.2 status:1 willingness:3 100s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.3 status:1 willingness:3 100s -1 [node 0] OlsrRoutingProtocol:Dump(): Two-hop neighbor set 100s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.2->10.1.1.3 100s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.2->10.1.1.1 100s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.3->10.1.1.2 100s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.3->10.1.1.1 100s -1 [node 0] OlsrRoutingProtocol:Dump(): Routing table 100s -1 [node 0] OlsrRoutingProtocol:Dump(): dest=10.1.1.2 --> next=10.1.1.2 via interface 1 100s -1 [node 0] OlsrRoutingProtocol:Dump(): dest=10.1.1.3 --> next=10.1.1.3 via interface 1 After the nodes have switched places (at time 150), here is a repeat of the dump on node 1: 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Neighbor set 150s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.2 status:0 willingness:3 150s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.3 status:1 willingness:3 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Two-hop neighbor set 150s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.3->10.1.1.2 150s -1 [node 0] OlsrRoutingProtocol:Dump(): 10.1.1.3->10.1.1.1 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Routing table 150s -1 [node 0] OlsrRoutingProtocol:Dump(): dest=10.1.1.3 --> next=10.1.1.3 via interface 1 The next place I would look at is why node 10.1.1.2 is not in the routing table at time 150 despite being in the two-hop neighbor set of node 10.1.1.3.
Tom's patch does not take into account that 2-hop neighbor tuples may exist in an "expired" state. Also modified to use the existing operator << for printing the tuples, so that all fields are printed. The result at t=150 is: 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Neighbor set 150s -1 [node 0] OlsrRoutingProtocol:Dump(): NeighborTuple(neighborMainAddr=10.1.1.2, status=NOT_SYM, willingness=3) 150s -1 [node 0] OlsrRoutingProtocol:Dump(): NeighborTuple(neighborMainAddr=10.1.1.3, status=SYM, willingness=3) 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Two-hop neighbor set 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Routing table 150s -1 [node 0] OlsrRoutingProtocol:Dump(): dest=10.1.1.3 --> next=10.1.1.3 via interface 1 150s -1 [node 0] OlsrRoutingProtocol:Dump(): So, basically node 0 has _no_ 2-hop neighbors whatsoever. Moreover, there's a symmetric link to node 10.1.1.3, therefore it is added to the routing table, but the link to node 10.1.1.2 is not symmetric, i.e. we can hear them but they can't hear us. The problem is not in the routing table computation, it is elsewhere. Link sensing is failing...
Created attachment 704 [details] patch to enable more debug logging for OLSR (modified) Forget what I said before about 2-hop neighbors, I had a small bug in my modification :P Here's the correct patch and what it dumps: 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Dumping for node with main address 10.1.1.1 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Neighbor set 150s -1 [node 0] OlsrRoutingProtocol:Dump(): NeighborTuple(neighborMainAddr=10.1.1.2, status=NOT_SYM, willingness=3) 150s -1 [node 0] OlsrRoutingProtocol:Dump(): NeighborTuple(neighborMainAddr=10.1.1.3, status=SYM, willingness=3) 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Two-hop neighbor set 150s -1 [node 0] OlsrRoutingProtocol:Dump(): TwoHopNeighborTuple(neighborMainAddr=10.1.1.3, twoHopNeighborAddr=10.1.1.2, expirationTime=154478525839ns) 150s -1 [node 0] OlsrRoutingProtocol:Dump(): TwoHopNeighborTuple(neighborMainAddr=10.1.1.3, twoHopNeighborAddr=10.1.1.1, expirationTime=154478525839ns) 150s -1 [node 0] OlsrRoutingProtocol:Dump(): Routing table 150s -1 [node 0] OlsrRoutingProtocol:Dump(): dest=10.1.1.3 --> next=10.1.1.3 via interface 1 150s -1 [node 0] OlsrRoutingProtocol:Dump(): So, here's the picture that node 0 sees: 10.1.1.1 --- 10.1.1.3 --- 10.1.1.2 10.1.1.1 ---> 10.1.1.2 (assymetric link) Based on this, the routing table should indeed have also a "10.1.1.2 via 10.1.1.3" entry. Bug in RoutingTableComputation.
Created attachment 705 [details] bug fix
(In reply to comment #5) > Created an attachment (id=705) [details] > bug fix I confirm that patch fixes the problem. Gustavo, could you convert test case, attached to this bug to trace-based regression test in src/routing/olsr/test/ ?
changeset: 5867:8c65da328ae9 tag: tip user: Gustavo J. A. M. Carneiro <gjc@inescporto.pt> date: Mon Dec 21 16:07:08 2009 +0000 summary: Fix Bug 780 (problem in RoutingTableComputation with asymetric links), while adding debugging methods to OLSR. Thanks Abdul Jabbar, Pavel Boyko, and Tom Henderson for test case and debug code.