Bug 802

Summary: Minstrel algorithm causes segmentation fault
Product: ns-3 Reporter: Kirill Andreev <andreev>
Component: wifi Assignee: Nicola Baldo <nicola>
Status: RESOLVED FIXED    
Severity: critical CC: dnlove, jpelkey, mathieu.lacage, nicola, ns-bugs, pablosproject
Priority: P2    
Version: ns-3-dev   
Hardware: All   
OS: All   
Attachments: Backtrace
avoid assert
fixed index out of bound and switch minstrel to low latency
fixed index out of bound and switch minstrel to low latency
minstrel valgrind output
minstrel_third_output
third new valgrind ouput
patch for bug 802 and 919
backtrace
Fixed Bug 802 and Bug 919
backtrace
backtrace
paolo's third modified example
backtrace with leak memory informatio

Description Kirill Andreev 2010-02-01 09:50:30 UTC
Created attachment 744 [details]
Backtrace

When I tried to use Minstrel in a mesh scenario, it failed with a segmentation fault.
Comment 1 duy 2010-03-10 14:14:16 UTC
I just ran ns-3-dev mesh example with 

mesh.SetRemoteStationManager("ns3::MinstrelWifiManager");

and I don't see the crash.  Sorry I had no idea you filed this bug a month ago.  Please cc me.
Comment 2 Nicola Baldo 2010-04-09 13:58:12 UTC
I can't make minstrel work in general:

nicola@pcnbaldo:~/locale/ns-3-dev$ ./waf --run examples/wireless/multirate
Waf: Entering directory `/home/nicola/locale/ns-3-dev/build'
Waf: Leaving directory `/home/nicola/locale/ns-3-dev/build'
'build' finished successfully (1.348s)
Scenario: 4
Rts Threshold: 2200
Name:  minstrel
Rate:  ns3::MinstrelWifiManager
Routing: 0
Mobility: 0
assert failed. file=../src/devices/wifi/wifi-remote-station-manager.cc, line=353, cond="found"

Note that examples/wireless/multirate is not run by test.py, so it is no surprise that we didn't spot the problem. I have similar problems when trying to convert other simulation programs so that they use minstrel.

I get the impression that this might be due to changeset 6065 which fixed Bug 602... any opinions with respect to this?
Comment 3 duy 2010-04-09 16:00:29 UTC
We need to have it run by test.py. I am working on a fix for it now.
Comment 4 duy 2010-04-10 18:13:16 UTC
It seems like Onoe fails as well. I have traced it down to the following sources of problems:

src/devices/wifi/minstrel-wifi-manager.cc

bool
MinstrelWifiManager::IsLowLatency (void) const
{
  return false;
}

and  

src/devices/wifi/wifi-remote-station-manager.cc:353

 if (!IsLowLatency ())
    {
      // Note: removing the packet below is wrong: what happens in case of retransmissions ???
      TxModeTag tag;
      bool found;
      found = ConstCast<Packet> (packet)->RemovePacketTag (tag);
      NS_ASSERT (found);
      return tag.GetDataMode ();
    }

I still don't understand how we are modeling the low-latency and non-low-latency cases (e.g. removing TxModeTag).
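The concern raised in the code comment above (the tag is removed, then the same packet is retransmitted) can be illustrated with a small self-contained sketch. FakePacket, RemoveTag and PeekTag are hypothetical stand-ins, not ns-3 types, and nothing in this thread establishes that retransmission is what actually trips the assert in the multirate example.

```cpp
// Hypothetical stand-in for a packet that carries a single TxModeTag-like tag.
struct FakePacket
{
  bool hasTag;
  int dataMode;
};

// Destructive lookup: copies the mode out and strips the tag from the packet.
constexpr bool RemoveTag (FakePacket &p, int &mode)
{
  if (!p.hasTag)
    {
      return false;
    }
  mode = p.dataMode;
  p.hasTag = false;
  return true;
}

// Non-destructive lookup: copies the mode out and leaves the tag in place.
constexpr bool PeekTag (const FakePacket &p, int &mode)
{
  if (!p.hasTag)
    {
      return false;
    }
  mode = p.dataMode;
  return true;
}

// Looks the tag up twice on the same packet, as a retransmission would,
// and reports whether the second destructive lookup still finds it.
constexpr bool SecondRemoveFinds ()
{
  FakePacket p{true, 3};
  int mode = 0;
  RemoveTag (p, mode);
  return RemoveTag (p, mode);
}

// Same double lookup, but without stripping the tag.
constexpr bool SecondPeekFinds ()
{
  FakePacket p{true, 3};
  int mode = 0;
  PeekTag (p, mode);
  return PeekTag (p, mode);
}
```

In this sketch the second destructive lookup returns false, which is the situation an NS_ASSERT (found) would reject, while the non-destructive variant keeps succeeding.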
Comment 5 Mathieu Lacage 2010-04-16 03:02:31 UTC
Created attachment 833 [details]
avoid assert

This avoids the backtrace but we hit a crash later in minstrel itself which appears to be trying to access an element past the end of an array in FindRate line 576.

==22131== Invalid read of size 1
==22131==    at 0x5485610: ns3::HighPrecision::Compare(ns3::HighPrecision const&) const (high-precision-128.h:227)
==22131==    by 0x54A9AEC: bool ns3::operator><1>(ns3::TimeUnit<1> const&, ns3::TimeUnit<1> const&) (nstime.h:257)
==22131==    by 0x5A802EA: ns3::MinstrelWifiManager::FindRate(ns3::MinstrelWifiRemoteStation*) (minstrel-wifi-manager.cc:576)
==22131==    by 0x5A81E01: ns3::MinstrelWifiManager::DoReportDataOk(ns3::WifiRemoteStation*, double, ns3::WifiMode, double) (minstrel-wifi-manager.cc:429)
Comment 6 Mathieu Lacage 2010-04-16 03:07:48 UTC
I did not test with onoe
Comment 7 Mathieu Lacage 2010-04-16 03:43:13 UTC
The later crash appears to come from a minstrel bug, i.e.,
MinstrelWifiManager::InitSampleTable initializes the sample table on line 787:

station->m_sampleTable[newIndex][col] = i+1;

Note the '+1' above: it sets an index in the sample table equal to GetNSupported (station) which leads GetNextSample later to return this invalid index in the minstrelTable.

Removing the '+1' or adding a (i+1)%GetNSupported(station) makes the code not crash for me.

I guess that this is something for duy to look at now.
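The off-by-one described above can be sketched in isolation. This is a minimal illustration, not the ns-3 code: nRates stands in for GetNSupported (station), and the helper names (BuggyEntry, WrappedEntry, PlainEntry, IsValidRateIndex) are hypothetical.

```cpp
#include <cstddef>

// Value stored by the original line: i + 1. For the last rate
// (i == nRates - 1) this equals nRates, one past the last valid index.
constexpr std::size_t BuggyEntry (std::size_t i)
{
  return i + 1;
}

// First suggested fix: wrap around so the value stays in [0, nRates).
constexpr std::size_t WrappedEntry (std::size_t i, std::size_t nRates)
{
  return (i + 1) % nRates;
}

// Second suggested fix: drop the '+1' and store i itself.
constexpr std::size_t PlainEntry (std::size_t i)
{
  return i;
}

// A stored value is only usable as a rate index if it is below nRates.
constexpr bool IsValidRateIndex (std::size_t index, std::size_t nRates)
{
  return index < nRates;
}
```

With 8 supported rates, BuggyEntry (7) is 8 and fails IsValidRateIndex, while WrappedEntry (7, 8) is 0 and PlainEntry (7) is 7, both of which are valid. The two fixes store different values, so they are equivalent only with respect to the crash, not with respect to the sampling order.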
Comment 8 duy 2010-04-16 11:41:14 UTC
Thanks Mathieu. It seems I didn't check for an out-of-bounds array index in that case. I am looking into it now.
Comment 9 duy 2010-04-17 02:36:14 UTC
Onoe and Amrr work fine with Mathieu's patch.

I agree that either removing the '+1' or adding a (i+1)%GetNSupported(station) should fix  the segmentation fault.

However, minstrel does not seem to produce correct results.  I am away this weekend, but I will look into it next week and hopefully find a fix for it soon.
Comment 10 Nicola Baldo 2010-04-17 05:51:20 UTC
(In reply to comment #9)
> Onoe and Amrr work fine with Mathieu's patch.

I confirm this as well.
I just pushed that patch (changeset:   6241:d9a65be745f0)


> I agree that either removing the '+1' or adding a (i+1)%GetNSupported(station)
> should fix  the segmentation fault.
> 
> However, minstrel does not seem to produce correct results.  I am away this
> weekend, but I will look into it next week and hopefully find a fix for it
> soon.

ok, let's wait until you have the chance to look into this issue.
Comment 11 duy 2010-04-20 17:23:14 UTC
Created attachment 840 [details]
fixed index out of bound and switch minstrel to low latency

Going back to the low-latency vs. high-latency device question: I am still not clear on the definition. To me, Minstrel seems to model a low-latency device because it uses a multi-retry chain (it sorts out 4 rates ready to be used in case of failures) to combat the delay of the next data packet. The packet feedback is used to update the statistics table. Minstrel does not work properly if it is set to high latency.
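A minimal sketch of what a 4-stage multi-retry chain looks like, assuming each stage carries a rate index and a number of attempts. ChainStage, RetryChain, RateForAttempt and ExampleChain are hypothetical names, the rate indices and try counts are made up, and this is not the layout used in minstrel-wifi-manager.cc.

```cpp
#include <cstddef>

// Hypothetical retry-chain stage: a rate index and how many attempts it gets.
struct ChainStage
{
  std::size_t rateIndex;
  unsigned maxTries;
};

// Hypothetical 4-stage chain, prepared before the packet is first sent.
struct RetryChain
{
  ChainStage stages[4];
};

// Returns the rate index to use for the given 0-based transmission attempt.
// Attempts beyond the end of the chain keep using the last stage.
constexpr std::size_t RateForAttempt (const RetryChain &chain, unsigned attempt)
{
  unsigned consumed = 0;
  for (std::size_t s = 0; s < 4; ++s)
    {
      consumed += chain.stages[s].maxTries;
      if (attempt < consumed)
        {
          return chain.stages[s].rateIndex;
        }
    }
  return chain.stages[3].rateIndex;
}

// Made-up example chain: rates 7, 5, 3, 0 with two tries each.
constexpr RetryChain ExampleChain ()
{
  return RetryChain{{{7, 2}, {5, 2}, {3, 2}, {0, 2}}};
}
```

Because the fallback rates are chosen up front, a failed attempt moves to the next stage without waiting for the statistics to be recomputed.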
Comment 12 duy 2010-04-20 17:39:41 UTC
Created attachment 841 [details]
fixed index out of bound and switch minstrel to low latency

removing "+1" instead.
Comment 13 Mathieu Lacage 2010-04-21 02:55:45 UTC
nak. minstrel is high latency. See my original arf/aarf paper for a definition.
Comment 14 Nicola Baldo 2010-04-21 07:16:26 UTC
(In reply to comment #13)
> nak. minstrel is high latency. See my original arf/aarf paper for a definition.

I guess you mean that the original minstrel in madwifi is high latency, right?


(In reply to comment #11)
> Minstrel does not work properly if it is set to
> high latency.

I get the impression that the minstrel implementation in ns-3 is low latency, in that you update statistics every time DoReportDataOk and DoReportDataFailed are called. These updated stats are then used for selecting the rate for the next transmission attempt.

Duy, would it be possible to change how stats are updated so that minstrel can be high latency as it is in the real world? 

Ideally, if it were possible to support both low and high latency by setting the corresponding attribute, it would be great.
Comment 15 Mathieu Lacage 2010-04-21 07:29:07 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > nak. minstrel is high latency. See my original arf/aarf paper for a definition.
> 
> I guess you mean that the original minstrel in madwifi is high latency, right?

Yes, the original minstrel algorithm was designed to work on high-latency hardware.

> (In reply to comment #11)
> > Minstrel does not work properly if it is set to
> > high latency.
> 
> I get the impression that the minstrel implementation in ns-3 is low latency,
> in that you update statistics every time DoReportDataOk and DoReportDataFailed
> are called. These updated stats are then used for selecting the rate for the
> next transmission attempt.
> 
> Duy, would it be possible to change how stats are updated so that minstrel can
> be high latency as it is in the real world? 
> 
> Ideally, if it were possible to support both low and high latency by setting
> the corresponding attribute, it would be great.

I did that at some point for amrr/onoe but it's really hard to get right so, I would advise against it.
Comment 16 Nicola Baldo 2010-04-21 09:42:52 UTC
> I did that at some point for amrr/onoe but it's really hard to get right so, I
> would advise against it.

Ok, so I propose this solution for now:

1) apply duy's latest patch

2) update minstrel's doxygen documentation (or rather create it, since there is nothing currently) so that we clearly say that the implementation is low-latency unlike the one in madwifi.

What do you think?
Comment 17 Mathieu Lacage 2010-04-21 09:46:37 UTC
(In reply to comment #16)
> 2) update minstrel's doxygen documentation (or rather create it, since there is
> nothing currently) so that we clearly say that the implementation is
> low-latency unlike the one in madwifi.

I don't care much about minstrel myself but I feel that to make the ns-3 model useful, it should model the original minstrel algorithm so, if we do the above, we should file a bug against ns-3 saying that we want to make our model high latency. It's your call though.
Comment 18 Nicola Baldo 2010-04-21 10:51:24 UTC
> I don't care much about minstrel myself but I feel that to make the ns-3 model
> useful, it should model the original minstrel algorithm so, if we do the above,
> we should file a bug against ns-3 saying that we want to make our model high
> latency. It's your call though.

ok now we have bug 889 to track that issue.

As for the current bug, I'll close it as soon as we are allowed to push duy's patch (we should be in code freeze now).
Comment 19 duy 2010-04-21 11:22:53 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > nak. minstrel is high latency. See my original arf/aarf paper for a definition.
> I guess you mean that the original minstrel in madwifi is high latency, right?
> (In reply to comment #11)
> > Minstrel does not work properly if it is set to
> > high latency.
> I get the impression that the minstrel implementation in ns-3 is low latency,
> in that you update statistics every time DoReportDataOk and DoReportDataFailed
> are called. These updated stats are then used for selecting the rate for the
> next transmission attempt.
> Duy, would it be possible to change how stats are updated so that minstrel can
> be high latency as it is in the real world? 

Yes, I will do this for the next release because it's a bit late to make major changes to minstrel implementation.  



> Ideally, if it were possible to support both low and high latency by setting
> the corresponding attribute, it would be great.
Comment 20 Nicola Baldo 2010-04-21 11:53:27 UTC
(In reply to comment #19)
> Yes, I will do this for the next release because it's a bit late to make major
> changes to minstrel implementation.  
> 

Thank you for your commitment!
Let's continue that discussion on bug 889
Comment 21 Nicola Baldo 2010-04-22 12:53:02 UTC
changeset   6268:84e114d34b89
Comment 22 Paolo Tagliani 2010-05-25 04:27:55 UTC
Hi everybody. I'm working on a wifi network, and Minstrel always crashes for me.
I modified the third.cc example from the tutorial: I only changed the wifi manager to Minstrel, set all the stations on the wifi network as UDP clients, and send 3 UDP echo packets. If I switch the wifi manager to, for example, Aarf, the simulation works fine.
My code is posted below:


#include "ns3/core-module.h"
#include "ns3/simulator-module.h"
#include "ns3/node-module.h"
#include "ns3/helper-module.h"
#include "ns3/wifi-module.h"
#include "ns3/mobility-module.h"

// Default Network Topology
//
//   Wifi 10.1.3.0
//                 AP   
//  *    *    *    *
//  |    |    |    |    10.1.1.0
// n5   n6   n7   n0 -------------- n1   n2   n3   n4
//                   point-to-point  |    |    |    |
//                                   ================
//                                     LAN 10.1.2.0

using namespace ns3;

NS_LOG_COMPONENT_DEFINE ("ThirdScriptExample");

int 
main (int argc, char *argv[])
{
  bool verbose = true;
  uint32_t nCsma = 3;
  uint32_t nWifi = 3;

  CommandLine cmd;
  cmd.AddValue ("nCsma", "Number of \"extra\" CSMA nodes/devices", nCsma);
  cmd.AddValue ("nWifi", "Number of wifi STA devices", nWifi);
  cmd.AddValue ("verbose", "Tell echo applications to log if true", verbose);

  cmd.Parse (argc,argv);

  if (verbose)
    {
      LogComponentEnable("UdpEchoClientApplication", LOG_LEVEL_INFO);
      LogComponentEnable("UdpEchoServerApplication", LOG_LEVEL_INFO);
    }

  NodeContainer p2pNodes;
  p2pNodes.Create (2);

  PointToPointHelper pointToPoint;
  pointToPoint.SetDeviceAttribute ("DataRate", StringValue ("5Mbps"));
  pointToPoint.SetChannelAttribute ("Delay", StringValue ("2ms"));

  NetDeviceContainer p2pDevices;
  p2pDevices = pointToPoint.Install (p2pNodes);

  NodeContainer csmaNodes;
  csmaNodes.Add (p2pNodes.Get (1));
  csmaNodes.Create (nCsma);

  CsmaHelper csma;
  csma.SetChannelAttribute ("DataRate", StringValue ("100Mbps"));
  csma.SetChannelAttribute ("Delay", TimeValue (NanoSeconds (6560)));

  NetDeviceContainer csmaDevices;
  csmaDevices = csma.Install (csmaNodes);

  NodeContainer wifiStaNodes;
  wifiStaNodes.Create (nWifi);
  NodeContainer wifiApNode = p2pNodes.Get (0);

  YansWifiChannelHelper channel = YansWifiChannelHelper::Default ();
  YansWifiPhyHelper phy = YansWifiPhyHelper::Default ();
  phy.SetChannel (channel.Create ());

  WifiHelper wifi = WifiHelper::Default ();
  wifi.SetRemoteStationManager ("ns3::MinstrelWifiManager");

  NqosWifiMacHelper mac = NqosWifiMacHelper::Default ();
  
  Ssid ssid = Ssid ("ns-3-ssid");
  mac.SetType ("ns3::NqstaWifiMac", 
    "Ssid", SsidValue (ssid),
    "ActiveProbing", BooleanValue (false));

  NetDeviceContainer staDevices;
  staDevices = wifi.Install (phy, mac, wifiStaNodes);

  mac.SetType ("ns3::NqapWifiMac", 
    "Ssid", SsidValue (ssid));

  NetDeviceContainer apDevices;
  apDevices = wifi.Install (phy, mac, wifiApNode);

  MobilityHelper mobility;

  mobility.SetPositionAllocator ("ns3::GridPositionAllocator",
    "MinX", DoubleValue (0.0),
    "MinY", DoubleValue (0.0),
    "DeltaX", DoubleValue (5.0),
    "DeltaY", DoubleValue (10.0),
    "GridWidth", UintegerValue (3),
    "LayoutType", StringValue ("RowFirst"));

  mobility.SetMobilityModel ("ns3::RandomWalk2dMobilityModel",
    "Bounds", RectangleValue (Rectangle (-50, 50, -50, 50)));
  mobility.Install (wifiStaNodes);

  mobility.SetMobilityModel ("ns3::ConstantPositionMobilityModel");
  mobility.Install (wifiApNode);

  InternetStackHelper stack;
  stack.Install (csmaNodes);
  stack.Install (wifiApNode);
  stack.Install (wifiStaNodes);

  Ipv4AddressHelper address;

  address.SetBase ("10.1.1.0", "255.255.255.0");
  Ipv4InterfaceContainer p2pInterfaces;
  p2pInterfaces = address.Assign (p2pDevices);

  address.SetBase ("10.1.2.0", "255.255.255.0");
  Ipv4InterfaceContainer csmaInterfaces;
  csmaInterfaces = address.Assign (csmaDevices);

  address.SetBase ("10.1.3.0", "255.255.255.0");
  address.Assign (staDevices);
  address.Assign (apDevices);

  UdpEchoServerHelper echoServer (9);

  ApplicationContainer serverApps = echoServer.Install (csmaNodes.Get (nCsma));
  serverApps.Start (Seconds (1.0));
  serverApps.Stop (Seconds (10.0));

  UdpEchoClientHelper echoClient (csmaInterfaces.GetAddress (nCsma), 9);
  echoClient.SetAttribute ("MaxPackets", UintegerValue (3));
  echoClient.SetAttribute ("Interval", TimeValue (Seconds (1.)));
  echoClient.SetAttribute ("PacketSize", UintegerValue (1024));

  ApplicationContainer clientApps = 
    echoClient.Install (wifiStaNodes);
  clientApps.Start (Seconds (2.0));
  clientApps.Stop (Seconds (10.0));

  Ipv4GlobalRoutingHelper::PopulateRoutingTables ();

  Simulator::Stop (Seconds (10.0));

  pointToPoint.EnablePcapAll ("third");
  phy.EnablePcap ("third", apDevices.Get (0));
  csma.EnablePcap ("third", csmaDevices.Get (0), true);

  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}
Comment 23 duy 2010-05-25 11:46:51 UTC
(In reply to comment #22)
> Hi everybody. I'm working on a wifi network, and Minstrel always crashes for me.
> I modified the third.cc example from the tutorial: I only changed the wifi
> manager to Minstrel, set all the stations on the wifi network as UDP clients,
> and send 3 UDP echo packets. If I switch the wifi manager to, for example,
> Aarf, the simulation works fine.

Do you get a backtrace similar to this one? It doesn't seem to be coming from Minstrel but from some library issue. I'll take a look at it again.

Sent 1024 bytes to 10.1.2.4
*** glibc detected *** /cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third: double free or corruption (out): 0x000000001a246900 ***
======= Backtrace: =========
/lib64/libc.so.6[0x34aa6722ef]
/lib64/libc.so.6(cfree+0x4b)[0x34aa67273b]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so[0x2ae3055cdf7f]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns36Buffer7RecycleEPNS_10BufferDataE+0x185)[0x2ae3055d15a9]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns36BufferD1Ev+0x260)[0x2ae3055d2e58]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns36PacketD1Ev+0xdd)[0x2ae3055f93f9]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns314DefaultDeleterINS_6PacketEE6DeleteEPS1_+0x24)[0x2ae3055f9450]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZNK3ns314SimpleRefCountINS_6PacketENS_5emptyENS_14DefaultDeleterIS1_EEE5UnrefEv+0x2e)[0x2ae3055f948a]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns33PtrINS_6PacketEED1Ev+0x27)[0x2ae3055f9541]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so[0x2ae305a400a3]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns314DefaultDeleterINS_9EventImplEE6DeleteEPS1_+0x27)[0x2ae30559ad81]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZNK3ns314SimpleRefCountINS_9EventImplENS_5emptyENS_14DefaultDeleterIS1_EEE5UnrefEv+0x32)[0x2ae30559adb6]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns33PtrINS_9EventImplEEaSERKS2_+0x40)[0x2ae30559ae3a]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns37EventIdaSERKS0_+0x1d)[0x2ae30559ae7d]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns311YansWifiPhy18StartReceivePacketENS_3PtrINS_6PacketEEEdNS_8WifiModeENS_12WifiPreambleE+0xd47)[0x2ae305a3fd27]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZNK3ns315YansWifiChannel7ReceiveEjNS_3PtrINS_6PacketEEEdNS_8WifiModeENS_12WifiPreambleE+0x64)[0x2ae305a480dc]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so[0x2ae305a481c7]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns39EventImpl6InvokeEv+0x2f)[0x2ae30558746f]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns320DefaultSimulatorImpl15ProcessOneEventEv+0x226)[0x2ae3055a3820]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns320DefaultSimulatorImpl3RunEv+0x1f)[0x2ae3055a386b]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns39Simulator3RunEv+0x118)[0x2ae30558e6b2]
/cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third[0x40e08c]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x34aa61d994]
/cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third(__gxx_personality_v0+0x449)[0x409849]
======= Memory map: ========
00400000-00416000 r-xp 00000000 00:13 15237718                           /cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third
00615000-00616000 rw-p 00015000 00:13 15237718                           /cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third
1a1a8000-1a24d000 rw-p 1a1a8000 00:00 0                                  [heap]
34aa200000-34aa21c000 r-xp 00000000 fd:00 2676766                        /lib64/ld-2.5.so
34aa41b000-34aa41c000 r--p 0001b000 fd:00 2676766                        /lib64/ld-2.5.so
34aa41c000-34aa41d000 rw-p 0001c000 fd:00 2676766                        /lib64/ld-2.5.so
34aa600000-34aa74d000 r-xp 00000000 fd:00 2676776                        /lib64/libc-2.5.so
34aa74d000-34aa94d000 ---p 0014d000 fd:00 2676776                        /lib64/libc-2.5.so
34aa94d000-34aa951000 r--p 0014d000 fd:00 2676776                        /lib64/libc-2.5.so
34aa951000-34aa952000 rw-p 00151000 fd:00 2676776                        /lib64/libc-2.5.so
34aa952000-34aa957000 rw-p 34aa952000 00:00 0
34aaa00000-34aaa82000 r-xp 00000000 fd:00 2676837                        /lib64/libm-2.5.so
34aaa82000-34aac81000 ---p 00082000 fd:00 2676837                        /lib64/libm-2.5.so
34aac81000-34aac82000 r--p 00081000 fd:00 2676837                        /lib64/libm-2.5.so
34aac82000-34aac83000 rw-p 00082000 fd:00 2676837                        /lib64/libm-2.5.so
34aae00000-34aae02000 r-xp 00000000 fd:00 2676836                        /lib64/libdl-2.5.so
34aae02000-34ab002000 ---p 00002000 fd:00 2676836                        /lib64/libdl-2.5.so
34ab002000-34ab003000 r--p 00002000 fd:00 2676836                        /lib64/libdl-2.5.so
34ab003000-34ab004000 rw-p 00003000 fd:00 2676836                        /lib64/libdl-2.5.so
34ab200000-34ab216000 r-xp 00000000 fd:00 2676831                        /lib64/libpthread-2.5.so
34ab216000-34ab415000 ---p 00016000 fd:00 2676831                        /lib64/libpthread-2.5.so
34ab415000-34ab416000 r--p 00015000 fd:00 2676831                        /lib64/libpthread-2.5.so
34ab416000-34ab417000 rw-p 00016000 fd:00 2676831                        /lib64/libpthread-2.5.so
34ab417000-34ab41b000 rw-p 34ab417000 00:00 0
34ab600000-34ab614000 r-xp 00000000 fd:00 629619                         /usr/lib64/libz.so.1.2.3
34ab614000-34ab813000 ---p 00014000 fd:00 629619                         /usr/lib64/libz.so.1.2.3
34ab813000-34ab814000 rw-p 00013000 fd:00 629619                         /usr/lib64/libz.so.1.2.3
34aba00000-34aba07000 r-xp 00000000 fd:00 2676832                        /lib64/librt-2.5.so
34aba07000-34abc07000 ---p 00007000 fd:00 2676832                        /lib64/librt-2.5.so
34abc07000-34abc08000 r--p 00007000 fd:00 2676832                        /lib64/librt-2.5.so
34abc08000-34abc09000 rw-p 00008000 fd:00 2676832                        /lib64/librt-2.5.so
34aca00000-34aca59000 r-xp 00000000 fd:00 628437                         /usr/lib64/libsqlite3.so.0.8.6
34aca59000-34acc58000 ---p 00059000 fd:00 628437                         /usr/lib64/libsqlite3.so.0.8.6
34acc58000-34acc5b000 rw-p 00058000 fd:00 628437                         /usr/lib64/libsqlite3.so.0.8.6
34b5c00000-34b5d33000 r-xp 00000000 fd:00 635016                         /usr/lib64/libxml2.so.2.6.26
34b5d33000-34b5f33000 ---p 00133000 fd:00 635016                         /usr/lib64/libxml2.so.2.6.26
34b5f33000-34b5f3c000 rw-p 00133000 fd:00 635016                         /usr/lib64/libxml2.so.2.6.26
34b5f3c000-34b5f3d000 rw-p 34b5f3c000 00:00 0
34b7c00000-34b7c0d000 r-xp 00000000 fd:00 2676553                        /lib64/libgcc_s-4.1.2-20080825.so.1
34b7c0d000-34b7e0d000 ---p 0000d000 fd:00 2676553                        /lib64/libgcc_s-4.1.2-20080825.so.1
34b7e0d000-34b7e0e000 rw-p 0000d000 fd:00 2676553                        /lib64/libgcc_s-4.1.2-20080825.so.1
34bc800000-34bc8e6000 r-xp 00000000 fd:00 627764                         /usr/lib64/libstdc++.so.6.0.8
34bc8e6000-34bcae5000 ---p 000e6000 fd:00 627764                         /usr/lib64/libstdc++.so.6.0.8
34bcae5000-34bcaeb000 r--p 000e5000 fd:00 627764                         /usr/lib64/libstdc++.so.6.0.8
34bcaeb000-34bcaee000 rw-p 000eb000 fd:00 627764                         /usr/lib64/libstdc++.so.6.0.8
34bcaee000-34bcb00000 rw-p 34bcaee000 00:00 0
2ae304cd9000-2ae304cdb000 rw-p 2ae304cd9000 00:00 0
2ae304cdb000-2ae3060ab000 r-xp 00000000 00:13 15237739                   /cse/grads/duy/ns-3-dev/build/debug/libns3.so
2ae3060ab000-2ae3062aa000 ---p 013d0000 00:13 15237739                   /cse/grads/duy/ns-3-dev/build/debug/libns3.so
2ae3062aa000-2ae306347000 rw-p 013cf000 00:13 15237739                   /cse/grads/duy/ns-3-dev/build/debug/libns3.so
2ae306347000-2ae30634d000 rw-p 2ae306347000 00:00 0
2ae306367000-2ae30636c000 rw-p 2ae306367000 00:00 0
7fff34113000-7fff34128000 rw-p 7ffffffea000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
Comment 24 Mathieu Lacage 2010-05-25 11:48:53 UTC
(In reply to comment #23)

> Do you get a backtrace similar to this one? It doesn't seem to be coming from
> Minstrel but from some library issue. I'll take a look at it again.
> 
> Sent 1024 bytes to 10.1.2.4
> *** glibc detected ***

run the example in valgrind.
Comment 25 Paolo Tagliani 2010-05-25 13:22:34 UTC
That's my output (below). The strange thing is that if I only replace the string "ns3::MinstrelWifiManager" with any other wifi manager, everything works perfectly. I'm also having problems with minstrel on a wifi network with multiple nodes that act as UDP echo clients.



paolo@paolo-laptop:~/Repos/ns-3-dev$ valgrind -v ./waf --run scratch/thirdModificato
==32059== Memcheck, a memory error detector
==32059== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==32059== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==32059== Command: ./waf --run scratch/thirdModificato
==32059== 
--32059-- Valgrind options:
--32059--    --suppressions=/usr/lib/valgrind/debian-libc6-dbg.supp
--32059--    -v
--32059-- Contents of /proc/version:
--32059--   Linux version 2.6.32-22-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #33-Ubuntu SMP Wed Apr 28 13:28:05 UTC 2010
--32059-- Arch and hwcaps: AMD64, amd64-sse3-cx16
--32059-- Page sizes: currently 4096, max supported 4096
--32059-- Valgrind library directory: /usr/lib/valgrind
--32059-- Reading syms from /usr/bin/env (0x400000)
--32059-- Reading debug info from /usr/bin/env ..
--32059-- .. CRC mismatch (computed 2b996bfb wanted 50cb270b)
--32059--    object doesn't have a symbol table
--32059-- Reading syms from /lib/ld-2.11.1.so (0x4000000)
--32059-- Reading debug info from /lib/ld-2.11.1.so ..
--32059-- .. CRC mismatch (computed e1ab2e55 wanted 8e29b093)
--32059-- Reading debug info from /usr/lib/debug/lib/ld-2.11.1.so ..
--32059-- Reading syms from /usr/lib/valgrind/memcheck-amd64-linux (0x38000000)
--32059--    object doesn't have a dynamic symbol table
--32059-- Reading suppressions file: /usr/lib/valgrind/debian-libc6-dbg.supp
--32059-- Reading suppressions file: /usr/lib/valgrind/default.supp
--32059-- REDIR: 0x4018470 (strlen) redirected to 0x380402d7 (vgPlain_amd64_linux_REDIR_FOR_strlen)
--32059-- Reading syms from /usr/lib/valgrind/vgpreload_core-amd64-linux.so (0x4a22000)
--32059-- Reading syms from /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so (0x4c24000)
==32059== WARNING: new redirection conflicts with existing -- ignoring it
--32059--     new: 0x04018470 (strlen              ) R-> 0x04c28710 strlen
--32059-- REDIR: 0x40182e0 (index) redirected to 0x4c28320 (index)
--32059-- REDIR: 0x4018360 (strcmp) redirected to 0x4c28cf0 (strcmp)
--32059-- Reading syms from /lib/libc-2.11.1.so (0x4e2d000)
--32059-- Reading debug info from /lib/libc-2.11.1.so ..
--32059-- .. CRC mismatch (computed 998a00c2 wanted 045c8b93)
--32059-- Reading debug info from /usr/lib/debug/lib/libc-2.11.1.so ..
--32059-- REDIR: 0x4eb1ad0 (__GI_strrchr) redirected to 0x4c28140 (__GI_strrchr)
--32059-- REDIR: 0x4eb1aa0 (rindex) redirected to 0x4a225dc (_vgnU_ifunc_wrapper)
==32059== WARNING: new redirection conflicts with existing -- ignoring it
--32059--     new: 0x04eb1ad0 (__GI_strrchr        ) R-> 0x04c28110 rindex
--32059-- REDIR: 0x4eae5e0 (__GI_strcmp) redirected to 0x4c28ca0 (__GI_strcmp)
--32059-- REDIR: 0x4eb0010 (__GI_strlen) redirected to 0x4c286d0 (__GI_strlen)
--32059-- REDIR: 0x4eb0220 (__GI_strncmp) redirected to 0x4c28be0 (__GI_strncmp)
--32059-- REDIR: 0x4eae520 (__GI_strchr) redirected to 0x4c28220 (__GI_strchr)
--32059-- REDIR: 0x4eb5220 (strchrnul) redirected to 0x4c29a10 (strchrnul)
--32059-- REDIR: 0x4ea9520 (malloc) redirected to 0x4c27426 (malloc)
--32059-- REDIR: 0x4eb3350 (mempcpy) redirected to 0x4c29a80 (mempcpy)
--32059-- REDIR: 0x4eb3c30 (memcpy) redirected to 0x4c28dc0 (memcpy)
--32059-- REDIR: 0x4eaade0 (free) redirected to 0x4c27036 (free)
--32059-- REDIR: 0x4eb21e0 (memchr) redirected to 0x4c28d90 (memchr)
--32059-- REDIR: 0x4eaaf90 (realloc) redirected to 0x4c274d7 (realloc)
--32059-- REDIR: 0x4eb0060 (strnlen) redirected to 0x4c28630 (strnlen)
--32059-- REDIR: 0x4eb3990 (__GI_stpcpy) redirected to 0x4c296c0 (__GI_stpcpy)
--32059-- REDIR: 0x4eafa60 (__GI_strcpy) redirected to 0x4c28800 (__GI_strcpy)
--32059-- REDIR: 0x4eb51d0 (__GI___rawmemchr) redirected to 0x4c29a60 (__GI___rawmemchr)
--32059-- REDIR: 0x4eae4f0 (index) redirected to 0x4a225dc (_vgnU_ifunc_wrapper)
==32059== WARNING: new redirection conflicts with existing -- ignoring it
--32059--     new: 0x04eae520 (__GI_strchr         ) R-> 0x04c281e0 index
Waf: Entering directory `/home/paolo/Repos/ns-3-dev/build'
Waf: Leaving directory `/home/paolo/Repos/ns-3-dev/build'
'build' finished successfully (0.498s)
Command ['/home/paolo/Repos/ns-3-dev/build/debug/scratch/thirdModificato'] terminated with signal SIGSEGV. Run it under a debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").
Comment 26 Mathieu Lacage 2010-05-25 14:26:29 UTC
(In reply to comment #25)
> paolo@paolo-laptop:~/Repos/ns-3-dev$ valgrind -v ./waf --run
> scratch/thirdModificato

No, try:
./waf --shell
valgrind ./build/debug/scratch/third
Comment 27 duy 2010-05-25 14:55:13 UTC
(In reply to comment #26)
> (In reply to comment #25)
> > paolo@paolo-laptop:~/Repos/ns-3-dev$ valgrind -v ./waf --run
> > scratch/thirdModificato
> 
> No, try:
> ./waf --shell
> valgrind ./build/debug/scratch/third

This is the output under valgrind. I have no clue what the problem is. Could this be related to the minstrel memory leak problem in http://www.nsnam.org/bugzilla/ ? Any suggestions?


==3658== Memcheck, a memory error detector.
==3658== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==3658== Using LibVEX rev 1658, a library for dynamic binary translation.
==3658== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==3658== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==3658== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==3658== For more details, rerun with: -v
==3658==
./build/debug/scratch/third: error while loading shared libraries: libns3.so: cannot open shared object file: No such file or directory
==3658== Jump to the invalid address stated on the next line
==3658==    at 0x2CE: ???
==3658==    by 0x34AA20D09B: _dl_signal_error (in /lib64/ld-2.5.so)
==3658==    by 0x34AA20C538: _dl_map_object_deps (in /lib64/ld-2.5.so)
==3658==    by 0x34AA203223: dl_main (in /lib64/ld-2.5.so)
==3658==    by 0x34AA21334A: _dl_sysdep_start (in /lib64/ld-2.5.so)
==3658==    by 0x34AA201387: _dl_start (in /lib64/ld-2.5.so)
==3658==    by 0x34AA200A77: (within /lib64/ld-2.5.so)
==3658==  Address 0x2CE is not stack'd, malloc'd or (recently) free'd
==3658==
==3658== Process terminating with default action of signal 11 (SIGSEGV)
==3658==  Bad permissions for mapped region at address 0x2CE
==3658==    at 0x2CE: ???
==3658==    by 0x34AA20D09B: _dl_signal_error (in /lib64/ld-2.5.so)
==3658==    by 0x34AA20C538: _dl_map_object_deps (in /lib64/ld-2.5.so)
==3658==    by 0x34AA203223: dl_main (in /lib64/ld-2.5.so)
==3658==    by 0x34AA21334A: _dl_sysdep_start (in /lib64/ld-2.5.so)
==3658==    by 0x34AA201387: _dl_start (in /lib64/ld-2.5.so)
==3658==    by 0x34AA200A77: (within /lib64/ld-2.5.so)
==3658==
==3658== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
==3658== malloc/free: in use at exit: 0 bytes in 0 blocks.
==3658== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==3658== For counts of detected errors, rerun with: -v
==3658== All heap blocks were freed -- no leaks are possible.
Comment 28 Mathieu Lacage 2010-05-26 02:29:12 UTC
(In reply to comment #27)

> > No, try:
> > ./waf --shell
> > valgrind ./build/debug/scratch/third

[read again what I wrote above and make sure you run ./waf shell]

> ./build/debug/scratch/third: error while loading shared libraries: libns3.so:
> cannot open shared object file: No such file or directory

The above tells you what is wrong here: it can't find libns3.so, which means you have not run ./waf shell.
Comment 29 duy 2010-05-26 03:12:28 UTC
Created attachment 879 [details]
minstrel valgrind output

The valgrind output is attached.  It seems to be related to the bug 919 memory leak (http://www.nsnam.org/bugzilla/).
Comment 30 Paolo Tagliani 2010-05-26 03:30:49 UTC
Created attachment 880 [details]
minstrel_third_output

This is my output; I ran ./waf shell first.
Comment 31 duy 2010-05-26 14:41:42 UTC
Created attachment 882 [details]
third new valgrind ouput

I just patched the memory leak problem in minstrel (Bug 919), yet this bug remains.  The new valgrind output is attached.

It points to:
==4079== Invalid read of size 4
==4079==    at 0x47D129E: ns3::TimeUnit<1>::operator=(ns3::TimeUnit<1> const&) (nstime.h:424)
==4079==    by 0x4DFF96F: ns3::MinstrelWifiManager::UpdateStats(ns3::MinstrelWifiRemoteStation*) (minstrel-wifi-manager.cc:574)
==4079==    by 0x4E008C7: ns3::MinstrelWifiManager::DoGetDataMode(ns3::WifiRemoteStation*, unsigned) (minstrel-wifi-manager.cc:429)
Comment 32 Mathieu Lacage 2010-05-26 14:47:13 UTC
(In reply to comment #31)
> Created an attachment (id=882) [details]
> third new valgrind ouput
> 
> I just patched the memory leak problem in minstrel (Bug 919), yet this bug
> remains.  The new valgrind output is attached.

valgrind is telling you that you are using a TimeUnit object which is located in an unallocated or uninitialized area of memory, probably because you are accessing an array out of bounds, or an area that was never created or has already been destroyed.
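
A minimal sketch of the failure mode described here, not the actual minstrel code: a per-station table is sized once and later read with an index that may exceed that size. The names `RateInfo`, `StationTable`, and `SafeLookup` are hypothetical, and a plain integer stands in for the TimeUnit member that valgrind flagged.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one row of a per-rate statistics table.
struct RateInfo
{
  std::int64_t perfectTxTime = 0; // stand-in for a Time member
};

// Hypothetical per-station table. The vector remembers its own size, so the
// lookup can reject indices that a raw array would silently read past.
class StationTable
{
public:
  explicit StationTable (std::size_t nRates) : m_rates (nRates) {}

  // Returns nullptr instead of reading memory outside the table.
  const RateInfo *SafeLookup (std::size_t index) const
  {
    return index < m_rates.size () ? &m_rates[index] : nullptr;
  }

  std::size_t Size () const { return m_rates.size (); }

private:
  std::vector<RateInfo> m_rates;
};
```

With a raw array, reading row 4 of a 4-row table is the kind of access that valgrind reports as an invalid read; here the same request returns a null pointer that the caller can check.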

> 
> It points to:
> ==4079== Invalid read of size 4
> ==4079==    at 0x47D129E: ns3::TimeUnit<1>::operator=(ns3::TimeUnit<1> const&)
> (nstime.h:424)
> ==4079==    by 0x4DFF96F:
> ns3::MinstrelWifiManager::UpdateStats(ns3::MinstrelWifiRemoteStation*)
> (minstrel-wifi-manager.cc:574)
> ==4079==    by 0x4E008C7:
> ns3::MinstrelWifiManager::DoGetDataMode(ns3::WifiRemoteStation*, unsigned)
> (minstrel-wifi-manager.cc:429)
Comment 33 duy 2010-05-26 20:14:59 UTC
Created attachment 883 [details]
patch for bug 802 and 919

Thanks, Mathieu, for pointing out the out-of-bounds index access.
GetNSupported(station) should only be called once, after allocating the table.
This patch fixes both bug 802 and bug 919 and puts the minstrel rate back into test.py.  Let me know if you still have any problems.
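
One way to read the fix described above, as a sketch under assumptions rather than the contents of attachment 883: query the number of supported rates once, when the table is allocated, and use that stored value for every later loop. `Station`, `InitTable`, `UpdateStats`, and the callable `getNSupported` are hypothetical stand-ins, not the ns-3 API.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-station state.
struct Station
{
  std::size_t nRates = 0;         // rate count captured at allocation time
  std::vector<double> throughput; // one entry per rate
};

// Query the rate count once and size the table from that single value.
void InitTable (Station &st, const std::function<std::size_t ()> &getNSupported)
{
  st.nRates = getNSupported ();
  st.throughput.assign (st.nRates, 0.0);
}

// Later passes loop over the cached count, so they stay inside the table
// even if the live count has changed since allocation.
std::size_t UpdateStats (Station &st)
{
  std::size_t visited = 0;
  for (std::size_t i = 0; i < st.nRates; ++i)
    {
      st.throughput[i] += 1.0;
      ++visited;
    }
  return visited;
}
```

If the loop bound were re-queried on every pass, a count that grew after allocation would walk the index past the end of the table.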
Comment 34 Nicola Baldo 2010-05-27 06:57:06 UTC
Created attachment 885 [details]
backtrace

I applied your patch to current ns-3-dev (rev 6318), but I still get a segmentation fault when running the program posted by paolo. I am attaching a backtrace.
Comment 35 duy 2010-05-27 16:20:57 UTC
Created attachment 886 [details]
Fixed Bug 802 and Bug 919

Sorry, I forgot to run his example.  This patch will do.  It makes sure to allocate the table only when GetNSupported() > 1.  (On strange occasions, GetNSupported() returns 1.)  I ran valgrind on his example and on mine, and both come back clean.  Please let me know if there are still any problems.
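
The guard described above could look roughly like the following sketch. It is an illustration under assumptions, not the code in attachment 886: `LazyStation` and `EnsureTable` are hypothetical names, and the rate count is passed in as a plain argument instead of coming from GetNSupported().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-station state with a lazily allocated table.
struct LazyStation
{
  bool initialized = false;
  std::vector<double> table;
};

// Allocate the table only once more than one rate is known.
// Returns true when the table is ready to be used.
bool EnsureTable (LazyStation &st, std::size_t nSupported)
{
  if (!st.initialized && nSupported > 1)
    {
      st.table.assign (nSupported, 0.0);
      st.initialized = true;
    }
  return st.initialized;
}
```

A caller would skip its statistics update while `EnsureTable` returns false, and the table keeps its original size on later calls even if a smaller count is reported.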
Comment 36 Paolo Tagliani 2010-05-27 17:03:53 UTC
Created attachment 887 [details]
backtrace

I tried to run the example again after applying the patch, and it crashes anyway. There is no problem if I use Aarf or some other wifi manager.
I attach my backtrace.
Comment 37 duy 2010-05-27 17:47:17 UTC
(In reply to comment #36)
> Created an attachment (id=887) [details]
> backtrace
> 
> I tried to run the example again after applying the patch, and it crashes
> anyway. There is no problem if I use Aarf or some other wifi manager.
> I attach my backtrace.

Hi Paolo, the patch you applied seems to be the old one.  Could you try again and give me a new trace?  I ran your example under valgrind with no problems or memory leaks.

Duy
Comment 38 Paolo Tagliani 2010-05-27 18:27:58 UTC
I'm sorry, I hadn't seen the new patch! Now everything works. Thanks a lot.
Comment 39 Paolo Tagliani 2010-05-27 18:33:48 UTC
Created attachment 888 [details]
backtrace

That's the backtrace of the execution under valgrind.
Comment 40 duy 2010-05-27 19:26:32 UTC
(In reply to comment #39)
> Created an attachment (id=888) [details]
> backtrace
> 
> That's the backtrace of the execution under valgrind.

That's strange; this is my backtrace.  I used your example posted here. Did you modify your example? If so, please attach it.

Waf: Entering directory `/home/duy/ns-3-clean/ns-3-dev/build'
Waf: Leaving directory `/home/duy/ns-3-clean/ns-3-dev/build'
'build' finished successfully (0.628s)
==14678== Memcheck, a memory error detector.
==14678== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==14678== Using LibVEX rev 1854, a library for dynamic binary translation.
==14678== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==14678== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==14678== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==14678== For more details, rerun with: -v
==14678==
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Received 1024 bytes from 10.1.3.3
Received 1024 bytes from 10.1.3.2
Received 1024 bytes from 10.1.3.1
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Received 1024 bytes from 10.1.3.3
Received 1024 bytes from 10.1.3.2
Received 1024 bytes from 10.1.3.1
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Received 1024 bytes from 10.1.3.1
Received 1024 bytes from 10.1.3.2
Received 1024 bytes from 10.1.3.3
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
==14678==
==14678== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 102 from 1)
==14678== malloc/free: in use at exit: 0 bytes in 0 blocks.
==14678== malloc/free: 18,713 allocs, 18,713 frees, 882,087 bytes allocated.
==14678== For counts of detected errors, rerun with: -v
==14678== All heap blocks were freed -- no leaks are possible.
Comment 41 Paolo Tagliani 2010-05-28 03:28:57 UTC
No, I haven't modified it. If you want, attach the example you ran and I'll run it and attach my backtrace.
Comment 42 duy 2010-05-28 03:37:39 UTC
Created attachment 897 [details]
paolo's third modified example

Hi Paolo, this attachment contains your modified third.cc that I ran.  Let me know if you get any memory leaks or valgrind errors.
Comment 43 Paolo Tagliani 2010-05-28 04:11:11 UTC
Created attachment 898 [details]
backtrace with leak memory informatio

Yes, I have 1 memory error. This is the backtrace with the memory leak information.
Comment 44 Paolo Tagliani 2010-05-28 04:16:43 UTC
Comment on attachment 898 [details]
backtrace with leak memory informatio

I applied only your last patch.
Comment 45 duy 2010-05-28 06:42:47 UTC
Nicola and Mathieu, do you have any idea about this memory leak?  I have no idea what causes it.  The strange thing is that I could not reproduce Paolo's valgrind errors on my machine.
Comment 46 duy 2010-06-03 16:28:17 UTC
Hi Paolo, could you run valgrind on other rates to see if you get the same valgrind errors?  I don't know why but I could not repeat your valgrind errors on my machine.

Duy


(In reply to comment #44)
> (From update of attachment 898 [details])
> I applied only your last patch.
Comment 47 Paolo Tagliani 2010-06-03 17:23:15 UTC
(In reply to comment #46)
> Hi Paolo, could you run valgrind on other rates to see if you get the same
> valgrind errors?  I don't know why but I could not repeat your valgrind errors
> on my machine.
> 
> Duy
> 
> 
> (In reply to comment #44)
> > (From update of attachment 898 [details])
> > I applied only your last patch.

other rate? Do you mean other wifi manager?
Comment 48 duy 2010-06-03 17:26:44 UTC
(In reply to comment #47)
> (In reply to comment #46)
> > Hi Paolo, could you run valgrind on other rates to see if you get the same
> > valgrind errors?  I don't know why but I could not repeat your valgrind errors
> > on my machine.
> > 
> > Duy
> > 
> > 
> > (In reply to comment #44)
> > > (From update of attachment 898 [details])
> > > I applied only your last patch.
> 
> other rate? Do you mean other wifi manager?

yes
Comment 49 Paolo Tagliani 2010-06-03 17:37:03 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > Hi Paolo, could you run valgrind on other rates to see if you get the same
> > > valgrind errors?  I don't know why but I could not repeat your valgrind errors
> > > on my machine.
> > > 
> > > Duy
> > > 
> > > 
> > > (In reply to comment #44)
> > > > (From update of attachment 898 [details])
> > > > I applied only your last patch.
> > 
> > other rate? Do you mean other wifi manager?
> 
> yes

I have the same memory leak error with aarf and onoe. What's wrong with my machine? This error doesn't block the simulation: I'm currently simulating the sending of over 100000 packets with minstrel and I haven't got any errors.
Comment 50 duy 2010-06-03 17:48:57 UTC
(In reply to comment #49)
> I have the same memory leak error with aarf and onoe. What's wrong with my
> machine? This error doesn't block the simulation: I'm currently simulating the
> sending of over 100000 packets with minstrel and I haven't got any errors.

It seems like there are some small library issues on your machine, though I have no idea what.  Maybe something needs to be updated?  This tiny leak shouldn't block your simulation unless you have hundreds of thousands of nodes in it, in which case they would gradually eat up memory.  Anyway, I am going to go ahead and commit this patch to ns-3-dev tonight and close out this bug.
Comment 51 duy 2010-06-04 01:40:06 UTC
Thanks everyone!

changeset:   6337:92c95748a915
tag:         tip
user:        Duy Nguyen <duy@soe.ucsc.edu>
date:        Thu Jun 03 22:38:44 2010 -0700
summary:     Fixed Bug 802 and Bug 919