Bugzilla – Full Text Bug Listing |
Summary: | Minstrel algorithm causes segmentation fault | ||
---|---|---|---|
Product: | ns-3 | Reporter: | Kirill Andreev <andreev> |
Component: | wifi | Assignee: | Nicola Baldo <nicola> |
Status: | RESOLVED FIXED | ||
Severity: | critical | CC: | dnlove, jpelkey, mathieu.lacage, nicola, ns-bugs, pablosproject |
Priority: | P2 | ||
Version: | ns-3-dev | ||
Hardware: | All | ||
OS: | All | ||
Attachments: |
Backtrace
avoid assert
fixed index out of bound and switch minstrel to low latency
fixed index out of bound and switch minstrel to low latency
minstrel valgrind output
minstrel_third_output
third new valgrind ouput
patch for bug 802 and 919
backtrace
Fixed Bug 802 and Bug 919
backtrace
backtrace
paolo's third modified example
backtrace with leak memory informatio
I just ran the ns-3-dev mesh example with mesh.SetRemoteStationManager("ns3::MinstrelWifiManager"); and I don't see the crash. Sorry, I had no idea you filed this bug a month ago. Please cc me.

I can't make minstrel work in general:

nicola@pcnbaldo:~/locale/ns-3-dev$ ./waf --run examples/wireless/multirate
Waf: Entering directory `/home/nicola/locale/ns-3-dev/build'
Waf: Leaving directory `/home/nicola/locale/ns-3-dev/build'
'build' finished successfully (1.348s)
Scenario: 4
Rts Threshold: 2200
Name: minstrel
Rate: ns3::MinstrelWifiManager
Routing: 0
Mobility: 0
assert failed. file=../src/devices/wifi/wifi-remote-station-manager.cc, line=353, cond="found"

Note that examples/wireless/multirate is not run by test.py, so it is no surprise that we didn't spot the problem. I have similar problems trying to convert other sim programs so that they use minstrel. I get the impression that this might be due to changeset 6065 which fixed Bug 602... any opinions with respect to this?

We need to have it run by test.py; I am working on a fix for it now.

It seems like Onoe fails as well. I have traced it down to the following source of problems:

src/devices/wifi/minstrel-wifi-manager.cc

bool
MinstrelWifiManager::IsLowLatency (void) const
{
  return false;
}

and src/devices/wifi/wifi-remote-station-manager.cc:353

if (!IsLowLatency ())
  {
    // Note: removing the packet below is wrong: what happens in case of retransmissions ???
    TxModeTag tag;
    bool found;
    found = ConstCast<Packet> (packet)->RemovePacketTag (tag);
    NS_ASSERT (found);
    return tag.GetDataMode ();
  }

I still don't understand how we are modeling the low-latency and non-low-latency cases (e.g. removing TxModeTag).

Created attachment 833 [details]
avoid assert
This avoids the backtrace but we hit a crash later in minstrel itself which appears to be trying to access an element past the end of an array in FindRate line 576.
==22131== Invalid read of size 1
==22131== at 0x5485610: ns3::HighPrecision::Compare(ns3::HighPrecision const&) const (high-precision-128.h:227)
==22131== by 0x54A9AEC: bool ns3::operator><1>(ns3::TimeUnit<1> const&, ns3::TimeUnit<1> const&) (nstime.h:257)
==22131== by 0x5A802EA: ns3::MinstrelWifiManager::FindRate(ns3::MinstrelWifiRemoteStation*) (minstrel-wifi-manager.cc:576)
==22131== by 0x5A81E01: ns3::MinstrelWifiManager::DoReportDataOk(ns3::WifiRemoteStation*, double, ns3::WifiMode, double) (minstrel-wifi-manager.cc:429)
I did not test with onoe.

The later crash appears to come from some minstrel bugs, i.e., MinstrelWifiManager::InitSampleTable initializes the sample table on line 787:

station->m_sampleTable[newIndex][col] = i+1;

Note the '+1' above: it sets an index in the sample table equal to GetNSupported (station), which leads GetNextSample later to return this invalid index in the minstrelTable. Removing the '+1' or adding a (i+1)%GetNSupported(station) makes the code not crash for me. I guess that this is something for duy to look at now.

Thanks Mathieu, it seems like I didn't check for array out of bound for that case. I am looking into it now.

Onoe and Amrr works fine with Mathieu's patch. I agree that either removing the '+1' or adding a (i+1)%GetNSupported(station) should fix the segmentation fault. However, minstrel does not seem to produce correct results. I am away this weekend, but I will look into it next week and hopefully find a fix for it soon.

(In reply to comment #9)
> Onoe and Amrr works fine with Mathieu's patch.

I confirm this as well. I just pushed that patch (changeset: 6241:d9a65be745f0)

> I agree that either removing the '+1' or adding a (i+1)%GetNSupported(station)
> should fix the segmentation fault.
>
> However, minstrel does not seem to produce correct results. I am away this
> weekend, but I will look into it next week and hopefully find a fix for it
> soon.

ok, let's wait until you have the chance to look into this issue.

Created attachment 840 [details]
fixed index out of bound and switch minstrel to low latency
Going back to the low latency vs high latency device: I am still not clear on the definition. To me, Minstrel seems to model a low latency device because it uses a multi-retry chain (it sorts out 4 rates ready to be used in case of failures) to combat the delay of the next data packet. Packet feedback is used to update the statistics table. Minstrel does not work properly if it is set to high latency.
Created attachment 841 [details]
fixed index out of bound and switch minstrel to low latency
removing "+1" instead.
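The off-by-one discussed above can be illustrated with a small standalone sketch. This is not the ns-3 code: FillSampleColumn and AllIndicesValid are hypothetical helpers, nSupported stands in for GetNSupported (station), and the random column placement done by the real InitSampleTable is left out. It only shows why storing i+1 yields an index equal to the table size, and why the modulo variant stays in range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ns-3: fills one column of a sample
// table with rate indices. 'wrap' selects the modulo variant suggested
// in the thread; without it the buggy 'i + 1' form is used.
std::vector<std::size_t>
FillSampleColumn (std::size_t nSupported, bool wrap)
{
  std::vector<std::size_t> column (nSupported);
  for (std::size_t i = 0; i < nSupported; ++i)
    {
      // Buggy form: i + 1 equals nSupported on the last iteration.
      // Wrapped form: (i + 1) % nSupported stays in [0, nSupported).
      column[i] = wrap ? (i + 1) % nSupported : i + 1;
    }
  return column;
}

// Hypothetical helper: true if every stored value is a valid index
// into a rate table holding nSupported entries.
bool
AllIndicesValid (const std::vector<std::size_t> &column, std::size_t nSupported)
{
  for (std::size_t k = 0; k < column.size (); ++k)
    {
      if (column[k] >= nSupported)
        {
          return false;
        }
    }
  return true;
}
```

With four supported rates, the unwrapped column contains the value 4, which is one past the last valid index; the wrapped column contains only 0 to 3. Simply storing i instead of i+1, as the second version of the patch does, is also in range.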
nak. minstrel is high latency. See my original arf/aarf paper for a definition.

(In reply to comment #13)
> nak. minstrel is high latency. See my original arf/aarf paper for a definition.

I guess you mean that the original minstrel in madwifi is high latency, right?

(In reply to comment #11)
> Minstrel does not work properly if it is set to
> high latency.

I get the impression that the minstrel implementation in ns-3 is low latency, in that you update statistics everytime DoReportDataOk and DoReportDataFailed are called. These updated stats are then used for selecting the rate for the next transmission attempt.

Duy, would it be possible to change how stats are updated so that minstrel can be high latency as it is in the real world?

Ideally, if it were possible to support both low and high latency by setting the corresponding attribute, it would be great.

(In reply to comment #14)
> (In reply to comment #13)
> > nak. minstrel is high latency. See my original arf/aarf paper for a definition.
>
> I guess you mean that the original minstrel in madwifi is high latency, right?

Yes, the original minstrel algorithm was designed to work on high latency hardware.

> (In reply to comment #11)
> > Minstrel does not work properly if it is set to
> > high latency.
>
> I get the impression that the minstrel implementation in ns-3 is low latency,
> in that you update statistics everytime DoReportDataOk and DoReportDataFailed
> are called. These updated stats are then used for selecting the rate for the
> next transmission attempt.
>
> Duy, would it be possible to change how stats are updated so that minstrel can
> be high latency as it is in the real world?
>
> Ideally, if it were possible to support both low and high latency by setting
> the corresponding attribute, it would be great.

I did that at some point for amrr/onoe but it's really hard to get right so, I would advise against it.

> I did that at some point for amrr/onoe but it's really hard to get right so, I
> would advise against it.
Ok, so I propose this solution for now:
1) apply duy's latest patch
2) update minstrel's doxygen documentation (or rather create it, since there is nothing currently) so that we clearly say that the implementation is low-latency unlike the one in madwifi.
What do you think?
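For readers unfamiliar with the distinction, here is a toy sketch of the two modes as I read them from this thread: in the low-latency case the rate is taken from the statistics at transmission time, while in the high-latency case the rate is attached to the packet earlier (the TxModeTag in the snippet quoted above) and later feedback does not change it. ToyPacket and ToyRateControl are hypothetical names, the step-up/step-down rule is a placeholder rather than Minstrel, and the assumption that the tag is attached at enqueue time is mine.

```cpp
#include <cassert>

// Hypothetical toy packet, not an ns-3 class.
struct ToyPacket
{
  bool hasTag;    // true once a rate has been attached to the packet
  int taggedRate; // rate index attached when the packet was queued
};

// Hypothetical toy rate control, not Minstrel: one step up on success,
// one step down on failure.
class ToyRateControl
{
public:
  ToyRateControl () : m_currentRate (1) {}

  // Feedback path, called after each transmission attempt.
  void ReportResult (bool ok)
  {
    if (ok && m_currentRate < MAX_RATE)
      {
        ++m_currentRate;
      }
    else if (!ok && m_currentRate > 0)
      {
        --m_currentRate;
      }
  }

  // High-latency path: the rate is fixed when the packet is queued.
  void TagAtEnqueue (ToyPacket &p) const
  {
    p.hasTag = true;
    p.taggedRate = m_currentRate;
  }

  // Rate used when the packet is actually sent.
  int RateForTransmission (const ToyPacket &p, bool lowLatency) const
  {
    if (lowLatency)
      {
        return m_currentRate; // always reflects the latest feedback
      }
    // Falls back to the current rate when no tag is present; the code
    // quoted in the thread asserts instead.
    return p.hasTag ? p.taggedRate : m_currentRate;
  }

private:
  enum { MAX_RATE = 3 };
  int m_currentRate;
};
```

If a packet is tagged while the current rate is 1 and a success is reported before it is sent, the low-latency path returns 2 while the high-latency path still returns the tagged 1.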
(In reply to comment #16)
> 2) update minstrel's doxygen documentation (or rather create it, since there is
> nothing currently) so that we clearly say that the implementation is
> low-latency unlike the one in madwifi.

I don't care much about minstrel myself but I feel that to make the ns-3 model useful, it should model the original minstrel algorithm so, if we do the above, we should file a bug against ns-3 saying that we want to make our model high latency. It's your call though.

> I don't care much about minstrel myself but I feel that to make the ns-3 model
> useful, it should model the original minstrel algorithm so, if we do the above,
> we should file a bug against ns-3 saying that we want to make our model high
> latency. It's your call though.

ok now we have bug 889 to track that issue. As for the current bug, I'll close it as soon as we are allowed to push duy's patch (we should be in code freeze now).

(In reply to comment #14)
> (In reply to comment #13)
> > nak. minstrel is high latency. See my original arf/aarf paper for a definition.
> I guess you mean that the original minstrel in madwifi is high latency, right?
> (In reply to comment #11)
> > Minstrel does not work properly if it is set to
> > high latency.
> I get the impression that the minstrel implementation in ns-3 is low latency,
> in that you update statistics everytime DoReportDataOk and DoReportDataFailed
> are called. These updated stats are then used for selecting the rate for the
> next transmission attempt.
> Duy, would it be possible to change how stats are updated so that minstrel can
> be high latency as it is in the real world?

Yes, I will do this for the next release because it's a bit late to make major changes to minstrel implementation.

> Ideally, if it were possible to support both low and high latency by setting
> the corresponding attribute, it would be great.
(In reply to comment #19)
> Yes, I will do this for the next release because it's a bit late to make major
> changes to minstrel implementation.
>

Thank you for your commitment! Let's continue that discussion on bug 889

changeset 6268:84e114d34b89

Hi everybody. I'm working on a wifi network, and Minstrel always crashes for me. I modified the third.cc example in the tutorial: I only changed the wifi manager to Minstrel, set all the stations on the wifi as udp clients, and send 3 udp echo packets. If I switch the wifi manager to, for example, Aarf, the simulation works fine. I post my code below:

#include "ns3/core-module.h"
#include "ns3/simulator-module.h"
#include "ns3/node-module.h"
#include "ns3/helper-module.h"
#include "ns3/wifi-module.h"
#include "ns3/mobility-module.h"

// Default Network Topology
//
//   Wifi 10.1.3.0
//                 AP
//  *    *    *    *
//  |    |    |    |    10.1.1.0
// n5   n6   n7   n0 -------------- n1   n2   n3   n4
//                   point-to-point  |    |    |    |
//                                   ================
//                                     LAN 10.1.2.0

using namespace ns3;

NS_LOG_COMPONENT_DEFINE ("ThirdScriptExample");

int
main (int argc, char *argv[])
{
  bool verbose = true;
  uint32_t nCsma = 3;
  uint32_t nWifi = 3;

  CommandLine cmd;
  cmd.AddValue ("nCsma", "Number of \"extra\" CSMA nodes/devices", nCsma);
  cmd.AddValue ("nWifi", "Number of wifi STA devices", nWifi);
  cmd.AddValue ("verbose", "Tell echo applications to log if true", verbose);
  cmd.Parse (argc,argv);

  if (verbose)
    {
      LogComponentEnable("UdpEchoClientApplication", LOG_LEVEL_INFO);
      LogComponentEnable("UdpEchoServerApplication", LOG_LEVEL_INFO);
    }

  NodeContainer p2pNodes;
  p2pNodes.Create (2);

  PointToPointHelper pointToPoint;
  pointToPoint.SetDeviceAttribute ("DataRate", StringValue ("5Mbps"));
  pointToPoint.SetChannelAttribute ("Delay", StringValue ("2ms"));

  NetDeviceContainer p2pDevices;
  p2pDevices = pointToPoint.Install (p2pNodes);

  NodeContainer csmaNodes;
  csmaNodes.Add (p2pNodes.Get (1));
  csmaNodes.Create (nCsma);

  CsmaHelper csma;
  csma.SetChannelAttribute ("DataRate", StringValue ("100Mbps"));
  csma.SetChannelAttribute ("Delay", TimeValue (NanoSeconds (6560)));

  NetDeviceContainer csmaDevices;
  csmaDevices = csma.Install (csmaNodes);

  NodeContainer wifiStaNodes;
  wifiStaNodes.Create (nWifi);
  NodeContainer wifiApNode = p2pNodes.Get (0);

  YansWifiChannelHelper channel = YansWifiChannelHelper::Default ();
  YansWifiPhyHelper phy = YansWifiPhyHelper::Default ();
  phy.SetChannel (channel.Create ());

  WifiHelper wifi = WifiHelper::Default ();
  // The only change to the rate control: use Minstrel.
  wifi.SetRemoteStationManager ("ns3::MinstrelWifiManager");

  NqosWifiMacHelper mac = NqosWifiMacHelper::Default ();

  Ssid ssid = Ssid ("ns-3-ssid");
  mac.SetType ("ns3::NqstaWifiMac",
               "Ssid", SsidValue (ssid),
               "ActiveProbing", BooleanValue (false));

  NetDeviceContainer staDevices;
  staDevices = wifi.Install (phy, mac, wifiStaNodes);

  mac.SetType ("ns3::NqapWifiMac",
               "Ssid", SsidValue (ssid));

  NetDeviceContainer apDevices;
  apDevices = wifi.Install (phy, mac, wifiApNode);

  MobilityHelper mobility;

  mobility.SetPositionAllocator ("ns3::GridPositionAllocator",
                                 "MinX", DoubleValue (0.0),
                                 "MinY", DoubleValue (0.0),
                                 "DeltaX", DoubleValue (5.0),
                                 "DeltaY", DoubleValue (10.0),
                                 "GridWidth", UintegerValue (3),
                                 "LayoutType", StringValue ("RowFirst"));

  mobility.SetMobilityModel ("ns3::RandomWalk2dMobilityModel",
                             "Bounds", RectangleValue (Rectangle (-50, 50, -50, 50)));
  mobility.Install (wifiStaNodes);

  mobility.SetMobilityModel ("ns3::ConstantPositionMobilityModel");
  mobility.Install (wifiApNode);

  InternetStackHelper stack;
  stack.Install (csmaNodes);
  stack.Install (wifiApNode);
  stack.Install (wifiStaNodes);

  Ipv4AddressHelper address;

  address.SetBase ("10.1.1.0", "255.255.255.0");
  Ipv4InterfaceContainer p2pInterfaces;
  p2pInterfaces = address.Assign (p2pDevices);

  address.SetBase ("10.1.2.0", "255.255.255.0");
  Ipv4InterfaceContainer csmaInterfaces;
  csmaInterfaces = address.Assign (csmaDevices);

  address.SetBase ("10.1.3.0", "255.255.255.0");
  address.Assign (staDevices);
  address.Assign (apDevices);

  UdpEchoServerHelper echoServer (9);

  ApplicationContainer serverApps = echoServer.Install (csmaNodes.Get (nCsma));
  serverApps.Start (Seconds (1.0));
  serverApps.Stop (Seconds (10.0));

  UdpEchoClientHelper echoClient (csmaInterfaces.GetAddress (nCsma), 9);
  echoClient.SetAttribute ("MaxPackets", UintegerValue (3));
  echoClient.SetAttribute ("Interval", TimeValue (Seconds (1.)));
  echoClient.SetAttribute ("PacketSize", UintegerValue (1024));

  // Every wifi station runs an echo client.
  ApplicationContainer clientApps = echoClient.Install (wifiStaNodes);
  clientApps.Start (Seconds (2.0));
  clientApps.Stop (Seconds (10.0));

  Ipv4GlobalRoutingHelper::PopulateRoutingTables ();

  Simulator::Stop (Seconds (10.0));

  pointToPoint.EnablePcapAll ("third");
  phy.EnablePcap ("third", apDevices.Get (0));
  csma.EnablePcap ("third", csmaDevices.Get (0), true);

  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}

(In reply to comment #22)
> Hi everybody. I'm working on wifi network, and Minstrel always crashes for me.
> I modified the third.cc example in the tutorial and I only change the wifi
> manager as Minstrel, set all the station on the wifi as udp client and send 3
> udp echo packet. If I switch the wifi manager, for example, as Aarf the
> simulation works fine.

Do you get a similar back trace like this? It doesn't seem it's coming from Minstrel but from some library issue. I'll take a look at it again.
Sent 1024 bytes to 10.1.2.4
*** glibc detected *** /cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third: double free or corruption (out): 0x000000001a246900 ***
======= Backtrace: =========
/lib64/libc.so.6[0x34aa6722ef]
/lib64/libc.so.6(cfree+0x4b)[0x34aa67273b]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so[0x2ae3055cdf7f]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns36Buffer7RecycleEPNS_10BufferDataE+0x185)[0x2ae3055d15a9]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns36BufferD1Ev+0x260)[0x2ae3055d2e58]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns36PacketD1Ev+0xdd)[0x2ae3055f93f9]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns314DefaultDeleterINS_6PacketEE6DeleteEPS1_+0x24)[0x2ae3055f9450]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZNK3ns314SimpleRefCountINS_6PacketENS_5emptyENS_14DefaultDeleterIS1_EEE5UnrefEv+0x2e)[0x2ae3055f948a]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns33PtrINS_6PacketEED1Ev+0x27)[0x2ae3055f9541]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so[0x2ae305a400a3]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns314DefaultDeleterINS_9EventImplEE6DeleteEPS1_+0x27)[0x2ae30559ad81]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZNK3ns314SimpleRefCountINS_9EventImplENS_5emptyENS_14DefaultDeleterIS1_EEE5UnrefEv+0x32)[0x2ae30559adb6]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns33PtrINS_9EventImplEEaSERKS2_+0x40)[0x2ae30559ae3a]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns37EventIdaSERKS0_+0x1d)[0x2ae30559ae7d]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns311YansWifiPhy18StartReceivePacketENS_3PtrINS_6PacketEEEdNS_8WifiModeENS_12WifiPreambleE+0xd47)[0x2ae305a3fd27]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZNK3ns315YansWifiChannel7ReceiveEjNS_3PtrINS_6PacketEEEdNS_8WifiModeENS_12WifiPreambleE+0x64)[0x2ae305a480dc]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so[0x2ae305a481c7]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns39EventImpl6InvokeEv+0x2f)[0x2ae30558746f]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns320DefaultSimulatorImpl15ProcessOneEventEv+0x226)[0x2ae3055a3820]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns320DefaultSimulatorImpl3RunEv+0x1f)[0x2ae3055a386b]
/cse/grads/duy/ns-3-dev/build/debug/libns3.so(_ZN3ns39Simulator3RunEv+0x118)[0x2ae30558e6b2]
/cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third[0x40e08c]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x34aa61d994]
/cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third(__gxx_personality_v0+0x449)[0x409849]
======= Memory map: ========
00400000-00416000 r-xp 00000000 00:13 15237718 /cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third
00615000-00616000 rw-p 00015000 00:13 15237718 /cse/grads/duy/ns-3-dev/build/debug/examples/tutorial/third
1a1a8000-1a24d000 rw-p 1a1a8000 00:00 0 [heap]
34aa200000-34aa21c000 r-xp 00000000 fd:00 2676766 /lib64/ld-2.5.so
34aa41b000-34aa41c000 r--p 0001b000 fd:00 2676766 /lib64/ld-2.5.so
34aa41c000-34aa41d000 rw-p 0001c000 fd:00 2676766 /lib64/ld-2.5.so
34aa600000-34aa74d000 r-xp 00000000 fd:00 2676776 /lib64/libc-2.5.so
34aa74d000-34aa94d000 ---p 0014d000 fd:00 2676776 /lib64/libc-2.5.so
34aa94d000-34aa951000 r--p 0014d000 fd:00 2676776 /lib64/libc-2.5.so
34aa951000-34aa952000 rw-p 00151000 fd:00 2676776 /lib64/libc-2.5.so
34aa952000-34aa957000 rw-p 34aa952000 00:00 0
34aaa00000-34aaa82000 r-xp 00000000 fd:00 2676837 /lib64/libm-2.5.so
34aaa82000-34aac81000 ---p 00082000 fd:00 2676837 /lib64/libm-2.5.so
34aac81000-34aac82000 r--p 00081000 fd:00 2676837 /lib64/libm-2.5.so
34aac82000-34aac83000 rw-p 00082000 fd:00 2676837 /lib64/libm-2.5.so
34aae00000-34aae02000 r-xp 00000000 fd:00 2676836 /lib64/libdl-2.5.so
34aae02000-34ab002000 ---p 00002000 fd:00 2676836 /lib64/libdl-2.5.so
34ab002000-34ab003000 r--p 00002000 fd:00 2676836 /lib64/libdl-2.5.so
34ab003000-34ab004000 rw-p 00003000 fd:00 2676836 /lib64/libdl-2.5.so
34ab200000-34ab216000 r-xp 00000000 fd:00 2676831 /lib64/libpthread-2.5.so
34ab216000-34ab415000 ---p 00016000 fd:00 2676831 /lib64/libpthread-2.5.so
34ab415000-34ab416000 r--p 00015000 fd:00 2676831 /lib64/libpthread-2.5.so
34ab416000-34ab417000 rw-p 00016000 fd:00 2676831 /lib64/libpthread-2.5.so
34ab417000-34ab41b000 rw-p 34ab417000 00:00 0
34ab600000-34ab614000 r-xp 00000000 fd:00 629619 /usr/lib64/libz.so.1.2.3
34ab614000-34ab813000 ---p 00014000 fd:00 629619 /usr/lib64/libz.so.1.2.3
34ab813000-34ab814000 rw-p 00013000 fd:00 629619 /usr/lib64/libz.so.1.2.3
34aba00000-34aba07000 r-xp 00000000 fd:00 2676832 /lib64/librt-2.5.so
34aba07000-34abc07000 ---p 00007000 fd:00 2676832 /lib64/librt-2.5.so
34abc07000-34abc08000 r--p 00007000 fd:00 2676832 /lib64/librt-2.5.so
34abc08000-34abc09000 rw-p 00008000 fd:00 2676832 /lib64/librt-2.5.so
34aca00000-34aca59000 r-xp 00000000 fd:00 628437 /usr/lib64/libsqlite3.so.0.8.6
34aca59000-34acc58000 ---p 00059000 fd:00 628437 /usr/lib64/libsqlite3.so.0.8.6
34acc58000-34acc5b000 rw-p 00058000 fd:00 628437 /usr/lib64/libsqlite3.so.0.8.6
34b5c00000-34b5d33000 r-xp 00000000 fd:00 635016 /usr/lib64/libxml2.so.2.6.26
34b5d33000-34b5f33000 ---p 00133000 fd:00 635016 /usr/lib64/libxml2.so.2.6.26
34b5f33000-34b5f3c000 rw-p 00133000 fd:00 635016 /usr/lib64/libxml2.so.2.6.26
34b5f3c000-34b5f3d000 rw-p 34b5f3c000 00:00 0
34b7c00000-34b7c0d000 r-xp 00000000 fd:00 2676553 /lib64/libgcc_s-4.1.2-20080825.so.1
34b7c0d000-34b7e0d000 ---p 0000d000 fd:00 2676553 /lib64/libgcc_s-4.1.2-20080825.so.1
34b7e0d000-34b7e0e000 rw-p 0000d000 fd:00 2676553 /lib64/libgcc_s-4.1.2-20080825.so.1
34bc800000-34bc8e6000 r-xp 00000000 fd:00 627764 /usr/lib64/libstdc++.so.6.0.8
34bc8e6000-34bcae5000 ---p 000e6000 fd:00 627764 /usr/lib64/libstdc++.so.6.0.8
34bcae5000-34bcaeb000 r--p 000e5000 fd:00 627764 /usr/lib64/libstdc++.so.6.0.8
34bcaeb000-34bcaee000 rw-p 000eb000 fd:00 627764 /usr/lib64/libstdc++.so.6.0.8
34bcaee000-34bcb00000 rw-p 34bcaee000 00:00 0
2ae304cd9000-2ae304cdb000 rw-p 2ae304cd9000 00:00 0
2ae304cdb000-2ae3060ab000 r-xp 00000000 00:13 15237739 /cse/grads/duy/ns-3-dev/build/debug/libns3.so
2ae3060ab000-2ae3062aa000 ---p 013d0000 00:13 15237739 /cse/grads/duy/ns-3-dev/build/debug/libns3.so
2ae3062aa000-2ae306347000 rw-p 013cf000 00:13 15237739 /cse/grads/duy/ns-3-dev/build/debug/libns3.so
2ae306347000-2ae30634d000 rw-p 2ae306347000 00:00 0
2ae306367000-2ae30636c000 rw-p 2ae306367000 00:00 0
7fff34113000-7fff34128000 rw-p 7ffffffea000 00:00 0 [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]

(In reply to comment #23)
> Do you get similar back trace like this? It doesn't seem it's coming from
> Minstrel but some libraries issues. I'll take a look at it again.
>
> Sent 1024 bytes to 10.1.2.4
> *** glibc detected ***

run the example in valgrind.

That's my output (below). The strange thing is that if I only replace the string "ns3::MinstrelWifiManager" with any other wifi manager, everything works perfectly. I'm also having problems with minstrel on a wifi net with multiple nodes that run a udp echo client.

paolo@paolo-laptop:~/Repos/ns-3-dev$ valgrind -v ./waf --run scratch/thirdModificato
==32059== Memcheck, a memory error detector
==32059== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==32059== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==32059== Command: ./waf --run scratch/thirdModificato
==32059==
--32059-- Valgrind options:
--32059-- --suppressions=/usr/lib/valgrind/debian-libc6-dbg.supp
--32059-- -v
--32059-- Contents of /proc/version:
--32059-- Linux version 2.6.32-22-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #33-Ubuntu SMP Wed Apr 28 13:28:05 UTC 2010
--32059-- Arch and hwcaps: AMD64, amd64-sse3-cx16
--32059-- Page sizes: currently 4096, max supported 4096
--32059-- Valgrind library directory: /usr/lib/valgrind
--32059-- Reading syms from /usr/bin/env (0x400000)
--32059-- Reading debug info from /usr/bin/env ..
--32059-- .. CRC mismatch (computed 2b996bfb wanted 50cb270b)
--32059-- object doesn't have a symbol table
--32059-- Reading syms from /lib/ld-2.11.1.so (0x4000000)
--32059-- Reading debug info from /lib/ld-2.11.1.so ..
--32059-- .. CRC mismatch (computed e1ab2e55 wanted 8e29b093)
--32059-- Reading debug info from /usr/lib/debug/lib/ld-2.11.1.so ..
--32059-- Reading syms from /usr/lib/valgrind/memcheck-amd64-linux (0x38000000)
--32059-- object doesn't have a dynamic symbol table
--32059-- Reading suppressions file: /usr/lib/valgrind/debian-libc6-dbg.supp
--32059-- Reading suppressions file: /usr/lib/valgrind/default.supp
--32059-- REDIR: 0x4018470 (strlen) redirected to 0x380402d7 (vgPlain_amd64_linux_REDIR_FOR_strlen)
--32059-- Reading syms from /usr/lib/valgrind/vgpreload_core-amd64-linux.so (0x4a22000)
--32059-- Reading syms from /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so (0x4c24000)
==32059== WARNING: new redirection conflicts with existing -- ignoring it
--32059-- new: 0x04018470 (strlen ) R-> 0x04c28710 strlen
--32059-- REDIR: 0x40182e0 (index) redirected to 0x4c28320 (index)
--32059-- REDIR: 0x4018360 (strcmp) redirected to 0x4c28cf0 (strcmp)
--32059-- Reading syms from /lib/libc-2.11.1.so (0x4e2d000)
--32059-- Reading debug info from /lib/libc-2.11.1.so ..
--32059-- .. CRC mismatch (computed 998a00c2 wanted 045c8b93)
--32059-- Reading debug info from /usr/lib/debug/lib/libc-2.11.1.so ..
--32059-- REDIR: 0x4eb1ad0 (__GI_strrchr) redirected to 0x4c28140 (__GI_strrchr)
--32059-- REDIR: 0x4eb1aa0 (rindex) redirected to 0x4a225dc (_vgnU_ifunc_wrapper)
==32059== WARNING: new redirection conflicts with existing -- ignoring it
--32059-- new: 0x04eb1ad0 (__GI_strrchr ) R-> 0x04c28110 rindex
--32059-- REDIR: 0x4eae5e0 (__GI_strcmp) redirected to 0x4c28ca0 (__GI_strcmp)
--32059-- REDIR: 0x4eb0010 (__GI_strlen) redirected to 0x4c286d0 (__GI_strlen)
--32059-- REDIR: 0x4eb0220 (__GI_strncmp) redirected to 0x4c28be0 (__GI_strncmp)
--32059-- REDIR: 0x4eae520 (__GI_strchr) redirected to 0x4c28220 (__GI_strchr)
--32059-- REDIR: 0x4eb5220 (strchrnul) redirected to 0x4c29a10 (strchrnul)
--32059-- REDIR: 0x4ea9520 (malloc) redirected to 0x4c27426 (malloc)
--32059-- REDIR: 0x4eb3350 (mempcpy) redirected to 0x4c29a80 (mempcpy)
--32059-- REDIR: 0x4eb3c30 (memcpy) redirected to 0x4c28dc0 (memcpy)
--32059-- REDIR: 0x4eaade0 (free) redirected to 0x4c27036 (free)
--32059-- REDIR: 0x4eb21e0 (memchr) redirected to 0x4c28d90 (memchr)
--32059-- REDIR: 0x4eaaf90 (realloc) redirected to 0x4c274d7 (realloc)
--32059-- REDIR: 0x4eb0060 (strnlen) redirected to 0x4c28630 (strnlen)
--32059-- REDIR: 0x4eb3990 (__GI_stpcpy) redirected to 0x4c296c0 (__GI_stpcpy)
--32059-- REDIR: 0x4eafa60 (__GI_strcpy) redirected to 0x4c28800 (__GI_strcpy)
--32059-- REDIR: 0x4eb51d0 (__GI___rawmemchr) redirected to 0x4c29a60 (__GI___rawmemchr)
--32059-- REDIR: 0x4eae4f0 (index) redirected to 0x4a225dc (_vgnU_ifunc_wrapper)
==32059== WARNING: new redirection conflicts with existing -- ignoring it
--32059-- new: 0x04eae520 (__GI_strchr ) R-> 0x04c281e0 index
Waf: Entering directory `/home/paolo/Repos/ns-3-dev/build'
Waf: Leaving directory `/home/paolo/Repos/ns-3-dev/build'
'build' finished successfully (0.498s)
Command ['/home/paolo/Repos/ns-3-dev/build/debug/scratch/thirdModificato'] terminated with signal SIGSEGV. Run it under a debugger to get more information (./waf --run <program> --command-template="gdb --args %s <args>").

(In reply to comment #25)
> paolo@paolo-laptop:~/Repos/ns-3-dev$ valgrind -v ./waf --run
> scratch/thirdModificato

No, try:
./waf --shell
valgrind ./build/debug/scratch/third

(In reply to comment #26)
> (In reply to comment #25)
> > paolo@paolo-laptop:~/Repos/ns-3-dev$ valgrind -v ./waf --run
> > scratch/thirdModificato
>
> No, try:
> ./waf --shell
> valgrind ./build/debug/scratch/third

This is the output under valgrind. I have no clue what the problem is. Could this be related to the minstrel memory leak problem in http://www.nsnam.org/bugzilla/0 ? Any suggestions?

==3658== Memcheck, a memory error detector.
==3658== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==3658== Using LibVEX rev 1658, a library for dynamic binary translation.
==3658== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==3658== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==3658== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==3658== For more details, rerun with: -v
==3658==
./build/debug/scratch/third: error while loading shared libraries: libns3.so: cannot open shared object file: No such file or directory
==3658== Jump to the invalid address stated on the next line
==3658== at 0x2CE: ???
==3658== by 0x34AA20D09B: _dl_signal_error (in /lib64/ld-2.5.so)
==3658== by 0x34AA20C538: _dl_map_object_deps (in /lib64/ld-2.5.so)
==3658== by 0x34AA203223: dl_main (in /lib64/ld-2.5.so)
==3658== by 0x34AA21334A: _dl_sysdep_start (in /lib64/ld-2.5.so)
==3658== by 0x34AA201387: _dl_start (in /lib64/ld-2.5.so)
==3658== by 0x34AA200A77: (within /lib64/ld-2.5.so)
==3658==
==3658== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
==3658== malloc/free: in use at exit: 0 bytes in 0 blocks.
==3658== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==3658== For counts of detected errors, rerun with: -v
==3658== All heap blocks were freed -- no leaks are possible.

(In reply to comment #27)
> > No, try:
> > ./waf --shell
> > valgrind ./build/debug/scratch/third

[read again what I wrote above and make sure you run ./waf shell]

> ./build/debug/scratch/third: error while loading shared libraries: libns3.so:
> cannot open shared object file: No such file or directory

The above tells you what is wrong here: can't find libns3.so which means you have not run ./waf shell

Created attachment 879 [details]
minstrel valgrind output
the valgrind output is attached. It seems like it's related to bug 919 memory leak (http://www.nsnam.org/bugzilla/0).

Created attachment 880 [details]
minstrel_third_output
this is my output, I run ./waf shell first.
Created attachment 882 [details]
third new valgrind ouput
I just patched the memory leak problem in minstrel (Bug 919), yet this bug remains. The new valgrind output is attached. It points to:

==4079== Invalid read of size 4
==4079== at 0x47D129E: ns3::TimeUnit<1>::operator=(ns3::TimeUnit<1> const&) (nstime.h:424)
==4079== by 0x4DFF96F: ns3::MinstrelWifiManager::UpdateStats(ns3::MinstrelWifiRemoteStation*) (minstrel-wifi-manager.cc:574)
==4079== by 0x4E008C7: ns3::MinstrelWifiManager::DoGetDataMode(ns3::WifiRemoteStation*, unsigned) (minstrel-wifi-manager.cc:429)

(In reply to comment #31)
> Created an attachment (id=882) [details]
> third new valgrind ouput
>
> I just patched the memory leak problem in minstrel(Bug 919), yet this bug
> remains. The new valgrind output is attached.

valgrind is telling you that you are using a TimeUnit object which is located in an un-allocated/un-initialized area of memory, probably because you are accessing an array out of bounds or a non-created or already destroyed area.

>
> It points to:
> ==4079== Invalid read of size 4
> ==4079== at 0x47D129E: ns3::TimeUnit<1>::operator=(ns3::TimeUnit<1> const&)
> (nstime.h:424)
> ==4079== by 0x4DFF96F:
> ns3::MinstrelWifiManager::UpdateStats(ns3::MinstrelWifiRemoteStation*)
> (minstrel-wifi-manager.cc:574)
> ==4079== by 0x4E008C7:
> ns3::MinstrelWifiManager::DoGetDataMode(ns3::WifiRemoteStation*, unsigned)
> (minstrel-wifi-manager.cc:429)

Created attachment 883 [details]
patch for bug 802 and 919
Thanks Mathieu for pointing out the out-of-bounds index access. GetNSupported(station) should only be called once after allocating the table. This patch fixes both bug 802 and 919 and puts the minstrel rate back into test.py. Let me know if you still have any problems.

Created attachment 885 [details]
backtrace
I applied your patch to current ns-3-dev (rev 6318), but I still get a segmentation fault when running the program posted by paolo. I am attaching a backtrace.
Created attachment 886 [details]
Fixed Bug 802 and Bug 919
Sorry, I forgot to run his example. This patch will do. It makes sure to only allocate the table for GetNSupported() > 1. (On strange occasions, GetNSupported() returns 1.) I ran valgrind on his example and mine as well, and both return ok. Please let me know if there are still any problems.

Created attachment 887 [details]
backtrace
I tried to run the example another time, after applying the patch, and it crashes anyway. No problem if I use Aarf or some other wifi manager.
I attach my backtrace.
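The two fixes described in the comments above (size the table once, and skip allocation unless more than one rate is supported) can be sketched as follows. This is only an illustration of the idea, not the attached patch: ToyStation, InitTable and ReadTxTime are hypothetical names, and nSupported stands in for the value returned by GetNSupported (station).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-station rate control state; not the
// ns-3 MinstrelWifiRemoteStation class.
struct ToyStation
{
  ToyStation () : nModes (0), initialized (false) {}

  std::size_t nModes;                // size captured once, when the table is built
  std::vector<double> perRateTxTime; // one entry per supported rate
  bool initialized;
};

// Build the table only when there is more than one rate to choose from,
// and remember the size that was used so later loops cannot drift from it.
void
InitTable (ToyStation &st, std::size_t nSupported)
{
  if (st.initialized || nSupported <= 1)
    {
      return;
    }
  st.nModes = nSupported;
  st.perRateTxTime.assign (nSupported, 0.0);
  st.initialized = true;
}

// Bounds-checked read: reports failure instead of touching memory past
// the end of the table.
bool
ReadTxTime (const ToyStation &st, std::size_t rate, double &out)
{
  if (!st.initialized || rate >= st.nModes)
    {
      return false;
    }
  out = st.perRateTxTime[rate];
  return true;
}
```

A caller that gets false back can fall back to a default rate rather than reading an uninitialized entry, which is the kind of invalid read valgrind reported earlier in this thread.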
(In reply to comment #36)
> Created an attachment (id=887) [details]
> backtrace
>
> I tried to run another time the example, after applied the patch, and it crash
> anyway. No problem if I use Aarf or some other wifi manager.
> I attach my backtrace

Hi Paolo, the patch you applied seems like the old one. Could you try again and give me a new trace? I ran your example under valgrind with no problem or memory leak.
Duy

I'm sorry, I haven't seen the new patch! Now all works. Thanks a lot.

Created attachment 888 [details]
backtrace
That's the backtrace of the execution under valgrind.
(In reply to comment #39)
> Created an attachment (id=888) [details]
> backtrace
>
> That's the backtrace of the execution under valgrind.

That's strange; this is my backtrace. I used the example you posted here. Did you modify it? If yes, please attach it.

Waf: Entering directory `/home/duy/ns-3-clean/ns-3-dev/build'
Waf: Leaving directory `/home/duy/ns-3-clean/ns-3-dev/build'
'build' finished successfully (0.628s)
==14678== Memcheck, a memory error detector.
==14678== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==14678== Using LibVEX rev 1854, a library for dynamic binary translation.
==14678== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==14678== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==14678== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==14678== For more details, rerun with: -v
==14678==
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Received 1024 bytes from 10.1.3.3
Received 1024 bytes from 10.1.3.2
Received 1024 bytes from 10.1.3.1
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Received 1024 bytes from 10.1.3.3
Received 1024 bytes from 10.1.3.2
Received 1024 bytes from 10.1.3.1
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Sent 1024 bytes to 10.1.2.4
Received 1024 bytes from 10.1.3.1
Received 1024 bytes from 10.1.3.2
Received 1024 bytes from 10.1.3.3
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
Received 1024 bytes from 10.1.2.4
==14678==
==14678== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 102 from 1)
==14678== malloc/free: in use at exit: 0 bytes in 0 blocks.
==14678== malloc/free: 18,713 allocs, 18,713 frees, 882,087 bytes allocated.
==14678== For counts of detected errors, rerun with: -v
==14678== All heap blocks were freed -- no leaks are possible.

No, I haven't modified it. If you want, attach the example you ran and I'll run it and attach my backtrace.

Created attachment 897 [details]
paolo's third modified example
Hi Paolo, this attachment contains your modified third.cc that I ran. Let me know if you get any memory leaks or valgrind errors.
Created attachment 898 [details]
backtrace with leak memory informatio
Yes, I have 1 memory error. This is the backtrace with the memory leak information.
Comment on attachment 898 [details]
backtrace with leak memory informatio
I applied only your last patch.
Nicola and Mathieu, do you have any idea about this memory leak? I have no idea what causes it. The strange thing is that I could not reproduce Paolo's valgrind errors on my machine.

Hi Paolo, could you run valgrind on other rates to see if you get the same valgrind errors? I don't know why, but I could not reproduce your valgrind errors on my machine.

Duy

(In reply to comment #44)
> (From update of attachment 898 [details])
> I applied only your last patch.

(In reply to comment #46)
> Hi Paolo, could you run valgrind on other rates to see if you get the same
> valgrind errors? I don't know why, but I could not reproduce your valgrind
> errors on my machine.

Other rate? Do you mean other wifi manager?

(In reply to comment #47)
> Other rate? Do you mean other wifi manager?

Yes.

(In reply to comment #48)
> > Other rate? Do you mean other wifi manager?
>
> Yes.

I have the same memory leak error with aarf and onoe. What's wrong with my machine?? This error doesn't block the simulation; I'm actually simulating the sending of over 100000 packets with minstrel and I haven't got any error.

(In reply to comment #49)
> I have the same memory leak error with aarf and onoe. What's wrong with my
> machine?? This error doesn't block the simulation; I'm actually simulating the
> sending of over 100000 packets with minstrel and I haven't got any error.

It seems like there are some small library issues on your machine; I have no idea what. Maybe something needs to be updated? This tiny leak shouldn't block your simulation unless you have something like hundreds of thousands of nodes in your simulation and they gradually eat up memory.

Anyway, I am going ahead and committing this patch to ns-3-dev tonight and closing out this bug. Thanks everyone!

changeset: 6337:92c95748a915
tag: tip
user: Duy Nguyen <duy@soe.ucsc.edu>
date: Thu Jun 03 22:38:44 2010 -0700
summary: Fixed Bug 802 and Bug 919
Created attachment 744 [details]
Backtrace
When I tried to use Minstrel in a mesh scenario, it failed with a segmentation fault.