Back to GSOC2023ns3-ai (page containing my weekly updates, not the final report)
Slides: final report slides (for reference)
- 1 Project Overview
- 2 Merge Requests and Commits
- 3 Project Details
- 4 Build and Run the Code
- 5 Proposal vs. Actual Work
- 6 Future Works
- 7 Acknowledgments
Project Overview
- Project Name: ns3-ai enhancements
- Student: Muyuan Shen
- Mentors: Collin Brady and Hao Yin
The main focus of this project is to optimize performance and improve usability of the ns3-ai module, which facilitates the connection between ns-3 and Python-based ML frameworks using shared memory.
To accomplish this goal, the project will introduce additional APIs that support data structures such as vectors in shared-memory IPC. This will reduce the number of interactions required between C++ and Python, resulting in improved performance. The project will also integrate a Gymnasium API like ns3-gym's, but with a shared-memory-based backend, to turn ns-3 into an environment that agents can interact with efficiently and seamlessly. In addition, the project will enhance the existing examples, documentation and tutorials, and add new examples that cover scenarios such as Multi-BSS in VR, giving users more comprehensive resources. Finally, the project aims to provide examples that use pure C++-based ML frameworks, offering researchers more options for integrating with ML.
The overall aim of the project is to expand and accelerate the capabilities of the ns3-ai module, enabling users to simulate and analyze network-related algorithms with enhanced efficiency and flexibility.
Merge Requests and Commits
Throughout the project, my development was based on the cmake branch of ns3-ai. I created a single MR that contains all my work to be merged into the upstream cmake branch. This MR contains 110+ commits by me, under the author names 'ShenMuyuan', 'Mu-YuanShen' and 'eicsmy'. The MR has been merged.
- Why the branch is named "cmake": one of my early tasks was to add CMake support to ns3-ai (for compatibility with ns-3.36+). During GSoC I worked on another branch named "improvements", which was eventually merged into the cmake branch.
Merge request: merge to cmake branch
Project Details
Note: Each URL shown below that points to my source code refers to the contents as of my last commit during the GSoC period.
Community Bonding Period
During the community bonding period, I started bi-weekly meetings with my mentors, and we decided on the project plan: prioritize the development of new interfaces, then develop more examples and enhance the documentation.
There are two new interfaces: the vector interface (which we later called the vector-based message interface, as it shares some fundamentals with the struct-based message interface) and the Gym interface. We also discussed some details of new examples such as LTE-handover and Multi-BSS.
I also read the ns3-ai code thoroughly to understand its IPC principles and learned some reinforcement learning basics.
Adding std::vector to shared memory is not easy with ns3-ai's original design, because Python's ctypes library does not support STL templates (it only supports C structures and functions). To support vectors, I refactored the original model completely, replacing ctypes with the Boost C++ libraries, which are more flexible for interprocess communication. My work includes:
- Used Boost's boost::interprocess::managed_shared_memory to store data (as well as synchronization variables) in shared memory. This shared segment is used for data transmission between C++ and Python. The two directions, C++-to-Python and Python-to-C++, occupy two different regions of shared memory. It also supports a custom memory allocator for STL containers, an instance of boost::interprocess::allocator, which ensures that when an STL container allocates new memory, that memory comes from the shared memory segment rather than from the ordinary heap.
- The shared memory creation can be found in the constructor of Ns3AiMsgInterfaceImpl: code
- Developed a spinlock-based semaphore to synchronize read and write operations in shared memory. The original synchronization method works, but its "version number" concept and "control block" data structures may confuse and distract beginners. Moreover, the "version number" is essentially a complex implementation of the well-known semaphore. To improve ease of use and code readability, I created a semaphore, based on Boost's semaphore, that only spins and never sleeps while waiting. Its performance is comparable to the original's, with better readability and usability.
- Built the vector-based interface with multiple configurable options. The vector interface parallels the struct interface in creation and usage, and users choose between the two by setting an attribute early in their code. If the vector interface is chosen, the C++-to-Python and Python-to-C++ vectors are created in shared memory with no elements, so users must call resize or push_back to adjust their length before use. Another attribute controls whether the interface handles the end of the simulation. If that attribute is set, the interface runs a simple protocol to notify the Python side when the C++ side simulation finishes. Other configurable attributes include the memory segment size and the names of the objects constructed in shared memory.
- Provided Python binding boilerplate code in the examples. The Python side accesses the shared memory and the objects in it (vectors or structs) via C++ functions exposed to Python. C++ class functions and members are exposed with pybind11, a lightweight Python binding library. The C++ binding code, linked with pybind11, is compiled into a dynamically linked library that Python can import as a module. Because the C++ side interface is template-based and Python has no native support for templates, the Python binding module must be generated separately for every program (this is done through a CMake target dependency, so it is seamless). Although the binding contains many lines of C++ code and is difficult to write from scratch, users can adapt existing binding code to generate Python binding modules quickly, and I provide boilerplate for this (the *_py.cc files in all examples).
- Some of the example boilerplate code: binding code for struct-based message interface in A-Plus-B example, binding code for vector-based message interface in Multi-BSS example
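The handoff described above (two one-way regions plus a semaphore that spins rather than sleeps) can be sketched in Python. This is only an illustration under stated assumptions, not the ns3-ai implementation: the real code is C++ on top of Boost.Interprocess, whereas here two threads stand in for the two processes, plain lists stand in for the shared-memory vectors, and the names SpinSemaphore and run_exchange are hypothetical.

```python
import threading


class SpinSemaphore:
    """Counting semaphore whose wait() busy-waits instead of blocking (hypothetical name)."""

    def __init__(self, initial=0):
        self._count = initial
        self._lock = threading.Lock()  # protects _count only; released between checks

    def post(self):
        with self._lock:
            self._count += 1

    def wait(self):
        # Spin: keep re-checking the counter until a unit is available.
        while True:
            with self._lock:
                if self._count > 0:
                    self._count -= 1
                    return


def run_exchange(rounds=3):
    # Two one-way "regions", one per direction. Like the vectors described
    # above, they start empty and must be filled before use.
    cpp_to_py = []
    py_to_cpp = []
    cpp_wrote = SpinSemaphore()  # posted when the C++ stand-in has written
    py_wrote = SpinSemaphore()   # posted when the Python stand-in has replied
    replies = []

    def python_side():
        for _ in range(rounds):
            cpp_wrote.wait()
            py_to_cpp.clear()
            py_to_cpp.append(sum(cpp_to_py))
            py_wrote.post()

    worker = threading.Thread(target=python_side)
    worker.start()
    for i in range(rounds):  # stand-in for the C++ simulation loop
        cpp_to_py.clear()
        cpp_to_py.extend([i, i + 1])
        cpp_wrote.post()
        py_wrote.wait()
        replies.append(py_to_cpp[0])
    worker.join()
    return replies
```

Calling run_exchange(3) sends [0, 1], [1, 2] and [2, 3] in turn and collects one reply per round. Each side touches a region only after the matching semaphore has been posted, which is the property the real semaphore provides across processes.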
The Gymnasium API for ns3-ai is based on shared memory rather than socket communication, which provides faster data exchange than ns3-gym. While much of the Gym interface code comes from ns3-gym's repository, I made substantial changes to give it a shared memory backend. My work includes:
- Modified OpenGymInterface to use Ns3AiMsgInterface for IPC. OpenGymInterface was created by the ns3-gym developers and provides code to create Gym-compatible environments in ns-3. It contains functions to get the state and action spaces, observe the environment in ns-3 and execute actions (for example, by changing simulation parameters). Those functions use callbacks registered by OpenGymEnv at runtime. For the callbacks to work, a custom environment must inherit from OpenGymEnv and implement class methods such as GetActionSpace, GetObservationSpace, GetObservation and ExecuteActions. All states and actions are serialized with Google's Protocol Buffers, transmitted, and then deserialized by the peer. I replaced the ZeroMQ socket's send and receive functions with Ns3AiMsgInterface's send and receive functions, and ensured that Ns3AiMsgInterface is properly initialized. The underlying message interface for transmitting serialized messages is struct-based. The struct contains a buffer (a uint8_t array) and its capacity.
- Example of my changed part: before (in ns3-gym's repo), after (in my ns3-ai repo)
- Initialization of Ns3AiMsgInterface: code
- Note: in the above configuration, handling finish is set to false because the protocol that notifies the Python side that the C++ side has finished is unnecessary for Gym. The Gym interface has its own protocol for handling finish: NotifySimulationEnd is called on the C++ side, and 'done' then becomes true when Python steps.
- Created a Python binding for accessing the shared structure that contains the serialized message string. Binding a structure that contains an array is similar to binding an ordinary structure, except that the array is treated specially: its contents are exposed as a Python memoryview. With a memoryview, the Python side can read and write the array seamlessly, much as you can in C++ with std::array.
- Obtaining the memoryview in binding: code
- Note: arrays of different lengths need different memoryview objects on the Python side. In the above code, get_buffer returns the part of the buffer that is actually in use (for reading), while get_buffer_full returns the buffer at its full length (for writing). Example usage in Ns3Env (the Python side Gym environment created with gym.make): array read and array write
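To make the difference between the two views concrete, here is a small Python sketch that uses only the standard library. It does not use the real binding: MsgBuffer, CAPACITY and the size field are hypothetical stand-ins for the shared struct, and a bytearray stands in for the uint8_t array in shared memory. Only the method names get_buffer and get_buffer_full are taken from the note above.

```python
CAPACITY = 64  # hypothetical buffer capacity in bytes


class MsgBuffer:
    """Stand-in for the shared struct: a fixed-size byte array plus the used length."""

    def __init__(self):
        self._data = bytearray(CAPACITY)  # plays the role of the uint8_t array
        self.size = 0                     # number of bytes currently in use

    def get_buffer_full(self):
        # Full-length view, for writing a new message of up to CAPACITY bytes.
        return memoryview(self._data)

    def get_buffer(self):
        # View of only the bytes in use, for reading the last message.
        return memoryview(self._data)[: self.size]


def write_message(msg, payload):
    if len(payload) > CAPACITY:
        raise ValueError("payload larger than buffer capacity")
    # Slice assignment writes through the view into _data without copying the buffer.
    msg.get_buffer_full()[: len(payload)] = payload
    msg.size = len(payload)


def read_message(msg):
    # Copy the used bytes out so the buffer can be reused for the next message.
    return bytes(msg.get_buffer())
```

In this sketch the writer always goes through the full-length view and then records how many bytes it used, while the reader only ever sees the used prefix. In the Gym interface the payload would be a serialized Protocol Buffers message rather than an arbitrary byte string.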
Examples and documentation update
To demonstrate the usage of the message interface and the Gym interface, all existing examples were updated to use the new interfaces. In addition, a new example, "Multi-BSS", was created to benchmark the performance of the vector interface. All of them build successfully with the "./ns3 build" command and the updated CMake files, without copying the examples to the scratch folder. The updated examples and the interfaces they support are listed below:
- A-Plus-B example (updated example) (directory): In this example, C++ side starts by setting 2 random numbers between 0 and 10 in shared memory. Then, Python side gets the numbers and sets the sum of the numbers in shared memory (in another region). Finally, C++ gets the sum that Python set. The procedure is analogous to C++ passing RL states to Python and Python passing RL actions back to C++, and is repeated many times. Supported interfaces:
- Struct-based message interface
- Vector-based message interface
- Gym interface
- LTE-CQI example (updated example) (directory): CQI prediction example. The original work was based on the 5G NR branch of ns-3, and previous developers made some changes so that it also runs on the LTE codebase in ns-3 mainline. Supported interfaces:
- Struct-based message interface
- Multi-BSS example (new example) (directory): The example is based on and modified from juanvleonr's clean-tgax branch. The C++ side simulates a VR gaming scenario, shown below, in which 4 BSSs operate in separate apartments arranged in a 2 by 2 grid. Each BSS contains 1 AP and 4 STAs. One of the STAs in the first BSS is a VR device generating burst UL traffic, while the other devices have normal UL traffic. Supported interfaces:
- Struct-based message interface (available at the benchmarking branch)
- Vector-based message interface
- Rate-Control example (updated example) (including constant rate & Thompson Sampling) (directory): The Wi-Fi module already contains models of the constant rate and Thompson sampling algorithms. Here they are implemented in Python to show how to develop a new rate control algorithm for the Wi-Fi module using ns3-ai. Supported interfaces:
- Struct-based message interface
- RL-TCP example (updated example) (directory): This example applies Q-learning algorithms (Q-learning and deep Q-learning) to TCP congestion control so that it adapts to real-time changes in the network transmission environment. By using reinforcement learning to manage the sliding window and threshold size, the network can achieve better throughput and smaller delay. Supported interfaces:
- Struct-based message interface
- Gym interface
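The A-Plus-B round trip at the top of this list can be sketched without ns-3 at all. The following Python sketch is an illustration under assumptions, not the example's code: a single bytearray stands in for the shared memory segment, the offsets ENV_OFFSET and ACT_OFFSET and all helper names are hypothetical, and both sides run in one process in strict turn, so no synchronization is shown.

```python
import random
import struct

ENV_FORMAT = "<II"  # region written by the C++ side: two unsigned ints (a, b)
ACT_FORMAT = "<I"   # region written by the Python side: one unsigned int (the sum)
ENV_OFFSET = 0
ACT_OFFSET = struct.calcsize(ENV_FORMAT)  # second region starts after the first
SEGMENT_SIZE = ACT_OFFSET + struct.calcsize(ACT_FORMAT)


def cpp_set_numbers(segment, rng):
    # Stand-in for the C++ side: write two random numbers between 0 and 10.
    a, b = rng.randint(0, 10), rng.randint(0, 10)
    struct.pack_into(ENV_FORMAT, segment, ENV_OFFSET, a, b)
    return a, b


def python_set_sum(segment):
    # Stand-in for the Python side: read the numbers, write the sum elsewhere.
    a, b = struct.unpack_from(ENV_FORMAT, segment, ENV_OFFSET)
    struct.pack_into(ACT_FORMAT, segment, ACT_OFFSET, a + b)


def cpp_get_sum(segment):
    (total,) = struct.unpack_from(ACT_FORMAT, segment, ACT_OFFSET)
    return total


def run_a_plus_b(rounds=5, seed=0):
    rng = random.Random(seed)
    segment = bytearray(SEGMENT_SIZE)
    results = []
    for _ in range(rounds):
        a, b = cpp_set_numbers(segment, rng)  # "state" goes C++ -> Python
        python_set_sum(segment)               # "action" goes Python -> C++
        results.append((a, b, cpp_get_sum(segment)))
    return results
```

Each tuple returned by run_a_plus_b holds the two numbers and the sum read back from the other region, mirroring the state and action exchange the example is meant to illustrate.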
Documents were updated along with the examples. Apart from the README.md files in the example directories, I added installation instructions, a message interface tutorial and a Gym interface tutorial as separate documents linked from the updated root README.md.
Pure C++ example
In developing pure C++-based ML framework examples, I tried to rewrite the LTE-CQI example (originally using tensorflow as its Python-based ML framework) with the TensorFlow C API, and the RL-TCP example (originally using torch as its Python-based ML framework) with the PyTorch C++ API. Unfortunately, only the latter succeeded. The pure C++ version of LTE-CQI failed because TensorFlow's C API has limited support for gradients and neural networks. So, for the TensorFlow C API, I only provide an example that checks the libtensorflow version. Although I succeeded in converting the Python code to C++ in the RL-TCP example, the process was slow and difficult because of the lack of official documents and examples. For instance, the C++ API does not provide the useful load_state_dict function for copying the policy net's parameters to the target net. It took me a while to find the equivalent in C++ (torch::save and torch::load, with the module defined using the TORCH_MODULE macro).
I also wrote a guide on how to use C++-based ML frameworks in ns-3 (by installing in ns3-ai): here
I benchmarked three items:
- Gym interface vs. ns3-gym in terms of transmission time: This benchmark is based on the RL-TCP example. It measures the CPU cycle count during C++-to-Python and Python-to-C++ data transmissions and compares the mean and standard deviation of the cycle counts. Results show that, in both directions, the transmission time of ns3-ai's Gym interface is more than 15 times shorter than that of ns3-gym (shorter is better).
- Vector-based vs. struct-based message interface in terms of transmission time: This benchmark is based on the Multi-BSS example, on the benchmark_vector branch. Unfortunately, in terms of action transmission time (from the beginning of the C++ write to the completion of the Python read), the vector-based interface is 1.2 times longer than the struct-based interface (shorter is better). The extra time is caused by Python's slow reading of vectors. Measurements show that when reading rxPower (the received power at nodes in the first BSS) on the Python side, the vector interface spent 20% to 50% more time than the struct interface.
- To deal with the slow vector access on Python side in the future, one possible solution is to integrate Eigen on C++ side and use existing Eigen-Python bindings like pybind11's Eigen support or eigenpy to convert linear algebra types into numpy or scipy types.
- Pure C++ vs. struct-based message interface in terms of processing time: This benchmark is based on the pure C++ (libtorch) and message interface (PyTorch) versions of the RL-TCP example. We compare the mean and standard deviation of the processing time for the two implementations (transmission time plus DRL algorithm time for the message interface, DRL algorithm time only for pure C++). Results show that the processing time of the pure C++ implementation is less than half that of the message interface implementation (shorter is better).
See ns3-ai benchmarking documentation for more detailed information.
Overall, the Gym interface is much faster than ns3-gym, and the pure C++ interface is more efficient than message interface. The vector interface needs to be enhanced in the future, especially in the optimization of Python side access.
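The comparisons above all reduce to collecting many per-transfer durations and comparing their mean and standard deviation. A minimal Python sketch of that bookkeeping follows. It is not the benchmark code: the original measured CPU cycle counts, whereas this sketch assumes time.perf_counter_ns as the clock, and the names time_calls, summarize and speedup are hypothetical.

```python
import statistics
import time


def time_calls(func, repeats=1000):
    """Return one wall-clock duration in nanoseconds per call of func()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return samples


def summarize(samples):
    """Mean and sample standard deviation of a list of durations."""
    return statistics.mean(samples), statistics.stdev(samples)


def speedup(baseline_samples, candidate_samples):
    """Ratio of the baseline's mean duration to the candidate's mean duration."""
    return statistics.mean(baseline_samples) / statistics.mean(candidate_samples)
```

With samples collected for two implementations of the same transfer, a speedup value above 15 would correspond to a statement like "more than 15 times shorter", and the two standard deviations indicate how stable each measurement is.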
Build and Run the Code
A detailed guide on how to set up the ns3-ai module is here. You must install ns-3 before installing ns3-ai. To test ns3-ai, you can build and run the provided examples (listed in the 'Phase 2' section above) according to their documentation.
Proposal vs. Actual Work
A few things were mentioned in the proposal but were not completed in my actual work:
- The LTE-handover example, which was intended to be an example using the vector interface, similar to Multi-BSS. This example was not started because of limited time.
- Support for std::string in shared memory. Development of this support was postponed because the vector interface had the highest priority. Also, it was not considered a 'must do' in the project.
- Pure C++ ML example using the TensorFlow C API. This failed because the C API's support for gradients and neural networks is inadequate, as mentioned above in the 'Pure C++ example' section.
Future Works
- Add more examples, such as LTE-handover, to ns3-ai for better demonstration of the tool.
- Optimize the vector-based message interface to reach its full potential on transmitting vectors or matrices of data.
Acknowledgments
I extend my heartfelt gratitude to my mentors Hao and Collin for their invaluable suggestions and comments, which guided me through the challenges of GSoC 2023. Collaborating with the ns-3 community has been an enriching experience, expanding my technical knowledge and strengthening my skills in communication and oral presentation. I am also very grateful to my teachers Prof. Yayu Gao and Prof. Xiaojun Hei at HUST, who gave me a lot of encouragement when I encountered difficulties. Additionally, I would like to express my appreciation to Google for offering this remarkable opportunity.