- 1 Personal Information
- 2 Project Overview
- 3 Weekly Overview
- 3.1 Week 1 (May 29 - May 31)
- 3.2 Week 2 (June 3 - June 7)
- 3.3 Week 3 (June 10 - June 14)
- 3.4 Week 4 (June 17 - June 21)
- 3.5 Week 5 (June 24 - June 28)
- 3.6 Week 6 (July 1 - July 5)
- 3.7 Week 7 (July 8 - July 12)
- 3.8 Week 8 (July 15 - July 19)
- 3.9 Week 9 (July 22 - July 26)
- 3.10 Week 10 (July 29 - August 2)
- 4 Daily Log
- 4.1 Week 1 (May 29 - May 31)
- 4.2 Week 2 (June 3 - June 7)
- 4.3 Week 3 (June 10 - June 14)
- 4.4 Week 4 (June 17 - June 21)
- 4.5 Week 5 (June 24 - June 28)
- 4.6 Week 6 (July 1 - July 5)
- 4.7 Week 7 (July 8 - July 12)
- 4.8 Week 8 (July 15 - July 19)
- 4.9 Week 9 (July 22 - July 26)
- 4.10 Week 10 (July 29 - August 2)
My name is Eric Biggers and I am currently a mathematics and computer science major at Macalester College in St. Paul, Minnesota (graduating in 2014).
I am working with Dr. Dennis Brylow and several other REU students (Farzeen and Tyler) on porting the Embedded Xinu operating system to the Raspberry Pi, which is a tiny ARM-based computer. As the Raspberry Pi is an inexpensive piece of hardware and intended for educational use, the Xinu operating system, which is simple and intended for educational use, will be a very good fit for it.
Our initial work will consist of getting basic functionality working, including the UART, interrupts, and system timer. To do this we will need to consult various documentation, including documentation from ARM about the ARM processor and documentation from Broadcom about the peripherals accessible by the BCM2835 SoC. Our next task will probably be to better integrate our work into the existing Embedded Xinu codebase rather than using a separate source tree. We then will need to work on more advanced functionality, including writing a driver for the integrated SMSC LAN9512 USB 2.0 Ethernet Adapter, which is needed to get networking working. However, this may be difficult because working USB 2.0 support is a prerequisite for this. We then will need to write a bootloader program that will reside on the Raspberry Pi's SD card and can boot the Xinu kernel over the network. This will also be very challenging as it requires a driver for the ethernet adapter; however, as an alternative we may attempt to implement the bootloader as a stripped-down Linux kernel that loads the Xinu kernel using the kexec_load() system call. We expect that our final work will focus on providing high-quality documentation for the Raspberry Pi port of Xinu so that it can be used in educational settings.
At the end of the program, we were able to achieve most of our original goals, including porting basic preemptive multitasking operating system functionality and interrupt handling as well as implementing support for USB and the SMSC LAN9512. With networking functional, I also implemented all components needed by the network bootloader, including new DHCP and TFTP clients, and code to transfer control to a new kernel loaded into memory. Farzeen implemented working support for basic graphics, while Tyler made progress on USB and keyboard support. Our final goal, however, is not to just have this code, but to also fully document our work, especially with hardware that has not previously been well documented. Towards this end we have already written a paper summarizing our work, we are working on writing new articles on the Embedded Xinu Wiki, and I also improved the API documentation for many functions in the Embedded Xinu code-base that were not up to par, especially given that it is an educational operating system. Later, Dr. Brylow plans to actually use our work in his classes, which will make it possible for the documentation to become more practically-oriented for those looking to use Embedded Xinu on the Raspberry Pi to teach classes (e.g. a step-by-step guide for a lab setup and example coursework).
Week 1 (May 29 - May 31)
Assess state of current Raspberry Pi code; work on UART, interrupts, and system timer support
Week 2 (June 3 - June 7)
Complete port of core operating system excluding networking; pass all relevant Xinu test cases; ensure all the code we have written is appropriately documented; begin planning USB and ethernet adapter support.
Week 3 (June 10 - June 14)
Write core USB code, including support for hubs and sending and receiving control messages; implement additional graphics functionality and move assembly code to C.
Week 4 (June 17 - June 21)
Continue working on USB support, including support for interrupt and bulk transfers and using real interrupts rather than software polling. With the USB device framework in place, begin designing the actual ethernet adapter driver. Farzeen and Tyler: continue working on graphics and sound support.
Week 5 (June 24 - June 28)
Work on driver for SMSC LAN9512 USB Ethernet Adapter. Improve driver for Synopsys Designware High-Speed USB 2.0 On-The-Go Controller as needed. Prepare and give talk on Thursday. Attend required training session on Wednesday.
Week 6 (July 1 - July 5)
Continue working on Ethernet and USB drivers, with the goal of getting sending and receiving packets fully functional and stable.
Week 7 (July 8 - July 12)
Fix all remaining known bugs in currently implemented features on Embedded Xinu on the Raspberry Pi, including UART, Ethernet, and USB support. Get "reset" and "kexec" commands working as intended. Begin writing network bootloader.
Week 8 (July 15 - July 19)
Following the meeting with Dr. Brylow on Monday, Tyler, Farzeen, and I plan to spend most of this week writing a paper to possibly submit to the 2013 Workshop on Embedded and Cyber-Physical Systems Education. We also need to prepare posters by next week. However, if time permits I would like to work on improving the actual documentation for Embedded Xinu (both automatically-generated and on the Wiki), and possibly attempt to set up Raspberry Pis in a back-end pool equivalent to the setup Dr. Brylow has with the MIPS routers now.
Week 9 (July 22 - July 26)
Finish revising WESE paper for submission Monday night; prepare poster by deadline on Wednesday; re-investigate using "mini-UART" instead of PL011 UART; begin preparing documentation on Wiki; possibly move our changes to the SVN repository; make any necessary preparations for the session with high school teachers next week.
Week 10 (July 29 - August 2)
Prepare final talk for Thursday or Friday; present at poster session on Tuesday; work on SD card driver and/or start writing documentation on the Embedded Xinu Wiki.
Week 1 (May 29 - May 31)
Our first task today was to review the state of the current Raspberry Pi code and plan our next course of action. The current code was written by Farzeen and Jason and is very incomplete and not well integrated into Xinu (e.g. platform-specified code is not in its own directory, and many platform-independent files are missing or have been changed). It does, however, have some preliminary code to access the graphics hardware, which we may wish to use and expand on in the future. But, we decided to skip over the graphics initialization code paths for now.
We know that the Raspberry Pi hardware loads the Xinu kernel (by default) at address 0x8000. Our first code executed is in start.s, a file written in ARM assembly language. Most of our work today consisted of familiarizing ourselves with the ARM architecture and what this code is supposed to be doing. We implemented the ARM exception table, which was missing from the existing code. This table is initially loaded into the .data section of the kernel, but is then copied to address 0 by our code, where it then supposedly will be used by the ARM exception and interrupt handling hardware in order to branch to the correct handler function; however, our handler functions are currently stubs. We also added code to start.s that initializes the stack pointer for the stack in IRQ mode. This is necessary because the ARM processor uses a banked stack pointer, so the stack pointer register in use depends on the processor mode.
Finally, we identified several problems with the UART driver; in particular, it accessed memory-mapped IO registers from C code without the proper 'volatile' annotations, and it was also written for the so-called "miniUART", which is probably not the UART we want to be using, since existing software such as the "raspbootin" bootloader uses a different UART that appears to be more standard and better documented by the Broadcom documentation.
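The 'volatile' problem can be illustrated with a minimal sketch. The register layout and the names struct uart_regs, UART_FLAG_TX_FULL, and uart_putc_polled are hypothetical and do not correspond to either UART on the Raspberry Pi; the point is only that the status register must be declared volatile so that the compiler re-reads it on every loop iteration.

```c
#include <stdint.h>

/* Hypothetical register block for illustration; real UART layouts differ. */
struct uart_regs {
    volatile uint32_t data;   /* writing here transmits one byte */
    volatile uint32_t flags;  /* status bits */
};

#define UART_FLAG_TX_FULL (1u << 5)  /* assumed bit position */

/* Busy-wait until the transmitter has room, then write one byte.
 * Because 'flags' is volatile, the load cannot be hoisted out of the
 * loop or cached in a register by the compiler. */
void uart_putc_polled(struct uart_regs *regs, unsigned char c)
{
    while (regs->flags & UART_FLAG_TX_FULL) {
        /* spin until the hardware clears the flag */
    }
    regs->data = c;
}
```

On real hardware the struct pointer would be set to the peripheral's memory-mapped base address; without 'volatile', an optimizing compiler would be free to read the flags register once and loop forever.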
Today we began by integrating our (initially completely separate) code base into the existing code for Embedded Xinu (MIPS version). Although we didn't all strongly agree this was the best course of action at this point, this should allow us to move the port forward much more quickly as it now uses the Xinu build system, configuration system, and mechanism for handling platform-specific code. In doing this we also discovered several problems, which we were bound to discover anyway. First, the ARM processor does not have a division instruction, so division must be performed in software. Normally, this is handled automatically by the GCC compiler by using a software implementation from the platform-specific static library libgcc.a. However, the existing Xinu Makefile linked the Xinu kernel directly with the linker (ld) rather than the GCC compiler driver (gcc), and did not include libgcc.a. To fix this we had to change the Makefile to link the code with gcc instead, along with the appropriate flag(s) to prevent gcc from also linking the kernel with the C runtime startup stub.
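To give an idea of what "division in software" means, here is a minimal sketch of unsigned shift-and-subtract division. It is not the libgcc implementation, which is more heavily optimized; the name soft_udiv32 and the choice to return 0 on division by zero are assumptions made for this example.

```c
#include <stdint.h>

/* Restoring (shift-and-subtract) division of 32-bit unsigned integers.
 * Illustrative only. Division by zero returns 0 with the remainder set
 * to the numerator; that behavior is an arbitrary choice for this sketch. */
uint32_t soft_udiv32(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint32_t quot = 0;
    uint32_t r = 0;
    int i;

    if (den == 0) {
        if (rem)
            *rem = num;
        return 0;
    }
    /* Bring down one bit of the numerator at a time, most significant
     * first, subtracting the denominator whenever it fits. */
    for (i = 31; i >= 0; i--) {
        r = (r << 1) | ((num >> i) & 1u);
        if (r >= den) {
            r -= den;
            quot |= (1u << i);
        }
    }
    if (rem)
        *rem = r;
    return quot;
}
```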
We also discovered that, although Xinu appears to isolate platform-specific code in its own directory, platform-specific features had crept into code that was supposed to be platform independent. Two specific issues were a line of inline MIPS assembly language in the platform-independent part of the reschedule() function, and also an assumption that characters can be sent to the UART through kprintf() before it has actually been initialized by the device-initialization code in sysinit(), which is apparently a quirk of the MIPS hardware. Dr. Brylow stated that past students had not paid much attention to architecture-independent vs. architecture-dependent parts of the kernel and recommended we use code checked into his subversion repository instead. However, it is not yet clear what we should do when we find additional bugs and problems in architecture-independent code (we already found two separate problems with the config parser, which seemed to be poorly written), since there are apparently many separate Xinu source trees.
At the end of the day we worked more on the UART and system timer. We are now using the PL011 UART instead of the "miniUART", with the driver based on one written last fall by some RIT students. However, we are still experiencing problems with the UART and the driver is poorly written, so we will need to audit the code and make sure all the problems are fixed. With regard to the system timer, we wrote preliminary code to access the BCM2835 System Timer. The registers are documented by Broadcom; however, nothing in the documentation actually stated what frequency the timer runs at. We eventually were able to find the value of 1 MHz in the device tree file for the BCM2835 board included in the Linux kernel, and we confirmed that this was plausible by writing code that continually printed the low-order bits of the timer's counter. Farzeen also found a resource online that stated that some registers of the BCM2835 System Timer cannot be used by the ARM as they are already used by the GPU, despite this not being stated in the official documentation.
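Given a 1 MHz timer, the arithmetic for scheduling the next timer event is simple. The sketch below assumes that the counter value we read and the output compare register are both 32 bits wide; the function names are hypothetical.

```c
#include <stdint.h>

#define TIMER_FREQ_HZ 1000000u  /* 1 MHz, the value found in the Linux device tree */

/* Number of timer ticks in the given number of milliseconds. */
uint32_t ms_to_ticks(uint32_t ms)
{
    return ms * (TIMER_FREQ_HZ / 1000u);
}

/* Value to program into an output compare register so that a match
 * occurs 'ms' milliseconds after the counter reading 'now'. Unsigned
 * arithmetic wraps modulo 2^32, like a free-running 32-bit counter. */
uint32_t next_compare(uint32_t now, uint32_t ms)
{
    return now + ms_to_ticks(ms);
}

/* Ticks elapsed between two counter readings; correct across a single
 * wraparound of the 32-bit counter. */
uint32_t ticks_elapsed(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

Relying on unsigned wraparound means no special case is needed when the counter overflows between two readings.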
Last night I had worked to integrate our code in with the code in the SVN repository. This work was mostly complete by this morning, so we then focused on getting interrupts working properly. Using the documentation for the BCM2835 board (specifically, Chapters 7 and 12 of the Broadcom docs) we implemented the enable_irq() and disable_irq() functions, which enable or disable specified IRQ lines. We also attempted to implement the IRQ dispatcher routine. This is jumped to from the exception table placed at address 0. It then enters a C routine called dispatch() that handles all pending interrupts by calling the appropriate handlers in Xinu's interrupt vector. However, we have not really been able to test this code yet as we have been having trouble even getting any IRQ interrupts to occur at all. (Software interrupts [SWIs] are working.) We have not enabled interrupts on the UART yet and are focusing on the System Timer instead. But, we have so far been unsuccessful in getting the System Timer to actually generate an interrupt, despite the fact that we verified that the appropriate bit gets set in the Control/Status register when the counter matches the value set in the output compare register. We have also enabled interrupts both in the ARM CPSR (current program status register) and in the BCM2835-specific interrupt registers.
We also fixed a problem with our exception table itself. The table we implemented two days ago contained ARM instructions that jumped to the exception handler entry points using relative offsets. However, we realized that this would not work after the table was copied from the .text section of the Xinu kernel to address 0. To work around this problem we appended a table of absolute addresses to the end of the table, so that the jumps can be done by loading an absolute address that itself is located in a memory location that can be relatively addressed.
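The resulting layout can be sketched in C, although our actual table is written in assembly. The sketch assumes 8 exception vectors and uses the ARM instruction "ldr pc, [pc, #24]" for every entry: since the PC reads as the address of the current instruction plus 8, the entry at byte offset 4i loads from offset 4i + 32, which is the i-th word of the address table that follows. The function name build_vector_table is hypothetical.

```c
#include <stdint.h>

#define NUM_VECTORS 8u

/* ARM encoding of "ldr pc, [pc, #24]". */
#define LDR_PC_PC_24 0xE59FF018u

/* Fill a 16-word table: 8 identical load instructions followed by the
 * 8 absolute handler addresses that those instructions load. Because
 * each load is PC-relative to a word inside the same table, the table
 * keeps working after being copied to a different address. */
void build_vector_table(uint32_t table[2 * NUM_VECTORS],
                        const uint32_t handlers[NUM_VECTORS])
{
    uint32_t i;

    for (i = 0; i < NUM_VECTORS; i++) {
        table[i] = LDR_PC_PC_24;
        table[NUM_VECTORS + i] = handlers[i];
    }
}
```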
At the end of the day we finally were able to get the timer interrupt working. It turns out there were two separate problems with our code:
- We were accessing the registers of the interrupt controller in the wrong location. This was because we misread the Broadcom docs and didn't notice that the first documented register of the interrupt controller was not actually located at the interrupt controller base address, but rather at that address plus 0x200. (It is still not clear to us what the intervening memory is used for.)
- We were using the wrong interrupt for the System Timer. The interrupt line for interrupts via the index 3 output compare register of the System Timer (one of the two output compare registers not being used by the GPU already) was actually "GPU" IRQ 3. This was not stated in the Broadcom docs; in fact, the timer interrupt was stated to be something else, which we wasted much time trying to use.
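Both fixes come down to computing the right register and bit for a given IRQ line. The following sketch shows the idea; it is not our actual enable_irq() code. The register offsets are assumptions taken from our reading of the Broadcom peripherals document and should be verified against it, and the names are hypothetical.

```c
#include <stdint.h>

/* Offsets from the interrupt controller base address. Note that they
 * start at 0x200, not 0. Assumed values; verify against the Broadcom
 * documentation before relying on them. */
#define IRQ_ENABLE_1 0x210u  /* enable bits for "GPU" IRQs 0-31  */
#define IRQ_ENABLE_2 0x214u  /* enable bits for "GPU" IRQs 32-63 */

struct irq_reg_bit {
    uint32_t offset;  /* register offset from the controller base */
    uint32_t mask;    /* bit to write in that register */
};

/* Map a "GPU" IRQ number (0-63) to its enable register and bit. */
struct irq_reg_bit irq_enable_location(unsigned irq)
{
    struct irq_reg_bit loc;

    loc.offset = (irq < 32) ? IRQ_ENABLE_1 : IRQ_ENABLE_2;
    loc.mask = 1u << (irq % 32);
    return loc;
}
```

For the System Timer interrupt described above (IRQ 3), this yields bit 3 of the first enable register.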
Week 2 (June 3 - June 7)
This weekend I had gone through most of our code and improved the comments and fixed various problems. I also re-wrote and documented our context switch function (ctxsw) and interrupt handler function (irq_handler). The old code did not work, mainly for two reasons. First, the program status register was not being set correctly for new threads, as Xinu expects new threads to be created with interrupts enabled. Since ctxsw() is always entered with interrupts disabled, this means interrupts must be explicitly enabled when creating a new thread. Second, ctxsw() may be called from interrupt handlers, and we do not want to be using the stack associated with the ARM-specific IRQ mode at this point. Therefore, our IRQ handler routine had to be modified to use the system-mode stack instead of the IRQ stack. This actually simplified the code, as now the processor is almost always running in system mode.
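The program status register value for a new thread can be sketched as follows. The mode field and the I and F bit positions are those of the ARM CPSR; leaving FIQs masked is an assumption made for this example, and the function names are hypothetical rather than taken from our code.

```c
#include <stdint.h>

#define PSR_MODE_SYS 0x1Fu       /* mode field value for system mode */
#define PSR_IRQ_MASK (1u << 7)   /* I bit: when set, IRQs are disabled */
#define PSR_FIQ_MASK (1u << 6)   /* F bit: when set, FIQs are disabled */

/* Status register for a newly created thread: system mode with IRQs
 * enabled (I bit clear). FIQs stay masked in this sketch (assumption). */
uint32_t initial_thread_psr(void)
{
    return PSR_MODE_SYS | PSR_FIQ_MASK;
}

/* Nonzero if the given status register value has IRQs enabled. */
int psr_irqs_enabled(uint32_t psr)
{
    return (psr & PSR_IRQ_MASK) == 0;
}
```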
Once context switching was working in combination with the timer interrupt, we were able to enable additional Xinu features, including semaphores and mailboxes. Our next task was to get the system shell working, which requires having a working interrupt handler for the UART. We already had code written for this (from the RIT students). However, it did not actually work, so we rewrote some parts of the code and added improved comments in order to understand what it was supposed to be doing. After this, we were able to enable the system shell and interact with Xinu, but only after disabling FIFOs in the UART, which we plan to fix later this week.
Next, we enabled the Xinu test suite and ran it. We observed that two scheduling-related tests and one semaphore-related test failed sporadically, while the test suite crashed entirely once it got to a stdio test. We initially focused on the two scheduling-related tests and were eventually able to determine that the bug was in the tests themselves, as they were subject to a race condition where the test would fail if a reschedule, triggered by a timer interrupt, occurred at a certain point in the code. The other test failures we will investigate later this week.
Farzeen added some memory barriers to our code, which the Broadcom documentation states are necessary when switching between peripherals. However, these had no noticeable effect on our existing code.
We also talked with Dr. Brylow regarding USB and networking support, which we will have to start implementing after we get the core operating system working. Dr. Brylow expressed frustration with the lack of good quality documentation about current and past hardware. However, we do have access to working code in Linux, which we may need to consult if we find that the hardware is not working as expected or documented. We expect our final driver to be much simpler than the Linux one, include only basic functionality, and be much better documented.
This morning I worked on running Linux on the Raspberry Pi. This may be useful in getting USB support working, since Linux does have working, although complicated and very poorly documented, code to support the USB ethernet adapter on the Raspberry Pi. Furthermore, as mentioned earlier, a stripped-down Linux kernel could be used to load the Xinu kernel over the network. I initially prepared an SD card with an image of Arch Linux ARM, which is more stripped down than Raspbian. I then booted it but was unable to see any messages over the serial console (connected to another computer via a USB-serial adapter, as we have been doing with Xinu). This was eventually partially solved by specifying the appropriate terminal on the kernel command line via the cmdline.txt file; however, some garbage characters were still printed to the terminal and it did not take any input. I then connected an ethernet cable between the Raspberry Pi and my laptop, running a DHCP server. As the Arch Linux ARM software enabled the network and SSH server by default, the Raspberry Pi requested an IP address from the DHCP server and could then be SSH'ed into. This confirmed that the ethernet adapter does indeed work with Linux and that it is possible to access the Raspberry Pi through the ethernet adapter without any attached keyboard, monitor, or serial cable. I then dumped the kernel log messages and sysfs so that we can refer to them later if there are any questions about the hardware, as these files provide information about the Linux kernel's view of the hardware and which drivers it is using for particular devices. Finally, starting with the configuration of the Arch Linux ARM kernel, I compiled a new Linux kernel with many unneeded features disabled, plus several other options that may be useful for us, such as verbose USB debug messages. However, I was unable to confirm, through either the serial port or ethernet port, that this kernel had successfully booted in place of the default kernel.
Tyler spent most of today attempting to better understand the UART code and figure out why the UART stopped working properly if hardware FIFOs were turned on; we still do not have an explanation for this.
Farzeen and I then focused on the remaining Xinu test cases that our kernel was failing. We were able to determine that the semaphore test that failed was also subject to a race condition, so it can fail even if the operating system code is correct. We implemented fixes for this test as well as the two mentioned yesterday as also being subject to race conditions. It was frustrating finding these problems with the test cases themselves--- in platform independent code--- because these problems should have been fixed long ago, and it makes additional work for us to have to do these fixes just now that we are trying to port the operating system. Similarly, while trying to determine why the stdio test was failing, I examined some of the code in Xinu's miniature C library, which I found to be very poorly written. Functions as simple as memchr(), which is only a few lines of code, were implemented incorrectly, and the function comments frequently contained errors and critical omissions. Furthermore, some function definitions did not match function declarations, which was possible because the original code sometimes declared externally visible functions without including the header file with the corresponding function prototype, despite this being standard practice in C. We expect we will have to review the entire C library implementation to fix all the bugs in it.
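For reference, a correct memchr() really is only a few lines. The sketch below follows the standard C semantics (the search character is converted to unsigned char, and at most n bytes are examined); it is named sketch_memchr to avoid clashing with the C library and is not the code from the Xinu tree.

```c
#include <stddef.h>

/* Return a pointer to the first occurrence of the byte (unsigned char)c
 * in the first n bytes of s, or NULL if it does not occur. */
void *sketch_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char target = (unsigned char)c;

    while (n-- > 0) {
        if (*p == target)
            return (void *)p;
        p++;
    }
    return NULL;
}
```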
I spent most of today continuing to review the C library implementation. All the simple functions are done, pending some amount of additional review, and have been re-commented. One of the more complicated functions that I rewrote was qsort(). The original implementation of qsort(), which sorts an array of data using Quicksort, contained about 130 lines of fairly incomprehensible code, complete with meaningless variable names, 'goto' statements, and no comments. As Xinu is supposed to be meant for education, I re-wrote this function to be as simple as possible, halving the lines of code excluding the detailed comments I added. I also re-wrote the Makefile that builds the C library, as well as the Makefile for libdsp, because these Makefiles contained highly duplicated code and also a bug that caused the Xinu kernel to not be rebuilt when the sources were modified. Finally, in the afternoon I worked on examining the stdio scanning code and have rewritten some of the code in order to fix at least two bugs, including the use of global variables which made the code interrupt-unsafe and accessing buffers beyond their bounds.
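To show how small a readable Quicksort can be, here is a minimal sketch with the same interface as the standard qsort(). It is not the code I committed; it uses the last element as the pivot for brevity, which gives quadratic time and deep recursion on already-sorted input. The names sketch_qsort and compare_ints are hypothetical.

```c
#include <stddef.h>

/* Swap two non-overlapping (or identical) elements byte by byte. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size)
{
    while (size-- > 0) {
        unsigned char tmp = *a;
        *a++ = *b;
        *b++ = tmp;
    }
}

/* Sort nmemb elements of 'size' bytes each, ordered by compar(). */
void sketch_qsort(void *base, size_t nmemb, size_t size,
                  int (*compar)(const void *, const void *))
{
    unsigned char *arr = base;
    unsigned char *pivot;
    size_t store = 0;  /* number of elements found to be below the pivot */
    size_t i;

    if (nmemb < 2)
        return;

    /* Partition: move every element less than the pivot to the front. */
    pivot = arr + (nmemb - 1) * size;
    for (i = 0; i + 1 < nmemb; i++) {
        if (compar(arr + i * size, pivot) < 0) {
            swap_bytes(arr + i * size, arr + store * size, size);
            store++;
        }
    }
    /* Put the pivot into its final position, then sort both sides. */
    swap_bytes(arr + store * size, pivot, size);
    sketch_qsort(arr, store, size, compar);
    sketch_qsort(arr + (store + 1) * size, nmemb - store - 1, size, compar);
}

/* Example comparison callback for arrays of int. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}
```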
Tyler focused mainly on the UART today and was eventually able to conclude that we need to leave FIFOs disabled, because otherwise the Xinu kernel has no way of being notified that a single character has arrived from the UART. (It's possible that other operating systems would toggle FIFOs on and off depending on the terminal mode.)
Farzeen made a list of items, including Raspberry Pi Model B's and their accessories, that we will need to order for later this summer if Dr. Brylow is to be running a workshop with these, and possibly use them for classes. She also worked on finding additional information about USB.
Today I continued reviewing/rewriting the C library and moved into the testing phase (on both the MIPS emulator and the Raspberry Pi). All tests initially passed except the limits test, stdio test, and a semaphore test that failed sporadically. The limits test failure was caused by CHAR_MIN and CHAR_MAX being defined as if 'char' was signed. These constants were defined as such because MIPS and many other platforms do consider 'char' to be signed. However, gcc by default makes 'char' unsigned on ARM. Therefore, the definitions of CHAR_MIN and CHAR_MAX had to be made conditional on whether 'char' is signed or not. On a related note, I also noticed that some of the test code relied on signed overflow being fully defined as two's complement arithmetic, but the corresponding '-fwrapv' flag was missing from the compiler command line, so I added it.
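One way to make such definitions conditional is to test the __CHAR_UNSIGNED__ macro, which GCC predefines when plain 'char' is unsigned. The sketch below uses hypothetical SKETCH_ names so that it can sit alongside the system limits.h; it is not necessarily how the Xinu header is written.

```c
#include <limits.h>

/* GCC predefines __CHAR_UNSIGNED__ when plain 'char' is unsigned,
 * as is the default on ARM. */
#ifdef __CHAR_UNSIGNED__
#define SKETCH_CHAR_MIN 0
#define SKETCH_CHAR_MAX UCHAR_MAX
#else
#define SKETCH_CHAR_MIN SCHAR_MIN
#define SKETCH_CHAR_MAX SCHAR_MAX
#endif

/* Run-time check of the same property, usable in a test case. */
int char_is_signed(void)
{
    return (char)-1 < 0;
}
```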
The stdio test code only very briefly tested formatted input and output, so I added additional test cases that should test the majority of the possible code paths in fscanf() and fprintf(). Furthermore, I fixed bugs in many of Xinu's device drivers where they did not correctly handle errors in input/output.
As usual, the semaphore test failure was caused by a race condition in the existing code and was not related to anything we are actually trying to do at this point.
At the lunch meeting today we briefly talked with Dr. Brylow. He seemed interested in getting graphics working with Xinu on the Raspberry Pi, so we decided that Farzeen would focus on graphics support while Tyler and I would work on USB support.
In the afternoon the Raspberry Pi cases and serial cables we had ordered arrived, so after borrowing the Pi that Dr. Brylow had, we now have more than one Pi to work with which will make testing on the hardware slightly easier, especially in the weeks to come when we might be working on different features.
I also implemented a shell command "kexec" that causes Xinu to copy a new kernel into memory from over the UART, then copy it to address 0x8000 and jump to it. This may eventually allow us to test different kernels without having to keep unplugging and replugging the Raspberry Pis, and it may also be useful if we implement a real network-based bootloader. However, I could not get the code to work yet due to kgetc() seemingly not working.
In the morning I attempted to figure out why the "kexec" command I programmed was not working. I eventually was able to determine that I was passing the wrong device structure to kgetc(), which is supposed to synchronously read a byte from the UART. After fixing this, the "kexec" code was able to start a new kernel. It works by sending a triple break sequence over the UART to trigger the "raspbootcom" program to send a new kernel, then receiving the kernel size and acknowledging it, then receiving the kernel into a dynamically allocated memory region, then jumping to a 7-instruction code stub at address 0x7ff9 (before the kernel, so as to not be overwritten by copying) that copies the kernel to the final destination at 0x8000.
However, neither new kernels booted by the "kexec" mechanism described above nor the same kernel, rebooted by jumping to address 0 or 0x8000 by running "reset" in the shell, actually ran as expected. They seemed to fail as soon as interrupts were enabled, even after I added startup code that disabled all the interrupt lines.
Tyler mostly spent today reading the USB specification to gain a better understanding of USB.
Farzeen began working on the framebuffer driver and re-implemented the initialization code in C (rather than ARM assembly) so that it would be easier to understand. The code currently sets up a simple 1024x768 framebuffer with a bit depth of 20. It communicates with the graphics hardware through the so-called "mailbox" mechanism, which is a message-passing mechanism that involves writing to a memory-mapped address at which there are channels for sending data to the GPU and receiving data from the GPU. Unfortunately, neither the mailbox mechanism nor the format of the messages themselves is documented by Broadcom, so we need to rely on informal resources, such as the page at http://elinux.org/RPi_Framebuffer.
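As we understand it from informal resources such as the page above, each 32-bit word passed through the mailbox carries a channel number in its low 4 bits and data (such as a 16-byte-aligned buffer address) in the upper 28 bits. That layout is an assumption, not something documented by Broadcom, and the function names below are hypothetical. A minimal sketch of packing and unpacking such a word:

```c
#include <stdint.h>

#define MAILBOX_CHANNEL_MASK 0xFu  /* low 4 bits select the channel (assumed) */

/* Combine 16-byte-aligned data with a channel number into one word.
 * Any low bits set in 'data' are discarded. */
uint32_t mailbox_pack(uint32_t data, uint32_t channel)
{
    return (data & ~MAILBOX_CHANNEL_MASK) | (channel & MAILBOX_CHANNEL_MASK);
}

/* Extract the channel number from a mailbox word. */
uint32_t mailbox_channel(uint32_t word)
{
    return word & MAILBOX_CHANNEL_MASK;
}

/* Extract the data portion from a mailbox word. */
uint32_t mailbox_data(uint32_t word)
{
    return word & ~MAILBOX_CHANNEL_MASK;
}
```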
In the afternoon I started work on the USB driver. The first thing the USB driver needs to do is power on and reset the host controller, which involves accessing the registers of the undocumented "Synopsys DesignWare Hi-Speed USB On-the-Go Controller". Fortunately, we do have several sources we can use for "documentation". The main resource we can fall back on is the Linux source code for the driver. However, the Linux source code has not yet been accepted into the mainline kernel and is only present in a separate Raspberry Pi repository, which typically indicates that the code is of poor quality and/or has been written primarily for other operating systems and does not use existing Linux interfaces when available. The code size is also extremely large, totaling 41796 lines not even counting OS-specific code in a separate directory. An easier resource to use will be the CSUD driver, which was written by another programmer based on the Linux source code but only supports HID devices, not the ethernet adapter. Our goal is not to copy this existing code, but rather extract from it the minimal bits and pieces that are needed to actually get the undocumented hardware to do its thing.
Week 3 (June 10 - June 14)
Leaving the problems with software resets pending, today Tyler and I continued to work on USB support. As expected, this is becoming an extremely difficult task due to both the high complexity of USB and the lack of real documentation for the Synopsys USB host controller. To implement our code we are referring primarily to the following resources:
- The CSUD driver, mentioned previously
- The core Linux USB software (located in drivers/usb/core)
- The Linux dwc_otg ("Designware Core On-the-Go") host controller driver (located in patched kernels only in drivers/usb/host/dwc_otg)
- The USB 2.0 standard (downloadable from usb.org), which is a lengthy 650 pages yet only covers a fraction of what we need to know to actually write a driver from scratch that controls a specific USB device.
Today we mainly focused on understanding exactly what the USB driver is supposed to do after resetting the hardware. It obviously is supposed to enumerate the devices on the bus, reading the device descriptors and assigning them addresses and configurations. However, it has been very difficult to understand exactly how this needs to be done. For example, simply reading a device descriptor, which provides basic information about what kind of USB device has been attached to the bus (such as whether the device is a keyboard, a mouse, a flash drive, an ethernet adapter, or a programmatically controlled water balloon launcher), requires sending a control message. This itself is a process with multiple phases (SETUP, DATA, STATUS), each of which requires interacting with the host controller's undocumented registers in complex ways involving various timeouts and retry mechanisms to handle faults, including the device being disconnected at any arbitrary time. It also requires understanding many USB concepts and data structures, including pipes, endpoints, packet IDs, transfer types, requests, request types, low vs. full vs. high speeds, maximum packet sizes, and direct memory access. Obviously we hope to have our code be highly simplified, but USB by design is not simple.
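As a concrete example, the SETUP phase of the control message that reads a device descriptor carries an 8-byte request whose fields are defined in chapter 9 of the USB 2.0 specification. The sketch below builds that request; the struct and function names are hypothetical, and the struct matches the wire format only on a little-endian host.

```c
#include <stdint.h>

/* The 8-byte SETUP data; field names follow the USB 2.0 specification. */
struct usb_setup_packet {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

#define USB_DIR_IN             0x80u  /* device-to-host, standard, device recipient */
#define USB_REQ_GET_DESCRIPTOR 6u
#define USB_DESC_TYPE_DEVICE   1u
#define USB_DEVICE_DESC_SIZE   18u    /* length of a device descriptor in bytes */

/* Build the standard GET_DESCRIPTOR request for the device descriptor. */
struct usb_setup_packet make_get_device_descriptor(void)
{
    struct usb_setup_packet p;

    p.bmRequestType = USB_DIR_IN;
    p.bRequest = USB_REQ_GET_DESCRIPTOR;
    /* Descriptor type in the high byte, descriptor index 0 in the low byte. */
    p.wValue = (uint16_t)(USB_DESC_TYPE_DEVICE << 8);
    p.wIndex = 0;
    p.wLength = USB_DEVICE_DESC_SIZE;
    return p;
}
```

Building the request is the easy part; actually transmitting it, receiving the 18 bytes in the DATA phase, and completing the STATUS phase is where the host controller's registers come in.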
We also were caught up by the root hub, specifically how it is apparently not actually a real USB hub but instead is supposed to be, to some extent, faked by the USB software. This does not seem to be documented in the USB specification itself, as this apparently is outside the scope of the specification (being host-controller specific) despite the fact that, apparently, the very first thing the USB software needs to do after resetting the hardware is to enumerate the root hub.
Towards the end of the day I continued the work with Linux I had started on June 4th and was successfully able to get a custom kernel to boot on the Raspberry Pi. I compiled it with verbose USB debugging messages and added the kernel log from a successful boot to our repository, since it might help if we have questions about the hardware.
Meanwhile, Farzeen continued to work on the graphical code, removing some of the older assembly files and fixing and adding additional functionality with C code. It is now possible to programmatically change the color of any pixel on the screen.
Today Tyler and I continued to work on USB support. In the morning we focused on reading the USB 2.0 specification and the Linux source code, mainly regarding bus enumeration, control messages, and the expected separation between the USB driver core and the host controller driver. In the afternoon we returned to our code and implemented code that fakes some types of control messages to the root hub. This code is part of the host controller driver, which we have currently placed in a separate file (usb-hcd.c) from the USB driver core (usb.c). Therefore, by design, the USB driver core need not care that the root hub is not an actual hub. We also documented various structures and functions in our existing code and added additional descriptors as specified in section 9 of the USB 2.0 specification.
We did not do much testing on the Raspberry Pi today, as we are mainly working to get a USB system software framework in place. In fact, since the root hub is faked, there apparently is no need to actually send or receive messages on the USB bus until we start enumerating the devices that are attached to the root hub. When we get to this, this will require understanding how hubs are supposed to work and how status changes on hub ports are supposed to be detected and handled by the software, as well as additional work on our code to actually interact with the DWC hardware to send and receive messages over the USB bus.
Farzeen continued to work on the graphical code. Based on the code to change the color of individual pixels, it is now possible to draw lines and gradients.
Tyler and I continued working on USB support. We began implementing the USB hub code in a separate file, usb-hub.c. The hub code is responsible for performing hub-specific tasks when a hub, including the root hub, has been attached to the USB bus. It first must read the hub-specific descriptor (using a USB control message), which contains the number of ports that the hub has, among other information. It then must power on the USB ports, which must be done using a separate USB control transfer for each port. Then, somehow the code must wait until an event has occurred on a port, such as a device being attached. Our understanding is that this applies even if a device was already plugged into the port when the power was turned on, since USB is by design a dynamic bus and all events are supposed to be handled dynamically. We currently implement this by starting a thread ("hub_thread") that checks for changes on all ports on all hubs on the USB bus every 1 second. This involves sending a control transfer to each port to request its status. For each port, the hardware provides bitmasks containing the current status and any status bits that have changed since last being explicitly cleared by a separate control message. The hub_thread then must take the appropriate action, such as allocating and logically attaching a new USB device, or recursively freeing a tree of USB devices that were disconnected.
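As an illustration of the two port bitmasks, the sketch below decodes the 4-byte reply to a hub-class GET_STATUS request for a port and decides what the hub thread should do. The bit positions are those given in section 11.24.2.7 of the USB 2.0 specification; the type and function names (port_action_for(), port_speed()) are hypothetical and are not the names used in usb-hub.c.

```c
#include <stdint.h>

/* Reply to a hub-class GET_STATUS request addressed to a port. */
struct usb_port_status {
    uint16_t wPortStatus; /* current state of the port */
    uint16_t wPortChange; /* bits that changed since last cleared */
};

#define USB_PORT_STATUS_CONNECTED   (1 << 0)
#define USB_PORT_STATUS_LOW_SPEED   (1 << 9)
#define USB_PORT_STATUS_HIGH_SPEED  (1 << 10)
#define USB_C_PORT_CONNECTION       (1 << 0) /* in wPortChange */

enum port_action { PORT_NO_ACTION, PORT_ATTACH_DEVICE, PORT_DETACH_DEVICE };
enum usb_speed   { USB_SPEED_LOW, USB_SPEED_FULL, USB_SPEED_HIGH };

/* Decide what to do about a port based on its status and change bits. */
enum port_action port_action_for(const struct usb_port_status *st)
{
    if (!(st->wPortChange & USB_C_PORT_CONNECTION))
        return PORT_NO_ACTION;          /* nothing attached or detached */
    if (st->wPortStatus & USB_PORT_STATUS_CONNECTED)
        return PORT_ATTACH_DEVICE;      /* a device has appeared */
    return PORT_DETACH_DEVICE;          /* a device has gone away */
}

/* Speed of the attached device; full speed is the default when neither
 * the low-speed nor the high-speed bit is set. */
enum usb_speed port_speed(const struct usb_port_status *st)
{
    if (st->wPortStatus & USB_PORT_STATUS_HIGH_SPEED)
        return USB_SPEED_HIGH;
    if (st->wPortStatus & USB_PORT_STATUS_LOW_SPEED)
        return USB_SPEED_LOW;
    return USB_SPEED_FULL;
}
```

After acting on a change, the hub driver still has to clear the change bit with a separate ClearPortFeature(C_PORT_CONNECTION) control message; otherwise the same change would be reported again on the next poll.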
We also had to spend some time writing code in the host controller driver that emulates a root hub that can be controlled by the hub driver. This is the standard implementation technique, based on the USB 2.0 specification, the Linux code, and the CSUD code. Specifically, the host controller driver is supposed to present the root hub as a standard hub, even if actually interacting with it at the hardware level requires using host-controller specific registers rather than generic control transfers, which is the case with the DWC.
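A minimal sketch of this kind of emulation is shown below: the host controller driver intercepts a control message addressed to the root hub and answers it from a canned hub descriptor instead of touching the bus. The descriptor layout and the request encoding (bmRequestType 0xA0, descriptor type 0x29) come from section 11 of the USB 2.0 specification. The function name root_hub_control_msg(), the descriptor field values other than the single port, and the return convention are assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

struct usb_setup_packet {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

#define USB_REQUEST_GET_DESCRIPTOR    6
#define USB_HUB_REQUEST_TYPE_IN       0xA0 /* device-to-host, class, device */
#define USB_DESCRIPTOR_TYPE_HUB       0x29

/* Canned hub descriptor for a root hub with one port. */
static const uint8_t root_hub_descriptor[9] = {
    9,                       /* bDescLength */
    USB_DESCRIPTOR_TYPE_HUB, /* bDescriptorType */
    1,                       /* bNbrPorts */
    0x00, 0x00,              /* wHubCharacteristics (assumed value) */
    0,                       /* bPwrOn2PwrGood (assumed value) */
    0,                       /* bHubContrCurrent (assumed value) */
    0x00,                    /* DeviceRemovable */
    0xff,                    /* PortPwrCtrlMask */
};

/* Hypothetical entry point: handle a control message addressed to the
 * fake root hub.  Returns the number of bytes written to buf, or -1 if
 * the request is not emulated by this sketch. */
int root_hub_control_msg(const struct usb_setup_packet *setup, void *buf)
{
    if (setup->bmRequestType == USB_HUB_REQUEST_TYPE_IN &&
        setup->bRequest == USB_REQUEST_GET_DESCRIPTOR &&
        (setup->wValue >> 8) == USB_DESCRIPTOR_TYPE_HUB)
    {
        uint16_t n = setup->wLength;
        if (n > sizeof(root_hub_descriptor))
            n = sizeof(root_hub_descriptor);
        memcpy(buf, root_hub_descriptor, n);
        return n;
    }
    return -1;
}
```

A real implementation also has to emulate port requests such as SetPortFeature and GetPortStatus, which is where the host-controller-specific registers come in.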
Our code at this point is able to detect that a device (likely another hub, but we don't know for sure yet) is connected to the root hub's single port, but it is unable to address and configure it. This is expected at this point because doing this requires actually sending messages over the USB bus, which will require fixing our code that is supposed to do this.
Farzeen continued to work on the graphical code and implemented drawing strings of text via the framebuffer.
Today I continued to work on USB support. Tyler started working on sound support instead, since we were having some trouble staying coordinated with the USB code. The main problem is that since we have been learning as we go, it has been difficult to split up the work. If we were highly knowledgeable about USB beforehand, then we would not have had this problem because then we could have planned out the entire driver before writing any code. However, this morning I refactored some of our code and headers and I'm hoping the overall layout of the code is nearing its final state. The currently implemented USB code is split up into 3 components:
- The USB Driver, which implements the USB device model and exports functions to initialize the USB bus, to allocate, deallocate, or attach a USB device, and to send control messages to USB devices. This code is platform-independent, so this morning I moved it from system/platforms/arm-rpi, a Raspberry Pi-specific directory, to devices/usb.
- The USB Hub Driver, which is a USB device driver for hubs. Although this is a device driver, it is somewhat of a special case because hubs are a fundamental part of USB and this driver is required to support any USB devices at all. This code is platform-independent, so I also moved it to devices/usb.
- The USB Host Controller Driver, which abstracts the actual USB host controller hardware. It currently exports only 3 functions: hcd_power_on(), which powers on the host controller, hcd_reset(), which resets and initializes the host controller, and hcd_control_msg(), which submits a control message to a USB device. This is platform-dependent code that deals specifically with the Synopsys Designware On-The-Go USB Controller, so it is currently located in the system/platforms/arm-rpi directory.
In the afternoon I focused mainly on the host controller driver and worked on responding to reset requests, correctly using hardware channels to process control messages, and simplifying dealing with registers by using bitfields. At the end of the day we finally had the first signs of our code actually sending a message over the USB bus: we were able to successfully read the device descriptor of the device attached to the root hub. Based on the vendor ID and product ID this is the ethernet adapter, but it also shows up as a hub. This is consistent with the way Linux reported the device, although we don't yet understand why the ethernet adapter is a hub (and not a "function", in USB terminology) and also whether the actual physical USB ports you can plug stuff into are connected to this hub or not. This might become clear once we are successfully able to enumerate the devices attached to this hub, which we have the code to do, but it is not working yet, probably due to problems with the control transfers.
Farzeen continued to work on graphics support, including text and drawing in different colors.
Today I continued to work on USB support, including the following:
- Added definitions of USB class codes other than hubs
- Added code to parse the USB configuration descriptors and extract pointers to the interfaces and endpoints of the first available configuration.
- Made 'usbinfo' shell command print out detailed information about every connected USB device, as well as a diagram of the USB bus.
- Fixed and refactored some of the code to read descriptors.
- Fixed and refactored some of the hub code.
- Added support for reading USB string descriptors.
- Fixed declaration of root hub descriptor and string descriptors.
- Added workaround to avoid clearing write-clear bits in the DWC host_port_ctrlstatus register when clearing a specific host port feature.
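The write-clear workaround in the last item can be sketched as follows. Some bits in the host port register are cleared by writing 1 to them, so a plain read-modify-write would accidentally acknowledge (or, for the enable bit, disable) whatever happened to read back as 1. The bit positions below are an assumption based on my reading of the Linux driver for this controller and should be checked against it; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of a few bits in the DWC host port control/status
 * register (to be verified against the Linux driver). */
#define HPRT_CONNECT_STATUS       (1u << 0)  /* read-only */
#define HPRT_CONNECT_DETECTED     (1u << 1)  /* write 1 to clear */
#define HPRT_ENABLED              (1u << 2)  /* write 1 to DISABLE port */
#define HPRT_ENABLE_CHANGED       (1u << 3)  /* write 1 to clear */
#define HPRT_OVERCURRENT_CHANGED  (1u << 5)  /* write 1 to clear */
#define HPRT_RESET                (1u << 8)
#define HPRT_POWER                (1u << 12)

#define HPRT_WRITE_CLEAR_MASK \
    (HPRT_CONNECT_DETECTED | HPRT_ENABLED | \
     HPRT_ENABLE_CHANGED | HPRT_OVERCURRENT_CHANGED)

/* Compute the value to write back to the register.  All write-clear bits
 * are first forced to 0 so they are left alone, then the requested bits
 * are cleared and set.  To acknowledge a write-clear bit on purpose,
 * pass it in bits_to_set. */
uint32_t hprt_value_to_write(uint32_t current,
                             uint32_t bits_to_set,
                             uint32_t bits_to_clear)
{
    uint32_t value = current & ~HPRT_WRITE_CLEAR_MASK;

    value &= ~bits_to_clear;
    value |= bits_to_set;
    return value;
}
```

For example, clearing the port power feature would pass HPRT_POWER as bits_to_clear, and the port would stay enabled because the enable bit is written as 0.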
At this point, the code is successfully able to control the hub that's connected to the root hub. It has 3 ports, two of which correspond to the ports you can physically plug stuff into. I verified this by plugging in a keyboard. (Our code was able to detect that a low-speed USB device was connected to the relevant port, but could not do anything with it yet, since we have not tested low-speed devices yet, nor do we have a HID driver to make the keyboard do anything.) A vendor-specific class device is connected to the third port, and it has IN and OUT bulk transfer endpoints and an IN interrupt endpoint. This might be the device we need to communicate with for the ethernet adapter.
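The endpoint information above comes from walking the configuration descriptor, which is really a sequence of variable-length descriptors that each start with a length byte and a type byte. The following is a simplified sketch of that walk, assuming the whole configuration has already been read into a buffer; it only counts interfaces and endpoints rather than saving pointers to them as our code does. The descriptor type codes and endpoint field offsets are from section 9.6 of the USB 2.0 specification; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define USB_DESCRIPTOR_TYPE_INTERFACE  4
#define USB_DESCRIPTOR_TYPE_ENDPOINT   5
#define USB_ENDPOINT_DIR_IN            0x80 /* bit 7 of bEndpointAddress */
#define USB_TRANSFER_TYPE_MASK         0x03 /* bits 1..0 of bmAttributes */
#define USB_TRANSFER_TYPE_BULK         2
#define USB_TRANSFER_TYPE_INTERRUPT    3

struct endpoint_counts {
    int interfaces;
    int bulk_in;
    int bulk_out;
    int interrupt_in;
};

/* Walk a full configuration descriptor and count what it contains.
 * Returns 0 on success, or -1 if the buffer is malformed or truncated. */
int count_endpoints(const uint8_t *buf, size_t total_len,
                    struct endpoint_counts *out)
{
    size_t pos = 0;

    out->interfaces = out->bulk_in = out->bulk_out = out->interrupt_in = 0;
    while (pos + 2 <= total_len) {
        uint8_t len  = buf[pos];     /* bLength */
        uint8_t type = buf[pos + 1]; /* bDescriptorType */

        if (len < 2 || pos + len > total_len)
            return -1;
        if (type == USB_DESCRIPTOR_TYPE_INTERFACE) {
            out->interfaces++;
        } else if (type == USB_DESCRIPTOR_TYPE_ENDPOINT && len >= 7) {
            int is_in = buf[pos + 2] & USB_ENDPOINT_DIR_IN;
            int xfer  = buf[pos + 3] & USB_TRANSFER_TYPE_MASK;

            if (xfer == USB_TRANSFER_TYPE_BULK && is_in)
                out->bulk_in++;
            else if (xfer == USB_TRANSFER_TYPE_BULK)
                out->bulk_out++;
            else if (xfer == USB_TRANSFER_TYPE_INTERRUPT && is_in)
                out->interrupt_in++;
        }
        pos += len; /* descriptors of unknown type are simply skipped */
    }
    return (pos == total_len) ? 0 : -1;
}
```

Skipping by bLength is what lets the parser tolerate class-specific or vendor-specific descriptors it does not understand.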
Our next tasks should be to review the code we've written so far and make sure it is robust in various situations, implement support for communicating with low-speed USB devices, optionally write a basic keyboard or mouse driver, and begin planning support for the ethernet driver.
In a separate branch, Tyler added declarations for the PCM audio hardware. It is not a USB device and simply requires writing to memory-mapped registers, so it will theoretically be much easier than USB to get working, although it may not be well documented.
Farzeen continued to work on the graphics code and added functions to draw various shapes.
Week 4 (June 17 - June 21)
As usual, today I continued to work on USB support. My initial task, which I had started on the weekend, was to get split transactions working. A split transaction is a special transaction on the USB bus that splits a low or full-speed transaction into two separate high-speed transactions. For compatibility reasons, such transactions must be used to communicate with low-speed USB devices, such as mice and keyboards; otherwise such devices will simply not work. Although some of the complexity of split transactions is handled by the hardware, the host controller driver must handle submitting the separate "begin split" and "complete split" transactions. Eventually I was able to get the code working, at which point I could connect a USB keyboard and mouse to the Raspberry Pi and have them show up on the USB bus. In the process, I also added various wrappers around some of the host controller code that retries transactions a certain number of times if they fail. This is recommended by the USB specification in various places, despite the fact that one might expect such errors to be handled by the hardware directly, since apparently USB is optimized for cheap hardware more than simple software.
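The retry wrappers mentioned above amount to something like the following sketch. The names, the status codes, and the callback-based shape are hypothetical and simplified; flaky_transaction() is only a stand-in for a real hardware transaction so that the wrapper can be exercised without any hardware.

```c
enum xfer_status { XFER_OK = 0, XFER_ERROR = -1 };

/* One attempt at a transaction; ctx is whatever state the attempt needs. */
typedef enum xfer_status (*transaction_fn)(void *ctx);

/* Run a transaction up to max_attempts times, stopping at the first
 * success.  Returns the status of the last attempt. */
enum xfer_status retry_transaction(transaction_fn attempt, void *ctx,
                                   unsigned max_attempts)
{
    enum xfer_status status = XFER_ERROR;
    unsigned i;

    for (i = 0; i < max_attempts; i++) {
        status = attempt(ctx);
        if (status == XFER_OK)
            break;
    }
    return status;
}

/* Stand-in for a hardware transaction: fails until the counter pointed
 * to by ctx reaches zero, then succeeds. */
enum xfer_status flaky_transaction(void *ctx)
{
    unsigned *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return XFER_ERROR;
    }
    return XFER_OK;
}
```

A real wrapper would also need to distinguish errors that are worth retrying from errors that are not, such as the device having been disconnected.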
I also started looking at the HID (Human Interface Device) specification, which describes HID-compliant devices such as USB keyboards and mice. However, it was also very complicated, apparently due to its need to be compatible with so many devices (mice, trackballs, joysticks, knobs, switches, buttons, sliders, VCR remote controls, data gloves, throttles, steering wheels, rubber pedals, bar-code readers, thermometers, and voltmeters were given as examples). Furthermore, HID apparently makes use of interrupt transfers in addition to control transfers. This makes sense, but I have not implemented interrupt transfers yet, which likely should be my next task.
USB is a polled bus, meaning that there is no interrupt line on the bus itself. I was initially confused about what this meant, because it makes it sound like interrupts from USB devices are not supported at all and system software must repeatedly poll USB devices for new input. Furthermore, the USB specification makes no attempt to clarify this matter as it is apparently host-controller specific. However, based on the Linux source code, in practice the polling actually happens entirely in hardware, between the host controller and the device. Then, the host controller can interrupt the CPU when it needs to. In Linux, software submits an "urb" (USB request block) specifying an interrupt transfer to the Host Controller Driver. The "urb" is then completed asynchronously with the help of the hardware's polling mechanism(s). When it has been completed, a completion function is called in IRQ context. The completion function can then take the appropriate action to handle the interrupt; for example, it might provide new mouse data to the kernel's input subsystem. It then re-submits the "urb" in order for future interrupts to be generated.
I then attempted to figure out what interrupt line the DWC Host Controller uses on the Raspberry Pi. As expected from Broadcom-quality documentation, nowhere in the BCM2835 ARM peripherals document was this actually mentioned. I eventually was able to find the value in the Linux source code but have not yet had time to test it.
The ethernet driver will likely require interrupt transfers as well, but I plan to test interrupt transfers on hubs and HID devices first.
Last night I had looked a little more into exactly what the driver for the SMSC LAN9500 USB 2.0 Ethernet Controller will require. A couple weeks ago, we had found a 66-page datasheet for this device, and I had thought that this would be helpful in writing the driver once we got the USB infrastructure in place. However, upon further examination of this datasheet yesterday, it was clear that it was written for computer/electrical engineers wanting to include the Controller in a device or system, rather than software developers. Almost the entire document was devoted to electrical characteristics, circuit diagrams, and pinouts. Luckily, there was a brief overview of the USB endpoints, which gives some of the basic information that we need. The device has a bulk-in endpoint for ethernet reception and a bulk-out endpoint for ethernet transmission. Section 2.1.3 says that "Bulk-out packets from the USB controller are directly stored into the TX buffer. Ethernet frames are directly stored into the RX buffer and become the basis for bulk-in packets". This makes it sound like the ethernet packets can essentially be read and written directly to/from the bulk endpoints, although the wording isn't 100% clear. There also seemed to be no documentation whatsoever about the vendor-specific commands, which may be needed to actually set up the device to send and receive packets. For this we will have to look at the Linux source code, since given the lack of documentation the only other alternative would be to reverse-engineer the hardware itself or reverse-engineer another device driver, which would be pointless when SMSC already has an open-source driver available for this device.
Today I mostly worked on implementing asynchronous, interrupt-driven USB transfers. For the initial implementation I am supporting only high-speed interrupt and bulk transfers, which should be sufficient for the ethernet controller driver and hub driver. The main challenge here is avoiding some of the complexity of the Linux code, where all USB transfers, including control transfers, are asynchronous and interrupt driven, and low-speed and isochronous transfers are fully supported. Partly because of this, just the interrupt handler in the Linux driver for the DWC-OTG USB host controller is over 2200 lines of code. In particular, it needs to handle interrupts that mean different things for different transfers, and it must keep track of the current transaction in transfers that consist of multiple transactions or series of transactions.
The basic design of the code I wrote today was that USB device drivers can call a new usb_submit_xfer_request() function in the USB core driver to asynchronously submit a bulk or interrupt transfer to a certain endpoint on a USB device. This function then immediately queues the request and returns. Meanwhile, a thread in the host controller driver waits on this queue whenever a hardware channel is free, grabs the next available transfer request, and starts it on the hardware channel. When the request is completed, a callback function provided in the transfer request structure will be called. The main challenge at this point is fine-tuning the actual communication with the hardware and setting up the interrupts correctly such that the hardware actually performs the transfer and interrupts the CPU when it has been completed, as expected.
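The design above can be sketched as a simple FIFO of transfer requests, each carrying its own completion callback. Only the name usb_submit_xfer_request() comes from the description above; its signature here, the structure fields, and the other function names are assumptions made for illustration. The sketch also leaves out all locking: in the kernel, the queue would have to be protected by disabling interrupts or by a semaphore, and the host controller thread would block while the queue is empty.

```c
#include <stddef.h>

struct usb_xfer_request;
typedef void (*usb_xfer_completed_t)(struct usb_xfer_request *req);

struct usb_xfer_request {
    void *buf;                          /* data buffer */
    size_t size;                        /* size of buffer in bytes */
    int status;                         /* filled in on completion */
    usb_xfer_completed_t completion_cb; /* called when the transfer is done */
    void *private_data;                 /* for the device driver's use */
    struct usb_xfer_request *next;      /* queue linkage */
};

struct xfer_queue {
    struct usb_xfer_request *head;
    struct usb_xfer_request *tail;
};

/* Device driver side: queue the request and return immediately. */
void usb_submit_xfer_request(struct xfer_queue *q,
                             struct usb_xfer_request *req)
{
    req->next = NULL;
    if (q->tail != NULL)
        q->tail->next = req;
    else
        q->head = req;
    q->tail = req;
}

/* Host controller side: take the oldest pending request, or NULL. */
struct usb_xfer_request *hcd_next_request(struct xfer_queue *q)
{
    struct usb_xfer_request *req = q->head;

    if (req != NULL) {
        q->head = req->next;
        if (q->head == NULL)
            q->tail = NULL;
        req->next = NULL;
    }
    return req;
}

/* Host controller side: record the result and notify the submitter. */
void hcd_complete_request(struct usb_xfer_request *req, int status)
{
    req->status = status;
    if (req->completion_cb != NULL)
        req->completion_cb(req);
}

/* Example callback: counts completions in an int owned by the driver. */
void count_completion(struct usb_xfer_request *req)
{
    (*(int *)req->private_data)++;
}
```

A driver that wants a continuous stream of data, such as an interrupt endpoint, can simply resubmit the request from inside its completion callback.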
This new code unfortunately is making our USB core driver and Host Controller driver even longer, but they are still an order of magnitude shorter than the Linux code (despite being heavily commented), and we hope to keep them that way, including by auditing our code for parts that could be simplified or deleted.
Tyler continued to work on sound support today, which apparently involved using an oscilloscope to verify that the hardware was actually generating the expected electrical signal.
In the morning I initially looked at some of the ethernet drivers currently implemented in Xinu, then added a stub driver for the SMSC LAN9512 USB ethernet adapter. After doing this and making a couple more changes, I was able to enable the networking components of Xinu, except for http, which had a hard-coded dependency on nvram being available. But obviously networking does not actually work yet, as the ethernet driver routines need to be filled in with actual implementations.
I also observed that the Xinu device model is somewhat inflexible compared with that in other OS's such as Linux. Xinu expects all devices to be declared statically in the platform-specific "xinu.conf" file (as well as in "platformVars", due to Xinu's lack of an integrated configuration system). Devices cannot be added or removed from the system at runtime. This is fundamentally incompatible with USB, which is a fully dynamic bus. Enumeration of the bus is designed to be an ongoing, interrupt-driven process that does not ever "complete". Even if a USB device is non-removable, there are no guarantees as to how long it will take for the hub to which it's attached to report that it exists. Moreover, any USB device can be attached or detached at any time.
However, our goal is not to change Xinu's device model to make it more complicated, and we can expect somewhat sane behavior from the ethernet adapter considering that it is physically not removable from the device. Therefore, it should be possible to declare the ethernet adapter as a static device, with some caveats. Namely, when etherInit() is called to initialize the device, there's no guarantee that the USB bus has actually found it yet. This could be worked around in a couple ways:
- Try to enumerate the USB bus fully before calling etherInit(). Although we've been doing this with some degree of success, as mentioned above this is not how USB was designed. Furthermore, I've been transitioning to interrupt-driven hub status notifications to avoid having to run a thread to keep polling all USB hubs in software. The interrupt-driven approach, while superior, makes synchronously enumerating the bus even more unrealistic.
- Make etherInit() wait until the USB ethernet adapter has been found. The dumbest solution would be to just wait 1 second or so before continuing. A less dumb solution would be to wait for some notification from the USB core driver, ideally with some timeout, that the needed device has been added.
- Make etherInit() simply register the ethernet driver with the USB core, then return. The device can then be initialized for real as soon as the USB device is added. This is the solution I've currently implemented and it makes the most sense with regards to how USB is designed, but it doesn't really solve the problem of static devices because then other functions such as etherOpen() would have to wait for the device to be initialized for real.
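The third option can be sketched roughly as follows. Everything here is hypothetical and heavily simplified: the function and structure names are not the real Xinu or USB core interfaces, only one driver can be registered, and the vendor and product IDs are the ones I believe the Linux driver lists for this adapter. The point is only to show the shape of the problem: initialization registers a callback and returns, the device becomes usable later, and anything that opens the device has to cope with it not being there yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct usb_device_driver {
    uint16_t idVendor;
    uint16_t idProduct;
    void (*bind)(void *driver_data); /* called when a matching device appears */
    void *driver_data;
};

/* A single registration slot, for brevity. */
static struct usb_device_driver *registered_driver = NULL;

/* Called from the Ethernet driver's init routine; returns immediately. */
void usb_register_device_driver(struct usb_device_driver *drv)
{
    registered_driver = drv;
}

/* Called by the USB core whenever a new device has been enumerated.
 * Returns true if a driver was bound to it. */
bool usb_device_added(uint16_t idVendor, uint16_t idProduct)
{
    if (registered_driver != NULL &&
        registered_driver->idVendor == idVendor &&
        registered_driver->idProduct == idProduct)
    {
        registered_driver->bind(registered_driver->driver_data);
        return true;
    }
    return false;
}

struct ether_state {
    bool ready; /* set once the USB device has actually been found */
};

/* Bind callback: do the real device initialization here. */
void ether_bind(void *driver_data)
{
    struct ether_state *eth = driver_data;
    eth->ready = true;
}

/* Open fails if the device has not been found yet.  A real version would
 * more likely block on a semaphore, ideally with a timeout. */
int ether_open(const struct ether_state *eth)
{
    return eth->ready ? 0 : -1;
}
```

This makes the remaining issue visible: the static device exists from boot, but it cannot be opened successfully until the bind callback has run.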
In the afternoon I mostly tried to get interrupt-driven hub status notifications to actually work. This is a prerequisite for getting interrupt-driven bulk transfers to work, which are required by any sane implementation of the ethernet driver (otherwise there is no way for the system to be actually notified when an Ethernet packet is received). A hub status notification is an interrupt transfer from a USB hub that provides a bitmask specifying ports on which status changes have occurred. These transfers first needed to be emulated for the root hub, which needed to be treated as a special case and implemented by using the special "port interrupt" that can be enabled in the DWC Core Interrupt Mask Register. Next, these transfers needed to be implemented for actual hubs, which I have yet to get fully working due to the various quirks of interrupt transfers and the lack of documentation for the hardware.
Last night I had worked on cleaning up various parts of the USB code, including making sure the interface to the USB core driver was appropriately documented. This will make it easier for myself and others to create USB device drivers that use this interface.
This morning I changed the Host Controller Driver Interface a bit by combining hcd_power_on() and hcd_start() into just hcd_start(), since this simplifies the interface and we have no plans to do anything that requires calling these separately. I also moved the USB transfer request queue from the USB Core Driver to the Host Controller Driver. This is a more standard design and makes it easier for the Host Controller Driver to process USB transfer requests in its desired way.
For the rest of the day I continued trying to get interrupt transfers working properly, using the status change endpoint of the integrated USB hub as a test case. One problem was that for interrupt transfers, the hardware channels are apparently (based on the Linux source code) not supposed to be programmed to transfer more data than can be transferred in one microframe. Fixing this did not solve all the problems, though. Currently, the only interrupt we are enabling when an interrupt transfer is started is the "channel halted" interrupt. This interrupt seems to be set after each USB packet (not necessarily a full transfer) has completed or failed. The interrupt handler must then either continue the transfer or perform any actions needed to abort or complete the transfer. This is similar to the way this interrupt works with synchronous control transfers, but with interrupt transfers there apparently are not supposed to be multiple transactions per transfer. The part we are still caught up on, however, is that interrupt endpoints respond with a NAK packet when they have no data to send, thereby triggering the "channel halted" interrupt. The logical thing to do is to restart the transfer; however, this simply causes the interrupt to trigger again, causing about 50000 interrupts per second. This is obviously unacceptable, and we want the USB controller to handle this itself and only interrupt the CPU when interrupt data is actually available. So we will need to continue to figure out how this undocumented hardware supports interrupt transfers, by experiment and by reading the extremely complicated Linux driver.
This afternoon Tyler and I talked with Dr. Brylow about USB and sound support. He seemed to think we had made a lot of progress, but we all continued to be frustrated by the lack of documentation for the hardware and the extremely high complexity of the standards (and non-standards) involved. We also talked briefly about how the USB device model is incompatible with Xinu's static device model (as I mentioned in yesterday's entry). We agreed that at this point we should preserve Xinu's current device model and add in some sort of workaround to allow for USB devices.
This morning I spent some time examining the Linux driver for the DWC host controller. I was eventually able to determine that I had misunderstood the action that software needs to take to perform interrupt transfers. I had assumed that it was possible to have the CPU be interrupted by the USB Host Controller only when an interrupt transfer had completed with data (e.g. new mouse data, new keyboard data, or hub status change data) or an error had occurred. However, by enabling verbose debugging messages in the Linux driver and running some tests with the internal hub, I observed that the Linux driver, which was written by the hardware vendor itself, did not actually do this. Indeed, any interrupt transfer to the hub's status change endpoint when no devices were being attached or detached completed almost immediately with a NAK response. The driver then repeated the transfer about 4 times per second, consistent with the bInterval value of 12 in the Endpoint Descriptor. This is software-based polling, which is even more inefficient than the hardware-based polling that I was expecting to be available. It was surprising to find this was the case, since this seems to be a major flaw in the design.
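The relationship between bInterval and the observed polling rate can be checked with a small calculation. According to section 9.6.6 of the USB 2.0 specification, for high-speed interrupt endpoints bInterval is an exponent: the period is 2^(bInterval-1) microframes of 125 microseconds each. For low-speed and full-speed interrupt endpoints it is simply a number of 1 millisecond frames. The function name below is hypothetical.

```c
#include <stdint.h>

enum usb_speed { USB_SPEED_LOW, USB_SPEED_FULL, USB_SPEED_HIGH };

/* Polling period of an interrupt endpoint in microseconds, or 0 if
 * bInterval is outside the range allowed for that speed. */
uint32_t interrupt_poll_interval_us(enum usb_speed speed, uint8_t bInterval)
{
    if (speed == USB_SPEED_HIGH) {
        if (bInterval < 1 || bInterval > 16)
            return 0;
        /* 2^(bInterval-1) microframes of 125 microseconds each */
        return (UINT32_C(1) << (bInterval - 1)) * 125u;
    }
    if (bInterval < 1)
        return 0;
    /* bInterval frames of 1 millisecond each */
    return (uint32_t)bInterval * 1000u;
}
```

For the hub's status change endpoint, bInterval = 12 at high speed gives 2048 microframes, or 256 milliseconds, which is just under 4 polls per second.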
Either way, I had to change Xinu's driver for the DWC host controller to use this software polling. To keep the code simple, I made the code create a thread for each transfer request that needs to be polled. Each such thread repeatedly sleeps for the appropriate number of milliseconds and retries the transfer. This method gets the polling working without the high complexity of the Linux driver, which uses various per-endpoint queues spread out among thousands of lines of code that are difficult to understand, especially without access to any internal documentation.
With the above software polling method implemented, the interrupts from the internal hub are limited to the polling rate. Therefore, the high-speed IN interrupt transfers are essentially working as expected. Devices can be attached and detached from the USB ports and detected within about 1/4 of a second. The next tasks will be to start work on the Ethernet Adapter driver and HID keyboard driver. I will work on the former, since Tyler wanted to write the HID keyboard driver. Unfortunately, both are expected to require some (hopefully small) changes to the Host Controller Driver, due to the need for high-speed bulk transfers and low-speed IN interrupt transfers, neither of which has been tested yet.
Week 5 (June 24 - June 28)
On the weekend I spent some time improving the USB Host Controller driver, including documenting the registers that it uses and implementing support for asynchronous, interrupt-driven control transfers. Implementing control transfers in this way actually reduced the code size (and arguably simplified the code) because then all the code that performed synchronous transfers could be deleted, thereby sending all transfers on the same code path. Furthermore, threads now do not waste as much time polling the Host Controller's channel interrupt register for changes when executing control transfers.
I also fixed a problem where the actual size of OUT transfers was not calculated correctly, due to the unexpected behavior of one of the Host Controller's registers.
After these improvements to the Host Controller driver, it was possible to continue working on the driver for the SMSC LAN9512 USB Ethernet Adapter. Dr. Brylow had tried to contact SMSC to get approval to access their Programmers' Reference Manual, but apparently such a document doesn't even exist. Therefore, we must use the Linux source code as a substitute for the nonexistent documentation. This will be difficult because the code reads and writes from the hardware in various places with no explanation. For each such place we need to determine what the purpose of the hardware access is and whether we need to include it in Xinu's driver. However, one thing we have noticed so far is that the ethernet adapter integrated into the Raspberry Pi does not have an EEPROM attached to it. This may simplify the driver somewhat as it will definitely not have to support reading data from the EEPROM, which the Linux driver does.
Besides the above, I spent much of today preparing slides and content for the talk that all the REU students need to give on Thursday. My talk will give an overview of our project and then focus on USB support, while Tyler's and Farzeen's talks will focus on aspects of our project other than USB support.
Today I mostly worked on the driver for the SMSC LAN9512 USB Ethernet Adapter. I continued working on the initialization code and tried to better understand what the various hardware registers do, based on their names (which often are undocumented acronyms or abbreviations like "COE" that need to be figured out from context) and how the Linux code reads and writes to them. I also spent some time reading the bcm4713 and ag71xx drivers, which are the two Ethernet drivers currently included in Xinu. Based on this, I was able to begin implementing additional required functions, such as etherControl(), etherRead(), and etherWrite(). However, these had to be modified to take into account the specific hardware, including sending and receiving packets via USB bulk transfers rather than direct memory access.
One thing worth noting is that Ethernet devices in Xinu are described by a `struct ether' that is shared between both of the existing drivers. However, this structure was designed for Ethernet devices of a particular type, namely ones that use DMA and have a pointer to memory-mapped Control/Status registers rather than a location on a bus like a USB device does. Although it looks like it will be possible to use it unmodified for the SMSC LAN9512 driver, not all fields will be used, which could lead to some confusion.
As expected, several problems with the USB Host Controller driver had to be fixed. One of the hardware registers (transfer size) does not work as expected and a workaround had to be added. Also, I found that the hardware sporadically issued "frame overrun" errors on IN interrupt transfers. I do not yet know for sure what the best way to fix this is, but my interpretation at this point is that the hardware does not like periodic transfers being issued near microframe boundaries (which, on high-speed USB, occur every 125 microseconds). Simply retrying the offending transaction makes the error go away.
Based on the Linux driver, Ethernet packets cannot quite be read and written raw to the bulk endpoints on the SMSC LAN9512 device. Instead, they must be wrapped by a couple of fields that contain device-specific data. Multiple packets per USB transfer are apparently supported.
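As a sketch of what this wrapping looks like on the transmit side: my reading of the Linux driver is that each outgoing frame is prefixed with two 32-bit little-endian command words, the first carrying the frame length plus "first segment" and "last segment" flags and the second carrying the length again. The flag values below are taken from my recollection of that driver's header file and should be verified against it; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, to be checked against the Linux driver's header. */
#define TX_CMD_A_FIRST_SEG   0x00002000u
#define TX_CMD_A_LAST_SEG    0x00001000u
#define TX_CMD_LENGTH_MASK   0x000007FFu
#define TX_OVERHEAD          8 /* two 32-bit command words */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Copy an Ethernet frame into out, prefixed by the two command words.
 * Returns the number of bytes to send on the bulk OUT endpoint, or 0 if
 * the frame is too long or the output buffer is too small. */
size_t wrap_tx_frame(uint8_t *out, size_t out_size,
                     const uint8_t *frame, size_t frame_len)
{
    if (frame_len > TX_CMD_LENGTH_MASK ||
        out_size < frame_len + TX_OVERHEAD)
        return 0;

    put_le32(out, (uint32_t)frame_len |
                  TX_CMD_A_FIRST_SEG | TX_CMD_A_LAST_SEG);
    put_le32(out + 4, (uint32_t)frame_len);
    memcpy(out + TX_OVERHEAD, frame, frame_len);
    return frame_len + TX_OVERHEAD;
}
```

On the receive side the device similarly prefixes each frame with a status word that includes the frame length, which is why the length has to be validated before the frame is handed to the network stack.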
Towards the end of the day I was able to get basic receive/send functionality working. It was then possible to bring up the network interface and then ping it and get a response. It therefore should only be a matter of time before we have adequate networking support, although it may be impossible to get it working 100% reliably due to the difficulties of working with undocumented hardware.
Farzeen is working on code for "turtle" graphics. However, I did not actually see either Farzeen or Tyler today.
Today I continued to work on the Ethernet driver and the USB Host Controller driver. I found that Bulk transfers behaved slightly differently from how I expected because the Host Controller sometimes transferred more than one packet before halting the channel. This is to some extent a good thing, although it's not consistent with the other transfer types and required re-writing some of the Host Controller driver. After doing this, bulk transfers with size greater than 1 USB packet (512 bytes in this case) appeared to work.
I enabled the so-called "turbo" mode on the Ethernet adapter, which allows multiple packets to be received in a single USB transfer. However, I haven't yet been able to verify that this feature is actually being used by the hardware and is working correctly.
There still exist various problems apparent from merely sending Ping requests to the Raspberry Pi, including packets being dropped or corrupted and the system occasionally crashing. I also spent some time trying to figure out why Ping and ARP replies from Xinu contained extra data appended to them. I eventually was able to determine that this was a bug in Xinu's platform-independent networking stack. As usual it was disappointing to find this bug, since I had originally believed that Xinu had been well-tested on other platforms.
Also, this afternoon I attended the required training session about research ethics.
I also talked with Tyler about the presentations tomorrow. Tyler suggested that I do not need to give as much background information in my talk, since I will be giving my talk after Farzeen and Tyler give theirs.
Today was the day of the "mini-presentations". Tyler mostly gave an introduction to the project and talked about sound and HID support, Farzeen mostly talked about graphics support, and I gave a little more background information about the project and mostly talked about USB and networking support. Since the talks were only 8-10 minutes each, we didn't have time to go into much detail.
For the rest of the day I continued to work on getting sending and receiving packets to Xinu on the Raspberry Pi to work reliably. I am currently just using "ping" with various packet sizes and intervals, since there is little reason to test more advanced networking functionality such as TCP until ping is fully functional with no apparent bugs, regardless of the packet load.
The problems with dropped and corrupted packets seemed to be fixed, at least temporarily, by using only 1 USB receive and 1 USB transmit request at the same time. The problems with the OS crashing and some packets being duplicated have been much more difficult to track down. I found and fixed two bugs in the preliminary code I had written on Tuesday (not checking frame lengths on received packets properly, and enabling interrupts at an unsafe time) that both could cause the OS to crash. However, the OS will still crash if many packets are sent to it, indicating there are other bug(s). It is very hard to debug because the only mechanism we really have to debug the code is to print stuff over the UART, but the UART is very slow compared to the CPU speed and inserting a print statement can easily make the code 10-100x slower, thereby making it difficult to track down bugs having to do with low-level, time-dependent details like interrupt handling. To try to track down the bug(s) I am currently examining some low-level details, such as stack alignment and whether certain interrupt handlers should be allowed to execute re-entrantly.
I started off today by attempting to understand and document more of the registers of the SMSC LAN9512. This was partially successful and I was able to document some important registers or flags, then use this to remove unneeded reads and writes from the device initialization code. However, there are still some flags, such as HC_CFG_BIR, whose purpose I have not been able to determine yet.
I spent the rest of the day trying to debug the problem with the OS crashing after receiving many Ethernet packets. I have not been able to solve this problem yet, but I am thinking it's because of memory corruption, which means the bug could be virtually anywhere in the operating system. It also could potentially be due to DMA to/from the USB controller not acting as expected. I have looked into what the Linux code does for DMA but it is very abstracted (to deal with all architectures, or all ARM architectures depending on the specific file) and involves thousands of lines of code, so I haven't yet been able to pick out the bits that are actually relevant for our purposes.
Week 6 (July 1 - July 5)
Today I continued to try to solve the problem with the OS crashing after receiving too many Ethernet packets. I mostly focused on trying to figure out exactly how DMA (Direct Memory Access, where devices directly read or write from system memory) is supposed to work on this hardware, and in particular whether our code in the USB Host Controller Driver does it correctly. It has been extremely difficult to do this, mainly because there is no single source that actually documents this. What I've been able to find out so far is the following:
- The specification of the ARMv6 architecture defines a DMA controller at the ARM architectural level. However, this is apparently only used for transferring data to/from TCMs (Tightly Coupled Memories), which are internal to the ARM processor. Because the USB Host Controller we are dealing with is outside the ARM processor, this DMA controller is irrelevant to us.
- Broadcom's documentation for the BCM2835 ARM Peripherals specifies another DMA controller that operates at the level of peripherals. This sounds relevant to us, but the documentation says that devices that are bus masters can "satisfy their own data requirements". Our interpretation is that the USB hardware is a bus master and therefore this DMA controller need not be used.
Therefore, there should be no need to interact with any DMA controller(s) to perform DMA to/from the USB Host Controller, and this is of course supported by the fact that our code up until this point has mostly been working. But, caches and the memory map also need to be considered, since DMA can cause problems with cache coherency:
- The specification of the ARMv6 architecture, which includes the ARM1176JZF-S we are working with, states that ARMv6 implementations must include data and instruction caches at the L1 level of the cache hierarchy, but the specific details of these caches are implementation-defined. The reference manual for the ARM1176JZF-S describes how the L1 caches are implemented on that processor. Significantly for us, it states that both the data and instruction L1 caches are initially disabled and must be explicitly enabled by software by writing to a certain register in the System Control Coprocessor. This means that the Xinu kernel, until this point, has not been using the L1 data or instruction caches. But if, hypothetically, the ARM was indeed using the L1 data cache, then this could in principle cause DMA to not work correctly due to cache coherency not being maintained; after all, the L1 data cache is apparently internal to the ARM processor, and other devices located on the system (AMBA) bus are not aware of it.
- The specification of the ARMv6 architecture describes a number of memory attributes, including "normal memory", which may be either shared or non-shared. Shared normal memory is cache-coherent, so one might expect that we would need to use this type of memory for DMA. However, these memory attributes appear to be documented only for TLB entries, which are irrelevant for Xinu because it does not use virtual addresses. I was unable to find any documentation for how to access shared normal memory via physical addresses.
- The BCM2835 ARM Peripherals document gives a couple more clues about the memory map and caches. It states that ARM physical addresses are actually translated to bus addresses. It also states that the BCM2835 has an L2 cache, which is apparently outside the ARM processor and used "primarily" by the GPU. The highest 2 bits of the bus address control caching behavior. The documentation is not clear about this, but my interpretation is that this is L2 caching behavior only. Furthermore, based on a diagram in this document, 0-based ARM physical addresses are actually translated to 0xC0000000-based bus addresses, which are not cached by the L2 cache.
We can therefore conclude that the ARM code running on Xinu up until this point uses neither the L1 nor the L2 cache for data. However, we need to consider whether the L2 cache is used when DMA is performed by the USB controller. Addresses from the USB controller likely do not go through the hardware labeled as the VC/ARM MMU and therefore must be bus addresses, not ARM physical addresses. This is consistent with the statement "Software accessing RAM using the DMA engines must use bus addresses (based at 0xC0000000)." That statement suggests that DMA addresses need to be specified as bus addresses in the region not cached by the L2 cache, which makes sense if the ARM processor does not use the L2 cache.
However, to confuse things further, giving a 0xC0000000-based bus address to the USB controller for DMA does not actually work, as it causes bus errors to be reported by the USB controller. The Linux kernel uses 0x40000000-based addresses, whereas our code previously used 0-based addresses. But, changing the addresses to be the same as those Linux uses had no effect on observed behavior, including the occasional crash when receiving Ethernet packets. Furthermore, it is not clear to me, at this point, how DMA can be performed correctly if the USB controller is using the L2 cache but the ARM is not.
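To make the address arithmetic above concrete, here is a minimal sketch of converting between 0-based ARM physical addresses and the bus-address aliases mentioned in this entry. It assumes only what is stated above (the top two bits of a bus address select the caching behavior, and the aliases of interest are based at 0x40000000 and 0xC0000000). The macro and function names are hypothetical and are not taken from the Xinu source.

```c
#include <stdint.h>

/* The top two bits of a bus address select the caching alias.  All names
 * in this sketch are illustrative only. */
#define BUS_ALIAS_MASK      0xC0000000u
#define BUS_ALIAS_LINUX     0x40000000u  /* alias the Linux driver appears to use */
#define BUS_ALIAS_UNCACHED  0xC0000000u  /* alias not cached by the L2 cache */

/* Convert a 0-based ARM physical address to a bus address in the given alias. */
static inline uint32_t phys_to_bus(uint32_t phys, uint32_t alias)
{
    return (phys & ~BUS_ALIAS_MASK) | alias;
}

/* Convert a bus address in any alias back to a 0-based ARM physical address. */
static inline uint32_t bus_to_phys(uint32_t bus)
{
    return bus & ~BUS_ALIAS_MASK;
}
```

With helpers like these, switching the DMA alias becomes a one-line change at the point where a buffer address is handed to the controller, which makes experiments such as the one described above easier to repeat.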
Today I tried to narrow down the focus of the bug I've been searching for since last Thursday. Its symptom is the OS freezing/crashing after receiving too many Ethernet packets.
To narrow the possible scope of the bug, I stripped out Xinu's networking code, leaving just the USB and Ethernet drivers. With this configuration, packets are received by the USB and Ethernet drivers but are intentionally ignored and not passed to Xinu's networking stack. Note that this configuration implies that no packets are ever sent by Xinu, including ARP replies, so I had to manually add an ARP entry to the computer that I connected to the Raspberry Pi in order to get Ping packets to be sent. After doing this, the bug still occurred, which narrowed down its scope to receiving packets only.
I then attempted to determine whether the bug really involved the Ethernet adapter/driver or whether it was a problem with USB itself. To do this, I stripped out the Ethernet driver, made the code not even enumerate the devices attached to the SMSC LAN9512 hub, and added code that repeatedly submitted GetDescriptor control messages to the SMSC LAN9512 hub. This configuration excluded all networking-related code paths and exercised the USB code. With this configuration, the bug, or at least another bug with the same symptom, still occurred. Thus, the focus of the bug was narrowed to only the USB code.
I did, however, make two observations I could not explain:
- Exactly every 512th GetDescriptor control message left the buffer for received data unmodified, despite returning success indicating that the requested number of bytes were transferred. This return status is ultimately based on the DWC USB Host Controller indicating, via the Transfer Size register, that a certain number of bytes were transferred. This probably is related to some hardware quirk that I have yet to work around.
- Changing the DMA address provided to the DWC USB Host Controller to be 0x40000000-based instead of 0x00000000-based made the observable bug go away in the context of the GetDescriptor control messages test. Therefore, it appears that the information I was researching yesterday about DMA could be relevant. However, making this change in the original code did not solve the bug in the context of Ethernet packets being received. This could mean that this "bug" is actually caused by more than one independent factor. I am changing the DMA addresses to be 0x40000000-based to rule out this one factor, especially since Linux appears to use that value. However, the Broadcom documentation is very unclear about the properties of this memory region. It's stated to be "L2 cache coherent (non-allocating)". I would normally assume that this describes multiple L2 caches being coherent with each other, but this system in fact only has one L2 cache, so this might instead refer to the L2 cache being coherent with main memory.
Today I continued trying to fix the same bug I've been working on for several days, and towards the end of the day I believe I was able to finally fix it. The cause was that the initialization code I wrote for the Synopsys DWC OTG USB Controller did not explicitly set up FIFOs for the controller, thereby causing the defaults chosen by Broadcom's build of the Synopsys block to be used, which evidently do not actually work properly. My current understanding is now that the DWC has a certain amount of internal memory set aside for 3 different FIFOs: Receive, Non-periodic Transmit, and Periodic Transmit. These FIFOs are presumably used to buffer data being sent or received over the USB in a first-in-first-out manner. However, I had not previously included code to configure these FIFOs for several reasons:
- I had thought that the FIFOs were only used in Slave mode, not DMA mode, since one might expect that the hardware can just use the DMA region instead of storing data in a FIFO. It turns out this is not the case, and the hardware still uses the FIFOs in DMA mode.
- Even if the FIFOs were used, I would have assumed that the FIFOs would be configured to working default values after the DWC was reset. Instead, the actual default values cause USB transfers to work maybe 99.8% of the time and then cause very difficult to diagnose memory corruption the other 0.2% of the time. My understanding is that this is Broadcom's fault because based on the BCM2835 ARM Peripherals document, Broadcom configured several relevant parameters, including the default size of each FIFO and the total space available for FIFOs, when they instantiated the Synopsys block for their SoC.
- As mentioned in many other entries on here, the DWC controller is simply not documented. The only real clue I had that configuring the FIFOs needed to be done was other source code, including Synopsys' Linux driver, which is difficult to understand and extract the essential pieces of for many reasons, including enormous code length, many abstraction layers, and support for many features (including numerous features directly affecting the hardware initialization code) I am not interested in supporting in Xinu's driver.
In tests this morning before applying my fix for the bug, I observed that the DMA address register on the DWC controller, and several other Host Channel registers, would periodically become corrupted with some of the bytes of data received over the USB (e.g. bytes from the payload of a Ping packet). My interpretation is that this was caused by the overflow of a FIFO within the DWC controller. Since the DMA address register became corrupted, this in turn caused corruption of system memory, thereby potentially crashing the operating system, depending on the specific memory region being corrupted. Furthermore, I believe that the problems where exactly 1 in 512 GetDescriptor messages were not working and where some Ping packets were received as duplicates were actually both caused by this same problem, as this behavior is most likely consistent with a FIFO overflowing at regular intervals and causing DMA to miss the target buffer.
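The idea of explicitly carving the controller's internal memory into the three FIFOs described above can be sketched as follows. This is only an illustration under stated assumptions: the sizes passed in are hypothetical, the FIFOs are simply placed back to back, and the packing of a start offset and depth into one 32-bit value (depth in the upper 16 bits, start in the lower 16 bits) reflects how existing open-source drivers appear to program the size registers rather than any documentation available to us.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start offset and depth of one FIFO, both in 32-bit words. */
struct fifo_region {
    uint16_t start;
    uint16_t depth;
};

struct fifo_layout {
    struct fifo_region rx;    /* Receive FIFO */
    struct fifo_region nptx;  /* Non-periodic Transmit FIFO */
    struct fifo_region ptx;   /* Periodic Transmit FIFO */
};

/* Place the three FIFOs back to back starting at offset 0.  Returns false,
 * leaving *out untouched, if they do not fit in total_words or if the
 * offsets would not fit in 16 bits. */
static bool fifo_layout_init(struct fifo_layout *out, uint32_t total_words,
                             uint16_t rx_words, uint16_t nptx_words,
                             uint16_t ptx_words)
{
    uint32_t needed = (uint32_t)rx_words + nptx_words + ptx_words;

    if (needed > total_words || needed > 0xFFFFu)
        return false;

    out->rx.start = 0;
    out->rx.depth = rx_words;
    out->nptx.start = rx_words;
    out->nptx.depth = nptx_words;
    out->ptx.start = (uint16_t)(rx_words + nptx_words);
    out->ptx.depth = ptx_words;
    return true;
}

/* ASSUMPTION: a size register holds the depth in bits 31:16 and the start
 * offset in bits 15:0. */
static uint32_t fifo_region_pack(struct fifo_region r)
{
    return ((uint32_t)r.depth << 16) | r.start;
}
```

Checking that the regions fit before touching the hardware is the important part: a layout that silently exceeds the available FIFO memory would be a plausible way to end up with the kind of overflow described above.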
To prevent further problems like this I will need to more carefully evaluate any differences in the functional behavior of my driver code versus existing driver code that is known to "work". This however does not change the fact that this can be very difficult to do, for reasons mentioned above. Also I believe that given that I do not have documentation for the hardware nor the actual source in Verilog or whatnot, it is literally impossible to write a driver that is guaranteed to work correctly in all cases other than ones explicitly tested. But this is of course a very common problem in driver development for open-source operating systems.
This morning I improved the documentation for the registers of the DWC controller (in the header file declaring them). This was crucial because over the past week or two I've had to change my understanding of what some of the registers do, thereby making some of the existing documentation I wrote incorrect or incomplete.
I then concentrated on running Xinu's test suite, including all networking-related tests. Some of the networking tests rely on being able to put the Ethernet device into loopback mode, which I had not implemented for the SMSC LAN9512. Although this is never done in the Linux driver, there is in fact a flag named "MAC_CR_LOOPBK" that appears to enable loopback mode. After implementing support for this, I was able to run the tests.
Based on the test results and some troubleshooting, the first problem I had to fix was a line of code in the USB Host Controller driver that did not correctly determine that a USB transfer had completed when the transfer size happened to be an exact multiple of the maximum USB packet size of the corresponding device endpoint. The second problem that I had to fix was that the Ethernet Driver and Ethernet Loopback Driver tests (which are platform-independent code that I did not write) both contained an off-by-one error causing a buffer overflow.
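One way a completion check can go wrong in the exact-multiple case is by relying only on a short final packet to signal the end of a transfer. The sketch below shows a check that handles both cases. It is a simplified illustration with hypothetical names and parameters, not the actual line from the Host Controller driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a transfer is finished after another packet has arrived.
 *   requested   - total number of bytes the transfer asked for
 *   transferred - bytes moved so far, including the packet just received
 *   last_packet - size in bytes of the packet just received
 *   max_packet  - maximum packet size of the endpoint (nonzero)
 */
static bool transfer_complete(uint32_t requested, uint32_t transferred,
                              uint32_t last_packet, uint32_t max_packet)
{
    /* A short packet always ends the transfer. */
    if (last_packet < max_packet)
        return true;

    /* A full-size packet ends the transfer only if nothing remains.
     * Testing for a short packet alone would miss transfers whose size is
     * an exact multiple of max_packet, since their final packet is full. */
    return transferred >= requested;
}
```

For example, a 1024-byte transfer to an endpoint with 512-byte packets ends with a full 512-byte packet, so only the second condition reports it as complete.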
I then examined the Ethernet Loopback driver and fixed various bugs, including leaking of semaphores, failure to restore interrupts in error paths, and incorrect or poorly written comments. None of these fixes likely had an effect on the results of the specific tests included in the test suite, however.
At this point all the relevant tests "passed". However, the OS would sporadically crash while running the test cases. This is going to be another problem that will be extremely hard to debug. Fortunately, it happens even in the Ethernet Loopback test, which has nothing to do with USB at all. Furthermore, the crash does not seem to happen if printing the test results is disabled (note: this still needs to be confirmed more rigorously). Therefore, this might be a problem with the UART driver; for example, it might not correctly handle timeouts or overruns, which could be a problem because the printing of the test results happens concurrently with running the following tests, which in some cases disable interrupts for a relatively long time.
This afternoon I reviewed the UART driver and fixed some minor problems, but none appeared to have anything to do with the above-mentioned bug.
Week 7 (July 8 - July 12)
Over the weekend I made a couple of minor fixes to the code. For example, semcreate() and bfpalloc(), which are platform-independent code that I did not write, contained resource leaks encountered only on error paths. I also continued to debug the problem noted on July 5 where the OS would crash while running the test cases.
This morning I continued trying to solve the problem noted above. It turns out that it was indeed a UART problem, and the OS did not actually crash; instead, UART transmit interrupts were disabled, which simply made the OS unable to print messages over the UART. This happened because uartInterrupt() cleared the transmit interrupt after handling it, when in fact this could clear a new interrupt instead of the one that was just handled. After fixing this, the full Xinu test suite seems to execute reliably; however, there of course could still be bugs I just haven't observed yet.
I also adjusted the stack sizes and names of the threads created by the USB subsystem. I had originally only given the USB transfer scheduler and defer transfer threads stack sizes of 1024 bytes, but more recently I had also made it possible for both of these threads to be interrupted. This therefore requires that the stack size be increased to a size large enough to allow all interrupt-handling code to execute using it. I also found that I had previously named the USB transfer scheduler thread "USB Transfer Request Scheduler Thread", which was in fact longer than the 16 characters defined as the maximum thread name length in Xinu, thereby causing it to be truncated, so I fixed this.
Furthermore, I observed that the thread name was originally not actually truncated correctly because all versions of create() in Embedded Xinu use strncpy() incorrectly, thereby creating a string that is potentially not null-terminated. Upon further examination this problem appeared in other places in Embedded Xinu as well. To fix this I added strlcpy() to the C library and changed all offending strncpy() calls to strlcpy().
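The difference between the two functions can be shown with a minimal sketch of a strlcpy()-style copy. This is not the implementation added to Xinu's C library; it is named sketch_strlcpy() here precisely to avoid suggesting that, and to avoid clashing with any strlcpy() the host C library may already provide.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, which has room for size bytes.  Unlike strncpy(), the
 * result is always null-terminated when size > 0.  Returns strlen(src), so
 * a return value >= size tells the caller that truncation occurred. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

By contrast, strncpy(dst, src, size) writes no terminating null byte at all when strlen(src) >= size, which is exactly the situation that arises with a thread name longer than the name buffer.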
In the afternoon I implemented support for parsing the ARM boot tags ("atags"), which are provided by the bootloader in a standard format starting at memory address 0x100. At this point, the only atags that Xinu cares about are the available physical memory (ATAG_MEM) and the Raspberry Pi's serial number (ATAG_SERIAL), the latter of which is used to generate a unique MAC address for the Ethernet adapter. The tags also include the command line passed to the Linux kernel (ATAG_CMDLINE), but Xinu does not yet need any information from it.
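As an illustration of the tag format, here is a minimal sketch of walking an atag list. It assumes the conventional layout in which each tag begins with a two-word header (the tag's total size in 32-bit words, then its identifier) and the list ends with ATAG_NONE. The struct and function names are hypothetical, and unlike a real kernel, which would read the list in place starting at address 0x100, this version takes a bounded array so that it can be run anywhere.

```c
#include <stddef.h>
#include <stdint.h>

#define ATAG_NONE    0x00000000u
#define ATAG_MEM     0x54410002u
#define ATAG_SERIAL  0x54410006u

/* Information extracted from the tags (hypothetical structure). */
struct boot_info {
    uint32_t mem_start;
    uint32_t mem_size;
    uint64_t serial;
};

/* Walk the tag list stored in tags[0..nwords-1].  Tags other than ATAG_MEM
 * and ATAG_SERIAL are skipped; walking stops at ATAG_NONE or at the first
 * tag whose size field is implausible. */
static void parse_atags(const uint32_t *tags, size_t nwords,
                        struct boot_info *info)
{
    size_t i = 0;

    while (i + 2 <= nwords) {
        uint32_t size = tags[i];     /* in words, header included */
        uint32_t id = tags[i + 1];

        if (id == ATAG_NONE || size < 2 || size > nwords - i)
            break;

        if (id == ATAG_MEM && size >= 4) {
            info->mem_size = tags[i + 2];
            info->mem_start = tags[i + 3];
        } else if (id == ATAG_SERIAL && size >= 4) {
            /* Serial number is stored as low word, then high word. */
            info->serial = ((uint64_t)tags[i + 3] << 32) | tags[i + 2];
        }
        i += size;
    }
}
```

Validating the size field before advancing keeps a corrupted or missing tag list from sending the parser off into unrelated memory.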
I also added READMEs for libxc and the SMSC LAN9512 driver and documented the members of `struct platform'.
This morning I added a mechanism by which platforms can override specific functions in the C library for optimization purposes. We do not plan to use this for ARM, but before I re-wrote the C library, there was an ugly "ifdef" in memcpy.c that skipped compiling it specifically for the Intel SCC platform, so that an optimized implementation could be used. That was a bad solution because the number of "ifdefs" will inevitably grow as more platforms are added, so I added a way for the build system itself to take care of omitting source files declared in the platform-specific Makefile.
In the morning I also saw Tyler for the first time in a while, and he had found that the USB code was not correctly working with his keyboard. I was able to determine that the keyboard was really slow and issuing a lot of "NAK" packets when it wasn't ready to reply, which were evidently not being handled correctly by our USB Host Controller driver specifically when the transfer involved split transactions.
For the rest of the day I mostly focused on improving Xinu's build system. As I've noticed over the past few weeks, there were some major issues with the build system, such as
- Source files were not recompiled when headers were modified.
- The kernel was not correctly recompiled when the platform was changed.
- The version string was almost never updated, even though it's supposed to be updated every time the kernel is built.
- Editing the toplevel Makefile to set parameters such as the platform was encouraged, even though this file is part of the sources and is checked into version control.
- The various platforms/ directories contained duplicated linker scripts and Makefile definitions. For some of the MIPS platforms there were files that were simply copied and had comments referring to the wrong platform.
- There was no explanation of chosen compiler flags.
- There was no standard way to expose to C code what the target platform or architecture is.
I was able to fix all the problems noted above, although I haven't been able to test the SCC or various MIPS router platforms yet. The SCC platformVars file is a bit more complicated and may need a few more changes.
This morning I tried to build Embedded Xinu for other platforms, including the various MIPS platforms and the SCC. I was not able to build the full SCC version due to a missing platform-specific tool, but I believe I caught most of the regressions introduced by the changes I made to the build system. I also adjusted the way that the MIPS platforms share build code; namely, since the e2100l platform is big-endian MIPS, the MIPS template is now designed to work for either little-endian or big-endian MIPS platforms. Furthermore, I added the udelay() and mdelay() functions to a new file in system/, since there were now two separate implementations of them in Xinu.
Tyler had found another keyboard that was not working correctly with Xinu. Or at least, you'd think it was a keyboard. From the point of view of USB it was not a keyboard at all, but rather a hub that just happens to have a keyboard attached to it. That wasn't really the problem, though, since we have a hub driver to deal with that case. But the devices attached to this hub, including the actual keyboard device, were not showing up due to a problem with the Host Controller Driver. The hub inside the keyboard is attached as a full-speed device and therefore must be accessed using split transactions, even when doing interrupt transfers. Apparently, there are many quirks and special cases in the USB standard and/or in the DWC hardware for this case. The USB standard has many pages detailing the scheduling rules for such transactions. I was able to "fix" the problem just by restarting split transactions when they failed too many times, but this is really just a temporary solution. However, this may ultimately be the best we can do because of the high complexity of USB and the lack of documentation for our Host Controller.
In the afternoon I got tired of Xinu's UART drivers being a mess and moved the code common to the 3 UART drivers into a different directory. This halved the size of the code unique to each UART driver.
In the morning I started to plan merging our code into Dr. Brylow's SVN repository. This will not be trivial for several reasons:
- We have been using git, not subversion.
- The directory corresponding to the SVN trunk actually corresponds to a subdirectory of our git repository, not the git repository itself.
- Partly because of the above, and also the way we integrated the code at the end of May, our history is not directly based off of the SVN repository.
To work around this I found out it's possible to rewrite the git history (with `git filter-branch') to include just the needed subdirectory. I then was able to generate a series of over 500 patches, each of which corresponds to a git changeset, that we should be able to add to the SVN repository somehow, following one or more "glue" patches to reach the initial patch from the SVN trunk. However, I haven't done this yet because Tyler and Farzeen currently have code not merged with the git master branch. (At the lunch today we talked a little bit about this problem with Dr. Brylow.)
In the afternoon I focused on fixing up some bugs that we had put off. First, the code to power on the USB controller seemingly went into an infinite loop if the USB controller was in fact already on, which can happen when starting a new kernel without power-cycling. To fix this, I ended up factoring out the BCM2835-specific functionality from the USB Host Controller Driver, then implementing BCM2835 power management a little more thoroughly in a separate file. However, power management on the BCM2835 uses the mailbox mechanism for communicating with the GPU, which the framebuffer driver does as well, so there is still an opportunity for sharing some of this code with the framebuffer driver.
I then fixed the "kexec" functionality, which is supposed to load a new kernel over the UART and execute it. It's essential to get this working if we are to use Xinu itself to implement a network bootloader for Xinu (except that would of course load the kernel over the network, not the UART). It turned out that I had just made a mistake when writing the assembly language loop that copies the new kernel to its final location. However, I also used this chance to factor the kexec code a bit and make it possible to implement on other platforms.
I also determined that the "reset" shell command cannot be reasonably implemented just by jumping to the ARM reset handler, or address 0. This is because certain kernel variables are not re-initialized to their default values. To fix this, I made "reset" instead set the watchdog timer for 1 millisecond, causing the hardware to actually reset itself shortly thereafter. However, to do this I had to add code to support the BCM2835 watchdog timer.
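A watchdog-based reset of this kind can be sketched as below. Everything hardware-specific in it is an assumption recalled from the Linux watchdog driver for this SoC rather than from documentation: the register offsets within the power-management block (0x1C and 0x24), the 0x5A000000 password, the full-reset configuration bits, and the tick rate of 65536 ticks per second. The function names are hypothetical, and the register block is passed in as a pointer so the sketch can be exercised against an ordinary array.

```c
#include <stdint.h>

/* ASSUMED register layout and constants; see the paragraph above. */
#define PM_RSTC_INDEX             (0x1Cu / 4)
#define PM_WDOG_INDEX             (0x24u / 4)
#define PM_PASSWORD               0x5A000000u
#define PM_WDOG_TIME_MASK         0x000FFFFFu
#define PM_RSTC_WRCFG_MASK        0x00000030u
#define PM_RSTC_WRCFG_FULL_RESET  0x00000020u
#define WDOG_TICKS_PER_SECOND     65536u

/* Convert milliseconds to watchdog ticks, clamped to the valid range. */
static uint32_t msec_to_wdog_ticks(uint32_t msec)
{
    uint64_t ticks = ((uint64_t)msec * WDOG_TICKS_PER_SECOND) / 1000u;

    if (ticks == 0)
        ticks = 1;
    if (ticks > PM_WDOG_TIME_MASK)
        ticks = PM_WDOG_TIME_MASK;
    return (uint32_t)ticks;
}

/* Arm the watchdog so that the hardware resets itself after msec
 * milliseconds.  pm_regs points to the power-management register block. */
static void watchdog_arm(volatile uint32_t *pm_regs, uint32_t msec)
{
    uint32_t rstc = pm_regs[PM_RSTC_INDEX];

    pm_regs[PM_WDOG_INDEX] = PM_PASSWORD | msec_to_wdog_ticks(msec);
    pm_regs[PM_RSTC_INDEX] = PM_PASSWORD | (rstc & ~PM_RSTC_WRCFG_MASK) |
                             PM_RSTC_WRCFG_FULL_RESET;
}
```

A "reset" command built on this would arm the watchdog for 1 millisecond and then simply wait, letting the hardware reset bring every kernel variable back to its initial state.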
Also in the afternoon, Dr. Brylow talked with Farzeen about the turtle graphics program she is working on, mainly the possible uses for it and advantages compared to other tools/platforms that may be available. I also showed Dr. Brylow some of the work I had done on USB, and we agreed that soon I need to begin summarizing the work, e.g. on the Embedded Xinu wiki, and not long from now in the final paper and poster for the REU program.
Today I focused on adding a DHCP client to Embedded Xinu. DHCP (Dynamic Host Configuration Protocol) support is needed to implement the network bootloader. The overall workflow of the network bootloader will be as follows:
- Embedded Xinu, acting as the network bootloader, boots directly from the SD card, then opens the Ethernet device.
- The DHCP client runs on the network device and returns the assigned IP address, subnet mask, gateway, boot file, and TFTP server IP address. This requires that a DHCP and TFTP server be set up on the network.
- The TFTP client (yet to be added) requests the boot file from the TFTP server. The intention is that this be an Embedded Xinu kernel.
- The boot file is loaded into memory, a stub is jumped to that copies it to address 0x8000, and the new kernel is entered.
The purpose of this is to make it possible to set up a pool of Embedded Xinu backends, similar to what Dr. Brylow has set up with the MIPS routers, that can be used to test students' modified versions of Embedded Xinu on demand. The difference is that the MIPS routers' firmware supports DHCP and TFTP, so the firmware (on the device) could act as the bootloader for Embedded Xinu (booted from the network), whereas with the Raspberry Pi we are going to make Embedded Xinu (booted from the SD card) act as a bootloader for Embedded Xinu (booted from the network).
Step (1) is essentially complete because the Ethernet driver is working to the extent that we've tested it, and step (4) is essentially complete as well because of the "kexec" functionality I've implemented. Therefore, today I worked on step (2).
Dr. Brylow had stated that the research version of Embedded Xinu (in the SVN repo) had a DHCP client, but it turns out that it is only present in an older version of Embedded Xinu and the code is not source-compatible with the current Embedded Xinu. This old DHCP client also contains a number of bugs and/or flaws, such as:
- It did not re-send DHCPDISCOVER packets if no DHCPOFFER was received in a certain amount of time.
- It provided no way to access the information needed by the bootloader (bootfile name and TFTP server IP address).
- It duplicated code among the different DHCP request and reply handlers, thereby making it more difficult to fix other problems.
Furthermore, the DHCP client needs to be called from either netUp() or xsh_netup() to make it useful, but this code had been changed in the current version of Embedded Xinu.
Therefore, I spent today rewriting the DHCP client to be more robust, redesigning the interfaces to provide the needed functionality, and integrating it into the current version of Embedded Xinu. To test it I ran "dnsmasq" on my laptop and just had it assign the Raspberry Pi an address in the [192.168.0.2, 192.168.0.254] range. The code is currently functional but could use a bit more review, especially a strict comparison with the behavior described by RFC 2131.
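Part of what a DHCP client must do to hand this information to the bootloader is parse the variable-length options area of a reply. The following is a minimal sketch of such a parser, assuming the standard option encoding (a one-byte code, a one-byte length, then the data, with code 0 as padding and code 255 as the end marker) and the standard option codes for subnet mask (1), router (3), and boot file name (67). The struct and function names are hypothetical and do not reflect the interfaces of the client described above; a real client would also read fixed header fields of the reply, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DHCP_OPT_PAD       0
#define DHCP_OPT_SUBNET    1
#define DHCP_OPT_ROUTER    3
#define DHCP_OPT_BOOTFILE  67
#define DHCP_OPT_END       255

/* Values extracted from the options (hypothetical structure). */
struct dhcp_result {
    uint8_t mask[4];
    uint8_t gateway[4];
    char bootfile[128];
};

/* Parse the options area opts[0..len-1].  Returns 0 on success or -1 if the
 * options are truncated or the end marker is missing. */
static int dhcp_parse_options(const uint8_t *opts, size_t len,
                              struct dhcp_result *res)
{
    size_t i = 0;

    while (i < len) {
        uint8_t code = opts[i++];
        uint8_t optlen;
        const uint8_t *data;

        if (code == DHCP_OPT_PAD)
            continue;
        if (code == DHCP_OPT_END)
            return 0;
        if (i >= len)
            return -1;              /* length byte missing */
        optlen = opts[i++];
        if (optlen > len - i)
            return -1;              /* data runs past the end */
        data = &opts[i];

        switch (code) {
        case DHCP_OPT_SUBNET:
            if (optlen == 4)
                memcpy(res->mask, data, 4);
            break;
        case DHCP_OPT_ROUTER:
            if (optlen >= 4)
                memcpy(res->gateway, data, 4);  /* first router only */
            break;
        case DHCP_OPT_BOOTFILE: {
            size_t n = optlen;

            if (n > sizeof(res->bootfile) - 1)
                n = sizeof(res->bootfile) - 1;
            memcpy(res->bootfile, data, n);
            res->bootfile[n] = '\0';
            break;
        }
        default:
            break;                  /* ignore options we do not need */
        }
        i += optlen;
    }
    return -1;                      /* no end marker seen */
}
```

Keeping all option handling in one loop like this also avoids duplicating parsing code across the handlers for different reply types.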
The next step, other than continuing to improve the DHCP client and various other code, is to add TFTP client support so that the boot file can be downloaded.
Week 8 (July 15-19)
Over the weekend I finished the initial implementation of all functionality required by the network bootloader. Like the DHCP client, the TFTP client that I had to work with was written for an older version of Xinu and was not well written; therefore, I ended up completely re-writing it. Throughout the process I reviewed various platform-independent code, including the UDP layer, and improved the documentation and fixed bugs. After doing this, I then re-organized the "kexec" functionality so that loading a kernel from the network can be handled better. Finally, I successfully tested the bootloader by running a DHCP and TFTP server on a computer connected to the Raspberry Pi. The exact booting process is essentially as described in last Friday's entry, except for convenience I was starting up the bootloader process interactively via the "kexec" shell command I had written (and re-written).
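For reference, the first packet a TFTP client sends is a read request (RRQ), whose format is simple enough to sketch in a few lines: a 2-byte opcode of 1 in network byte order, the file name, a zero byte, the transfer mode, and another zero byte. The sketch below builds such a packet for binary ("octet") mode. The function name and the file name in the usage note are hypothetical, and this is not the code from the rewritten TFTP client.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TFTP_OPCODE_RRQ 1

/* Build a TFTP read request for filename in buf.  Returns the length of the
 * packet in bytes, or 0 if it does not fit in bufsize bytes. */
static size_t tftp_build_rrq(uint8_t *buf, size_t bufsize,
                             const char *filename)
{
    static const char mode[] = "octet";     /* binary transfer mode */
    size_t namelen = strlen(filename);
    /* opcode + name + NUL + mode including its NUL */
    size_t total = 2 + namelen + 1 + sizeof(mode);

    if (total > bufsize)
        return 0;

    buf[0] = 0;                             /* opcode, high byte */
    buf[1] = TFTP_OPCODE_RRQ;               /* opcode, low byte */
    memcpy(&buf[2], filename, namelen + 1); /* name and its NUL */
    memcpy(&buf[2 + namelen + 1], mode, sizeof(mode));
    return total;
}
```

For a boot file named "xinu.boot", the resulting packet is 18 bytes long and would be sent over UDP to the TFTP server address obtained from DHCP.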
Today in the morning I reviewed various source files involved in network bootloading, then attempted to use Doxygen to build the automatically-generated documentation for Embedded Xinu. In doing this I fixed and improved the Doxygen-compatible comments in various files, and I am also considering using Doxygen groups so that one can, for example, easily browse to the documentation for all external functions included in Xinu's C library, the UART driver, or any other module, rather than browsing file-by-file.
Farzeen finally merged her work in the "graphics" branch into the "master" branch of the repository. For the most part this was just adding independent code, although there is one problem we have yet to fix where the shell's output does not go to the console due to the shell being created on the framebuffer device instead.
Finally, in the afternoon Dr. Brylow talked to us about possibly submitting a paper for the WESE (Workshop on Embedded and Cyber-Physical Systems Education), since they apparently really want papers and are extending their deadline. We agreed that we would try to prepare a paper by the deadline next week, although it may be difficult to write given that we're just starting to finish up programming various components and have not yet written any educational modules or done any classroom trials. Therefore the paper would primarily document the low-level details needed to write a simple operating system for the Raspberry Pi for educational purposes, along with our future plans for rolling out this platform for Operating Systems education as an alternative to the MIPS-based routers, and possibly Farzeen's plans to target much younger students with the turtle graphics program.
Today I primarily worked on the paper that we plan to submit to WESE. I also worked a little bit on continuing to improve the Doxygen-generated documentation from Embedded Xinu's source code, in particular grouping related functions and better documenting the behavior of some functions. Furthermore, I fixed an apparently long-standing bug in kill() where the thread table would be placed in an inconsistent state after killing a sleeping thread.
Today I continued working on the paper that we plan to submit to WESE. In the afternoon Tyler, Farzeen, and I met with Dr. Brylow to discuss the paper. He thought that especially after the sections I had written about various aspects such as interrupts, USB, Ethernet, and the system timer, we're on track to submit the paper. However it of course still needs more work, including the introduction, conclusion, related work, etc.
I also spent some time today going through the USB core and hub drivers and trying to identify any code that could be removed or simplified in order to ensure we're meeting our goal of having simple, easy-to-understand code. I eventually was able to make some changes, although it's difficult to strip the driver down more than it already is without removing required functionality. We could potentially save hundreds of lines of code by removing the 'usbinfo' shell command, debugging statements, and USB string descriptor support, although I think that would be a bad idea since those features are very useful when working with USB.
Since I was mostly waiting for Tyler and Farzeen to finish their sections of the paper and for Dr. Brylow to read through the current draft, I spent most of today working on the USB code a bit more. I moved all the code regarding USB strings and printing various information, which turned out to be over 600 lines, out of the USB core and into a separate file that can be omitted entirely if a special "USB_EMBEDDED" mode is enabled. Therefore, usbcore.c now contains only the code essential for actual operation of the USB core driver, which might make it a bit more manageable.
I also tried to fix the longstanding lack of support for properly detaching devices. In the current version, the unbind_device callback can be called at any time when a device is detached, but there is no easy way to actually safely terminate all ongoing USB transfer requests. Today I implemented a reference-counting mechanism such that when a device is detached, any new transfer requests to the device will be refused, and unbind_device will not be called until all currently executing USB transfer requests to the relevant device are owned by the device driver. This might seem like a nonessential point, but you certainly wouldn't want your operating system to crash when you unplugged your keyboard.
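The reference-counting idea described above can be sketched roughly as follows. This is only an illustration and not the actual Embedded Xinu code: the structure `usb_dev_sketch`, its fields, and the `sketch_*` functions are hypothetical names invented for this example, and a real driver would additionally need to disable interrupts (or otherwise synchronize) around these updates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record (not the real Xinu structure). */
struct usb_dev_sketch {
    bool attached;         /* false once the device has been detached */
    unsigned int pending;  /* transfers currently owned by the host controller */
    bool unbound;          /* true once the unbind callback has been run */
    void (*unbind_device)(struct usb_dev_sketch *dev); /* may be NULL */
};

/* Run the unbind callback only when the device is detached AND no transfer
 * is still in flight, and never more than once. */
void sketch_try_unbind(struct usb_dev_sketch *dev)
{
    if (!dev->attached && dev->pending == 0 && !dev->unbound) {
        dev->unbound = true;
        if (dev->unbind_device != NULL)
            dev->unbind_device(dev);
    }
}

/* Submit a transfer request.  Returns 0 on success, or -1 if the request
 * is refused because the device has already been detached. */
int sketch_submit_xfer(struct usb_dev_sketch *dev)
{
    if (!dev->attached)
        return -1;
    dev->pending++;
    return 0;
}

/* Called when a transfer completes (successfully or with an error), i.e.
 * when ownership of the request returns to the device driver. */
void sketch_complete_xfer(struct usb_dev_sketch *dev)
{
    if (dev->pending > 0)
        dev->pending--;
    sketch_try_unbind(dev);
}

/* Called when the hub driver notices that the device was detached. */
void sketch_detach(struct usb_dev_sketch *dev)
{
    dev->attached = false;
    sketch_try_unbind(dev);
}
```

With this arrangement, detaching a device that still has two transfers in flight defers the unbind callback until the second of those transfers completes, while any new submission in between is refused.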
Based on the above, I was able to fix the hub driver so that, to my understanding, it handles detachment of a hub safely, although the only device I physically tested it on at this time was an Apple keyboard. I have not yet implemented correct detachment of the SMSC LAN9512, since we don't have a device to test it with, and I also wanted to implement etherClose() at the same time, which I could not do at this point because it basically will require aborting in-progress USB transfers when the device has not actually been detached. (This is because nonperiodic USB transfers are not guaranteed to terminate at any given time, except if the device was detached, in which case my understanding is that you can reasonably expect the host controller to detect an error condition and issue an interrupt regarding the transfer.)
Today I continued revising our paper and added an abstract section. I also made some minor changes to the code, such as moving SCC-specific code out of the platform-independent code and cleaning up the DWC host controller driver and some of the framebuffer driver code slightly. In the afternoon, Dr. Brylow gave us some feedback about the paper, and among many other things he recommended we make some sort of block diagram of how the relevant Raspberry Pi hardware appears to the software running on the ARM. Tyler also had some questions about USB since he found that interrupt transfers were not working as expected. I was not too surprised by this as there were some major issues getting them to work properly from the hub inside the Apple keyboard (which attaches itself at full-speed), and as mentioned in a previous entry, periodic split transactions apparently need special handling.
Week 9 (July 22 - July 26)
Over the weekend I continued to work on our paper, including adding a diagram. In the morning I reviewed our paper a bit, then started working on the poster that's due on Wednesday. In the late morning Farzeen, Tyler, and I met with Dr. Brylow about the paper. After continuing to work on the paper in the afternoon and evening, we were able to improve various sections and finally submit the paper (since today was the deadline).
Today I continued working on the poster due tomorrow. Also, following a short discussion that Dr. Brylow had with us yesterday about the UARTs available on the Raspberry Pi and seeing that none of us really understood the "Mini UART" that is apparently an alternative to the PL011 UART we are currently using, I attempted to switch Xinu over to using the Mini UART. It turns out that this UART is partially compatible with NS16550 devices, which Xinu already has a driver for (used on the MIPS platforms). However, it still took a few changes to get the Mini UART working, and given that the actual documentation for the "Mini UART" (in the BCM2835 ARM Peripherals document) is very poor and the device is not fully compatible with any standard device, we are considering continuing to use the PL011 UART rather than switching to the Mini UART, even though the PL011 UART is quite different from the NS16550 and requires a separate driver.
In the morning I added some final touches to my poster. I then spent a little time investigating standards-compliance in the Embedded Xinu code. In the past, Dr. Brylow and others have advertised Embedded Xinu as "ANSI C-compliant"; however it turns out that this is very much not the case, regardless of whether ANSI C means C90, C99, or C11, for many reasons (some in new code we've added this summer, but also many in the existing code). In my opinion this is very much to be expected because the C standard(s) are in some cases inflexible and unintuitive, and do not take into account some of the issues that show up when implementing an operating system; as a result, the codebases of real operating systems like the BSDs and Linux do not conform to any particular standard but rather make use of GCC extensions, when useful, that typically are also being picked up by other compilers and in some cases new versions of the C standard. Furthermore, Dr. Brylow stated that Embedded Xinu was originally called ANSI C-compliant because it did not use K&R-style C like the old Xinu, so to my understanding this was primarily a mis-application of the phrase.
For the rest of the day I mostly attempted to import our work into the Xinu Subversion repository. This is non-trivial for several reasons: our work was done in git and is not directly based on the SVN trunk (basically we copied all the files from it at some point rather than starting from it, so there's no commit in our repository that *is* the SVN trunk); it contains non-linear history (for example, I worked on a separate "usb" branch for a while, and Farzeen worked on a separate "graphics" branch for a while, but these eventually got merged into master); and the SVN trunk really corresponds to a subdirectory of our repository. To partially solve these problems I used `git filter-branch' (with its subdirectory filter) to extract just the needed subdirectory of our repository, then `git format-patch' to generate a series of linear patches that approximates our commit history.
I then created a branch in the SVN repository and applied the above patches. This unfortunately did not work quite as intended, however. First of all, even though I was using a separate branch that I thought would be essentially a temporary workspace before I merged my changes into trunk, it turns out that an email was sent out to the Xinu mailing list (which I was not on) every time I committed to the branch I created, which was problematic because I had over 600 commits to make. Secondly, some patches did not apply correctly due to SVN unhelpfully changing the contents of files by itself (via the magical SVN tags "feature"). After Dr. Brylow redirected all the emails to me I was able to re-run the script and make all the commits; however it still did not quite work out as intended and I'll need to try again sometime after figuring out how to properly deal with the SVN tags, and ideally preserve original authors and dates (Farzeen had quite a few commits) rather than having everything committed by me on the same day.
This morning I spent a bit more time looking into the SVN problem. It turns out that the patch sequence I was attempting to turn into SVN commits does not even apply cleanly when SVN is taken completely out of the picture. This is apparently caused by our git history being nonlinear, so `git format-patch' is not able to format a linear sequence of patches that correctly represents our history. This is a major problem for importing our work into SVN, since SVN apparently only supports linear history. As an alternative, I prepared a series of 17 linear commits, generally grouping changes by directory, that generates our git master branch from SVN trunk and summarizes the work we did over the summer (mainly detailing platform-independent code we changed, since any Raspberry Pi-specific files are completely new and don't really need any detailed explanation in the commit messages). This method also has the advantage that the number of "unimportant" commits is greatly decreased, so it's much easier to get an overview of what we did over the summer. However, this method throws away the original history, and due to inter-directory dependencies some of the intermediate revisions are probably not in a working state.
At the same time, there is a discussion on the Xinu mailing list about moving the repository to git, and most people seemed in favor of it. If this does indeed happen then it will be easy to use our original history if we choose to go that route, and it will be easier to make future changes.
Also this afternoon I started looking into SD card support on the Raspberry Pi and am reading the "SD Host Controller Simplified Specification Version 3.00". It seems it may be much easier than USB to get working because the SD Host Controller Interface (including register layout) is standardized. However that still doesn't say much since the standard is fairly complicated, and there could be hardware-specific quirks.
Today I mainly continued to work on writing an SD card driver. Perhaps not surprisingly, it turns out this is more complicated than I had hoped. One might hope that when an SD card is inserted, an interrupt would be received to inform software of this, and then software could immediately begin reading from and writing to the SD card. Neither of those is true, however. Although the SD Host Controller specification defines "card removed" and "card inserted" interrupts, the Arasan implementation used in the Raspberry Pi does not generate these interrupts correctly. (My understanding is that, strictly speaking, they're lying when they say this hardware complies with the SD Host Controller standard v3.00, for this and other reasons such as not bothering to implement the standard Capabilities registers. However I only have access to the "Simplified" specification because my "company" is not a member of the SD Association, so perhaps the full specification is different.) Therefore, software must use workarounds to detect whether a card is present or not, and I need to figure out the best way to do this. Secondly, SD cards are *not* ready to be read from or written to until several actions are taken by the software. These actions include:
- Setting the SD clock frequency. This, however, is programmed using one of several methods and also relies on knowing the frequency of the clock being fed to the SD Host Controller itself, which is implementation-dependent and is not documented by Broadcom. Therefore, as usual I had to find the appropriate magic number in the Linux sources. As a further complication, the SD clock frequency apparently needs to be set to 400 kHz initially, but then increased to a card-dependent value later.
- Enabling power to the SD card. Not only is this not done automatically, but software must enable power in an implementation-dependent way (which is not documented by Broadcom, but I'm thinking it's basically the same as turning on the USB controller but uses a different magic number), then tell the host controller that power was turned on, which requires knowing the voltage being supplied, which again is implementation-dependent. At this point I just guessed that it's 3.3V.
- After the above steps, the card is ready to receive commands, but not data commands. Instead, software must perform various complicated configuration actions and work through multiple separate control flows to handle various special cases, such as cards complying with different versions of the SD standard, or with the MMC standard(s). Furthermore, SD actually provides the ability to have multiple cards on the same bus, so this adds another level of complexity to the configuration, even on the Raspberry Pi where there can be at most 1 card.
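As a small illustration of the clock-setting step, the divisor calculation might look something like the sketch below. It assumes the "10-bit divided clock" method, in which (to my reading of the Simplified specification) the SD clock equals the base clock divided by 2N for a divisor N between 1 and 1023, with N = 0 meaning the base clock is passed through undivided. The function names are made up for this example, and the base clock must be supplied by the caller since the real value is the implementation-dependent "magic number" mentioned above.

```c
#include <stdint.h>

/* Largest divisor representable in the assumed 10-bit divided clock mode. */
#define SKETCH_SD_MAX_DIVISOR 1023u

/* Return the smallest divisor N such that the resulting SD clock does not
 * exceed target_hz, assuming SD clock = base_hz / (2 * N) for N >= 1 and
 * SD clock = base_hz for N == 0.  The result is clamped to the maximum
 * divisor if the target cannot be reached. */
uint32_t sketch_sd_clock_divisor(uint32_t base_hz, uint32_t target_hz)
{
    uint64_t n;

    if (target_hz == 0)
        return SKETCH_SD_MAX_DIVISOR;   /* slowest clock available */
    if (base_hz <= target_hz)
        return 0;                       /* base clock is already slow enough */

    /* N = ceil(base_hz / (2 * target_hz)), using 64-bit math to avoid overflow */
    n = ((uint64_t)base_hz + 2ull * target_hz - 1) / (2ull * target_hz);
    if (n > SKETCH_SD_MAX_DIVISOR)
        n = SKETCH_SD_MAX_DIVISOR;
    return (uint32_t)n;
}

/* Return the SD clock frequency actually produced by a given divisor. */
uint32_t sketch_sd_clock_hz(uint32_t base_hz, uint32_t divisor)
{
    return (divisor == 0) ? base_hz : base_hz / (2u * divisor);
}
```

For example, with a purely hypothetical base clock of 100 MHz, the initial 400 kHz identification clock would correspond to a divisor of 125. The driver would compute a second, smaller divisor later, once the card has reported the maximum frequency it supports.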
Week 10 (July 29 - August 2)
Today I primarily worked on preparing slides for my final talk on Thursday or Friday. Although Farzeen, Tyler, and I wanted to do a combined talk to present our information in a more logical way and have time to demonstrate Embedded Xinu running on the Raspberry Pi, we're required to all give talks individually. Therefore I'm planning to give an extended and updated version of the talk I gave at the end of June. But as was a problem for both the paper and previous talk, summarizing our work is difficult because most of our work to this point has been technical and very complicated, which is generally not what people are interested in hearing, even though this work is a necessary prerequisite to the primary end goal of having a laboratory environment and educational modules for teaching operating systems, embedded systems, networking, and other computer science concepts with Embedded Xinu on the Raspberry Pi.
In the morning I continued to prepare for my "final" talk. In the afternoon I presented my poster at the poster session.
Today I continued to prepare for my "final" talk. I also began writing documentation for the Raspberry Pi port of Embedded Xinu on the Embedded Xinu wiki. The challenge with writing this documentation is making it complement the source code, rather than duplicate the information contained in it (including in comments).
In the morning I gave my "final" talk. After the other talks were over at 2 p.m. I worked a little more on the Embedded Xinu Wiki.
Today was the final day of the REU program. The remaining talks were in the morning. I plan to continue working on "XinuPi" during the school year, especially the documentation and finishing any remaining work on drivers such as SD card, graphics, keyboard, and sound.