TITLE :
DIY A1 PCB WITH RASPBERRY PI AS CONTROLLER
YOCTO-IMAGE TURBO-MODE KERNEL DEVELOPERS GUIDELINE
DESCRIPTION :
LIMITATIONS AND BOTTLENECKS regarding TURBO-MODE for Raspberry Pi based controllers - technical
YOCTO-IMAGE DEVELOPERS guideline
CONTENTS :
About Raspberry Pi
Transmission performance issues
Reception performance issues
RAW TEXT :
About the controller
The Raspberry Pi (Model B), referred to as RPi or RasPi in the rest of the text,
is a low-cost computer designed for educational purposes, developed by the
Raspberry Pi Foundation charity. Its main component is a BCM2835 system-on-chip
by Broadcom, which features an ARM1176JZF-S processor running at
700 MHz (1 GHz when boosted) and a VideoCore IV GPU, capable of high-definition
video resolutions and with support for OpenGL ES 2.0. It also mounts a LAN9512
USB/Ethernet controller by SMSC, with 100 Mb/s Ethernet capabilities.
The board provides an Ethernet RJ-45 socket, two USB 2.0 high-speed type-A ports,
HDMI and composite video outputs, a stereo headphone socket, and an
SD-HC card slot. Most of the BCM2835 signals (GPIO, UART, I2C, SPI,
PWM, display, camera, and so on) are exposed by a set of pin headers and
camera interface connectors. The model used for the benchmarks mounts
256 MiB of RAM.
The operating system is usually some flavor of Linux; at least half a dozen
distributions are available, with Raspbian, a Debian-based Linux distribution
with specific support for the Raspberry Pi, being the most widely used. The
platform is controlled through an SSH connection, which has a negligible impact
on performance. No other user software or services are running, except the SSH
connection and the benchmark executables.
What follows are some benchmarks for the RPi that give more insight into Rx/Tx issues, that is,
the problems which might occur while the RPi talks to the DIY boards and A1 chips.
Instead of per-thread figures, as in the R2P_GW benchmarks, CPU usages were
collected as aggregate values from /proc/stat, because the network stack runs
at the system and interrupt levels.
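The measurement method above can be sketched in Python. This is a minimal illustration of reading the aggregate counters from /proc/stat (field layout per proc(5)), not the benchmark's actual tooling:

```python
# Minimal sketch: sample aggregate CPU usage from /proc/stat and split
# it into user, system, softirq, idle, etc. shares over an interval.
# The counters are cumulative jiffies, so two samples are needed.

def read_cpu_jiffies(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    fields = stat_text.splitlines()[0].split()
    names = ("user", "nice", "system", "idle",
             "iowait", "irq", "softirq", "steal")
    return dict(zip(names, (int(v) for v in fields[1:9])))

def usage_shares(before, after):
    """Return each counter's share of the elapsed interval, in percent."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values()) or 1
    return {k: 100.0 * v / total for k, v in delta.items()}
```

In practice the text comes from reading /proc/stat once before and once after the benchmark run; the softirq share corresponds to the "(software) interrupts" figures quoted below.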
Transmission performance issues
Similarly to R2P_GW, the RasPi generates /benchmark/output messages at
the maximum achievable speed, without introducing forced delays. Timeouts
are disabled, and a single message actually resides in memory, being streamed
repeatedly by the message loop.
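The setup amounts to a tight send loop. The following is a hedged Python illustration over plain UDP, not the actual R2P message loop, and the destination address and port are hypothetical:

```python
import socket
import time

def stream_messages(dest, payload_size, duration=1.0):
    """Send one preallocated message in a tight loop, mirroring the
    'single message resident in memory' benchmark setup."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size       # built once, streamed repeatedly
    sent = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        sock.sendto(payload, dest)      # no forced delays, no timeouts
        sent += 1
    sock.close()
    return sent / duration              # achieved msg/s
```

The returned rate is what the throughput figures below refer to: how many such messages per second the platform sustains at a given payload size.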
The platform can saturate the host receiver at 20000 msg/s when
the message size is not greater than 200 B. The CPU is used for less than 50%
of the time, mainly by system processes (around 25%) and the topic handler (around 15%).
As the message size increases from 8 B to 200 B, the impact of (software)
interrupt requests grows, but stays below 10%.
Between 200 B and 500 B per message there is a sudden increase of the
CPU usage, saturated by interrupts and system processes, which limits the
throughput to 13000 msg/s. The topic handler usage stays around 15%,
which means that the Linux network stack has a substantial effect in these
circumstances.
With messages larger than 500 B there are no considerable changes in the CPU
usage. At 10000 msg/s interrupts have a share of 40% and system calls of
55%, while the topic handler uses the CPU for less than 5%. The bandwidth
gets close to 100 Mb/s, but it is still not reached at 10000 msg/s; indeed, the
idle time stays around 1% without growing.
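The bandwidth figure can be sanity-checked with a one-line calculation; per-message header overhead (not counted here) accounts for the remaining gap to the 100 Mb/s link limit:

```python
def payload_rate_mbps(msgs_per_s, msg_bytes):
    """Payload throughput in Mb/s (1 Mb = 10**6 bits), headers excluded."""
    return msgs_per_s * msg_bytes * 8 / 1e6

# 10000 msg/s of 1000 B messages already carry 80 Mb/s of payload;
# protocol overhead pushes the wire rate toward, but not past, the
# 100 Mb/s link limit.
print(payload_rate_mbps(10000, 1000))   # 80.0
```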
Reception performance issues
The reception performance was measured by streaming messages at 14000 msg/s,
the maximum achievable by the host computer. As for R2P_GW, the reception
was first evaluated by buffering each new incoming message, and then by
processing the incoming message stream on the fly, skipping its contents.
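The two reception modes can be sketched as follows; this is a hedged Python illustration of the buffered versus on-the-fly distinction, not the benchmark's actual code:

```python
import socket

def receive(sock, count, buffer_messages=True):
    """Drain `count` datagrams: either keep a copy of each message
    (buffered mode) or read and immediately discard the contents
    (on-the-fly mode)."""
    buffered = [] if buffer_messages else None
    for _ in range(count):
        data = sock.recv(2048)
        if buffer_messages:
            buffered.append(data)   # each new incoming message is kept
        # else: contents are skipped; only the read itself is paid for
    return buffered
```

In the skipping mode the per-message cost reduces to the read itself, which matches the lower topic-handler share reported for on-the-fly reception below.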
Up to 100 B per message, the platform can receive all of the messages with low effort.
The CPU is idle for more than 40% of the time, with the topic handler using less
than 20% of the CPU time, and the system calls less than 40%. There is a
strange decrease in the effect of system calls at a message size of 50 B, probably
caused by some kernel optimization. The throughput stays at the maximum.
Between 100 B and 500 B, the CPU usage of interrupts increases over 40%, and
the CPU becomes saturated. The topic handler and system calls do not show
significant changes in their impact. After 200 B per message, the throughput
starts decreasing, but is still above 13000 msg/s.
With a message size beyond 500 B, the bandwidth is completely used. Software
interrupts use the CPU at 10%, while the effect of system calls keeps around
35%, and that of the topic handler decreases as low as 10%. The idle time
goes back to almost 50%.
The performance results of on-the-fly reception show that below 100 B per message,
the platform can receive all of the messages with low effort.
The CPU is idle for 60% of the time; the rest is used primarily by system calls
(less than 30%) and the topic handler (around 10%), with the remainder spent in
(software) interrupts. Again, there is a strange decrease in system-call usage
at a size of 50 B.
Between 100 B and 500 B per message, the CPU usage of system calls
and interrupts increases up to 45% and 35% respectively, while the topic
handler stays slightly above 10%.
With a message size greater than 500 B, the bandwidth reaches the 100 Mb/s
limit, and the CPU load decreases. Software interrupts are steadily below 10%
as well as the topic handler, which keeps decreasing. System calls go down to
30%, and the idle time almost reaches 60% again.
CONCLUSION :
This is important when programming the kernel for overclocking the boards and for mining software on the RPi,
since currently the more clock and power you bring to the boards (regardless of cooling), the more errors you get;
this might therefore be a good guideline for future firmware and kernel improvements, listing the things to keep in mind.
In other words, these are RPi LIMITATIONS, not A1 chip limitations. In order to boost the hashing speed of desk(s) and rig(s) based on this chip (Concraft A1) and this PCB design (DIY 2xA1 board), this problem needs to be circumvented or exploited; it is a software issue, but of course also dependent on the PCB fabric design.
The way the RPi communicates with the daisy-chained chips and boards populated with A1 chips is the principal bottleneck on the way to TURBO-mode deployment. Cooling, in this case, is only a technical limitation in the time domain (long-term non-stop operation of desk(s) and rig(s)) and should not be an issue for a short-time speed-trial boost as a proof of concept, although currently it mostly is.
THIS IS NOT A SOLUTION, BUT A GUIDELINE FOR SOMEONE TO FIND A SOLUTION FOR THIS.