Raspberry optimized scratch, performance beta4

For a few weeks now, a new beta version of scratch for the RPi has been available, announced on raspberrypi.org.

The work done by Tim Rowledge is in the area of performance. The first impression is ‘it is faster’, both in editing and at runtime. So it is time to measure some performance numbers.

I measured timings for three systems:

  • RPi-1.4-scratch is the current scratch/squeak as shipped on Raspbian, clocked at 1GHz.
  • RPi-1.4-beta is the current version of the beta scratch (2014-06-13).
  • win-1.4-scratch: To compare with a more powerful system, I ran some of the tests on a laptop running scratch 1.4 from scratch.mit.edu on Windows 7, with a 4 core processor at 2.2GHz.

Update: jamesh asked me to repeat the tests with the ‘HW cursor implementation for X’, the xf86-video-fbturbo video driver. It sounds complicated, but installation was straightforward. The tests executed with this modified X system are marked with ‘X’:

  • RPi-1.4-scratch-X: modified X running RPi-1.4-scratch
  • RPi-1.4-beta-X: modified X running RPi-1.4-beta

Results

In loops and calculations, the new scratch version on the RPi even outperforms my Windows machine running legacy scratch 1.4 from mit.edu. On the Pi, it needs only 50% of the execution time of the current Pi scratch. This is impressively good.
For the other tests, execution time is down to roughly 80 to 85%.

One exceptional improvement shows in those cases where variables are displayed on stage. This slows down the current scratch, but the beta with the modified X executes about 10 times faster (move2_presentation).
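The percentages above can be rechecked from the per-test timings listed further down. Here is a minimal sketch in Python that uses only the numbers reported in this post; the dictionary layout and the helper name `relative_time` are my own choices for illustration.

```python
# Timings in seconds, copied from the result sections of this post.
# Each entry: test name -> (RPi-1.4-scratch, RPi-1.4-beta)
timings = {
    "calculations": (41.3, 20.7),
    "rotate": (10.6, 8.5),
    "rotate_say": (22.5, 14.0),
    "pingpong_sensor_remote": (77.1, 67.2),
    "move2_presentation": (159.4, 17.6),
}


def relative_time(legacy, beta):
    """Return the beta execution time as a fraction of the legacy time."""
    return beta / legacy


for name, (legacy, beta) in timings.items():
    ratio = relative_time(legacy, beta)
    # A ratio of 0.5 means the beta needs half the time, i.e. it is 2x faster.
    print("%-24s %5.1f%% of legacy time, %5.1fx faster"
          % (name, ratio * 100.0, 1.0 / ratio))
```

The same helper can be fed the timings for the modified X driver to compare them with the stock Raspbian numbers.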

Especially for the graphic operations, improvements are noticeable.

performance_summary_2

In the chart, the results for the stock Raspbian system are shown in blue and dark yellow, and the results for the modified driver in light blue and light yellow.

The modified driver in X results in better performance: execution times drop to about 0.8 times the stock values in most cases. The quite simple rotate and move examples do not benefit much, but whenever it gets crowded on stage the difference is noticeable.

For scratch remote sensor connections, the improvements are less impressive, and I assume they come from the overall performance optimizations. Still, the tests show that remote connections for broadcasts or variables need about 40 ms for sending or receiving, which is not bad. The large improvement in pingpong_remote comes from comparing presentation mode operations: here, RPi-1.4-scratch is much slower in presentation mode. Compared with full-stage mode, the result is in the 80% range of the other results.

The scratch projects are in performance.zip.
For the scratchClient, see download page.

Graphic system (performance_rotate.sb)

Rotating sprites needs quite a lot of computation power: the sprite has to be rotated by an angle and the graphics redisplayed. To avoid possible caching of calculated sprite graphics, I have chosen to apply extra ‘one degree’ rotations in between.

RPi-1.4-scratch  10.6 sec
RPi-1.4-beta 8.5 sec

Graphic system 2 (performance_rotate_say.sb)

Displaying the ‘say’-bubble is a challenge. The system needs to look for the solid icon inside the alpha background, and adjust the bubble accordingly.

RPi-1.4-scratch  22.5 sec
RPi-1.4-beta 14.0 sec. This is impressively good.

win-1.4-scratch 6.0 sec

Graphic system move, move2

I usually present the move sample to the kids in school as a scratch antipattern: while true; goto x,y; inc x; inc y; endwhile; This works, but the movement speed is limited by CPU usage. The second sample moves two sprites with variables displayed on stage. This slows down execution drastically in RPi-1.4-scratch. In this area, the beta is in a different class.

move2

The presentation mode timings are

RPi-1.4-scratch 159.4 sec
RPi-1.4-scratch-X 119.2 sec  using the modified X driver

RPi-1.4-beta 17.6 sec.
RPi-1.4-beta-X 14.3 sec  using the modified X driver

 

win-1.4-scratch 12.1 sec

Scratch Sensor Network performance (performance_pingpong_remote.sb)

There are many assumptions about the timing of remote access in scratch. So I took the opportunity to measure some values.

It is not possible to measure time from a broadcast in scratch till it arrives in a remote system. It would need software ‘instrumentation’ inside scratch. But it is possible to send out a broadcast, and wait for a response coming back, using a remote scratchClient.

For the test, a scratch script sends the broadcast “ping”, and my scratchClient software responds with “pong”. In scratch, this is repeated 200 times and the total time is recorded.
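To illustrate what such a responder has to do, here is a minimal sketch in Python 3. It is not the scratchClient implementation. It relies only on the Scratch 1.4 remote sensors protocol (TCP port 42001, each message prefixed with a 4-byte big-endian length). The function names `encode_message`, `decode_messages`, `reply_for` and `run` are hypothetical, chosen for this example.

```python
import socket
import struct


def encode_message(text):
    """Frame a message: 4-byte big-endian length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_messages(buffer):
    """Split a byte buffer into complete messages; return (messages, rest)."""
    messages = []
    while len(buffer) >= 4:
        (size,) = struct.unpack(">I", buffer[:4])
        if len(buffer) < 4 + size:
            break  # incomplete message, wait for more data
        messages.append(buffer[4:4 + size].decode("utf-8"))
        buffer = buffer[4 + size:]
    return messages, buffer


def reply_for(message):
    """Return the framed answer for a message, or None if no answer is due."""
    if message.strip() == 'broadcast "ping"':
        return encode_message('broadcast "pong"')
    return None


def run(host="127.0.0.1", port=42001):
    """Connect to a scratch with remote sensors enabled and answer pings."""
    sock = socket.create_connection((host, port))
    rest = b""
    while True:
        data = sock.recv(4096)
        if not data:
            break  # scratch closed the connection
        messages, rest = decode_messages(rest + data)
        for message in messages:
            answer = reply_for(message)
            if answer is not None:
                sock.sendall(answer)
```

Calling `run()` while scratch is running with remote sensor connections enabled would answer every “ping” with a “pong”. The host and port defaults assume scratch runs on the same machine.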

RPi-1.4-scratch fullscreen 30.3 sec (!)
RPi-1.4-scratch edit mode 18.4 sec

RPi-1.4-beta fullscreen 16 sec.

The legacy scratch needs much longer in fullscreen than in edit mode, although the script animations in edit mode cost some time. Strange.
The new scratch is roughly 10 percent faster than the legacy scratch in edit mode.

What does this mean for IO performance? One event out and one event in take 16 sec / 200 = 80 ms, or 40 ms one way. This is much faster than reported elsewhere. Keep in mind that nothing else was running, no animations or the like.
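The arithmetic can be written down as a small Python sketch. The helper name `latency_ms` is hypothetical. Halving the round trip assumes that both directions cost the same, which I have not measured separately.

```python
def latency_ms(total_seconds, round_trips):
    """Return (round trip, one way) latency in milliseconds.

    The one-way value assumes sending and receiving take equally long.
    """
    round_trip = total_seconds / round_trips * 1000.0
    return round_trip, round_trip / 2.0


# Timings from the pingpong_remote test: 200 ping/pong round trips each.
for label, seconds in [
    ("RPi-1.4-scratch fullscreen", 30.3),
    ("RPi-1.4-scratch edit mode", 18.4),
    ("RPi-1.4-beta fullscreen", 16.0),
]:
    round_trip, one_way = latency_ms(seconds, 200)
    print("%-28s round trip %6.1f ms, one way %5.1f ms"
          % (label, round_trip, one_way))
```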

If you want to run this test on your machine, load my scratchClient software and use the command line

cd ~/scratchClient
sudo python src/scratchClient.py -c config/config_pingpong.py

Scratch Sensor Network performance (performance_pingpong_sensor_remote.sb)

The setup is similar to the broadcast example, but here variable values are sent over the network.

analog

When scratchClient receives the ‘a’ value, it increments it by 1 and sends it back.
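The increment step can be sketched as follows in Python 3. This is not the scratchClient code. It assumes the variable arrives as a `sensor-update "a" <number>` message, as defined by the Scratch 1.4 remote sensors protocol, and that the value is numeric. The name `increment_reply` and the regular expression are my own for this illustration. The returned text would still need the protocol's 4-byte length prefix before it goes on the wire.

```python
import re

# Matches pairs like: "a" 5   or   "speed" -2.5   (numeric values only)
PAIR = re.compile(r'"([^"]*)"\s+(-?\d+(?:\.\d+)?)')


def increment_reply(message, name="a"):
    """Return a sensor-update text with the named value incremented by 1.

    Returns None if the message is not a sensor-update or does not
    contain the named variable with a numeric value.
    """
    if not message.startswith("sensor-update"):
        return None
    for var, value in PAIR.findall(message):
        if var == name:
            number = float(value) + 1
            # Keep whole numbers free of a trailing ".0"
            text = str(int(number)) if number.is_integer() else str(number)
            return 'sensor-update "%s" %s' % (name, text)
    return None


print(increment_reply('sensor-update "a" 5 '))
```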

RPi-1.4-scratch 77.1 sec
RPi-1.4-beta 67.2 sec

win-1.4-scratch 25.0 sec

These values are very close to the broadcast-timings.

The scratchClient is the same as for the broadcast test.

Scratch Calculations (performance_calculations.sb)

Simply a loop, and a few calculations.

RPi-1.4-scratch 41.3 sec
RPi-1.4-beta 20.7 sec (checked twice, real fast)

win-1.4-scratch 25.0 sec

Scratch Broadcasts (performance_pingpong.sb)

Sending broadcasts inside scratch. Remote sensor connections are disabled, and the code is executed in presentation mode to avoid the script animations during execution.

RPi-1.4-scratch 83.0 sec presentation screen
RPi-1.4-scratch 75.0 sec full stage screen

RPi-1.4-beta 41.9 sec presentation screen
RPi-1.4-beta 84.5 sec full stage screen

win-1.4-scratch 50.0 sec