I noticed today that the BRP data downloads were very slow and even going into backoff. Is it that E@H has so much GPU power at its disposal that the network is unable to keep up?
Maybe it's a similar situation to S@H, where there are so many hungry GPU hosts and insufficient network bandwidth to deal with them all at once.
I wonder if we could come up with some way to reuse some or all of the data files rather than having to download 8 x 4 MB files for every WU (yes, I know this has been asked before).
Download speeds slow for BRP data
Hi
The BRP4 work unit generator was improved recently and is now doing a much better job of filling the task queues of volunteers. So I guess there is a bit more downloading going on right now compared to when the WU generator was too slow.
Anyway, as long as the downloads complete in time for the WU to run, I personally don't mind them being slow, at least if you have a permanent Internet connection. If you are using some kind of dial-up or pay by connection time, this is a real pain of course.
Unlike the GW search data, the input data for the BRP search is unique to each task and is completely consumed by the app: no other volunteer (except for the validating "wingman") will get another copy of the data files your app gets.
This is because a certain pre-processing step (de-dispersion) of the raw input data happens on the project server, and the volunteer hosts get the de-dispersed data, not the "raw" telescope data.
Now, if de-dispersion was done on the volunteer hosts as well, the data could be reused across work units. However, the size of the raw input data that would need to be transferred (and the required RAM to process it) is prohibitive for consumer PCs, so the scientists ruled out this possibility for the moment.
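For the curious, here is a minimal sketch of what incoherent de-dispersion does, with made-up channel and DM parameters (the real pipeline is of course far more elaborate):

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant in MHz^2 pc^-1 cm^3 s

def dedisperse(dynspec, freqs_mhz, dt_s, dm):
    """Shift each frequency channel to undo the cold-plasma dispersion
    delay, then sum the channels into a single time series.

    dynspec   : 2-D array of shape (n_channels, n_samples)
    freqs_mhz : centre frequency of each channel, in MHz
    dt_s      : sampling interval in seconds
    dm        : trial dispersion measure in pc/cm^3 (illustrative value)
    """
    f_ref = freqs_mhz.max()  # align everything to the highest frequency
    out = np.empty_like(dynspec)
    for i, f in enumerate(freqs_mhz):
        # Lower frequencies arrive later; compute the delay per channel.
        delay_s = K_DM * dm * (f**-2 - f_ref**-2)
        shift = int(round(delay_s / dt_s))
        out[i] = np.roll(dynspec[i], -shift)  # wrap-around is fine for a sketch
    return out.sum(axis=0)

# Example with toy numbers:
rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 4096))        # 64 channels, 4096 samples of noise
freqs = np.linspace(1400.0, 1500.0, 64)   # made-up observing band in MHz
series = dedisperse(spec, freqs, dt_s=1e-3, dm=50.0)
```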
CU
HB
Unfortunately while I have a
Unfortunately, while I have a decent DSL speed it has download limits, so I tend to use the off-peak period for a couple of the machines. The off-peak window is only 6 hours, so download speed affects those machines as they try to replenish their 1-day cache.
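Back of the envelope, with my own guessed numbers rather than measured ones: if a GPU host chews through, say, 50 BRP4 tasks a day, refilling a one-day cache inside a 6-hour window needs a fairly healthy sustained rate.

```python
tasks_per_day = 50          # assumed consumption for a fast GPU host
bytes_per_task = 8 * 4e6    # 8 files of ~4 MB each, per the opening post
window_s = 6 * 3600         # the 6-hour off-peak window

rate_kbs = tasks_per_day * bytes_per_task / window_s / 1e3
print(f"need ~{rate_kbs:.0f} kB/s sustained")  # ~74 kB/s
```

At the single-digit kB/s rates some people are seeing, that window simply isn't long enough.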
Cheers,
MarkJ
BOINC blog
RE: Hi The BRP4 work unit
It would be interesting to learn a little bit more about the WU generation process and the stages involved. In particular, it feels to me as if it's a batch process, with blocks of WUs being released in clumps, rather than a steady flow.
I'm also seeing great variability in download speeds. Sometimes no progress is possible at all, sometimes they crawl down at 5 or 10 KB/sec, and sometimes they come through at up to 100 KB/sec.
It led me to wonder whether the intermediate datafile storage server had difficulty coping with both high-speed data insertion and high-speed downloading at the same time.
For the first time in days I
For the first time in days I now had normal download speeds. Last week it was around 8-15 kB/s.
I think we are really staying
I think we are really approaching the moment when users' machines will be able to take a whole beam at once: download one large file of raw telescope data, crunch it completely from beginning to end (including pre-processing and post-processing), and return all the data to the server in post-processed form. A lot of modern machines are able to do this job already today: they have rather good Internet connections and can hold big datasets entirely in memory. So it would be great if we had the opportunity to crunch BIG WUs on capable machines. BOINC already reports host capability information to the server, so the server could decide whether a given machine can handle a BIG job at once.
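To make the idea concrete, here is a hypothetical capability gate the scheduler might apply; the thresholds are pure guesses on my part, not project figures:

```python
# Hypothetical capability check for handing out a "BIG" raw-data WU.
# All three thresholds are illustrative guesses, not project requirements.
def can_take_big_wu(ram_gb: float, free_disk_gb: float, bandwidth_mbps: float) -> bool:
    return (ram_gb >= 16             # hold the raw beam during de-dispersion
            and free_disk_gb >= 50     # raw input plus intermediate products
            and bandwidth_mbps >= 20)  # fetch the big file in reasonable time

print(can_take_big_wu(ram_gb=32, free_disk_gb=200, bandwidth_mbps=50))  # True
```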
I'm going into backoff for
I'm going into backoff for nearly all of my downloads, regardless of the kind of job (CPU only). DSL, always-on Internet connection. (5 machines are showing this behaviour.)
I'm seeing five different
I'm seeing five different behaviours:
1) Failure to connect at all to the download server
2) Connect, but an almost immediate (<20 seconds) disconnect
3) Connect, but around 5 minutes with no data flow before timeout and disconnect
4) Connect and data, but a low sustained transfer rate (<10 KB/sec)
5) Connect and data, with a normal/fast transfer rate (~100 KB/sec is 'normal' here)
Modes (1)-(4) are all too familiar from that other big CUDA project, SETI@Home, and seem to be symptomatic of an overloaded download link/router/server.
The data rates here are prodigious - BRP4 requires roughly four times as much data per unit of computation time as even the fastest 'shorty' SETI work. And with SETI being largely down at the moment, a lot of that data demand will have transferred here.
It would be interesting to know if Einstein/AEI has any publicly-accessible network monitors like SETI's Cricket graphs. Even if there's nothing available to the public, it might be worth the project's network support staff checking their internal tools, and seeing if the download system is best tuned to support the sort of volume we're seeing now - those five-minute timeouts must waste a lot of socket memory, for example.
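As a quick way of logging which of the five modes a given attempt falls into, something like this crude Python probe would do; the URL is a placeholder, and the mode boundaries just mirror the timings above:

```python
import time
import urllib.request

URL = "http://example.org/placeholder_brp4_file"  # placeholder, not a real file

def probe(url, timeout=300):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            total = 0
            while chunk := resp.read(65536):
                total += len(chunk)
        rate = total / (time.monotonic() - start) / 1e3
        mode = 5 if rate >= 50 else 4  # arbitrary fast/slow cut at 50 kB/s
        return f"mode {mode}: {total} bytes at {rate:.0f} kB/s"
    except Exception as exc:
        elapsed = time.monotonic() - start
        if elapsed < 1:
            return f"mode 1: no connection ({exc})"
        if elapsed < 20:
            return f"mode 2: early disconnect after {elapsed:.0f} s ({exc})"
        return f"mode 3: stalled, gave up after {elapsed:.0f} s ({exc})"

print(probe(URL))
```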
Is there a possibility of
Is there a possibility of using a different compression algorithm for BRP4? Would using LZMA2, for example, reduce the size of each task somewhat? I am not sure what the existing compression algorithm is. With each task being 32 MB and GPUs being able to process these tasks in as little as 20 minutes, the transfer requirements become significant. This is even more so the case when multiple GPUs are running.
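Simple arithmetic on those numbers shows why (the three-GPU host is my assumption):

```python
task_bytes = 32e6    # 32 MB per task, from above
crunch_s = 20 * 60   # 20 minutes of GPU time per task
gpus = 3             # assumed multi-GPU host

per_gpu_kbs = task_bytes / crunch_s / 1e3
print(f"~{per_gpu_kbs:.0f} kB/s per GPU, ~{per_gpu_kbs * gpus:.0f} kB/s total")
# -> ~27 kB/s per GPU, ~80 kB/s for three of them
```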
RE: Is there a possibility
Well, there are several projects which use 7-Zip ( http://www.7-zip.org/ ) to compress data.
RE: RE: Is there a
and applying 7-zip to a random data file reduces its size from 4,098 KB to 3,979 KB - about 3% compression.
I do think the project might have thought of that one, if it was going to be any significant use ;-)
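For anyone who wants to repeat that experiment, a minimal check with Python's built-in LZMA module (the filename is a placeholder for any downloaded BRP4 data file):

```python
import lzma
from pathlib import Path

# Placeholder path; point it at any BRP4 data file in your BOINC directory.
raw = Path("brp4_datafile.binary").read_bytes()
packed = lzma.compress(raw, preset=9)  # LZMA at maximum compression
saving = 100 * (1 - len(packed) / len(raw))
print(f"{len(raw)} -> {len(packed)} bytes ({saving:.1f}% smaller)")
```

De-dispersed data is dominated by receiver noise, which is close to maximum entropy, so single-digit savings are probably about all any general-purpose compressor can find here.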