Guys, we think we found the culprit causing the signal 6 errors on Sonoma, finally!
The details can be found here, but it boils down to you having to install BOINC 7.24.3 to fix the permission issue. If a simple update doesn't help, you might want to uninstall and fully reinstall BOINC to make sure.
The good thing is that investigating the issue also allowed us to improve the numerical stability of our app such that it should validate much better :)
We're going to release a new version (2.08) without the debugging stuff soon.
Thanks for your patience! This one took a while to figure out because we couldn't reproduce the error in our labs.
Cheers,
Oliver
Einstein@Home Project
I'd noticed that 2.07 was showing a different error - the permission error - so I reinstalled 7.24.3 and it looks like the jobs are running now.
Thank you Oliver and team for your efforts, I'm glad I can run native ARM code and Mac GPU tasks here!
-Steve
If you look back closely, the error didn't change. We just augmented the output, which gave us the needed clue.
Please note that the app isn't yet at its full potential since we're not done yet with porting all of the data analysis code. So stay tuned for more :)
Oliver
Einstein@Home Project
Oliver Behnke wrote: "If you look back closely the error didn't change. [...]"
Will you be porting the BRP7 and O3AS applications for use on Apple Silicon GPUs?
Both of these require double precision, which the Apple M GPU doesn't support.
BM
Oh, OK. Thanks. I wasn't sure if the GPU lacked it completely or not, but that explains it.
In BRP7, double precision is required only for a small part that may be possible to isolate and do on the CPU, similar to what we did with FGRP. But that would require additional work, and the ultimate benefit is unclear. Not sure whether we'll do that at all, or when we'll get to it.
BM
Something is weird about how BOINC and this app interact. My M1 Mac mini downloaded almost a thousand of these tasks, even though my preferences were set to 0.1 days of cache. I was running this single app on Einstein, and no others on any other project. My machine is an M1 Mac mini with 4P/4E cores, so I am running BOINC at 50% of the CPU threads, so that BOINC CPU tasks run only on the P cores. BOINC says that a task allocates 1 full CPU thread plus the 1 full GPU. Fair enough, even though all other projects set the CPU usage at something less than a full CPU core. But maybe that is not relevant. I am guessing that BOINC thinks it is a CPU task and keeps asking for more tasks because the other three CPU threads are idle? In any case, it is not acting like a normal GPU task, where if the GPU is fully loaded, it does not ask for more tasks beyond the cache setting (0.1 days in this case).
Reno, NV Team: SETI.USA
zombie67 [MM] wrote: "BOINC says that a task allocates 1 full CPU thread plus the 1 full GPU. Fair enough, even though all other projects set the CPU usage at something less than a full CPU core."
That's a deliberate choice by us, not BOINC. The reason is that this app isn't yet fully done, in that it still runs significant parts on the CPU. We want to prevent BOINC from over-scheduling your CPU cores by running more than one app on them. That would starve the GPU parts of the app, skewing the performance figures we get back.
On the subject of ignoring the cache setting, I have a few ideas. We'll look into it.
Oliver
Einstein@Home Project
I set the GPU App to require only half a core. Feedback is welcome.
Regarding the overshoot of tasks, this is an issue with BOINC's runtime estimation, which is difficult to handle. If you're not running any other app of this project on the machine, it should adjust itself after running a few dozen of these tasks. I'll see what I can do to mitigate that on the server side, too. As far as server-side management goes, these processors' behavior is pretty new, and the app is also still under development.
BM
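To illustrate why a bad runtime estimate can lead to the overshoot described above, here is a rough sketch in Python. It is a deliberately simplified model, not BOINC's actual work-fetch algorithm: it assumes the client simply requests enough tasks to fill the cache based on the estimated runtime per task. The function names and all runtime figures are hypothetical; only the 0.1-day cache setting comes from the thread.

```python
import math

SECONDS_PER_DAY = 86_400


def tasks_requested(cache_days: float, estimated_runtime_s: float) -> int:
    """Tasks needed to fill the work buffer, judged by the *estimated* runtime.

    Simplified assumption: buffer size divided by estimated runtime per task.
    """
    buffer_seconds = cache_days * SECONDS_PER_DAY
    return math.ceil(buffer_seconds / estimated_runtime_s)


def queued_days(n_tasks: int, true_runtime_s: float) -> float:
    """Days of work that n_tasks really represent on a single GPU."""
    return n_tasks * true_runtime_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the effect.
    cache_days = 0.1            # the cache setting mentioned in the thread
    estimated_runtime_s = 10.0  # assumed (far too optimistic) estimate
    true_runtime_s = 600.0      # assumed real runtime per task

    n = tasks_requested(cache_days, estimated_runtime_s)
    print(f"tasks fetched: {n}")
    print(f"actual queue: {queued_days(n, true_runtime_s):.1f} days")
```

In this model, the number of tasks fetched scales inversely with the estimated runtime, so an estimate that is too low by some factor inflates the queue by the same factor. Once the estimate converges towards the real runtime, the requested count drops back to what the cache setting implies.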