I had to stop crunching and abort all my MeerKAT tasks. Something happened when the power went down. Everything is running well now.
I tried the suspend/resume trick on a stuck Nvidia task and it didn't really work. I had to abort the task in BOINC, then manually kill the running process. (Aborting doesn't fully take: the task shows as aborted, but the percentage stays low and doesn't jump to 100% like it should, the process is still hung on the GPU, and a new process won't start.)
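For anyone else stuck at this point, here is a minimal sketch of that manual-kill step, assuming you have already found the PID of the hung process (from a process monitor or the task's properties in BOINC). It is just the programmatic equivalent of "kill -9 <pid>", not part of any Einstein@Home or BOINC tooling:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Illustrative only: force-kill a hung GPU task by PID after BOINC's
 * abort has failed to reap it. Same effect as `kill -9 <pid>`. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return EXIT_FAILURE;
    }
    pid_t pid = (pid_t)strtol(argv[1], NULL, 10);
    if (kill(pid, SIGKILL) != 0) { /* SIGKILL cannot be caught or ignored */
        perror("kill");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}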
Out of the 7 WUs I had, 4 have a validate error, 1 errored while computing, and only 2 completed and validated.
Hmmm. That doesn't always work. I just had to abort a task that continued to hang after two resets.
Ideas are not fixed, nor should they be; we live in model-dependent reality.
Uploads should have been working again since yesterday. Adjustments were made to the validator, and all tasks that had "validate errors" were checked again. Most of these should be "valid" (or at least "invalid" or "inconclusive") now.
BM
About the "stuck" tasks: is
)
About the "stuck" tasks: is there a difference between "new" tasks (they do have a "segment_4" in the name) or older ones ("segment" 3 or 2)?
BM
Hi,
Has anyone run any tasks that have "segment_4" in their name with the CUDA 5.5 application? Did any of those tasks succeed?
The segment_4 tasks have a smaller padding factor than the previous ones (down from 3.0 to 1.5), and run time is significantly lower, as fft_size has gone down (relatively, from 4 to 2.5).
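To put rough numbers on that, here is a back-of-the-envelope sketch. This is a guess at the mechanics, not code from demod_binary.c; the names and the round-up-to-power-of-two step are assumptions. If the padded length is the input length times the padding factor, and the dominant cost is an N*log2(N) FFT, halving the padding factor from 3.0 to 1.5 cuts per-FFT work by a bit more than half:

#include <math.h>
#include <stdio.h>

/* Hypothetical: padded FFT length = input length * padding factor,
 * rounded up to the next power of two. */
static size_t padded_fft_size(size_t n_samples, double padding)
{
    size_t n = (size_t)ceil((double)n_samples * padding);
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

int main(void)
{
    const size_t n = 1 << 22;               /* made-up input length */
    size_t old_n = padded_fft_size(n, 3.0); /* old padding factor */
    size_t new_n = padded_fft_size(n, 1.5); /* new padding factor */

    /* O(N log2 N) cost model for a single FFT */
    double ratio = ((double)new_n * log2((double)new_n)) /
                   ((double)old_n * log2((double)old_n));
    printf("fft size %zu -> %zu, per-FFT cost ratio %.2f\n",
           old_n, new_n, ratio);
    return 0;
}

With these made-up numbers the ratio comes out around 0.48, roughly matching the reported drop in run time.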
My "anonymous platform" CUDA 11.7 app runs them, but does not validate against NVIDIA OpenCL nor ATI OpenCL.
P.S. I got rid of the random lockup (caused by Trap 11) by increasing memory allocation sizes in the source code (demod_binary.c). EDIT: My stuck tasks were segment_3 and segment_4 tasks.
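For anyone chasing the same crash: trap 11 is signal 11, i.e. SIGSEGV, which fits something writing past the end of a buffer. A minimal sketch of that class of fix, with hypothetical names and sizes; this is not the actual patch to demod_binary.c:

#include <stdlib.h>

/* Hypothetical illustration: if downstream code writes up to the padded
 * length, allocating only input_size elements segfaults (trap 11)
 * whenever a write lands outside the block. Sizing the allocation to
 * the padded length, plus a small safety margin, avoids the overrun. */
float *alloc_fft_buffer(size_t input_size, double padding_factor)
{
    /* before (buggy): malloc(input_size * sizeof(float)) */
    size_t padded = (size_t)(input_size * padding_factor) + 16; /* margin */
    return calloc(padded, sizeof(float)); /* zeroed, so padding stays silent */
}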
--
petri33
On tasks that got "stuck", both the v0.05 OpenCL Nvidia app and the cuda55 app show this in their stderr.txt:
https://einsteinathome.org/task/1341636743
mostly segment 3's, but i have a few 4's in there also. but I have not run many BRP7 tasks for a few days.
great insights, Petri!
All of the tasks that I have run that had the random lockup are segment_4 tasks.
I don't really have a large enough sample size to say anything definitively, but anecdotally, I didn't start to see the lockups until I started running 2x tasks. 27 tasks ran singly without locking up. I just switched back to 1x tasks, and I'll let my current queue finish up at 1x.
*Edit* I just had a stuck task at 1x, so the 2x vs. 1x split was just a random observation.
As with other Einstein apps, the Radeon VII is still the most performant AMD GPU.
*Edit2*
Hardware running beta tasks.
CPU - Threadripper 3960X
GPUs - 6900XT and Radeon VII
OS - Arch Linux | kernel 5.19.4 | ROCm 5.2.3
petri33 wrote: Has anyone run any tasks that have "segment_4" in their name with the CUDA 5.5 application? Did any of those tasks succeed?
The "official" CUDA 5.5 app is limited to "old" WUs (segment 2+3) to finish these first.
Which memory sizes did you increase, and by how much? (PM welcome)
BM