BRP4G-cuda55 topics

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7056724931
RAC: 1603074
Topic 201314

With the impending end of BRP6 (Parkes) work availability, many of us who have worked exclusively with the cuda55 application are faced with running BRP4G as the only available work type, pending the hoped-for GPU version of the Gamma-Ray pulsar search.

Until today, BRP4G was available to nvidia users only in cuda32 form, but, apparently prompted by Mumak's inquiry, Bernd announced the release of a cuda55 variant.

Sadly, multiple users have seen similar failures, often in the first few seconds.  A common symptom has the web page for the task showing "-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION".
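The two numbers on the task page are the same 32-bit value: Windows defines the status code as an unsigned hex constant, while the task page prints it as a signed integer. A minimal Python sketch of the conversion (the helper name `exit_code_to_hex` is hypothetical, chosen for illustration):

```python
def exit_code_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as its unsigned hex form."""
    # Masking with 0xFFFFFFFF maps a negative two's-complement value
    # onto the equivalent unsigned 32-bit integer.
    return f"0x{code & 0xFFFFFFFF:08X}"


print(exit_code_to_hex(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```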

This problem has occurred on multiple hosts with different versions of Windows, different nvidia driver versions, and different models of GPU card.

example error on host 11368689
example error on host 3409259
example error on host 12421865
example error on host 12254311
example error on host 12318766

Despite the quite recent initial release, it is actually rather easy to find examples of the syndrome.  While some tasks show run times far longer than just a few seconds, I suspect that merely reflects a longer delay before the user responded to a Windows error notification, as the stderr does not log any useful activity in the cases I have reviewed.

Possibly in response to these observations, or perhaps based on other data available at the mother ship, Bernd announced the withdrawal of the cuda55 BRP4G variant from service pending analysis.

Why bother with this thread?  I thought people interested in the potential availability and status of a cuda55 form of BRP4G might find it useful.

 

Keith Myers
Joined: 11 Feb 11
Posts: 4751
Credit: 17677609250
RAC: 5769806

I too was encouraged by the release of the CUDA55 variant last night.  Unfortunately, I have already returned one error of the -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION type. I still have many queued up.  The question is whether to abort them or let them run for analysis by the developers.

example error on host 3967953

 

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7056724931
RAC: 1603074

Keith Myers wrote:
Question is to abort them or let them run for analysis by the developers.

There is so very little information in stderr for this particular situation, and it seems so uniform, that I doubt adding more instances from the same machine to the pile would help.  

Keith Myers
Joined: 11 Feb 11
Posts: 4751
Credit: 17677609250
RAC: 5769806

As has been pointed out to me many times over at SETI: the stderr.txt file that gets sent back to the project is just a "friendly name" file and NOT the science result.  The science result is a completely different file and is not normally viewable by the public.  That file is the one the scientists need to troubleshoot what went wrong.  It is only available for examination while it is still in the slot directory the task ran in and has NOT yet been uploaded to the project.  That makes it hard for users to find and examine unless they know what they are doing and are quick enough to copy it to another directory before it gets deleted. At least that is how it works at SETI; I am assuming the same conditions apply here at Einstein.
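For anyone wanting to grab those files before they disappear, a minimal Python sketch that copies a slot directory to a timestamped folder elsewhere. The slot path in the usage note is an assumption (the usual default BOINC data directory on Windows); check your own client's data directory and which slot number the task is using. The function name `snapshot_slot` is hypothetical.

```python
import shutil
import time
from pathlib import Path


def snapshot_slot(slot_dir: str, dest_root: str) -> Path:
    """Copy everything in a BOINC slot directory to a timestamped folder."""
    src = Path(slot_dir)
    # Timestamped destination so repeated snapshots of the same slot
    # do not overwrite each other (copytree refuses an existing target).
    dest = Path(dest_root) / f"{src.name}_{time.strftime('%Y%m%d_%H%M%S')}"
    # copytree creates dest (and any missing parents) and copies recursively.
    shutil.copytree(src, dest)
    return dest
```

Usage, assuming the default Windows data directory and slot 0: `snapshot_slot(r"C:\ProgramData\BOINC\slots\0", r"C:\brp4g_debug")`. It has to run while the task's files are still in the slot.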

 

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3317052001
RAC: 1171554

It seems that only the x64 version of BRP4G-CUDA55 (v1.56) had a problem.
I have completed several v1.57 tasks on an XP32 machine and was surprised by the speed-up on a 750 Ti: from 2600s with cuda32 to 2000s (running 1 WU at a time).
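To put that speed-up in throughput terms, a quick sketch, assuming one task at a time and that 2600s and 2000s are per-task elapsed times:

```python
SECONDS_PER_DAY = 86_400


def speedup(old_s: float, new_s: float) -> tuple:
    """Return (throughput ratio, old tasks/day, new tasks/day), one task at a time."""
    return old_s / new_s, SECONDS_PER_DAY / old_s, SECONDS_PER_DAY / new_s


ratio, old_per_day, new_per_day = speedup(2600, 2000)
print(f"{ratio:.2f}x, {old_per_day:.1f} -> {new_per_day:.1f} tasks/day")
# 1.30x, 33.2 -> 43.2 tasks/day
```

So a 23% cut in run time works out to roughly 30% more tasks per day on the same card.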

-----

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1433399146
RAC: 595618

My GTX660 is doing 48 BRP4G betas a day for, obviously, 48,000 credit; it was doing 18 BRP6 betas a day for 79,200 credit per day. Is this what is intended?
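The figures above imply about 1,000 credits per BRP4G task versus 4,400 per BRP6 task, so on this card BRP4G earns roughly 61% of the former daily credit. A small sketch of that arithmetic (derived only from the numbers in this post, not from an official credit schedule):

```python
def credit_per_task(tasks_per_day: int, credit_per_day: int) -> float:
    """Average credit awarded per task, from daily totals."""
    return credit_per_day / tasks_per_day


brp4g = credit_per_task(48, 48_000)
brp6 = credit_per_task(18, 79_200)
print(brp4g, brp6, f"{48_000 / 79_200:.0%}")  # 1000.0 4400.0 61%
```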

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232277016
RAC: 186736

My 1070 is doing 96 a day (3x every 45min). I think the 6G WUs gave more PPD (points per day), but so it goes.

Trotador
Joined: 2 May 13
Posts: 58
Credit: 2122643213
RAC: 113

mmonnin wrote:

My 1070 is doing 96 a day (3x every 45min). I think the 6G WUs were more PPD but so goes it.

My HD7950 is completing four of these WUs every hour under Linux, so it does not look bad compared to your GTX1070.

My GTX 660Ti is finishing 1 every 30 minutes when it kicks in as a backup to GPUGRID.

Just info.
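Putting the three reported rates on the same per-day basis makes the comparison direct. A sketch, assuming each card runs BRP4G around the clock at the quoted rate (the 660Ti only runs these as a GPUGRID backup, so its real daily count will be lower):

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(tasks: int, minutes: float) -> float:
    """Scale 'tasks completed per interval' up to a full 24-hour day."""
    return tasks * MINUTES_PER_DAY / minutes


rates = {
    "GTX 1070 (3 per 45 min)": tasks_per_day(3, 45),
    "HD7950 (4 per hour)": tasks_per_day(4, 60),
    "GTX 660Ti (1 per 30 min)": tasks_per_day(1, 30),
}
for card, rate in rates.items():
    print(f"{card}: {rate:.0f}/day")
```

On that basis the HD7950 matches the 1070 at 96 per day, and the 660Ti would manage 48.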

 

Gaurav Khanna
Joined: 8 Nov 04
Posts: 42
Credit: 29431147596
RAC: 0

Not quite a CUDA topic; but close enough. The OpenCL BRP4G app is giving transfer buffer errors on a Mac Pro. This looks vaguely familiar from a few years back (Oliver?)

https://einsteinathome.org/host/12303547/tasks

 [20:40:58][23994][ERROR] Error in OpenCL context: OpenCL Error : Error loading transfer buffer

 
