FGRPB1G GPU work for both NVIDIA and AMD GPUs not available

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1588772396
RAC: 758854
Topic 218299

Any thoughts on why GPU work is not available?

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18716511513
RAC: 6376418

Maybe no staff are around this weekend to load more work into the FGRPB1G splitters?  I believe I read in another forum thread that the percent done is not a reflection of the actual amount of work left.  That was in a post worrying about the project status page.  And someone (Bernd?) posted that the data coming off the sky surveys is continuously being added to the splitters.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117554486704
RAC: 35332204

Betreger wrote:
Any thoughts on why GPU work is not available?

There are a finite number of tasks associated with a given data file.  The current file was LATeah1046L.dat, which was first issued late on 27th Feb.  These files tend to 'last' around 4 days, which makes a new file pretty much 'due' right around now.
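
As a rough check on that timing, here is a minimal sketch of the arithmetic. The year 2019 is taken from the log timestamp quoted elsewhere in this thread, the hour is an assumption standing in for "late", and the 4-day lifetime is only the approximate figure given above. The function name is made up for illustration.

```python
from datetime import datetime, timedelta

def estimate_next_file_due(first_issued, lifetime_days=4.0):
    """Estimate when a data file's primary tasks run out.

    lifetime_days is the rough 'files last about 4 days' figure,
    not a value published by the project.
    """
    return first_issued + timedelta(days=lifetime_days)

# "Late 27th Feb": the year and the 22:00 hour are assumptions.
first_issued = datetime(2019, 2, 27, 22, 0)
due = estimate_next_file_due(first_issued)
print(due.date().isoformat())
```

Under those assumptions the estimate lands on 3rd March, which fits with work drying up this weekend.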

My guess is that all primary tasks for LATeah1046L.dat have been issued and whatever procedure is in place to transition to the next data file simply hasn't happened - for whatever reason.

I'd be extremely surprised if it's anything to do with completely running out of data.  As Keith mentions, new data from the Large Area Telescope (LAT) on board the Fermi satellite continues to be available, so the stats on the server status page to do with availability of work for this particular search can never reflect the true state of affairs.  As they say, "It ain't over till the fat lady sings", and I haven't heard a peep out of her yet :-).

Worst case scenario - we might have to wait until Monday for this to be fixed.  Best case scenario - new work might be stuck in some pipeline and some kind soul might intervene to clear the blockage.  There hasn't been any sign of abnormal usage that I know of and the Staff do try to make sure that available work will outlast the weekend so maybe this could get fixed relatively quickly.

 

Cheers,
Gary.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

I think a more interesting question, which I have not seen addressed recently, is what happens to O1OD1 in 22.9 days?

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1588772396
RAC: 758854

Gary, I hope you are correct. With SETI broken and now this, I had to attach to GPUGRID in order to keep busy.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3394076540
RAC: 2837909

MilkyWay is also down hard. Crazy weekend if 3 major GPU projects are down at once.

Cherokee150
Joined: 13 May 11
Posts: 24
Credit: 884050164
RAC: 340914

With multiple BOINC projects down at the same time, the probability that they all have a common cause is extremely high.  Since the only thing all projects have in common is BOINC, the problem very likely comes from the BOINC system itself.  I believe the best clue is in the message I am getting from SETI.  At the end of each attempt to contact the server I get the following message:

3/3/2019 0:32:18 | SETI@home | [error] No scheduler URLs found in master file

I have seen this message before.  It was a number of years ago, so I don't remember all the details, but I do believe it refers to a master list of send and receive addresses that BOINC uses to route all communications to and from each project.  As I recall, the BOINC client software refreshes this from the BOINC host periodically.  I believe it is once every 24 hours.  If the BOINC host does not respond, the BOINC client, under its current logic, does not proceed until it can get the new list.  It then sets a timer before trying again.

This would explain why, one by one, multiple projects are going down.  It also seems plausible because the BOINC project, servers and software are basically run by the SETI people at Berkeley.  SETI was the first to go down.  It is most likely, therefore, that a problem at Berkeley, whether hardware or software, is affecting both the BOINC and SETI servers.

Unless someone like Kittyman or one of the other people with close connections to the Berkeley staff is lucky enough to get a quick answer from them during this major disaster, I guess we will have to sit back and wait until things get back up before we learn the whole story.

If anyone has more input into this theory, please let us know as soon as possible.

Thanks!

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

I think you’ll find the master URL is per project. That way each project can manage its own. There is also a URL used to check connectivity when comms fail; it defaults to google.com. You can disable it or use a different URL via cc_config.
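
For anyone who wants to try that, here is a sketch of the relevant cc_config.xml options as I understand them from the BOINC client configuration documentation: `network_test_url` changes the site used for the check and `dont_contact_ref_site` turns the check off. The example URL is only a placeholder, and it is worth confirming both option names against the documentation for your client version.

```xml
<cc_config>
  <options>
    <!-- Use a different site for the connectivity check (placeholder URL) -->
    <network_test_url>https://example.org/</network_test_url>
    <!-- Set to 1 to skip the connectivity check entirely -->
    <dont_contact_ref_site>0</dont_contact_ref_site>
  </options>
</cc_config>
```

The file goes in the BOINC data directory, and the client should pick it up after a restart or after telling it to re-read its config files.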

There is a separate project list (all projects) which is maintained at boinc.berkeley.edu, but not being able to download it would just mean it can’t update until next time. That is the BOINC server, not the SETI servers. The BOINC message boards are still working, suggesting there are no issues with their server.
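
To illustrate the per-project point, here is a minimal Python sketch of the kind of check that, as far as I understand it, produces the "No scheduler URLs found in master file" message: the client fetches that one project's master page and looks for scheduler URLs in it. This is a simplified stand-in and not the client's actual parser. The function names and the example.org URLs are hypothetical, and the two tag forms shown are my understanding of what a master page carries.

```python
import re

# Simplified patterns, assumed for illustration only.
SCHEDULER_TAG = re.compile(r"<scheduler>\s*(.*?)\s*</scheduler>",
                           re.IGNORECASE | re.DOTALL)
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
REL_ATTR = re.compile(r"""rel\s*=\s*["']boinc_scheduler["']""", re.IGNORECASE)
HREF_ATTR = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def find_scheduler_urls(master_page):
    """Return scheduler URLs found in one project's master page text."""
    urls = [u for u in SCHEDULER_TAG.findall(master_page) if u]
    for tag in LINK_TAG.findall(master_page):
        if REL_ATTR.search(tag):
            match = HREF_ATTR.search(tag)
            if match:
                urls.append(match.group(1))
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(urls))

def check_master_page(project, master_page):
    """Mimic the log line seen when a master page lists no schedulers."""
    urls = find_scheduler_urls(master_page)
    if not urls:
        return f"{project} | [error] No scheduler URLs found in master file"
    return f"{project} | found {len(urls)} scheduler URL(s)"

# A healthy page versus a page served while a site is in trouble.
healthy = "<html><!-- <scheduler>https://example.org/cgi/sched</scheduler> --></html>"
broken = "<html><body>Down for maintenance</body></html>"
print(check_master_page("Project A", healthy))
print(check_master_page("Project B", broken))
```

The point of the sketch is that each project's page is checked on its own, so one project serving a page without scheduler URLs says nothing about any other project.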

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Also, MilkyWay has been falling apart for weeks.  I think they have hit rock bottom.  Most BOINC projects (including GPUGrid) are still up.

mikey
Joined: 22 Jan 05
Posts: 12681
Credit: 1839085411
RAC: 3872

Cherokee150 wrote:

With multiple BOINC projects down at the same time, the probability that they all have a common cause is extremely high.  Since the only thing all projects have in common is BOINC, the problem very likely comes from the BOINC system itself.  I believe the best clue is in the message I am getting from SETI.  At the end of each attempt to contact the server I get the following message:

3/3/2019 0:32:18 | SETI@home | [error] No scheduler URLs found in master file

I have seen this message before.  It was a number of years ago, so I don't remember all the details, but I do believe it refers to a master list of send and receive addresses that BOINC uses to route all communications to and from each project.  As I recall, the BOINC client software refreshes this from the BOINC host periodically.  I believe it is once every 24 hours.  If the BOINC host does not respond, the BOINC client, under its current logic, does not proceed until it can get the new list.  It then sets a timer before trying again.

This would explain why, one by one, multiple projects are going down.  It also seems plausible because the BOINC project, servers and software are basically run by the SETI people at Berkeley.  SETI was the first to go down.  It is most likely, therefore, that a problem at Berkeley, whether hardware or software, is affecting both the BOINC and SETI servers.

Unless someone like Kittyman or one of the other people with close connections to the Berkeley staff is lucky enough to get a quick answer from them during this major disaster, I guess we will have to sit back and wait until things get back up before we learn the whole story.

If anyone has more input into this theory, please let us know as soon as possible.

Thanks!

Primegrid has tons of GPU work available, as do Collatz, Amicable Numbers, GpuGrid and Moo Wrapper.

Primegrid: http://www.primegrid.com/

Collatz: https://boinc.thesonntags.com/collatz/

Collatz units pay the most credits of any GPU project if you use the optimization codes listed in the Number Crunching forum.

Amicable Numbers: https://sech.me/boinc/Amicable/

GpuGrid: http://www.gpugrid.net/

If you choose GpuGrid, keep the cache very low, as they give bonus credits if you return the units within a pretty short deadline (it's explained on the website), and they ONLY accept higher-end GPUs.

Moo Wrapper: https://moowrap.net/

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3394076540
RAC: 2837909

MW: Server is down hard and the site is not responding. Previously the site always stayed available when the db was not accessible.

SETI: The 2nd download server hasn't been working right since Tuesday's maintenance. The site is back up today with download improvements.

E@H: No GPU work.

Not related at all.
