new units not downloading

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

This may be at least partly a screw-up on my side.

The "new" S4 data files are named l1_XXXX.X and h1_XXXX.X, in contrast to the "old" files which are named L1_XXXX.X and H1_XXXX.X.

Unfortunately I had not realized that on Win32, file names are case-insensitive.
So there may be some issues in the next few days if workunits which are supposed to use the file H1_0400.0 (which has a particular size and checksum) try to instead use the file h1_0400.0 (which has a DIFFERENT size and checksum).
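To make the collision concrete, here is a minimal Python sketch that groups file names by their case-folded form and reports any group with more than one member. The function name `find_case_collisions` and the sample list are made up for illustration; this is not taken from the BOINC or Einstein@Home code.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() gives a case-insensitive key, roughly how a
        # case-insensitive file system would compare the two names.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical sample: one "old" and one "new" name for the same band.
    sample = ["H1_0400.0", "h1_0400.0", "L1_0400.0", "l1_0405.0"]
    print(find_case_collisions(sample))
```

A check like this, run over the full list of data files before release, would presumably have flagged the H1/h1 pair.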

Meanwhile, I'll see what I can do on the server side to ameliorate this issue.

Bruce

Director, Einstein@Home

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

RE: The "download looping"

Message 13517 in response to message 13515

Quote:
The "download looping" problem is in boinc 4.43 ...

4.19 here

... and it's still happening, on a different PC now. While it loops, it uses most of the CPU power.

ABT Chuck P
Joined: 9 Feb 05
Posts: 20
Credit: 363204
RAC: 0

RE: This may be at least

Message 13518 in response to message 13516

Quote:

This may be at least partly a screw-up on my side.

Bruce


==============
Whew, thought I was looking at Boinc Seti for a few minutes. Had 9 errors (7 DL and 2 computing) on ID 11073.


Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

What about deleting all the uppercase or lowercase WUs on the server side and then later reissuing them with a new naming convention?

This should "convert" the temporary download error into a permanent one (with "giving up") so the computers break out of their download loop.

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68962
RAC: 0

RE: What about deleting all

Message 13520 in response to message 13519

Quote:

What about deleting all the uppercase or lowercase WUs on the server side and then later reissuing them with a new naming convention?

This should "convert" the temporary download error into a permanent one (with "giving up") so the computers break out of their download loop.

Would this waste work that has already been done (even work that has been returned) on those WUs?

~~gravywavy

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68962
RAC: 0

RE: Unfortunately I had

Message 13521 in response to message 13516

Quote:

Unfortunately I had not realized that on Win32, file names are case-insensitive.

Yes, when writing a cross-platform system, it is safest to use only lower case (or only upper case!?) throughout. Maybe the BOINC developer community should add this requirement to the policy on filenames across all BOINC projects, which would reduce the chances of similar errors in future.

It is not fair to expect developers with a single-OS background to know all the cross-platform pitfalls, and policies can help with that.

All versions of DOS and Windows have been case-insensitive, but then so too were many mainframe OSes. Sooner or later someone is going to put BOINC on a platform with some other case-insensitive file system, so while Windows makes the issue urgent here, this is one that would eventually have needed sorting out anyway.
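A policy like this could be checked mechanically before files are published. Here is a rough Python sketch; the rule itself (lowercase ASCII letters, digits, '.', '_' and '-' only) and the name `is_portable_name` are my own assumptions for illustration, not an existing BOINC policy or function.

```python
import re

# Hypothetical policy: lowercase ASCII letters, digits, '.', '_' and '-'.
_PORTABLE_NAME = re.compile(r"[a-z0-9._-]+")


def is_portable_name(name):
    """Return True if the name satisfies the assumed lowercase-only policy."""
    return _PORTABLE_NAME.fullmatch(name) is not None
```

Under this assumed rule, a name like h1_0400.0 passes and H1_0400.0 is rejected, so two names that differ only by case can never both be accepted.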

~~gravywavy

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

RE: RE: What about

Message 13522 in response to message 13520

Quote:
Quote:

What about deleting all the uppercase or lowercase WUs on the server side and then later reissuing them with a new naming convention?

This should "convert" the temporary download error into a permanent one (with "giving up") so the computers break out of their download loop.

Would this waste work that has already been done (even work that has been returned) on those WUs?

The current situation does the same. Some of my team have already reported lost WUs after the restart, and it happened to me too.

Maybe it would help to remove the H1 and h1 ones for some time, then reissue only the h1 ones, and later (much later) reissue the H1 ones.

Those endless loops are very much a waste of CPU cycles too. The CPU is heavily loaded, mostly by the download: my system had a permanently high load from BOINC itself (not from the project client), and BOINC does not run at low priority. That leaves not much CPU power for any project client and (worst of all) for me.

If that happens on a production system where BOINC should stay in the background, the users and admins of those systems might become really mad.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

After some discussions with David Anderson, I've taken the simple way out. I've cancelled the workunits with names that start with "h1_" (NOTE: this is case-sensitive; work starting with "H1_" is NOT cancelled).

I've also removed the problematic h1_XXXX.X data files from the download servers. After these changes propagate to the data server mirrors (15 to 30 minutes), this should generate hard download errors for any client that attempts these WUs.

I'll rename the workunits and files using "w1" (w for Washington state, where the Hanford detector is located) and reissue them.
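As an illustration of the case-sensitive prefix match and the rename, here is a minimal Python sketch. The helper `rename_h1_to_w1` is hypothetical and is not the script actually used on the project servers.

```python
def rename_h1_to_w1(name):
    """Map 'h1_XXXX.X' to 'w1_XXXX.X'; leave every other name unchanged."""
    old_prefix, new_prefix = "h1_", "w1_"
    # str.startswith is case-sensitive, so names starting "H1_" do not match.
    if name.startswith(old_prefix):
        return new_prefix + name[len(old_prefix):]
    return name
```

For example, h1_0400.0 would become w1_0400.0, while H1_0400.0 and l1_0400.0 are left as they are.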

Apologies to everyone for this fiasco. It's my fault. Hopefully we can recover quickly.

Please feel free to manually abort any h1_ workunits. My apologies for wasted CPU cycles. Fortunately these workunits have only been out there for a half-day so this shouldn't be too severe.

Bruce

Director, Einstein@Home

Ulrich Metzner
Joined: 22 Jan 05
Posts: 113
Credit: 963370
RAC: 0

RE: ...Please feel free to

Message 13524 in response to message 13523

Quote:

...Please feel free to manually abort any h1_ workunits. My apologies for wasted CPU cycles. Fortunately these workunits have only been out there for a half-day so this shouldn't be too severe.

Bruce


Thank you for handling this issue so quickly :)
A project reset (I only have h1_... left) should do the trick, right?

Aloha, Uli

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Any chance to reset the "daily quota" things too for today?
