Bernd, Bruce, Gary, Mike, a question for you!

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5849

Credit: 110015594366

RAC: 23235213

RE: what you prefer

28 Jun 2006 9:16:26 UTC

Message 40787 in response to message 40786

Quote:

what you prefer getting BSOD on one 1hour WU, or getting it on 18h WU ...

Actually, it shouldn't make any difference. If the machine crashes with the progress at 17hr 55min on an 18 hour result, when you reboot it and restart BOINC, it picks up the last checkpoint and then continues on from virtually where it left off. I've seen it happen many times. On average you lose about half the checkpoint interval.

Now it is possible that the machine might die in the middle of writing a checkpoint so that the checkpoint is corrupt. I imagine it would be possible to move a previously written checkpoint to a backup status until the new checkpoint is successfully written so even that situation could be protected. I have no idea if BOINC is that cunning in its checkpointing procedure.
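The backup idea described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions, not BOINC's actual checkpointing code: the function names `write_checkpoint` and `read_checkpoint`, the JSON format and the `.bak` suffix are all made up for the example.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write state to path without ever leaving only a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new checkpoint to a temporary file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the data to disk before renaming
    # Keep the previous good checkpoint as a backup.
    if os.path.exists(path):
        os.replace(path, path + ".bak")
    # Move the finished file into place in a single rename.
    os.replace(tmp_path, path)


def read_checkpoint(path):
    """Return the newest readable checkpoint, or None to start from zero."""
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate) as f:
                return json.load(f)
        except (OSError, ValueError):
            continue  # missing or corrupt: fall back to the older copy
    return None
```

If the machine dies while the temporary file is being written, the last good checkpoint is untouched; if the main file is unreadable on restart, the reader falls back to the backup copy.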

One day I had a power failure in a room where 30 boxes were running. On the law of averages, I figured some would probably have corrupted something at the point of power failure. On restarting, I was pleasantly surprised to see that none of them actually restarted from zero. 100% of them started from a saved checkpoint.

Cheers,
Gary.

LucaB76 - BOINC...

Joined: 16 Jan 06

Posts: 14

Credit: 754232

RAC: 0

RE: Back to the original

28 Jun 2006 17:04:39 UTC

Message 40788 in response to message 40784

Quote:

Back to the original question:
Roughly speaking, anything you see in your cache (number of Tasks, movement etc.) would be the same in the database, multiplied by the number of users (or actually CPUs). If you cut the current WUs in half, you have twice the number of results the database needs to keep track of. The database size is still our limiting factor, we're currently running a server with 24GB main memory and it's already tight.
BM

You're right!
I didn't realize that the situation on the server side was so tight!
Now I understand that in this scenario any attempt to split WUs into smaller sizes is unfeasible.
I hope things get better soon!
And in the meantime... I'll keep crunching!
Thanks Bernd!
Greetings from a hot (34°) Italy!

LucaB76 - BOINC...

Joined: 16 Jan 06

Posts: 14

Credit: 754232

RAC: 0

RE: I use the official

28 Jun 2006 17:15:38 UTC

Message 40789 in response to message 40782

Quote:

I use the official manager. What I do (with BOINC stopped) is edit client_state.xml to make both long term debts (LTD) to be zero and the short term debt (STD) for EAH to be +20 (seconds) and for Seti to be -20. STD controls what project will run if a decision has to be made. They have to balance to zero.[CUT]With my way the affinity seems to survive quite happily through many consecutive results. The machine is right at my desk so I just check it once in the morning and once in the evening (if I even remember). At the moment (I've just checked it now) there is still one project per cpu and this has been going on for several days without interference from me. Of course it would be a bit tedious if the machine isn't running 24/7 :).

Gary, I've given your method a try!
It works well on an unattended machine that runs 24/7, as you said! Last night I followed your instructions and the PC is still running well with two different projects on two different threads!
On my desktop PC, however, I have to suspend/resume a "victim" WU every time I restart the Manager each morning! OK, this is not much work, but I hope I remember to perform this operation each time to avoid bad results.
Thanks again for this useful trick!

J Langley

Joined: 30 Dec 05

Posts: 50

Credit: 58338

RAC: 0

RE: The database size is

28 Jun 2006 18:21:00 UTC

Message 40790 in response to message 40784

Quote:

The database size is still our limiting factor, we're currently running a server with 24GB main memory and it's already tight.

BM

But what has server memory got to do with database size? The server I work on during the day (a lot fewer users than E@H I admit) only has 8G RAM, but the database is stored on the 300G of hard disk space. Surely you never want to hold the entire database in RAM? Typically you only need enough RAM to hold the results of your largest query / data for your largest update, plus the overhead of the OS and applications running on the server...

On the other hand, couldn't the deadlines be increased instead? If I had 4 weeks to complete a 10 hour WU, rather than 2 weeks, that would make my machine much less likely to miss a deadline.

J Langley

Joined: 30 Dec 05

Posts: 50

Credit: 58338

RAC: 0

RE: Your BOINC Manager

28 Jun 2006 18:25:08 UTC

Message 40791 in response to message 40781

Quote:

Your BOINC Manager should be able to deal with all that, without intervention: when it notices that a WU is at risk of missing the deadline, given the computer's up-time and the project's resource share, it will preëmpt (or refuse) other work to make sure the deadline is met. This may put it into ‘panic mode’ for a while, but once it's become accustomed to the larger WU sizes it’ll avoid overfilling its cache.

But if your computer doesn't run 24x7, and isn't net connected 24x7, you need to keep an eye on projects with large WUs and short deadlines. For example, if I know I'm only going to have my computer on for 7 hours in the next 2 weeks, or I won't be connecting to the net in the second week, I have to make sure I don't load a WU that will take more than 7 hours to complete, or that won't finish sometime in the first week; otherwise the WU misses the deadline and my crunch time is wasted. With smaller WUs, if I don't keep an eye on things, I could still waste crunch time, but it would be much less.
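To put rough numbers on the 7-hour example, here is a small Python sketch. It assumes (my simplification, not anything BOINC computes) that equal-sized WUs are crunched back to back and that only the WU still unfinished at the deadline is lost; `wasted_hours` is a made-up helper name.

```python
def wasted_hours(hours_available, wu_hours):
    """Hours spent on the last WU that is still unfinished at the deadline."""
    # Complete WUs use up whole multiples of wu_hours; the remainder is
    # time sunk into a WU that never finishes.
    return hours_available % wu_hours


# 7 hours of crunch time before the deadline:
print(wasted_hours(7, 10))  # one 10-hour WU never finishes
print(wasted_hours(7, 3))   # two 3-hour WUs finish, the third does not
print(wasted_hours(7, 1))   # seven 1-hour WUs all finish
```

Under these assumptions the loss can never exceed the length of one WU, which is why smaller WUs waste less.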

detached

Joined: 15 Jun 06

Posts: 3

Credit: 11407

RAC: 0

RE: RE: The database size

28 Jun 2006 19:11:48 UTC

Message 40792 in response to message 40790

Quote:

Quote:
The database size is still our limiting factor, we're currently running a server with 24GB main memory and it's already tight.

BM

But what has server memory got to do with database size? The server I work on during the day (a lot fewer users than E@H I admit) only has 8G RAM, but the database is stored on the 300G of hard disk space. Surely you never want to hold the entire database in RAM? Typically you only need enough RAM to hold the results of your largest query / data for your largest update, plus the overhead of the OS and applications running on the server...

On the other hand, couldn't the deadlines be increased instead? If I had 4 weeks to complete a 10 hour WU, rather than 2 weeks, that would make my machine much less likely to miss a deadline.

But if you increase the deadline, other users have to wait longer for a WU result. Other users who have a smaller cache could do this work and the result could be there faster ... a deadline of 4 weeks means that, in the worst case, you wait more than a month for your results and credits.
I think 2 weeks is enough; probably people like him:
http://einsteinathome.org/host/519135/tasks
would have to watch their S5 WUs.
I aborted some S5 WUs when I realized that they need a lot of time (my PC doesn't run 24/7) ... nobody has to wait for me ...
Why don't we use smaller WUs ... that would stop several discussions and the deadline problem would be fixed ...
The new credit system is OK.

greets!

dark-enforcer

EDIT: sorry for my bad English

Ziran

Joined: 26 Nov 04

Posts: 194

Credit: 386400

RAC: 1241

RE: On other hand,

28 Jun 2006 19:50:52 UTC

Message 40793 in response to message 40790

Quote:

On the other hand, couldn't the deadlines be increased instead? If I had 4 weeks to complete a 10 hour WU, rather than 2 weeks, that would make my machine much less likely to miss a deadline.

Nice to see creative suggestions, but unfortunately this would cause some problems.
If we lengthen the deadline by 2 weeks, all results for WUs that will time out will take up space in the database for an extra 2 weeks. We had a lot of discussion about this when we were pleading for an extension of the deadline from 7 to 14 days.

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

J Langley

Joined: 30 Dec 05

Posts: 50

Credit: 58338

RAC: 0

RE: If we lengthening the

28 Jun 2006 22:00:08 UTC

Message 40794 in response to message 40793

Quote:

If we lengthen the deadline by 2 weeks, all results for WUs that will time out will take up space in the database for an extra 2 weeks. We had a lot of discussion about this when we were pleading for an extension of the deadline from 7 to 14 days.

True. But I would hope that only a very small percentage of WUs would take longer than 2 weeks (whereas changing the WU size would affect 100% of the WUs in the database), so the impact should be much less. I know a missed deadline doesn't affect the project greatly (since they can simply hand the WU out again), and is mainly an issue for crunchers.

Another idea (obviously not implementable with the current BOINC client): since there are small WUs for Einstein, it would be nice if crunchers could set a preference for their client to be given small WUs rather than large ones. Fast crunchers could crunch big WUs and claim high credit, slower crunchers could go for small WUs and less credit, and the deadlines and database would be unaffected.

Ziran

Joined: 26 Nov 04

Posts: 194

Credit: 386400

RAC: 1241

As long as the database is

28 Jun 2006 23:27:34 UTC

Message 40795

As long as the database is the limiting factor for this project, we would have to suggest other ways to minimize the number of entries in the database, if we want shorter results.

When the S4 run is over there will hopefully be fewer entries in the database, as all hosts will be running longer results. The change in replication from 3 to 2 will also shorten the time entries stay in the database.

Bruce wrote:

Quote:

“There are two types of workunits: short and long. The short workunits have XXXX.X less than or equal to 0400.0.

There are also two types of data files: short and long. The short data files (l1_XXXX.X) are from the LIGO Livingston Observatory, and are about 4.5MB in size. The long data files (h1_XXXX.X) are from LIGO Hanford and are about 16MB in size. Note: once your computer downloads one of these data files, it should be able to do many workunits for that same file.”

The first time a host asks for work it gets assigned to a data file, and will get work for that data file as long as there is work left for it. If there is no more work, the host gets assigned to a new data file, and so on. So what's needed is a subroutine in the scheduler that determines which data file the host should be assigned to. The subroutine only needs to run when a new data file is needed. We also need sensible rules for choosing data files.

To help modem users.
In project specific preferences, add a question: do you have a fast internet connection? (default YES)
If YES, you download the 16MB data file, if NO you download the 4.5MB data file.

To help slow hosts.
If “Measured integer speed” > 1000 AND “% of time BOINC client is running” > 80 then host should get long results.

If “Measured integer speed” 3500 AND “Number of CPUs” > 1 then host should get extra long results.
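The proposed rules could be sketched as follows in Python. This is only an illustration of the suggestion, not scheduler code: the function and parameter names are made up, the comparison for the 3500 threshold is assumed to be ">" (the operator is missing in the post as it appears here), and giving everyone else short results is also my assumption.

```python
def choose_data_file(fast_connection=True):
    """Pick a data file type from the proposed preference (default YES)."""
    # h1 files are about 16MB, l1 files about 4.5MB.
    return "h1" if fast_connection else "l1"


def choose_result_length(int_speed, pct_time_running, n_cpus):
    """Pick a result length from the host's benchmark and uptime figures."""
    # Assumption: the missing operator for the 3500 threshold is ">".
    if int_speed > 3500 and n_cpus > 1:
        return "extra long"
    if int_speed > 1000 and pct_time_running > 80:
        return "long"
    # Assumption: hosts matching neither rule get short results.
    return "short"


print(choose_data_file(fast_connection=False))
print(choose_result_length(int_speed=2000, pct_time_running=90, n_cpus=1))
```

The extra long rule is checked first so that a fast multi-CPU host is not caught by the plain long rule; that ordering is a design choice of the sketch.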

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4273

Credit: 245267071

RAC: 12450

RE: RE: The database size

29 Jun 2006 2:47:12 UTC

Message 40796 in response to message 40790

Quote:

Quote:
The database size is still our limiting factor, we're currently running a server with 24GB main memory and it's already tight.

But what has server memory got to do with database size? The server I work on during the day (a lot fewer users than E@H I admit) only has 8G RAM, but the database is stored on the 300G of hard disk space. Surely you never want to hold the entire database in RAM?

Unfortunately we found we need to (at least for the largest tables), to keep the server responsive. Not having much experience with DBs of this size, I'm inclined to put the blame on MySQL, which BOINC is currently bound to. It might get better with another database system, but you would also need to change BOINC code for that (David Anderson expects some 50 lines). None of us, nor any other project I know of, has the time to actually do this and test it, especially as it is anything but certain that you would gain anything from it. Not to mention e.g. migrating the database system in a running system.

Doubling the deadline also roughly doubles the size of the DB - we did this once about a year ago.
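As a rough back-of-the-envelope model of why this happens, one can treat the number of rows as the rate at which results are created times how long each one stays in the DB. The sketch below assumes a steady state and that the time in the DB scales with the deadline, which is a simplification; the function name and the numbers are purely illustrative and not project figures.

```python
def rows_in_db(results_per_day, days_in_db):
    """Steady-state row count: creation rate times time spent in the DB."""
    return results_per_day * days_in_db


# Illustrative numbers only.
base = rows_in_db(results_per_day=100_000, days_in_db=14)
longer_deadline = rows_in_db(results_per_day=100_000, days_in_db=28)
half_size_wus = rows_in_db(results_per_day=200_000, days_in_db=14)

print(longer_deadline / base)  # doubling the time in the DB
print(half_size_wus / base)    # halving the WU size doubles the rate
```

In this simple model both changes have the same effect on the row count.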

We already modified the scheduler to give shorter Tasks to slower machines. However, our modifications only shift probabilities; you can't completely avoid giving a long Task to a slow machine or vice versa.
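What "shifting probabilities" can look like is sketched below in Python with a weighted random choice. This is not the actual Einstein@Home scheduler modification; the function name, the two task lengths and the `bias` value are assumptions made for the example.

```python
import random


def pick_task_length(host_is_slow, bias=3.0, rng=random):
    """Pick a task length at random, weighted towards a suitable one."""
    # Slow hosts favour short tasks, fast hosts favour long ones, but the
    # other kind keeps a non-zero weight and can still be chosen.
    weights = [bias, 1.0] if host_is_slow else [1.0, bias]
    return rng.choices(["short", "long"], weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [pick_task_length(True, rng=rng) for _ in range(10000)]
print(picks.count("short") / len(picks))  # close to 0.75 with bias=3.0
```

With a bias of 3, a slow host gets a short task about three times out of four, so the occasional long task on a slow machine cannot be ruled out.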

We might be able to bind the data file to the host's download rate in a similar way, but I'm afraid that might also require changing e.g. the Workunit Generator. I'll take a look at that, but I doubt I'll get to it in the next few weeks. However, in the longer term - volunteer computing is a great concept, and the number of BOINC projects is continuously growing. It might be that with a dialup connection Einstein@Home is not the best way to contribute your computing power to bleeding-edge science. My crystal ball is not very clear, but at the end of the LSC S5 science run we'll hopefully have more data to analyze than the intermediate set we are using now, so the data files for the next run will more likely become larger.
