Forgive my naiveté, but I only visit the message boards occasionally.
I run Einstein and Rosetta at 10% each, and SETI at 80%. After recently installing the latest version of BOINC, I checked the status of some of the projects. The SETI task is expecting 28 hours of work to be done by 2/14, yet Einstein is expecting 61 hours to be done by 11/23. While I understand independent projects can make different technical decisions regarding workload, somehow this just seems out of whack. In addition, a similar Einstein task on another computer had 6 hours remaining with one day left, causing BOINC to give it high priority in order to meet the deadline.
It feels like Einstein is not playing "fair." Is there some sort of adjustment I can make to remedy this? I’d rather not have to drop out in order to restore balance.
Bob
Copyright © 2024 Einstein@Home. All rights reserved.
Why are deadlines so *short*?
Bob,
You bring up the flip-side of the discussion currently going on over at SETI. If you look at this thread on the SETI boards you'll see me talking about this stuff with Ned Ludd. The reality is, SETI's deadlines are set so long as to not have to shift into EDF (Earliest Deadline First). This is because, as Ned put it, people would throw fits about EDF. The SETI team decided to make deadlines so lax that EDF would only happen in very rare cases or low resource allocation. You noticed EDF with the Einstein unit that ran exclusively for a while. Looked at over a long period of time, the resource shares are still honored, it just won't look that way if you look at only the short-term.
I've stated there that I think the deadlines are too long. I have stated here that I think the deadlines are a tiny bit too short, like 16-18 days would be good rather than 14 days. Deadline pressure here will be helped when they get optimization into the current apps, assuming you have a processor that will support SSE optimization (I don't know if they are going to do FPU optimizations).
At any rate, 61 hours cannot be done in 14 days at a 10% allocation. If the system is running 24x7, 61 hours of CPU time needs 610 hours of actual time, which is 25.42 days. To just squeak in without having to go into EDF, you'd need to increase the resource allocation for Einstein to 20% (the bare minimum is 18.2%, but I bumped it up to allow for both downtime and the system being used for things other than BOINC for at least a little while). Alternatively, you could just not worry about EDF and realize that it's there for a purpose. For a period of time after the EDF, BOINC will increase allocations to the other projects to compensate for the extra time that was given to Einstein to meet the deadline.
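For anyone who wants to redo that arithmetic with their own numbers, here is a rough sketch. It assumes the machine runs BOINC 24x7 and that the resource share translates directly into a fraction of CPU time; the function names are made up for illustration and are not part of BOINC.

```python
def wall_clock_days(cpu_hours, share):
    """Days of real time needed to deliver cpu_hours at a given resource share.

    Assumes 24x7 operation and that `share` (0..1) is the fraction of
    CPU time the project actually receives.
    """
    return cpu_hours / share / 24


def min_share(cpu_hours, deadline_days):
    """Smallest resource share (0..1) that finishes cpu_hours by the deadline."""
    return cpu_hours / (deadline_days * 24)


cpu_hours = 61       # estimated CPU time of the Einstein task
deadline_days = 14   # current Einstein deadline

needed_days = wall_clock_days(cpu_hours, 0.10)
required = min_share(cpu_hours, deadline_days)

print(f"At a 10% share: {needed_days:.2f} days of wall-clock time")
print(f"Minimum share for a {deadline_days}-day deadline: {required:.1%}")
```

Real hosts have downtime and other work competing for the CPU, so treat the minimum share as a floor rather than a target.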
Brian
RE: To just squeak in
Thanks for the reply and analysis that I was too lazy to perform.
So stated another way, Einstein is essentially ignoring the time allocation I've set, and relying on EDF to compensate? Seems like a poor design decision for a system that is relying on the goodwill of users in the first place.
Bob
RE: So stated another way,
BOINC does rely on EDF to compensate when the user's resource share is too small. The functionality is built into BOINC, not into each individual project. Its goal is to try to ensure that results make it back in before the deadline.
The best thing for me to tell you to do is to read this information about the work scheduler. This may help explain why things happen the way that they happen.
For Einstein S4 and S5R1, the results were much shorter running than what they are now. As such, your lower allocation was fine. When Einstein S5R2 came along, computation times of the results increased dramatically. It's at that point where they should've dealt with the deadlines better, making them longer, like 4 weeks. Instead, they bumped up to 3 weeks for the deadline. This still was not really enough for some systems, so there were many reissued results because hosts couldn't make the deadline.
Now with S5R3, it appears that the team thought that since the results were roughly 1/3 the overall "size", they should take 1/3 the runtime as S5R2 took. This would've gotten most hosts back down under the original 2 week deadline, so the deadline was dropped back to 2 weeks. The only problem was that the runtimes for S5R3 didn't end up being 1/3, but more like 1/2. Bernd could come up with some better numbers, but that's what it looks like to me. My system would take about 22-25 hours for the large S5R2 units. It now takes me 10-13 hours for S5R3, so that's roughly 1/2.
Actually, it is a very good design once you understand the purpose behind it... Take a look at the information about the work scheduler, but keep some coffee (or tea/soda) on hand in case you get bored... :)
Brian
RE: Seems like a poor
Oh, and BTW, SETI used to routinely send my system results that were due in 4 days in the middle of other results that were due in 2-3 weeks. This caused my system to shift into EDF fairly frequently: I had too large a work cache and only allocated 66% to SETI, so BOINC figured out that it wouldn't finish all of the results in 4 days...
I guess what I'm trying to say is, don't sweat EDF (or whatever they are calling it these days... "overcommitted", I think). It will all work itself out. It won't be 100% lockstep when looked at short-term, but your percentages will be honored over time... Also, like I said, the deadline pressure here will be helped by optimization. That said, I do think that the deadlines here could be increased to 16 days instead of 14 and it would help some...
Thanks, Brian. I guess I'll put up with EDF for now, but I would also vote for increased deadlines.
After developing software for many years, I prefer to set deadlines that I can actually keep without scrambling around at the end like EDF does. Mechanisms like EDF have their own impact. And it's okay to be early. In addition, I like to set reasonable tolerances so I only have to manage exceptional cases when they're really important. It's an engineering tradeoff between precision and time/effort. If the science can tolerate it, make the deadline 28 days and let the system reach equilibrium on its own.
Enuff grumping.
Bob
Another limiting factor for long deadlines is the amount of pending credit that users will be willing to tolerate. Whatever you think about it, it's a fact that many users simply hate waiting several weeks, sometimes months, to get a result's credits.
To get a preview of how optimization will improve the situation, take a look at the discussion of the MacOS beta app 4.10. This app uses vectorized SSE code in a critical hotspot part of the app. The top performing Mac Pro on E@H is now completing workunits in 5 to 7 CPU-hours with this app.
If your PC supports SSE (AMD since Athlon XP or Intel since PIII), this will help to meet the deadline even on smaller resource shares in the future.
CU
H-B
So the Mac users have an optimized app already? Cool! When will other people get it, do you know anything? I can hardly wait (even though I guess I'll have to wait longer for a FreeBSD app).
RE: After developing
You may not react favorably to this at first, but I'm just going to be honest with you...
It's actually your own setting that is causing EDF. That said, it's not exactly entirely your fault. S4 and S5R1 were much faster, so what you have worked fine up until S5R2. To completely avoid EDF, either your allocation needed to increase or the deadlines needed to increase, or perhaps both things needed to happen.
Also, it is only a "problem" if viewed over a short span of time. If you look at it over months of time, the overall percentages that you specify are maintained.
As Ned pointed out, people just don't like it. It's a perception thing...
RE: Another limiting factor
Yep. That's a big discussion at SETI too. I don't have a huge amount of pending, only about 1000-1500. As I stated in the SETI thread though, I'm not looking at it only for credits. Keeping the results around for much longer has an impact on database size. When they reduced the IR from 3 to 2, that helped reduce the size of the database, but it has to be partially offset by the results hanging out there for so long. The oldest pending I have over there was from the first week in October. There are some that still have pendings from late August and early September.
Ah. Peanut must've gotten a new datapack. When I looked it was 6-8 hours. I just picked up a new datapack and it seems like it will be slightly shorter on my system as well...
RE: So the Mac users have
Only Bernd can answer this, of course, but OS X on Intel was selected for the test because it's the best in terms of stability at E@H. Once the few remaining issues are worked out for the Windows and Linux apps, those platforms will follow (not necessarily in this order).
CU
H-B