DCF disabled :-(

Ananas
Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0
Topic 197933

Crappy BOINC change (7.1.18):

> client: if project sends dont_use_dcf, set its DCF to 1.

I just upgraded to 7.4.36 (from 5.10.28), and unfortunately they seem to have set the default to 1: if a project does not enable DCF, the client will not use it. Einstein does not send dont_use_dcf, but my client still shows a value of 1 there (as it does for all my current projects).
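To make the quoted changelog rule concrete, here is a minimal Python sketch of how a client could treat a per-project DCF under it. It is a simplified model, not BOINC's actual client code: the `Project` class and its method names are hypothetical, and it assumes the runtime estimate is simply the raw estimate multiplied by the DCF while DCF is in use.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical, heavily simplified per-project client state."""
    name: str
    dcf: float = 1.0             # duration correction factor (starts at 1)
    dont_use_dcf: bool = False   # last value seen in a scheduler reply

    def apply_scheduler_reply(self, sends_dont_use_dcf: bool) -> None:
        # Changelog rule (7.1.18): if the project sends dont_use_dcf,
        # set its DCF to 1.
        self.dont_use_dcf = sends_dont_use_dcf
        if sends_dont_use_dcf:
            self.dcf = 1.0

    def estimated_runtime(self, raw_estimate_s: float) -> float:
        # Assumption: with DCF in use, the raw estimate is scaled by the
        # project's DCF; otherwise the raw estimate is used unchanged.
        if self.dont_use_dcf:
            return raw_estimate_s
        return raw_estimate_s * self.dcf


# A project that does not send the flag keeps scaling estimates by its DCF.
einstein = Project("Einstein", dcf=1.743956)
einstein.apply_scheduler_reply(False)
print(einstein.estimated_runtime(10_000.0))

# A project that sends the flag has its DCF reset (1.8 is a made-up value).
albert = Project("Albert", dcf=1.8)
albert.apply_scheduler_reply(True)
print(albert.dcf, albert.estimated_runtime(10_000.0))  # 1.0 10000.0
```

Note that in this model a freshly attached project also starts at 1.0, so a value of exactly 1 on its own does not prove that a project has disabled DCF.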

As user voices are hardly heard by the BOINC devs... maybe you could get them to fix that bug.

Edit: It seems to be fine; it just took much longer than usual and I was impatient. It is still the default on the server side, but I can fix that with a small patch on the client side.

Claggy
Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

DCF disabled :-(

Set dcf_debug and post some output.

I'm running BOINC 7.4.36, and Einstein still uses DCF here. From client_state.xml:

Quote:
<duration_correction_factor>1.743956</duration_correction_factor>

Edit: and on the website: Task duration correction factor 1.743956
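If you want to check the value yourself instead of waiting for the website to catch up, a few lines of Python can pull the per-project DCF out of client_state.xml. This is a sketch under assumptions: it assumes the file parses as XML and that each attached project appears as a `<project>` element with `<master_url>` and `<duration_correction_factor>` children; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def dcf_by_project(client_state_xml: str) -> dict:
    """Map each project's master URL to its duration correction factor.

    Assumes one <project> element per attached project, each holding
    <master_url> and <duration_correction_factor> children.
    """
    root = ET.fromstring(client_state_xml)
    result = {}
    for project in root.iter("project"):
        url = project.findtext("master_url", default="?")
        dcf = project.findtext("duration_correction_factor")
        if dcf:  # skip projects without a (non-empty) DCF element
            result[url] = float(dcf)
    return result


# Made-up minimal document using the value quoted above.
sample = """<client_state>
  <project>
    <master_url>http://einstein.phys.uwm.edu/</master_url>
    <duration_correction_factor>1.743956</duration_correction_factor>
  </project>
</client_state>"""
print(dcf_by_project(sample))  # {'http://einstein.phys.uwm.edu/': 1.743956}
```

For a real file you would read it first (for example `open(path).read()`, where `path` depends on your BOINC data directory) and pass the text in.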

Claggy

Ananas
Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

sorry, I edited my starting

Sorry, I edited my starting post... it does work, but it took several updates before it showed up on the host's status page here. Thanks :-) Maybe a validated result is needed before the change is taken over; I expected it to be taken over on each contact.

I disabled the flag completely in my client so all projects will use DCF again.

Claggy
Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

According to your last

According to your last scheduler contact, DCF is being used (the time is UTC, so that was over three hours ago):

http://einstein.phys.uwm.edu/host_sched_logs/11723/11723146

Quote:
2015-01-18 12:03:15.9447 [PID=6562] Request: [USER#xxxxx] [HOST#11723146] [IP xxx.xxx.xxx.110] client 7.4.36
2015-01-18 12:03:15.9460 [PID=6562 ] [handle] [HOST#11723146] [RESULT#478757445] [WU#209304066] got result (DB: server_state=4 outcome=0 client_state=0 validate_state=0 delete_state=0)
2015-01-18 12:03:15.9460 [PID=6562 ] [handle] cpu time 55723.050000 credit/sec 0.005297, claimed credit 295.187298
2015-01-18 12:03:15.9469 [PID=6562 ] [handle] [RESULT#478757445] [WU#209304066]: setting outcome SUCCESS
2015-01-18 12:03:15.9534 [PID=6562 ] [send] effective_ncpus 16 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2015-01-18 12:03:15.9534 [PID=6562 ] [send] effective_ngpus 0 max_jobs_on_host_gpu 999999
2015-01-18 12:03:15.9534 [PID=6562 ] [send] Not using matchmaker scheduling; Not using EDF sim
2015-01-18 12:03:15.9534 [PID=6562 ] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2015-01-18 12:03:15.9534 [PID=6562 ] [send] work_req_seconds: 0.00 secs
2015-01-18 12:03:15.9534 [PID=6562 ] [send] available disk 20.55 GB, work_buf_min 30239
2015-01-18 12:03:15.9534 [PID=6562 ] [send] active_frac 0.999969 on_frac 0.999988 DCF 1.342297
2015-01-18 12:03:15.9580 [PID=6562 ] Sending reply to [HOST#11723146]: 0 results, delay req 60.00
2015-01-18 12:03:15.9590 [PID=6562 ] Scheduler ran 0.018 seconds
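The DCF the scheduler saw is on the `[send]` line of the log above. To check it across several contacts, a short Python sketch can pick it out. It assumes the field always appears as `DCF <number>` as in this excerpt; the function name is hypothetical, and fetching the log page is left out.

```python
import re
from typing import Optional

# Matches the DCF field on the scheduler's "[send]" line, e.g.
# "... active_frac 0.999969 on_frac 0.999988 DCF 1.342297"
DCF_RE = re.compile(r"\bDCF\s+(\d+(?:\.\d+)?)")


def dcf_from_sched_log(log_text: str) -> Optional[float]:
    """Return the last DCF value found in a scheduler log excerpt, or None."""
    matches = DCF_RE.findall(log_text)
    return float(matches[-1]) if matches else None


line = ("2015-01-18 12:03:15.9534 [PID=6562 ] [send] "
        "active_frac 0.999969 on_frac 0.999988 DCF 1.342297")
print(dcf_from_sched_log(line))  # 1.342297
```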

Claggy

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2960166065
RAC: 713613

That's very, very old news -

That's very, very old news - dating back to April 2012 at least.

I have four hosts running BOINC v7.4.36; their respective DCFs here at Einstein are:

1.8470
0.7083
0.6058
0.6071

To a large extent, that reflects the problem with a single DCF: the different hosts run a different mixture of Einstein applications, and each tends to settle - if allowed - at a different DCF value. If you run more than one application per project on a single host, DCF gets pulled in opposite directions, and can never stabilise.
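The "pulled in opposite directions" effect is easy to reproduce with a toy model. The Python sketch below assumes a simplified update rule: jump straight up when a task ran longer than the current DCF predicts, otherwise move only 10% of the way down. That mimics the fast-up/slow-down idea but is not copied from the BOINC client, and the two runtime ratios are made-up values for two apps of one project on one host.

```python
def update_dcf(dcf: float, raw_ratio: float) -> float:
    """Assumed, simplified DCF update after one finished task.

    raw_ratio = actual runtime / uncorrected estimate for that task.
    Increases are taken at once; decreases only 10% of the way.
    """
    if raw_ratio > dcf:
        return raw_ratio
    return 0.9 * dcf + 0.1 * raw_ratio


def simulate(ratios, dcf: float = 1.0) -> list:
    """Feed finished-task ratios through one shared DCF; return each new DCF."""
    history = []
    for r in ratios:
        dcf = update_dcf(dcf, r)
        history.append(dcf)
    return history


# App A takes 2x its raw estimate, app B only 0.5x; tasks alternate.
trace = simulate([2.0, 0.5] * 4)
print([round(d, 3) for d in trace])
# [2.0, 1.85, 2.0, 1.85, 2.0, 1.85, 2.0, 1.85]
```

With this rule every app A task resets the shared value to 2.0, so app B's tasks are always estimated at roughly four times their real runtime and the DCF never settles.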

DCF takes one value per project. The fact that your other projects show values of exactly 1 should not affect Einstein: if it does, or if Einstein remains at exactly 1, that would indeed be a bug. But be careful not to confuse Einstein with Albert, which does send dont_use_dcf.

The attempt to make a multi-valued analog of DCF dates back even further than 2012, to documents like Runtime Estimation from 2010. In my estimation, the replacement host_app_version APR mechanism works tolerably well - at least as well as DCF - in the limited range of situations for which it is suitable. I have spent a significant part of the last five years pointing out the circumstances - non-deterministic runtimes, initial launch of a new application, initial attachment of new hosts - where it fails. Those, IMHO, are bugs: the replacement of DCF per se (again IMHO) isn't.
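For contrast, here is a minimal Python sketch of the idea behind a per-host_app_version average processing rate (APR): keep one rate per app version, so each app's estimate settles on its own instead of fighting over a single project-wide value. It is an illustration under assumptions, not the server's code: the class and method names are hypothetical, a plain mean stands in for whatever weighting the server actually uses, and BOINC's fallback for versions with too few samples is not modelled.

```python
from collections import defaultdict
from typing import Optional


class AprEstimator:
    """Hypothetical per-app-version runtime estimator for one host.

    Keeps a separate mean processing rate (estimated FLOPs per second of
    elapsed time) for each app version, instead of one DCF per project.
    """

    def __init__(self) -> None:
        self._rate_sum = defaultdict(float)  # app version -> sum of rates
        self._samples = defaultdict(int)     # app version -> finished tasks

    def record(self, app_version: str, fpops_est: float, elapsed_s: float) -> None:
        """Fold one finished task into that app version's average rate."""
        self._rate_sum[app_version] += fpops_est / elapsed_s
        self._samples[app_version] += 1

    def apr(self, app_version: str) -> Optional[float]:
        """Mean rate for this app version, or None if it has no history."""
        n = self._samples.get(app_version, 0)
        return self._rate_sum[app_version] / n if n else None

    def estimate(self, app_version: str, fpops_est: float) -> Optional[float]:
        """Estimated runtime in seconds, or None while no data exists yet."""
        rate = self.apr(app_version)
        return fpops_est / rate if rate else None


est = AprEstimator()
# Made-up numbers: same FLOP estimate, very different real runtimes.
est.record("app_A", fpops_est=1e13, elapsed_s=20_000.0)
est.record("app_B", fpops_est=1e13, elapsed_s=5_000.0)
print(est.estimate("app_A", 1e13))  # 20000.0
print(est.estimate("app_B", 1e13))  # 5000.0
print(est.estimate("app_C", 1e13))  # None: no history for this version yet
```

The `None` case corresponds to the situations listed above (a newly launched application, a newly attached host): until there is history, there is no rate to estimate from, and something else has to fill the gap.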

Ananas
Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

RE: ... DCF takes one value

Quote:
... DCF takes one value per project. ...

Agreed, this is a bug that was reported many years ago already.

Quote:
... host_app_version ...

If that is used as a DCF replacement when a project reply contains dont_use_dcf, it does not work so well.

Extreme example :

When I switched to 7.4.36, the estimated runtimes for Asteroids jumped to more than 6 days per WU (the actual CPU runtime on that box is ~6-7 hours), and the estimate did not adjust at all as results finished. This was a lot better in 5.10.28.

It is not a benchmark result issue, as those values did not change that much after the update.

Strange: when I switched off recognition of the dont_use_dcf flag, the estimate jumped up to 8.22 days, and when a result finished it went up even further. It looks very much as if they messed something up after 5.10.28, couldn't figure out what they had broken, and therefore replaced the whole DCF mechanism.

But sorry, it turned out that Einstein is not affected, so I guess I posted this to the wrong project :-/ I just knew from the past that the devs here and on PrimatesPirates know the Berkeley stuff best at source level, so this was the first place to visit. I lost track at some point and stopped following the change history. I might still be familiar with a few lines of code, but most have surely changed since my last SVN checkout.

Ananas
Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

RE: According to your last

Quote:
According to your last scheduler contact DCF is being used (The time is UTC, so that was over three hours ago):...


You're right: by the time I posted, it had already fixed itself, but I failed to check it again before posting.

P.S.: I checked immediately after posting, found that it had fixed itself, and edited my post while you were replying. I would rather have deleted the thread if I had been allowed to.

Ananas
Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Some of the mysteries did

Some of the mysteries have cleared up now. The remaining time reported by BOINCview and the remaining time reported by the BOINC GUI are totally different for some (but not all) projects.

And the remaining time shown for unstarted results in the BOINC GUI changes after each finished result, whereas in BOINCview it changes only now and then, and only if DCF is enabled.

The host where I run Einstein is a "headless cruncher", so I did not even start the BOINC GUI there.

I'm aware that the official position is that BOINCview does not work with core client versions above 6, but I like it and the basic functions seem to work, so I will not switch to BOINCtasks for now.

MarkJ
MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

RE: I'm aware that the

Quote:
I'm aware that the official reading is, that BOINCview does not work with CC versions above 6 - but I like it and the basic functions seem to work so I will not switch to BOINCtasks for now.

There is also BOINCtui if you're running Linux on that headless cruncher. Can't say I've tried it, though; I run BOINCtasks.

Ananas
Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

RE: There is also BOINCtui

Quote:
There is also BOINCtui if you're running Linux on that headless cruncher. Can't say I've tried it thought, I run BOINCtasks.


Nice find. I once ported some software from DOS 4.01 to SYSV using libcurses; it was a bit sluggish back then (compared to the DOS versions, where I accessed the hardware through an ASM/C mix), but I used only a small subset and kept a screen backup in program memory, so it was still fairly fast. Today's ncurses has probably become a lot faster.

But no: I work with AIX, SLES and RHEL, but at home I currently have only Windows boxes. I bookmarked the project site, though; I'm thinking about a small Avoton cruncher that might need Linux.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.