solling2 wrote: However the very first of those tasks somehow failed to read the wisdom file, even though it had been provided in the same directory. Maybe that task had started a second or two after downloading, before creation of the wisdom file began?
It takes a while for wisdom to take effect if there are existing tasks running.
My first wisdom made it slower and my second made it faster...
A quick spreadsheet and picture showing a plot of the time of completion (X axis) vs elapsed time (Y axis). Apologies for the wide screenshot... (it would be nice to have a wide screen)
This host, a Xeon E3 1225 v3 with Win7-64, is ~one hour faster with 1.08:
1.08: ~13,600 sec (with wisdom file; generation took ~35 min)
1.05: ~17,100 sec
4 FGRPB1 tasks at a time.
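For what it's worth, those rounded figures can be sanity-checked with a couple of lines of shell arithmetic:

```shell
# Quick check of the timings reported above (rounded figures from the post).
t_old=17100   # seconds per task, app 1.05
t_new=13600   # seconds per task, app 1.08 with wisdom file
saved=$((t_old - t_new))
echo "saved: ${saved} s (~$((saved / 60)) min)"   # 3500 s, about an hour
echo "reduction: $((100 * saved / t_old))%"       # roughly 20%
```

So "one hour faster" slightly understates it: about 58 minutes saved, or roughly a 20% reduction per task.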
Nice speedup!
So wisdom does not run automatically with this application? A bit out of my league otherwise.
Thanks!
rbpeake wrote: So wisdom does not run automatically with this application? [...]
I see you run Windows 10 on your hosts. Here's a procedure for generating the wisdom file. It's a fair amount of text, but the process itself is really quite simple.
1. Exit the BOINC client completely (stop it from running, via the command prompt or from the exit dialogue in the Manager).
2. Keep the host as idle as possible. You can put varying amounts of effort into this; the simple version is to close/stop/disable as many of the non-critical programs and services as possible. Disconnecting the internet connection will help calm down Windows 10's background activity.
3. Start a command prompt (press the WIN key, type "cmd", right-click cmd.exe or Command Prompt and choose "Run as administrator").
4. Go to the Einstein data directory. Type commands something like the following, depending on where the data directory is located on your host (e:\boinc_data\projects\einstein.phys.uwm.edu in this example):
e:
cd boinc_data\projects\einstein.phys.uwm.edu\
5. Type in this and press enter: dir fftwf-wisdom_FGRPB1_1.08_*
That will show you the exact name of the exe file that is used to generate the wisdom file. I believe it is fftwf-wisdom_FGRPB1_1.08_windows_intelx86.exe, but do check. Then use that exact name in the final command, which will start the generation of the wisdom file. Below is an example.
6. Type in this and press enter:
fftwf-wisdom_FGRPB1_1.08_windows_intelx86.exe -o FGRPB1wisdom.dat rib67108864
The process will start, and then for a long time nothing at all will happen in the command prompt window - no blinking cursor or anything. It may take close to an hour to finish. After that you will see a new command prompt line and a blinking cursor again, but there won't be any kind of information or report.
A fresh wisdom file will have been saved automatically in that Einstein project directory. You're all set. The FGRPB1 app should then be able to find the wisdom file automatically and use it with new tasks. After tasks have finished, you can check this from the task's stderr output on the website, for example
https://einsteinathome.org/task/670914773
There should be lines saying something like
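As an aside, for anyone doing the equivalent on a Linux host, steps 4-6 above collapse into a short shell function. This is only a sketch: the default data-directory path is an assumption (check where your own BOINC data directory actually lives), and the generator binary is located with the same wildcard as in step 5.

```shell
# Sketch only: the Windows steps above as a POSIX shell function for
# Linux hosts. The default data-directory path is an assumption.
generate_wisdom() {
    data_dir="${1:-$HOME/BOINC/projects/einstein.phys.uwm.edu}"
    cd "$data_dir" || return 1
    # Step 5: find the exact name of the generator binary.
    bin=$(ls fftwf-wisdom_FGRPB1_1.08_* 2>/dev/null | head -n 1)
    [ -n "$bin" ] || return 1
    # Step 6: generate the wisdom file. Expect no output at all for
    # up to an hour or so while it runs.
    "./$bin" -o FGRPB1wisdom.dat rib67108864
}
```

Whether you run it with BOINC shut down or under your normal crunching load is the subject of the discussion that follows.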
Thanks for the explanation, I will give that a try!
Edit: I can see the process working via the Windows Task Manager, so it works!
Richie_9 wrote: 2. Keep the host as idle as possible. [...]
How certain are you about this? How do you explain AgentB's plot earlier in this thread where he got the best result with wisdom2 which was determined while crunching was occurring? What about his comment that, "the logic being the wisdom tunes the FFT based on the running environment"? That sounds quite plausible to me.
I did my own experiment around the same time where I shut down BOINC and had nothing else running whilst doing the wisdom generation. The file produced was then used on the same host to run a beta task and the result was a slightly longer run time. I then tried again on a different (more modern) architecture and this time (unintentionally) created wisdom whilst BOINC was running at full bore. I had thought I'd shut it down - but hadn't. It was all reported here.
As a followup to that report, I should note that the very same wisdom file on the group of G3260 CPU hosts has resulted in all of them showing a similar modest speedup. I had intended to redo the wisdom generation on an otherwise idle host but haven't got a round tuit yet - they're still on backorder :-). Once AgentB had posted his logic, there didn't seem to be the same urgency :-).
As a result of Bernd's recent announcement about FGRPB1 being replaced with FGRP5, there's a bit more urgency for me. I have a lot of hosts with E6300/E6500 Pentium dual core CPU (Wolfdale series) and even more with Q8400 core2 quads (Yorkfield series). I set up a host with an E6300 CPU and an RX 460 graphics card with the beta app and determined wisdom with everything running - normal crunching load.
I transferred that wisdom to a couple of others (same hardware) as well. The first result is in and it's quite a decent speedup. The environment is an RX 460 GPU (2x) in an E6300 CPU host running 1 FGRP style CPU task with a 'free' core for GPU support. V1.05 tasks on these hosts take around 10-11 hours in the main. I have 1 completed and a couple of partly completed tasks and it looks like there will be close to a 1 hour improvement in crunch time - close to 10% speedup. It will take many more results over a range of hosts to know more precise figures. I now have five E6300 hosts running the V1.08 app and will be converting the remainder shortly.
I've also set up four Q8400 hosts with HD7850/R7 370 (Pitcairn series) GPUs to run the V1.08 CPU tasks. It will be a while before the first ones finish but, with the progress percentage reporting that seems more reliable in the new app, these seem destined to have a nice speedup as well.
Cheers,
Gary.
Gary Roberts wrote: How certain are you about this? [...]
I'm not certain at all. I must have missed AgentB's message, or it went through my head without me understanding the content. I was stuck on my own initial interpretation that a host shouldn't be doing anything in the background while creating a wisdom file. But creating the file under "real" crunching conditions sounds logical for sure, because those same conditions would then apply for the rest of the time, when the wisdom file is actually being used.
A bit of an update: this is actually a complicated problem, and it's not quite as obvious as I first thought.
There are quite a number of factors in play. From what I have read, the wisdom file is the result of solving a specific FFT problem (rib67108864) using different parameters and algorithms, and keeping whichever approach times fastest.
There are several possible bottlenecks: how fast memory can be moved in and out of the cache(s), processor speed, the size of the SIMD registers, and probably a bunch more.
I imagine that with one wisdom file we might use 100% of one resource and very little of another.
Rerunning the calculations "under load" would then skew the next wisdom towards the unused resources and away from those already being used "under load".
I found that if I reran the wisdom generation when the system was running efficiently (under load), it produced a wisdom that was significantly slower, and vice versa. It did not get faster with each iteration. I think it might be better behaved if you were running only one task.
I only really tested on the Xeon; if I had some more time I'd start up PCM and see what is really going on with each wisdom.
I'm seeing if running with the exhaustive option (-x) produces a super wisdom...
./fftwf-wisdom_FGRPB1_1.08_x86_64-pc-linux-gnu -x -o FGRPB1wisdom.5.dat rib67108864
but after 24 hours it hasn't exhausted itself yet!
So in summary: build it and see; if it is slower, build another one, then pick the fastest. If you want to get closer to reality, a goodish strategy might be to drop BOINC by one processor thread while running the wisdom generation, or to try running N wisdom generations simultaneously!
Good luck.
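AgentB's "build several and pick the fastest" bookkeeping could be sketched in shell. Everything here is hypothetical scaffolding: it assumes you record the elapsed seconds of tasks run under each candidate wisdom into one file per candidate (e.g. elapsed.1, elapsed.2, ...), then compare the averages.

```shell
# Hypothetical helper: given one file of elapsed-seconds per candidate
# wisdom, print the file whose tasks averaged fastest.
pick_fastest_wisdom() {
    best=""; best_t=""
    for f in "$@"; do
        # Average the recorded elapsed times (integer seconds).
        t=$(awk '{ s += $1; n++ } END { print int(s / n) }' "$f")
        if [ -z "$best_t" ] || [ "$t" -lt "$best_t" ]; then
            best="$f"; best_t="$t"
        fi
    done
    echo "$best ($best_t s avg)"
}
```

The winning candidate's file then gets copied over FGRPB1wisdom.dat in the project directory.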
AgentB wrote: I found that if I reran the wisdom generation when the system was running efficiently (under load), it produced a wisdom that was significantly slower [...]
Thanks very much for posting your experiences. It should save me a lot of stuffing around :-).
On my E6300 Wolfdales and Q8400 Yorkfields, I've calculated wisdom once in each case on a machine under normal crunching load - 2x GPU tasks (all AMD) and a single CPU task. I've deployed the appropriate wisdom file to a number of other hosts, similarly configured. A number of these already have completed tasks showing a very nice speedup of the order of 8-10% or so.
I certainly agree that generating wisdom whilst the host is under its normal crunching load seems to be the correct thing to do. Perhaps I've got really lucky with my choice of just running a single CPU task, irrespective of whether it's a dual core or a quad. That '1 CPU task' choice was for power/heat reasons since just about 100% of my hosts now have crunching GPUs and the CPU contribution is fairly minimal for the extra power used.
I've just checked four Q8400 hosts that have so far returned a 'wisdom assisted' result and all are showing around a 40 minute reduction in crunch time compared to what they were getting with the V1.05 app. Seems pretty good to measure once and deploy across all 'same CPU model' hosts.
Cheers,
Gary.
I reran wisdom generation while the host was under at least moderate load (crunching 5 of these tasks). Compared to the "no-load" wisdom, tasks seem to take only about 2 minutes longer now. Even if this difference remains observable later on, it still seems small.