The truncation issue should be fixed now. I now see longer feature strings in the logfile where hosts are missing AVX. You should get work now if your CPU does support AVX.
There is evidence in the O1AS20 task list for one of AgentB's Linux machines that the truncation fix worked to enable AVX WU download by that machine.
The last non-AVX task on view there shows a 19 Feb 2016, 8:24:51 UTC send date, while several tasks sent from 19 Feb 2016, 12:01:23 UTC onwards are AVX.
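For anyone curious why truncation matters here, a toy illustration is below (the feature list and cut-off point are made up for the example, not the client's real values): if the reported CPU feature string gets cut off before the 'avx' flag, the scheduler has no way of knowing the host can run the AVX app.

```python
# Hypothetical CPU feature string as a host might report it (abbreviated, invented).
reported = "fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave osxsave avx f16c rdrand"
truncated = reported[:40]             # truncation bug: string cut off before "avx"

print("avx" in truncated.split())     # False -> host appears to lack AVX, gets SSE work
print("avx" in reported.split())      # True  -> full string lets the AVX app be sent
```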
Here is an updated version of the table showing GW tasks distributed per ~24 hour period.
Date and Time (UTC)    | Status page item | O1AS20-100T | Change | Comment
16 Feb 2016, 20:48:31  | Tasks total      |     746,736 |        |
16 Feb 2016, 20:48:31  | Tasks to send    |     741,997 |      0 |
17 Feb 2016, 21:05:01  | Tasks to send    |     738,475 |  -3552 | Initial cache fillup on hosts
18 Feb 2016, 21:35:02  | Tasks to send    |     736,549 |  -1926 | Hmmm.. these tasks chew sloooow
19 Feb 2016, 21:45:01  | Tasks to send    |     733,902 |  -2647 | Picked up a bit... New V1.04 app
20 Feb 2016, 21:00:01  | Tasks to send    |     731,502 |  -2400 | Looks 'steady state' now
21 Feb 2016, 21:50:02  | Tasks to send    |     728,666 |  -2836 | Up a bit more...
A new version (1.04) of the app appeared just a few hours ago - around closing time on a Friday!! I hope they weren't doing that remotely from a bar somewhere :-).
I aborted a number of the v1.03 X64 (SSE) tasks to get the AVX tasks started.
There was a panic requiring a restart during, or close to the completion of, two of these tasks (AVX1 and AVX2), and the details logged for them appear truncated. It could be as simple as: the tasks completed, the panic occurred and truncated the stderr_txt log files, and the results were then reported after the restart. I expect I may see some invalids on those two or related tasks.
It will be a few hours before more AVX will finish.
A couple of early observations as these AVX complete on this host. The AVX times for me are ~10% longer, and CPU temps ~5C higher.
Comparing 32 bit (v1.02) and 64 bit (v1.03 and v1.04) - 64 bit is much (at least 25%) faster.
Aside: I notice most tasks have unsent "wingmen", and there is a growing number of tasks to validate - which have no quorum tasks sent as yet. I guess this is to be expected in the beta stage where tasks are many and wingmen, few.
This is what happens with GW type runs. It's due to the way locality scheduling (LS) works. The large data files are arranged in steps of 0.05Hz. To crunch a single task you need a number of these data files covering a frequency range. Double this again for the two detectors (h1 and l1). If you got tasks 'at random' your data download would be enormous, potentially a full set of data for each task. LS is vital for people who have low monthly bandwidth caps.
Once you have a set of data, you can get lots (up to thousands) of tasks that will reuse the same data. When that particular frequency set is finished, you will get just the extra 2 files to extend the range by a further 0.05Hz so that a whole slew of new tasks becomes available that will still be using some files you already have. This all works quite well when there are not too many frequency bins and lots of computers to provide quorum partners.
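To make the reuse idea concrete, here is a toy sketch of that preference (purely illustrative: the file naming, the 4-file range per task and the pick_task helper are invented for the example and are not the real scheduler code):

```python
# Toy model: data files come in 0.05 Hz steps for two detectors (h1, l1) and a task
# needs a small contiguous range of them.  Names and values here are invented.
def files_for_task(base_freq, n_steps=4, step=0.05):
    """Set of data files a hypothetical task starting at base_freq would need."""
    return {f"{det}_{base_freq + i * step:.2f}Hz"
            for det in ("h1", "l1")
            for i in range(n_steps)}

def pick_task(candidate_freqs, host_files):
    """Locality-style choice: the candidate needing the fewest new downloads."""
    return min(candidate_freqs, key=lambda f: len(files_for_task(f) - host_files))

host_files = files_for_task(100.00)                    # data already on the host
best = pick_task([100.05, 137.40, 52.15], host_files)
print(best, "needs", len(files_for_task(best) - host_files), "new files")   # 100.05 needs 2
```

Extending the range by one 0.05 Hz step only costs the two new files (one per detector), which is exactly the behaviour described above.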
The current situation is that there aren't all that many computers that have 'test apps allowed' and all of the available frequency bins are in play (all tasks in the tuning run are in the database). The scheduler has all frequencies to choose from so hosts will be 'scattered thinly' :-). This should change fairly soon when the app (hopefully V1.04) is deemed ready for prime time.
As a side note, the transition from V1.00 -> 1.01 -> 1.02 -> 1.03 -> 1.04 has been quite rapid. The last transition was for a problem in the science code whereas the earlier ones were to fix operational problems. If anyone has earlier version tasks that have not yet started, the best action would be to abort them so that they could be reissued as the current version. I don't think it's useful to waste the time on a version with an already known problem.
I've just found an example of a direct comparison of the X64 and AVX Linux apps on a single host of mine. The tasks of interest are at the bottom of the page. The link should work, as is, for the next 24 hours at least, since the host won't get new work for the moment. Of the (currently) 4 completed tasks there are two examples for each app type.
The CPU is an i3-3240. As you say, AVX is taking 10% longer. That's pretty disappointing. Hopefully something can be done about that.
EDIT: I managed to get two machines mixed up. The one above is an i3-4130 (Haswell and not Ivy Bridge) and it's continuing to draw new tasks so the 4 comparison tasks are partly on the second page now. There are also further AVX results continuing to show the longer times :-(.
The i3-3240 that I thought I was looking at started with the AVX app from the beginning and the crunch times are just slightly slower than those of the Haswell. I thought there would be more of a difference between Ivy Bridge and Haswell. It's got some FGRPB1 to finish before it gets back to GW tasks.
I have this host which got SSE2 (32bit) tasks for a while until the 64bit detection worked. The speedup is not 25% on this old clunker but nevertheless it's still quite impressive (102K -> 88K).
The CPU is a single core AMD Sempron and, through a BIOS trick, it's been turned into a dual core. I had retired it quite a while ago but I thought it would be good to run it again for testing purposes on the new GW run.
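Taking the two quoted figures as run times in the same units, the implied speedup on that host works out to roughly 14% (versus the 25%+ seen on newer CPUs):

```python
# Speedup implied by the 102K -> 88K run times quoted above (the units cancel out).
old, new = 102_000, 88_000
print(f"{(old - new) / old:.0%} faster")   # ~14%
```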
The AVX slowdown is even worse on my FX-8320E. The 1.03 X64 tasks took between 18 and 19 hours, which is already much longer than expected. The 1.04 AVX tasks running right now seem to be heading for the 23 hour mark.
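For comparison with the ~10% figure above, a rough calculation from those times (taking 18.5 hours as the midpoint of the X64 range) puts the FX-8320E slowdown at around 24%:

```python
# Rough AVX-vs-X64 slowdown on the FX-8320E, from the run times quoted above.
x64_hours = (18 + 19) / 2      # midpoint of the 18-19 hour range
avx_hours = 23
print(f"{(avx_hours - x64_hours) / x64_hours:.0%} slower")   # ~24%
```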
Looks like the O1AS20-100T apps may now be out of test status? Either that or something really, really big has been added to the beta test crunchers. Note the sudden dramatic drop in the remaining 'tasks to send' that has occurred in the last 24 hours.
Date and Time (UTC)    | Status page item | O1AS20-100T | Change | Comment
16 Feb 2016, 20:48:31  | Tasks total      |     746,736 |        |
16 Feb 2016, 20:48:31  | Tasks to send    |     741,997 |      0 |
17 Feb 2016, 21:05:01  | Tasks to send    |     738,475 |  -3552 | Initial cache fillup on hosts
18 Feb 2016, 21:35:02  | Tasks to send    |     736,549 |  -1926 | Hmmm.. these tasks chew sloooow
19 Feb 2016, 21:45:01  | Tasks to send    |     733,902 |  -2647 | Picked up a bit... New V1.04 app
20 Feb 2016, 21:00:01  | Tasks to send    |     731,502 |  -2400 | Looks 'steady state' now
21 Feb 2016, 21:50:02  | Tasks to send    |     728,666 |  -2836 | Up a bit more...
22 Feb 2016, 21:05:02  | Tasks to send    |     707,949 | -20717 | Open the floodgates...?
22 Feb 2016, 21:30:01  | Tasks to send    |     707,300 |   -649 | In 25 mins ie. ~38,000/day
The 'drop' seems to be accelerating so get in quick before they're all gone :-).
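For anyone who wants to check the "~38,000/day" figure, it follows from scaling the 649-task drop over 25 minutes up to a full day (assuming the rate in that window is representative):

```python
# Extrapolate the 25-minute drop from the table above to a daily rate.
tasks_dropped, minutes = 649, 25
per_day = tasks_dropped * (24 * 60) / minutes
print(f"{per_day:,.0f} tasks/day")   # ~37,400, i.e. roughly 38,000/day
```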
You cite pretty good evidence at the macro level. At a micro level, I just looked at my resumption on my Westmere, and realized my quorum partner (the same for all six tasks, naturally) never got a single O1AS20-100T task until 22 Feb 2016, 10:00:07 UTC, and has subsequently gotten 134.
I predict an epidemic of deadline misses starting 5 days from now.
The first pulse will be from 5-day deadline O1AS20 tasks issued today, the next from 7-day deadline work, then a more diffuse cloud as longer-deadline work - on Gamma-Ray Pulsar Binary 1 or from other projects - pre-empted by the priority given to the O1AS20 work, also fails to meet deadline.
In other words, I think the work content for these is still underestimated, so over-fetch will happen on hosts set for big caches which have stabilized on the previous work. That will interact with the short deadlines (yes, longer than they were at the start) to give a bit more of this sort of commotion than usual. The good news is that if the host goes into deadline protection priority mode quickly, the DCF will boom up after the first completion, which will snub the excess fetching.
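As a rough sketch of that over-fetch mechanism (all the numbers below are made up for illustration, and real BOINC clients are more involved than this):

```python
# A host fetches work using (server estimate * DCF); if tasks really run much longer,
# the first completion bumps DCF and the queue suddenly looks far bigger than the cache.
cache_days     = 4.0     # hypothetical work-buffer setting
estimate_hours = 6.0     # hypothetical per-task estimate before any completions (DCF = 1.0)
actual_hours   = 20.0    # what the tasks really take on this host

fetched = int(cache_days * 24 / estimate_hours)     # 16 tasks queued on the old estimate
dcf = actual_hours / estimate_hours                 # DCF jumps to ~3.3 after one finishes
queue_days = fetched * estimate_hours * dcf / 24    # ~13 days of work now in the queue
print(fetched, round(dcf, 2), round(queue_days, 1))
# With 5-7 day deadlines the host has to run in high-priority mode and stop fetching,
# and later-issued short-deadline tasks are the ones most likely to miss.
```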
Regarding the apps being out of test status - that sounds correct to me: I've not enabled testing but received my first O1 WU today.