Web replica down

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454554471
RAC: 3561

RE: RE: RE: RE: I am

Quote:

I am not sure it's just a "stats" issue. I share my computer with SETI and have plenty of work from them. However, my downloads from E@H have dropped off quite a bit in the last few days. As of right now I have 2 tasks in progress. This is not the norm, so something is not quite right.

I am seeing the following in the online log, which I don't understand:

2013-03-06 16:20:12.3545 [PID=12285] Request: [USER#xxxxx] [HOST#6382800] [IP xxx.xxx.xxx.22] client 7.0.29
2013-03-06 16:20:12.3568 [PID=12285] [debug] [HOST#6382800] Resetting nresults_today
2013-03-06 16:20:12.3576 [PID=12285] [handle] [HOST#6382800] [RESULT#351302051] [WU#151592225] got result (DB: server_state=4 outcome=0 client_state=0 validate_state=0 delete_state=0)
2013-03-06 16:20:12.3576 [PID=12285] [handle] cpu time 0.000000 credit/sec 0.003894, claimed credit 0.000000
2013-03-06 16:20:12.3578 [PID=12285] [handle] [RESULT#351302051] [WU#151592225]: setting outcome SUCCESS
2013-03-06 16:20:12.4147 [PID=12285] [send] effective_ncpus 8 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2013-03-06 16:20:12.4147 [PID=12285] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2013-03-06 16:20:12.4147 [PID=12285] [send] Not using matchmaker scheduling; Not using EDF sim
2013-03-06 16:20:12.4147 [PID=12285] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-03-06 16:20:12.4147 [PID=12285] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2013-03-06 16:20:12.4147 [PID=12285] [send] work_req_seconds: 0.00 secs
2013-03-06 16:20:12.4147 [PID=12285] [send] available disk 87.30 GB, work_buf_min 0
2013-03-06 16:20:12.4147 [PID=12285] [send] active_frac 0.999992 on_frac 0.999911 DCF 0.897591
2013-03-06 16:20:12.4169 [PID=12285] Sending reply to [HOST#6382800]: 0 results, delay req 60.00
2013-03-06 16:20:12.4172 [PID=12285] Scheduler ran 0.069 seconds

What is the "matchmaker" comment about?


That's an interesting question, which I'm sure one of our esteemed (and technically adept) moderators will answer in due course.

But it has nothing to do with your downloads dropping off. You're not getting any new work because you're not asking for any new work. Look closer to home.

I am running on Linux x64. If I update E@H using the "update" button in BOINC Manager, I download new WUs. It almost seems as though the "automatic" update is not taking place when jobs are complete. I have looked at the various parameters on this site and do not see any that could affect automatic download of WUs.

What am I missing?

The settings for the work cache work like this in BOINC version 7:
"Computer is connected to the Internet about every: xx days" is a low-water mark.
"Maintain enough work for an additional xx days" forms a high-water mark.
BOINC will request enough work to reach low + high, then wait until the cache drops below the low-water mark again before asking for more.
So if you set it to something like 1 + 0.1, BOINC will always keep about one day's worth of work.
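
As an illustration, here is a minimal Python sketch of that two-watermark idea (hypothetical function and variable names, not the actual BOINC client code):

# Minimal sketch of BOINC 7's two-watermark work fetch.
# Hypothetical names; the real client logic is more involved.

def work_request_seconds(buffered_days, connect_every_days, additional_days):
    """How many seconds of work to ask the scheduler for."""
    low = connect_every_days                     # low-water mark
    high = connect_every_days + additional_days  # high-water mark
    if buffered_days >= low:
        return 0.0         # still above the low mark: request nothing
    # dropped below the low mark: refill up to the high mark
    return (high - buffered_days) * 24 * 3600

# With the "1 + 0.1" example above, the client keeps roughly a day's
# worth of work and tops up to 1.1 days once the buffer dips below 1:
print(work_request_seconds(1.2, 1.0, 0.1))  # 0.0 -- no request
print(work_request_seconds(0.8, 1.0, 0.1))  # 25920.0 -- about 0.3 days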

If you run more than one project you also have to consider resource share: you have probably run more Einstein work than SETI work recently, and now SETI is allowed to catch up.

Interesting. I am supporting S@H in addition to E@H. S@H had been down for 3 days (Fri, Sat, and most of Sun) for electrical upgrades. They came back up late Sunday, ran Monday, and were down again on Tuesday for their weekly admin functions. This gave E@H free rein over computer resources for several days. If a project is down because of admin/hardware upgrades, does it make sense to give them back their time? I thought that timesharing was based upon "online/available" status and that downtime was not factored in. Is my understanding correct?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2770501792
RAC: 913015

RE: RE: RE: 2013-03-06

Quote:
2013-03-06 16:20:12.4147 [PID=12285] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-03-06 16:20:12.4147 [PID=12285] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00

The settings for the work cache work like this in BOINC version 7:
"Computer is connected to the Internet about every: xx days" is a low-water mark.
"Maintain enough work for an additional xx days" forms a high-water mark.
BOINC will request enough work to reach low + high, then wait until the cache drops below the low-water mark again before asking for more.
So if you set it to something like 1 + 0.1, BOINC will always keep about one day's worth of work.

If you run more than one project you also have to consider resource share: you have probably run more Einstein work than SETI work recently, and now SETI is allowed to catch up.


Interesting. I am supporting S@H in addition to E@H. S@H had been down for 3 days (Fri, Sat, and most of Sun) for electrical upgrades. They came back up late Sunday, ran Monday, and were down again on Tuesday for their weekly admin functions. This gave E@H free rein over computer resources for several days. If a project is down because of admin/hardware upgrades, does it make sense to give them back their time? I thought that timesharing was based upon "online/available" status and that downtime was not factored in. Is my understanding correct?


Not really. What is counted is the work actually done, without any consideration of why work may not have been available from a particular project at a particular time. Balancing the resource shares between projects is something which is done by your computer: the various project servers don't share information about downtime between themselves - each project is entirely autonomous.
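
To illustrate the point, here is a rough Python sketch of client-side resource-share balancing (made-up numbers and a deliberately simplified rule; the real BOINC client uses a more elaborate debt/credit mechanism):

# Toy model: pick the project whose share of completed work lags its
# resource share the most. This is why work done while one project was
# down gets "paid back" to that project once it is reachable again.

shares = {"Einstein@Home": 50.0, "SETI@home": 50.0}
work_done = {"Einstein@Home": 0.0, "SETI@home": 0.0}

def next_project():
    total_share = sum(shares.values())
    total_work = sum(work_done.values()) or 1.0
    def deficit(p):
        return shares[p] / total_share - work_done[p] / total_work
    return max(shares, key=deficit)

# SETI is down for three "ticks", so Einstein gets all the CPU time...
for _ in range(3):
    work_done["Einstein@Home"] += 1.0

# ...then SETI comes back and is chosen until the shares balance out.
for _ in range(3):
    project = next_project()
    work_done[project] += 1.0
    print(project)  # prints "SETI@home" three times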

In this case, your computer requested zero seconds of work for both CPU and GPU - in other words, it asked for no work at all. The reasons for that will be entirely contained in your own computer: it may be possible to deduce them from the level of local logging that you were maintaining at the time, or you may have to enable extra debug logging and wait for it to happen again.
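
For reference, that extra logging is enabled through cc_config.xml in the BOINC data directory; the work_fetch_debug and sched_op_debug log flags make the client record why it is (or is not) asking each project for work. A minimal example (flag names as documented for the BOINC client; double-check them for your client version):

<cc_config>
  <log_flags>
    <!-- why the client does or does not request work -->
    <work_fetch_debug>1</work_fetch_debug>
    <!-- details of each scheduler RPC -->
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>

After saving the file, tell the client to re-read its config files from BOINC Manager's Advanced menu (the exact menu wording varies by version), or restart the client, and then watch the Event Log.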

TPCBF
Joined: 24 Nov 12
Posts: 17
Credit: 145310756
RAC: 14652

RE: - stats have been

Quote:
- stats have been dumped from the master DB now; it depends on the stats sites when they'll pick it up.

Thanks, great. BOINCStats already picked them up for me overnight (here in LA)...

Quote:
- work at UWM progresses faster than feared; we may have the replica working (or at least being worked on) later today.

Good to hear things are getting back to normal...

Ralf

Darth Beaver
Joined: 28 Jul 08
Posts: 49
Credit: 14208989
RAC: 0

Hi mate Scootty checked stats

Hi mate Scootty, checked stats today and yes, E@H has updated stats, but there seems to be a problem: I've lost my combined stats, and my SETI stats are not showing up - lost 250,000 points on SETI?? The E@H score is yesterday's. Hopefully you guys can fix it by the weekend?

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

RE: - BOINC offers three

Quote:
- BOINC offers three "schedulers", referred to as "old"/"array", "locality" and "matchmaker". On Einstein@Home we're using the array scheduler to send work for BRP(4) and FGRP(2) and the locality scheduler for GW (S6BucketLVE) work; the matchmaker isn't used. The log entry about it can safely be ignored.


Isn't matchmaker scheduling used in combination with homogeneous redundancy, where hosts are compared to each other to see if they match the required 'level of hardware' before work is sent out to them?

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

yay our Einstein@Home server

Yay, our Einstein@Home server status page is fully back up.

PC setup MSI-970A-G46 AMD FX-8350 8 core OC'd 4.45GHz 16GB ram PC3-10700 Geforce GTX 650Ti Windows 7 x64 Einstein@Home

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245217663
RAC: 12917

We are currently using the

We are currently using the replica hosted at the AEI for the Einstein@Home web pages. This means:

- the information displayed between roughly 3 and 6 AM CET will not be updated (while the DB is dumped to disk)

- all information retrieved from the replica is transferred (twice) across the Atlantic, which may increase latency.

- to limit I/O, the Pendings page has been modified to show only the sum of claimed credit. For individual tasks it refers to the Tasks page, filtered to Pending tasks.

- to make up for that a bit, the Tasks page now shows the number of tasks matching the current selection (State / Application) next to the "Previous 20 / Next 20" navigation. This is not as convenient as the newer web code we are using over at Albert@Home, but it should be better than nothing.

BM

ggesmundo
Joined: 3 Jun 12
Posts: 31
Credit: 18699116
RAC: 0

Thanks for the update. I

Thanks for the update. I especially like the counts; they save a lot of I/O compared with paging through the tasks just to get a count.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245217663
RAC: 12917

RE: Seems every time

Quote:
Seems every time something with the hardware on pretty much any of the DC projects goes tits up, it does so big time... :-(

Well, the replica failure is not such a big deal. The main problem was that it coincided with a much larger problem, independent of Einstein@Home, that tied up the manpower that would otherwise have been available to fix the E@H problem quickly.

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245217663
RAC: 12917

RE: Isn't the matchmaker

Quote:
Isn't matchmaker scheduling used in combination with homogeneous redundancy, where hosts are compared to each other to see if they match the required 'level of hardware' before work is sent out to them?

No, HR (homogeneous redundancy) works even with the classical "array" scheduler.

The "matchmaker" scheduler generates a "score" for each task in the array for the host in question, in order to pick the best matching task from the array (the scoring function must be supplied by the project). In contrast the "array" scheduler treats all task equally and just picks the first ones satisfying a couple of constraints (like disk space, computing time and possibly platform if HR is used).

BM

BM
