Grid.org Project News: Weekly Status

News and project updates which haven't been added to distributedcomputing.info yet.


Post by Jwb52z » Sat Mar 18, 2006 4:06 am

UD - Robby Brewer
UD Employee

13 Mar 2006 22:01 Post subject: Grid.org status

Dear Members,

Here is the weekly status for Grid.org.

Cancer job - As you probably noticed, we had a small outage last week due to the number of results being returned for the new cancer jobs. These WUs are completing within a few hours each, probably because the new protein is less complex. This translates to a lot of disk space being used up very quickly. Because of this, I will need to roll up the results of these jobs much faster than previous ones in order to stay below the maximum disk space. Here is what I propose:

On Wednesdays I will submit a new cancer job. Once it has been activated, I will run the roll-up script, which should only affect the previous job. The roll-up script marks the previous job inactive as part of its processing. Once the script has finished, I will set the job back to active. This will allow outstanding results to be credited.

On Fridays, I will delete the older job whose results have already been rolled up, so any workunits still outstanding for it will not be credited. You will have had 2 days to return your results, which should be plenty considering the WUs complete within hours.

There may be a small window while the results are being rolled up during which newly returned results will be rejected, because the job is temporarily marked as inactive. Unfortunately, I see no way around this. The impact should be minimal. If you are concerned about losing a workunit, shut down your agent on Wednesdays until I post that the job has been reactivated.
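To make the credit rule in this proposal concrete, here is a toy model in Python. It is not UD's server code: the state names, the `result_credited` helper, and the step labels are hypothetical. The only rule it encodes is the one stated above, that a returned result is credited only while its job is active, which is why results arriving during the roll-up are rejected.

```python
from enum import Enum


class JobState(Enum):
    ACTIVE = "active"      # accepting results; credit is granted
    INACTIVE = "inactive"  # temporarily off while results are rolled up
    DELETED = "deleted"    # removed; outstanding WUs earn no credit


def result_credited(state: JobState) -> bool:
    """Hypothetical rule: a returned result earns credit only while its job is active."""
    return state is JobState.ACTIVE


# One weekly cycle as seen by the *previous* job, following the proposal above.
weekly_cycle = [
    ("Wednesday: new job submitted", JobState.ACTIVE),
    ("Wednesday: roll-up script running", JobState.INACTIVE),
    ("Wednesday: roll-up finished, job reactivated", JobState.ACTIVE),
    ("Friday: old job deleted", JobState.DELETED),
]

if __name__ == "__main__":
    for step, state in weekly_cycle:
        print(f"{step:45} credited={result_credited(state)}")
```

The only gap in credit before deletion is the roll-up step itself, matching the "small window" described in the post.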

Thank you for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices
Jwb52z
 
Posts: 997
Joined: Tue Aug 30, 2005 10:56 pm

Post by Jwb52z » Wed Mar 29, 2006 1:44 pm

UD - Robby Brewer
UD Employee

20 Mar 2006 16:22 Post subject: Grid.org status

Dear Members,

Here is the weekly status of Grid.org.

Cancer data - Things are running relatively smoothly right now. I hope the new cancer job process is working for everyone. As soon as we crunch through all of the new data against the current protein, Oxford would like us to crunch the new data against the previous protein as well. It will be interesting to see whether the number of aborts we are seeing goes down or stays the same.

Please note that UDMon users must add the new protein to the Proteins section of the ud_mon.ini file. If this is not done, you will see many aborts for WU 8581771 in the log; 8581771 is a protein, not a workunit. Please remind newer members of this if you see them post about the aborts. We do have a random abort problem, but it has nothing to do with WU 8581771. The line should look like this:

8581771=LF:AUR-B
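If you would rather script this edit than make it by hand, here is a minimal Python sketch. It assumes ud_mon.ini is a plain INI file with a `[Proteins]` section of `id=label` lines like the one above; the function name `add_protein` is hypothetical and not part of UDMon. Rewriting the file with `configparser` discards any comments in it, so keep a backup.

```python
import configparser
import io


def add_protein(ini_text: str, protein_id: str, label: str) -> str:
    """Return ini_text with protein_id=label present in the [Proteins] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written (no lowercasing)
    parser.read_string(ini_text)
    if not parser.has_section("Proteins"):
        parser.add_section("Proteins")
    # Adds the entry, or overwrites an existing entry with the same ID.
    parser.set("Proteins", protein_id, label)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # writes "id=label", no spaces
    return out.getvalue()


if __name__ == "__main__":
    sample = "[Proteins]\n1234567=LF:OLD\n"  # made-up existing entry for illustration
    print(add_protein(sample, "8581771", "LF:AUR-B"))
```

The colon inside `LF:AUR-B` is safe here: the parser splits each line on the first delimiter, which is the `=`.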

Thank you for your contribution.

UD - Robby Brewer
UD Employee

27 Mar 2006 18:46 Post subject: Grid.org status

Dear Members,

Here is the weekly status:

Cancer job - There is no new information to post at this time other than to restate that each week the previous cancer job will be deleted on Mondays instead of Wednesdays. This will give members running on slower machines a chance to return their outstanding workunits for credit.

Thank you for your contribution.

Post by Jwb52z » Tue Apr 04, 2006 12:16 am

UD - Robby Brewer
UD Employee

03 Apr 2006 14:53 Post subject: Grid.org status

Dear Members,

Here is the weekly status update for Grid.org.

Cancer data - I have deleted the previous cancer job, so no more credit can be given for its outstanding workunits. We continue to process a new job each week, which means things are going pretty well. The LigandFit aborts are continuing and will probably persist through the entire new batch of data that Oxford provided. We will see how things behave when we run this data against the previous protein, which will be the next task.

Thank you for your contribution.

Post by Jwb52z » Tue Apr 18, 2006 11:51 pm

UD - Robby Brewer
UD Employee

10 Apr 2006 18:22 Post subject: Grid.org status

Dear Members,

All is relatively quiet. I have put some disk space monitors in place, which should help prevent another disk space issue like the one we had last week.

Thank you for your contribution.

UD - Robby Brewer
UD Employee

17 Apr 2006 14:29 Post subject: Grid.org status

Dear Members,

There is no new information to share. We continue to crunch away at the new cancer data. A new job will be submitted Wednesday.

Thank you for your contribution.

Post by Jwb52z » Sun Jun 18, 2006 10:52 am

UD - Robby Brewer
UD Employee

05 Jun 2006 18:46 Post subject: Grid.org Status

Dear Members,

Here is the weekly status:

Cancer job - Oxford would like us to finish the current batch of data by the end of June. This means I will have to start submitting multiple jobs and rolling them up a bit sooner than we have been. Since these workunits process so quickly, even on slow machines, it should not affect anyone's ability to return results for credit, even if I shorten the credit window by a few days. I will post as I submit and roll up jobs.

Thank you for your contribution.

Post by Jwb52z » Fri Jun 30, 2006 12:37 pm

UD - Robby Brewer
UD Employee

28 Jun 2006 14:37 Post subject: Grid.org status

Dear Members,

Here is the (late) update for Grid.org. I apologize for the delay; some support issues have consumed much of my time recently.

Cancer job: I will be submitting a new job today that will actually consist of workunits we have already processed. The reason is that I must have something submitted before I do my result roll-up, or else members will get the "cannot connect..." message. I will then roll up the results of the current cancer jobs. These will be the final results for all the data Oxford has provided for us to crunch. They will analyze it and hopefully will not require a respin.

I will then look at submitting this same cancer data set against the previous protein. It may take a couple of days to get the job set up correctly, so please be patient.

Stats: The team stats did not update the other day, so I had to rerun the task manually yesterday. That rerun affected the stats task that was supposed to run last night, so I am rerunning it now. Everything should be caught up by tomorrow.

Connectivity: Some members have reported problems connecting to the grid.org portal (member services, forums, etc.), while other members are able to connect with no problem. We have not been able to identify what is happening here. I have not been able to reproduce the issue, and there have been no changes to the UD hardware or network. We are still looking into this; if anyone has any additional information, please let me know.

Thank you for your contribution.

Post by Jwb52z » Sun Jul 09, 2006 12:13 am

UD - Robby Brewer
UD Employee

Post subject: Grid.org / Cancer Job Status

Dear Members,

Here is the weekly status.

Cancer Job - Obviously, the WUs are taking too long to process. I tried to upload a new job as soon as possible after rolling up the last batch because many members were talking about leaving. As a result, I did not test how long the new WUs would take, only that they would indeed process. Apparently these long-running WUs have caused a harsher response than having nothing to crunch at all. I guess I can't win either way.

I have suspended the 600-ligand job so that no more of its WUs will be sent out, and I have uploaded a new job with 100-ligand WUs. I have not tested how long these will take. If they still take too long, we will reduce the number of ligands further. There is no built-in timeout in the previous 600-ligand WUs, so if you want to keep crunching them, you can. I will wait a reasonable amount of time before deleting that job so that credit can be given. If you do not want to continue, delete the WU and you will get one of the 100-ligand ones.

Rosetta Job - I think I mentioned this before, but some members have asked me about it recently. The HPF project we were working on is apparently done, and there are no more new batches of data available for us to work on. I have heard nothing about Grid.org participating in HPF2, if there is such a thing. The WUs that are currently active have all been processed before, so the work is redundant. I am not going to delete the job, though, until I am sure there will be no more HPF projects for Grid.org to work on.

Thank you for your contribution.

Post by Jwb52z » Sat Jul 22, 2006 2:40 am

UD - Robby Brewer
UD Employee

10 Jul 2006 15:20 Post subject: Grid.org Status

Dear Members,

Here is the weekly Grid.org status.

Cancer - After a few false starts, I think we have found a good ligand count at 100. These WUs take much longer than the previous ones, but it seems that even a slow machine should be able to crunch a single WU within one week. Unless I hear otherwise, we will stick with 100-ligand WUs.

It also appears that the aborts have gone away. Since nothing has changed other than the protein we are working on (as was also the case when the aborts started), I think this proves that the LigandFit application had an occasional problem with something in the last protein, rather than there being an issue with UD hardware. Since Oxford delivered the protein, they would have to be the ones to figure out what differs between the two proteins.

Regarding the 600-ligand job, I will let it spin for another week or two to give everyone a chance to return their results. After that, we will go back to the one-week cycle we were on previously, with the 100-ligand WUs.

Rosetta - This project is apparently complete. Since I do not know whether we will be doing any other Rosetta projects, I am going to leave this job running until I hear something official. This means that any Rosetta work is currently for points only; the actual results will be ignored. If I turn the job off, I am sure that many machines will get the flashing exclamation point, which would cause an uproar. Active members who read the boards can decide whether or not to set their profile to the cancer job only. Maybe some members think the Rosetta screensaver is cool.

Connection issues - We have had random connection issues recently. Restarting the load balancer last week appeared to fix the issue, but I noticed a few complaints today. I do not know if this is the same issue or something different. There is a request in to IT to check the load balancer again to see if it has some sort of chronic problem. In the meantime, I can only suggest retrying until you get the page you want; there is a 50/50 chance you will be directed correctly if the load balancer is acting up. Luckily, these are not agent connection issues; they are limited to the member web pages.

Thank you for your contribution.

Post by Jwb52z » Sat Jul 22, 2006 2:41 am

UD - Robby Brewer
UD Employee

17 Jul 2006 18:38 Post subject: Grid.org/Cancer Job Status

Dear Members,

Here is the weekly status.

Cancer Job - I received several PMs asking me to extend the credit window on the previous jobs, since they were taking so long. Even the 100-ligand WUs were taking a while for some members. I have submitted a new job with 50-ligand WUs to try to bring the processing time down. We will see how this works out.

We want all members to be able to participate regardless of the power of their machines. I will leave the current jobs active for at least another week to give time for results to come in. Note that the jobs must be set inactive as always while I roll up the results. Hopefully you will not have the bad luck of trying to return one of these large WU results while the job is inactive.

Please post in a couple of days to let me know how the 50-ligand WUs are working out.

Thank you for your contribution.

Post by Jwb52z » Wed Jan 03, 2007 4:54 pm

UD - Robby Brewer
UD Employee

02 Jan 2007 15:44 Post subject: Grid.org Status

Dear Members,

Here is the weekly status update.

Member Services - The issue we had over the holidays was due to a problem with a RAID controller on the member services server. Maintenance was performed on the RAID controller and the system has been brought back online. Although having no access to member services was an inconvenience, grid processing itself was not affected so there should be no lost work.

Team stats - Because of the hardware failure, team stats are a bit behind. I will begin rolling up the stats manually, but please be aware that this may take until tomorrow to finish. Stats must be rolled up one day at a time, and each day takes a couple of hours to complete.

I hope everyone had a safe and happy New Year. As always, thank you for your continued contribution.
