Grid.org Project News: Weekly Status

Postby Jwb52z » Wed Oct 26, 2005 1:10 am

UD - Robby Brewer
UD Employee

Dear members,

October 24th

Here is the weekly status...

New cancer data: We have received the respun data from Oxford and will be working to see if we get better results now. It will take at least a week to get the data converted, loaded, tested, etc. even if we have no issues, so do not expect this to be available this week.

Grid.org upgrade: We are in the process of acceptance testing for the Member Web portion of the new code drop. Hopefully, that will go GA (general availability) this week, and we can start our initial end-to-end beta testing on the new hardware next week.

Rosetta: We finished another Rosetta job over the weekend, which may have caused some agents to go into a backoff state. A new job has been uploaded and enabled so workunits should be available again.

Support via web: I saw at least one post on this over the weekend. As I mentioned in a previous post, support via the Help web page on Grid.org is broken. If you submit something via the Help web page, you will not get a response back. This should not be a huge issue since support is supposed to go through Member To Member Support on the forums anyway. If you require something specifically from UD support (such as a custom bulk installer) please PM me.

Thanks to everyone for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices
Jwb52z
 
Posts: 997
Joined: Tue Aug 30, 2005 10:56 pm

Postby Jwb52z » Wed Nov 02, 2005 2:16 am

UD - Robby Brewer
UD Employee

31 Oct 2005 16:47 Post subject: Grid.org status

Dear members,

Here is the weekly status...

New cancer data: The respun data is being evaluated currently. We hope to have a better status on this later in the week. I have received a few PMs expressing concern about the Rosetta data being completed and having no new data to work on. We are doing everything we can to ensure that the new Cancer data is available prior to the Rosetta data being exhausted.

Grid.org upgrade: The new Member Web product is still in acceptance testing. There were a couple of issues found that needed to be addressed so we did not get as far as I would have liked. Finding bugs is progress though. We will keep testing until everything works correctly and we have a solid system.

Rosetta: These jobs are being blown through with great results. There are a few more weeks of data left, so enjoy it before it goes away.

Support via web: Since there was a complaint about support in a thread and some PMs over the weekend, I will mention this again. Support via the Help web page on Grid.org is broken. If you submit something via the Help web page, you will not get a response back. This should not be a huge issue since support is supposed to go through Member To Member Support on the forums anyway. If you require something specifically from UD support (such as a custom bulk installer) please PM me.

Geographical stats: There are apparently a few country flags that are not correct. This has probably been the case for the last few years, since the software has not been updated in that time. I will try to dig into this and see what needs to be done to remedy the problem.

Team stats: There was some sort of an issue on October 24 that caused missing team stats for that day. I have our DBA looking into this issue and as soon as we determine why the problem happened, we will correct the data. As some of you might remember, our stats rollup job did not run on time that day. This is an automatic function so something must have broken in the system.

Stats mismatch: I will re-mention this since it has come up a few times recently. The stats shown on the stats web page and the stats shown in the agent GUI (blue pill) are taken from different places in the database and are programmatically unrelated. The stats on the Grid.org web page are tallied every day and are the "correct" ones. The stats on the agent GUI are a running total that is never recalculated. It is possible for this number to be wrong. When in doubt, go by the number on the grid.org stats web page.
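
The difference between the two numbers can be illustrated with a small sketch. This is not United Devices code: the `Member` class, its field names, and the `agent_saw_it` flag are hypothetical and only model the idea that one total is recomputed from stored results while the other is a counter that is never recalculated.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    # Hypothetical record; the real database layout is not described in the post.
    results: list = field(default_factory=list)  # points per accepted result
    agent_running_total: int = 0                 # counter shown in the agent GUI

    def credit(self, points, agent_saw_it=True):
        """Store a result; the agent counter only moves if the update reached it."""
        self.results.append(points)
        if agent_saw_it:
            self.agent_running_total += points

    def web_total(self):
        """Daily tally: recomputed from the stored results every time."""
        return sum(self.results)


m = Member()
m.credit(100)
m.credit(50, agent_saw_it=False)  # one missed update is never corrected
print(m.web_total(), m.agent_running_total)
```

Once the running counter misses a single update it stays wrong, whereas the recomputed tally always reflects the stored results.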

Easy News: There is apparently an issue with the Easy News team and redeemed points. Other than the 24th of October stats issue that is still under investigation, all grid.org stats are up to date and assumed correct. The point redemption is not part of United Devices software and I cannot comment on that. If someone can explain to me that this is happening because of United Devices software, I will try to investigate this more.

Finally, I would like to mention that there were some posts and PMs that could be considered flames regarding United Devices communication on this forum. I agreed to post a weekly status of any outstanding issues as part of communication. I also answer every PM sent to me, and I respond to posts in the forum when I can. When possible, please remind other members in the forum to check Member News frequently as that is where the majority of communication comes from.

Thanks to everyone for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Sat Nov 12, 2005 9:53 pm

UD - Robby Brewer
UD Employee

07 Nov 2005 22:53 Post subject: Grid.org status

Dear members,

Here is the (late) weekly status...

New cancer data: Some progress has been made. The workunit processes without error and does produce limited output. We have an engineer working with this to make sure we are getting the results we expect.

Grid.org upgrade: The new Member Web product is still in acceptance testing.

Rosetta: For some reason, this latest batch of Rosetta data is proving to be quite intense. Workunits are taking many days to process. In order for members to get fair credit for their work, I will extend the cutoff day for results by at least a week, and more if it appears that we need it. This gives members who have less powerful machines a chance to still participate in the Rosetta project.

Geographical stats: These are incorrect in the new beta version of Member Web as well, so I have opened a request to have this fixed during the upgrade.

Team stats: There was some sort of an issue on October 24 that caused missing team stats for that day. Our DBA has told me this is because the stats task did not run on time October 24. The way the task is designed, if a day is missed for some reason, data for that day will be missing until the task is manually rerun. Everything should be caught up now.
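
As an illustration only, finding the days that need a manual rerun could look like the sketch below. The function name `missing_days` and the idea of keeping a set of dates that already have stats are assumptions; the actual stats task is not described beyond the paragraph above.

```python
from datetime import date, timedelta


def missing_days(first, last, days_with_stats):
    """Return every date in [first, last] that has no stats yet."""
    day, missing = first, []
    while day <= last:
        if day not in days_with_stats:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Hypothetical data: stats exist for the 23rd and 25th but not the 24th.
have = {date(2005, 10, 23), date(2005, 10, 25)}
gaps = missing_days(date(2005, 10, 23), date(2005, 10, 25), have)
print(gaps)
```

Each date returned is a day for which the stats task would have to be rerun by hand.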

Thanks to everyone for your contribution.

Postby Jwb52z » Tue Nov 15, 2005 3:26 am

UD - Robby Brewer
UD Employee

14 Nov 2005 16:56 Post subject: Grid.org status

Dear members,

Here is the weekly status...

Grid.org upgrade: Grid.org will be physically moving to a new facility in the next week. This will provide better reliability and is part of the Grid.org migration. There was a small outage this weekend due to some hardware maintenance that was performed in preparation for the facility move. Please note that next weekend there will be some additional outages. Grid.org must be completely powered down in order to move. This will include the forum, member web, and the grid itself. Since this is not a trivial outage, we cannot give an exact time that Grid.org will be down. The outage will take as long as it takes.

Rosetta: We have received a new batch of Rosetta data that we are in the process of uploading currently. Since the last batch is taking so long to process, we will continue to allow results to be uploaded for a while. Hopefully this new batch will be more in line with the previous batches.

New cancer data: Some progress has been made. The workunit processes without error and does produce limited output. We have an engineer working with this to make sure we are getting the results we expect.

Thanks to everyone for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Thu Dec 01, 2005 1:22 pm

UD - Robby Brewer
UD Employee

28 Nov 2005 21:39 Post subject: Grid.org status

Dear members,

Here is the weekly status...

Grid.org upgrade: The physical migration of Grid.org to the new facility has been completed. There were a few hiccups over the past few days, but everything should be running fine now. For those who care, the "cannot connect" errors that some members were experiencing were due to a problem with one of our border routers that has since been resolved. That is why some members were having no issues while others could not connect at all.

There was a comment about the new hardware on one of the member threads that was not correct. Grid.org was physically moved, but remains on the old hardware. We are not ready to move to the new hardware yet as we have not finished our acceptance testing of the new software release. Also note that the applications running on Grid.org will need to be ported to run on the new version of UD software before we can move to the new hardware as well. We had some maintenance done on some of the old hardware to reduce the possibility of another failure until we can move to the new hardware.

Rosetta: Rosetta workunits stopped dispatching for a while earlier today. This problem has been resolved and I verified that I am now able to grab a Rosetta workunit.

Stats: There appears to be a small problem with stats missing for yesterday. I will get the stats job resubmitted shortly so that should be cleared up by tomorrow.

New cancer data: I do not have any new information on this. With the holidays and the move, there has not been any additional work done on this since last week.

Thanks to everyone for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Thu Dec 08, 2005 1:47 am

UD - Robby Brewer
UD Employee

05 Dec 2005 16:19 Post subject: Grid.org status

Dear members,

Here is the weekly status...

Grid.org upgrade: Now that the move is complete, we continue to work on the new Member Web release. There have been many issues identified that are being fixed. Since the final release has not been delivered yet, that is holding up acceptance testing on the new hardware. It is a good thing that QA is being very thorough with this release and we expect it to be very solid.

Rosetta: There was an issue with the Rosetta job over the weekend. For some reason it quit dispatching. I had our DBA reset the job a couple of times, but it did not help. To resolve this issue, we had to reload the job. Everyone should be able to grab a Rosetta workunit now. I verified that I was able to grab one this morning.

Stats: Stats were rerun for the missing day and everything should be caught up now.

New cancer data: I do not have any new information on this. We are also looking for additional research to be run on Grid.org since Rosetta will be completed soon and the cancer issue is still outstanding.

Thanks to everyone for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Tue Dec 20, 2005 11:57 pm

UD - Robby Brewer
UD Employee

13 Dec 2005 15:22 Post subject: Grid.org status

Dear members,

Here is the weekly status...

Grid.org temporary outage: We experienced a temporary outage last night on grid.org. This was the unanticipated result of a change made to the web servers for our corporate web site, which is being updated. There was not an issue with any of the grid.org software or hardware. This was strictly a web server issue. There should be no reason that devices need to be reregistered.

Grid.org upgrade: Now that the move is complete, we continue to work on the new Member Web release. There have been many issues identified that are being fixed. Since the final release has not been delivered yet, that is holding up acceptance testing on the new hardware. It is a good thing that QA is being very thorough with this release and we expect it to be very solid.

Rosetta: A few members have had problems returning results for some of the huge Rosetta workunits. This is unfortunately due to the length of time it took to process the last batch of Rosetta coupled with the glitch we had last week that caused us to have to resubmit the Rosetta job. Hopefully this only affected a very few.

Thanks to everyone for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Wed Dec 21, 2005 12:02 am

UD - Robby Brewer
UD Employee

19 Dec 2005 15:24 Post subject: Grid.org Status

Dear Members,

There is nothing interesting to report this week. With the holidays approaching, all project status remains the same and will stay that way through the end of the year. Note that United Devices will be closed 12/23/05 - 01/02/06.

Please have a safe and happy holiday and we will see you back here after the new year.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Mon Jan 09, 2006 2:01 am

UD - Robby Brewer
UD Employee

03 Jan 2006 15:32 Post subject: Grid.org status

Dear Members,

I hope everyone had a happy new year. Here is the latest status:

Grid.org Outage - As I posted previously, we had a temporary outage due to a blown City of Austin transformer (a rogue squirrel is suspected). This outage should not have required any reregistrations. There was a temporary period where devices may have been "Backing off..." due to a high load after the servers became available, but this should no longer be occurring.

New Cancer Data - A small batch of the new cancer data has been uploaded and members are currently crunching away at it. This is a very small subset of the data we received, ~20 WUs, just so we can verify that we are getting the results we expect. I plan on sending some of the results to our Oxford contact today to verify these are the results they expect. Once that is confirmed, I will upload more data. There have been a few members complaining of occasional lost results. I am not sure how widespread this is yet nor the cause of the loss. Other members have reported complete success so we will need to investigate this more closely. Note that the old cancer job has been disabled so if you are working on the cancer project, you are crunching the new data.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Thu Jan 12, 2006 5:47 pm

UD - Robby Brewer
UD Employee

10 Jan 2006 20:49 Post subject: Grid.org status

Dear Members,

Sorry for the late status, but I wanted to finish a test before posting:

New Cancer Data - I have been running some tests on an internal system and believe that I have some good news. It appears from my testing that everything is working as expected. There was a false alarm last week when I reported that we were ready to send results to our Oxford contact. Upon further investigation it was seen that there were some missing output files. This problem has been resolved internally so I am ready to grab the results from the new test job that has been running on Grid.org to see if the results are the same.

I will need to run the result aggregation script which is always run after a job is complete. An effect of running the script is that the workunits will all be marked complete and dispatching will stop. This may result in a lost workunit result or two. I will re-enable the job as soon as the result script completes so there will be minimum outage.

Lost Workunits - There have been some complaints of lost workunits with the new data. Since we have many results for each workunit, this is not a problem with the new cancer data per se. We will keep investigating this issue until we understand what is happening.
_________________
Robby Brewer
Senior Support Engineer
United Devices

UD - Robby Brewer
UD Employee

10 Jan 2006 22:35 Post subject:

I have sent the results from our small test job to our Oxford contact. We will now have to wait to see what they say. As soon as I get confirmation, I can upload all of the new data. Note that we know that there is a problem with the format of the data. I had to manually convert the data in order to run our test. I am asking our contact to provide us with the correct format so that we know the input data is valid from their point of view.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Sun Jan 22, 2006 9:42 pm

UD - Robby Brewer
UD Employee

16 Jan 2006 22:30 Post subject: Grid.org status

Dear Members,

Not a lot to mention this week:

New Cancer Data - The results from the current job have been sent to our Oxford contact. We must now wait for the results to be verified. Note that there is already one issue. Oxford is requesting an additional tag Molecule_ID be added to the input data. Hopefully this is something they will be able to deliver. As soon as I hear back, I will post.

Lost Workunits - One of our 3 servers that handle dispatch and result retrieval was not working properly. I do not see how that would cause workunits to be lost, but I guess it is possible. That server has been fixed and is working properly now. Let's keep an eye on the workunits and see if this has a positive effect on those experiencing the problem.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Tue Feb 07, 2006 10:11 am

UD - Robby Brewer
UD Employee

26 Jan 2006 17:38 Post subject: Grid.org Status

Dear Members,

I apologize for not posting a status on Monday, but we are having a company conference this week which is consuming most of my time. The status of the new cancer data is the topic everyone is interested in so here is a very quick status.

I have a conference call with our Oxford contact tomorrow morning to discuss the new data and the results I sent a couple of weeks ago. After the call, I hope to have much more information to share. I know there is frustration about not having all of the new data loaded for members to work on. Since Oxford is the one that will ultimately be using the results, I have had to wait for them to respond as to the validity of the results. That is out of my control.

Regardless of what Oxford has to say, I will load some more of the new data next week. If the data must be recrunched later, so be it. I was trying to minimize the amount of data that must be reworked to cut down on the complaints about wasted time later. Since there are already complaints about time wasted crunching the same workunits, I guess it does not matter.

Please understand that having the data validated by Oxford is the bottleneck here and that United Devices is ready to upload all the new data as soon as the results are confirmed.
_________________
Robby Brewer
Senior Support Engineer
United Devices

UD - Robby Brewer
UD Employee

30 Jan 2006 17:09 Post subject: Grid.org Status

Dear Members,

Here is the weekly status:

New Cancer Data - I had a conference call with our Oxford contact on Friday. There is an additional field they would like in the results so they have sent a sample data file containing this information. I will be uploading this later today or tomorrow. Hopefully this will produce the results they are expecting and we will be able to make all of the new data available for members later this week or next.

Rosetta - The current batch has been processed and we will be making a new batch available later this week. As always, there will be a two week period for any outstanding workunits to be credited.

Team Stats - For some reason these are missing for the 26th. The stats job will be rerun shortly to pick up this day.

Thanks to everyone for their contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

UD - Robby Brewer
UD Employee

02 Feb 2006 21:45 Post subject: New Cancer Data

Dear Members,

I have just uploaded a portion of the latest data I received from Oxford. My internal tests looked good as far as the workunits being able to be processed. We will let this job run for a while before shipping the results back to Oxford for verification. Note that the previous cancer job is still running as well, so it is up to the dispatcher as to whether you get one of the new workunits or not. As soon as I see a few successful results, I will stop the previous job from dispatching so everyone will be crunching the new data.
_________________
Robby Brewer
Senior Support Engineer
United Devices

UD - Robby Brewer
UD Employee

03 Feb 2006 16:19 Post subject: New Cancer Results

Dear Members,

I have received confirmation from our Oxford contact that the sample result data from the latest cancer job looks good. This means that we are ready to start crunching all of the new data. The new data will be uploaded in pieces (jobs) just like we have been doing with Rosetta. When a job has completed, I will allow at least a week for any outstanding results to be uploaded by members.

This is great news for all of us. Thank you for your continued patience and contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

UD - Robby Brewer
UD Employee

06 Feb 2006 21:26 Post subject: Grid.org Status

Dear Members,

Here is the weekly status:

Cancer data - As I mentioned previously, Oxford has verified our previous results and has released all of the new data. There is plenty to keep us busy for a while. They also wish to run all of the new data against the previous protein when we are done with the current protein.

Some members are experiencing aborted WUs with the new data. We are currently investigating this issue. The problem is a bit elusive since not all members are experiencing it and some members are experiencing it much more frequently than others. I have verified that there are results for each and every WU and that the total is approximately the same for all. I have also verified that there are approximately the same (small) number of errors returned for each WU.

This tells me that there is not a problem with any particular WU, but some other issue. If there was a bad WU, we would see either a significantly lower number of results or a significantly greater number of errors. This is not the case. We will keep looking into this issue until we find a solution.
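
The reasoning above can be sketched as a simple per-workunit check. This is only an illustration: the function `suspect_workunits`, the 50% tolerance, and the sample counts are all made up and are not taken from the Grid.org database or tooling.

```python
from statistics import median


def suspect_workunits(counts, tolerance=0.5):
    """counts maps a workunit id to (successful results, errors).

    A workunit is flagged if it has far fewer results, or far more
    errors, than the median workunit in the same job.
    """
    med_results = median(r for r, _ in counts.values())
    med_errors = median(e for _, e in counts.values())
    suspects = []
    for wu, (results, errors) in counts.items():
        too_few = results < med_results * (1 - tolerance)
        too_many_errors = errors > max(1, med_errors) * (1 + tolerance)
        if too_few or too_many_errors:
            suspects.append(wu)
    return suspects


# Hypothetical counts: roughly equal results and a small number of errors each.
healthy = {"wu1": (100, 3), "wu2": (98, 4), "wu3": (103, 2)}
# The same job with one workunit that mostly fails.
with_bad = dict(healthy, wu4=(20, 60))

print(suspect_workunits(healthy))
print(suspect_workunits(with_bad))
```

When every workunit shows about the same totals, as described above, nothing is flagged, which points away from a bad workunit and toward some other cause.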

These WUs appear to be processing much faster than the previous batches. I am not sure of the reason for this and will rely on Oxford to tell us if something is not right. Note that there is a bunch of data to be processed. I am not quick to modify the job configuration (adding more Ligands) since the next ones I upload may take longer.

It has been noticed that the number of hits is high for some of the WUs. Again I do not know the reason for this and will have to rely on Oxford to tell us if something is wrong.

Thank you for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Thu Feb 23, 2006 5:41 pm

UD - Robby Brewer
UD Employee

20 Feb 2006 16:30 Post subject: Grid.org Status

Dear Members,

Here is the weekly status:

Cancer Data - We continue to have an issue with occasional aborts that is still under investigation. As I mentioned in the Member to Member Support thread, this issue has been identified as a problem with the Ligandfit application itself crashing. This has nothing to do with the UD servers dropping results or not giving due credit. The Ligandfit application has not changed in a very long time; the current version predates the last batch of cancer data, which received no complaints. This would point to something with the new data that is causing Ligandfit to be unstable.

Note that we are receiving successful results from each and every workunit so there are not "bad workunits" that will always fail. Some members have stated that retrying a workunit that just aborted will result in a successful completion.

Although this is a bit frustrating because of lost points, know that we are getting successful results that can be returned to Oxford to assist in the search for the cure for cancer, which is the ultimate objective of this project.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Sun Mar 05, 2006 11:43 pm

UD - Robby Brewer
UD Employee

27 Feb 2006 20:30 Post subject: Grid.org Status

Dear Members,

Here is the weekly status of Grid.org.

Outage: There was a temporary outage over the weekend that caused connectivity problems with Grid.org servers. This was caused by a router failure in our datacenter and was unanticipated. The problem has since been rectified and everything should be functioning normally.

Cancer data: I have uploaded a new chunk of data and have sent the last result set to Oxford for analysis. Some members have noticed that this batch produces an awful lot of hits. I am asking if this is to be expected and if this could in any way be related to the occasional workunit aborts that some members are seeing.

Thank you for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices

Postby Jwb52z » Sat Mar 11, 2006 12:58 am

UD - Robby Brewer
UD Employee

06 Mar 2006 16:42 Post subject: Grid.org Status

Dear Members,

Here is the weekly Grid.org status. Things are relatively quiet although we still have a few outstanding issues.

Cancer data - I have not heard back from our Oxford contact regarding the last batch of results I sent. This was an attempt to have them validate the many hits we are seeing on some of these workunits. It seems suspicious to me, but I am not well versed in computational chemistry and cannot speak to this. It will be up to Oxford, who provided the data, to determine if the results are valid for their research. Until then we will just keep crunching away at the new data.

There is still the issue of Ligandfit crashing. Although I do not know why, it appears that there is something about the new data that causes Ligandfit to get in a bad state. I have also mentioned this to our Oxford contact and hope that he may be able to shed some light on this issue. Perhaps the many hits or complexity of the new data is the contributing factor.

As always, thank you for your contribution.
_________________
Robby Brewer
Senior Support Engineer
United Devices
