The Joyent Community

A place where the Joyent community can gather, help each other out, and stay informed.


#81 2007-12-06 01:06:10

eli
Member
From: Washington, DC
Registered: 2005-11-26
Posts: 461
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

(I hope I'm not pissing anyone off by posting this story; Rackspace and Joyent aren't really competitors since they provide very different services at very different price points.)

I've got a very expensive dedicated box on Rackspace. Rackspace had some pretty nasty downtime recently -- actually several instances of hours of downtime over a period of weeks -- which really isn't supposed to happen when you're paying many hundreds of dollars per month.

However, I was blown away by how well they handled it. I got an email as soon as the problems started and perhaps a dozen updates as they learned more and tried to mitigate the damage. After everything was back online, the CEO posted a video apology along with a detailed technical explanation, a timeline of what went wrong, and the detailed steps they're taking to prevent it from happening again. (The slightly more upbeat public version is here: http://www.rackspace.com/information/an … center.php). It actually made me feel better about hosting with them going forward. There's something to learn here.

Offline

 

#82 2007-12-06 02:12:09

kristiewells
Mama Bear
From: San Francisco, CA
Registered: 2007-04-07
Posts: 540
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Hi gang: it doesn't matter if you host a personal site or a business site with us. The issues dogging some of the Accelerators need to be remedied. Will be remedied one way or another. My hope is we can find a short-term resolution while we continue to dig into this beast. The Systems team is focused on it and I will do my best to get another update from Jason or Dave on Thursday.


If you're a mama bear, everyone knows you mean business. You swat anyone who bothers your cubs (Joyeurs). If your cubs (Joyeurs) get out of line, you swat them too.  If you're a bear, your mate EXPECTS you to wake up growling. He EXPECTS that you will have hairy legs and excess body fat.  Yup, I'm a bear!

Offline

 

#83 2007-12-06 02:28:00

someguy
Moderator
Registered: 2005-09-13
Posts: 577
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

kristiewells wrote:

Hi gang: it doesn't matter if you host a personal site or a business site with us. The issues dogging some of the Accelerators need to be remedied. Will be remedied one way or another. My hope is we can find a short-term resolution while we continue to dig into this beast. The Systems team is focused on it and I will do my best to get another update from Jason or Dave on Thursday.


This seems like a good time for me to re-pose the question that I asked in the second post in this thread (and which was then apparently ignored by Joyent staff):

someguy wrote:

By "older" do you mean shared accelerators that were some of the first to be deployed? If so, does it make sense to update the shared accelerator configuration or just redeploy those on the "older" setups? Do you have a sense for when these issues will be resolved? (N hours, N days, N weeks, or N months, where N<10)

I'm sure the Joyent staff has been putting in extra hours and effort to try to get to the bottom of this. THANKS!

Offline

 

#84 2007-12-06 08:19:40

ubernostrum
My internets, let me show you them
From: Lawrence, KS
Registered: 2005-02-23
Posts: 2174
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

FWIW I was having fairly regular issues with this (incredibly long response times via web and SSH), but within the last week or so that's gone away.

/me knocks on wood.


When they lay you on the table, better keep your business clean.

Offline

 

#85 2007-12-06 18:12:10

mrdale
Member
Registered: 2005-03-02
Posts: 209
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

FYI: Bolinas is mired... Nelson is beginning to look pretty darn good. LOL


★ nelson (fare-thee-well, old unreliable) ★ johnson ★ bolinas ★ Sx2 ★

Offline

 

#86 2007-12-06 19:16:17

atomgiant
Member
From: Cary NC
Registered: 2005-07-12
Posts: 57
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

I am also noticing the performance issues on Bolinas seem to be getting much worse. The monpage shows response times are increasing substantially in the last couple of hours (http://joyent.monpage.com/index.php?n=b … &z=out).

Could we have an update on the status of this issue? Any visibility on the steps you are taking to resolve this issue would be greatly appreciated.

Thanks,
Tom

Offline

 

#87 2007-12-06 19:45:32

Richard
Member
Registered: 2006-02-11
Posts: 102
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

I've been getting these a lot on Jones over the last week.

Offline

 

#88 2007-12-06 23:16:13

ronp001
#@$%^@?!
Registered: 2005-06-19
Posts: 40
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

bridgeway has collapsed again ("500 Internal Server Error")

"time ls": 9 to 28 seconds.

Last edited by ronp001 (2007-12-06 23:20:28)

Offline

 

#89 2007-12-07 01:13:04

jason
a chief (i started this place)
From: San Francisco
Registered: 2004-06-01
Posts: 8814
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

We have the remaining storage in place to complete resolving the issues for the remaining 5 cohorts, and we have begun a "snapmirroring" process to migrate the storage LUNs and bring the mirroring window down to 10 minutes. The process then is that all services will be brought down on a shared accelerator, a final sync will be performed, the zpools will be exported and then re-imported from the new storage. We're capping the total downtime to a 1 hour window on each shared host.
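For anyone trying to picture the per-host sequence described above, here is a minimal sketch of the ordering only. It is not Joyent's actual tooling: the `Step` class, the `cutover_plan` function, and the pool name `tank` are made up for illustration, and the service and sync steps are left as comments because the thread does not say which commands are used. Only `zpool export` and `zpool import` are real ZFS commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    command: str  # illustrative text only, nothing here is executed


def cutover_plan(host: str, pool: str) -> list[Step]:
    """Return the cutover steps for one shared host, in the order described."""
    return [
        # 1. All services on the shared accelerator are brought down.
        Step("stop services", f"# bring down all services on {host}"),
        # 2. A final sync copies the last changes to the new storage.
        Step("final sync", "# run the last incremental mirror to the new storage"),
        # 3. The zpool is exported from the old storage...
        Step("export pool", f"zpool export {pool}"),
        # 4. ...and re-imported from the new storage.
        Step("import pool", f"zpool import {pool}"),
        # 5. Services come back up (assumed; implied by the downtime window).
        Step("start services", f"# bring services back up on {host}"),
    ]


if __name__ == "__main__":
    # "tank" is a hypothetical pool name.
    for number, step in enumerate(cutover_plan("bridgeway", "tank"), start=1):
        print(f"{number}. {step.name}: {step.command}")
```

The point of the ordering is that nothing writes to the pool between the final sync and the export, which is why the services have to be down for the whole window.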

The final cutovers will be staged per day and are scheduled:

Wednesday, December 12

Cohort 1
- bridgeway
- johnson
- humboldt
- bonita
- litho
- cooper

Thursday, December 13

Cohort 2
- magnolia
- myrtle
- tamalpais
- tunstead
- crescent
- mariposa

Friday, December 14

Cohort 3
- belle
- jones
- bolinas
- kemp
- madrone
- karl

Saturday, December 15

Cohort 4
- berlin
- caledonia
- turney
- napa
- spinnaker
- prospect

Sunday, December 16

Cohort 5
- girard
- harrison
- excelsior
- miller
- glen
- spencer

The schedule takes into account that things may not go perfectly. If they do go perfectly, we'll notify you that the schedule is being moved up and adjust the days.
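If you just want to know which day your host is due, the schedule above can be written down as a small lookup table. This is only a convenience sketch: the names `SCHEDULE` and `cutover_for` are invented here, and the dates are the planned ones from this post, which may still shift.

```python
from datetime import date

# Planned cutover days and cohorts, copied from the schedule in this post.
SCHEDULE = {
    date(2007, 12, 12): ("Cohort 1", ["bridgeway", "johnson", "humboldt", "bonita", "litho", "cooper"]),
    date(2007, 12, 13): ("Cohort 2", ["magnolia", "myrtle", "tamalpais", "tunstead", "crescent", "mariposa"]),
    date(2007, 12, 14): ("Cohort 3", ["belle", "jones", "bolinas", "kemp", "madrone", "karl"]),
    date(2007, 12, 15): ("Cohort 4", ["berlin", "caledonia", "turney", "napa", "spinnaker", "prospect"]),
    date(2007, 12, 16): ("Cohort 5", ["girard", "harrison", "excelsior", "miller", "glen", "spencer"]),
}


def cutover_for(host: str):
    """Return (cohort, planned day) for a shared host, or None if it is not listed."""
    wanted = host.strip().lower()
    for day, (cohort, hosts) in SCHEDULE.items():
        if wanted in hosts:
            return cohort, day
    return None


if __name__ == "__main__":
    for name in ("Bolinas", "bridgeway", "nelson"):
        print(name, "->", cutover_for(name))
```

A host that is not in any of the five cohorts (nelson, for example) comes back as `None`, meaning it is not part of this migration schedule.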

Thank you again for your patience in this.

Offline

 

#90 2007-12-07 01:23:51

ngungo
a monpageur
Registered: 2004-06-01
Posts: 3465
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Dang! This is the Mother of All Updates.

Offline

 

#91 2007-12-07 02:13:06

Richard
Member
Registered: 2006-02-11
Posts: 102
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Cool - thanks for the update.

Offline

 

#92 2007-12-07 02:23:07

fitzage
Chief Digression Officer
Registered: 2006-02-14
Posts: 4380
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Rock. On.

Best. Forum. Post. Evar.

And all my servers are on the first 2 days. :-)


Don't ask me. I only work here.

Offline

 

#93 2007-12-07 03:24:59

atomgiant
Member
From: Cary NC
Registered: 2005-07-12
Posts: 57
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Thanks Jason. I appreciate the update. Good luck and I hope all goes well.

Tom

Offline

 

#94 2007-12-07 04:02:12

gtcaz
Raconteur
From: Tucson, AZ
Registered: 2005-01-21
Posts: 1605
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Thanks Chief. (And congrats!)


You're gonna have to answer to the Coca-Cola company.

Offline

 

#95 2007-12-07 07:27:20

madams
 
From: Edinburgh
Registered: 2005-05-11
Posts: 2067
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

ngungo wrote:

Dang! This is the Mother of All Updates.


MoAU WTF!


Mark
Live in the city, work in the country. | OpenSolaris Immigrant

Offline

 

#96 2007-12-07 09:22:31

ronp001
#@$%^@?!
Registered: 2005-06-19
Posts: 40
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Thanks for the update, Jason. Hope things go smoothly.

Offline

 

#97 2007-12-07 11:47:03

ichigo
panem et circenses 2.0
From: Vienna, Austria, Europe
Registered: 2005-02-25
Posts: 591
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

thanks for the update.

Offline

 

#98 2007-12-07 16:17:57

mrdale
Member
Registered: 2005-03-02
Posts: 209
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

This is much appreciated Jason.

I like the way Joyent's tech issues and their solutions are transparent.

-Dale


★ nelson (fare-thee-well, old unreliable) ★ johnson ★ bolinas ★ Sx2 ★

Offline

 

#99 2007-12-10 19:46:29

alexxale
New member
Registered: 2007-11-29
Posts: 4
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

No comments for a few days.
Is this issue 100% fixed already?

I still experience slowdowns.

Offline

 

#100 2007-12-10 20:02:53

jacques
The Local Coffee Aficionado
From: Cape Town, South Africa
Registered: 2005-06-28
Posts: 1677
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Awaiting resolution by Jason's schedule.

Offline

 

#101 2007-12-10 20:03:50

fitzage
Chief Digression Officer
Registered: 2006-02-14
Posts: 4380
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

alexxale wrote:

No comments for a few days.
Is this issue 100% fixed already?

I still experience slowdowns.


Look a little bit up the thread at the dates listed by the admins. Hasn't truly started yet.


Don't ask me. I only work here.

Offline

 

#102 2007-12-10 20:04:29

tweir
Accelerati
From: Vancouver, Canada
Registered: 2005-05-30
Posts: 338
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

alexxale wrote:

No comments for a few days.
Is this issue 100% fixed already?

I still experience slowdowns.


Did you actually read Jason's update above?

jason wrote:

The final cutovers will be staged per day and are scheduled:

Wednesday, December 12
...


Given that today's the 10th, it's not unexpected that things are still slow. My expectation would be that the staff update us again if the schedule is delayed, and once all the cutovers are complete.

Tom


Tom
VC3 / MxG / AC200 - litho

Offline

 

#103 2007-12-11 01:15:50

kristiewells
Mama Bear
From: San Francisco, CA
Registered: 2007-04-07
Posts: 540
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

We are still on track to start the move this Wednesday based on Jason's schedule. The systems team is doing minor tweaks each day trying to reduce some of the load, but Wednesday is the big day.


If you're a mama bear, everyone knows you mean business. You swat anyone who bothers your cubs (Joyeurs). If your cubs (Joyeurs) get out of line, you swat them too.  If you're a bear, your mate EXPECTS you to wake up growling. He EXPECTS that you will have hairy legs and excess body fat.  Yup, I'm a bear!

Offline

 

#104 2007-12-12 16:45:17

bretthoerner
...
Registered: 2006-12-24
Posts: 843
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Just curious, will the uptime reset or something? I want something to check back on to know when it's done because I need to do some pro-active work to basically turn my site back on.

Offline

 

#105 2007-12-12 21:19:52

timjcoulter
Lifer/Litho
From: Portland, Oregon
Registered: 2006-01-21
Posts: 223
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Yeah, I'd love to know when my downtime is going to be so we can plan accordingly. Failing that, it'd be cool to know exactly when it's done, so I can assure my users that our troubles are mostly over.

Obviously, if neither of those is possible, I'll get over it, it'd just be nice to have some definitive info.

My, we are needy little bastards, aren't we? :)

Offline

 

#106 2007-12-12 21:28:48

kristiewells
Mama Bear
From: San Francisco, CA
Registered: 2007-04-07
Posts: 540
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

I am in the process of sending an email to everyone listed in Cohort 1, which will be moved today. The plan is to start this around 11:59pm PST; I am just awaiting confirmation from Jason. Expected downtime is up to two hours. Hoping it is a lot less than this, but plan for an hour.

Emails are also being sent to those on the other affected Accelerators.

EDIT: Got the actual start time from Jason/Ben, so we are dialed in.


If you're a mama bear, everyone knows you mean business. You swat anyone who bothers your cubs (Joyeurs). If your cubs (Joyeurs) get out of line, you swat them too.  If you're a bear, your mate EXPECTS you to wake up growling. He EXPECTS that you will have hairy legs and excess body fat.  Yup, I'm a bear!

Offline

 

#107 2007-12-12 21:33:38

ngungo
a monpageur
Registered: 2004-06-01
Posts: 3465
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Cool!

Offline

 

#108 2007-12-12 22:17:53

madams
 
From: Edinburgh
Registered: 2005-05-11
Posts: 2067
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

kristiewells wrote:

11:59pm PST


That's 7:59am GMT the following day. Adjust your watch accordingly.
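The arithmetic, for anyone who wants to check it or convert to another zone: PST is UTC-8, so 23:59 on the 12th plus eight hours lands at 07:59 on the 13th. A small sketch using only the standard library, with PST modelled as a fixed offset (the variable names are just for illustration):

```python
from datetime import datetime, timedelta, timezone

# PST as a fixed UTC-8 offset; in December GMT is the same as UTC.
PST = timezone(timedelta(hours=-8), "PST")

start_pst = datetime(2007, 12, 12, 23, 59, tzinfo=PST)
start_utc = start_pst.astimezone(timezone.utc)

print(start_utc.strftime("%Y-%m-%d %H:%M %Z"))  # 2007-12-13 07:59 UTC
```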


Mark
Live in the city, work in the country. | OpenSolaris Immigrant

Offline

 

#109 2007-12-13 08:29:16

jason
a chief (i started this place)
From: San Francisco
Registered: 2004-06-01
Posts: 8814
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

update on 1 here and here

Offline

 

#110 2007-12-13 08:37:14

lee
Member
From: France
Registered: 2004-06-21
Posts: 477
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Good work, looking very quick - thanks.

Offline

 

#111 2007-12-13 09:23:52

jason
a chief (i started this place)
From: San Francisco
Registered: 2004-06-01
Posts: 8814
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Offline

 

#112 2007-12-13 10:31:26

andrewbarnett
Discussion Artist
From: Melbourne, Australia
Registered: 2005-09-12
Posts: 2064
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Lovely jubbly :-) Many thanks.


VC3 - MGX - Acceleratus - Litho - SSSTMWCNGQBRKAS

Offline

 

#113 2007-12-13 12:55:17

ronp001
#@$%^@?!
Registered: 2005-06-19
Posts: 40
Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

Results of the "perpetual ls test" -- bridgeway vs. cardero in the last 4 1/2 hours:

Code:

loaded 19058 lines for bridgeway
Results for bridgeway

Total measurements: 3175
Measurements taken from Thursday, 13 December 2007 -- 8:22:12 GMT
                     to Thursday, 13 December 2007 -- 12:48:16 GMT
less than 5ms:  -- 85.102% (2702)
5ms - 10ms:     -- 10.677% (339)
10ms - 50ms:    -- 2.866% (91)
50ms - 100ms:   -- 0.409% (13)
100ms - 500ms:  -- 0.945% (30)
500ms - 1sec:   -- 0.000% (0)
1sec - 5sec:    -- 0.000% (0)
5sec - 10sec:   -- 0.000% (0)
10sec - 20sec:  -- 0.000% (0)
20sec - 30sec:  -- 0.000% (0)
30sec - 60sec:  -- 0.000% (0)
1 - 2 minutes:  -- 0.000% (0)
over 2 minutes: -- 0.000% (0)

loaded 11805 lines for cardero
Results for cardero

Total measurements: 1966
Measurements taken from Thu, 13 Dec 2007 -- 08:42:10 GMT
                     to Thu, 13 Dec 2007 -- 12:44:13 GMT
less than 5ms:  -- 48.627% (956)
5ms - 10ms:     -- 11.953% (235)
10ms - 50ms:    -- 20.956% (412)
50ms - 100ms:   -- 5.341% (105)
100ms - 500ms:  -- 7.782% (153)
500ms - 1sec:   -- 2.035% (40)
1sec - 5sec:    -- 1.679% (33)
5sec - 10sec:   -- 1.017% (20)
10sec - 20sec:  -- 0.560% (11)
20sec - 30sec:  -- 0.051% (1)
30sec - 60sec:  -- 0.000% (0)
1 - 2 minutes:  -- 0.000% (0)
over 2 minutes: -- 0.000% (0)


Bridgeway is showing major improvement. Nice work folks!
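For anyone who wants to run a similar test on their own accelerator, here is a rough sketch of how such a histogram could be produced. It is not the script used for the numbers above. It assumes each sample is the wall-clock time of one directory listing in seconds, it uses `os.listdir` as a stand-in for running `ls`, and the names `bucket`, `time_listing`, and `summarize` are invented for this example. The bucket boundaries are copied from the output above.

```python
import bisect
import os
import time
from collections import Counter

# Upper edges of each bucket, in seconds, matching the labels in the results above.
EDGES = [0.005, 0.010, 0.050, 0.100, 0.500, 1, 5, 10, 20, 30, 60, 120]
LABELS = [
    "less than 5ms", "5ms - 10ms", "10ms - 50ms", "50ms - 100ms",
    "100ms - 500ms", "500ms - 1sec", "1sec - 5sec", "5sec - 10sec",
    "10sec - 20sec", "20sec - 30sec", "30sec - 60sec", "1 - 2 minutes",
    "over 2 minutes",
]


def bucket(seconds: float) -> str:
    """Map one measurement to its histogram label."""
    return LABELS[bisect.bisect_right(EDGES, seconds)]


def time_listing(path: str = ".") -> float:
    """Time a single directory listing (a stand-in for `time ls`)."""
    start = time.perf_counter()
    os.listdir(path)
    return time.perf_counter() - start


def summarize(samples: list[float]) -> list[str]:
    """Format the per-bucket percentages and counts, one line per bucket."""
    counts = Counter(bucket(s) for s in samples)
    total = len(samples) or 1  # avoid dividing by zero on an empty run
    return [
        f"{label}: -- {100 * counts[label] / total:.3f}% ({counts[label]})"
        for label in LABELS
    ]


if __name__ == "__main__":
    samples = []
    for _ in range(20):  # a short run; a "perpetual" test would loop for hours
        samples.append(time_listing("."))
        time.sleep(0.1)
    print(f"Total measurements: {len(samples)}")
    print("\n".join(summarize(samples)))
```

Run it from a directory on the affected storage. A healthy host should put nearly everything in the first two buckets, as bridgeway does above after the cutover.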

Offline

 

#114 2007-12-13 14:08:08

ngungo
a monpageur
Registered: 2004-06-01
Posts: 3465
Website  Expertise

Re: Performance issues for some (shared) accelerators and aggregate I/O

It's so cool. Thank you very much.

Offline

 
