SegwayChat

11-13-2009, 12:17 AM   #1
JohnG
Uber Administrator
Wise Segway Elder
Join Date: Sep 2002
Location: Greater Boston
Posts: 6,996
5 yr Member HT/PT Owner SegwayFest Attendee
Status Update on the Community

Our apologies for the unexpected downtime today. What follows is a technical service update for those of you who are interested.

At approximately 3:00pm ET, we were notified by a log service on one of the database servers that one of the drives was experiencing sudden, unexpected failures. While we were attempting to recover the drive, it failed at 4:30pm ET.

It was not a catastrophic failure, however, so we were able to make an additional backup of the most recent data from all forums (otherwise the data would've come from a 4-day-old backup). This took about an hour.

We then shut down the server and replaced the faulty drive. A full OS reinstall took about 2 hours, and the restore of all backup data and cleaning of the database tables took about another hour and a half.

As you know, we take our uptime here very seriously, and we began working to resolve this issue as soon as the drive started failing. Given the way it failed, we were fortunate not to suffer any data loss.

We are back up and running 100%. Please let me know if you notice any problems or unexpected errors on the forums. Thank you.

Again, we apologize for the unexpected downtime and appreciate your patience.

Best,
JohnG
__________________
--
An original Segway employee, 2001-2005
11-13-2009, 08:53 AM   #2
gbrandwood
Advanced Member
Join Date: Nov 2004
Location: North west England, UK.
Posts: 3,043
5 yr Member HT/PT Owner

I didn't notice the downtime. The fact that you responded so promptly is good news for all of us and I really appreciate you keeping my favourite chat site on-line. Thanks! And thanks for letting us all know the details.

But, since the rebuild, I note that my reputation has dropped down to only three bars. I'm sure I had at least a dozen prior to the failure....

Well done to all involved.
__________________
Gareth Brandwood
The comments posted are made by the fat figners of the individual and do not necessarily represent the views of the brain.
11-13-2009, 10:29 AM   #3
Joushou
Member
Join Date: Jun 2009
Location: Charlottenlund, Denmark
Posts: 749
5 yr Member HT/PT Owner

Quote:
Originally Posted by gbrandwood
I didn't notice the downtime. The fact that you responded so promptly is good news for all of us and I really appreciate you keeping my favourite chat site on-line. Thanks! And thanks for letting us all know the details.

But, since the rebuild, I note that my reputation has dropped down to only three bars. I'm sure I had at least a dozen prior to the failure....

Well done to all involved.
Good, so I'm not the only one suddenly, ahem, "losing" points!
__________________
The wise speak when they have something to say.
Fools speak when they have to say something.

Reality has been scientifically proven impossible.
11-13-2009, 11:12 PM   #4
emrnyc
Member
Join Date: Jun 2008
Location: NYC
Posts: 426
5 yr Member HT/PT Owner
Good work....

Good Work.... and Thank You

Quote:
Originally Posted by JohnG
Our apologies for the unexpected downtime today. What follows is a technical service update for those of you who are interested.

At approximately 3:00pm ET, we were notified by a log service on one of the database servers that one of the drives was experiencing sudden, unexpected failures. While we were attempting to recover the drive, it failed at 4:30pm ET.

It was not a catastrophic failure, however, so we were able to make an additional backup of the most recent data from all forums (otherwise the data would've come from a 4-day-old backup). This took about an hour.

We then shut down the server and replaced the faulty drive. A full OS reinstall took about 2 hours, and the restore of all backup data and cleaning of the database tables took about another hour and a half.

As you know, we take our uptime here very seriously, and we began working to resolve this issue as soon as the drive started failing. Given the way it failed, we were fortunate not to suffer any data loss.

We are back up and running 100%. Please let me know if you notice any problems or unexpected errors on the forums. Thank you.

Again, we apologize for the unexpected downtime and appreciate your patience.

Best,
JohnG
__________________

NYC's Only Segway Trained Level 1 & Level 2 Service Tech


8 EMS - 37 PAPD - 23 NYPD - 343 FDNY
Never forget our Brother & Sister Heroes 9/11/01.
It was not how they died that made them heroes.....
It was how they lived!

Check out Emergency Medical Rescue of New York City's website
11-14-2009, 11:23 PM   #5
SnickleFritz
One Kool Kat
Join Date: Feb 2009
Location: Segwaytown USA
Posts: 9
5 yr Member HT/PT Owner Segway Polo Player SegwayFest Attendee

Quote:
Originally Posted by gbrandwood
But, since the rebuild, I note that my reputation has dropped down to only three bars. I'm sure I had at least a dozen prior to the failure....
Feel'n your pain!
__________________
Thanks for asking, it is a Segway!
11-15-2009, 06:17 AM   #6
Bob.Kerns
Advanced Member
Join Date: Aug 2008
Location: Marin County, CA
Posts: 3,783
5 yr Member HT/PT Owner

Quote:
Originally Posted by JohnG
Our apologies for the unexpected downtime today. What follows is a technical service update for those of you who are interested.

At approximately 3:00pm ET, we were notified by a log service on one of the database servers that one of the drives was experiencing sudden, unexpected failures. While we were attempting to recover the drive, it failed at 4:30pm ET.

It was not a catastrophic failure, however, so we were able to make an additional backup of the most recent data from all forums (otherwise the data would've come from a 4-day-old backup). This took about an hour.

We then shut down the server and replaced the faulty drive. A full OS reinstall took about 2 hours, and the restore of all backup data and cleaning of the database tables took about another hour and a half.

As you know, we take our uptime here very seriously, and we began working to resolve this issue as soon as the drive started failing. Given the way it failed, we were fortunate not to suffer any data loss.

We are back up and running 100%. Please let me know if you notice any problems or unexpected errors on the forums. Thank you.

Again, we apologize for the unexpected downtime and appreciate your patience.

Best,
JohnG
John, I'm not one to criticize a volunteer effort. On the contrary, thank you, Frank, and anyone else involved in making all this possible!

However (you knew there had to be one, right?), 4-day-old backups, full OS reinstalls, restores, etc. offer quite a bit less reliability than what is now achievable, given manpower, expertise, and a bit of money. This sort of recovery can be done in minutes for something operating in the Amazon EC2 cloud, for example. Frequent incremental backups can speed up each backup and shorten the interval between backups, reducing data loss in the event of a catastrophe.
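
To make the incremental-backup idea concrete, here is a minimal sketch in Python. It is not a description of how SegwayChat's backups actually work: the function name `incremental_backup` and the directory layout are made up for illustration, and it simply copies files changed since the previous run. A live database would need its own dump or log-shipping tool rather than a plain file copy.

```python
import shutil
from pathlib import Path
from typing import List


def incremental_backup(source: Path, dest: Path, last_run: float) -> List[Path]:
    """Copy files under `source` modified after `last_run` (epoch seconds).

    Returns the relative paths that were copied. Hypothetical helper for
    illustration only; it does not handle deletions or open database files.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_run:
            rel = path.relative_to(source)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves the modification time
            copied.append(rel)
    return copied
```

Because each run touches only what changed, it can be scheduled far more often than a full backup, which is what shortens the window of data at risk.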

So my question is: is it worth discussing ways to improve the situation, whether by reducing the risk of data loss, reducing the load on the volunteer administrators in the event of a problem, or both?

I imagine I'm not the only one here with some expertise in the area.

I don't know the trade-offs here between manpower and money.
__________________
Bob Kerns

Obviously, we can't have infinite voltage, or the universe would tear itself to shreds, and we wouldn't be discussing Segways.
11-15-2009, 02:10 PM   #7
Gihgehls
Senior Member
Join Date: May 2006
Location: Galactic Sector ZZ9 Plural Z Alpha
Posts: 2,086
5 yr Member

In my line of work we never allow a single failed drive to take down an entire machine, be it desktop or server. I'd be happy to discuss any ways to improve the reliability of the site.
__________________

"...if you insist on being imprecise in use and unique in definition, you should hardly be surprised that your attempts at communication are poorly understood." -a wise man
11-17-2009, 11:14 AM   #8
JohnG
Uber Administrator
Wise Segway Elder
Join Date: Sep 2002
Location: Greater Boston
Posts: 6,996
5 yr Member HT/PT Owner SegwayFest Attendee

Always open to suggestions. Generalized ideas about how things could be better run always make for a nice read, but give me specific strategies you'd recommend that are cost-effective and I'm listening.

Amazon EC2 is not something that I found particularly affordable or easy to implement, and it's not exactly had a stellar track record in terms of downtime so far. We do run Amazon S3 cloud services for static content, but for db operations, I'm not convinced it's there yet. Happy to be shown otherwise.

John
__________________
--
An original Segway employee, 2001-2005
11-17-2009, 01:56 PM   #9
Bob.Kerns
Advanced Member
Join Date: Aug 2008
Location: Marin County, CA
Posts: 3,783
5 yr Member HT/PT Owner

Quote:
Originally Posted by JohnG
Always open to suggestions. Generalized ideas about how things could be better run always make for a nice read, but give me specific strategies you'd recommend that are cost-effective and I'm listening.

Amazon EC2 is not something that I found particularly affordable or easy to implement, and it's not exactly had a stellar track record in terms of downtime so far. We do run Amazon S3 cloud services for static content, but for db operations, I'm not convinced it's there yet. Happy to be shown otherwise.

John
I hesitate to throw a lot of specific ideas at you, because I don't know how you have things implemented now, nor your budget, current costs, available hardware, and perhaps most importantly, the relative importance of saving $$$ vs saving routine time, vs uptime, vs risk of losing data, vs risk of long recovery times.

Nor do I know your total traffic, the compute demands of the forum software, nor the load on the DB back-end.

However, let me throw one idea out there. Amazon EC2 isn't the only approach or idea, but let me pick a hybrid owned/EC2 scenario as my example.

Let's say you run your system on your own hardware, two boxes, one the front-end, one the back-end. Let's say you don't want to spend a lot of money, but you'd like to reduce downtime and risk of data loss, while making recovery a low-stress operation.

So, one scenario would be to turn your OS images into an AWS AMI (Amazon Machine Image). To do this, you'd first separate your live data (including logs) from your OS, application, configuration, etc. Only the non-live stuff would be part of the AMI. The rest would live on another volume. This volume would then be replicated to an Amazon EBS volume.

Initially, your current configuration is master and live.

You also set up an S3 bucket to receive database logs. This gets all the DB changes as they're made, and serves as your hot backup for the data itself.

You modify your DB AMI with a startup script that slurps the logs from the S3 bucket, saves them for recovery purposes, and then goes live.

Then schedule your DB EC2 instance to launch periodically, slurp those logs, and shut down once it has reached the live state.
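
The log-shipping step might look roughly like the sketch below. It is an outline under stated assumptions, not a working MySQL or S3 integration: `ship_new_logs`, the `upload` callback, and the state file are hypothetical names, and the upload itself is passed in as a function so the example runs without an AWS account. In practice that callback could wrap whatever S3 client you use.

```python
from pathlib import Path
from typing import Callable, List


def ship_new_logs(log_dir: Path, state_file: Path,
                  upload: Callable[[Path], None]) -> List[str]:
    """Upload log files that have not been shipped yet, in name order.

    `state_file` records the names already uploaded, one per line, so a
    rerun (for example from cron) skips them. Hypothetical helper: a real
    setup must also avoid shipping the log the database is still writing.
    """
    if state_file.exists():
        shipped = set(state_file.read_text().splitlines())
    else:
        shipped = set()
    newly_shipped = []
    for log in sorted(log_dir.iterdir()):
        if log.is_file() and log.name not in shipped:
            upload(log)  # an exception here leaves the name unrecorded
            with state_file.open("a") as fh:
                fh.write(log.name + "\n")
            newly_shipped.append(log.name)
    return newly_shipped
```

Recording each name only after its upload returns means a failed run is retried on the next pass instead of silently skipping a log.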

You'll spend only a few bucks getting this set up -- you'll probably spend more on caffeine while you do it. (There's a bit of a learning curve.)

Now, to recover, you just launch one or both of your EC2 instances. If you just do the back-end, you reconfigure the front-end to talk to the EC2 back-end. You can set it up with a VPN to be able to do the front-end and not the back-end, but I don't recommend it. You'll incur more IO charges.

Switch over your DNS, and you're back on the air. Take a full dump of your DB, and start spooling logs to an S3 bucket, and you're now set with your local installation as the redundant piece. When you've recovered your local hardware, reverse the process, and shut down the EC2 instances.

Costs? Very low in normal operation -- mostly just the S3 storage and IO for the logs, plus a few bucks/month for the AMI and EBS storage. Maybe $1/mo for the periodic boots of the DB server.

That would jump considerably when you go live to the EC2 instances. I don't know how much IO you incur, so I'm going to make a wild guess, maybe $200-$500/mo. Assume you take your time recovering, order a new hard drive, and return to normal operation after a week. The cost of the outage would be $50-$125 plus the cost of the new hard drive, a few minutes of your time for the switchover, and whatever time you'd spend anyway on recovering your system. But you'd be able to do it without the downtime and accompanying pressure.
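
The arithmetic behind that estimate can be checked with a short sketch. It assumes, as the figures above imply, that a week is billed as roughly a quarter of a month; `outage_cost` is a made-up helper, and the dollar amounts are the guesses from this post, not Amazon's prices.

```python
def outage_cost(monthly_rate: float, days: float,
                days_per_month: float = 28.0) -> float:
    """Prorate a monthly running cost over an outage lasting `days` days."""
    return monthly_rate * days / days_per_month


low = outage_cost(200, 7)   # low end of the guessed monthly rate
high = outage_cost(500, 7)  # high end of the guessed monthly rate
print(f"One week on EC2: ${low:.0f}-${high:.0f}")
```

With the guessed $200-$500/mo range and a one-week outage, this reproduces the $50-$125 figure.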

I spend about $75/mo on my personal setup, which I run 24/7, but I don't have a lot of IO, and only one EC2 instance.

As for reliability -- I don't have enough reliability data to fully address your concern. Anecdotally -- I haven't seen any EC2 outages yet, in several months of operation, nor on their status pages. But the key to EC2 is that if your instance fails, you can just launch a new one. You can snapshot your EBS volumes, and re-slurp your database logs, so even if you lose an EBS volume, you can recover quickly.

The big cost I see is the time and learning up front.

A big benefit is that you're protected against all forms of onsite failure.

There are approaches to improving reliability without going to the cloud. They involve more hardware, so more up-front out-of-pocket costs, but perhaps less setup time and learning, and perhaps less operating costs.

But Amazon has the advantage on operating costs -- they can operate the same hardware for less than you can. And buy it for less, too. You tend to win if you have surplus hardware lying around, and cheap electricity, and already enough bandwidth, rack space, etc.

Amazon also has a new Relational Database Service (RDS), which is basically MySQL that they run for you. I haven't explored it yet, so I don't know if it would be a viable alternative to running your own MySQL instance. But if you run MySQL, you could consider setting up an RDS slave to mirror your data.

Does any of this sound like it might be helpful?
__________________
Bob Kerns

Obviously, we can't have infinite voltage, or the universe would tear itself to shreds, and we wouldn't be discussing Segways.
11-17-2009, 05:58 PM   #10
JohnG
Uber Administrator
Wise Segway Elder
Join Date: Sep 2002
Location: Greater Boston
Posts: 6,996
5 yr Member HT/PT Owner SegwayFest Attendee

Thanks for specifics -- far more helpful than a generalized comment.

There's much there that I can investigate further to see how it might work in our current setup. Everything is always a cost+time/benefit ratio. Keep in mind, too, that this is a hobbyist website.

Despite the recent incident, we still run at 99.999% reliability and this was the first significant downtime in 4 years with zero data loss. Whenever there's zero data loss, I'm a happy camper.
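
For a sense of scale, availability over a period can be estimated from downtime figures like the ones given earlier in the thread. The sketch below assumes the outage lasted about 4.5 hours (the one-hour backup, two-hour reinstall, and hour-and-a-half restore from the first post) and that there was no other downtime in the 4 years. Both are assumptions, and the two helper functions are made up for illustration.

```python
HOURS_PER_YEAR = 365.25 * 24


def availability_percent(downtime_hours: float, years: float) -> float:
    """Return uptime as a percentage of the total hours in `years` years."""
    total_hours = years * HOURS_PER_YEAR
    return 100.0 * (1.0 - downtime_hours / total_hours)


def allowed_downtime_hours(target_percent: float, years: float) -> float:
    """Return the downtime budget, in hours, for a target availability."""
    return years * HOURS_PER_YEAR * (1.0 - target_percent / 100.0)


print(f"{availability_percent(4.5, 4):.3f}% over 4 years")
print(f"{allowed_downtime_hours(99.999, 4) * 60:.0f} minutes allowed at 99.999%")
```

Under those assumptions the first figure comes out at about 99.987%, and a 99.999% target leaves roughly 21 minutes of downtime over 4 years.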

John
__________________
--
An original Segway employee, 2001-2005
Copyright © 2002-2023 SegwayChat.org.
All rights reserved. Not affiliated with Segway Inc.
