January 8, 2011

Wibbly Wobbly Timey Wimey, or How to Gracefully Compensate for an Incorrectly Set Clock

12 comments

Part of being a good sysadmin is being flexible. Quite often when something breaks you’re solving a new problem, and the developers, users, and management turn to you to find a solution. One day you’ll be sorting out an IP misconfiguration, the next you’ll be debugging Python libraries, and the day after you’ll be working around yet another limitation of your web server. You wind up reading a lot of documentation and learning a little about everything.

We had a particularly puzzling problem on one of our servers the other day. Somehow, its clock had been set eight hours fast [1]. We were running NTP, but it wasn’t set to jump the clock on start [2], so even though NTP was synced to an upstream server, the clock was still a long way off.

Normally, this is pretty easy to fix: you set the clock back, and you’re done. For us, though, it was a serious problem because uploads are ordered by commit time, which depends on the server’s clock. The guy who wrote our Nagios time check was summarily flogged, and Alan and I set out to find a solution. We came up with a few ideas:

  1. Wait eight hours for real time to catch up
  2. Rewrite all the logs and database entries to have correct times
  3. Find a way to slow down what our daemons thought was the correct time in order to gradually bring their time in sync with real time

#1 was right out: we weren’t going to stop uploads for that long. #2 was heinously difficult and fraught with peril, so we went with #3. Alan’s quick googling found libfaketime, one of those great utilities that shouldn’t exist, but do. Libfaketime works as an LD_PRELOAD hack to change the way certain time library calls work. It can create a time offset and/or speed up and slow down the passage of time relative to the system clock [3].

After two hours of pondering and hacking, we put our plan into action. We set our server’s time back to normal and modified our daemons’ time to pretend to be six hours in the future, running at half-speed. Twelve hours later, the times synced up, and we undid the hack. Problem solved, and I’ve added one more unmentionable hack to my toolbox. :)
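For the curious, the arithmetic behind "six hours ahead at half-speed syncs up in twelve hours" can be sketched in a few lines of Python. This is only a model of the idea, not libfaketime's actual interface or configuration; the function names (`fake_time`, `hours_until_sync`) and the start timestamp are made up for illustration.

```python
from datetime import datetime, timedelta


def fake_time(real_now, start, offset, rate):
    """Time seen by a process whose clock began `offset` ahead of the
    real clock at `start` and advances at `rate` times real speed."""
    return start + offset + (real_now - start) * rate


def hours_until_sync(offset_hours, rate):
    """Real hours until a clock that is `offset_hours` ahead and running
    at `rate` (0 <= rate < 1) is caught by the real clock."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1) for the clocks to converge")
    # The gap shrinks by (1 - rate) hours for every real hour that passes.
    return offset_hours / (1 - rate)


# Hypothetical moment the hack was switched on; any timestamp would do.
start = datetime(2011, 1, 7, 12, 0, 0)
offset = timedelta(hours=6)
rate = 0.5

sync_hours = hours_until_sync(6, rate)
at_sync = start + timedelta(hours=sync_hours)

print(sync_hours)                                # real hours until sync
print(fake_time(at_sync, start, offset, rate))   # faked clock at that moment
print(at_sync)                                   # real clock at that moment
```

Because the rate stays positive, the faked clock never runs backwards, which is the whole point: anything ordered by timestamp keeps its order while real time catches up.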

1: I’m not entirely sure, but I think the reason for this is a recent motherboard swap. We set our system clocks to UTC, and the motherboard probably arrived with its clock set to Pacific time.

2: Debian/Ubuntu admins note: this option is turned off by default in the openntpd package.

3: Interestingly, Erlang provides similar behavior by default. If erl detects a jump in the system clock, it will slowly adjust its concept of current time to match.

  1. Maybe to make it future-proof you should build a timey-wimey detector that goes ding when there's stuff.

  2. Maybe to stop it happening again, you should sit watching the clock: don't turn your back, don't look away, and don't blink!

  3. 1. I do hope that the server software refuses to start, or immediately stops, if time is found to go backwards.
    2. Perform an ntpdate at boot just before starting ntpd, to account for large offsets.
    3. On some systems you have "date -a" to slowly adjust your clock back to normal.

  4. Probably time to remove the time dependency from your upload system. Ordering by system clock is unreliable in large environments. Have a look at Twitter's Snowflake for creating IDs; it generates scalable, monotonically increasing IDs without duplication, regardless of clock time.

  5. Twitter's snowflake sets the high bits of the ID from the system clock (hopefully synchronized over NTP or something similar) — it ain't "regardless of clock time." And their technique for generating monotonically increasing IDs in the presence of clock regression is to refuse to generate a new ID until the clock has retraced past its previous high value. For these guys, that would have meant holding everything up for eight hours.

  6. You might want to look into DJB's clockspeed package for managing your systems' clocks. It corrects for skew, too.
