A few colleagues and I were having difficulty accessing our SVN repository via https when working on our Macs at home. Specifically, an SVN checkout operation would hang. A subsequent SVN update operation would complete the checkout, but would also hang before finishing. We had been living with this for a while, but there comes a point when an engineer is so annoyed that he or she feels compelled to do something about it! In this post I explain how I solved the problem.
Step 1: Enable Debug Logging
The first step of course is to use the web search engine of your choice (WSEOYC) to see if anyone else has encountered the problem. I tried this but to no avail. Another worthy port of call is to check your system logs for relevant information. On the Mac, I have lost count of the number of times I have forgotten to check the Console application for trace that ended up being the key to solving the problem, and it can even provide more useful text to feed back into the WSEOYC!
The next step is to vary the parameters of the problem. I found that I could check out a codebase belonging to another company over https without any trouble. But that didn’t help me fix our problem.
After much usage of the WSEOMC and checking of system logs I got no further. It was only then that I discovered from Dominic Mitchell’s Jabbering Giraffe blog that the command-line svn has a debug logging option for its network requests. Edit the file:
and you will find a line relating to the ‘neon debug mask’. Never one to do things in half-measures, I promptly updated this to enable all the logging features as follows:
[global]
neon-debug-mask = 511
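For the curious, here is a small sketch of why 511 means "everything". It assumes the debug flag bits are laid out as I recall them from neon's ne_utils.h header (bits 0 to 8), so treat the names and positions as an assumption and check the header for your version:

```shell
# Debug flag bits, as recalled from neon's ne_utils.h (an assumption, not verified here).
NE_DBG_SOCKET=$((1 << 0))
NE_DBG_HTTP=$((1 << 1))
NE_DBG_XML=$((1 << 2))
NE_DBG_HTTPAUTH=$((1 << 3))
NE_DBG_HTTPPLAIN=$((1 << 4))
NE_DBG_LOCKS=$((1 << 5))
NE_DBG_XMLPARSE=$((1 << 6))
NE_DBG_HTTPBODY=$((1 << 7))
NE_DBG_SSL=$((1 << 8))

# OR all nine flags together to get the "log everything" mask.
mask=$(( NE_DBG_SOCKET | NE_DBG_HTTP | NE_DBG_XML | NE_DBG_HTTPAUTH | NE_DBG_HTTPPLAIN \
       | NE_DBG_LOCKS | NE_DBG_XMLPARSE | NE_DBG_HTTPBODY | NE_DBG_SSL ))

echo "neon-debug-mask = $mask"
```

If the full trace is too noisy, a narrower mask built from just the socket and SSL bits would probably have been enough for a problem like this one.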
Then when I examined the debug trace from the (now very chatty) SVN checkout operation that was hanging, I noticed the following:
sess: Closing connection.
^C
sess: Connection closed.
The hanging was occurring just after “Closing connection” was output to the log file.
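A rough sketch of how one might capture such a trace and see where it stalls. It assumes the debug output is written to stderr; the helper name `last_trace_line`, the log file name and the repository URL are all placeholders of mine, not anything from the original setup:

```shell
# Hypothetical helper: print the last non-empty line of a saved trace,
# i.e. the last thing that was logged before the operation stalled.
last_trace_line() {
    grep -v '^[[:space:]]*$' "$1" | tail -n 1
}

# Usage sketch (placeholder URL and file name):
#   svn checkout https://example.com/repo/trunk 2> svn-debug.log
# and then, from a second terminal while the checkout is still hung:
#   last_trace_line svn-debug.log
```

Looking at the log before pressing Ctrl-C matters, because interrupting the command appends further lines (such as "Connection closed.") after the point of the hang.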
Step 2: Identify the Faulty Component
I guessed from the debug trace that there was a problem with the networking side of things. But how could I debug this?
My first search revealed a useful program called DTrace, an incredibly powerful tool for investigating almost any aspect of the way a system is running, using a language called D. MacTech’s article Exploring Leopard with DTrace is a great introduction. Unfortunately it looked like overkill for the problem at hand, so I will have to save that one for a thornier problem another day!
But the networking clue allowed me to find the following very helpful page from Davey Shafik: how-to-fix-svn-apache-ssl-breakage-on-os-x. Although relating to a different problem, the page was incredibly helpful so I definitely owe Davey a beer!
In particular, his problem related to a bug in libneon, a library bundled with OS X that handles HTTP and WebDAV requests. It occurred to me that maybe my problem also related to libneon. So I decided to try his first suggestion, “Upgrade the system libneon (bad idea, as OS X can overwrite it in any update)”. I figured that if OS X updates this library then that might even fix the issue; and if it doesn’t then I could always restore my version of the library after the update.
Step 3: Fix or Replace the Faulty Component
So how to update libneon? I didn’t much fancy compiling it from source myself, but again, Davey’s page introduced me to Homebrew, a package manager for OS X. After ensuring that Xcode or the Command Line Tools for Xcode are installed first, Homebrew can be installed just by running:
ruby <(curl -fsSk https://raw.github.com/mxcl/homebrew/go)
After installing Homebrew I simply ran:
brew install neon
and then copied the newly compiled library over the system library:
cp -p /usr/local/Cellar/neon/0.29.6/lib/libneon.27.dylib /usr/lib/
after judiciously backing up the original library! This updated libneon from version 0.29.0 to 0.29.6 (see revision history). Then I made sure that the group, owner and permissions of the new library matched those of the old.
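The backup-then-copy step can be sketched as a small shell function. This is only an illustration: the function name `replace_with_backup` and the `.orig` suffix are my own choices, and writing to /usr/lib requires root privileges:

```shell
# Hypothetical helper: back up TARGET as TARGET.orig (first run only),
# then copy NEW over TARGET and confirm the copy succeeded.
replace_with_backup() {
    new=$1
    target=$2
    [ -f "$new" ] || { echo "missing: $new" >&2; return 1; }
    # Keep only the first backup so a second run cannot clobber the pristine copy.
    if [ -f "$target" ] && [ ! -e "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    # cp -p preserves the mode and timestamps of NEW; it does not copy the old
    # file's owner or group, so those still need checking afterwards.
    cp -p "$new" "$target" || return 1
    # Succeed only if the installed bytes match the new library.
    cmp -s "$new" "$target"
}

# Usage sketch, run as root, with the paths from this post:
#   replace_with_backup /usr/local/Cellar/neon/0.29.6/lib/libneon.27.dylib /usr/lib/libneon.27.dylib
```

Restoring the old behaviour is then just a matter of copying the `.orig` file back over the library.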
Step 4: Test the Fix
Et voilà! No more hangs when running svn from the command-line. Putting the old library back made the hang reappear, so libneon was almost certainly the culprit.
The likely cause of the issue is the bug that was fixed in libneon 0.29.3: Change ne_sock_close() to no longer wait for SSL closure alert: fixes possible hang with IIS servers when closing SSL connection.
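To check whether a given libneon is new enough to include that fix, a comparison along these lines should do. The function name `version_at_least` is hypothetical, and the sketch assumes plain three-part numeric versions such as 0.29.6:

```shell
# Hypothetical helper: succeed if dotted version $1 is at least dotted version $2.
# Assumes both have exactly three numeric components, e.g. 0.29.6.
# The body runs in a subshell so the IFS change does not leak out.
version_at_least() (
    IFS=.
    # Split both versions on dots: $1 $2 $3 hold the first, $4 $5 $6 the second.
    set -- $1 $2
    if [ "$1" -ne "$4" ]; then
        [ "$1" -gt "$4" ]
    elif [ "$2" -ne "$5" ]; then
        [ "$2" -gt "$5" ]
    else
        [ "$3" -ge "$6" ]
    fi
)

# Usage sketch with the versions mentioned in this post:
#   version_at_least 0.29.6 0.29.3 && echo "has the ne_sock_close() fix"
#   version_at_least 0.29.0 0.29.3 || echo "predates the fix"
```

By this check the original 0.29.0 library predates the fix and the 0.29.6 replacement includes it, which is consistent with what I saw.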
Now the problem is fixed once and for all, which makes me a happy engineer: I have solved a technical problem that was causing us pain. It’s surprising what a little applied effort can do, even in the face of an initially completely mysterious problem!