Written By: Scott Jangro
June 12, 2007

Subversion Enlightenment

I’ve been developing web sites for many years now. As my aspirations grow, so do the sizes of the projects. And along with that come more people. For the past few years, I’ve been working side-by-side with my business partner Damien. Sometimes we’re working on our own projects, but many times we’re collaborating on the same project.

When two people are working on the same website, you immediately run into the age-old (computer age, anyway) problem of people stepping on other people’s work.

And now we’re introducing a third developer to the process. Three developers working in three locations means that we need a better solution. Fortunately, thousands of people have had this problem before us, and some of them created version control tools like CVS and Subversion.

I’ve tried getting Subversion going previously, but never had the time to get my head around how it works and how we’d have to change our development process.

Here’s our current process… it is quite loose and probably seems pretty familiar to many of you.

We have development and production versions of a website, both of which live on a server somewhere. We directly edit the files on the development server and we generally don’t keep our own local versions. It is just easier to have a central database and web front-end to test on.

Until now, we’ve relied on caution, diligence, and word of mouth to make sure we’re not working on the same files. Our work-flow goes something like this:

1. I need to edit a file on the server, foo.php.
2. I think for a second about whether there’s a chance that Damien is working on the same site.
3. If so, I’ll jump on IM and claim the file: “Yo, I’m going to edit foo.php, k?”

Sometimes, though thankfully rarely, we edit the same file and one of us clobbers the other’s work. That totally sucks, but we’ve done ok. We talk a lot, so we tend to know what the other is doing.

We work like that, making all our edits on the dev server. After testing we’ll either move all files over to production or just one or two depending on the size of the change.

This process has served us pretty well. But with a third person joining the mix, it’s time to grow up.

Here’s what a new process will look like

We’re installing Subversion on the server and on all of our work systems.

I’ll create a Subversion repository on the server for each of our web projects. We’ll start with one or two to flesh out the process. Understanding how this will work was the key for me to embrace the process and move forward. (Writing this document is helping as well!)

The Subversion repository contains all of the code for a website. Since we’re adding version control to an existing site, it starts as a snapshot in time of an existing project. The code for the website is imported into the repository, and in a fresh repository that import becomes revision 1. (Subversion numbers revisions for the repository as a whole, not for each individual file.)

This repository is not the code used by the web server for either the production or development environments. Just like our own working copies, the web servers contain snapshots of code that live in their own directories. When we want to test on the development server, we get the latest code out of the repository (via a “checkout” or “export” operation). When we want to move the code into production, we get it out of the repository the same way.

We’re NEVER editing the files directly in the web server directories. This is a big change.

When we do want to edit code, the process looks like this:

1. Check out the code from the repository. This creates a working copy for editing.
2. Make edits and test on our own personal copy of the web server.
3. Check the code back in.

If, in step 3, we’ve changed a file that someone else has also changed, Subversion will alert us and we must resolve those differences. How easy that will be remains to be seen.

A key difference is that we’ll each have our own personal version of the website running for development. Having a copy of a website running is quite simple and something I’ve done many times before. It’s easy to install an Apache server on just about any computer.

It can get trickier, however, to have a useful working local copy of the database. Our databases tend to be quite large, hundreds of megabytes, even approaching a gigabyte in a few cases. That’s a lot of data to be moving around, even over broadband connections. So I think we’ll set up one or two copies of the database on the remote server and we’ll all test using the same remote database and our own local web servers.

Challenges

There are a few things that I need to work out, in my head, and then in practice.

1. Images and cache files: There are many files that live in the application structure that should not be under version control. These are typically cache files and graphics (user thumbnails, product images) that get updated by the server, not by a developer. We’ll all need these files on our own development systems, but with literally tens of thousands of them, we cannot be checking them in and out of version control.

2. Environment Files: These are things like configuration files that are unique to the specific web server environment. They should be under version control, but will differ depending on the server. Not sure how to handle these. Maybe by definition, they cannot be kept under version control.
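One way this might be handled, sketched here under assumptions rather than as a settled answer: keep only a template under version control and have the application load an unversioned, per-server file. The file names config.php and config.php.sample, the function load_site_config, and the convention that the config file returns an array are all hypothetical.

```php
<?php
// Hypothetical helper: load the per-server config file, which is NOT versioned.
// Only the template (config.php.sample) is assumed to live in the repository.
function load_site_config(string $dir): array
{
    $local  = $dir . '/config.php';         // per-server copy, edited by hand
    $sample = $dir . '/config.php.sample';  // versioned template

    if (!is_file($local)) {
        // Fail loudly instead of silently running with template values.
        throw new RuntimeException(
            "Missing $local; copy $sample to config.php and edit it for this server."
        );
    }

    // Assumption: the config file ends with "return array(...);".
    $config = require $local;
    if (!is_array($config)) {
        throw new RuntimeException("$local must return an array of settings.");
    }
    return $config;
}
```

With this layout an update from the repository can never overwrite a server’s settings, because the file that holds them is not in the repository at all.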

This is the beginning of my journey into version control for website development. I’ll continue to document my experiences and processes as things progress.

11 Comments... What do you think?


  1. AltJ said on June 12th, 2007 at 5:09 pm

    The best way I’ve found to handle environment or site-specific configuration files is to have a config.php.sample file that is checked into subversion with a generic group of configuration settings (and any required documentation for those settings.)
    Then in each site, I rename that file to config.php and make the site specific changes. If I need to make changes that require additional settings in the config.php file, I make those in the .sample file (which are propagated by subversion) and manually make the corresponding changes in each site’s config.php file. That way I don’t have to worry about subversion whacking some specific site settings when I push out an update.
    Another alternative is to use this method, but check in a separate file for each site (e.g. config.php-site1, config.php-site2) and rename those files to config.php as needed for each site. One good thing about this method is you have a history of config changes for each site in subversion.

  2. Scott said on June 12th, 2007 at 8:38 pm

    AltJ, thanks. Fortunately, I do use a config file for things like database settings. I’ll try out your suggestions.

    Just so I’m clear, in both cases, you don’t have config.php checked into the repository, just config.php.sample or config.php-sitex. Any changes to the plain config.php in each site don’t get updated.

    It’s up to the person managing the releases to make sure the config.php is copied and configured properly. If things are done properly, it should never get overwritten. The only risk being new settings getting added to the base config file without the person who manages the specific site realizing it.
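    That last risk could probably be caught with a small check at release time. The following is only a sketch: the function name missing_config_keys and the setting names are made up for illustration, and it assumes both files can be read as plain arrays of settings.

```php
<?php
// Hypothetical check: list settings that exist in the versioned sample
// but are missing from a site's own, unversioned config.
function missing_config_keys(array $sample, array $local): array
{
    // Compare setting names only; the values are expected to differ per site.
    return array_values(array_diff(array_keys($sample), array_keys($local)));
}

// Illustrative data: the sample has gained a 'cache_dir' setting
// that this site's config.php does not have yet.
$sample = array('dbhost' => '', 'dbuser' => '', 'dbpass' => '', 'cache_dir' => '');
$local  = array('dbhost' => 'localhost', 'dbuser' => 'root', 'dbpass' => 'pass');

foreach (missing_config_keys($sample, $local) as $key) {
    echo "config.php is missing setting: $key\n";
}
```

    Run against each site before a release, a check like this would flag a new setting before it turns into a runtime error.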

  3. VB said on June 13th, 2007 at 3:36 am

    I don’t know if this helps but usually I will have an env variable in the webserver config (Apache config file in my case). On my dev server this is set to ‘SetEnv SERVERNAME dev’ and on my prod machine it is set to ‘SetEnv SERVERNAME prod’.
    Then in my conf files (in php) I can do the following:

    if (getenv('SERVERNAME') == 'dev') {

    };

  4. VB said on June 13th, 2007 at 3:39 am

    Sorry I accidentally posted the msg before I completed my example:

    // Pick database settings based on the SERVERNAME variable
    // set with SetEnv in the Apache config.
    if (getenv('SERVERNAME') == 'dev')
    {
        $dbhost = "localhost";
        $dbuser = "root";
        $dbpass = "pass";
    }
    elseif (getenv('SERVERNAME') == 'prod')
    {
        $dbhost = "prodhost";
        $dbuser = "produser";
        $dbpass = "prodpass";
    }

  5. Scott said on June 13th, 2007 at 8:14 am

    Good stuff, VB. I do something similar, but instead check the hostname like this.

    if ($_SERVER['HTTP_HOST'] == "dev.hostname.com") {
        // dev settings
    } else {
        // default to prod
    }

    I like your method a little better as it doesn’t rely on the hostname matching exactly (with problems like having a www or not on prod).

    I didn’t make the leap that this could serve well in this case too. It would require, however, that the config get cluttered up with each developer’s settings. I guess that’s not that awful and in fact does allow for version control on this file.

  6. Brenda said on June 13th, 2007 at 8:47 am

    Scott… “Three developers working in three locations means that we need a better solution.” We have experienced the same thing and it can be challenging. I hope your new system works for you.

  7. AltJ said on June 13th, 2007 at 12:51 pm

    Yes, Scott. It looks like you understood what I was trying to say.
    “The only risk being new settings getting added to the base config file without the person who manages the specific site realizing it.” - I’ve done this in the past and my QA process quickly finds it.

    I like VB’s and your methods but there can be some security drawbacks.

    With previous employers (in the banking industry) we were prohibited from storing production access credentials anywhere but in production so we couldn’t use those methods. We were forced through a change control process that included updating the environment specific config file when necessary.

    (Let me get my tin-foil hat on.)
    I don’t necessarily have complete trust in some of my hosting providers (I know some admins can get nosey), so I don’t like to store any unnecessary usernames/passwords on my webhosts. This could also leak a list of other sites you have that are running off the same codebase.

    These comments are probably just a paranoid carryover from those previous employers.

  8. Scott said on June 14th, 2007 at 10:45 am

    AltJ, a little paranoia is a good thing, for sure.

    The more I look into this, the more I like the idea of keeping only a sample config file in the source control for security reasons.

    Not that I don’t trust people working on the project, but there’s really no need to have production database passwords being checked out and stored on their local computers.

    It’s becoming clear to me now why some open source applications distribute a config-sample.php which must be renamed to config.php and not simply a config.php that must be edited.

  9. Carsten Cumbrowski said on June 23rd, 2007 at 12:51 am

    You might want to check out “Git” by Linus Torvalds. He gave a presentation at Google in May. Here is the Video on YouTube. He slams Svn :) (he is obviously not a big fan of how the folks at Subversion (and also CVS btw) are doing version management hehe).

  10. Daniel Skinner said on January 10th, 2008 at 7:14 am

    I am interested to know the best way to handle databases in version control. My application relies on a certain database structure that may change, and I would like to keep that structure under version control. However, the actual test data is not important.

    Do you keep an SQL dump of the basic structure in VC? When checking out, does each developer have their own version of the database, or do they just execute the SQL file to get the latest structure?

    Anyone have any ideas on how best to handle this?

  11. [...] Register « Subversion Enlightenment [...]
