Converting a CVS repository to SVN
In the nearly twenty years that CVS has been in use, hundreds of thousands (perhaps even millions) of CVS repositories have been created around the globe. Some of those have already been converted to Subversion (SVN), while others are candidates for conversion. This post explores how best to approach such a conversion.
When considering converting your CVS repository to another Source Code Repository Management (SCRM) scheme, you must answer these questions:
What problem (or problems) is the conversion attempting to solve?
If the conversion is not meant to address an actual problem or issue, why are you considering it? If there is simply a desire to move to a more modern SCRM system, you would be well advised to prepare a cost-benefit analysis first.
Start by listing the perceived benefits of converting from CVS, describing them using simple phrases like “Enable directory versioning”. Also list the costs of the conversion, including everything from the time consumed by the conversion itself to the time and training developers will need to use the new SCRM. Be prepared to abandon the conversion if the benefits don’t clearly outweigh the costs.
Could the problem(s) be addressed or resolved without a conversion?
There may be less disruptive solutions to your issue. Be sure to research and consider those, because converting your repository is effectively taking a ride down a one-way street. Once your repository has been converted to SVN and there is new activity in it, there is no practical method for going back to CVS other than simply creating a new CVS repository with no version history. As I said above, research and weigh your options.
Are there system dependencies on the CVS repositories?
If yes, what is the scope of the dependencies? If all of your build and release management automation is tied to interacting with a CVS repository or its ,v files, you must scope the level of effort (LOE) required to port the automation to Subversion.
If the LOE is low, then proceed. If the LOE is high, you must now determine whether the problem you’re trying to resolve with the conversion is serious enough - as measured by how costly it is to your organization to deal with it - to warrant a large, costly conversion of existing systems.
Are these commercial systems? If so, are equivalent SVN-compatible versions available? If they are internally developed systems, are the engineers who created them still at your company? If not, are the systems documented well enough for someone else to port the dependencies to SVN?
What is the size and distribution of your development team?
This will determine the scale of your development process retrofit. Your engineers will need to be able to securely engage the SVN repository from their IDE of choice, from the same geographic locations where they currently engage CVS. SVN is most commonly served over HTTP or HTTPS through Apache, although it can also be served over its own svn:// protocol or tunneled as svn+ssh://. Either way, the CVS pserver and sserver protocols will no longer be involved.
CVS to SVN - the cvs2svn.py script
The primary authors of Subversion have created a conversion utility called cvs2svn. This is a Python script that will attempt to convert a well-formed CVS repository into a corresponding Subversion repository. The script has many command line options that control the conversion type (full, no history, etc.) and its targets.
My experience with cvs2svn is that it is useful for non-complex repositories that are relatively modern in age. Repositories that are either complex (for example, those that contain deep branch structures) or “old” and in use for many years are likely to contain anomalous files or version histories that cvs2svn cannot parse.
The script has two characteristics that impede its usefulness on very large, very old CVS repositories:
1. Poor error flagging & reporting
2. Inability to stop / restart a conversion
This means that when errors (usually obscure or inaccurate, see #1) are reported and the conversion stops, you have to start over (see #2). This is a potential show-stopper for large repository conversions, where you can be multiple hours into a conversion before stopping errors are encountered.
Specific Error Patterns and Causes
I have noticed three specific CVS repository anomalies that cause cvs2svn to fail and hard-stop. If you wish to use cvs2svn, you must deal with these issues either before using the script, or as the errors are encountered.
1. Malformed ,v file
CVS has been around for a long time. Like any tool that has (or had) a long, active development cycle, it had bugs that were fixed over time. Some of these early bugs had the potential to write malformed ,v files into the repository.
Even though CVS client operations would simply ignore these files, they remained in the repository. Now, when cvs2svn encounters these files outside a CVS client operation, the internal flawed structure is exposed and generates a fatal error.
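One way to catch the grossest cases before starting a long conversion is to sanity-check every ,v file up front. The following is only a minimal sketch under stated assumptions: the names check_rcs_file and scan_repository are hypothetical, and the check relies solely on two facts from the RCS file format (a file begins with a head field and contains a desc section). It will not detect subtler corruption, and it is not how cvs2svn itself validates files.

```python
import re
from pathlib import Path

# An RCS file starts with its admin section: "head <revision>;"
# (the revision number may be absent in an empty file).
HEAD_RE = re.compile(rb"\s*head(\s+[0-9.]+)?\s*;")
# The description section is introduced by "desc" at the start of a line.
DESC_RE = re.compile(rb"^desc\b", re.MULTILINE)


def check_rcs_file(data):
    """Return a short reason if the bytes do not look like an RCS file, else None."""
    if not HEAD_RE.match(data):
        return "does not start with a 'head' field"
    if not DESC_RE.search(data):
        return "no 'desc' section found (file may be truncated)"
    return None


def scan_repository(cvsroot):
    """Map each suspicious ,v file under cvsroot to the reason it was flagged."""
    problems = {}
    for rcs_file in sorted(Path(cvsroot).rglob("*,v")):
        # Reads each file fully into memory; acceptable for a one-off scan.
        reason = check_rcs_file(rcs_file.read_bytes())
        if reason is not None:
            problems[rcs_file] = reason
    return problems
```

Running scan_repository against a copy of the repository gives a list of files to inspect by hand before committing to a multi-hour conversion run.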
2. Multiple identical tags on disparate revisions of the same file
For the same reasons as noted above, old bugs in CVS made it possible to apply quasi-identical tags to disparate revisions of the same file. For example, an early bug in CVS involved poor transformation of line-ending characters.
A common occurrence was the generation of ^M characters in Windows-based editors. In normal use, Windows developers would edit files and commit them to the CVS repository.
The earliest users of CVS implemented their repositories on Unix systems. Similarly, the earliest engineers using CVS were developing source code on those same Unix systems. When the client-server CVS was introduced, this enabled developers to engage remote CVS repositories, and the line-ending issue emerged soon after.
As CVS usage grew, Windows developers began using remote CVS repositories. Often, connections to these remote CVS repositories were facilitated using either Samba, or later using pserver.
This widespread adoption led to a wave of line-ending issues: Windows carriage returns went into the repository as ^M characters and were actually saved in the ,v files. While this was annoying to developers, it was ultimately largely resolved by the CVS client’s improved handling of line endings on the way into and out of the repository.
However, this line-ending issue had at least one serious side effect on the repository. Due to CVS’ buggy early handling of Windows line endings, it was possible to apply tags of this form:
TEST: 1.9
…
TEST^M: 1.0.2.14
to one or more files in the repository. While later clients fixed this bug, the malformed tag had been written to the repository and would be there forever. CVS would simply ignore the malformed tag because it was never requested by any subsequent client operation.
cvs2svn.py, however, parses each ,v file and appears to ignore, or drop, the ^M at the end of the tag name, effectively creating two same-named tags on disparate revisions.
This causes a fatal error in cvs2svn.
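Tags like these can be hunted down before the conversion by reading the symbols list in each ,v header. Here is a minimal sketch, assuming the stray carriage return sits inside the tag name exactly as in the TEST^M example above. The function name find_colliding_tags is hypothetical, and the parsing is a simplification of the RCS format rather than the logic cvs2svn actually uses.

```python
import re
from collections import defaultdict

# The symbols list runs from the "symbols" keyword to the next semicolon.
SYMBOLS_RE = re.compile(rb"^symbols\b(.*?);", re.DOTALL | re.MULTILINE)
# Control characters (including the carriage return shown as ^M).
CONTROL_RE = re.compile(rb"[\x00-\x1f\x7f]")


def find_colliding_tags(rcs_header):
    """Return tags that map to more than one revision once control chars are dropped.

    The result maps each cleaned tag name to a sorted list of
    (raw tag name, revision) pairs.
    """
    match = SYMBOLS_RE.search(rcs_header)
    if match is None:
        return {}
    by_clean_name = defaultdict(set)
    # Split on space, tab and newline only, so a carriage return stays in the name.
    for entry in re.split(rb"[ \t\n]+", match.group(1).strip(b" \t\n")):
        if not entry:
            continue
        raw_name, _, revision = entry.rpartition(b":")
        by_clean_name[CONTROL_RE.sub(b"", raw_name)].add((raw_name, revision))
    return {
        name: sorted(pairs)
        for name, pairs in by_clean_name.items()
        if len({revision for _, revision in pairs}) > 1
    }
```

Any file for which this returns a non-empty result is a candidate for manual repair (on a copy of the repository) before cvs2svn is run.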
3. Same named files in both the parent directory and the Attic
Update: There is a cvs2svn option to deal with this:
--retain-conflicting-attic-files
EDIT: Apparently that’s a documentation bug, because ‘retain-conflicting-attic-files’ is not a recognized option in the current version of cvs2svn.
Once again, this issue is likely due to some old bug in CVS. Under normal usage, CVS stores a ,v file either in the directory node itself or in that directory’s Attic, never in both.
When a file is added to CVS, its position relative to the trunk (the HEAD) is determined. If the file is being added on the trunk, the corresponding ,v file is stored in the repository inside its directory node. For example, when
/foo/bar.java
is added to the trunk, the ,v file is stored at
$CVSROOT/foo/bar.java,v
Similarly, when bar.java is added to a branch, its ,v file is stored at
$CVSROOT/foo/Attic/bar.java,v
In that ,v file, the HEAD revision is listed as Dead. This indicates that the file exists only on the branch. When CVS client operations attempt to retrieve the HEAD, bar.java will not be included.
This is normal CVS behavior, but because of old bugs, CVS has in the past allowed same-named files to exist both in the parent node and in the Attic. Though the bug that allowed this to occur has been fixed, the operation that caused it did write the extra file to the repository, and that file is still there (sometimes years later).
When cvs2svn encounters this, it throws a fatal error.
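Because this condition is purely a matter of file layout, it can be detected without parsing any ,v file at all. The sketch below walks a repository tree and reports every file name that appears both in a directory and in that directory’s Attic. The function name find_attic_conflicts is hypothetical; deciding which copy to keep is left to whoever knows the history of the file.

```python
from pathlib import Path


def find_attic_conflicts(cvsroot):
    """Return sorted (live file, Attic file) pairs that share the same name."""
    conflicts = []
    for attic in Path(cvsroot).rglob("Attic"):
        if not attic.is_dir():
            continue
        for attic_file in attic.glob("*,v"):
            # The conflicting copy, if any, sits one level up from the Attic.
            live_file = attic.parent / attic_file.name
            if live_file.is_file():
                conflicts.append((live_file, attic_file))
    return sorted(conflicts)
```

For the example above, a repository containing both $CVSROOT/foo/bar.java,v and $CVSROOT/foo/Attic/bar.java,v would yield that pair.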
I’ll come back and edit this post with the final results of a very large CVS repository conversion I’m doing; for now, suffice it to say that cvs2svn.py is useful, but not 100% reliable for very large, very old CVS repositories.
