Converting CVS to Subversion

This mini-HOWTO describes how to convert a CVS repository to a Subversion repository.

The core of the conversion is performed by cvs2svn. On its own, however, this tool isn't very accurate when it comes to MIME types and EOL conversion. This mini-HOWTO describes a method to perform the conversion in a much more controlled way.

First, you need direct access to your CVS repository (this is the directory containing the CVSROOT directory). If your CVS repository resides on SourceForge, then you can download it to your local hard disk as described here.

Set the CVSREPOS environment variable to the full path of the directory that now contains the CVSROOT directory. For example, if you used rsync to download the CVS repository to a directory cvsrepos in the current directory, you'd set:

$ CVSREPOS="$(pwd)/cvsrepos"

This should allow you to run:

$ (cd /; test -f "$CVSREPOS/CVSROOT/rcsinfo,v" && echo 'Ok!')
Ok!

Set the environment variable PROJECTNAME to the name of the directory (not CVSROOT) in $CVSREPOS containing the CVS module that you want to convert. If you downloaded the CVS repository from SourceForge, then this would be the project UNIX name.

Consider the example output of the following command:

$ ls $CVSREPOS
CVSROOT/  libcwd/

Then you'd set

$ PROJECTNAME=libcwd

The following makes a list of all file extensions in your repository, and of all files without an extension. This assumes you have a file /etc/mime.types with a format like:

# Comment
text/x-c++hdr         h++ hpp hxx hh
text/x-c++src         c++ cpp cxx cc

In other words, with the format MIMETYPE EXTENSION-LIST, where the EXTENSION-LIST has no dots and is space separated.
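To make the expected format concrete, here is a minimal sketch that flattens such a file into one extension/type pair per line. The function name mime_pairs and the sample content (including the extension-less application/x-no-extension entry) are made up for illustration; the steps below do not depend on it.

```shell
# Print one "EXTENSION MIMETYPE" pair per line for a mime.types-style file.
# Comment lines, blank lines, and types that list no extension are skipped.
mime_pairs() {
    awk '$1 !~ /^#/ && NF > 1 { for (i = 2; i <= NF; i++) print $i, $1 }' "$1"
}

# Made-up sample in the format shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Comment
text/x-c++hdr         h++ hpp hxx hh
text/x-c++src         c++ cpp cxx cc
application/x-no-extension
EOF

mime_pairs "$sample"
```

Running mime_pairs /etc/mime.types | grep '^cpp ' is a quick way to see which types your own file assigns to a given extension.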

For the conversion to work, mime.types needs at least an entry for the extension of every binary file in your repository.

Also note that these commands create temporary files (step1, step2, and so on) in the current directory. A trailing backslash means that the next line is really part of the same command (you should still be able to paste the whole command into a shell in one go). I added comments to prevent this from turning into black magic.

# Find all extensions. Also include filenames without extension.
# The E (extension) and S (slash) trick is to get GNU sort to separate
# them, although this is not really necessary. At the same time it
# removes the leading slash from filenames without extension.
# The E marker is turned back into a dot before the S marker is stripped,
# so that a name such as EXAMPLE is not mistaken for an extension.
find "$CVSREPOS/$PROJECTNAME" -type f -name '*,v' ! -name '.cvsignore,v' | \
    sed -e 's%.*\([./][^.]*\),v$%\1%' -e 's/\./E/' -e 's/\//S/' | \
    sort -u | sed -e 's/^E/./' -e 's/^S//' > step1
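To see what ends up in step1, the sed and sort stages can be tried on a handful of paths without touching a real repository. This is only a sketch: the function name list_extensions and the paths are made up, and the final sed restores the dot before it strips the S marker.

```shell
# Reduce a list of RCS file paths (one per line on stdin) to the unique
# extensions and extension-less names, using the same E and S markers.
list_extensions() {
    sed -e 's%.*\([./][^.]*\),v$%\1%' -e 's/\./E/' -e 's/\//S/' |
        sort -u | sed -e 's/^E/./' -e 's/^S//'
}

# Made-up paths standing in for the output of find.
printf '%s\n' \
    '/repo/libcwd/src/debug.cc,v' \
    '/repo/libcwd/include/debug.h,v' \
    '/repo/libcwd/Makefile.in,v' \
    '/repo/libcwd/README,v' \
    '/repo/libcwd/Attic/old.cc,v' | list_extensions
```

Extensions come out with their leading dot (.cc, .h, .in), files without an extension as the bare name (README), and duplicates such as the second .cc are collapsed by sort -u.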

# Compose an extended regular expression that matches any "extension"
# as found by the previous step.
EXT1="($(grep '^\.' step1 | xargs echo | sed -e 's/^\.//' -e 's/\+/\\+/g' -e 's/ \./|/g'))"

# Find all mime-types and related extensions that really exist.
egrep -i '^[[:alnum:]][^[:space:]]*[[:space:]]+([^[:space:]]+ )*'"$EXT1"'($| )' /etc/mime.types > step2

# Extract the list of extensions from the previous step,
# filtering out the extensions that we don't have.
for ext in $(sed -re 's/^[^[:space:]]*[[:space:]]+//' step2); do echo $ext; done | \
    egrep -i '^'"$EXT1"'$' | sort -u > step3

The file step3 now contains a list of all extensions found in your repository for which we know one or more MIME types.

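# Compose an extended regular expression from the previous step.
# A '+' (as in c++) is escaped, as was done for EXT1, so that it is matched literally.
EXT2="($(cat step3 | xargs echo | sed -e 's/[+]/\\+/g' -e 's/ /|/g'))"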

# Find all "extensions" that weren't really extensions
# (or for which we don't know a MIME type).
grep '^\.' step1 | egrep -iv '^\.'"$EXT2"'$' > step4

# And turn it into an extended regular expression.
EXT3="($(sed -e 's/\./\\\\./' step4 | xargs echo | sed -e 's/ /|/g'))"

# Create a list of files for which no MIME type is known.
find $CVSREPOS/$PROJECTNAME -type f -name '*,v' ! -name '.cvsignore,v' | \
    sed -e 's%.*/\([^/]*\),v$%\1%' | egrep -i "$EXT3"'$' | sort -u > step5

At this point, the file step5 contains a list of filenames in your repository for which no MIME type matches their extension. If you see any binary files in there, then you must add their extension to the mime.types file and repeat the steps above. For text files this is not necessary; files without a MIME type are treated as text by SVN.

# Create a map from extension to MIME type. If a MIME type that starts
# with 'text' exists, use that - otherwise use application/octet-stream
# when there is more than one MIME type, or use the single known MIME type.
for f in $(cat step3); do \
    MIMETYPES=$(egrep -i '[[:space:]]'$f'( |$)' step2 | sed -e 's/[[:space:]].*//'); \
    echo $f: $MIMETYPES; done | \
    sed -e 's%:.* \(text/[^ ]*\).*%: \1%' -e 's%: [^ ].* .*%: application/octet-stream%' > step6
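The selection rule is easier to follow on concrete input. The sketch below feeds a few made-up "extension: types" lines through the same two substitutions; the function name pick_type and the sample lines are assumptions for illustration only.

```shell
# Apply the two substitutions from the loop above to lines on stdin.
pick_type() {
    sed -e 's%:.* \(text/[^ ]*\).*%: \1%' \
        -e 's%: [^ ].* .*%: application/octet-stream%'
}

# Made-up candidates: one type, a text type among several, several
# non-text types, and a single binary type.
printf '%s\n' \
    'cc: text/x-c++src' \
    'xml: application/xml text/xml' \
    'ogg: application/ogg audio/ogg' \
    'png: image/png' | pick_type
```

Here xml is reduced to text/xml because a text type is available, ogg falls back to application/octet-stream because it has two non-text candidates, and cc and png keep their single type.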

Edit the file step6 if it contains MIME types that you do not agree with. For example, if you have any .doc files and you used a standard mime.types file, it will contain doc: application/msword, so those files would be treated as binary. If they aren't MS Word files, you need to change this into doc: text/plain. Likewise, you might have .js files that are correctly mapped to the MIME type application/x-javascript, but you still need to change that into js: text/plain because you don't want svn to treat those files as binary. Every MIME type that doesn't start with text/ will be treated as binary.
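If you prefer a repeatable edit over a text editor, overrides like these can be scripted. A minimal sketch, run here against a made-up step6 in a scratch directory (the variable workdir and the sample content are assumptions); on the real file you would drop the setup and point sed at step6 in the current directory.

```shell
# Work on a made-up step6 in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/step6" <<'EOF'
cc: text/x-c++src
doc: application/msword
js: application/x-javascript
png: image/png
EOF

# Force the listed extensions to text/plain; everything else is kept as is.
sed -e 's%^doc: .*%doc: text/plain%' \
    -e 's%^js: .*%js: text/plain%' "$workdir/step6" > "$workdir/step6.new" &&
    mv "$workdir/step6.new" "$workdir/step6"

cat "$workdir/step6"
```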

# This could be used to turn the (edited) file 'step6' into a mime.types file
# for use with cvs2svn's '--mime-types=FILE' option, but we WON'T use this.
sed -e 's/\([^:]*\): \(.*\)/\2 \1/' step6 > mime.types

# Instead, build a props file for use with the '--auto-props=FILE' option.
echo "[auto-props]" > propsfile
sed -e 's%: \(text/.*\)% = svn:eol-style=native;svn:keywords=Author Date Id Revision;svn:mime-type=\1%' \
    -e 's/^/*./' -e 's/: / = svn:mime-type=/' step6 >> propsfile

You might want to edit this file as well at this point, in particular the keywords that will be substituted. In the SVN Book you can find more information on svn:keywords. If you want the result to be as close as possible to what you would have gotten by adding each file with svn add over time, then you should in fact remove the svn:mime-type properties for text/* MIME types, because those are normally not set.
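Removing the svn:mime-type property from the text entries can be done with a single substitution. The sketch below assumes entries of the form generated above, with svn:mime-type as the last property on the line, and works on a made-up propsfile in a scratch directory (workdir is an illustrative name).

```shell
# Made-up propsfile in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/propsfile" <<'EOF'
[auto-props]
*.cc = svn:eol-style=native;svn:keywords=Author Date Id Revision;svn:mime-type=text/x-c++src
*.png = svn:mime-type=image/png
EOF

# Drop ";svn:mime-type=text/..." from text entries; binary entries are untouched.
sed -e 's%;svn:mime-type=text/[^;]*%%' "$workdir/propsfile" > "$workdir/propsfile.new" &&
    mv "$workdir/propsfile.new" "$workdir/propsfile"

cat "$workdir/propsfile"
```

After this, the *.cc line carries only svn:eol-style and svn:keywords, while *.png keeps its image/png type.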

The next command adds the properties for all other files (those for which no MIME type could be found). Note that the svn:keywords property is set here as well; you might want to change it. We don't set the svn:mime-type property for these files. This has the same effect as them being text/plain anyway, and svn usually doesn't set a mime-type unless it detects a binary file (in which case it sets the mime-type to application/octet-stream).

# Finish the generation of propsfile
sed -re 's/(.*\.([^.]*$))/\2 \1/' -e 's/^([^.]*)$/\1 \1/' step5 | sort | \
    sed -e 's/^[^ ]* //' -e 's%$% = svn:eol-style=native;svn:keywords=Author Date Id Revision%' >> propsfile

At this point all results are collected in one file: propsfile. Edit it and make any changes you think are appropriate. For example, you can delete lines for files that have the same extension and replace them by one that uses a wildcard.

What is important here is that text files have the appropriate svn:eol-style property and either no svn:mime-type or one that starts with text/, while binary files have no svn:eol-style and have a svn:mime-type that does not start with text/. For example,

*.png = svn:mime-type=image/png
*.cpp = svn:eol-style=native;svn:keywords=Author Date Id Revision
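A hand-edited propsfile can be checked against this rule mechanically. The following is a sketch under one reading of the rule (every entry is either binary, with a non-text svn:mime-type and no svn:eol-style, or text, with svn:eol-style); the function name check_propsfile and its messages are made up.

```shell
# Report propsfile entries that mix up the text and binary conventions.
# Exit status is 0 when nothing was reported, 1 otherwise.
check_propsfile() {
    awk -F' = ' '
        NF < 2 { next }    # skips "[auto-props]" and blank lines
        {
            has_eol = ($2 ~ /svn:eol-style=/)
            binary  = ($2 ~ /svn:mime-type=/ && $2 !~ /svn:mime-type=text\//)
            if (binary && has_eol)   { print "binary with eol-style: " $1; bad = 1 }
            if (!binary && !has_eol) { print "text without eol-style: " $1; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$1"
}

# The two example lines from above pass the check.
props=$(mktemp)
cat > "$props" <<'EOF'
[auto-props]
*.png = svn:mime-type=image/png
*.cpp = svn:eol-style=native;svn:keywords=Author Date Id Revision
EOF
check_propsfile "$props" && echo "propsfile looks consistent"
```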

Now we are ready to create the SVN dumpfile with cvs2svn.

cvs2svn --dumpfile=svndump-$PROJECTNAME --keywords-off --no-default-eol \
    --auto-props-ignore-case --auto-props=propsfile $CVSREPOS/$PROJECTNAME

Next you can use svndumpfilter to clean up your dumpfile a bit. For example, here is how I dropped several tags and a branch:

cat svndump-$PROJECTNAME | svndumpfilter exclude tags/stable_head tags/gdbbug00 tags/gdbbug01 \
    tags/gdbbug02 branches/branch-threading > svndump

You can get a list of all tags and branches by grepping the dumpfile:

egrep -a '^Node-path: /?(tags|branches)/' svndump-$PROJECTNAME | \
    sed -re 's%^Node-path: ((tags|branches)/[^/]*).*%\1%' | sort -u
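To get a feel for what this prints, the same filter can be run on a few Node-path lines. A sketch with made-up dumpfile content; the function name list_tags_and_branches is hypothetical, and grep -E and sed -E are used as equivalents of egrep and sed -r.

```shell
# List the distinct tags/NAME and branches/NAME prefixes in a dumpfile.
list_tags_and_branches() {
    grep -aE '^Node-path: /?(tags|branches)/' "$1" |
        sed -E 's%^Node-path: ((tags|branches)/[^/]*).*%\1%' | sort -u
}

# Made-up excerpt containing only the header lines that matter here.
dump=$(mktemp)
cat > "$dump" <<'EOF'
Node-path: trunk/README
Node-path: tags/stable_head
Node-path: tags/stable_head/src/debug.cc
Node-path: branches/branch-threading
EOF

list_tags_and_branches "$dump"
```

Each tag or branch is listed once, however many files it contains, which gives you the names to pass to svndumpfilter exclude.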

Uploading the dumpfile to SourceForge is described here (don't forget to rename it to svndump before compressing and uploading it).

Creating a new repository on your local hard disk from a dumpfile is described here.