This mini-HOWTO describes how to convert a CVS repository to a Subversion repository.
The core of the conversion is performed by cvs2svn. That tool, however, isn't very accurate when it comes to MIME types and EOL conversion, so this mini-HOWTO describes a method for performing those parts of the conversion in a much more controlled way.
First you need direct access to your CVS repository (that is, the directory containing the CVSROOT directory). If your CVS repository resides on SourceForge, you can download it to your local hard disk as described here.
Set the CVSREPOS environment variable to the full path of the directory that now contains the CVSROOT directory. For example, if you used rsync to download the CVS repository to cvsrepos in the current directory, you'd set CVSREPOS to the absolute path of that cvsrepos directory.
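The assignment itself might look like the following sketch. It assumes the rsync target really was a directory named cvsrepos directly under the current directory, as in the example; adjust the path otherwise.

```shell
# Assumption: the repository was downloaded to ./cvsrepos.
# The path must be absolute, hence the $PWD prefix.
export CVSREPOS="$PWD/cvsrepos"
echo "CVSREPOS=$CVSREPOS"
```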
This should allow you to run the following check:

    $ (cd /; test -f $CVSREPOS/CVSROOT/rcsinfo,v && echo 'Ok!')
    Ok!
Set the environment variable PROJECTNAME to the name of the directory (not CVSROOT) in $CVSREPOS containing the CVS module that you want to convert. If you downloaded the CVS repository from SourceForge, then this would be the project UNIX name.
Consider the example output of the following command:
    $ ls $CVSREPOS
    CVSROOT/  libcwd/

Then you'd set PROJECTNAME to libcwd.
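For the listing above, the assignment would be along these lines. This is a sketch: libcwd is simply the module name from the example, and the optional check assumes CVSREPOS has already been set as described earlier.

```shell
# libcwd is the module directory from the example listing.
export PROJECTNAME=libcwd

# Optional sanity check; it only reports, it does not abort.
if [ -d "${CVSREPOS:-}/$PROJECTNAME" ]; then
    echo "Found module directory $CVSREPOS/$PROJECTNAME"
else
    echo "Warning: '${CVSREPOS:-}/$PROJECTNAME' is not a directory" >&2
fi
```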
The following makes a list of all file extensions in your repository, and all files without extension.
This assumes you have a file /etc/mime.types with a format like:

    # Comment
    text/x-c++hdr    h++ hpp hxx hh
    text/x-c++src    c++ cpp cxx cc
In other words, each line has the format

    MIME-TYPE EXTENSION-LIST

where the EXTENSION-LIST has no dots and is space separated.
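To make the format concrete, the sketch below writes a small sample file in this format (a hypothetical stand-in for /etc/mime.types, created in a temporary location) and prints one "extension: MIME type" pair per line with awk. It is only an illustration of how the file is read, not part of the conversion itself.

```shell
# Hypothetical sample in the mime.types format: TYPE, whitespace, EXTENSION-LIST.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Comment
text/x-c++hdr   h++ hpp hxx hh
text/x-c++src   c++ cpp cxx cc
image/png       png
EOF

# Skip comments and lines without extensions, then print every
# extension together with the MIME type from the first column.
pairs=$(awk '!/^#/ && NF >= 2 { for (i = 2; i <= NF; i++) print $i ": " $1 }' "$sample")
printf '%s\n' "$pairs"
rm -f "$sample"
```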
For the conversion to work, mime.types must contain at least a MIME type for every kind of binary file in your repository.
Also note that these commands create temporary files (step1, step2, and so on) in the current directory. A trailing backslash means that the next line is really part of the same command (you should still be able to paste the whole command into a shell in one go). I added comments to prevent this from turning into black magic.

    # Find all extensions. Also include filenames without extension.
    # The E (extension) and S (slash) trick is to get GNU sort to separate
    # them, although this is not really necessary. But note that at the
    # same time it removes the leading slash from filenames without extension.
    # The E tag is undone before the S tag so that extensionless names
    # starting with E (for example ERRORS) keep their first letter.
    find $CVSREPOS/$PROJECTNAME -type f -name '*,v' ! -name '.cvsignore,v' | \
        sed -e 's%.*\([./][^.]*\),v$%\1%' -e 's/\./E/' -e 's/\//S/' | \
        sort -u | sed -e 's/^E/./' -e 's/^S//' > step1

    # Compose an extended regular expression that matches any "extension"
    # as found by the previous step.
    EXT1="($(grep '^\.' step1 | xargs echo | sed -e 's/^\.//' -e 's/\+/\\+/g' -e 's/ \./|/g'))"

    # Find all mime-types and related extensions that really exist.
    egrep -i '^[[:alnum:]][^[:space:]]*[[:space:]]+([^[:space:]]+ )*'"$EXT1"'($| )' /etc/mime.types > step2

    # Extract the list of extensions from the previous step,
    # filtering out the extensions that we don't have.
    for ext in $(sed -re 's/^[^[:space:]]*[[:space:]]+//' step2); do echo $ext; done | \
        egrep -i '^'"$EXT1"'$' | sort -u > step3

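As a sanity check of the first step, the following sketch feeds a few made-up RCS paths (hypothetical, not taken from a real repository) through sed and sort expressions like those used to build step1. It should show one line per extension plus one line per extensionless filename.

```shell
# Hypothetical RCS file paths standing in for the output of find.
paths='/repo/libcwd/src/debug.cc,v
/repo/libcwd/include/debug.h,v
/repo/libcwd/doc/logo.png,v
/repo/libcwd/Makefile,v
/repo/libcwd/src/threading.cc,v'

# Reduce each path to ".ext" or "/name", tag it with E or S, sort, then untag.
# LC_ALL=C keeps the sort order independent of the locale.
step1_sample=$(printf '%s\n' "$paths" |
    sed -e 's%.*\([./][^.]*\),v$%\1%' -e 's/\./E/' -e 's/\//S/' |
    LC_ALL=C sort -u |
    sed -e 's/^E/./' -e 's/^S//')
printf '%s\n' "$step1_sample"
```

With this input the two .cc files collapse into a single .cc entry, and Makefile appears without its leading slash.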
step3 now contains a list of all extensions
found in your repository for which we know one or more MIME types.

    # Compose an extended regular expression from the previous step.
    EXT2="($(cat step3 | xargs echo | sed -e 's/ /|/g'))"

    # Find all "extensions" that weren't really extensions
    # (or for which we don't know a MIME type).
    grep '^\.' step1 | egrep -iv '^\.'"$EXT2"'$' > step4

    # And turn it into an extended regular expression.
    EXT3="($(sed -e 's/\./\\\\./' step4 | xargs echo | sed -e 's/ /|/g'))"

    # Create a list of files for which no MIME type is known.
    find $CVSREPOS/$PROJECTNAME -type f -name '*,v' ! -name '.cvsignore,v' | \
        sed -e 's%.*/\([^/]*\),v$%\1%' | egrep -i "$EXT3"'$' | sort -u > step5

At this point, the file step5 contains a list of filenames in your repository for which no MIME type matches their extension. If you see any binary files in there, then you must add their extension to the mime.types file and repeat the steps above. For text files this is not necessary; files without a MIME type are treated as text by SVN.

    # Create a map from extension to MIME type. If a MIME type that starts
    # with 'text' exists, use that - otherwise use application/octet-stream
    # when there is more than one MIME type, or use the single known MIME type.
    for f in $(cat step3); do \
        MIMETYPES=$(egrep -i '[[:space:]]'$f'( |$)' step2 | sed -e 's/[[:space:]].*//'); \
        echo $f: $MIMETYPES; done | \
        sed -e 's%:.* \(text/[^ ]*\).*%: \1%' -e 's%: [^ ].* .*%: application/octet-stream%' > step6

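The effect of the two sed rules that finish step6 is easier to see on a small input. The sketch below applies them to four made-up lines; the extensions xx and yy and the types application/x-foo, application/x-baz and text/x-bar are placeholders invented for this illustration.

```shell
# Hypothetical "extension: MIME types" lines, as the loop would print them
# before the final sed is applied.
candidates='cc: text/x-c++src
png: image/png
xx: application/x-foo text/x-bar
yy: application/x-foo application/x-baz'

# Rule 1: if any text/ type is present, keep only that one.
# Rule 2: if several types remain, fall back to application/octet-stream.
step6_sample=$(printf '%s\n' "$candidates" |
    sed -e 's%:.* \(text/[^ ]*\).*%: \1%' \
        -e 's%: [^ ].* .*%: application/octet-stream%')
printf '%s\n' "$step6_sample"
```

Lines with a single MIME type pass through unchanged, xx ends up as text/x-bar, and yy ends up as application/octet-stream.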
Edit the file step6 if it contains MIME types that you do not agree with.
For example, if you have any .doc files and you used a standard mime.types file, step6 will contain the line doc: application/msword, so those files would be treated as binary. If they aren't actually MS Word files, change that line to doc: text/plain. Likewise, you might have .js files whose MIME type is technically correct but does not start with text/; change that entry to js: text/plain, because you don't want svn to treat those files as binary!
Every MIME type that doesn't start with text/ will be treated as binary.

    # This could be used to turn the (edited) file 'step6' into a mime.types file
    # for use with cvs2svn's '--mime-types=FILE' option, but we WON'T use this.
    sed -e 's/\([^:]*\): \(.*\)/\2 \1/' step6 > mime.types

    # Instead, build a props file for use with the '--auto-props=FILE' option.
    echo "[auto-props]" > propsfile
    sed -e 's%: \(text/.*\)% = svn:eol-style=native;svn:keywords=Author Date Id Revision;svn:mime-type=\1%' \
        -e 's/^/*./' -e 's/: / = svn:mime-type=/' step6 >> propsfile

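To see what the props file will look like, the sketch below runs the same three sed expressions on two sample step6 lines (chosen for illustration) and prints the result to the terminal instead of appending it to propsfile.

```shell
# Two sample lines in the step6 format: one text type, one binary type.
step6_lines='cc: text/x-c++src
png: image/png'

# Text types get eol-style, keywords and mime-type; all others only mime-type.
props_sample=$(printf '%s\n' "$step6_lines" |
    sed -e 's%: \(text/.*\)% = svn:eol-style=native;svn:keywords=Author Date Id Revision;svn:mime-type=\1%' \
        -e 's/^/*./' -e 's/: / = svn:mime-type=/')
printf '[auto-props]\n%s\n' "$props_sample"
```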
You might want to edit this file as well at this point, in particular the list of keywords that will be substituted. The SVN Book has more information on keyword substitution.
If you want the result to resemble as closely as possible what you would have got by adding each file with svn add over time, then you should in fact remove the svn:mime-type properties for text/* MIME types, because those are normally not set.
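Removing the svn:mime-type property for text/* types can be done by hand, or with a sed expression along the lines of the sketch below. The expression is my own suggestion rather than part of the original procedure, and it is shown here on sample content instead of the real propsfile.

```shell
# Hypothetical propsfile content after the previous step.
props='[auto-props]
*.cc = svn:eol-style=native;svn:keywords=Author Date Id Revision;svn:mime-type=text/x-c++src
*.png = svn:mime-type=image/png'

# Drop ";svn:mime-type=text/..." and leave every other property untouched.
stripped=$(printf '%s\n' "$props" | sed -e 's%;svn:mime-type=text/[^;]*%%')
printf '%s\n' "$stripped"
```

To apply the same expression to the real file, one option is to write the output to a new file (for example propsfile.new), compare the two, and then rename it.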
The next command adds the properties for all other files (those for which no MIME type could be found).
Note that the svn:keywords property is set here as well; you might want to change it. We don't set the svn:mime-type property for these files: that has the same effect as marking them text/plain anyway, and svn usually doesn't set a MIME type unless it detects a binary file (in which case it sets the MIME type to application/octet-stream).

    # Finish the generation of propsfile
    sed -re 's/(.*\.([^.]*$))/\2 \1/' -e 's/^([^.]*)$/\1 \1/' step5 | sort | \
        sed -e 's/^[^ ]* //' -e 's%$% = svn:eol-style=native;svn:keywords=Author Date Id Revision%' >> propsfile

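The command above prefixes each filename with a sort key (its extension, or the whole name if there is none), sorts, and strips the key again. The sketch below shows this on three made-up filenames; it uses sed -E, which GNU sed accepts as a synonym for -r, and forces the C locale so the order is predictable.

```shell
# Hypothetical step5 content: files for which no MIME type is known.
names='foo.xyz
Makefile
bar.abc'

# Prefix a sort key, sort, remove the key, then append the text properties.
tail_sample=$(printf '%s\n' "$names" |
    sed -E -e 's/(.*\.([^.]*$))/\2 \1/' -e 's/^([^.]*)$/\1 \1/' |
    LC_ALL=C sort |
    sed -e 's/^[^ ]* //' \
        -e 's%$% = svn:eol-style=native;svn:keywords=Author Date Id Revision%')
printf '%s\n' "$tail_sample"
```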
At this point all results are collected in one file: propsfile.
Edit it and make any changes you think are appropriate. For example, you can delete lines for files that have the same extension and replace them by one line that uses a wildcard.
What is important here is that text files have the appropriate svn:eol-style property and either no svn:mime-type or an svn:mime-type that starts with text/, and that binary files have no svn:eol-style and have an svn:mime-type that does not start with text/. For example:

    *.png = svn:mime-type=image/png
    *.cpp = svn:eol-style=native;svn:keywords=Author Date Id Revision

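After hand-editing, a quick consistency check can catch lines that break this rule. The awk sketch below is my own addition, not part of the original procedure; it runs on sample content in which the *.doc line is deliberately wrong.

```shell
# Hypothetical propsfile content; the *.doc line violates the rule on purpose.
props='[auto-props]
*.png = svn:mime-type=image/png
*.cpp = svn:eol-style=native;svn:keywords=Author Date Id Revision
*.doc = svn:eol-style=native;svn:mime-type=application/msword'

# Flag lines that combine svn:eol-style with a non-text svn:mime-type,
# and lines that have a text/ MIME type but no svn:eol-style.
problems=$(printf '%s\n' "$props" | awk '
    /svn:mime-type=/ {
        is_text = ($0 ~ /svn:mime-type=text\//)
        has_eol = ($0 ~ /svn:eol-style=/)
        if (has_eol && !is_text) print "binary type with eol-style: " $0
        if (!has_eol && is_text) print "text type without eol-style: " $0
    }')
printf '%s\n' "$problems"
```

To check the real file, the same awk program can be given propsfile as its input instead of the sample text.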
Now we are ready to create the SVN dumpfile with cvs2svn.

    cvs2svn --dumpfile=svndump-$PROJECTNAME --keywords-off --no-default-eol \
        --auto-props-ignore-case --auto-props=propsfile $CVSREPOS/$PROJECTNAME

Next you can use svndumpfilter to clean up your dumpfile a bit. For example, here is how I dropped several tags and a branch:

    cat svndump-$PROJECTNAME | svndumpfilter exclude tags/stable_head tags/gdbbug00 tags/gdbbug01 \
        tags/gdbbug02 branches/branch-threading > svndump

You can get a list of all tags and branches by grepping the dumpfile:

    egrep -a '^Node-path: /?(tags|branches)/' svndump-$PROJECTNAME | \
        sed -re 's%^Node-path: ((tags|branches)/[^/]*).*%\1%' | sort -u

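The sketch below shows what that pipeline extracts, using a handful of made-up Node-path lines in place of a real dumpfile. The -a option is left out here because the sample is plain text; a real dumpfile contains binary data, which is why the command above needs it.

```shell
# Hypothetical Node-path lines as they might appear in a dumpfile.
dump_lines='Node-path: tags/stable_head/src/debug.cc
Node-path: branches/branch-threading
Node-path: trunk/src/debug.cc
Node-path: tags/stable_head/README
Node-path: tags/gdbbug00'

# Keep only tag and branch paths, cut each down to its first two components,
# and remove duplicates.
refs=$(printf '%s\n' "$dump_lines" |
    grep -E '^Node-path: /?(tags|branches)/' |
    sed -E -e 's%^Node-path: ((tags|branches)/[^/]*).*%\1%' |
    LC_ALL=C sort -u)
printf '%s\n' "$refs"
```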
Uploading the dumpfile to SourceForge is described here (don't forget to rename it to svndump before compressing and uploading it).
Creating a new repository on your local hard disk from a dumpfile is described here.
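For the local case, a minimal sketch using the standard svnadmin create and svnadmin load subcommands could look like this. The repository location (svnrepos under the current directory) and the REPO_DIR, DUMPFILE and LOAD_STATUS variable names are assumptions made for this illustration; the guard simply skips the load when svnadmin or the dumpfile is missing.

```shell
# Hypothetical locations; override them in the environment if needed.
REPO_DIR="${REPO_DIR:-$PWD/svnrepos}"
DUMPFILE="${DUMPFILE:-svndump}"

if command -v svnadmin >/dev/null 2>&1 && [ -f "$DUMPFILE" ]; then
    # Create an empty repository and replay the dumpfile into it.
    svnadmin create "$REPO_DIR"
    svnadmin load "$REPO_DIR" < "$DUMPFILE"
    LOAD_STATUS="loaded"
else
    echo "svnadmin or $DUMPFILE not found; nothing loaded" >&2
    LOAD_STATUS="skipped"
fi
echo "status: $LOAD_STATUS"
```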