Gitea/Dump
From charlesreid1
Via Gitea documentation: https://docs.gitea.io/en-us/command-line/
What Dump is For
Running the gitea dump command dumps the entire contents of Gitea's internal database and repositories to disk and zips them up.
Given the way I have described installing Gitea, it is important to run this command as the correct user, in this case the user git:
chmod 777 /temp/
cd /temp/
sudo -H -u git /www/gitea/bin/gitea dump --verbose
This will begin the backup process, which will take a few minutes.
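If the dump is going to be scripted rather than typed by hand, the same command can be wrapped in a few lines of Python. This is only a minimal sketch: the function names build_dump_command and run_dump are made up for illustration, and the defaults simply repeat the binary path, user, and working directory used on this page.

```python
import subprocess


def build_dump_command(gitea_bin="/www/gitea/bin/gitea", user="git", verbose=True):
    """Return the argv list for running `gitea dump` as another user via sudo."""
    cmd = ["sudo", "-H", "-u", user, gitea_bin, "dump"]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_dump(workdir="/temp/", **kwargs):
    """Run the dump from workdir (assumed to be writable by the chosen user).

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    return subprocess.run(build_dump_command(**kwargs), cwd=workdir, check=True)
```

Calling run_dump() should be equivalent to the cd and sudo lines above; the chmod step is left out on purpose and still has to be handled separately.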
Format of Output
Notes on formatting of output:
- The gitea dump command writes out the log files and a zip file containing all of the repositories.
The directory structure is as follows:
- The repositories folder contains all repositories.
- Within the repositories folder, there is a folder for each user and organization.
- Inside each user or organization folder, there is one folder for each repository that user owns.
- The folders are called <reponame>.git, and their contents are the same as the contents of the .git folder in that repo.
Just as the .git folder normally stores the entire history and log of a repository, the zip file of repositories can be used to recover all historical information about every repository in Gitea. Most git utilities are designed to work with .git directories with arbitrary names in arbitrary locations, which works to our advantage.
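To make the layout concrete, here is a rough Python sketch that walks an unzipped repositories folder and lists every <reponame>.git directory it finds, grouped by owner. The function names and the "%H %ai %s" format string are my own choices for illustration; the only git feature relied on is the --git-dir option, which points git at a .git directory in an arbitrary location.

```python
from pathlib import Path


def list_dumped_repos(repos_root):
    """Return sorted (owner, repo_name, git_dir) tuples found under repos_root."""
    found = []
    for owner_dir in Path(repos_root).iterdir():
        if not owner_dir.is_dir():
            continue  # ignore stray files next to the owner folders
        for git_dir in owner_dir.glob("*.git"):
            if git_dir.is_dir():
                # Strip the trailing ".git" to recover the repository name.
                repo_name = git_dir.name[: -len(".git")]
                found.append((owner_dir.name, repo_name, git_dir))
    return sorted(found)


def git_log_command(git_dir, fmt="%H %ai %s"):
    """Build a git log command that reads history straight from git_dir."""
    return ["git", f"--git-dir={git_dir}", "log", f"--pretty=format:{fmt}"]
```

Each command returned by git_log_command can be handed to subprocess.run to pull the log for one repository without checking anything out.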
Location of Output and -t Flag
Let's talk about how to control where the dump file goes.
The gitea dump command has a -t flag, which is described as:
--tempdir value, -t value Temporary dir path (default: "/tmp")
Let's clear up what this means for the gitea dump process.
Gitea Dump Process
The dump process is as follows:
- Create a gitea temporary directory to dump things in
- Zip up the gitea temporary directory
- Move the zip file to the directory where the gitea dump command was run
- Leave the temporary gitea directory in place
The gitea temporary directory is created inside the temporary directory specified by the -t flag, which, as mentioned, is /tmp by default. It is named gitea-dump-<random number> (e.g., gitea-dump-110153823).
The contents of the dump directory are the gitea database dump and all the repositories: gitea-db.sql, gitea-repo.zip
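Since the temporary directory is left in place, it can be handy to locate it after the fact and confirm that both files made it. The following is a small sketch, not anything Gitea provides: find_dump_dirs and missing_files are hypothetical helper names, and the expected file names are the two listed above.

```python
from pathlib import Path

# File names taken from the description of the dump directory above.
EXPECTED_FILES = ("gitea-db.sql", "gitea-repo.zip")


def find_dump_dirs(tempdir="/tmp"):
    """Return gitea-dump-* directories under tempdir, most recently modified first."""
    dirs = [p for p in Path(tempdir).glob("gitea-dump-*") if p.is_dir()]
    return sorted(dirs, key=lambda p: p.stat().st_mtime, reverse=True)


def missing_files(dump_dir):
    """Return the expected dump files that are absent from dump_dir."""
    return [name for name in EXPECTED_FILES if not (Path(dump_dir) / name).exists()]
```

If a different directory was passed with -t, pass the same path as tempdir; an empty list from missing_files means both the database dump and the repository zip are present.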
Extracting Git Log Info
Our goal was to extract log information from each repository, specifically to create a count of commits and the date and time of each one so that those could finally be assembled into a D3 calendar. To do that, the git log command was run from each repository directory, and the output processed to assemble CSV data about git commits.
Formatting git log using preconfigured format
Git already provides several formats for dumping out the logs:
- oneline (commit and title only)
- short (commit, author, title)
- medium (commit, author, date, title, commit message)
- full (commit, author, committer, title, commit message)
- email (from sha, from author, from date, subject, commit message)
Alternatively, you can customize the output format exactly by using output strings to control what text goes where. See below for examples.
Formatting git log using string
To use a custom string to format the output of git log, pass a quoted format string to --pretty:
git log --pretty=format:'%H %ai %s'
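With the format '%H %ai %s', each line of output is a commit hash, an ISO-like date, time, and timezone offset, and then the subject. Here is a rough sketch of turning that text into the per-day commit counts a calendar needs. The function names and CSV column names are my own, and the sample log in the usage note is invented data, not output from a real repository.

```python
import csv
import io
from collections import Counter


def parse_log_lines(text):
    """Parse lines of the form '<hash> <date> <time> <tz> <subject>' into dicts."""
    commits = []
    for line in text.splitlines():
        parts = line.split(" ", 4)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        subject = parts[4] if len(parts) == 5 else ""
        commits.append({
            "hash": parts[0],
            "date": parts[1],
            "time": parts[2],
            "tz": parts[3],
            "subject": subject,
        })
    return commits


def commits_per_day_csv(commits):
    """Return CSV text with one 'date,commits' row per day, sorted by date."""
    counts = Counter(c["date"] for c in commits)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "commits"])
    for day in sorted(counts):
        writer.writerow([day, counts[day]])
    return out.getvalue()
```

For example, with a made-up three-commit log:

```python
sample = (
    "aaa1 2018-03-04 12:34:56 -0800 Fix typo\n"
    "bbb2 2018-03-04 09:00:00 -0800 Add page\n"
    "ccc3 2018-03-05 08:15:00 -0800 Update notes"
)
print(commits_per_day_csv(parse_log_lines(sample)))
# date,commits
# 2018-03-04,2
# 2018-03-05,1
```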
Formatting git log as JSON
There are many, many formatting strings to choose from, and you can get very fancy with the output. For example, this technique uses the pretty format to print each git commit as a JSON object in curly brackets {}, and then wraps the resulting output in square brackets [] using perl.
# Print each commit as a JSON object, wrap the list in [ ], and drop the trailing comma.
git log \
    --pretty=format:'{%n  "commit": "%H",%n  "author": "%aN <%aE>",%n  "date": "%ad",%n  "message": "%f"%n},' \
    "$@" | \
    perl -pe 'BEGIN{print "["}; END{print "]\n"}' | \
    perl -pe 's/},]/}]/'
Via https://gist.github.com/textarcana/1306223
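One caveat with building JSON by hand in a format string: git does not escape the fields, so an author name containing a double quote or backslash would produce invalid JSON. A possible alternative, sketched here and not taken from the gist, is to have git separate fields with control characters (the %x1f and %x1e placeholders emit raw bytes) and let Python's json module do the quoting. The constant and function names are mine, and this sketch uses the raw subject %s in place of the sanitized %f.

```python
import json

FIELD_SEP = "\x1f"   # ASCII unit separator, placed between fields
RECORD_SEP = "\x1e"  # ASCII record separator, placed after each commit

# Intended for: git log --pretty=format:<GIT_FORMAT>
GIT_FORMAT = "%H%x1f%aN <%aE>%x1f%ad%x1f%s%x1e"


def log_to_json(raw):
    """Convert git log output produced with GIT_FORMAT into a JSON array string."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")  # git puts a newline between commits
        if not record:
            continue
        commit, author, date, message = record.split(FIELD_SEP, 3)
        commits.append({
            "commit": commit,
            "author": author,
            "date": date,
            "message": message,
        })
    return json.dumps(commits, indent=2)
```

The raw input would come from something like subprocess.run(["git", "log", "--pretty=format:" + GIT_FORMAT], capture_output=True, text=True).stdout, run inside (or pointed at) the repository of interest.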