
2018-04-24

Export Go Packages via 'go get' From Your Own Server

Self-Hosting Go Packages With Support For go get


[NOTE: Since originally posting I've clued in that what's documented here is only one way of achieving what's commonly referred to as 'vanity URLs' or 'vanity imports'. Adding this note here just to help anyone searching find this post more easily. -R.]

Go has a really neat package import tool, go get, to fetch packages from upstream sources into one's own local $GOPATH package tree. The 'big' sites like github.com, gitlab.com and others support use of go get from their project hosting spaces, which is cool, but they may charge extra for hosting private code repos, or for having more than a small fixed number of contributors, or impose other annoying limitations. Understandably these sites need some way to monetize their cloud offerings, but for individuals or those with their own infrastructure there should be other ways that don't depend on the 'cloud' (ie., someone else's servers).

While the collaborative aspects of these sites and web-based features are their main draw (encouraging public pull requests for distributed development), perhaps you or your company want the convenience of using go get for your own repositories, but don't want to entrust your code repositories to one of these external entities.

Note: If you're considering moving off of github and self-hosting your repos, consider Gogs.io. It's really easy to set up and feels very familiar if you're used to github. Also, see my other post for notes on how to let Gogs.io refer to your legacy repos whilst preserving traditional access to your old repos in their original locations. 

The go get command and its import mechanism are described in the go command documentation, but to be frank, the docs for the go import mechanism aren't too clear on exactly how to set up one's own server to support it. One can't just go get a repo that is available via git clone without a lot of setup first.

Basic requirements:

  • Proper DNS 'A' record info for your package server
  • A common webserver (eg., Apache v2 is used here but others are supported)
  • HTTPS enabled (ie., a properly-configured, authority-signed server cert -- sorry, self-signed won't work)
  • The git-http-backend helper tool (included with most git distributions)
  • Properly configured web server rewrite rules for calling git-http-backend when requests from go get are seen by your server


All these bits need to be set up 'just so' for the go get command to work smoothly, and the go docs don't really spell out the full setup, probably due to the myriad platforms and web servers out there.

I'll show my setup here, which isn't the most common but should adapt easily to other systems: Funtoo Linux + Apache v2. With some path adjustments this should apply to Ubuntu and other popular Linux distros.

I pieced together this tutorial from the following sources:

https://askjong.com/howto/automatically-enable-https-on-your-website-with-effs-certbot-deploying-lets-encrypt-certificates
https://kasunh.wordpress.com/2011/01/15/git-over-https/
https://www.creang.com/howtoforge/howto_set_up_git_over_https_with_apache_on_ubuntu/

I also studied the verbose output of go get -d -v to see just what the command was assuming when it tried to fetch things.

Basic Theory of 'go get'


The go get command works over SSH, HTTP or HTTPS, though it refuses to use plain HTTP unless one specifies the -insecure flag. This means generally you'll want to get your server's HTTPS cert setup working to avoid having to specify this every time, and, in the case of private repositories, to protect your proprietary source code from travelling over the open internet whenever go get is run.

The tool requests the import path as a URL (with a '?go-get=1' param added) and looks in the response for a special <meta name="go-import"> tag, which says which version control system to use and where the repository really lives; here that is a URL handled by the git-http-backend tool. In this way, one can store the actual repositories nearly anywhere on the system and move them around, without breaking the package URI published to users.

go get can fetch the packages named in each <meta> tag via either the ssh:// or https:// protocols. The ssh:// protocol will require a shell account on the hosting server for each of your contributors -- they'll be prompted for their password (or key passphrase) before go get will proceed to pull anything. This is good for private groups wishing to share both read (pull) and commit (push) access. For public repos or projects where you want team members to submit patches via other means like email or an external review tool, the https:// method is appropriate -- however it will require a web server with a valid authority-signed cert to allow HTTPS.
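To make the <meta> lookup concrete, here is a minimal Go sketch of how such a tag can be pulled out of a page and split into its three fields (import prefix, version control system, repository URL). The names goImport and parseGoImports are made up for this illustration and are not the go tool's own API; the sample page is the 'foo' example used later in this post.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// goImport holds the three fields of a go-import <meta> tag's content attribute.
// (Hypothetical type, for illustration only.)
type goImport struct {
	Prefix, VCS, RepoRoot string
}

// parseGoImports scans a page for <meta name="go-import"> tags and returns
// the well-formed ones, i.e. those whose content has exactly three fields.
func parseGoImports(page string) []goImport {
	d := xml.NewDecoder(strings.NewReader(page))
	d.Strict = false // tolerate HTML such as unclosed <meta> tags
	var found []goImport
	for {
		tok, err := d.RawToken()
		if err != nil {
			return found // end of input or a parse error: return what we have
		}
		el, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(el.Name.Local, "meta") {
			continue
		}
		var name, content string
		for _, a := range el.Attr {
			switch strings.ToLower(a.Name.Local) {
			case "name":
				name = a.Value
			case "content":
				content = a.Value
			}
		}
		if f := strings.Fields(content); name == "go-import" && len(f) == 3 {
			found = append(found, goImport{f[0], f[1], f[2]})
		}
	}
}

func main() {
	page := `<meta name="go-import" content="example.com/go/foo git https://example.com/git/foo">`
	for _, gi := range parseGoImports(page) {
		fmt.Printf("prefix=%s vcs=%s repo=%s\n", gi.Prefix, gi.VCS, gi.RepoRoot)
	}
}
```

The first field is the import path prefix users type, the second names the version control tool, and the third is the URL that tool is pointed at.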

Proper DNS 'A' record setup


You'll need to ensure your domain allows proper HTTP/HTTPS access with the bare domain (ie., requests to foo.com should be answered, typically by redirecting to www.foo.com). go get and package imports in go source code contact whatever host name appears in the import path, and by convention that is the bare domain rather than a host.domain form, eg. the Go source import statement

import "example.org/go/mylib"

... implies one has previously performed

$ go get example.org/go/mylib

... which expects the server at example.org to resolve web requests with no host prefix. If you serve regular web content from the same server, you'll probably already have an 'A' record for www.example.org, but go get will require an 'A' record also for plain example.org. While you're doing this you might as well add a permanent redirect from example.org to www.example.org if you don't already have it.

Check your DNS configuration (if you control it yourself) or ask your admin to ensure there's an 'A' record for example.org which maps to the same IP address as www.example.org. Sometimes this is named the '@' entry.
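A quick way to confirm the DNS side is to resolve both names and compare the results. The following is a rough Go sketch, assuming the www host is simply 'www.' plus the bare domain; the program name dnscheck and the helper sameAddrs are invented for this example. Note that net.LookupHost returns IPv6 addresses too, if the domain has any.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"sort"
)

// sameAddrs reports whether two address lists hold the same addresses,
// ignoring order. (Hypothetical helper, for illustration only.)
func sameAddrs(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: dnscheck example.org")
		return
	}
	domain := os.Args[1]
	bare, err := net.LookupHost(domain)
	if err != nil {
		fmt.Println("bare domain does not resolve:", err)
		return
	}
	www, err := net.LookupHost("www." + domain)
	if err != nil {
		fmt.Println("www host does not resolve:", err)
		return
	}
	fmt.Println("bare:", bare)
	fmt.Println("www: ", www)
	fmt.Println("same addresses:", sameAddrs(bare, www))
}
```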

Apache modules required: mod_rewrite, mod_cgi, mod_alias, mod_env


The web server needs to do some URL rewriting and CGI operations in order to send go get requests to git-http-backend (ie., fetching git repos with the http:// or https:// prefix). For this you'll need to ensure the following Apache modules are enabled: mod_rewrite, mod_cgi, mod_alias, mod_env.

Enable the above modules by adding LoadModule directives in whatever manner your server expects, eg., /etc/apache2/httpd.conf; then add the following .htaccess rule to your web root (mine, using apache2, is in /var/www/localhost/htdocs/.htaccess):

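RewriteEngine on
RewriteBase /
# Redirect any host lacking the www. prefix, keeping the original scheme
# (REQUEST_SCHEME needs Apache 2.4) so HTTPS requests aren't sent to plain HTTP
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ %{REQUEST_SCHEME}://www.%{HTTP_HOST}/$1 [R=301,L]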

This rule rewrites requests of the form

http://example.org/foo

to

http://www.example.org/foo


Configuring RewriteRule to allow proper <meta> tags per-repo


Now you need to somehow let Apache distinguish regular web traffic from 'go get' queries, which implicitly look for files served with a <meta> tag that is unique per package.

I experimented for a while without success, adding multiple <meta> tags, one for each repo, to my webroot's index.html <head> section. The catch is that 'go get' fetches the page at the import path itself (eg. https://example.org/go/foo?go-get=1), not the webroot index.html, and it only uses the go-import <meta> tag whose prefix matches the requested path. The simplest arrangement I found is for each exported go package to have its own file with its own <meta> tag.

The solution is not to put <meta> tags into the webroot index.html at all, but rather to use another mod_rewrite rule to distinguish 'go get' requests by the repo name and point them to a unique URL for each. These URLs should reside within a subfolder of the webroot.

Add this line to the .htaccess file in your webroot (the same file as the www redirect rule above, mine was /var/www/localhost/htdocs/.htaccess):

RewriteRule ^go/(.*)$ pkg/$1 [QSA]

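The [QSA] flag stands for 'Query String Append': it keeps any CGI-style GET params from the original URL and appends them to the rewritten URL. As far as I can tell Apache already passes the query string through unchanged when the rewritten URL has none of its own, so the flag may not be strictly needed here, but it makes explicit that the '?go-get=1' param sent by 'go get' for its own purposes is preserved.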
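To see what the rule does to a request, here is a small Go sketch that mimics it with the same regular expression. It is only a model of the mapping, not how Apache implements it; the function name rewrite is made up, and the leading slash is omitted from the input because mod_rewrite in .htaccess context matches against the path relative to the directory.

```go
package main

import (
	"fmt"
	"regexp"
)

// goRule is the pattern from the .htaccess rule above.
var goRule = regexp.MustCompile(`^go/(.*)$`)

// rewrite models `RewriteRule ^go/(.*)$ pkg/$1 [QSA]`: paths under go/ are
// mapped to pkg/, other paths are left alone, and any query string is kept.
// (Hypothetical helper, for illustration only.)
func rewrite(relPath, query string) string {
	out := goRule.ReplaceAllString(relPath, "pkg/$1")
	if query != "" {
		out += "?" + query
	}
	return out
}

func main() {
	fmt.Println(rewrite("go/foo", "go-get=1")) // pkg/foo?go-get=1
	fmt.Println(rewrite("index.html", ""))     // index.html
}
```

So a 'go get example.com/go/foo' request ends up reading the file pkg/foo in the webroot, while ordinary pages are untouched.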

Now, with the above rule, let's say you have a file structure like this in your webroot:

/var/www/localhost/htdocs/pkg/
/var/www/localhost/htdocs/pkg/foo
/var/www/localhost/htdocs/pkg/bar
/var/www/localhost/htdocs/pkg/private-baz

... and git repositories served by your git-daemon in /var/git/foo.git, /var/git/bar.git, and /var/git/private-baz.git. You can set up files in the webroot that contain <meta> tags pointing to each:

[/var/www/localhost/htdocs/pkg/foo]
<meta name="go-import" content="example.com/go/foo git https://example.com/git/foo">

[/var/www/localhost/htdocs/pkg/bar]
<meta name="go-import" content="example.com/go/bar git https://example.com/git/bar">

[/var/www/localhost/htdocs/pkg/private-baz]
<meta name="go-import" content="example.com/go/private-baz git ssh://example.com/var/git/private-baz">

The files themselves don't need to be html files. They can be text files with just the <meta> tag.
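With more than a handful of packages, writing these one-line files by hand gets tedious. Below is a minimal Go sketch that prints the tag for each package from a small table; the helper metaTag and the table layout are invented for this example, and the values are just the three sample repos from above. Redirect each tag into the matching file under pkg/ as needed.

```go
package main

import "fmt"

// metaTag returns the go-import <meta> line for one git-hosted package.
// importPrefix is the path users pass to go get; repoURL is where git
// actually fetches from. (Hypothetical helper, for illustration only.)
func metaTag(importPrefix, repoURL string) string {
	return fmt.Sprintf(`<meta name="go-import" content="%s git %s">`, importPrefix, repoURL)
}

func main() {
	// Sample data taken from the examples in this post.
	pkgs := []struct{ file, prefix, repo string }{
		{"pkg/foo", "example.com/go/foo", "https://example.com/git/foo"},
		{"pkg/bar", "example.com/go/bar", "https://example.com/git/bar"},
		{"pkg/private-baz", "example.com/go/private-baz", "ssh://example.com/var/git/private-baz"},
	}
	for _, p := range pkgs {
		fmt.Printf("[%s]\n%s\n\n", p.file, metaTag(p.prefix, p.repo))
	}
}
```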

NOTE 1: In the examples above, each go get exported repo is within a go/ subdirectory. This is required to give the apache2 server a pathname root to 'hook onto' for its RewriteRule, otherwise there's no way to tell other requests within your web server's URI space apart from ones specifically meant for go get. The sub-directory doesn't need to be named 'go', it could be anything; just as github places repos under your username, eg. github.com/ThisUser/that-repo.

NOTE 2: make sure your git-daemon has --export-all, or a file named git-daemon-export-ok in each public git repo. Test with regular git clone commands to verify each is fetchable before trying to use go get with <meta> tags. As far as I can tell, the git-daemon-export-ok file only matters to git-daemon and git-http-backend: repositories named with ssh:// in the <meta> tag are governed by ordinary ssh logins and file permissions, whilst ones exported via https:// listen to the Apache SetEnv statements (see below) which set the export permissions, since they're being served via the git-http-backend helper rather than via ssh.

More on Public vs. Private Package Repos

If you have some private packages that are not yet ready for the public eye, make note of the above example: the repo named 'private-baz' was exported in the <meta> tag via ssh://, not https://, so it will ask for authentication via ssh (password, passphrase or key).

Exporting via <meta> tags, but using ssh:// in the git repo URI, doesn't strictly require your webserver to have HTTPS set up. Without HTTPS, however, 'go get' needs the -insecure flag before it will even fetch the <meta> redirection info. That's annoying enough that it's worth going full HTTPS on your webserver even if you're not publishing anonymous read (pull) go packages.

Finally, note the ssh:// URI for git repos usually has a slightly different path than git:// or https:// read-only URIs (note the /var/git/ path component in the third private-baz repo).

You can even serve out multiple users' repos via 'go get' this way, since the ssh:// (git+ssh) URI syntax lets a host whose git-daemon is otherwise configured to serve public repos from /var/git (or wherever) also serve out individual users' private repos from their home dirs. For example I have public repos in my /var/git/ and private repos in ~user/git/, and both can be served to the 'go get' command via appropriate <meta> tags defined as above, with private ones doing authentication as expected.

git-http-backend Setup


In your main apache2 config (eg., httpd.conf or similar) add this:

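# Tell git-http-backend where the repos live and to export all of them
SetEnv GIT_PROJECT_ROOT /var/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

# Flag push requests (git-receive-pack) via the AUTHREQUIRED env variable
RewriteCond %{QUERY_STRING} service=git-receive-pack
#[OR]
#RewriteCond %{REQUEST_URI} /git-receive-pack$
RewriteRule ^/git/ - [E=AUTHREQUIRED:yes]
<LocationMatch "^/git/">
  #apache 1.x# Deny from env=AUTHREQUIRED

  AuthType Basic
  AuthName "Git Access"
  # Let everyone through; git-http-backend itself refuses pushes from
  # unauthenticated clients by default
  Require all granted
  #apache 1.x# Require group committers
  #apache 1.x# Satisfy Any
</LocationMatch>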
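Before involving go get at all, it's worth checking that git-http-backend answers. A git client fetching over 'smart' HTTP first requests info/refs with a service param, and the backend should reply with a git-specific content type. Here is a rough Go sketch of that probe; the program name gitcheck and the helper infoRefsURL are invented for this example, and the repo URL you pass in is whatever you put in your <meta> tag.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// wantType is the content type a smart HTTP server sends for a fetch probe.
const wantType = "application/x-git-upload-pack-advertisement"

// infoRefsURL builds the first URL a git client requests when fetching
// repoURL over smart HTTP. (Hypothetical helper, for illustration only.)
func infoRefsURL(repoURL string) string {
	return strings.TrimSuffix(repoURL, "/") + "/info/refs?service=git-upload-pack"
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: gitcheck https://example.com/git/foo")
		return
	}
	resp, err := http.Get(infoRefsURL(os.Args[1]))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
	fmt.Println("smart HTTP reply:", resp.Header.Get("Content-Type") == wantType)
}
```

A 200 status with 'smart HTTP reply: true' suggests the ScriptAlias and SetEnv lines are doing their job; a 404 or an HTML error page usually points at a path or module problem.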


LetsEncrypt


Now, after all of the above, I discovered go get refuses to import packages from a server with a self-signed cert! What a pain.

If you don't already have HTTPS with a certificate-authority signed cert on your server, you'll need to get one. Either consult your business IT department for the server hosting all of this, or set up EFF's certbot utility. Thankfully the EFF has made it relatively easy for regular people to get a free certificate with valid signing for personal servers.
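To check whether a server's certificate would pass the kind of verification go get insists on, one can open a TLS connection using Go's default settings, which validate the chain against the system's trusted roots and check the host name. This is a minimal sketch; the program name certcheck and the helper httpsAddr are invented for this example.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"os"
)

// httpsAddr returns host with the default HTTPS port appended.
// (Hypothetical helper, for illustration only.)
func httpsAddr(host string) string {
	return net.JoinHostPort(host, "443")
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: certcheck example.com")
		return
	}
	// A nil config means default verification: a self-signed cert fails here.
	conn, err := tls.Dial("tcp", httpsAddr(os.Args[1]), nil)
	if err != nil {
		fmt.Println("connection or certificate check failed:", err)
		return
	}
	defer conn.Close()
	cert := conn.ConnectionState().PeerCertificates[0]
	fmt.Println("verified; issuer:", cert.Issuer.CommonName)
	fmt.Println("expires:", cert.NotAfter)
}
```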

On Gentoo or Funtoo, the steps to install a LetsEncrypt cert are (as root):

# emerge app-crypt/certbot app-crypt/certbot-apache
#
# certbot certonly --webroot -w /var/www/localhost/htdocs/ -d example.com -w /var/www/localhost/htdocs/ -d www.example.com

Now, verify the Apache configuration from all previous steps and restart the web server:

# apache2ctl configtest
# rc-config restart apache2

Now test out your fancy go get-able package server!

[from some other host or account]
$ go get example.com/go/foo
$ ls $GOPATH/src/example.com/go/foo
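If go get fails, it can help to look at exactly what the server returns for the discovery request, since that is the page the <meta> tag must appear in. Here is a small Go sketch that fetches it; the program name metacheck and the helper goGetURL are invented for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// goGetURL returns the discovery URL for an import path: the path itself
// over HTTPS with the go-get=1 query param added.
// (Hypothetical helper, for illustration only.)
func goGetURL(importPath string) string {
	return "https://" + importPath + "?go-get=1"
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: metacheck example.com/go/foo")
		return
	}
	resp, err := http.Get(goGetURL(os.Args[1]))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	// Read at most 64 KiB; the <meta> file should be tiny.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 64<<10))
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("status:", resp.Status)
	fmt.Printf("%s\n", body)
}
```

The body printed should be the one-line <meta> file for that package; if you see your site's home page or a 404 instead, the go/ rewrite rule isn't matching.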

This is the minimum setup just to get HTTPS working with Apache v2 for your primary domain, to make go get happy. If you have multiple 'vhost' domains or other complex requirements, you're on your own. I'm still trying to get my server to serve full HTTPS for all of the domains it hosts.

Conclusion

While the go get command is the preferred way for golang programmers to fetch external packages into their working $GOPATH tree, the documentation is not extremely helpful in setting up all of the server-side bits that are required to support it. Individuals or organizations may want a mixture of public (read-only) as well as private/group read/write (pull/push) repos exported via go get without the risks or costs associated with hosting via an external party.

A self-hosted golang package server supporting the standard go get command can be implemented by configuring a webserver with proper type 'A' domain records, HTTPS plus a valid authority-signed certificate, proper git-http-backend tool configuration, URL rewrite rules and package export <meta> tags placed within the webroot on a per-package basis.
