Are you a cat owner who has ever used catnip? Please take a short survey.

created: 20 Mar 2011; modified: 03 Jun 2014; status: finished; belief: highly likely

This Haskell tutorial was written in early March 2011, and while the below code worked then, it may not work with Github or the necessary Haskell libraries now. If you are interested in downloading from GitHub I suggest looking into the GitHub API or the Haskell github library.

Along the lines of Archiving URLs, I like to keep copies of Haskell-related source-code repositories1 because the files & history might come in handy on occasion, and because having a large collection of repositories lets me search them for random purposes.

(For example, part of my lobbying for Control.Monad.void2 was based on producing a list of dozens of source files which rewrote that particular idiom, and I have been able to usefully comment & judge based on crude statistics gathered by grepping3 through my hundreds of repositories.)

Previously I wrote a simple script to download the repositories of the source repository hosting site Patch-tag.com. Patch-tag specializes in hosting Darcs repositories (usually Haskell-related). GitHub is a much larger & more popular hosting site, and though it does not support Darcs but git (as the name indicates), it is so popular that it still hosts a great deal of Haskell. I’ve downloaded a few repositories out of curiosity or because I was working on the contents of the repository (eg. gitit), but there are too many to download manually. I needed a script.

Patch-tag was nice enough to supply a URL which provided exactly the URLs I needed, but I couldn’t expect such personalized support from GitHub. GitHub does supply an API of sorts for developers and hobbyists, said API provides no obvious way to get what I want: ‘URLs for all Haskell-related repos’. So, scraping it is - I’d write a script to munge some GitHub HTML and get the URLs I want that way.

# Archiving GitHub

## Parsing pages

The closest I can get to a target URL is https://github.com/languages/Haskell/created. We’ll be parsing that. The first thing to do is to steal TagSoup code from my previous scrapers, so our very crudest version looks like this:

import Text.HTML.TagSoup

main = do html <- openURL "https://github.com/languages/Haskell/created"
linkify l = [x | TagOpen "a" atts <- parseTags l, (_,x) <- atts]

We run it and it throws an exception! *** Exception: getAddrInfo: does not exist (No address associated with hostname)

Oops. We got all wrapped up in parsing the HTML we forgot to make sure that downloading worked in the first place. Well, we’re lazy programmers, so now, on demand, we’ll investigate that problem. The exception thrown sounds like a problem with the openURL call - ‘hostname’ is a networking term, not a parsing or printing term. So we try running just openURL "https://github.com/languages/Haskell/created" - same error. Not helpful.

We try a different implementation of openURL, mentioned in the other scraping script:

import Network.HTTP (getRequest, simpleHTTP)
openURL = simpleHTTP . getRequest

Calling that again, we see:

> openURL "https://github.com/languages/Haskell/created"
Right HTTP/1.1 301 Moved Permanently
Server: nginx/0.7.67
Date: Tue, 15 Mar 2011 22:54:31 GMT
Content-Type: text/html
Content-Length: 185
Connection: close
Location: https://github.com/languages/Haskell/created

Oh dear. It seems that the HTTP package just won’t handle HTTPS; nor does the description mention HTTPS nor any of the module names seem connected. Best to give up entirely on it.

If we google ‘Haskell https’, one of the first 20 hits happens to be a Stack Overflow question/page which sounds promising: “Haskell Network.Browser HTTPS Connection”. The one answer says to simply use the Haskell binding to curl. Well, fine. I already had that installed because Darcs uses the binding for downloads. I’ll use that package.4

We go to the top level module hoping for an easy download. Scrolling down, one’s eye is caught by a curlGetString, which while not necessarily a promising name, does have an interesting type: URLString -> [CurlOption] -> IO (CurlCode, String).

Note especially the return value - from past experience with the HTML package, one would give a good chance that the URLString is just a type synonym for a URL string and the String return just the HTML source we want. What CurlOption might be, I have no idea, but let’s try simply omitting them all. So we load the module in GHCi (:module + Network.Curl) and see what curlGetString "https://github.com/languages/Haskell/created" [] does:

(CurlOK,"<!DOCTYPE html>
<html>
<meta charset='utf-8'>
<meta http-equiv=\"X-UA-Compatible\" content=\"chrome=1\">
<title>Recently Created Haskell Repositories - GitHub</title>
href=\"/opensearch.xml\" title=\"GitHub\" />
...")

Great! As they say, ‘try the simplest possible thing that could possibly work’, and this seems to. We don’t really care about the exit code, since this is a hacky script5; we’ll throw it away and only keep the second part of the tuple with the usual snd. It’s in IO so we need to use liftM or fmap before we can apply snd. Combined with our previous Tagsoup code, we get:

import Text.HTML.TagSoup
import Network.Curl (curlGetString, URLString)

main :: IO ()
main = do html <- openURL "https://github.com/languages/Haskell/created"

openURL :: URLString -> IO String
openURL target = fmap snd $curlGetString target [] linkify :: String -> [String] linkify l = [x | TagOpen "a" atts <- parseTags l, (_,x) <- atts] ## Spidering (the lazy way) What’s the output of this? ["logo boring","https://github.com","/plans","/explore","/features","/blog", "/login?return_to=https://github.com/languages/Haskell/created", "/languages/Haskell","/explore","explore_main","/repositories","explore_repos","/languages", "selected","explore_languages","/timeline","explore_timeline","/search","code_search", "/tips","explore_tips","/languages/Haskell","/languages/Haskell/created", "selected","/languages/Haskell/updated", "/languages","/languages/ActionScript/created", "/languages/Ada/created","/languages/Arc/created","/languages/ASP/created", "/languages/Assembly/created", "/languages/Boo/created","/languages/C/created","/languages/C%23/created", "/languages/C++/created","/languages/Clojure/created", "/languages/CoffeeScript/created", "/languages/ColdFusion/created","/languages/Common%20Lisp/created","/languages/D/created", "/languages/Delphi/created","/languages/Duby/created", "/languages/Eiffel/created", "/languages/Emacs%20Lisp/created","/languages/Erlang/created","/languages/F%23/created", "/languages/Factor/created","/languages/FORTRAN/created", "/languages/Go/created", "/languages/Groovy/created","/languages/HaXe/created","/languages/Io/created", "/languages/Java/created","/languages/JavaScript/created", "/languages/Lua/created", "/languages/Max/MSP/created","/languages/Nu/created","/languages/Objective-C/created", "/languages/Objective-J/created","/languages/OCaml/created", "/languages/ooc/created", "/languages/Perl/created","/languages/PHP/created","/languages/Pure%20Data/created", "/languages/Python/created","/languages/R/created", "/languages/Racket/created", "/languages/Ruby/created","/languages/Scala/created","/languages/Scheme/created", "/languages/sclang/created","/languages/Self/created", "/languages/Shell/created", "/languages/Smalltalk/created","/languages/SuperCollider/created","/languages/Tcl/created", "/languages/Vala/created","/languages/Verilog/created", "/languages/VHDL/created", "/languages/VimL/created","/languages/Visual%20Basic/created","/languages/XQuery/created", "/brownnrl","/brownnrl/Real-World-Haskell","/joh", "/joh/tribot","/bjornbm","/bjornbm/publicstuff","/codemac","/codemac/yi","/poconnell93", "/poconnell93/chat","/jillianfu","/jillianfu/Angel","/jaspervdj","/jaspervdj/sup-host","/serras", "/serras/scion-ghc-7-requisites","/serras","/serras/scion","/iand675","/iand675/cgen","/shangaslammi", "/shangaslammi/haskeroids","/rukav","/rukav/ReplayTrace","/jaspervdj","/jaspervdj/wol","/tomlokhorst", "/tomlokhorst/wol","/bos","/bos/concurrent-barrier","/jkingry","/jkingry/projectEuler","/olshanskydr", "/olshanskydr/xml-enumerator","/lorenz","/lorenz/fypmaincode","/jaspervdj", "/jaspervdj/data-object-json","/jaspervdj","/jaspervdj/data-object-yaml", "/languages/Haskell/created?page=2","next","/languages/Haskell/created?page=3", "/languages/Haskell/created?page=4","/languages/Haskell/created?page=5", "/languages/Haskell/created?page=6","/languages/Haskell/created?page=7", "/languages/Haskell/created?page=8","/languages/Haskell/created?page=9", "/languages/Haskell/created?page=208","/languages/Haskell/created?page=209", "/languages/Haskell/created?page=2","l","next","http://www.rackspace.com","logo", "http://www.rackspace.com ","http://www.rackspacecloud.com","https://github.com/blog", "/login/multipass?to=http%3A%2F%2Fsupport.github.com","https://github.com/training", "http://jobs.github.com","http://shop.github.com", "https://github.com/contact","http://develop.github.com","http://status.github.com", "/site/terms","/site/privacy","https://github.com/security", "nofollow","?locale=de","nofollow","?locale=fr","nofollow","?locale=ja", "nofollow","?locale=pt-BR","nofollow","?locale=ru","nofollow","?locale=zh", "#","minibutton btn-forward js-all-locales","nofollow","?locale=en","nofollow","?locale=af", "nofollow","?locale=ca","nofollow","?locale=cs","nofollow","?locale=de", "nofollow","?locale=es","nofollow","?locale=fr","nofollow","?locale=hr", "nofollow","?locale=hu","nofollow","?locale=id","nofollow","?locale=it", "nofollow","?locale=ja","nofollow","?locale=nl","nofollow","?locale=no", "nofollow","?locale=pl","nofollow","?locale=pt-BR","nofollow","?locale=ru", "nofollow","?locale=sr","nofollow","?locale=sv","nofollow","?locale=zh", "#","js-see-all-keyboard-shortcuts"] Quite a mouthful, but we can easily filter things down. “/languages/Haskell/created?page=3” is an example of a link to the next page listing Haskell repositories; presumably the current page would be “?page=1”, and the highest listed seems to be “/languages/Haskell/created?page=209”. The actual repositories look like “/jaspervdj/data-object-yaml”. The regularity of the “created” numbering suggests that we can avoid any actual spidering. Instead, we could just figure out what the last page is, the highest page, and then generate all the page names in between because they follow a simple scheme. Assume we have the final number, n, we already know we get the full list through [1..n]; then we want to prepend “languages/Haskell/created?page=”, but it’s a type error to simply write map ("languages/Haskell/created?page="++) [1..n]. There is only one type-variable in (++) :: [a] -> [a] -> [a]. To convert the Integers to a proper String, we do map show, so that gives us our generator: listPages :: [String] listPages = map (\x -> "https://github.com/languages/Haskell/created?page=" ++ show x) [1..] (This will throw a warning using -Wall because GHC has to guess whether the 1 is an Int or Integer. This can be quieted by writing (1::Int) instead.) But what is x? We don’t know the final, highest, oldest page. We don’t know how much of our infinite lazy list to take. It’s easy enough to filter the list to get only the index: filter (isPrefixOf "/languages/Haskell/created?page="). Then we call last, right? (Or something like head . reverse if we didn’t know last or if we didn’t think to check the hits for [a] -> a). But if you look back at the original scraping output, you see an example of how a simple approach can go wrong; we read “/languages/Haskell/created?page=209” and then we read “/languages/Haskell/created?page=2”! 2 is less than 209, of course, and is the wrong answer. GitHub is not padding the numbers to look like “created?page=002”, so our simple-minded approach doesn’t work. So we need to extract the number. Easy enough: the prefix is statically known and never changes, so we can hardwire some crude parsing using drop: drop 32. How to turn the remaining String into an Int? Hopefully one knows about read, but even here Hoogle will save our bacon if we think to look through the list of hits for String -> Int - read turns up as hit #10 or #11. Then, now that we have turned our [String] into [Int], we could sort it and take the last entry, or again go to the standard library and use maximum (like read, it will turn up for [Int] -> Int, if not as highly ranked as one might hope). Tweaking the syntax a little, our final result is: lastPage :: [String] -> Int lastPage = maximum . map (read . drop 32) . filter ("/languages/Haskell/created?page=" isPrefixOf) If we didn’t want to hardwire this for Haskell, we’d probably write the function with an additional parameter and replace the Int with a runtime calculation of what to remove: lastPageGeneric :: String -> [String] -> Int lastPageGeneric lang = maximum . map (read . drop (length lang)) . filter (lang isPrefixOf) So let’s put what we have together. The program can download an initial index page, parse it, find the name of the last index page, and generate the URLs of all index pages, and print those out (to prove that it all works): import Data.List (isPrefixOf) import Network.Curl (curlGetString, URLString) import Text.HTML.TagSoup main :: IO () main = do html <- openURL "https://github.com/languages/Haskell/created" let lst = lastPage$ linkify html
let indxPgs = take lst listPages
print indxPgs

openURL :: URLString -> IO String
openURL target = fmap snd $curlGetString target [] linkify :: String -> [String] linkify l = [x | TagOpen "a" atts <- parseTags l, (_,x) <- atts] lastPage :: [String] -> Int lastPage = maximum . map (read . drop 32) . filter ("/languages/Haskell/created?page=" isPrefixOf) listPages :: [String] listPages = map (\x -> "https://github.com/languages/Haskell/created?page=" ++ show x) [(1::Int)..] So where were we? We had a [String] (in a variable named indxPgs) which represents all the index pages. We can get the HTML source of each page just by reusing openURL (it works on the first one, so it stands to reason it’d work on all index pages), which is trivial by this point: mapM openURL indxPgs. ## Filtering repositories In the TagSoup result, we saw the addresses of the repositories listed on the first index page: ["/brownnrl","/brownnrl/Real-World-Haskell","/joh","/joh/tribot","/bjornbm","/bjornbm/publicstuff", "/codemac","/codemac/yi","/poconnell93","/poconnell93/chat","/jillianfu","/jillianfu/Angel", "/jaspervdj","/jaspervdj/sup-host","/serras","/serras/scion-ghc-7-requisites","/serras","/serras/scion", "/iand675","/iand675/cgen","/shangaslammi","/shangaslammi/haskeroids","/rukav","/rukav/ReplayTrace", "/jaspervdj","/jaspervdj/wol","/tomlokhorst","/tomlokhorst/wol","/bos","/bos/concurrent-barrier", "/jkingry","/jkingry/projectEuler","/olshanskydr","/olshanskydr/xml-enumerator","/lorenz", "/lorenz/fypmaincode","/jaspervdj","/jaspervdj/data-object-json","/jaspervdj", "/jaspervdj/data-object-yaml"] Without looking at the rendered page in our browser, it’s obvious that GitHub is linking first to whatever user owns or created the repository, and then linking to the repository itself. We don’t want the users, but the repositories. Fortunately, it’s equally obvious that this is true: no user page has two forward-slashes in it, while all repository pages have two forward-slashes in it. So we want to count the forward-slashes and keep every address with exactly 2 forward-slashes. The type for our function takes a list, a possible entry in that list, and returns a count. This is easy to do with primitive recursion and an accumulator, or perhaps length combined with filter; but the base library already has functions for a -> [a] -> Int. elemIndex annoyingly returns a Maybe Int, so we’ll use elemIndices instead and call length on its output: length (elemIndices '/' x) == 2. This is not quite right. If we run this on the original parsed output, we get ["https://github.com","/languages/Haskell","/languages/Haskell","/plategreaves/unordered-containers", "/vincenthz/hs-tls-extra","/aculich/fix-symbols-gitit", "/sphynx/euler-hs","/DRMacIver/unordered-containers","/hamishmack/yesod-slides","/GNUManiacs/hoppla", "/DRMacIver/hs-rank-aggregation","/naota/hackage-autoebuild", "/magthe/hsini","/dagit/gnuplot-test","/imbaczek/HBPoker","/sergeyastanin/simpleea","/cbaatz/hamu8080", "/aristidb/xml-enumerator", "/elliottt/value-supply","/gnumaniacs-org/hoppla","/emillon/tyson","/quelgar/hifive", "/quelgar/haskell-websockets","http://www.rackspace.com", "http://www.rackspace.com ","http://www.rackspacecloud.com", "/login/multipass?to=http%3A%2F%2Fsupport.github.com","http://jobs.github.com", "http://shop.github.com","http://develop.github.com", "http://status.github.com","/site/terms","/site/privacy"] It doesn’t look like we mistakenly omitted a repository, but it does look like we mistakenly included things we should not have. We need to filter out anything beginning with a “http://”, “https://”, “/site/”, “/languages/”, or “/login/”.6 We could call filter multiple times, or use a tricky foldr to accumulate only results which don’t match any of the items in our list ["/languages/", "/login/", "/site/", "http://", "https://"]. But I already wrote the solution to this problem back in the original WP RSS archive-bot where I noticed that my original giant filter call could be replaced by a much more elegant use of any  where uniq :: [String] -> [String] uniq = filter (\x ->not$ any (flip isInfixOf x) exceptions)

exceptions :: [String]
exceptions = ["wikimediafoundation", "http://www.mediawiki.org/", "wikipedia",
"&curid=", "index.php?title=", "&action="]

In our case, we replace isInfixOf with isPrefixOf, and we have different constants defined in exceptions. To put it all together into a new filtering function, we have:

repos :: String -> [String]
where  uniq :: [String] -> [String]
uniq = filter count . filter (\x -> not $any (isPrefixOf x) exceptions) exceptions :: [String] exceptions = ["/languages/", "/login/", "/site/", "http://", "https://"] count :: String -> Bool count x = length (elemIndices '/' x) == 2 Our new minimalist program, which will test out repos: import Data.List (elemIndices, isPrefixOf) import Network.Curl (curlGetString, URLString) import Text.HTML.TagSoup main :: IO () main = do html <- openURL "https://github.com/languages/Haskell/created" print$ repos html

openURL :: URLString -> IO String
openURL target = fmap snd $curlGetString target [] linkify :: String -> [String] linkify l = [x | TagOpen "a" atts <- parseTags l, (_,x) <- atts] repos :: String -> [String] repos = uniq . linkify where uniq :: [String] -> [String] uniq = filter count . filter (\x -> not$ any (isPrefixOf x) exceptions)
exceptions :: [String]
exceptions = ["/languages/", "/login/", "/site/", "http://", "https://"]
count :: String -> Bool
count x = length (elemIndices '/' x) == 2

The output:

["/plategreaves/unordered-containers","/vincenthz/hs-tls-extra","/aculich/fix-symbols-gitit",
"/sphynx/euler-hs","/DRMacIver/unordered-containers","/hamishmack/yesod-slides",
"/GNUManiacs/hoppla","/DRMacIver/hs-rank-aggregation","/naota/hackage-autoebuild","/magthe/hsini",
"/dagit/gnuplot-test","/imbaczek/HBPoker",
"/sergeyastanin/simpleea","/cbaatz/hamu808a0","/aristidb/xml-enumerator","/elliottt/value-supply",
"/gnumaniacs-org/hoppla","/emillon/tyson",
"/quelgar/hifive","/quelgar/haskell-websockets"]

## Shelling out to git

That leaves the ‘shell out to git’ functionality. We could try stealing the spawn (call out to /bin/sh) code from XMonad, but the point of spawn is that it forks away completely from our script, which will completely screw up our desired lack of parallelism.7 I ultimately wound up using a function from System.Process, readProcessWithExitCode. (Why readProcessWithExitCode and not readProcess? Because if a directory already exists, git/readProcess throws an exception which kills the script!) This will work:

shellToGit :: String -> IO ()
shellToGit u = do (_,y,_) <- readProcessWithExitCode "git" ["clone", u] ""
print y

In retrospect, it might have been a better idea to try to use runCommand or System.Cmd. Alternatively, we could use the same shelling out functionality from the original patch-tag.com script:

mapM_ (\x -> runProcess "darcs" ["get", "--lazy", "http://patch-tag.com"++x]
Nothing Nothing Nothing Nothing Nothing) targets

Which could be rewritten for us (sans logging) as

shellToGit :: String -> IO ()
shellToGit u = runProcess "git" ["clone", u] Nothing Nothing Nothing Nothing Nothing >> return ()
-- We could replace return () with Control.Monad.void to drop IO ProcessHandle result

Now it’s easy to fill in our 2 missing lines:

          ...
repourls <- mapM getRepos indxPgs

## The script

The final program, clean of -Wall or hlint warnings:

import Data.List (elemIndices, isPrefixOf, sort)
import Network.Curl (curlGetString, URLString)
import System.FilePath (dropExtension)
import Text.HTML.TagSoup

main :: IO ()
main = do html <- openURL "https://github.com/languages/Haskell/created"
let lst = lastPage $linkify html let indxPgs = take lst listPages repourls <- mapM getRepos indxPgs let gitURLs = map gitify$ sort $concat repourls mapM_ shellToGit gitURLs openURL :: URLString -> IO String openURL target = fmap snd$ curlGetString target []

linkify l = [x | TagOpen "a" atts <- parseTags l, (_,x) <- atts]

lastPage :: [String] -> Int
lastPage = maximum . map (read . drop 32) . filter ("/languages/Haskell/created?page=" isPrefixOf)

listPages :: [String]
listPages = map (\x -> "https://github.com/languages/Haskell/created?page=" ++ show x) [(1::Int)..]

repos :: String -> [String]
where  uniq :: [String] -> [String]
uniq = filter count . filter (\x -> not $any (isPrefixOf x) exceptions) exceptions :: [String] exceptions = ["/languages/", "/login/", "/site/", "http://", "https://"] count :: String -> Bool count x = length (elemIndices '/' x) == 2 getRepos :: String -> IO [String] getRepos = fmap repos . openURL gitify :: String -> String gitify x = "https://github.com" ++ x ++ ".git" shellToGit :: String -> IO () shellToGit u = do (_,y,_) <- readProcessWithExitCode "git" ["clone", u, dropExtension$ drop 19 u] ""
print y

This, or a version of it, works well. But I caution people from mis-using it! There are a lot of repositories on GitHub; please don’t go running this carelessly. It will pull down 4-12 gigabytes of data. GitHub is a good FLOSS-friendly business by all accounts, and doesn’t deserve people wasting its bandwidth & money if they are not even going to keep what they downloaded.

### The script golfed

For kicks, let’s see what a shorter, more unmaintainable and unreadable, version looks like (in the best scripting language tradition):

import Data.List (elemIndices, isPrefixOf, sort)
import Network.Curl (curlGetString)
import System.FilePath (dropExtension)
import Text.HTML.TagSoup

main = do html <- openURL "https://github.com/languages/Haskell/created"
let i = take (lastPage $linkify html)$
repourls <- mapM (fmap (uniq . linkify) . openURL) i
mapM_ shellToGit $map (\x -> "https://github.com" ++ x ++ ".git")$ sort $concat repourls where openURL target = fmap snd$ curlGetString target []
linkify l = [x | TagOpen "a" atts <- parseTags l, (_,x) <- atts]
lastPage = maximum . map (read . drop 32) .
filter ("/languages/Haskell/created?page=" isPrefixOf)
uniq = filter count . filter (\x -> not $any (isPrefixOf x) ["/languages/", "/login/", "/site/", "http://", "https://"]) count x = length (elemIndices '/' x) == 2 shellToGit u = do { (_,y,_) <- readProcessWithExitCode "git" ["clone", u, dropExtension$ drop 19 u] ""; print y }

14 lines of code isn’t too bad, especially considering that Haskell is not usually considering a language suited for scripting or scraping purposes like this. Nor do I see any obvious missing abstractions - count is a function that might be useful in Data.List, and openURL is something that the Curl binding could provide on its own, but everything else looks pretty necessary.

1. Once one has all those repositories, how does one keep them up to date? The relevant command is git pull. How would one run this on all the repositories? In shell script? Using find? From a crontab?8
2. In the previous script, a number of short-cuts were taken which render it Haskell-specific. Identify and remedy them, turning this script into a general-purpose script for downloading any language for which GitHub has a category. (The language name can be accessed by reading an argument to the script by standard functions like getArgs.)

1. This page assumes a basic understanding of how version control programs work, Haskell syntax, and the Haskell standard library. For those not au courant, repositories are basically a collection of logically related files and a detailed history of the modifications that built them up from nothing.

2. From my initial email:

I’d think it [Control.Monad.void] [would] be useful for more than just me. Agda is lousy with calls to >> return (); and then there’s ZMachine, arrayref, whim, the barracuda packages, binary, bnfc, buddha, bytestring, c2hs, cabal, chesslibrary, comas, conjure, curl, darcs, darcs-benchmark, dbus-haskell, ddc, dephd, derive, dhs, drift, easyvision, ehc, filestore, folkung, geni, geordi, gtk2hs, gnuplot, ginsu, halfs, happstack, haskeline, hback, hbeat… You get the picture.

4. Besides curl, there is the http-wget wrapper, and the http-enumerator package claims to natively support HTTPS. I have not tried them.
7. A quick point: in the previous scripts, I went to some effort to get greater parallelism, but in this case, we don’t want to hammer GitHub with a few thousand simultaneous git clone invocations; Haskell repositories are created rarely enough that we can afford to be friendly and only download one repository at a time.
8. Example cron answer: @weekly find ~/bin -type d -name ".git" -execdir nice git pull \;