The Haskell Summers of Code have often produced excellent results, but how excellent is excellent? Are there any commonalities among the successful projects, or among the unsuccessful ones?
In 2009, a blogger & Debian developer wrote a four part retrospective series on the 2008 Debian Summer of Code projects. The results are interesting: some projects were a failure and the relevant student drifted away and had little to do with Debian again; and some were great successes. I don’t discern any particular lessons there, except perhaps one against hubris or filling unclear needs. I decided to compile my own series of retrospectives on the Haskell Summers of Code.
Google describes SoC as
…a global program that offers students stipends to write code for open source projects. We have worked with the open source community to identify and fund exciting projects for the upcoming summer.1
…a global program that offers student developers stipends to write code for various open source software projects. We have worked with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Since its inception in 2005, the program has brought together over 4500 successful student participants and over 3000 mentors from over 100 countries worldwide, all for the love of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all.2
It is intended to produce source code for the "use and benefit of all"; it is not meant to produce academic papers, code curiosities, forgotten blog posts, or groundwork for distant projects, but exciting new production code. This is the perspective I take in trying to assess SoC projects: did it ship anything? If standalone, are the results in active use by more than a few developers or other codebases? If a modification to an existing codebase, was it merged and is it now actively maintained3? And so on. Sterling Clover argues that this is far too demanding and does not consider whether an involved student is energized by his contribution to go on and contribute still more4; I disagree about the former, and I have not done the latter because it would be too labor-intensive to track down every student and assess their later contributions, which would involve still more subjective appraisals5. (Perhaps in the future I or another Haskeller will do that.)
Haskell wasn’t part of the first Summer of Code in 2005, but it was accepted for 2006. We start there.
The 2006 homepage lists the following projects:
Fast Mutable Collection Types for Haskell; Caio Marcelo de Oliveira Filho, mentored by Audrey Tang
Unsuccessful. This ultimately resulted in the HsJudy library ("fast mutable collection" here meaning "array"). HsJudy was apparently used in Pugs at one time, but no more.
Port Haddock to use GHC; David Waern, mentored by Simon Marlow
Successful. Haddock has used the GHC API ever since.6
A model for client-side scripts with HSP; Joel Björnson, mentored by Niklas Broberg
Successful? Was initially unsuccessful, but seems to’ve been picked up again.
GHCi based debugger for Haskell; José Iborra López, mentored by David Himmelstrup
Successful. The GHCi debugger was accepted into GHC HEAD, and is in production use.
HaskellNet; Jun Mukai, mentored by Shae Erisson
Unsuccessful. HaskellNet is dead, was noted to be "uncompleted", and none of it has propagated elsewhere. (I’m not entirely sure what happened with the HaskellNet code - I know of two repos, but that’s about it.) Shae tells me that this poor uptake is probably due to a lack of advertising, and not any actual defect in the HaskellNet code.
Language.C - a C parser written in Haskell; Marc van Woerkom, mentored by Manuel Chakravarty
Unsuccessful. According to Don Stewart’s outline of the 2006 SoC, this project was not completed.
Implement a better type checker for Yhc; Leon P Smith, mentored by Malcolm Wallace
Unsuccessful. See the Language.C SoC.
Thin out cabal-get and integrate in GHC; Paolo Martini, mentored by Isaac Jones
Successful. Code lives on as cabal-install, which we all know and love.
Storable a => ByteString a; Spencer Janssen, mentored by Don Stewart
4 successful; 2 unsuccessful; and 2 failures.
The 2007 homepage lists:
Darcs conflict handling; Jason Dagit, mentored by David Roundy
Successful. The work almost completely got rid of the exponential conflict bug, and has been in released Darcs for years.
Automated building of packages and generation of Haddock documentation; Sascha Böhme, mentored by Ross Paterson
Successful. The auto build and doc generation are long-standing and very useful parts of Hackage.
Rewrite the typechecker for YHC and nhc98; Mathieu Boespflug, mentored by Malcolm Wallace
Successful? According to the TMR writeup, the type-checker code has made it into YHC. (I add a question mark because YHC is so little used.)
Cabal Configurations; Thomas Schilling, mentored by Michael Isaac Jones
Successful. Cabal configurations are very useful for enabling/disabling things and are extremely common in the wild.
Update the Hat tracer; Kenn Knowles, mentored by Malcolm Wallace
Unsuccessful. The update apparently happened, since the Hat homepage says "Version 2.06 released 2nd Oct 2008", but it is described as unmaintained, and I can’t seem to find any examples of people actually using Hat.
Generalizing Parsec to ParsecT and arbitrary input (ByteStrings); Paolo Martini, mentored by Philippa Jane Cowderoy
Successful? The performance is still so terrible that few people use it.
Shared Libraries for GHC; Clemens Fruhwirth, mentored by Simon Marlow
Successful. The situation is unclear to me, but I know that for some period dynamic linking worked for some platforms. However, it’s 2010 and I still have static linking, although GHC 6.12 apparently gets dynamic linking; so I’m going to chalk this one up as a mixed success.
Libcurl; Mieczysław Bąk, mentored by Bryan O’Sullivan
Unknown. The archived homepage and repository indicate that the package name was curl, and indeed a curl binding of that name exists - but none of the metadata points to Bąk as either author or maintainer; if it is the same package, it is pretty successful, with 158 reverse dependencies.
Extending GuiHaskell: An IDE for Haskell Hackers; Asumu Takikawa, mentored by Neil David Mitchell
Unsuccessful. GuiHaskell does not exist in any usable form. (The homepage summarizes the situation thusly: "Warning: This project is fragile, unfinished, and I do not recommend that anyone tries using it.")
6 successes; 2 unsuccessful; 1 unknown.
The 2008 homepage isn’t kind enough to list all the projects, but it does tell us that only 7 projects were accepted by Google.
So we can work from the code.google.com page which lists 6:
C99 Parser/Pretty-Printer; by Benedikt Huber, mentored by Iavor Diatchki
Successful. The first try failed, but the second won through, and now people are doing things like parsing the Linux kernel with it.
GMap - Fast composable maps; by Jamie Brandon, mentored by Adrian Charles Hey
Unsuccessful. GMap is on Hackage, but there are 0 users after 3 years.
Haskell API Search; Neil Mitchell, mentored by Niklas Broberg
Successful. The improved performance and search capability have made it into Hoogle releases, and Hoogle is one of the more popular Haskell applications (with 1.7m web searches).
Cabal; Andrea Vezzosi, mentored by Duncan Coutts
Unsuccessful. (His code wound up becoming hbuild, which is not on Hackage or apparently used by anyone.)
GHC plugins; Maximilian Conroy Bolingbroke, mentored by Sean Seefried
Unsuccessful? As of January 2010, the patch adding plugins functionality had yet to be accepted & applied; as of February 2011, the ticket remained open and the code unmerged. (The code was apparently not yet bitrotten by the passage of 3 years, but how long could its luck last?) The code was finally merged on 4 August 2011; the docs do not list any users.
Data parallel physics engine; Roman Cheplyaka, mentored by Manuel M. T. Chakravarty
Unsuccessful. It seems to be finished, but I can see no use made of the actual engine mentioned on the engine’s blog. (I would give reverse-dependency statistics, but Hpysics seems to have never been uploaded to Hackage.)
GHC API; Thomas Schilling, mentored by Simon Marlow
Unsuccessful. Schilling’s fixes went in, but they were in general minor changes (like adding the GHC monad) or bug-fixes; the GHC API remains a mess.
2 successful, 5 unsuccessful.
Don Stewart writes in reply to the foregoing:
"We explicitly pushed harder in 2008 to clarify and simplify the goals of the projects, ensure adequate prior Haskell experience and to focus on libraries and tools that directly benefit the community.
And our success rate was much higher.
So: look for things that benefit the largest number of Haskell developers and users, and from students with proven Haskell development experience. You can’t learn Haskell from zero on the job, during SoC."
- The Monad.Reader’s Issue 12
5 projects were accepted this year; Darcs, which tried to apply in its own right, was rejected.
In general, these looked good. Most of them will be widely useful – especially the Darcs and Haddock SoCs – or address longstanding complaints (many criticisms of laziness revolve around how unpredictable it makes memory consumption). The only one that bothers me is the EclipseFP project. I’m not sure Eclipse is common enough among Haskellers or potential Haskellers to warrant the effort7, but at least the project is focused on improving an existing plugin rather than writing one from scratch. The 5 were:
- Unknown. hashed-storage exists and is used in Darcs, but from watching the bugtracker traffic, it’s unclear whether Darcs saw a net gain from it.
haskell-src-exts -> haskell-src; by Niklas Broberg; mentored by Neil Mitchell
Successful. Niklas added a large number of patches, but it’s unclear to me what practical benefit it adds besides handling comments now (which was useful for hlint). Speaking practically, haskell-src has 104 reverse dependencies, and haskell-src-exts has 223; so the latter seems to have indeed surpassed its predecessor.
Haddock improvements; by Isaac Dupree; mentored by David Waern
Successful? Dupree’s patches have been applied to head and apparently make cross-package links usually work.
Improving space profiling experience; by Gergely Patai; mentored by Johan Tibell
Successful. hp2any seems quite alive and usable.
Extend EclipseFP functionality for Haskell; by Thomas ten Cate; mentored by Thomas Schilling
Unsuccessful. See ten Cate’s summing-up.
3 successful, 1 unknown, 1 unsuccessful.
7 projects were accepted:
Improvements to Cabal’s test support; Thomas Tuegel, mentored by Johan Tibell
Successful? The functionality is now in a released version of cabal-install, and a number of packages use the provided test syntax.8
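The test syntax in question is Cabal’s test-suite stanza; a minimal sketch for a hypothetical package (version bounds omitted for brevity) might look like:

```cabal
-- Minimal sketch for a hypothetical package; real stanzas usually
-- pin compiler/library versions.
test-suite unit-tests
  -- exitcode-stdio-1.0: failure is signalled by a non-zero exit code
  type:           exitcode-stdio-1.0
  main-is:        Tests.hs
  hs-source-dirs: tests
  build-depends:  base
```

Running cabal’s test command then builds and executes each declared suite.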
Infrastructure for a more social Hackage 2.0; Matthew Gruen, mentored by Edward Kmett
Unknown. Gruen’s blog was last updated October 2010, and Hackage still hasn’t switched over and gotten the new features & benefit of the rewrite. But the code exists and there is a running public demo, so this may yet be a success.
A high performance HTML generation library; Jasper Van der Jeugt, mentored by Simon Meier
Successful. blaze-html has been released and is actively developed; version 0.4.0.0 has 50 total reverse dependencies and blaze-builder has 97 reverse dependencies, though there’s much overlap. (This site is built on hakyll, which uses blaze-html.)
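The core idea behind such libraries is treating HTML as a composable builder value rather than strings pasted together. A toy sketch in plain Haskell of that approach (this is not blaze-html’s actual API, and it omits escaping and efficient byte buffers):

```haskell
-- Toy sketch of HTML-as-builder, the idea behind libraries such as
-- blaze-html; NOT the real blaze-html API.
newtype Html = Html ShowS   -- difference-list of output text

instance Semigroup Html where
  Html a <> Html b = Html (a . b)
instance Monoid Html where
  mempty = Html id

text :: String -> Html
text = Html . showString      -- a real library would escape entities here

el :: String -> Html -> Html  -- wrap content in <tag>...</tag>
el tag inner = text ("<" ++ tag ++ ">") <> inner <> text ("</" ++ tag ++ ">")

render :: Html -> String
render (Html f) = f ""

main :: IO ()
main = putStrLn (render (el "html" (el "body" (el "p" (text "hello")))))
```

Because documents are just monoid values, fragments compose freely before a single final render pass.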
Improvements to the GHC LLVM backend; Alp Mestanogullari, mentored by Maximilian Bolingbroke
Unsuccessful. Dan Peebles in #haskell says that Alp’s SoC never got off the ground when his computer died at the beginning of the summer; with nothing written or turned in, this can’t be considered a successful SoC, exactly. But could it have been? The LLVM backend is still on track to become the default GHC backend9, suggesting that it’s popular in GHC HQ (and the DDC dialect), and it seems to also be popular among Haskell bloggers. The scope is restricted to taking a working backend and optimizing it. In general, it seems like a decent SoC proposal, and better than the next one:
Implementing the Immix Garbage Collection Algorithm; Marco Túlio Gontijo e Silva, mentored by Simon Marlow
Unsuccessful. The GHC repository history, as of 4 February, contains no patches adding Immix GC. Silva writes in his blog’s SoC summary that "Although the implementation is not mature enough to be included in the repository, I’m happy with the state it is now. I think it’s a good start, and I plan to keep working on it." (His new blog, begun in August 2010, contains no mention of Immix work.) The GHC wiki says that "it’s functional, doesn’t have known bugs and gets better results than the default GC in the nofib suite. On the other hand, it gets worse results than the default GC for the nofib/gc suite." Marco said in a Disqus comment on this page:
Hi. I wondered about continuing my work on the Immix GC collector, but Simon Marlow, my mentor, thought it was not a good idea to invest more effort on Immix. So I dropped it, and started working on other things. Greetings.
Improving Darcs Performance; Adolfo Builes, mentored by Eric Kow
Successful. This replaced a previous proposal to write a Haskell binding to the GObject library, which never started. Looking through the Darcs repository history, I see a number of new tests related to the global cache, but no major edits to cache-related modules. The Darcs wiki reports it as a success, closing some bugs.
Improving Darcs’s network performance; Alexey Levan, mentored by Petr Rockai
Successful. Levan divided his SoC into 2 parts: improving Darcs’s performance in fetching the many small files that make up a repository’s revision history, and writing a "smart server" that can provide clients with only the files they need in one request. The "smart server" seems to have been abandoned as not being worthwhile, but the fetching idea was implemented and will be in the 2.8 release. The basic idea is to combine all the small files into a single tarball which can be downloaded at full speed, avoiding the latency of many roundtrips. The 2.8 release description claims that when darcs optimize --http was used on the Darcs repository, a full download went from 40 minutes to 3 minutes. This feature will not be enabled by default, but the gain for larger repositories would be large enough that I feel comfortable classifying it as a successful SoC.
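The arithmetic behind that kind of speedup is simple round-trip counting; a back-of-the-envelope sketch, in which every number is an assumption for illustration rather than a Darcs measurement:

```haskell
-- Back-of-the-envelope round-trip arithmetic for fetching a repository
-- over HTTP; all numbers are illustrative assumptions, not measured.
main :: IO ()
main = do
  let rtt      = 0.05 :: Double   -- assumed round-trip time per request (s)
      nFiles   = 30000            -- assumed count of small files in the repo
      transfer = 60               -- assumed seconds to stream one big tarball
  -- one request per file: total time is dominated by latency, not bandwidth
  putStrLn ("per-file fetch: " ++ show (fromIntegral nFiles * rtt) ++ " s")
  -- one tarball: a single round-trip plus the bulk transfer
  putStrLn ("single tarball: " ++ show (rtt + transfer) ++ " s")
```

Under these assumed numbers the per-file strategy pays 1500 seconds of pure latency, which is why bundling the files wins even before compression.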
Most of the 7 SoCs are laudably focused on an existing application. You don’t need to justify a speedup of normal Darcs operations because there’s an installed base of Darcs users that will benefit; a new GC for GHC or a LLVM backend will benefit every Haskeller; better Cabal support for testing may go unused by many package authors who either have no tests or don’t want to bother - but a fair number will bother, and it will get maintained as part of Cabal, and similarly for the Hackage 2.0 project.
The Immix GC strikes me as a very challenging summer project; a GC is one of the most low-level pieces of a functional language and is intertwined with all sorts of code and considerations. It would not surprise me if that project wound up just getting a little closer to a working Immix GC but not producing a production-quality GC scheduled to come to compilers near you.
2 in particular concern me as potentially falling prey to sins #2 & 3: the GObject-binder tool, and the high-performance HTML library:
- Let’s assume that the HTML library does wind up as being faster than existing libraries, and as useful - that compromises don’t destroy its utility. Who will use it? It will almost surely have an API different enough from existing libraries that a conversion will be painful. There are roughly 42 users of the existing xhtml-generating library; will their authors wish to embrace a cutting-edge infant library? Is HTML generation even much of a bottleneck for them? (Speaking just for Gitit, Pandoc and its HTML generation are not usually a bottleneck.)
- The case against the GObject project makes itself; GTK2Hs isn’t as widely used as one would expect, and this seems to be due to the difficulty of installation and its general complexity. So there are few users of existing libraries; would there be more users for those libraries no one has bothered to bind nor yet clamored for? (This project might fall afoul of sin #1, but I do not know how difficult the GObject data is to interpret.)
As of February 2010, I grade the 7 SoCs for 2010 as follows: 4 successes, 1 unknown, and 2 unsuccessful. (One unknown, Hackage 2.0, will probably turn out to be a success if it ever goes live as the main Hackage site; as of 1 January 2013, it has not.) As one would hope, the results seem to be better than the results for 2008 or 2009.
Of my original predictions, I think I was right about the Immix GC & GObject & Darcs optimizations, semi-right about Hackage 2.0 & Cabal testing support, somewhat wrong about the LLVM work, and completely wrong about the HTML/blaze SoC. (I am not sure why I was wrong about the last, and don’t judge myself harshly for not predicting the exogenous failure of the LLVM SoC.)
Haskell.org got 7 projects again for 2011. They are:
Improve EclipseFP; Alejandro Serrano, mentored by Thomas Schilling
"Eclipse is one of the most popular IDEs in our days. EclipseFP is a project developing a plug-in for it that supports Haskell. Now, it has syntax highlighting, integration of GHCi and supports some properties of Cabal files. My idea is to extend the set of tools available, at least with:
- Autocompletion and better links to documentation,
- A way to run unit tests within Eclipse,
- More support for editing Cabal files visually, including a browser of the available packages."
Simplified OpenGL bindings; Alexander Göransson, mentored by Jason Dagit
Modernize and simplify OpenGL bindings for Haskell. Focus on safety, shaders and simplicity.
Interpreter Support for the Cabal-Install Build Tool; anklesaria, by Duncan Coutts
This project aims to provide cabal-install with a cabal ghci command by adding to the Cabal API. This would allow package developers to use GHCi and Hugs from within packages requiring options and preprocessing from Cabal.
Convert the text package to use UTF-8 internally; Jasper Van der Jeugt, mentored by Edward Kmett
"For Haskell projects handling Unicode text, the text library offers both speed and simplicity-of-use. When it was written, benchmarks indicated that UTF-16 would be a good choice for the internal encoding in the library. However, these (rather artificial) benchmarks did not take into account the time taken to decode the real-world data and encode it to write it back. [The goals are to] benchmark, and convert the library to UTF-8 if it is a faster choice."
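The decode/encode cost referred to happens at the byte boundary; a small example with the text package (a boot library shipped with recent GHCs), showing UTF-8 bytes being transcoded into text’s internal form and back:

```haskell
-- Illustrates the boundary cost the proposal targets: real-world input
-- is usually UTF-8 bytes, so it must be decoded into text's internal
-- encoding on the way in and re-encoded on the way out.
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  -- "\233" is é and "\246" is ö, so this is "héllo wörld"
  let bytes = TE.encodeUtf8 (T.pack "h\233llo w\246rld")  -- stand-in for on-disk UTF-8
      t     = TE.decodeUtf8 bytes                         -- decode at the boundary
  print (T.length t)                 -- 11 characters
  print (B.length bytes)             -- 13 bytes: the two accented letters take 2 bytes each
  print (TE.encodeUtf8 t == bytes)   -- the round-trip is lossless
```

With a UTF-8 internal representation, both boundary conversions for already-valid input reduce to validation plus a copy.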
Build multiple Cabal packages in parallel; Mikhail Glushenkov, by Johan Tibell
Cabal is a system for building and packaging Haskell libraries and programs. This project’s aim is to augment Cabal with support for building packages in parallel. Many developers have multi-core machines, but Cabal runs the build process in a single thread, only making use of one core. If the build process could be parallelized, build times could be cut by perhaps a factor of 2-8, depending on the number of cores and opportunity of parallel execution available.
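The opportunity for parallelism comes from the package dependency graph: any package whose dependencies are already built can compile at the same time as its siblings. A toy scheduler (all package names hypothetical) that groups a graph into parallel "waves":

```haskell
-- Toy scheduler: group packages into "waves" that could build in
-- parallel; the package names and dependency edges are hypothetical.
import qualified Data.Map as M

deps :: M.Map String [String]
deps = M.fromList
  [ ("base",       [])
  , ("bytestring", ["base"])
  , ("text",       ["base"])
  , ("parsec",     ["base"])
  , ("aeson",      ["bytestring", "text"])
  , ("mylib",      ["parsec", "aeson"])
  ]

-- Repeatedly emit every package whose dependencies have all been built.
waves :: [String] -> M.Map String [String] -> [[String]]
waves built m
  | M.null m  = []
  | otherwise = ready : waves (built ++ ready) (foldr M.delete m ready)
  where ready = [p | (p, ds) <- M.toList m, all (`elem` built) ds]

main :: IO ()
main = mapM_ print (waves [] deps)
```

Here the second wave holds three independent packages, which is exactly the factor-of-cores speedup the proposal is after on wide graphs.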
Darcs Bridge; Owen Stephens, by Ganesh Sittampalam
My proposed project is to create a generic bridge that will enable easy interoperability and synchronisation between Darcs and other VCSs. The bridge will be designed to be generic, but the focus of this project will be Darcs2 ↔ Git and Darcs2 ↔ Darcs1. The bridge should allow loss-less, correct conversion to and from Darcs repositories, allowing users to use the tool that suits them and their project best, be that Darcs as it currently exists, or another tool.
Darcs: primitive patches version 3; Petr Ročkai, mentored by Eric Kow
Darcs, a revision control system, uses so-called patches to represent changes to individual version-controlled files, where the "primitive" patches are the lowest level of this representation, capturing notions like "hunks" (akin to what diff(1) produces), token replacement, and file and directory addition/removal. I propose to implement a different representation of these primitive patches, hoping to improve both performance and flexibility of darcs and to facilitate future development.
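What a "primitive patch" amounts to can be sketched as a small datatype; this is a hypothetical illustration, not Darcs’s actual types:

```haskell
-- Hypothetical sketch of primitive patches as a datatype; Darcs's real
-- representation differs, but the constructors mirror the notions named
-- in the proposal: hunks, token replacement, file add/remove.
data Prim
  = Hunk FilePath Int [String] [String]  -- at line n: remove old lines, insert new
  | TokReplace FilePath String String    -- replace one token with another
  | AddFile FilePath
  | RmFile FilePath
  deriving Show

-- Applying a hunk to a file's lines, diff(1)-style.
applyHunk :: Prim -> [String] -> [String]
applyHunk (Hunk _ n old new) ls =
  take (n - 1) ls ++ new ++ drop (n - 1 + length old) ls
applyHunk _ ls = ls

main :: IO ()
main = print (applyHunk (Hunk "a.txt" 2 ["old"] ["new1", "new2"])
                        ["first", "old", "last"])
```

The hard part of the project is not this datatype but the commutation laws relating such patches, which everything else in Darcs builds on.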
Which seem like good selections for SoC, and which seem less appropriate?
- #1 is the second EclipseFP SoC, after a failed 2009 attempt; why should we think this one will do better?
- With #2, the fear is that the result will not be used; there is an OpenGL binding already, after all, and I haven’t heard that there are very many people who want to do OpenGL graphics but were deterred by complexity or danger in it.
- #3: cabal ghci is a long-requested Cabal feature, and it sounds as if all the groundwork and experimentation has been done. I have no problem with this one.
- #4: Benchmarking sounds quite doable, and text is increasingly used; but if I had to criticize it, I would criticize it for underambition, for sounding too modest and not a good use of a slot.
- #5 is a second crack at the parallel compilation problem (building on a 2008 SoC) and is troubling in the same way the EclipseFP SoC is.
- #6: There are multiple existing Darcs->other VCS programs, so the task is quite doable. An escape hatch would be very valuable for users (even if rarely used).
- #7 sounds tremendously speculative to me. I respect Ročkai & Kow, but in idling on #darcs and reading the occasional Darcs-related emails & Reddit posts, I don’t know of any fully worked-out design for said patch representation, which makes it a challenging theoretical problem (patch theory being general & powerful), a major implementation issue (since the existing primitive patches are naturally assumed all throughout the Darcs codebase), and difficult to verify that it will not backfire on users or legacy repositories. All in all, #7 sounds like the sort of project whose best-case scenario is a repository branch/fork somewhere that few besides the author understand, which is better for some usecases and worse for others, but not actually in general use. That might be a success by the Darcs team’s lights, but not in the sense I have been using in this history.
To summarize my feelings:
- #1 seems a bit doubtful but is more likely to succeed (because presumably most of the heavy lifting was done previously).
- I predict #2 & #7 will likely fail
- I would be mildly surprised if both #3 & #5 succeed - since they’re challenging and long-requested Cabal features - but I expect at least one of them to succeed. Which, I am not sure.
- I expect with confidence that #4 & #6 will succeed.
Improve EclipseFP; Alejandro Serrano, mentored by Thomas Schilling
Successful. The coding was finished, to the author’s apparent satisfaction, and the work was included in the 2.1.0 release.
Simplified OpenGL bindings; Alexander Göransson, mentored by Jason Dagit
Unsuccessful. Jason Dagit says Alexander never started for unknown personal reasons, and so no work was ever done (no OpenGLRawNice library exists, a post-August 2011 Google search for "Alexander Göransson OpenGL" is dry, nothing on Hackage seems to mention OpenGL 4.0 support, etc.).
Interpreter Support for the Cabal-Install Build Tool; anklesaria, by Duncan Coutts
Unsuccessful? anklesaria’s final post, "Ending GSoC", says the work is done and provides a repository with patches by [email protected] - but no patches by that email appear in the Cabal repository as of 10 December 2011; nor does there appear to be any discussion in the cabal-dev ML archives.
Convert the text package to use UTF-8 internally; Jasper Van der Jeugt, by Edward Kmett
Successful. Jasper published 2 posts benchmarking the converted text against the original ("Text/UTF-8: Initial results" & "Text/UTF-8: Studying memory usage"); discussing the results in "Text/UTF-8: Aftermath", the upshot is that the conversion has a real but small advantage, potentially would cause interoperability problems, requires considerable testing, and won’t be merged in (the fork will be maintained against hopes of future GHC optimizations). Jasper says the benefits wound up being a bigger & cleaner test/benchmark suite, and some optimizations made for the UTF-8 version can be applied to the original. Since Edward Kmett seems pleased, I have marked it a success (although I remain dubious about whether it was a good SoC).
Build multiple Cabal packages in parallel; Mikhail Glushenkov, by Johan Tibell
Successful. Glushenkov reported in "Parallelising cabal-install: Results" that the patches were done and people could play with his repository; the comments report that it basically works and does offer speedups. However, as before, no patch by him appears in the mainline Cabal, and the last discussion was 6 November 2011, where he provides a patch bundle. No one commented; Mikhail says the patches may be "too invasive" and need reworking before merging.10 The code was ultimately released as part of cabal-install 1.16 and is reportedly working well.
Darcs Bridge; Owen Stephens, by Ganesh Sittampalam
Successful? Owen’s blog posts conclude with "GSoC: Darcs Bridge - Results", summarizing the final features: he succeeded in most of the functionality. Brent Yorgey tells me that he has successfully used the tool to convert repositories to put onto Github, but says there are "some critical bugs" and use is still "clunky" (eg. currently requiring Darcs HEAD; see the usage guide on the Darcs wiki). Whether the bugs will be fixed and the package polished to the point where it will be widely used remains to be seen.
Darcs: primitive patches version 3; Petr Ročkai, by Eric Kow
Since my last report, I have decided to turn somewhat more radical again. The original plan was to stick with the darcs codebase and do most (all) of the work within that, based primarily on writing tests for the testsuite and not exposing anything of the new functionality in a user-visible fashion. I changed my mind about this. The main reason was that the test environment, as it is, makes certain properties hard to express: a typical test-suite works with assertions (HUnit) and invariants (QuickCheck). In this environment, expressing ideas like "the displayed patches are aesthetically pleasing" or "the files in the repository have reasonable shape" is impractical at best. An alternative would have been to make myself a playground using the darcs library to expose the new code. But the fact is, our current codebase is entrenched in all kinds of legacy issues, like handling filenames and duplicated code. It makes the experimenter’s life harder than necessary, and it also involves rebuilding a whole lot of code that I never use, over and over. All in all, I made a somewhat bold decision to cut everything that lived under Darcs.Patch (plus a few dependencies, as few as possible) into a new library, which I named patchlib, in the best tradition of fslib. At that point, I also removed custom file path handling from that portion of code, removed the use of a custom Printer (a pretty-printer implementation) module, and made a few other incompatible changes.
The remaining work?
"The obvious future work lies in the conflict handling. There are two main options in this regard: either re-engineer a patch-level, commute-based representation of conflicts (in the spirit of mergers and conflictors), as V3 'composite' patches, or alternatively, use a non-patch-based mechanism for tracking conflicts and resolutions. It’s still somewhat early to decide which is a better choice, and they come with different trade-offs. Nevertheless, the decision, and the implementation, constitute a major step towards darcs 3. The other major piece of work that remains is the repository format: in this area, I have done some research in both the previous and this year’s project, but there are no definitive answers, even less an implementation. I think we now have a number of good ideas on how to approach this. We do need to sort out a few issues though, and the decision on the conflict layer also influences the shape of the repository.
Each of these two open problems is probably about the size of an ambitious SoC project. On top of that, a lot of integration work needs to happen to actually make real use of the advancements. We shall see how much time and resources can be found for advancing this cause, but I am relatively optimistic: the primitive level has turned out fairly well, and to me it seems that shedding the shackles of legacy code sprawl can boost the project as a whole significantly forward."
As I wrote before, the Darcs team will disagree with my assessment, but I believe marking it Unsuccessful is most consistent with how all previous SoCs have been judged11.
So of the 7 2011 SoCs:
- 3 were unsuccessful (2 possibly not)
- 4 were successful (1 possibly not)
My predictions were in general accurate; I remained hopeful that at least one of the Cabal SoCs would be merged in, which would give me a clean sweep and also render the final 2011 SoC record as good as the 2010 SoC record. (The parallel build was eventually merged in during 2012.)
It troubles me that the Cabal SoCs took so long to be merged in (if at all), in line with the historical trend for big Cabal SoC improvements to be partially done but never go into production. Duncan Coutts says they are in the queue, but if neither gets merged in before the 2012 SoC starts, the lesson seems to be that Cabal is too dangerous and uncertain to waste SoCs on.
In 2012, Haskell.org was bumped to 8 slots:
The goal of this project is to speed up the darcs changes and darcs annotate commands using a cache called
Scoutess - a build manager for cabal projects; DMcGill, mentored by Alp Mestanogullari
Scoutess is a tool for package maintainers and automates a lot of the hassle of dealing with dependencies and multiple versions of libraries. It will create a sandboxed environment simulating a fresh Haskell Platform install, attempt to build your project using Cabal and highlight any problems while also tracking changes or updates to dependencies located in remote repositories so these can be tested against as well.
Concurrent data structures for Haskell are currently a work in progress, and are necessary for parallel and high-performance computing. A few data structures, such as wait-free lists, already have Haskell implementations. One that does not yet is a thread-safe hash table. I propose to implement one as a library available under the new BSD license.
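For contrast with the scalable structure proposed, the simplest thread-safe table is one big lock; a baseline sketch using only libraries shipped with GHC (this is not the proposed library, just the design it would improve on):

```haskell
-- Baseline, coarsely-locked table: one MVar guards a whole Map. This is
-- the easy-but-unscalable design a real concurrent hash table improves
-- on, since every operation serializes on the single lock.
import Control.Concurrent.MVar
import qualified Data.Map as M

newtype Table k v = Table (MVar (M.Map k v))

newTable :: IO (Table k v)
newTable = Table <$> newMVar M.empty

insert :: Ord k => k -> v -> Table k v -> IO ()
insert k v (Table m) = modifyMVar_ m (return . M.insert k v)

lookupT :: Ord k => k -> Table k v -> IO (Maybe v)
lookupT k (Table m) = M.lookup k <$> readMVar m

main :: IO ()
main = do
  t <- newTable
  insert "answer" (42 :: Int) t
  print =<< lookupT "answer" t   -- Just 42
  print =<< lookupT "missing" t  -- Nothing
```

A scalable design replaces the single MVar with per-bucket locks or lock-free compare-and-swap, so threads touching different keys do not contend.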
Accelerating Haskell Application Development; mdittmer, mentored by Michael Snoyman
A project for improving performance of "lively developer mode" environments that require fast rebuild-and-redeploy routines.
Sandboxed builds and isolated environments for Cabal; Mikhail Glushenkov, mentored by Johan Tibell
The aim of this project is to integrate support for sandboxed builds into Cabal, a system for building Haskell projects. There are several different third-party implementations of this functionality already available, but it would be beneficial (from the points of ease of use and focusing the community efforts) to have a unified and polished solution integrated into Cabal itself. Additionally, this project is a step in the direction of solving the infamous "dependency hell" problem of Cabal.
Enable GHC to use multiple instances of a package for compilation (proposal); Philipp Schuster, mentored by Andres Löh
People are running into dependency hell when installing Haskell packages. I want to help move in the direction of solving it.
Multiuser browser-based interactive ghci, hpaste.org meets tryhaskell.org, for improved teaching of those new to Haskell (proposal); Shae Erisson, mentored by Heinrich Apfelmus
Many new users learn Haskell from the #haskell IRC channel. lambdabot's mueval is good for interactive teaching, but only allows short code snippets. hpaste allows large snippets to be shared, but not loaded into an interactive ghci. Chris Done's http://www.tryhaskell.org allows larger snippets to be loaded, but is not explicitly multiuser. If tryhaskell allowed multiple users to view the same interpreter state, and allowed users to paste in new code, teaching and debugging would be much easier for people new to Haskell.
Haskell-Type-Exts; Shayan Najd, mentored by Niklas Broberg
Following the proposal by Niklas Broberg, I am highly eager to expand the existing typechecker for Haskell-Src-Exts to support most of the features available in Haskell 2010 with the major extensions like GADTs, RankNTypes and Type-Functions. This will be done by following the guidelines of Typing Haskell in Haskell as the basis; adding support for RankNTypes; and then introducing GADTs and Type-Functions via local assumptions.
References:
- http://hackage.haskell.org/trac/summer-of-code/ticket/1620
- http://hackage.haskell.org/package/haskell-src-exts
- http://hackage.haskell.org/package/haskell-type-exts
- M. P. Jones. Typing Haskell in Haskell
- D. Vytiniotis, S. Peyton Jones, T. Schrijvers, M. Sulzmann. OutsideIn(X) - Modular type inference with local assumptions
According to the proposal, the core patch index code has already been implemented & benchmarked by the student, who has worked on Darcs before (flying 9 hours into England for a hacking meetup). The rest of the work sounds reasonable, and the project is not overreaching at all. I fully expect this to work out (even if darcs annotate is not a command I use every month, much less every day). The main risk seems to be life events, but SoCs failing due to personal issues are relatively rare and affect <20% of past projects. This SoC will be judged successful if it is in Darcs HEAD, or at least scheduled for application, by my usual deadline: 1 January 2013.
No proposal was publicly available. I am not familiar with DMcGill, and Googling for Haskell material related to McGill, I don't see any past work. It sounds relatively ambitious in the short abstract - replicating cabal-dev and adding in considerable other functionality? I've previously noticed that Cabal-related SoCs seem to be unusually blighted. Adding that all up, I am left with dubious feelings. Judgment: tools are always hard to judge. This one will get the usual subjective "is it being used by a good fraction of its potential userbase?" criterion.
The student has completed a SoC before, and is a graduate student in an AI/machine-learning program; both of which bode well for him completing another SoC. I'm not actually sure how many Haskell applications need a concurrent hashtable - the existing hashtables package has 3 users, and the Data.HashTable module is used by perhaps 10-20 code repositories (judging from grepping through my local archive). It is unreasonable to expect it to supersede base, which has had something like a decade to gain users, but equaling the obscure hashtables package seems reasonable. Judgment will be whether there are >=3 reverse dependencies.
As stated, I have no idea what this SoC is about. I don't know the student, although Snoyman seems to write a great deal of code, and successful code at that, which is a good sign - if he agreed to mentor it, surely the idea can't be that bad? Since I don't know what it is, I cannot specify a judgment criterion in advance.
Both student & mentor are experienced Haskell hackers, and have worked with the Cabal codebase. As the abstract says, sandboxed builds are not a novel feature. cabal-dev is popular among developers, so it stands to reason that a polished version inside Cabal itself would be even more popular. I see little reason this could not be successful, aside from the general challenge of working with Cabal. Judgment: sandboxed build functionality in Cabal HEAD or scheduled to be applied soon.
Shae is an experienced Haskeller & professional developer (to the extent that I was very surprised to hear he had applied). The proposal seems like a very reasonable addition, and I do not think it is too difficult to modify the existing mueval-based tooling. Judgment: whether multi-user sessions have gone live.
Here again I regret the absence of a public proposal. I’m not sure how useful this one is, how hard it is, or how much progress the prototype library on Hackage represents, nor do I know any comparable libraries I could check for a reverse dependency count. I don’t know the student, but Broberg is a capable Haskeller.
Judgment criterion: punting to checking for >=3 reverse dependencies/users.
As of 1 January 2013:
Darcs patch index: Merged into Darcs HEAD without apparent issue (documentation). Project was successful.
scoutess: As of August 15, McGillicuddy was reporting that scoutess was complete (repository). In Haskell-cafe, there is one off-hand mention of scoutess by someone using a different continuous-integration program; there are a few discussions on Reddit of progress, but the most recent post is a theoretical discussion of scoutess's architecture. There are no tools or libraries depending on it in Hackage, because scoutess has never been uploaded to Hackage. Indeed, as far as I can tell, no one is actually using it, and stepcut agreed with this assessment when I asked him. I specified in April 2012 that my judgment criterion would be "is scoutess being used by a good fraction of its potential userbase?"; in this light, scoutess was unsuccessful.
Concurrent hashtable/hashmap: Edward Kmett tells me that Loren ran into personal issues and was removed from SoC by the midpoint with no delivered library. Unsuccessful.
Accelerating Haskell Application Development: Edward Kmett tells me that the student left for a job around the midpoint and was removed from SoC at the last milestone. Unsuccessful. eegreg argues that while incomplete, the first goal of the SoC (a file-watching library) has since been fulfilled and the library put to use by the Yesod ecosystem of Web libraries & applications.
Sandboxed builds: Completed and in Cabal HEAD; per my criterion, this is successful.
Multiple packages support in GHC: The latest information I can find is a GHC documentation page which summarizes the material as: "It is possible to install multiple instances of the same package version with my forks of cabal and ghc. Quite a few problems remain." A set of slides says "Quite a few problems remain therefore nothing is merged yet." The code is not in HEAD for either Cabal or GHC, and given the many problems, may never be. Unsuccessful.
Better tryhaskell.org: Erisson finished in August with a Hackage upload and some nice slides. Unfortunately, there is no live server where one can actually use ghcLiVE; someone suggested that Erisson might've given up on the sandboxing aspects which would have made it usable on the public Internet (per his original proposal). One wonders how many people will ever use it, given how much Haskell instruction is done remotely, but maybe it would be useful in offline university classes. In any case, my criterion was clear: "whether multi-user sessions have gone live"; and so despite my high hopes, I must mark this unsuccessful. (Edward Kmett disagrees with this assessment; Erisson sort of agrees and disagrees13.)
Haskell type-checker library: This one is a little confusing. The Hackage library remains untouched since April 2012, although there is a largely complete library (the main missing feature is records support, which is important but not a huge gap) available on Github. Another blog post implies that it is but a small part of a grander research scheme, and that my reverse-dependencies judgment criterion is simply off-base - although it, too, suggests the SoC was unsuccessful. (The obvious alternative, looking at whether it has been pushed to the HEAD of haskell-type-exts, would also suggest unsuccessful.) I am not sure whether this should be considered successful or unsuccessful.
- unclear: 1
- successful: 2
- unsuccessful: 5
2 of the 5 unsuccessful projects were due to problems on the student's end (hashtable, accelerating); 2 were too ambitious in general (scoutess, multiple-packages); and the last one was not too ambitious, but in my opinion was left somewhat incomplete (ghcLiVE).
How successful were my predictions? Employing a proper scoring rule (log scoring; for additional discussion of scoring rules, see 2012 election predictions) and comparing against a 50-50 random guesser where >0 means I outperformed the random guesser14:
```haskell
logBinaryScore = sum . map (\(p,result) -> if result
                                             then 1 + logBase 2 p
                                             else 1 + logBase 2 (1-p))

logBinaryScore [(0.80, True), (0.40, False), (0.60, False), (0.40, False),
                (0.75, True), (0.65, False), (0.80, False)]
~> -0.3693261451031018
```
I performed worse than random, in part because 2012 was such a bad year. In particular, I placed great weight on Erisson succeeding (without that prediction, I would score 0.95). In retrospect, I am also disappointed that I assigned the GHC project as high as 65% when I knew GHC projects are as dangerous as Cabal projects and the multiple-packages work involved a lot of low-level problems with minimal prior work to build on.
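The "0.95 without that prediction" claim can be checked mechanically by rerunning the scoring function with the final (0.80, False) entry - the ghcLiVE/tryhaskell prediction, the costliest miss - dropped (the per-project attribution of the other probabilities is my own inference, not stated in the original list):

```haskell
-- Log binary score vs. a 50-50 guesser (0 = no better than chance):
-- each correct prediction at probability p contributes 1 + log2 p,
-- each incorrect one 1 + log2 (1-p).
logBinaryScore :: [(Double, Bool)] -> Double
logBinaryScore = sum . map (\(p, result) ->
    if result then 1 + logBase 2 p else 1 + logBase 2 (1 - p))

-- The 7 predictions for 2012; the last entry is the ghcLiVE prediction.
predictions2012 :: [(Double, Bool)]
predictions2012 = [ (0.80, True), (0.40, False), (0.60, False)
                  , (0.40, False), (0.75, True), (0.65, False)
                  , (0.80, False) ]

fullScore, withoutErisson :: Double
fullScore      = logBinaryScore predictions2012        -- ≈ -0.369
withoutErisson = logBinaryScore (init predictions2012) -- ≈  0.953
```

A single confident miss at 80% costs 1 + log2(0.2) ≈ -1.32 bits, which is how one bad prediction drags the whole year below chance.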
So, what lessons can we learn from the past years of SoCs? It seems to me like there are roughly 3 groups of explanations for failure. They are:
- Hubris. GuiHaskell is probably a good example; it is essentially a bare-bones IDE, from its description. It is expecting a bit much of a single student in a single summer to write that!
- Unclear use. HsJudy is my example here. There are already so many arrays and array types in Haskell! What does HsJudy bring to the table that justifies a FFI dependency? Who’s going to use it? Pugs initially did apparently, but perhaps that’s just because it was there - when I looked at Pugs/HsJudy in 2007, certainly Pugs had no need of it. (The data parallel physics engine is probably another good example. Is it just a benchmark for the GHC developers? Is it intended for actual games? If the former, why is it a SoC project, and if the latter, isn’t that a little hubristic?)
- Lack of propaganda. One of the reasons Don Stewart’s bytestring library is so great is his relentless evangelizing, which convinces people to actually take the effort to learn and use Bytestrings; eventually by network effects, the whole Haskell community is affected & improved15. Some of these SoC projects suffer from a distinct lack of community buy-in - who used HaskellNet? Who used Hat when it was updated? Indifference can be fatal, and can defeat the point of a project. What good is a library that no one uses? These aren’t academic research projects which accomplish their task just by existing, after all. They’re supposed to be useful to real Haskellers.
There are 2 major collections of ideas for future SoC projects, aside from the general frustrations expressed in the annual survey:
Let’s look at the first 12 and see whether they’re good ideas, bad ideas, or indifferent.
- port GHC to the ARM architecture: It would be a good thing if we could easily compile our Haskell programs for ARM, which is used in many cellphones, but an even better idea would be using the LLVM backend to cross-compile. It would be somewhat tricky, but LLVM already has fairly solid cross-compilation support, and making GHC capable of using it seems like a reasonable project for a student to tackle.
- Implement overlap and exhaustiveness checking for pattern matching: this seems both quite challenging and also a specialized use. I use GADTs rarely, but I suspect that those writing GADT code rarely make overlap or omission errors.
- Incremental garbage collection: this may be a good idea depending on how much of the code was already written. But I fear that this would go the way of the Immix GC SoC and would be a bad idea.
- ThreadScope with custom probes: I don’t understand the description and can’t judge it.
- A simple, sane, comprehensive Date/Time API: having puzzled over date-time libraries before, I’m all for this one! It’s a well-defined problem, within the scope of a summer, and meets a need. Its only problem is that it doesn’t sound sexy or cool.
- Combine Threadscope with Heap Profiling Tools: Uncertain. Going by the Arch download statistics, Threadscope is downloaded more often than one would expect, so perhaps integration would be useful.
Haddock with embedded wiki feature, a la RWH, so we can collaborate on improving the documentation: This is a bad idea mostly because there are so many diverging ideas and possible implementations - it’s just not clear what one would do. Is it some sort of Haddock server? A Gitit wiki with clever hooks? Some lightweight in-browser editor combined with Darcs?
- HTTP Library Replacement: A good idea, assuming the linked attempts and alternate libraries haven’t already solved the issue.
- Using Type Inference to Highlight Code Properly: The difficult part is accessing the type information of an identifier inside a GHCi session - a problem probably already solved by scion. Colorizing the display of a snippet is trivial. So this would make a bad SoC.
- Transformation and Optimisation Tool: This initially sounds attractive, but previous refactoring tools have been ignored. The tools that have gotten uptake are things like GHC’s -Wall (which warns about possible semantic issues) and hlint (which warns about style issues and redundancy with standard library functions) - not like Hera.
- Webkit-based browser written in Haskell, similar in [plugin] architecture to Xmonad: This is probably the worst single idea in the whole bunch. A web browser these days is an entire operating system, but worse, one in which one must supply and maintain the userland as well; it is a thankless task that will not benefit the Haskell community (except incidentally through supporting libraries), nor a task it is uniquely equipped for. It is an infinite time sink - the only thing worse than this SoC failing would be it succeeding!
- Add NVIDIA CUDA backend for Data Parallel Haskell: DPH is rarely used; a CUDA backend would be even more rarely utilized; CUDA has a reputation for being difficult to coax performance out of; and difficulties would likely be exacerbated by the usual Haskell issues with space usage & laziness. (DPH/CUDA use unboxed strict data, but there are interface issues with the rest of the boxed lazy Haskell universe.) All in all, there are probably better SoCs16.
It’s difficult to quantify how useful a package is; it’s easier to punt and ask instead how popular it is. There are a few different sources we can appeal to:
- Don Stewart provides, for Arch Linux, a status page which includes Arch download numbers
- The Debian (and Ubuntu) Popularity Contest offers limited popularity data; eg. xmonad
- some 2006-2009 Hackage statistics are available by month & ranking; live Hackage statistics are the subject of an open bug report which will be closed by Matthew Gruen’s Hackage 2.0 (2010 SoC)
Reverse dependencies can be examined several ways:
Searching for mentions, blog posts, and unreleased packages elsewhere; key sites to search include:
The Haskell ecosystem evolves fast, and strong static typing means that a package can quickly cease to be compilable if not maintained.↩
From the 11 February 2011 Haskell-cafe thread, Haskell Summers of Code retrospective (updated for 2010):
There was some discussion of this on Reddit. Below is a slightly cleaned-up version of my comments there.
I really appreciate this roundup. But I think the bar is set somewhat too high for success. A success in this framework seems to be a significant and exciting improvement for the entire Haskell community. And there have certainly been a number of those. But there are also projects that are well done, produce results that live on, but which aren’t immediately recognizable as awesome new things. Furthermore, GSoc explicitly lists a goal as inspiring young developers towards ongoing community involvement/open source development, and these notes don’t really take that into account.
For example, I don’t know of any direct uptake of the code from the HaskellNet project, but the author did go on to write a small textbook on Haskell in Japanese. As another example, Roman (of Hpysics [sic]) has, as I understand it, been involved in a Russian language functional programming magazine.
So I think there needs to be a slightly more granular scale that can capture some of these nuances. Perhaps something like the following:
- [ ] Student completed (i.e. got final payment)
- [ ] Project found use (i.e. as a lib has at least one consumer, or got merged into a broader codebase)
- [ ] Project had significant impact (i.e. wide use/noticeable impact)
- [ ] Student continued to participate/make contributions to Haskell community
A few more detailed comments about projects that weren’t necessarily slam dunks, but were at the least, in my estimation, modest successes:
- GHC-plugins – Not only was the work completed and does it stand a chance of being merged, but it explored the design space in a useful way for future GHC development, and was part of Max becoming more familiar with GHC internals. Since then he’s contributed a few very nice and useful patches to GHC, including, as I recall, the magnificent TupleSections extension.
- GHC refactoring – It seems unfair to classify work that was taken into the mainline as unsuccessful. The improvements weren’t large, but my understanding is that they were things that we wanted to happen for GHC, and that were quite time consuming because they were cross-cutting. So this wasn’t exciting work, but it was yeoman’s work helpful in taking the GHC API forward. It’s still messy, I’m given to understand, and it still breaks between releases, but it has an increasing number of clients lately, as witnessed by discussions on -cafe.
- Darcs performance – by the account of Eric Kow & other core darcs guys, the hashed-storage stuff led to large improvements (and not only in performance) – the fact that there’s plenty more to be done shouldn’t be counted as a mark against it.
Further criticisms by sclv from 2013:
I no longer think these summaries are even modestly useful, because the judgement criteria are too harsh and too arbitrary. They reflect a bias towards "success" of a GSoC project as something directly measurable in uptake and users within a relatively short span of time.
GSoC projects are chosen, on the other hand, with an eye towards long-term payoff in Haskell infrastructure.
The criteria that would yield us "high success" projects in the sense judged here would also yield us projects that weren’t very interesting, useful, or important.
For example, how long must a student "continue to participate/make contributions to Haskell community"? Spencer Janssen, a successful 2006 SoC student, went on to be one of the 2 main developers on the popular Xmonad window manager, but then wound down his Haskell contributions and stopped entirely ~2009 (much to my dismay as an Xmonad developer). Is he a success for SoC?↩
As of 18 March 2011, I have local copies of 9 repositories which seem to make use of the new syntax: angle, cabal, concurrent-extra, hashable, rrt, safeint, spatialIndex, unordered-containers, wai-app-static.↩
11 December 2011, Google+:
Regarding the parallel cabal-install patches - Duncan is concerned that my changes are too invasive. I hope to get them merged in during the next few months after some reworking (we’re currently discussing what needs to be done).
From my conversation in #darcs with Eric Kow and other Darcs developers:
< kowey> mornfall [Petr Ročkai] and I did discuss the proposal beforehand... one thing to clear up first of all is that this is very specifically about the primitive patch level and not a wider patch theory project
< kowey> the difference being that it's easier to do in a SoC project
< owst> Also, mornfall has the advantage of being very experienced with the Darcs code-base, and its concepts - he's not going to require time to "get used to it" so I'd argue he's certainly not the average SoC student...
< kowey> I think mornfall has also put a good show of effort into thinking about (A) building off previous thinking on the matter (see his proposal), (B) fitting into the Darcs agenda -- particularly in aiming for this work to happen in mainline with the help of recent refactors and also to result in some cleanups and (C) making the project telescope
< gwern> owst: well, in a sense, that's a negative for the project as well as a positive implementation-wise - SoCs are in part about bringing new people into communities
< kowey> by telescope I mean, have a sane ordering of can-do to would-be-awesome
< Heffalump> gwern: yeah, though the Haskell mentors didn't see it that way
< kowey> (the mental image being that you can collapse a telescope)
< gwern> owst: I didn't mention that because I'm trying to not be unrelentingly negative, and because investigating backgrounds of everyone would require hours of work
< kowey> (sorry, I misread and see now that gwern did catch that this was primpatch specific)
< owst> gwern: in part, but not in full - they are ultimately also about "getting code written" for a project and that's certainly going to happen for mornfall's project!
< gwern> owst: that's the same reason I don't also judge SoCs by whether the student continued on in the community - because it'd be too damn much work
< owst> gwern: sure, I thought as much.
< gwern> owst: even though the student's future work would probably flip a number of projects from failure to success and vice-versa (eg. what has Spencer Janssen been doing lately? how many of the SoC students you see on the page did that and have not been heard from since like Mun of Frag?)
< gwern> so, I just judge on whether the code gets used a lot and whether it did something valuable
< kowey> it's a project that has long-term value for Darcs
< kowey> I think I agree with the last line of your prediction, "That might be a success by the Darcs's team's lights, but not in the sense I have been using in this history."
< kowey> although I'm certainly hoping for something better in the middle bit: code that winds up in darcs mainline plus specifications on the wiki
That is, mueval - but tryhaskell.org uses a fork which takes expressions over a pipe as opposed to being a one-shot CLI tool.↩
Ah, ghcLiVE isn’t designed to have a server hosted, it’s designed to run outside a sandbox. Not that I would mark this successful myself, but mostly because it uses Yesod and cabal dependency hell means very few people ended up using it…I’d like to port ghclive to scotty, which has far fewer build dependencies; I think people would actually use it then.↩
Guessing 50% simplifies the calculation, and isn’t too far off: Doing a quick sum of all the non-2012 successful/unsuccessful ratings, I get a chance of being successful at
(4+6+2+3+3+4) / ((4+2+5+1+2+4) + (4+6+2+3+3+4)) = 0.55, which isn’t terribly different from a guess of 0.5.↩
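The footnote's base-rate arithmetic can be checked mechanically (the year-by-year success/failure counts below are copied directly from the footnote's two parenthesized sums):

```haskell
-- Successful vs. unsuccessful project counts for the non-2012 years,
-- as given in the footnote.
successes, failures :: [Int]
successes = [4, 6, 2, 3, 3, 4]
failures  = [4, 2, 5, 1, 2, 4]

-- Historical base rate of SoC success: 22 / (18 + 22) = 0.55.
baseRate :: Double
baseRate = fromIntegral (sum successes)
         / fromIntegral (sum successes + sum failures)
```

22 successes out of 40 total projects gives 0.55, close enough to the 0.5 used in the simplified scoring above.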
Many good and worthwhile projects suffer this fate because of their academic origins. There’s no reward for someone who creates a great technique or library and gets the wider community to adopt it as standard. As far as the Haskell community is concerned, one Don Stewart is worth more than a dozen Oleg Kiselyovs; Oleg’s work is mindblowingly awesome in both quantity and quality, everyone acknowledges, but how often does anyone actually use any of it?
(Iteratees may be the exception; although there are somewhere upwards of 5 implementations by Oleg and others leading to a veritable Tower of Iteratee situation, the original iteratee has picked up 4 reverse dependencies, its most popular successor 33 and iteratees in general may one day become as widely used as bytestrings.)↩