Exploring SymPy or: What, Really, is the Purpose of Computer Algebra Systems?

I recently got interested in singular perturbation theory, and to get help with the algebra, I turned to SymPy. I had tried to use Mathematica in graduate school in the early 90s, but the experience had been sufficiently frustrating that I had steered clear of computer algebra systems since.

SymPy presents itself as a “friendly”, less intimidating alternative, with a more familiar and conventional language and operating model.

SymPy: First Impressions

This was my first serious encounter with SymPy: I read the tutorials and tried to apply what I learned to my specific problem.

At first glance, SymPy leaves a distinctly mixed impression:

  • It is, without doubt, amazing what has been accomplished! There is a computer algebra system, which works, and which supports all the functionality that one would expect (series, derivatives, integrals, the lot). Impressive!

  • It is also welcome that SymPy is built on and uses Python: a familiar, comfortable language; not some strange and unfamiliar dialect of Lisp, Haskell, or some stand-alone ad-hoc language.

  • At the same time, the Python heritage implies a rather ambivalent legacy. It becomes clear pretty quickly that the whole symbolic set of operations has been flanged somewhat awkwardly onto the underlying Python environment, as opposed to having a language and computing environment that understands symbolic operations natively. For example, “variables” for use in algebraic expressions must be defined as Python variables, using syntax like x, y, z = symbols( "x y z" ). These entities then live a kind of hybrid life: they are Python variables that represent algebraic variables and have values that (typically) represent their names in the equations… it works, somehow, but it feels like a hack.

  • It is not quite clear how seriously SymPy takes itself. The mailing list, at least, seems to be dominated by excited undergraduates doing their homework. I myself tried to use SymPy for some more serious algebraic manipulations (perturbation theory!), where the ability to select parts of an expression and operate on them is critical. It turns out that SymPy isn’t well equipped for this sort of thing; this is clearly not the type of application that the community tends to think much about.
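To make the "hybrid life" of symbols concrete, here is a minimal sketch using only the standard `symbols` and `Symbol` constructors. The names `a`, `x_old`, and `value` are purely illustrative.

```python
from sympy import Symbol, expand, symbols

# Python names on the left, algebraic names (given as a string) on the right.
x, y, z = symbols("x y z")

expr = expand((x + y)**2)        # x**2 + 2*x*y + y**2

# The Python name and the algebraic name are independent of each other.
a = Symbol("b")                  # Python variable `a`, printed as `b`
same = Symbol("x") == x          # True: symbols are identified by their name

# Rebinding the Python name does not touch expressions built earlier.
x_old = x
x = 3                            # `x` is now a plain Python int
# `expr` still contains the symbol x; a value must be substituted explicitly.
value = expr.subs({x_old: 3, y: 1})   # 9 + 6 + 1 = 16
```

The last few lines show the awkward part: assigning to the Python variable `x` is not the same as giving the algebraic variable a value, which is what `subs` is for.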

As a first visit to SymPy-land, this was definitely a success: I did manage to solve my problem, and SymPy indeed was a great help with algebra that would have been tedious, if not infeasible, to do on paper.

SymPy: Reconsidered

I am less certain how much time I want to invest in it going forward. I am not convinced that the project aims to be a solution for the kinds of problems that I would like to apply it to; or that it even understands or is aware of such problems!

As I already said, the (fairly high volume) traffic on the mailing list seems to be dominated by undergraduates, with undergraduate problems, and undergraduate skill sets. Nothing wrong with that — but it limits the project’s aspirations first, and its capabilities second.

On a larger scale, the project seems directionless. Its roadmap seems to boil down to everything, everywhere, all at once: “any form of purely symbolic mathematical computation is in scope”. That about sums it up.

This is a shame, because all the traffic on the mailing list points to tremendous visibility and interest, and should amount to a powerful resource. But without guidance and direction, it is not going to improve the project. Simply adding random features contributed by random people is a mistake.

It is a fact that a large API is a liability for a software project, not an asset. Professional software developers have this in their blood; amateurs (in software engineering), as a rule, do not.

Maintaining a sense of “core functionality” and keeping it stable and high-quality is essential. Browsing the list of SymPy’s open issues on GitHub, the number of (reportedly) incorrectly handled edge cases makes me uncomfortable: precisely because edge cases in symbolic computation are difficult, they should be a priority.

Another important question for the project is whether it wants to be a serious tool for serious algebraic computation, or essentially a “toy”. At least some of the individual items on the roadmap point to the latter (interactive plotting in the browser!), while “support for large-scale algebraic work” or “manipulating complex algebraic expressions” is notably absent. Improving quality and documentation are listed, but as a generic afterthought. It should be the other way around!

What is the Purpose of Computer Algebra Systems?

Computer algebra systems have been around since the 1970s, and at least at times have had rather high, flashy visibility (Mathematica in the 90s!). But I perceive an absence of true “success stories”: either high-profile ones, or the everyday success of being a standard part of every working scientist’s toolbox. The absence of textbooks on the topic is particularly striking (compare numerics!).

What, then, is the purpose of computer algebra systems: the problems that only they can solve? And what is their “sweet spot”, where they become the favored choice, among existing alternatives?

  1. There are gee-whiz demos aplenty, but that can’t be the point. The binomial coefficient “511 choose 23”! Series expansion of $\exp(\sin(x))$ to 48th order! So what? This isn’t useful. (Here is a test: would anybody even notice if all these results were dead-wrong?)

  2. A much more legitimate area is to view computer algebra systems as a “pocket calculator”, but for symbolic problems. The point here is that these are problems one could do by hand, but maybe one doesn’t want to, in the same way that one could calculate the cube root of 1729.03 arithmetically, or compute 152003/17 by long division, but is unlikely to do so.

    Typical examples of these kinds of problems are closed-form integrals, straightforward differential equations, some small matrix problems.

    I can also see uses in areas that frequently require essentially trivial, but tedious operations, such as partial fraction decomposition, which arises frequently in Control Theory, or when working with generating functions.

  3. A difficult topic is presented by the problems where one would expect computer algebra systems to dominate: large-scale algebraic computations where the amount of algebra, the sheer number of terms, becomes so overwhelming that doing it with pencil and paper becomes tedious and off-putting at best, infeasible at worst — and that’s not even mentioning the potential for errors, which grows, rather quickly, with the problem size.

    The quintessential or archetypal problem for this kind of application is high-order perturbation theory. The number of terms grows combinatorially, making anything beyond first or second order a challenging “project”, all by itself.

    For these problems, a computer algebra system is not going to produce “the solution”, and if it did, it would not be useful. (What do you do with several pages of symbolic gibberish?) Instead, it should help to perform each step in the calculation. Stringing steps together, by cleverly selecting and substituting parts of expressions is critical: this is facilitated by the fact that in this type of problem, although the sheer number of terms tends to be quite large, the structure of the terms tends to follow some predictable pattern, making it possible to automate some of the work. Obviously, this is not easy, and requires a fairly deep understanding of the problem and the nature of the solution, and all its intermediate steps.

    The details of this kind of work seem to be rarely discussed. A set of “best practices” or “lessons learned” would be highly desirable.

  4. Furthermore, applications seem to exist in specialized niche areas, typically using comparably specialized tools: groups and finite fields, abstract algebra more generally, number theory — anything else? Tools that are mentioned repeatedly in this context include GAP and PARI/GP, both of which seem to enjoy rather healthy and active communities.

    What’s interesting is that this is pure and fairly advanced mathematics: clearly a game for professionals.

  5. There may be unconventional ideas and alternative workflows. For instance, I read about a group somewhere that does all their calculations by hand, then feeds the original and transformed versions into the computer, expands them, and checks that they are, in fact, equal. In this model, the computer is not used to do the work, but only to check that it is correct. This is clearly useful, but different.

Am I missing something?

Personally, I never had use for item 2 (the only integrals physicists do are Gaussians, and you don’t need computer algebra systems for that; that’s the whole point!). But I do acknowledge that it is a legitimate application. It’s a different question whether USD 2500 (the approximate price point for Mathematica or Maple) is money well spent on a “pocket calculator”. Maybe, depending on circumstances.
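For concreteness, the “pocket calculator” use of item 2 might look roughly like this in SymPy. This is only a sketch with made-up inputs, using the standard `integrate`, `apart`, and `simplify` functions.

```python
from sympy import apart, exp, integrate, oo, simplify, sqrt, pi, symbols

x = symbols("x", real=True)
s = symbols("s")

# The Gaussian integral over the real line: sqrt(pi).
gauss = integrate(exp(-x**2), (x, -oo, oo))

# Another closed-form integral: Gamma(3) = 2.
moment = integrate(x**2 * exp(-x), (x, 0, oo))

# Partial fraction decomposition, as it comes up with transfer functions:
# 1/(s (s+1) (s+2)) = 1/(2 s) - 1/(s+1) + 1/(2 (s+2))
rational = 1 / (s * (s + 1) * (s + 2))
pf = apart(rational, s)

# Sanity check: recombining the partial fractions gives back the original.
residual = simplify(pf - rational)     # 0
```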

I did wish for help with item 3 more than once, but found it exceedingly difficult to actually get productive use out of the computer in these cases. This should be possible, but it’s not clear that the effort to teach the computer to do it is actually less than doing it oneself, with paper and pencil.

The breakeven point (learning effort vs. productivity gain) seems only to be reached when repeatedly doing very extensive algebraic manipulations. It doesn’t seem to work well for casual, but non-trivial work. This may be a matter of practice and experience; more research and guidance on best practices would be welcome.
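As a toy illustration of the step-by-step style that item 3 calls for, here is a sketch of an order-by-order calculation. The example problem (the positive root of x^2 + εx - 1 = 0, a regular rather than singular perturbation) and all variable names are my own assumptions, chosen only because the answer is easy to verify by hand. The part of the expression belonging to each order is selected with `Expr.coeff`, and each order is solved using the results of the previous ones.

```python
from sympy import expand, solve, symbols

eps = symbols("epsilon")
x0, x1, x2 = symbols("x0 x1 x2")

# Ansatz: power series in eps, truncated at second order.
ansatz = x0 + eps * x1 + eps**2 * x2

# Insert the ansatz into x**2 + eps*x - 1 and multiply everything out.
residual = expand(ansatz**2 + eps * ansatz - 1)

# Select the part of the expression belonging to each power of eps.
orders = [residual.coeff(eps, k) for k in range(3)]
# orders[0] = x0**2 - 1
# orders[1] = 2*x0*x1 + x0
# orders[2] = x1**2 + 2*x0*x2 + x1

# Solve order by order, feeding earlier results into later equations.
solution = {x0: 1}                     # choose the positive root of x0**2 = 1
for k, unknown in enumerate([x1, x2], start=1):
    equation = orders[k].subs(solution)
    solution[unknown] = solve(equation, unknown)[0]

approx = ansatz.subs(solution)         # 1 - eps/2 + eps**2/8
```

The human still decides on the ansatz, the root to follow, and the order in which equations are solved; the computer only does the bookkeeping for each step.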

I wonder whether there are additional, alternative ways of utilizing computer algebra systems that are different from trying to have the computer do what a human would do with paper and pencil. Besides the (single) mention of using it to check results, I am not aware of any original ideas.
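The “check only” workflow mentioned above can be sketched in a few lines. The expressions below are invented for illustration: the idea is to expand (or simplify) the difference between the starting expression and the hand-derived result, and to test whether it vanishes.

```python
from sympy import cos, expand, simplify, sin, symbols

a, b, x = symbols("a b x")

original = (a + b)**3 - (a - b)**3
by_hand = 2 * b * (3 * a**2 + b**2)        # result of a hand calculation
with_typo = 2 * b * (3 * a**2 - b**2)      # same, with a sign error

# `==` compares expression structure, so expand the difference first.
hand_ok = expand(original - by_hand) == 0      # True
typo_ok = expand(original - with_typo) == 0    # False: 4*b**3 is left over

# For non-polynomial identities, simplify() takes the place of expand().
trig_ok = simplify(sin(x)**2 + cos(x)**2 - 1) == 0   # True
```

A nonzero difference proves that something is wrong. A failure to reduce to zero, on the other hand, can also mean that the simplifier did not find the right rewriting, so a negative result deserves a second look.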

Somehow this seems sobering: the concept of computer algebra is stunning, and the systems are impressive. But I can’t help feeling that the results, so far, do not seem to live up to the promise.

Am I missing something?