Google + Python = world domination?

Since its initial release in 1991, the Python programming language has steadily grown in popularity. Designed by Dutch programmer Guido van Rossum (a.k.a. Python’s Benevolent Dictator for Life), it has won many converts thanks to its conciseness, power, flexibility, portability, and large ecosystem of readily available libraries. Today it is used in a wide variety of environments: Google relies on it heavily, and the popular Web application framework Zope is written in it. On UNIX/Linux, many also view it as a modern replacement for Perl, which has historically been the "heavy lifting" scripting tool on those platforms. Where I work, we use Python for a wide variety of intraweb, scripting, and automation tasks, including a large regression testing framework for our software.

Python is not without its issues, however. The most widely deployed implementation is interpreted (i.e. it doesn’t compile all the way down to native machine code), which has significant performance implications. It also does not handle multithreaded code well, since parts of the Python interpreter are inherently single-threaded. Python’s lack of full multithreading support has become increasingly problematic as multi-core CPUs have become the norm. Developers have typically worked around these limitations by coding performance-critical portions of the application in C or C++; the compiled C/C++ code is then called by the Python script. While certainly workable (and well supported by Python), this approach increases development times, complicates debugging, and hurts portability of high-performance Python applications.
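To make the threading limitation concrete, here is a minimal sketch of CPU-bound work split across two threads. It is my own illustration, not code from any project mentioned here: `count_primes` and `run_in_threads` are made-up names, and it assumes a standard CPython build with the global interpreter lock. On such a build the threads take turns executing bytecode, so the results are correct but the work generally does not finish faster than running the calls one after another.

```python
import threading


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_threads(limits):
    """Run count_primes for each limit in its own thread and collect the results."""
    results = [None] * len(limits)

    def worker(index, limit):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(limits)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Two CPU-bound jobs in two threads: correct answers, but on a GIL build
# typically little or no speedup over running them sequentially.
print(run_in_threads([20000, 20000]))
```

Timing the threaded version against a plain loop on your own machine is the easiest way to see the effect; the exact numbers will vary with hardware and Python version.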

Fast-forward to 2009. Guido is now a Google employee, and a team at Google has decided to take on the ambitious task of replacing much of the underpinnings of the Python language, with the goals of removing the interpreter performance bottlenecks and making true multithreaded Python applications possible. All without hurting backward compatibility with existing Python applications… a rather tall order! The project is code-named Unladen Swallow, and the team already has a detailed plan outlining its intended approach, which will occur in stages.

By mid-year, they intend to replace the existing Python interpreter with a more efficient one based on LLVM. This has some immediate benefits: replacing the relatively inefficient stack-based architecture of the current Python interpreter with LLVM’s more efficient register-based architecture should result in significant performance gains. Longer term, it paves the way for compilation to optimized native machine code, which has the potential to make Python performance comparable to that of other languages that compile to native machine code (like C/C++). Once the transition to LLVM is complete, the team plans to tackle other optimizations and enhancements, including the multithreading issue.
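For a feel of what "stack-based" means in practice, the standard library's `dis` module can list the bytecode the current interpreter generates for a function. The sketch below is only an illustration (`add` and `opnames` are names I made up), and the exact instruction names differ between Python versions. The general pattern is that operands are pushed onto a stack, a binary operation pops them and pushes the result, and a return instruction hands back the top of the stack.

```python
import dis


def add(a, b):
    return a + b


def opnames(func):
    """Return the names of the bytecode instructions CPython generated for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


# Print one instruction name per line; expect load, binary-op, and return
# instructions, with names that depend on the Python version in use.
for name in opnames(add):
    print(name)
```

A register-based design would instead name its operands and destination directly in each instruction, which tends to mean fewer instructions for the same work.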

Speaking as a long-time (quarter of a century!) C/C++ developer and relatively recent convert to Python, I find the Unladen Swallow project a very exciting development. If successful, I believe it will position Python to compete directly with C/C++ in the systems, applications, and gaming markets. It could also accelerate Python’s displacement of other scripting languages like Perl and PHP in the server infrastructure and Web application markets. Furthermore, the portability of Python (and its supporting libraries) has the potential to make it much easier for developers to port applications between platforms: moving apps between Windows, Mac OS, and Linux could literally become as simple as recompiling a set of Python modules. If Unladen Swallow pans out, I think it is entirely plausible that, in the not-too-distant future, we could even see a commercial cross-platform FPS game implemented entirely in Python—something that would be unthinkable with the current Python implementation.

Kudos to Google for its willingness to fund R&D efforts to improve a technology that has the potential to benefit everyone!

0 responses to “Google + Python = world domination?”

  1. Finally, Python and LLVM, and not a single word in the comments about this great event!

    Guys, don’t give up on this goal, and stay in contact with Chris and his team!

    LLVM is a brilliant foundation for the long term, too.

  2. Python dominating the world? The Ruby fanatics might have something to say about that.

  3. Well, as they say… if you want to make an omelet, you need to break a few eggs. Getting to the point where you can be truly productive in C/C++ requires that you climb a fairly steep learning curve.

    I agree that bringing near-C/C++ levels of performance to Python is an exciting possibility. (Well, that was kind of the whole point of this blog post.)

  4. My understanding is that a substantial chunk of Google’s infrastructure is implemented in Python. A more efficient Python could potentially save them millions (billions? how much does Google have invested in servers anyway?) of dollars on servers, and the electricity to operate them.

    LLVM is already cross-platform. According to their web site, they support “X86, X86-64, PowerPC, PowerPC-64, ARM, Thumb, SPARC, Alpha, and IA-64”. So assuming that Google fully leverages LLVM’s cross-platform capabilities, Unladen Swallow should be very portable.

  5. I don’t understand what the benefit to Google would be, apart from their own software, which they already give away for the most part. Why are they taking this on? I will have to look at Python, though most of my work is either down to the iron (C) or something that so far has required multithreading.

    Sounds like Microsoft and Visual Studio may need to watch out for this if Python is cross-platform when they are done.

  6. Oh wow, this is exciting. I love Python (tried C++, but I had troubles with compiling and makefiles…).

  7. Threading a workload can be a pain in the rear entrance though, depending on the task.

  8. Well yeah sure you can do anything in C/C++ given sufficient time and determination. The current Python interpreter is implemented in C, so I suppose you could argue that it all reduces to a C programming exercise at some level. 😉

    But by this line of reasoning, assembler/machine code is the pinnacle of flexibility, since…

  9. I like some stuff in Python like how it works with sequence types and late binding, but absolutely hate its syntax :\

  10. Google thought that the modern browsers ran web applications poorly, were too complex, and were too slow. Because more people using the web means more people using Google, they tried to release a browser that handles web applications well, is simple, and is faster. They “fixed” what they wanted changed in mass-market browsers.

    Likewise, Google thought there wasn’t a strong OS available for mobile internet devices… big smartphones up through small netbooks. I guess it had a problem with Apple’s OS being closed, and with Windows Mobile being horrible. So they set out to build one, so more people would buy MIDs, so more people would use MIDs, and so more people would use Google while on the Internet.

  11. I find C/C++ quite flexible. I can even create my own datatypes if I really need to.

  12. As computers have gotten more powerful, the level of abstraction available to software developers without a perceptible loss in performance has grown:

    1GL – 11010001000101101010001
    2GL – We get basic compilers & assembly language
    3GL – The life blood of computing at a system level (BASIC, C, C++, C#, Pascal, and Java)
    4GL – Accessing data made even easier from a programming perspective. Also, if the program compiles, there will be no runtime errors (e.g. divide by zero or invalid ops), assuming no bugs in the 4GL kernel. Mostly used for database/RAD programming.
    5GL – Works with AI and having AI solve problems based on constraints. The program must run in between those constraints, but exactly how inside them isn’t designed.

    For me, all scripting languages are 3½GLs: they have an extra level of abstraction missing from a normal 3GL, but don’t have the fixed structure of a 4GL. That structure does have its limits, being fixed-flow and non-dynamic (e.g. not using data as executable code).

    Using scripting languages to build games isn’t new; a number of games already do this, but mostly just for the high-level stuff (i.e. in game), with the core still being ASM/C/C++ libs called by higher-level functions.

    Out-of-order execution is already available in many 3GL compilers, so it makes sense to move this up the chain where possible. The Holy Grail is easy multithreaded programming. There are a couple of languages in development that do inherent multithreading through the syntactic structure of the programming language: they are structured by design to increase the level of parallelism in the code the compiler generates.

    A better compiler only makes things slightly faster; real speed increases come through redesign. Some problems can’t be broken down into parallelizable units of work. It’s up to the developer to analyze the problem at hand and design an algorithm in which as many parts as possible execute at the same time. The catch is that the data being processed mustn’t be so interdependent that execution stalls constantly; otherwise the net performance gain is slight, while the added design complexity of a parallel solution makes program modifications overly costly.

    What a computer language needs is to make development easy, without having to solve silly stuff or reinvent the wheel. I shouldn’t have to write a sort function every time I want to display a list of items. In the past, 2GL/3GL developers had their “code libraries” of pre-rolled solutions to commonly recurring problems. I remember when these used to be sold by companies as development aids. This was the birth of scripting, as people decided it would be easier just to change the base language syntax into something easier to read and maintain, increasing productivity.

    Really enjoyed this, a nice change of pace from seeing yet more competing GFX cards that are basically identical, since all computer games (except Crysis 2) are designed for consoles.

  13. That brings back memories. It’s been years since I’ve hand-coded any POV-Ray; I remember it taking 6 hours to render my 640×480 scene on a 16 MHz 386SX 🙁

  14. JBI: Great post, but there are a few issues. 1) The only place Python has parallelism issues is with threads; you can use multiple processes fine (see Jesse Noller’s concurrency in Python talk from PyCon at …). 2) Unladen is being worked on by several guys who are also Python core committers, and some of their work from this has already made its way upstream.

  15. Maybe it’s just me, but the first time I read “Unladen Swallow” I thought it sounded like an excellent porn name.

    ….But that’s probably just me

  16. What exactly did Chrome “fix”, or Android bring that you couldn’t live without or get elsewhere?

  17. “All without hurting backward compatibility with existing Python applications”

    Didn’t they just do that anyway?

  18. LOL… yeah, something like that crossed my mind.

    They do state in their project plan that their goal is to eventually get it merged back into the “official” version of Python. But doing that without breaking existing apps is going to be tricky, so it may very well take a while.

  19. No, not fail… it just means that Python is not currently appropriate for implementing the core algorithms of performance-critical applications, or for applications where fine-grained multi-threading is needed.

    Many general scripting tasks, web middleware, and a whole host of other applications are bottlenecked by other parts of the system (network connection, disk I/O, database engine, etc.). So using Python in these kinds of situations adds a negligible amount of additional overhead, relatively speaking. The savings in applications development time is more than worth it.

    And just because the current interpreter isn’t multithreaded doesn’t mean it can never take advantage of multiple cores. If you’re running two independent Python scripts, they each get their own interpreter, and will execute on separate cores if you’re running on a multi-core system. For typical server workloads, this coarse-grained use of multiple cores is all you really need.
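The "each script gets its own interpreter" point can be sketched as follows. This is only an illustration under my own assumptions: `launch_independent_interpreters` and `CHILD_CODE` are hypothetical names, and the child "script" does nothing but report its process ID. Each child is a separate operating-system process with its own interpreter, so the OS is free to schedule them on different cores.

```python
import subprocess
import sys

# A stand-in for an independent script: it just reports its own process ID.
CHILD_CODE = "import os; print(os.getpid())"


def launch_independent_interpreters(count=2):
    """Start `count` separate Python processes and return the PID each one reports."""
    # Start every child before waiting on any of them, so they run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(count)
    ]
    # communicate() waits for the child to exit and returns (stdout, stderr).
    return [int(proc.communicate()[0]) for proc in procs]


# Each PID is different: every child is its own process with its own interpreter.
print(launch_independent_interpreters())
```

Whether the children actually land on separate cores is up to the operating system's scheduler, not Python.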

  20. There’s so much flexibility in Python that it seems unlikely they’ll be very successful, or else it’s going to optimize “common” code and not more advanced (possibly more elegant) programs. Part of why C/C++ is nice is that it’s simple enough to understand why an algorithm could be slow or not. That being said, it’s definitely a good [much better] idea to use Python if speed is not an issue. Having closures, rich data types (lists, dicts), and well-thought-out classes is very nice.

  21. Update 2019: Google announces Unladen Swallow may move out of beta status by the end of the year.

  22. “The most widely deployed implementation is interpreted (i.e. it doesn’t compile all the way down to native machine code), which has significant performance implications. It also does not handle multithreaded code well, since parts of the Python interpreter are inherently single-threaded”