After Python 3.3 was released, I began to embrace Python 3 in a big way last year. I use it in my personal projects whenever possible, that is, whenever I do not have to worry about compatibility with an existing code base. I use it as my default REPL every day.
While I am not an early adopter, I am still ahead of many other Python programmers in actually making production use of it. Unfortunately, my experience so far has not been favorable (note that I have not used Unicode strings much). I ran into miscellaneous problems. At first I thought it was a learning curve I had to overcome. Slowly I came to realize that some design decisions are really problematic.
Recently there has been more discussion of the problems of porting to Python 3, notably from Armin Ronacher of Flask. I want to add my experience to the discussion. Much has been written about the core issue of Unicode string handling, or the death of the binary string. My grievance is from a different area: the systematic replacement of list results by iterators.
Iterators and generators are some of the best features of Python. In many cases, a list is interchangeable with its corresponding iterator. When the result is large, an iterator is certainly more memory efficient. If you want to loop 100 million times in Python 2, you probably don't want to build a list of 100 million items using range, but rather use the iterator version, xrange.
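As a small illustration of the memory argument, here is a sketch for Python 3, where range behaves like Python 2's xrange. The variable names are made up for the example, and the exact byte counts depend on the interpreter and platform.

```python
import sys

# In Python 3, range is lazy (like Python 2's xrange):
# no 100-million-item list is ever built here.
lazy = range(10**8)

# The range object stays small no matter how many values it represents.
lazy_size = sys.getsizeof(lazy)

# A real list of even one million ints is far bigger.
big_list = list(range(10**6))
list_size = sys.getsizeof(big_list)

print(lazy_size, list_size)
```

The same loop works over either object; only the memory cost differs.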
Since iterators are so great, the argument goes, from now on Python 3 should only provide iterator results.
This turns out to be a huge annoyance for me.
For a start, when I run a function in the interactive console, I don't see the result anymore. I get an iterator object. No big deal, the Python 3 people say, you can render the result by wrapping it in the list function. Fine, it is an inconvenience, a minor inconvenience you may argue. But you just cannot spin it as a positive change. When I first started using Python, coming from a Java background, I was delighted to find how easy things are in Python. Why bother building a chain of stream handlers to read a file, as in Java? The Pythonic way is to build the entire input in memory. 95% of the time the data is so small that it does not matter. This was the Python magic, and it is beginning to be lost.
Using list would be a tolerable workaround if it were exceptional, needed only in a small number of cases. But I have run into this wall so many times that I begin to think the reverse is closer to the truth.
Other than wanting to see the result in the REPL, there is actually a long list of use cases for a list that cannot be conveniently expressed with an iterator, like
- building nested data structures
- processes that need the length of the collection
- taking just one element, say the first, from the collection
- passing data to the many libraries that expect a list rather than an iterator (like numpy)
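A short Python 3 sketch of a few of these cases, using a map result as the iterator. The variable names are only for illustration.

```python
numbers = map(int, ["1", "2", "3"])  # in Python 3 this is a lazy iterator, not a list

# Needing the length: an iterator has no len().
try:
    len(numbers)
    has_len = True
except TypeError:
    has_len = False

# Taking just the first element: indexing fails on an iterator.
try:
    numbers[0]
    indexable = True
except TypeError:
    indexable = False

# next() works, but it consumes the element it returns.
first = next(numbers)

# Whatever is left can be materialized, but only once.
rest = list(numbers)
again = list(numbers)

print(has_len, indexable, first, rest, again)
```

Wrapping the map call in list() up front avoids all three surprises, at the cost of the extra call.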
For example, I want to parse a CSV-like input into a nested data structure.
    In : INPUT = """\
    ....: 1,2
    ....: 3,4
    ....: """.splitlines()

    In : [map(int, line.split(',')) for line in INPUT]
    Out: [<builtins.map at 0x321f270>, <builtins.map at 0x321f090>]
Ouch, I got a list of iterators instead. It used to be this easy in Python 2.
    In : [map(int, line.split(',')) for line in INPUT]
    Out: [[1, 2], [3, 4]]
The problem is that you almost never want a nested data structure with iterators inside. When I accidentally create one, it usually causes a bug a few lines down, and I have to dig hard into the data structure to find out what went wrong.
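For reference, here are two ways to get the nested list back in Python 3, sketched with the same INPUT as above. The result variable names are invented for the example.

```python
INPUT = """\
1,2
3,4
""".splitlines()

# Option 1: wrap each map in list() to force evaluation.
rows_with_map = [list(map(int, line.split(','))) for line in INPUT]

# Option 2: a nested list comprehension, with no map at all.
rows_with_comprehension = [[int(field) for field in line.split(',')] for line in INPUT]

print(rows_with_map)
print(rows_with_comprehension)
```

Both work, but neither is as short as the Python 2 one-liner.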
Trying to pull a value from a dictionary adds insult to injury. Sometimes I want to inspect a value in a dictionary. Which one does not matter, I just need one. With Python 2, I start from d.items(), which is a plain list. It would be dumb to write a for loop in Python 3 to do this. As an experienced programmer, I know I can use next(). But this gives me an exception?! How about d.items().next()? Fail. How about d.items().__next__()? It fails too. I spent hours before I found out that in Python 3, d.keys() corresponds not to Python 2's iterkeys() but to the unfamiliar viewkeys(). To get any value, I have to first turn the view into an iterator, and only then can I apply next. When you apply an extra function like list once, it is an inconvenience. When you have to do it twice or more, it becomes a lot of clutter and a big annoyance.
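A minimal Python 3 sketch of what finally works, using a made-up dictionary d:

```python
d = {"a": 1, "b": 2}

# d.items() is a view, not an iterator, so next() on it raises TypeError.
try:
    next(d.items())
    view_is_iterator = True
except TypeError:
    view_is_iterator = False

# Turn the view into an iterator first, then take one element.
key, value = next(iter(d.items()))

# Or materialize the view when indexing is needed.
first_pair = list(d.items())[0]

print(view_is_iterator, key, value, first_pair)
```

Either way, it takes two wrapped calls to do what used to be a single expression.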
Python 3 renders the map function nearly useless because of the extra list call needed. In Python 2, we often have two alternatives for expressing a similar construct: the map function or a list comprehension. Usually I choose map when there is a function readily available, like int in my example above. But because of the extra clutter of the list call needed with map, the balance has tipped decisively toward list comprehensions in Python 3. I should be thankful, because they could have removed list comprehensions too and forced me to use generator expressions and list.
The bottom line is that this change is strictly a feature removal. With Unicode, the pain is necessary, and we gain predictable Unicode handling in the bargain. With iterators, there is no new feature to be gained. Existing code is broken for nothing. All the Python 3 people tell you is to just wrap your function call in a list, no big deal.
Suffice it to say I am not convinced. To me this is torture.
Feature Removal Pain
Just a few days ago I was bitten by another feature removal issue. The sort method used to have a cmp parameter that has been removed in Python 3. Oh, it is dumb to use cmp anyway, they say, because the implementation using key is faster. Except in my case, I was working on a bioinformatics problem that required sorting all the suffixes of a long string. With a string that is millions of characters long, generating millions of substrings quickly exhausts all memory. The trick is to use cmp to generate and compare the substrings on demand. This may be slower, but it works. Removing cmp not only causes inconvenience, it breaks the algorithm with no easy workaround.
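One route that may be worth trying here is functools.cmp_to_key from the standard library (available in Python 2.7 and 3.2+), which wraps an old-style comparison function so it can be passed as key. The sketch below assumes it is acceptable to sort suffix start positions instead of the suffix strings themselves. The names suffix_array and compare_suffixes are hypothetical, and the approach is untested on multi-million-character input.

```python
from functools import cmp_to_key

def suffix_array(text):
    """Return suffix start positions, ordered by the suffixes they begin."""
    def compare_suffixes(i, j):
        # Slice the two suffixes on demand; the temporary strings are
        # discarded after each comparison instead of being stored.
        a, b = text[i:], text[j:]
        return (a > b) - (a < b)  # old-style cmp result: -1, 0 or 1

    # Only the integer positions are kept in memory while sorting.
    return sorted(range(len(text)), key=cmp_to_key(compare_suffixes))

print(suffix_array("banana"))
```

Each comparison still copies two substrings, so this is likely slow on long inputs, but memory use stays proportional to the number of positions rather than to the total length of all suffixes.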
I solved the problem by going back to Python 2.
Python 3 is the dead end
The official story line is that Python 2 is a dead end and Python 3 is the future. I am beginning to see it differently. Python 2 is actually alive and well. Development of the language and the interpreter has stalled, but innovation continues in third-party libraries and tools. For example, Pandas is a big step forward for Python in the data analysis space.
It is Python 3 we should worry about. I fear it will become a de facto dead end because of lack of adoption. Outside of my personal use, there is absolutely no proposal at my workplace to move to Python 3. The two companies and hundreds of programmers I have worked with recently are cranking out Python 2 code every day, not Python 3. At various times I considered championing Python 3 at work. I am not considering this anymore for the near future.
Sorry for the critical opinion. I just wish to open up some honest discussion about the merits of Python 3.
2014.01.23