tungwaiyip.info


blog maintenance

I have been slow in maintaining my blog. At last I have added tags to my blog entries. I am growing tired of the blogging software PyBlosxom. In the end I couldn't use the tagging plugin because of a slightly different extension of mine. It was not difficult to write the code myself, but I think PyBlosxom makes things more difficult than they need to be.

Anyway I think I have made my site slightly better today.

2006.09.12 [] - comments

 

The Problem With Developer Unit Testing

It was a while back when I attended the Developer Testing Forum. There were many good presentations about what developer testing can do for us. It closed with a captivating presentation by Kent Beck. They were all good. Yet I left somewhat unsatisfied.

For one thing, most of us were already developer testing enthusiasts. It was really preaching to the converted. Most topics were rather uncontroversial. We all want more unit testing. Even a pointy-haired boss would say so.

The real question is, why doesn't everybody do it?

Here are a few common objections and my responses.

  1. We have no time - that's sad, and the price will probably be paid later.
  2. Our software is monolithic and tightly coupled, making it unsuitable for unit testing - we know this is no way to build software; loose coupling is the way to go.
  3. Unit testing is difficult

Unit testing is difficult? As a developer testing advocate, I maintain it is simple and beneficial. On the other hand there are also some difficulties I do not want to downplay. Perhaps this is the right time for me to lay them out.

1. It is hard to write a program that can be controlled by another program.

It is hard enough to find programmers who can write a good program that functions according to a spec. I think it is a lot harder to expect programmers to write a program that can also be controlled by another program, which is what unit testing requires.

There is one class of program that can be easily controlled by another program: utilities. This is not surprising because by definition a utility is a small piece of code called by another program to carry out some function. It is also little surprise that much of the literature focuses on testing utilities, the low-hanging fruit of unit testing.

Unfortunately most of the code we write is not utilities. It interacts with users, reads and writes to the database and the disk, and controls the flow of other processes. It reacts to and changes system state. Testing this kind of code requires design and instrumentation and demands more sophistication from programmers.

2. Techniques are in their infancy and are not well publicized

Since utilities are easy to test, one thing to do is to identify logic embedded in the main body of code that is better isolated into utility functions. Queuing, sorting, searching and various other algorithms are everywhere. It is a big win to isolate them, simply because the feature can then be tested much more easily.

The point I want to make is that there are techniques like the one above for better unit testing. But they are often in their infancy and are not well publicized. Just like good programming, good unit testing is not easy to come by.
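As a minimal sketch of isolating embedded logic, suppose the rule for picking the next job to run is buried inside a request handler. The names next_job and handle_request and the job fields are hypothetical, made up only for illustration. Pulling the selection rule out into a pure function lets a test exercise it with literal data and none of the surrounding I/O.

```python
def next_job(jobs):
    """Return the pending job with the highest priority, or None if there is none.

    Pure utility: takes plain data, returns plain data, touches no database or disk.
    """
    pending = [job for job in jobs if job["status"] == "pending"]
    if not pending:
        return None
    # Highest priority first; ties go to the job submitted earliest.
    return min(pending, key=lambda job: (-job["priority"], job["submitted"]))


def handle_request(load_jobs, dispatch):
    """Main body: does the I/O and delegates the decision to next_job()."""
    job = next_job(load_jobs())
    if job is not None:
        dispatch(job)
    return job


# The utility can be tested with literal data, no setup needed.
jobs = [
    {"id": 1, "status": "done", "priority": 9, "submitted": 1},
    {"id": 2, "status": "pending", "priority": 5, "submitted": 3},
    {"id": 3, "status": "pending", "priority": 5, "submitted": 2},
]
assert next_job(jobs)["id"] == 3
assert next_job([]) is None
```

The handler still needs its own test, but the part most likely to hide an off-by-one or ordering bug is now covered by a few one-line assertions.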

3. Extra level of abstraction can complicate design

Since we cannot always replicate the exact context for invoking the tested code, we often parameterize the code so that we can pass in a mocked environment. For example fetch_record() may mean fetch from the database in the module. But since we would rather not use the database in unit testing, we parameterize it to take an extra resources parameter. In the runtime context it will be the database. But in the unit test context it will be an artificial data source.
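Here is one way the fetch_record() idea might look. This is only a sketch under assumptions: the lookup() method, the two resource classes, the records table and the use of sqlite3 as the runtime database are all invented for illustration.

```python
import sqlite3


class DatabaseResources:
    """Runtime data source backed by a real database (sqlite3 is an assumption)."""

    def __init__(self, connection):
        self.connection = connection

    def lookup(self, record_id):
        row = self.connection.execute(
            "SELECT name FROM records WHERE id = ?", (record_id,)
        ).fetchone()
        return None if row is None else row[0]


class FakeResources:
    """Artificial data source for unit tests: a dict stands in for the database."""

    def __init__(self, rows):
        self.rows = rows

    def lookup(self, record_id):
        return self.rows.get(record_id)


def fetch_record(record_id, resources):
    """Fetch a record through whatever data source is passed in."""
    name = resources.lookup(record_id)
    if name is None:
        raise KeyError(record_id)
    return {"id": record_id, "name": name}


# Unit test context: no database needed.
fake = FakeResources({1: "alice"})
assert fetch_record(1, fake) == {"id": 1, "name": "alice"}
```

The extra resources parameter is exactly the added layer of abstraction in question: every caller of fetch_record() now has to supply it.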

In some cases we find the added layer of abstraction actually improves the code because it is no longer hard coded to work in a single context. On the other hand flexibility comes at a cost, no matter how small it may be. Add a few configuration variables and the complexity starts to multiply.

While complex and monolithic code is obviously undesirable, how does highly flexible, highly configurable code compare to a small and well integrated piece of code? When simplicity is a virtue, flexibility can be a liability.

4. Test to death

Unit testing is a key tool in agile programming. You can be bold in refactoring when there is a suite of unit tests to back you up. Ironically an extensive set of unit tests can also become the cement that hardens your code and makes changes difficult. If you diligently add tests for every subprocess of your code, chances are a lot of them will break when you change the code. It doesn't necessarily mean your code has a problem; it only means you have to maintain the test code and adapt it to the new version. This is arguably more difficult than changing the code itself, because first you have to distinguish a real problem from symptoms that merely need upkeep. On one hand unit testing gives you the confidence to refactor. On the other hand overly detailed testing can bog you down.

Finding the right level of testing, one that covers the function but is not so rigid that it breaks easily, is also an art to master.
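A small sketch of the difference, using hypothetical names (clean, _strip_blank, _dedupe) invented for illustration. The first test pins down every internal step; the second checks only what callers can observe.

```python
def _strip_blank(lines):
    """Internal step 1: drop empty or whitespace-only lines."""
    return [line for line in lines if line.strip()]


def _dedupe(lines):
    """Internal step 2: drop duplicates, keeping first occurrences in order."""
    return list(dict.fromkeys(lines))


def clean(lines):
    """Public function: drop blank lines, then drop duplicates."""
    return _dedupe(_strip_blank(lines))


def test_every_subprocess():
    # Rigid: tied to the current decomposition into two helpers.
    # Rewriting clean() as a single pass would break this test
    # even though clean() still returns the same results.
    assert _strip_blank(["a", "", "a"]) == ["a", "a"]
    assert _dedupe(["a", "a"]) == ["a"]


def test_behaviour():
    # Covers the function through its public result only,
    # so it survives any refactoring that keeps the behaviour.
    assert clean(["a", "", "b", "a"]) == ["a", "b"]
    assert clean([]) == []
```

Neither style is wrong. The judgment call is how many tests of the first kind a piece of code can carry before they start resisting change.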


I have cited these few problems I have experienced not to suggest developer testing is not valuable. Instead I want to start a discussion about the skills and techniques required for successful testing. Also many issues can be mitigated by better tools and better programming approaches such as AOP.

2006.04.15 [] - comments

 

Open Source Development Platform

As a software developer I am a strong advocate of open source software. I use it extensively both at work and for my private projects. In retrospect, the open source platform, often referred to as LAMP, has long passed the stage of being just a useful extension to proprietary software. It has become my dominant development platform. If I were to build a server application today, it would make very little sense for me to consider Microsoft servers. Why choose a framework that leaves you with a single tools vendor? LAMP has proven to be technically viable, it costs nothing to experiment with and distribute, and more importantly I trust it because of its openness. Nowadays I need little justification to pick LAMP over Microsoft.

What a big leap from just a few years ago, when it looked like Microsoft was going to take over the world.

2005.11.01 [, ] - comments

 

Python's half open index notation

Beginner programmers often wonder about Python's sequence indexing and slicing notation. Array indexes start from 0. Slicing uses half open notation, where L[a:b] is the subsequence of items with index x where a <= x < b.

Why is the endpoint excluded? Isn't it more intuitive if array indexes start from 1 and the endpoint is included, so that a 3 element array is referenced as L[1:3] with items L[1], L[2], L[3]?

It turns out this notation is an elegant and deliberate design and it has some excellent properties.

We write programs to operate on arrays: to find their length, traverse their subsequences, split them or join them. The half open notation always shows a simple pattern. But the inclusive notation often requires adding 1 to or subtracting 1 from the indexes in many operations. Thus it is more vulnerable to off-by-one errors. The article One True Way of array indexing discusses this at length. I have reproduced its example (with corrections) below:

Operation                          Half open                    Inclusive
length of a slice L[a:b]           b-a                          b-a+1
first n characters of L            L[:n]                        L[1:n]
last n characters of L             L[-n:]                       L[len(L)-n+1:]
the identity slice                 L == L[0:len(L)] == L[:]     L[1:len(L)]
the empty slice                    L[a:a] is empty for any a    perhaps L[a:a-1]?
a slice of length n, from point a  L[a:a+n]                     L[a:a+n-1]
split L[a:b] at index c            L[a:b] == L[a:c]+L[c:b]      L[a:c-1]+L[c:b]
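The half open column can be checked directly in Python. The sample list and the values of a, b, c and n below are arbitrary choices for illustration, picked so that all indexes fall inside the list.

```python
L = list("abcdefgh")
a, c, b, n = 2, 4, 7, 3

assert len(L[a:b]) == b - a                # length of a slice
assert L[:n] == ["a", "b", "c"]            # first n items
assert L[-n:] == ["f", "g", "h"]           # last n items
assert L == L[0:len(L)] == L[:]            # the identity slice
assert L[a:a] == []                        # the empty slice
assert len(L[a:a + n]) == n                # a slice of length n from point a
assert L[a:b] == L[a:c] + L[c:b]           # split at index c
```

None of these checks needs a +1 or -1 adjustment.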

Another important property is that an empty sequence can be expressed by L[a:a], while there is no natural way to express an empty sequence with the inclusive notation. But do we really need to care about a special case? Absolutely! In fact failure to account for empty input is one of the most common errors. Just as zero is a fundamental concept in mathematics, always think about how your program handles empty input. An inferior approach is to represent an empty sequence by None or a null pointer. This creates a special case where a variable needs to be tested before dereferencing. Failure to do so leads to unexpected exceptions. It is an elegant design that L[a:b] can also represent sequences of length 0.
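A short sketch of why this matters. The function total_length and the sample words are hypothetical, chosen only to show an empty slice passing through ordinary code while None does not.

```python
def total_length(words):
    """Sum the lengths of the words; an empty sequence needs no special case."""
    total = 0
    for word in words:
        total += len(word)
    return total


words = ["half", "open", "range"]
a = 1

assert total_length(words[a:a]) == 0   # the empty slice flows through the same loop
assert total_length(words[a:3]) == 9   # "open" + "range"

# Representing "nothing" as None forces every caller to check first.
try:
    total_length(None)
except TypeError:
    pass  # iterating over None raises TypeError
```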

C++'s STL also chose this notation to represent a range. According to the literature this is crucial because "algorithms that operate on n things frequently require n+1 positions. Linear search, for example (find) must be able to return some value to indicate that the search was unsuccessful." I have seen so many people flunk linked list or data structure exercises because they have trouble dealing with the end of a list. Often a good solution is to shift the focus from the n concrete objects to the n+1 positions around them. I hope this helps to make sense of the half open notation.
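To make the n+1 positions concrete, here is a sketch of a linear search written in the STL style. It is an illustration only, not how Python's built-ins behave: list.index raises ValueError and str.find returns -1 when nothing is found. The function name find and its start and stop parameters are my own choices.

```python
def find(seq, target, start=0, stop=None):
    """Linear search over the half open range seq[start:stop].

    Returns the index of the first match, or `stop` if there is no match,
    mirroring how STL's find returns the end position on failure.
    """
    if stop is None:
        stop = len(seq)
    for i in range(start, stop):
        if seq[i] == target:
            return i
    return stop


items = ["a", "b", "c"]
assert find(items, "b") == 1
assert find(items, "z") == len(items)   # the position past the end means "not found"
assert find([], "z") == 0               # works unchanged on an empty sequence

# Either way the result is a valid position to slice at.
pos = find(items, "z")
assert items[:pos] + items[pos:] == items
```

A list of n items has n+1 valid positions, 0 through n, and the last one doubles as the "not found" answer without any special value.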

2005.06.16 [, ] - comments

 

PyCon2005 day 3

  • The third day's keynote was delivered by Greg Stein from Google. He gave some insight into evangelizing Python at his last few companies. Small companies adopt Python more readily and consider it a competitive advantage, whereas large companies hold off until a support environment is present. Nevertheless he believes the growth of Python has passed the tipping point, and it has never been a problem to train a new programmer in Python.

    He went on to describe the use of Python at Google and emphasized SWIG as a great glue for integrating code built in various languages.

  • Andi Vajda, whose search engine PyLucene powers my MindRetrieve project, gave a talk at PyCon. He outlined the challenges of compiling a Java application into a C executable and making it into a Python extension library using GCJ and SWIG. The issues include different memory management, different threading models and cross-language error reporting. The success of PyLucene has drawn a lot of interest in compiling other Java projects into executables and providing more language bindings.

  • I enjoyed yesterday's lightning talks so much that I stepped up to demonstrate my own MindRetrieve project today. Again the room was packed. I'm glad that I got through the 5 minute presentation reasonably well and that at least a few people seemed to appreciate my idea.

    Geek biker Peter Kropf made a cross country motorcycle trip. On the bike were custom built hardware sensors and cameras recording everything. He made his videos available on his website.

    Chris Tismer showed a web demo using Stackless Python to maintain server state. Stackless Python sounds like a mystery, but his few lines of code were a great introduction.

I thoroughly enjoyed these three days of PyCon, met lots of great people and learned a whole lot. I cherish this supportive open source community and look forward to more exciting developments in the coming year.

Read more about day 1, day 2 and day 3 of PyCon.

2005.03.25 [, ] - comments

 

PyCon2005 day 2

  • Guido van Rossum delivered the State of Python keynote on the second day. First he mentioned that a security issue in the Python standard library was reported recently. While the scope of this issue is limited, it has prompted the development team to set up a structure to respond to future security problems. He then described some proposed incremental improvements. This was followed by the contentious "optional static type checking" proposals. We can expect Python to continue its slow growth policy, with few major changes in coming releases.

  • I am missing more formal sessions because of the continuing discussion of web development in Python. Shannon Behrens gave an improvised tutorial of his Aquarium web framework to a user. Using this fairly straightforward framework he covered the essence of web development within an hour. The Aquarium framework comprises a mere few thousand lines of code. This gave another perspective on the framework proliferation problem. Python is so productive that it is well within a single talented developer's capability to build a complete framework.

    The open source movement gives geeks great opportunities to produce and contribute independently. But that can also lead to divergence, which is most apparent in Python's web development environment. A truly successful project needs not only technical excellence but also the ability to find consensus and to build coalitions.

  • Richard Jones showed us the Roundup issue tracker. It seems to be well built and rich in functionality. If you are starting a new project it is definitely an alternative to Bugzilla. Another similar project mentioned is Trac, which also has Subversion integration.

  • PyCon has two sessions of lightning talks, made up of a series of informal 5 minute presentations. This provides a low pressure environment and encourages people to showcase smaller projects or ideas that might not warrant a full session. Given their unofficial nature I'm surprised to find the lightning talks are actually very well attended.

    Armin Rigo demonstrated a neat collect class that builds a sequence from an iterator on demand.

    The Holger Krekel and Armin Rigo team had even more neat tools to show. rlcomplete2 seems to be a must have command line completion tool. shpy enables people on two different computers to share a screen and edit the same file simultaneously. That's what you call pair programming!

    Wayne Yamamoto from Rustic Canyon Partners came to solicit talent to build startups based specifically on Python technologies. So far the Python community seems to be remarkably uncommercial, many being merely closet Pythonistas. I think we really need to do more to let the larger world know how incredibly productive these Python technologies really are.

  • A few more sessions are worth mentioning. Christopher Gillett from Compete Inc described the use of Python for large scale data mining. Michael Salib is trying to save all of us from the software patent machine. He has built a US Patent Database using Xapian as the search engine. Anna Ravenscroft showed us some important libraries dealing with date and time, including Dateutil and pytz.

Read more about day 1, day 2 and day 3 of PyCon.

2005.03.24 [, ] - comments

 

PyCon2005 day 1

I am really excited to go to PyCon for the first time. These are some notes about what happened at this 3 day conference in Washington D.C.

  • PyCon2005 started with a keynote from Jim Hugunin from Microsoft, who started the IronPython project that ports Python to Microsoft's .NET platform.

    Coming from Microsoft automatically puts one on the defensive when facing the non-Microsoft community. Jim certainly knows when to crack Microsoft jokes and what to say when a demo crashes. Putting this aside, he did deliver some great demos and made a strong case for the value of Python on the .NET platform. On the other hand I can't help thinking about how much ill will and negative publicity Microsoft has created.

  • The next interesting session was Holger Krekel talking about a novel testing tool, py.test. He finds the JUnit inspired unittest.py clumsy to use. With his test tool, users create test cases using just the assert statement instead of the function call based unittest module. When there is an exception, the tool does some clever analysis and generates an informative report.

    He then went on to show another tool that brings a twist to RPC. Instead of the usual approach of transferring objects to the remote host, it simply creates a two way channel and lets the local and remote code communicate in their own way. Smart tool! Unfortunately the website http://codespeak.net/py seemed to be down throughout the conference.

  • In the next session Grig Gheorghiu covered a lot of ground on agile testing. He touched on various tools and the XP principles. Finally he demonstrated using a wiki to let customers design test cases and get instant testing and feedback. Don't you think all software should have something like that? Check out FitNesse and Selenium.

  • I really loved the PyWebOff presentation Michelle Levesque gave in the afternoon. She hit the nail on the head: having far too many web application frameworks in Python causes great confusion for users. It was a fabulous and very entertaining presentation. The message is clear: users need clear guidance on what framework to use given certain requirements.

  • Ian Bicking's talk about WSGI is exactly an effort to bring order to the chaos of framework proliferation. While it is good to define a standard interface between certain layers, it is less clear to me whether this effort will reduce the number of frameworks, at least in the short run.

I think Python is missing the opportunity to establish itself as a premier web development platform due to these issues. Otherwise it could easily double or triple its user base. Instead it is losing market to less capable tools like PHP. I was so passionate about this problem that I spent most of the afternoon discussing it in open sessions rather than attending talks.

Read more about day 1, day 2 and day 3 of PyCon.

2005.03.23 [, ] - comments

 

David Ascher's paper on Dynamic Languages

For the past year I have been engaged with the Python language and have been very much impressed by it. So it is exciting to find a recent white paper from David Ascher speaking up for dynamic languages, a term he coined for the class of languages such as Perl, Python, PHP, etc, which are often referred to as scripting languages. He observes that these languages are widely used beyond the scripting area and that their dynamic nature is really what sets them apart from system languages such as C++ and Java.

The most interesting part of his paper is that he looks beyond technical competence to the social aspect as a defining characteristic. These languages all have their primary implementation in the open source model and have active grassroots participation. Being open source also makes them fertile ground for experimentation in academic language research. Despite having nearly no formal budget, they are able to evolve and succeed against corporate made development tools.

I believe dynamic languages are going to play an important role in the future of computing. And I see this paper serving as "The Cathedral and the Bazaar" of the programming language domain.

2004.09.09 [, ] - comments

 

Search Using Lucene

After tinkering with the Lucene search engine a bit, I have set up a search button for this website. It does require some Java coding to get it to index and search, but it is a fun exercise. The layout of the website is getting complicated and will need some rearrangement.

2004.08.27 [] - comments

 

PDF Hacks

Just found out about an upcoming O'Reilly book, PDF Hacks. Like it or not, PDF is a very common media format. The free Acrobat Reader is so limited and does not provide any editing functionality. There is so much I want to do to make PDF publishing more usable. Check out hack 4: Speed Up Acrobat Startup. I followed the instructions and removed unnecessary plugins. Right away the launch speed improved dramatically. I eagerly anticipate the publication of this book.

2004.08.11 [, , ] - comments

 

Bye bye Bash

Finally got rid of my backup script written in Bash and replaced it with a Python one. I have had enough trouble dealing with Bash and I am not looking forward to using it again. Influenced by a passionate crowd, I started working with Linux enthusiastically one year ago. I set out to learn everything about Linux, Vi, Bash, anticipating that they would be great solutions to many computing problems.[more...]

2003.12.22 [, ] - comments

 

Top language in Google Code Jam

Google Code Jam 2003 is a programming contest open to people around the world. Participants have a limited time to solve problems using one of these programming languages: C++, Java, C# or VB.NET. The correct solutions submitted soonest get the highest scores.

From now to October 17 is a practice period. Several hundred participants have already tried to solve the sample problems. I collected some interesting statistics from the practice arena. Java was the top language choice, used by almost half of the entries. Others chose C++ and C#. A few used VB.

Language  Used
C++       26%
Java      48%
C#        20%
VB         6%

However, the number for the top 20 scorers is quite different.

Language  Used
C++       60%
Java      20%
C#        20%
VB         0%

C++ programmers dominated the top 20. Java programmers showed disappointing results, while C# programmers got a fair share of the top scorers.

Some of the best C++ solutions are great examples of generic programming. STL is also a lot more powerful than Java's collection utilities. While STL and templates can be dauntingly complex, the C++ wizards show that they can be very productive tools.

2003.10.02 [] - comments

 

New Language Features in JDK 1.5

This Slashdot posting points to an interview with Joshua Bloch on the New Language Features in JDK 1.5. Generics are a much awaited enhancement. Today, without the language support, I ask every programmer to at least document the data type used in a comment, like [more...]

2003.05.09 [, ] - comments

 

BBC News

I was looking for an RSS feed from major news sources. There really isn't much available. I guess wire news is a revenue source for news agencies so they may not want to provide it for free. Anyway, the BBC provides news headlines for wireless browsers. It is simple HTML and is rather close to RSS. I used this for a news box.[more...]

2003.04.23 [] - comments

 

RSS News feeder added

Thanks to Mark Pilgrim, who has posted a Python RSS parser. With this parser, it is trivial to add news feed boxes to the home page. I added some news feeds from O'Reilly to this web site for a start. Now the home page looks more interesting! I still have to work on the scheduling of the feeds.

2003.04.22 [, ] - comments

 

Python screen scraper

At last I have built the application I always wanted, a weather and news email alert for my mobile phone. While my T-Mobile phone is capable of internet browsing, the connection is so poor that it makes web surfing an unenjoyable experience. The email alert delivers the information without incurring the painful delay of interactive browsing.

This is mostly an HTML screen scraper in only 200 lines of Python code. Within a few hours I had a successful implementation. And that is while I am still learning the language! If Python had not made it so easy, this would probably have remained on the drawing board for a long time.

2003.04.21 [, , ] - comments

 
