ctypes performance benchmark

I have done some performance benchmarking of Python's ctypes library. I am planning to use ctypes as an alternative to writing a C extension module for performance enhancement. My use case is therefore slightly different from the typical one of accessing existing third-party C libraries: here I am both the user and the implementer of the C library.

To determine the right granularity for switching between Python and C, I mainly want to measure the function call overhead, so the test functions are trivial, such as returning the first character of a string. I compare a pure Python function, a C extension module function, and a ctypes function. The tests were run under Python 2.6 on Windows XP with a 2.33 GHz Intel Core Duo.

First, I want to compare ways of getting the first character of a string. The most basic case is to reference it as the 0th element of a sequence without calling any function. This produces the fastest result, at 0.0659 usec per loop.

  $ timeit "'abc'[0]"

  10000000 loops, best of 3: 0.0659 usec per loop

As soon as I build a function around it, the cost goes up substantially. The pure Python function and the C extension function show similar performance, at around 0.5 usec. The ctypes function takes about 2.5 times as long, at 1.37 usec.

  $ timeit -s "f=lambda s: s[0]"  "f('abc')"

  1000000 loops, best of 3: 0.506 usec per loop

  $ timeit -s "import mylib" "mylib.py_first('abc')"

  1000000 loops, best of 3: 0.545 usec per loop

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
              "dll.first('abc')"

  1000000 loops, best of 3: 1.37 usec per loop
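The same kind of comparison can be scripted with the standard `timeit` module instead of the command line. The sketch below is only an approximation of the setup above: `mylib.pyd` is not shown in this article, so the C runtime's `atoi` stands in for a trivial C function, and the code targets Python 3, where C strings are passed as `bytes`. The helper names `load_c_runtime` and `usec_per_call` are made up for illustration.

```python
import ctypes
import sys
import timeit


def load_c_runtime():
    """Load the platform C runtime (a stand-in for mylib.pyd, which is not shown)."""
    if sys.platform.startswith("win"):
        return ctypes.CDLL("msvcrt")
    return ctypes.CDLL(None)  # POSIX: symbols already linked into the process


def usec_per_call(stmt, namespace, number=100000, repeat=3):
    """Best-of-`repeat` time per execution of stmt, in microseconds."""
    timer = timeit.Timer(stmt, globals=namespace)
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


libc = load_c_runtime()
namespace = {"f": lambda s: s[0], "libc": libc}

results = {
    "inline index": usec_per_call("b'abc'[0]", namespace),
    "python function": usec_per_call("f(b'abc')", namespace),
    "ctypes function": usec_per_call("libc.atoi(b'7')", namespace),
}

if __name__ == "__main__":
    for label, usec in results.items():
        print("%-16s %.3f usec per call" % (label, usec))
```

Absolute numbers will differ from the ones quoted here, since the interpreter version, the hardware and the C function are all different; only the relative ordering is of interest.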

I repeated the test with a long string (1 MB). There is not much difference in performance, so I can be fairly confident that the parameter is passed by reference (to the internal buffer) rather than copied.

  $ timeit -s "f=lambda s: s[0]; lstr='abcde'*200000"
              "f(lstr)"

  1000000 loops, best of 3: 0.465 usec per loop

  $ timeit -s "import mylib; lstr='abcde'*200000"
              "mylib.py_first(lstr)"

  1000000 loops, best of 3: 0.539 usec per loop

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
           -s "lstr='abcde'*200000"
              "dll.first(lstr)"

  1000000 loops, best of 3: 1.4 usec per loop
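The timing suggests that no copy is made. One way to probe this more directly, assuming Python 3 and CPython's current behaviour (an implementation detail, not a documented guarantee), is to wrap the same `bytes` object in `c_char_p` twice and compare the raw addresses. The variable names are illustrative only.

```python
import ctypes

big = b"abcde" * 200000  # about 1 MB, as in the test above

# Wrap the same bytes object twice and look at the raw pointer each time.
addr1 = ctypes.cast(ctypes.c_char_p(big), ctypes.c_void_p).value
addr2 = ctypes.cast(ctypes.c_char_p(big), ctypes.c_void_p).value

# Equal addresses are consistent with ctypes pointing at the object's own buffer.
same_buffer = (addr1 == addr2)

# Read one byte back through the pointer; `big` is still alive, so this is safe.
first_byte = ctypes.string_at(addr1, 1)

if __name__ == "__main__":
    print(same_buffer, first_byte)
```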

Next I made some attempts to speed up ctypes. A measurable improvement can be attained by eliminating the attribute lookup for the function. Curiously, the same change shows no improvement for the C extension.

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
           -s "f=dll.first"
              "f('abcde')"

  1000000 loops, best of 3: 1.18 usec per loop
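A small sketch of what is going on with the lookup, again using the C runtime's `atoi` as a stand-in for `dll.first` and assuming Python 3. The ctypes documentation states that attribute access on a loaded library caches the function object, while index access creates a new one each time; binding the function to a plain name, as `f=dll.first` does above, avoids the lookup on the library object altogether.

```python
import ctypes
import sys

# Stand-in library: the platform C runtime instead of mylib.pyd (not shown here).
if sys.platform.startswith("win"):
    libc = ctypes.CDLL("msvcrt")
else:
    libc = ctypes.CDLL(None)

# Attribute access caches the function object on the library instance...
cached = libc.atoi is libc.atoi

# ...whereas index access builds a fresh function object every time.
fresh = libc["atoi"] is libc["atoi"]

# Binding the function to a name skips the attribute lookup inside a hot loop.
f = libc.atoi
value = f(b"42")

if __name__ == "__main__":
    print(cached, fresh, value)
```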

Secondly, I tried specifying the ctypes function prototype. This actually decreases performance significantly, from 1.18 usec to 1.57 usec.

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
           -s "f=dll.first"
           -s "f.argtypes=[ctypes.c_char_p]"
           -s "f.restype=ctypes.c_int"
              "f('abcde')"

  1000000 loops, best of 3: 1.57 usec per loop
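What the prototype buys in exchange is argument checking: with `argtypes` set, ctypes converts and validates each argument on every call, which is a plausible (but here unverified) source of the extra cost. The sketch below shows the checking behaviour with the C runtime's `atoi` as a stand-in for `dll.first`, assuming Python 3.

```python
import ctypes
import sys

# Stand-in library: the platform C runtime instead of mylib.pyd (not shown here).
if sys.platform.startswith("win"):
    libc = ctypes.CDLL("msvcrt")
else:
    libc = ctypes.CDLL(None)

atoi = libc["atoi"]                 # fresh function object, so the prototype stays local
atoi.argtypes = [ctypes.c_char_p]   # one argument: a C string
atoi.restype = ctypes.c_int

ok = atoi(b"123")

try:
    atoi(123)                       # not a C string: rejected before the C call happens
    rejected = False
except ctypes.ArgumentError:
    rejected = True

if __name__ == "__main__":
    print(ok, rejected)
```

Without `argtypes`, the same bad call would pass the integer straight through to C as if it were a pointer, so the prototype is a trade of some speed for safety.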

Finally, I tested passing multiple parameters into the function. One of the parameters is passed by reference in order to return a value. Performance decreases as the number of parameters increases.

  $ timeit -s "charAt = lambda s, size, pos: s[pos]"
           -s "s='this is a test'"
              "charAt(s, len(s), 1)"

  1000000 loops, best of 3: 0.758 usec per loop

  $ timeit -s "import mylib; s='this is a test'"
              "mylib.py_charAt(s, len(s), 1)"

  1000000 loops, best of 3: 0.929 usec per loop

  $ timeit -s "import ctypes"
           -s "dll = ctypes.CDLL('mylib.pyd')"
           -s "s='this is a test'"
           -s "ch = ctypes.c_char()"
              "dll.charAt(s, len(s), 1, ctypes.byref(ch))"

  100000 loops, best of 3: 2.5 usec per loop
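The out-parameter pattern used by `dll.charAt(..., ctypes.byref(ch))` can be tried without `mylib.pyd` by calling a standard C function that also returns a value through a pointer. The sketch below uses `strtol` from the C runtime as a stand-in, assuming Python 3; it is not the `charAt` function from this article.

```python
import ctypes
import sys

# Stand-in library: the platform C runtime instead of mylib.pyd (not shown here).
if sys.platform.startswith("win"):
    libc = ctypes.CDLL("msvcrt")
else:
    libc = ctypes.CDLL(None)

# long strtol(const char *str, char **endptr, int base);
strtol = libc["strtol"]
strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p), ctypes.c_int]
strtol.restype = ctypes.c_long

text = b"42 is the answer"
rest = ctypes.c_char_p()            # out-parameter, filled in by the C function

# byref() passes the address of `rest` without building a full pointer object.
number = strtol(text, ctypes.byref(rest), 10)
remainder = rest.value              # points into `text`, which is still alive

if __name__ == "__main__":
    print(number, remainder)
```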

One style of coding that improves performance somewhat is to build a C struct to hold all the parameters.

  $ timeit -s "from test_mylib import dll, charAt_param"
           -s "s='this is a test'"
           -s "obj = charAt_param(s=s, size=len(s), pos=3, ch='')"
              "dll.charAt_struct(obj)"

  1000000 loops, best of 3: 1.71 usec per loop

This may work because most of the fields in the charAt_param struct are invariant in the loop. Keeping them in the same struct object saves them from being rebuilt on each call.
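A struct like this can be declared with `ctypes.Structure`. The following is a guess at what `charAt_param` might look like: the field names are taken from the call above, but the C types are assumptions, and the code is written for Python 3 (so strings are `bytes`). The call into the library is left as a comment because the library is not shown here.

```python
import ctypes


class charAt_param(ctypes.Structure):
    # Field names come from the call above; the C types are assumed.
    _fields_ = [
        ("s", ctypes.c_char_p),
        ("size", ctypes.c_int),
        ("pos", ctypes.c_int),
        ("ch", ctypes.c_char),
    ]


s = b"this is a test"

# Built once, outside the loop: s and size never change afterwards.
obj = charAt_param(s=s, size=len(s), pos=3, ch=b"\0")

positions = []
for pos in (1, 3, 5):
    obj.pos = pos          # only the varying field is updated per iteration
    # dll.charAt_struct(obj) would be called here
    positions.append(obj.pos)

if __name__ == "__main__":
    print(obj.s, obj.size, positions)
```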

My overall observation is that a ctypes function call has an overhead 2 to 3 times that of a similar C extension function. This may become a limiting factor if the function calls are fine-grained. Using ctypes for performance enhancement is a lot more productive if the interface can be made medium- or coarse-grained.

A snapshot of the source code used for testing is available for download. It is also useful if you want boilerplate for building your own ctypes library.

2009.07.16

 

 

Comments (3)

Could you give some more rationale about why you wouldn't use other approaches like cython, SWIG, psyco etc.?

Posted by Paul at Mon Dec 21 20:44:54 2009



psyco is easy to use and gives an instant performance boost. However, in my experience the performance gain is typically around 2x. This falls far short of what we can achieve with C.

cython, pyrex, etc. are good candidates to explore. I just haven't got around to learning their syntax yet.

Posted by Wai Yip Tung at Tue Dec 22 07:15:08 2009



Interesting article - you examine an important topic. I agree with the conclusion - indeed calls shouldn't be too fine-grained.

Posted by Eli at Tue Dec 22 20:04:02 2009


