tungwaiyip.info

ctypes performance benchmark

I have done some performance benchmarking of Python's ctypes library. I am planning to use ctypes as an alternative to writing a C extension module for performance enhancement. My use case is therefore slightly different from the typical one of accessing an existing third-party C library: in this case I am both the user and the implementer of the C library.

To determine the right granularity for switching between Python and C, I mainly want to measure the function call overhead. The test functions are therefore trivial, such as one that returns the first character of a string. I compare a pure Python function, a C extension module function and a ctypes function. The tests were run under Python 2.6 on Windows XP with a 2.33 GHz Intel Core Duo.

First, I compare ways of getting the first character of a string. The most basic case is to index it as element 0 of a sequence without calling any function. This produces the fastest result, at 0.0659 usec per loop.

  $ python -m timeit "'abc'[0]"

  10000000 loops, best of 3: 0.0659 usec per loop

As soon as I wrap this in a function, the cost goes up substantially. The pure Python function and the C extension function show similar performance, at around 0.5 usec. The ctypes function takes about 2.5 times as long, at 1.37 usec.

  $ python -m timeit -s "f=lambda s: s[0]" "f('abc')"

  1000000 loops, best of 3: 0.506 usec per loop

  $ python -m timeit -s "import mylib" "mylib.py_first('abc')"

  1000000 loops, best of 3: 0.545 usec per loop

  $ python -m timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')" \
                        "dll.first('abc')"

  1000000 loops, best of 3: 1.37 usec per loop
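
A rough modern equivalent of these measurements can be scripted with the timeit module. This is a sketch under stated assumptions: mylib is not available, so the C library's strlen, loaded through ctypes on a POSIX system, stands in for dll.first, and because this is Python 3 the string is passed as bytes (ctypes maps bytes, not str, to char*). The names py_first and compare are illustrative.

```python
import ctypes
import ctypes.util
import timeit

# Assumption: a POSIX system. If find_library("c") returns None, CDLL(None)
# is called, which on Linux and macOS exposes the symbols already loaded
# into the Python process, including the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

def py_first(s):
    # Pure Python version of the trivial "first character" function.
    return s[0]

def compare(number=1_000_000, repeat=3):
    """Return the best time per call, in usec, for each variant."""
    namespace = {"s": b"abc", "py_first": py_first, "strlen": libc.strlen}
    cases = {
        "inline": "s[0]",          # no function call at all
        "python": "py_first(s)",   # pure Python function call
        "ctypes": "strlen(s)",     # foreign call; stand-in for dll.first
    }
    results = {}
    for label, stmt in cases.items():
        totals = timeit.repeat(stmt, number=number, repeat=repeat,
                               globals=namespace)
        results[label] = min(totals) / number * 1e6  # best of `repeat` runs
    return results

if __name__ == "__main__":
    for label, usec in compare().items():
        print(f"{label:7} {usec:.3f} usec per loop")
```

The inline case indexes a variable rather than the literal 'abc'[0], because recent CPython versions can fold a subscripted constant at compile time, which would measure almost nothing.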

I repeated the test with a long string (1 MB). There is not much difference in performance, so I can be fairly confident that the string is passed by reference (as a pointer to its internal buffer) rather than copied.

  $ python -m timeit -s "f=lambda s: s[0]; lstr='abcde'*200000" \
                        "f(lstr)"

  1000000 loops, best of 3: 0.465 usec per loop

  $ python -m timeit -s "import mylib; lstr='abcde'*200000" \
                        "mylib.py_first(lstr)"

  1000000 loops, best of 3: 0.539 usec per loop

  $ python -m timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')" \
                     -s "lstr='abcde'*200000" \
                        "dll.first(lstr)"

  1000000 loops, best of 3: 1.4 usec per loop
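
The by-reference behaviour can be checked from Python 3 with ctypes.string_at, which calls a small C helper that reads bytes from an address. A bytes argument is converted to a pointer to the object's own buffer, so reading the first byte should cost about the same for 5 bytes or 1 MB. For contrast, ctypes.create_string_buffer copies its argument into a new array, which is the kind of per-call copy the timings above rule out. The helper names first_byte and first_byte_via_copy are illustrative.

```python
import ctypes
import timeit

def first_byte(data):
    # string_at(address, size) reads `size` bytes starting at `address`.
    # A bytes argument is converted to a pointer to the object's own
    # buffer, so nothing is copied regardless of len(data).
    return ctypes.string_at(data, 1)

def first_byte_via_copy(data):
    # create_string_buffer allocates a new char array and copies `data`
    # (plus a terminating NUL) into it: O(len(data)) work on every call.
    buf = ctypes.create_string_buffer(data)
    return ctypes.string_at(buf, 1)

if __name__ == "__main__":
    short, long_ = b"abcde", b"abcde" * 200000  # 5 bytes vs 1 MB
    n = 1000
    for fn in (first_byte, first_byte_via_copy):
        for label, s in (("5 B", short), ("1 MB", long_)):
            best = min(timeit.repeat(lambda: fn(s), number=n, repeat=3))
            print(f"{fn.__name__:20} {label:>4} {best / n * 1e6:8.2f} usec")
```

Expect first_byte to time about the same for both sizes and the copying version to grow with the string length; the exact numbers depend on the machine.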

Next, I made some attempts to speed up ctypes. A measurable improvement, from 1.37 to 1.18 usec, can be attained by eliminating the attribute lookup for the function. Curiously, the same change shows no improvement for the C extension.

  $ python -m timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')" \
                     -s "f=dll.first" \
                        "f('abcde')"

  1000000 loops, best of 3: 1.18 usec per loop
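
The same hoisting idea, sketched against the C library's strlen on a POSIX system as a stand-in for dll.first (lengths_with_lookup and lengths_hoisted are illustrative names). In CPython's ctypes, CDLL.__getattr__ stores the function object on the library instance after the first access, so what hoisting saves is the repeated global and attribute lookup, not a repeated symbol search.

```python
import ctypes
import ctypes.util
import timeit

# Assumption: POSIX; CDLL(None) is used if find_library("c") returns None.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

def lengths_with_lookup(strings):
    # Looks up the global `libc` and its `strlen` attribute for every element.
    return [libc.strlen(s) for s in strings]

def lengths_hoisted(strings):
    strlen = libc.strlen  # resolve once, outside the loop
    return [strlen(s) for s in strings]

if __name__ == "__main__":
    data = [b"abcde"] * 100_000
    for fn in (lengths_with_lookup, lengths_hoisted):
        best = min(timeit.repeat(lambda: fn(data), number=10, repeat=3))
        per_call = best / (10 * len(data)) * 1e6
        print(f"{fn.__name__:20} {per_call:.3f} usec per call")
```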

Second, I tried specifying the ctypes function prototype (argtypes and restype). This actually decreases performance significantly, from 1.18 to 1.57 usec.

  $ python -m timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')" \
                     -s "f=dll.first" \
                     -s "f.argtypes=[ctypes.c_char_p]" \
                     -s "f.restype=ctypes.c_int" \
                        "f('abcde')"

  1000000 loops, best of 3: 1.57 usec per loop
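
A sketch of the same comparison in Python 3, again using the C library's strlen on a POSIX system as a stand-in for dll.first. Indexing the library (libc["strlen"]) builds a fresh function object each time, so one copy can carry a prototype while the other keeps the defaults. One plausible reason for the slowdown measured above is that declared argtypes run each argument through the declared type's conversion check on every call; the upside is that wrong argument types are rejected before the C code runs. Whether the cost is still as large on a current Python is worth re-measuring rather than assuming.

```python
import ctypes
import ctypes.util
import timeit

# Assumption: POSIX; CDLL(None) is used if find_library("c") returns None.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# No argtypes: each argument is converted according to its Python type,
# and the result is treated as a C int.
untyped_strlen = libc["strlen"]

typed_strlen = libc["strlen"]                  # separate function object
typed_strlen.argtypes = [ctypes.c_char_p]      # checked/converted per call
typed_strlen.restype = ctypes.c_size_t         # matches strlen's size_t

def rejects_str():
    # With argtypes declared, a str (not bytes) is refused before the call.
    try:
        typed_strlen("abc")
    except ctypes.ArgumentError:
        return True
    return False

if __name__ == "__main__":
    ns = {"u": untyped_strlen, "t": typed_strlen}
    number = 1_000_000
    for label, stmt in (("no prototype", "u(b'abcde')"),
                        ("argtypes/restype", "t(b'abcde')")):
        best = min(timeit.repeat(stmt, number=number, repeat=3, globals=ns))
        print(f"{label:16} {best / number * 1e6:.3f} usec per loop")
```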

Finally, I tested passing multiple parameters into the function. In the ctypes version, one of the parameters is passed by reference in order to return a value. Performance decreases as the number of parameters increases.

  $ python -m timeit -s "charAt = lambda s, size, pos: s[pos]" \
                     -s "s='this is a test'" \
                        "charAt(s, len(s), 1)"

  1000000 loops, best of 3: 0.758 usec per loop

  $ python -m timeit -s "import mylib; s='this is a test'" \
                        "mylib.py_charAt(s, len(s), 1)"

  1000000 loops, best of 3: 0.929 usec per loop

  $ python -m timeit -s "import ctypes" \
                     -s "dll = ctypes.CDLL('mylib.pyd')" \
                     -s "s='this is a test'" \
                     -s "ch = ctypes.c_char()" \
                        "dll.charAt(s, len(s), 1, ctypes.byref(ch))"

  100000 loops, best of 3: 2.5 usec per loop
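
Since the mylib source is not shown here, the following is a self-contained sketch of the by-reference pattern. It assumes the C function has the hypothetical prototype int charAt(const char *s, int size, int pos, char *ch), inferred from the call above, and substitutes a Python function wrapped with ctypes.CFUNCTYPE for the C body so the script runs without a compiled library. Calling char_at from Python still goes through ctypes argument conversion and byref, but its timings also include the trip back into Python, so they are not comparable to the numbers above.

```python
import ctypes

# Hypothetical prototype, inferred from dll.charAt(s, len(s), 1, byref(ch)):
#     int charAt(const char *s, int size, int pos, char *ch);
CHARAT_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int,                   # return code: 0 = ok, -1 = out of range
    ctypes.c_char_p,                # s
    ctypes.c_int,                   # size
    ctypes.c_int,                   # pos
    ctypes.POINTER(ctypes.c_char),  # ch: output parameter
)

def _char_at_impl(s, size, pos, ch):
    # Stand-in for the C body. ctypes hands the char* argument to the
    # callback as bytes and the output char* as a POINTER(c_char).
    if not 0 <= pos < size:
        return -1
    ch[0] = s[pos:pos + 1]  # write the result through the pointer
    return 0

# Module-level reference keeps the C-callable wrapper alive.
char_at = CHARAT_PROTO(_char_at_impl)

def demo():
    s = b"this is a test"
    ch = ctypes.c_char()  # storage that receives the output value
    rc = char_at(s, len(s), 1, ctypes.byref(ch))
    return rc, ch.value
```

Here demo() returns the status code together with the character written into ch.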

One coding style that improves performance somewhat, from 2.5 to 1.71 usec, is to build a C struct that holds all the parameters.

  $ python -m timeit -s "from test_mylib import dll, charAt_param" \
                     -s "s='this is a test'" \
                     -s "obj = charAt_param(s=s, size=len(s), pos=3, ch='')" \
                        "dll.charAt_struct(obj)"

  1000000 loops, best of 3: 1.71 usec per loop
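
A matching sketch of the struct style, with the same caveats: the CharAtParam field types and the prototype int charAt_struct(charAt_param *p) are assumptions modelled on the call above, and a CFUNCTYPE-wrapped Python function again stands in for the C body. The struct is built once, only pos changes between calls, and a single pointer is passed each time.

```python
import ctypes

class CharAtParam(ctypes.Structure):
    # Hypothetical layout modelled on charAt_param(s=..., size=..., pos=...,
    # ch=...); the real field types in test_mylib are not shown.
    _fields_ = [
        ("s", ctypes.c_char_p),
        ("size", ctypes.c_int),
        ("pos", ctypes.c_int),
        ("ch", ctypes.c_char),
    ]

# Hypothetical prototype: int charAt_struct(charAt_param *p);
CHARAT_STRUCT_PROTO = ctypes.CFUNCTYPE(ctypes.c_int,
                                       ctypes.POINTER(CharAtParam))

def _char_at_struct_impl(p_ptr):
    p = p_ptr.contents  # view onto the caller's struct memory, not a copy
    if not 0 <= p.pos < p.size:
        return -1
    p.ch = p.s[p.pos:p.pos + 1]  # result is written back into the struct
    return 0

char_at_struct = CHARAT_STRUCT_PROTO(_char_at_struct_impl)

def chars_at(data, positions):
    # Build the struct once; only the field that varies is updated per call.
    param = CharAtParam(s=data, size=len(data))
    out = []
    for pos in positions:
        param.pos = pos
        if char_at_struct(ctypes.byref(param)) == 0:
            out.append(param.ch)
    return out
```

Out-of-range positions are skipped, so chars_at(b"this is a test", [0, 3, 99]) collects only the first two characters.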

This may work because most of the fields in the charAt_param struct are invariant in the loop. Keeping them in a single struct object saves them from being rebuilt on each call.

My overall observation is that a ctypes function call has 2 to 3 times the overhead of a similar C extension function. This may become a limiting factor if the function calls are fine-grained. Using ctypes for performance enhancement is much more productive if the interface can be made medium- or coarse-grained.
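
As a quick check of the 2 to 3 times figure, the arithmetic below divides each ctypes time reported above by the matching C extension time. The per_item_cost helper is only an illustration of why coarser interfaces help: it assumes a fixed per-call cost and ignores the per-item work that real code would add.

```python
# usec per call, copied from the runs above (Python 2.6, Windows XP).
c_extension = {"first('abc')": 0.545, "first(1 MB)": 0.539, "charAt": 0.929}
ctypes_call = {"first('abc')": 1.37, "first(1 MB)": 1.4, "charAt": 2.5}

def overhead_ratios(c_ext, ct):
    """ctypes time divided by C extension time, per test."""
    return {name: round(ct[name] / c_ext[name], 2) for name in c_ext}

def per_item_cost(call_usec, items_per_call):
    # Illustration only: with a fixed per-call cost, handling n items in
    # one call divides that cost by n.
    return call_usec / items_per_call

if __name__ == "__main__":
    print(overhead_ratios(c_extension, ctypes_call))
    for n in (1, 10, 1000):
        cost = per_item_cost(ctypes_call["first('abc')"], n)
        print(f"{n:5} items per call: {cost:.5f} usec each")
```

With these numbers the ratios come out at 2.51, 2.6 and 2.69.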

A snapshot of the source code used for testing is available for download. It is also useful as boilerplate for building your own ctypes library.
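
For a starting point without the download, here is a minimal loader sketch. It assumes the library is built next to the Python module and named with each platform's usual convention (mylib.dll on Windows, libmylib.so on Linux, libmylib.dylib on macOS); library_filename and load_library are hypothetical helpers, not part of ctypes. The tests above load mylib.pyd directly, which works because a .pyd file is a Windows DLL.

```python
import ctypes
import os
import sys

def library_filename(name, platform_name=sys.platform):
    # Usual shared-library naming per platform. This is a build convention,
    # not something ctypes requires; adjust it to match the actual build.
    if platform_name.startswith("win"):
        return name + ".dll"
    if platform_name == "darwin":
        return "lib" + name + ".dylib"
    return "lib" + name + ".so"

def load_library(name, directory=None):
    # Load from an explicit path (default: this module's directory) so the
    # result does not depend on the system library search path.
    if directory is None:
        directory = os.path.dirname(os.path.abspath(__file__))
    return ctypes.CDLL(os.path.join(directory, library_filename(name)))
```

Usage: mylib = load_library("mylib"), then bind the functions used in hot loops once, for example first = mylib.first.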

2009.07.16