2009-07 - 戇人日記

Blog

San Francisco, USA

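Today I went to a lot of trouble to find a handful of examples of Chinese dialects on the web, and I am recording them here to share with everyone. I only know Cantonese, so it was hard work finding written samples of the other dialects. Please forgive me if I have gotten anything wrong.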

暗殺者 (Taiwanese)

阿明半暝轉來到厝，老母猶未睏teh等伊，入門tō問：“敢有影？美國á kap咱斷交”“有影lah，新聞teh報。”“是按怎beh kap咱斷？”阿明無想著老母thàn食人賣蚵te，也會問這款kap美國á外交ê問題，tshìn-tshái tō應：“Khi-moo-tsih bái tō斷。” “Ah有要緊無？”這日有人客買蚵te，kā伊tâu:“Tann台灣知慘a。”所以這暗蔥命猶m̄睏，專工等阿明轉來，beh問伊有要緊無。“阿母，你ta̍k日tī巷á頭煎蚵te，我tī市á賣虱目魚丸湯，hit個美國á總統腳踏，敢成實bat來交關一擺？斷無斷，生理照作，有啥關係？…作你去睏lah，免煩惱tse有ê無ê。”蔥命坐hia想，這個美國á總統mā是真無情，huah斷tō斷。

http://tailo.fhl.net/Tanlui/Tanlui41.html

Suzhou dialect

搿么今朝头正好，倪一淘去哉啘。

http://www.suzhouhua.org/home/jc/nwts.asp

茶山情緣 (Hakka)

阿土嫂唱：老公逐日走四方，遊遊野野真放蕩。厓無閒直掣捩捩轉，汝毋曉佬厓來幫忙。阿土哥：晡娘姐，汝莫怨怪來莫受氣，換領靚衫厓渡汝上街去。

溫秀琴：有啦有啦！蘭英該額恩俚較煞猛兜仔匉忒佢，佢个工錢會分恩俚兩儕匉啦！溫秀鳳：恁樣還差毋多！

http://www.hakka.gov.tw/public/Attachment/813111471371.pdf

A sentence a day ---- Shanghainese

侬晓得伐,昨捏化半体落度雨来,度是度的来交关陌路散接撕了,侬看到伐,今早地思里刚,有一部帕萨特搓子雨司从门分里进到里巷,搓子里巷蚕撕司,老哈拧格.

http://travellife.org/forum/viewthread.php?tid=51924&highlight=

哈我讲温州话！(Is this Wenzhou dialect?)

亡天阿~短命儿啊~你非死网吧里那,你阿伯哈你急活那,哈我过做起,把你生类那~人死亡啊,哈我肚生大沃那.

http://www.703804.com/viewthread.php?tid=323443&page=1#pid5012097

旅遊人生討論區 (Cantonese)

幫襯佢一杯野飲就可以攤係個海邊耐耐GAM 吹住海風睇住日落落水浸浸身再上番黎飲啖野飲><

而如果係酒店既住客就有優先啦~~~ YESSS ~ 唔駛去旅行都霸位呀MA真係><

http://travellife.org/forum/viewthread.php?tid=52826&highlight=

2009.07.19 [language, chinese] - comments

ctypes performance benchmark

I have done some performance benchmarking of Python's ctypes library. I am planning to use ctypes as an alternative to writing a C extension module for performance enhancement. My use case is therefore slightly different from the typical one of accessing an existing third-party C library: here I am both the user and the implementer of the C library.

To determine the right granularity for switching between Python and C, I mainly want to measure the function call overhead, so the test functions are trivial, such as returning the first character of a string. I compare a pure Python function, a C extension module function, and a ctypes function. The tests were run under Python 2.6 on Windows XP with a 2.33 GHz Intel Core Duo.

First I compare ways to get the first character of a string. The most basic case is to index the 0th element of the sequence without calling any function. This produces the fastest result, at 0.0659 usec per loop.

  $ timeit "'abc'[0]"

  10000000 loops, best of 3: 0.0659 usec per loop
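The same kind of "best of 3" measurement can be taken from inside Python with the standard timeit module. This is only a minimal sketch, written for Python 3 rather than the Python 2.6 used in these tests; the helper name best_usec_per_loop and the default loop count are my own choices for illustration.

```python
import timeit


def best_usec_per_loop(stmt, setup="pass", number=100000, repeat=3):
    """Return the smallest time per loop, in microseconds, over `repeat` runs."""
    # timeit.repeat returns one total time in seconds for each repetition.
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e6


if __name__ == "__main__":
    print("%.4f usec per loop" % best_usec_per_loop("'abc'[0]"))
    print("%.4f usec per loop" % best_usec_per_loop("f('abc')", setup="f = lambda s: s[0]"))
```

Taking the minimum rather than the average follows what the timeit command line reports, since the slower runs are usually caused by other activity on the machine.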

As soon as I build a function around it, the cost goes up substantially. The pure Python function and the C extension function show similar performance at around 0.5 usec. The ctypes function takes about 2.5 times as long, at 1.37 usec.

  $ timeit -s "f=lambda s: s[0]"  "f('abc')"

  1000000 loops, best of 3: 0.506 usec per loop

  $ timeit -s "import mylib" "mylib.py_first('abc')"

  1000000 loops, best of 3: 0.545 usec per loop

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
              "dll.first('abc')"

  1000000 loops, best of 3: 1.37 usec per loop

I repeated the test with a long string (1 MB). There is not much difference in performance, so I can be quite confident that the parameter is passed by reference (to the internal buffer).

  $ timeit -s "f=lambda s: s[0]; lstr='abcde'*200000"
              "f(lstr)"

  1000000 loops, best of 3: 0.465 usec per loop

  $ timeit -s "import mylib; lstr='abcde'*200000"
              "mylib.py_first(lstr)"

  1000000 loops, best of 3: 0.539 usec per loop

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
           -s "lstr='abcde'*200000"
              "dll.first(lstr)"

  1000000 loops, best of 3: 1.4 usec per loop
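One way to check the pass-by-reference idea without timing anything is to look at the address ctypes stores for a string argument. The sketch below assumes CPython 3, where a c_char_p built from a bytes object points at that object's own buffer; the helper name buffer_address is made up for this example, and none of this uses the mylib code from the tests above.

```python
import ctypes


def buffer_address(data):
    """Return the address stored in a c_char_p built from `data`."""
    pointer = ctypes.c_char_p(data)
    return ctypes.cast(pointer, ctypes.c_void_p).value


# Same 1 MB test string as above, as bytes because this sketch targets Python 3.
big = b"abcde" * 200000

first = buffer_address(big)
second = buffer_address(big)

# If ctypes copied the string for each conversion, the two addresses would differ.
print(first == second)
# Reading five bytes back from that address gives the start of the string.
print(ctypes.string_at(first, 5))
```

If the two addresses match, the C side receives a pointer into the existing string object rather than a fresh 1 MB copy, which fits the flat timings.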

Next I made some attempts to speed up ctypes. A measurable improvement can be attained by eliminating the attribute lookup for the function. Curiously, the same change shows no improvement for the C extension.

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
           -s "f=dll.first"
              "f('abcde')"

  1000000 loops, best of 3: 1.18 usec per loop
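For readers without mylib.pyd, the same comparison can be tried against the C library that is already loaded into the Python process. This is a rough sketch under two assumptions: a POSIX system (where ctypes.CDLL(None) works) and Python 3. The libc function strlen stands in for first(), and usec_per_call is a helper name I made up.

```python
import ctypes
import timeit

# Assumption: POSIX. CDLL(None) exposes the symbols already loaded into the
# process, including the C library. strlen is a stand-in for first().
libc = ctypes.CDLL(None)
strlen = libc.strlen  # look the function up once, outside the timed statement


def usec_per_call(stmt, names, number=100000):
    """Best-of-3 time per call in microseconds, with `names` as globals."""
    totals = timeit.repeat(stmt, globals=names, number=number, repeat=3)
    return min(totals) / number * 1e6


if __name__ == "__main__":
    looked_up = usec_per_call("libc.strlen(b'abcde')", {"libc": libc})
    bound = usec_per_call("f(b'abcde')", {"f": strlen})
    print("attribute lookup each call: %.3f usec" % looked_up)
    print("function bound once:        %.3f usec" % bound)
```

The numbers will not match the ones above, since both the function and the machine differ, but the shape of the test is the same.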

Second, I tried specifying the ctypes function prototype. This actually decreases the performance significantly.

  $ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
           -s "f=dll.first"
           -s "f.argtypes=[ctypes.c_char_p]"
           -s "f.restype=ctypes.c_int"
              "f('abcde')"

  1000000 loops, best of 3: 1.57 usec per loop
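To show what declaring a prototype changes, here is a small sketch that again uses strlen from the C library as a stand-in for first(), assuming a POSIX system and Python 3. With argtypes set, ctypes checks and converts every argument on each call. That extra work may be part of the slowdown measured above, although I have not verified this.

```python
import ctypes

# Assumption: POSIX, where CDLL(None) gives access to the C library.
libc = ctypes.CDLL(None)

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]  # one argument: a C string
strlen.restype = ctypes.c_size_t     # result type of strlen in C

print(strlen(b"abcde"))

# With a prototype in place, a wrongly typed argument is rejected by ctypes
# before the C function is ever called.
try:
    strlen(5)
except ctypes.ArgumentError as exc:
    print("rejected:", exc)
```

So the prototype buys safety rather than speed: without it, a bad argument would be passed straight through to C.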

Finally I tested passing multiple parameters into the function. One of the parameters is passed by reference in order to return a value. Performance decreases as the number of parameters increases.

  $ timeit -s "charAt = lambda s, size, pos: s[pos]"
           -s "s='this is a test'"
              "charAt(s, len(s), 1)"

  1000000 loops, best of 3: 0.758 usec per loop

  $ timeit -s "import mylib; s='this is a test'"
              "mylib.py_charAt(s, len(s), 1)"

  1000000 loops, best of 3: 0.929 usec per loop

  $ timeit -s "import ctypes"
           -s "dll = ctypes.CDLL('mylib.pyd')"
           -s "s='this is a test'"
           -s "ch = ctypes.c_char()"
              "dll.charAt(s, len(s), 1, ctypes.byref(ch))"

  100000 loops, best of 3: 2.5 usec per loop
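The pattern of returning a value through a by-reference parameter looks like this with a standard C function. The sketch uses strtol from the C library instead of my charAt, and assumes a POSIX system and Python 3; parse_prefix is a made-up helper name.

```python
import ctypes

# Assumption: POSIX, where CDLL(None) gives access to the C library.
libc = ctypes.CDLL(None)

# long strtol(const char *str, char **endptr, int base);
strtol = libc.strtol
strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p), ctypes.c_int]
strtol.restype = ctypes.c_long


def parse_prefix(text):
    """Parse a leading decimal number; return (number, unparsed remainder)."""
    end = ctypes.c_char_p()  # output parameter, filled in by the C function
    value = strtol(text, ctypes.byref(end), 10)
    # end now points into the buffer of text, just past the parsed digits.
    return value, end.value


print(parse_prefix(b"42abc"))
```

As in the charAt test, the output object can be created once in the setup and reused, so only the byref() wrapper is built on each call.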

One style of coding that improves the performance somewhat is to build a C struct to hold all the parameters.

  $ timeit -s "from test_mylib import dll, charAt_param"
           -s "s='this is a test'"
           -s "obj = charAt_param(s=s, size=len(s), pos=3, ch='')"
              "dll.charAt_struct(obj)"

  1000000 loops, best of 3: 1.71 usec per loop

This may work because most of the fields in the charAt_param struct are invariant in the loop. Keeping them in the same struct object saves them from being rebuilt on each call.
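A parameter struct of this kind might be declared roughly as follows. This is a sketch only: the class name CharAtParam and the field types are my guesses for illustration and may differ from the real charAt_param in test_mylib, and char_at_struct is a pure Python stand-in for the C function so that the example runs without the DLL. It is written for Python 3.

```python
import ctypes


class CharAtParam(ctypes.Structure):
    # Hypothetical layout; the real struct in test_mylib may differ.
    _fields_ = [
        ("s", ctypes.c_char_p),   # input string
        ("size", ctypes.c_int),   # length of the string
        ("pos", ctypes.c_int),    # index of the character wanted
        ("ch", ctypes.c_char),    # output: the character found
    ]


def char_at_struct(param):
    """Python stand-in for the C function; fills in param.ch."""
    param.ch = param.s[param.pos:param.pos + 1]


text = b"this is a test"
# Built once; s and size never change afterwards.
obj = CharAtParam(s=text, size=len(text), pos=0, ch=b"\0")

for pos in (0, 3, 5):
    obj.pos = pos          # only the varying field is touched
    char_at_struct(obj)    # a real call would pass obj (or a pointer to it) to the DLL
    print(pos, obj.ch)
```

The point of the layout is that the loop body converts one integer per call instead of four separate arguments.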

My overall observation is that a ctypes function call has 2 to 3 times the overhead of a similar C extension function call. This may become a limiting factor if the function calls are fine-grained. Using ctypes for performance enhancement is a lot more productive if the interface can be made medium- or coarse-grained.

A snapshot of the source code used for testing is available for download. It is also useful if you want boilerplate for building your own ctypes library.

2009.07.16 [python] - comments

Exploring parallel programming

When I first started blogging, I had a lot of geeky postings about software development. The idea of blogging is to write on short subjects frequently. It is different from writing a long article, which is so burdensome that many people end up not doing it at all. But in practice, I have not lived up to the expectation of being a prolific blogger. Sometimes the gap between postings can be a few months.

Anyway, this is the kind of quick posting on a geeky subject that I think I should do more of. Recently I have been working on high performance computing and number crunching tasks. I have dipped into writing C extensions for Python for performance, and I got spectacular results from the exercise. It is the best of both worlds. Programming in Python supplemented by C is still very agile and iterative. On the other hand, C gives performance that is unachievable with pure Python.

I have started to look into the multiprocessing module introduced in Python 2.6. It is a library for running multiple Python processes. With an interface similar to the threading module, it positions itself as an alternative to threading that circumvents the limitation of Python's GIL. I did not give it much thought initially. But looking closer, I find it is more than just an alternative to threading: it is really a parallel programming framework. For example, the method Pool.map() is really a prototype of a mapreduce framework. I am quite excited to explore this module in the coming days.
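To illustrate the mapreduce flavor of Pool.map(), here is a minimal sketch written for Python 3. The map step runs len() on each chunk in worker processes and the reduce step sums the results in the parent. The function name total_length is made up, and the explicit "fork" start method is an assumption that limits the sketch to POSIX systems.

```python
import multiprocessing


def total_length(chunks, processes=2):
    """Count the characters in all chunks using a pool of worker processes."""
    # Assumption: POSIX. The "fork" start method keeps the sketch short; on
    # Windows the pool must be created under an `if __name__ == "__main__":` guard.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes) as pool:
        lengths = pool.map(len, chunks)  # map step, spread over the workers
    return sum(lengths)                  # reduce step, done in the parent


if __name__ == "__main__":
    print(total_length(["this is", "a test", "of Pool.map"]))
```

For work this small the cost of starting processes dominates; the approach only pays off when each chunk takes real computation.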

Another module I am looking at is ctypes. It allows Python code to call C functions directly. Again, I did not think much about it when it was introduced in Python 2.5, because I did not have a use case for it. Now that I am writing a C module, I begin to realize how revolutionary it is. A C module delivers great performance, but writing one is a pain. Just the need to keep track of reference counts may double the number of lines of code. I am not building a reusable module for other people; it is a one-off function for my own number crunching. Using ctypes may give me easier access to C without the burden of actually writing a Python extension module.

2009.07.12 [python, tech] - comments

Good day

It was 8pm, that time of day when the sunlight shines horizontally from the west. Looking out of the window, I saw it illuminate distant objects in a glowing golden light. The crown of the tree outside was highlighted today. A little drizzle had just passed, coming out of nowhere. There was just enough moisture in the air for a rainbow to emerge. For a few minutes a beautiful picture appeared in front of my eyes. I feel thankful for having a good day.

2009.07.11 [personal] - comments

Angkor Geography

Just to follow up with more fun with Google Earth: I have captured another picture a short distance to the northeast of Angkor. There are many dots scattered over this territory that look like bomb craters. Unlike the fields to the south, there seems to be little trace of human activity in this area. What are these dots really?

From two independent sources I have found enough information to identify this geographic feature. If you have the July 2009 issue of National Geographic, you can check the hand-drawn map to see what they have depicted in the northeast area. I have also found one picture on Google Earth that gives a ground-level perspective. Unlike the Angkor temples of historical fame, there aren't thousands of pictures taken there. I had to look hard to find a single picture. But what I saw confirms the illustration from National Geographic.

The answer is that they are natural ponds. It appears that they only fill up seasonally. When the pictures were taken they appeared to be mostly dry.

2009.07.05 [geography] - comments

Angkor Travel

I have spent this evening travelling in the ancient city of Angkor. Unfortunately I have not been able to set foot in this land yet. Instead I have visited Angkor virtually, using Google Earth from my home. It is a small pity that I did not go there in person. But what an incredible experience Google Earth provides! The aerial image gives me more intelligence on this distant land than even travelling on the ground could. Here we go.

All this was prompted by an article on the collapse of Angkor in the current issue of National Geographic. I launched Google Earth to find an image of modern-day Angkor to compare with the illustration of a historical map. The glory of the 13th century city is still clearly visible from the satellite.

The square at the center is the site of the ancient capital Angkor Thom, now filled with dense jungle in a dark green color. The surrounding area is fields with little population. Siem Reap to the south is the only population center today. The most striking features on the map are several large rectangles. I have labeled three of them: the West Baray, East Baray and North Baray. They are actually enormous reservoirs built in ancient times. The largest one, the West Baray, measures 2 x 8 km. Today they are mostly filled in. There is still water in the west side of the West Baray. Otherwise they are filled with fields. There are even roads and houses built within the barays. National Geographic has suggested that the failure of this water system during a mega-drought period is one reason for the decline of Angkor.

You can also find clusters of blue dots on the image. They point to pictures uploaded by users. This is actually enormously useful information. First of all, you cannot fully understand the area by looking at the aerial picture alone. The user pictures give you a detailed ground-level view and help you to understand some of the geographic features spotted from the sky. Secondly, they tell you where tourists are going! An area with a dense cluster of blue dots is a picture-worthy place. This is an instant travel guide constructed automatically!

Since Google Earth came out a few years ago, it has helped me so much in learning about places and geography. It is a truly revolutionary tool!

2009.07.03 [travel, geography] - comments
