tungwaiyip.info


Python CSV reader is much faster than pickle

If you are considering serializing a large amount of data to disk, performance may become a concern. Python provides a serialization tool in the pickle module. There is also an optimized version called cPickle. But how do they perform?

The data of concern to me is tabular data. To do a bake-off, I generated 50,000 records of sample data. The CSV representation is shown below:

seq, name, address, city, age, birthday
1000,John M. Doe,2147 Main St.,Middle Town 14,47,1985-05-15
1001,John N. Doe,2148 Main St.,Middle Town 15,48,1985-05-16
1002,John O. Doe,2149 Main St.,Middle Town 16,49,1985-05-17
1003,John P. Doe,2150 Main St.,Middle Town 17,50,1985-05-18
1004,John Q. Doe,2151 Main St.,Middle Town 18,51,1985-05-19
1005,John R. Doe,2152 Main St.,Middle Town 19,52,1985-05-20
1006,John S. Doe,2153 Main St.,Middle Town 20,53,1985-05-21
1007,John T. Doe,211 Main St.,Middle Town 21,1,1985-05-22
...
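
For readers who want to reproduce a similar data set, here is a minimal sketch of a generator. It is not the script used for this post: the names `make_records` and `to_csv_text` are hypothetical, and the value patterns are guessed from the first rows of the sample above (the original data clearly wraps some fields differently further down).

```python
import csv
import datetime
import io

HEADER = ["seq", "name", "address", "city", "age", "birthday"]


def make_records(count, start_seq=1000):
    """Build rows resembling the sample; the exact patterns are assumptions."""
    base_day = datetime.date(1985, 5, 15)
    rows = []
    for i in range(count):
        # Middle initial cycles through the alphabet, starting at "M".
        initial = chr(ord("A") + (i + 12) % 26)
        rows.append([
            start_seq + i,                      # integer field
            "John %s. Doe" % initial,
            "%d Main St." % (2147 + i),
            "Middle Town %d" % (14 + i),
            47 + i % 50,                        # integer field, wrap is a guess
            (base_day + datetime.timedelta(days=i)).isoformat(),
        ])
    return rows


def to_csv_text(rows):
    """Serialize the header and rows to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `to_csv_text(make_records(50000))` gives text of roughly the same shape as the sample, which can then be written to disk for the comparison.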

Naturally, CSV is a contender for storing tabular data. (Indeed, the data source I'm working with is in CSV format.) The two pickle modules produce identical data output. In addition, Python 2.6 also provides a JSON module that does a similar task to pickle but outputs a standard text-based format. I included it in the comparison below.

First observation: CSV produces the most compact output at 3MB. Pickle output is 40% larger at 4.2MB. JSON is somewhere in between at 3.9MB. The speed? CSV is the winner among them all.


Method     Load Time (ms)   File size (MB)
CSV        188              3
CSV int    289              3
cPickle    692              4.2
pickle     1,815            4.2
JSON       4,975            3.9
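
A comparison like the one in the table can be sketched as follows. This is not the downloadable source code for this post, only an illustration written for a current Python 3 interpreter: the helper names (`sample_rows`, `bake_off`, `run_demo`) and the simplified sample rows are assumptions. Python 3 has no separate cPickle module, so only pickle is timed, and the numbers it produces need not match the table.

```python
import csv
import json
import os
import pickle
import tempfile
import time


def sample_rows(count):
    """Simplified stand-in rows: two integer columns, four string columns."""
    return [[1000 + i, "John Doe", "%d Main St." % i, "Middle Town",
             20 + i % 60, "1985-05-15"] for i in range(count)]


def load_csv(path):
    with open(path, newline="") as f:
        return list(csv.reader(f))      # every field comes back as a string


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def load_json(path):
    with open(path) as f:
        return json.load(f)


def bake_off(rows, workdir):
    """Write rows in each format, then time a single load of each file.

    Returns {format: (load_ms, file_size_bytes, rows_loaded)}.
    """
    paths = {name: os.path.join(workdir, "data." + name)
             for name in ("csv", "pickle", "json")}
    with open(paths["csv"], "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(paths["pickle"], "wb") as f:
        pickle.dump(rows, f)
    with open(paths["json"], "w") as f:
        json.dump(rows, f)

    loaders = {"csv": load_csv, "pickle": load_pickle, "json": load_json}
    results = {}
    for name, loader in loaders.items():
        start = time.perf_counter()
        data = loader(paths[name])
        elapsed_ms = (time.perf_counter() - start) * 1000
        results[name] = (elapsed_ms, os.path.getsize(paths[name]), len(data))
    return results


def run_demo(count=50000):
    """Run the comparison in a temporary directory that is removed afterwards."""
    with tempfile.TemporaryDirectory() as workdir:
        return bake_off(sample_rows(count), workdir)
```

`run_demo()` returns one `(load_ms, file_size_bytes, rows_loaded)` tuple per format. A single timed load is noisy, so repeating the run and taking the best time would give steadier figures.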

Note that the CSV reader creates data items as strings. In the sample data, two of the six columns are integer fields. To make an apples-to-apples comparison, I ran another test ("CSV int") that does integer conversion after loading, so that the data loaded is identical to pickle's. This impacts the performance somewhat, but it is still more than twice as fast as the faster cPickle module. The standard library's JSON performance trails far behind, making it unsuitable for anything performance intensive. FYI, unlike the other modules, the JSON module returns strings as unicode.

The test was run with Python 2.6 on a Windows XP machine with a 2.33GHz Core 2 CPU (Download source code).
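
The integer conversion step can be sketched like this. It assumes the two integer columns are seq and age (positions 0 and 4 in the sample layout) and that the file starts with a header row; `load_csv_with_ints` is a hypothetical helper name, not the function from the test script.

```python
import csv
import io

INT_COLUMNS = (0, 4)  # assumed: "seq" and "age" in the sample layout


def load_csv_with_ints(f, int_columns=INT_COLUMNS, has_header=True):
    """Read CSV rows and convert the given columns from str to int."""
    reader = csv.reader(f)
    if has_header:
        next(reader, None)  # skip the header line; it cannot be converted
    rows = []
    for row in reader:
        for col in int_columns:
            row[col] = int(row[col])
        rows.append(row)
    return rows


# Usage with an in-memory file standing in for the data file on disk.
sample = io.StringIO(
    "seq,name,address,city,age,birthday\n"
    "1000,John M. Doe,2147 Main St.,Middle Town 14,47,1985-05-15\n"
)
records = load_csv_with_ints(sample)
```

After the conversion, each row holds the same mix of integers and strings that pickle would hand back, which is what makes the two load times comparable.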


2010.05.12

 

 
