tungwaiyip.info

Learning R and Octave/Matlab

I am engaged in the hazardous activity of learning R and Octave/Matlab at the same time. I am fairly new to both languages and I am trying hard not to confuse one with the other.

The story is that I have signed up for three Coursera online courses at the same time, a seemingly suicidal exercise. The first one is Probabilistic Graphical Models, by far the most difficult Coursera course I've encountered. PGM uses Octave, so I have to learn it from scratch. The next course is Computing for Data Analysis. This is basically a very long R tutorial and programming exercise. It should be fairly easy for programmers. I hope I can use this opportunity to get familiar with R. And it only lasts 4 weeks. Since the R course looks very manageable, I picked up a third course, Mathematical Biostatistics Boot Camp. I dismissed it at the beginning because I thought I had nothing to do with bio-anything. It turns out you can drop the bio- prefix and it is just a statistics course. I'm already doing independent study on probability and statistics to reinforce my foundation for tackling the PGM course. Since this course covers roughly the same ground, I might as well follow along. Interestingly, both the R and biostatistics courses, as well as another data analysis course offered next year, are from Johns Hopkins University and come from the medical domain.

Here are my first impressions of R and Octave from the point of view of a programmer. Both of them come from a lineage of math and science computing that is separate from general purpose programming languages, so the syntax can be weird. For example, R uses the character $ where we would use . in other object oriented languages. In R, "a$b" simply means what "a.b" means elsewhere. The "." itself seems to have no special meaning in R and is just a part of the identifier.

Their biggest advantage over other high level languages is the support for high level data types like vector, matrix and data frame. Octave has a literal syntax to construct a matrix easily. You can also access subsets and elements of a matrix using powerful indexing functions and syntax. These high level data structures introduce a whole new set of capabilities like slicing, projection, grouping and vectorized functions.

The downside is that they are not necessarily good general purpose languages. It may frustrate you to find that things you can do easily in a regular language now require new learning and a lot of trial and error. There are a lot of idiosyncrasies in each language that you have to understand. And frankly, if the languages were better designed there would be fewer of these issues to deal with.

R is especially laden with convenient shortcuts that are not well designed. I think of it as the PHP of data analysis languages. It was created to address practical needs without too much concern for good programming language design. For example, there is a family of `x-apply` functions that map a function over a list. `lapply` is similar to map() in Python. `sapply` is the same as `lapply` but it simplifies the result by turning a list of 1 element items into just a vector. `tapply` is a more powerful variation of `lapply` that applies the function to groups of values. It could benefit from some result simplification too. But instead of having a `stapply` function, you pass a `simplify=TRUE` argument to `tapply` to achieve the same result. How about having a separate `simplify` function that you can use on the result of any x-apply function, so that you don't have an explosion of options and alternatives? R seems to have a culture of providing convenient functions and applying coercion to make things work, but the result is irregular and non-transparent magical operations.
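To make the idea of a separate `simplify` step concrete, here is a minimal sketch in plain Python. The functions `lapply`, `tapply` and `simplify` below are hypothetical stand-ins written for illustration only. They are not R's implementations and they ignore most of what the real functions do.

```python
def lapply(xs, f):
    """Apply f to every item and always return a list."""
    return [f(x) for x in xs]


def tapply(values, groups, f):
    """Group values by the parallel labels in groups, then apply f per group."""
    buckets = {}
    for value, label in zip(values, groups):
        buckets.setdefault(label, []).append(value)
    return {label: f(vs) for label, vs in buckets.items()}


def simplify(result):
    """Unwrap one-element lists into scalars; leave everything else alone."""
    def unwrap(item):
        if isinstance(item, list) and len(item) == 1:
            return item[0]
        return item

    if isinstance(result, dict):
        return {key: unwrap(value) for key, value in result.items()}
    return [unwrap(value) for value in result]


# Each call returns one-element lists, which is where simplification helps.
ranges = lapply([[3, 1], [5, 2]], lambda x: [max(x) - min(x)])
flat = simplify(ranges)

totals = tapply([1, 2, 3, 4], ["a", "b", "a", "b"], lambda vs: [sum(vs)])
by_group = simplify(totals)
```

The same `simplify` composes with either function, so neither needs its own flag or a simplified twin.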

Finally, after struggling a lot with the Octave exercise just to get basic things to work, I decided to re-implement the solution in Python and numpy, an environment that I'm more familiar with. I think I can learn more by focusing on the task rather than learning the ropes of a new language. Although numpy has many of the same capabilities as Octave and R, this exercise made me aware of the intrinsic value they provide. Numpy is a library built on top of Python. As such it has no literal syntax to build vectors and matrices, which are native data types in Octave. You have to call numpy.array to build a matrix (a numpy.ndarray). The [] syntax builds a regular Python list.
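A short illustration of why the missing literal matters, using only plain Python lists. The variable names are made up for this sketch. With `[]` you get a list, whose operators do not mean what they would for an Octave vector.

```python
row = [1, 2, 3]                   # [] builds a plain Python list
matrix = [[1, 2, 3], [4, 5, 6]]   # a list of lists, not a real matrix

# On lists, + concatenates and * repeats; neither does arithmetic.
concatenated = row + row
repeated = row * 2

# Element-wise addition has to be spelled out by hand.
elementwise_sum = [a + b for a, b in zip(row, row)]

# So does taking a column, since a nested list has no column slicing.
second_column = [r[1] for r in matrix]
```

Wrapping the same lists in numpy.array is what turns `+` into element-wise addition and makes column slicing available, which is the extra step that Octave's native matrix literal saves you.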

These are some of my observations from week 1. There are about 10 more weeks to go.

2012.10.02
