|
|
|
Peak Traffic Solution (Facebook Programming Puzzle)
The Peak Traffic puzzle is easy to interpret. It is a graph problem to find all maximal cliques. Implementating it, like many graph problems, is quite difficult though. The possible combination is huge. And there is no simple ordering graph subset. It end up costed me several days to do it. Here is the outline of my solution:
- Parse the input. Generate an undirected graph.
- Generate a list of cliques using recursive algorithm.
- Eliminate subsets from the result to find all maximal cliques.
After working on it for several days, I'm quite done. I haven't spent any more effort to optimize it and expain it. So here is my un-tidied code.
2010.03.03 [tech] - comments (0)
Refrigerator Madness solution (Facebook programming puzzle)
Refrigerator Madness is the second Facebook programming puzzle I have tried. The problem description is complicated. I won't try to repeat it here but to suggest you to refer to the original facebook page.
To solve this problem it is necessary to see through the smokescreen. Instead of thinking how much Red Bull a Facebook engineer likes to drink, it is easier to map it to a problem known as Two-Sided Matching. A model with intuitive meaning is a marriage matchup between a group of man and woman. Each man has his list of preferred woman and each woman has her list of preferred man. The marriage is considered stable if there are no two people of opposite sex who were not married to each other but would both prefer to be.
For the Refrigerator Madness problem just map the engineer from the upper 50th percentile to "man" and lower 50th percentile to "woman" (or vice versa). The matching condition is equivalent to the stable marriage criteria.
At this point the problem can be solved by applying the Gale-Shapley algorithm. Write the code and it's done!
2010.02.22 [tech] - comments (5)
smallworld solution (Facebook programming puzzle)
Facebook has published a number of programming puzzles to challenge software developers. Their problems are not trivial at all. This is my attempt on "Snack" level problem It's A Small World. Even this one take me sometimes to solve.
The idea is that there are n points on a 2-dimensional plane. For every point, we have to find the 3 nearest neighbors. The naive solution simply match all possible pairs, resulting in O(n2) complexity. The problem requires us to find a better solution than O(n2).
My idea is to divide the plane into a regular grid like below. For example, to find the neighbor of point A, we can focus on cell 2 in the center top and pretty much ignore the points in cell 9 in the lower right. This works very well if the points are randomly distributed. But if many points are clustered together like cell 9, it will work less well. We can recursively partition it into smaller region. But the basic plan should be suffice for this puzzle. The subdivision is a very simple O(n) process.
To search for the nearest neighbor, we start from points from the same cell. If this doesn't find the three nearest neighbors we extend into the neighboring cells. The pseudo code is shown below:
result = None
for step in 0...n
cells = find cells "step" away from the source
min_distance = minimum distance from source to the boundary of the cells
for all point in cells
insert distance(source, point) into result
result = 3 closest points in result
if there exists 3 points in result those distance < min_distance
return result
end for
Of course I'm not the first one to consider these algorithm. Wikipedia has some information on the Nearest neighbor search problem. My solution code is available here. (2010-02-21 Facebook's puzzle server is down. So the solution has not been validated.)
I have decided to put the algorithm into good use. I used it to generate a map of all sizable cities in the world and their nearest neighbors. With 20,000 cities a naive algorithm will need 400 million pairing. Using this algorithm only 10 million pairing are need. It is far from optimal, but good enough to handle Facebook's problem. Also I am curious to use it to find the loneliest city in the world, that is a city those nearest neighbor is the furthest away. Check the map to find out which city is the loneliest city in the world!
2010.02.21 [tech] - comments (1)
How Google Screw Up in China, the Missing Story
Last month the IT world is jolted by the news of a security breach of Google's system by Chinese hackers and the subsequence response by Google threatening to pull out their China operation altogether. Many in the west see this as a courageous resistance to China's Internet censorship and a righteous response to the assault on some human right activists. Other observers noted that Google's market share in China is trailing far behind the local competitor, that they may using this incident to give themselves a graceful excuse to exit.
All of them could have learned something from anthropologist Tricia Wang's insightful observation of Google's uses or non-use in China. Tricia is a research scholar working in China to study how youths and migrants are using ICT to manage their inter-personal communication networks. She come up with her observation from the angle of grassroot youngsters.
Among her observations are
- Google is mostly irrelevant to Chinese users. While elite, educated Chinese depend on Google as much as others do, young people has a very different usage pattern. They mostly use Internet with IM or mobile devices to connect to their friends. They have never found Google to be useful.
- Google has a huge branding issue in China. Most Chinese cannot spell the word 'Google'! I was amused to find that Google is known by its colloquial name GouGou (狗狗doggy) more than its official name Gu-Ge (谷歌). Here in the west, especially within the IT industry, Google's brand name stands for leadership in innovation and intelligent services. There in the back street of China, Google's name is associated with the lowly doggy. What a satire!
Tricia has an excellent piece of analysis that should be on most reader's radar screen.
2010.02.11 [tech] - comments (0)
Mac Mini Wireless Network Problem
Last month I've upgraded my first generation Mac Mini to Intel Core Duo. I have been bother by its wireless network issues ever since. It is a late 2009 model with Snow Leopard. It is connected to a Linksys WRT110 router. The problem is its poor reliability. Packets are dropped frequently. Sometimes it disconnects from the access point altogether. This means a very frustrating experience for some applications. VNC session are dropped frequently to the point it is unusable. Directory browsing and file transfer with Samba is a hit or miss. It unreliable and often incurs lengthy delay.
It seems mine is not an isolated problem. Wireless issue is a frequent topic on Apple support forum. I find many message like this one:
Topic : New Mac Mini Wireless Internet - Slow, Disconnects frequently
I use ping to test its connectivity. Pinging the gateway from the Mac gets result like below. There are plenty of missing packets.
64 bytes from 192.168.1.1: icmp_seq=7 ttl=64 time=1.219 ms
64 bytes from 192.168.1.1: icmp_seq=8 ttl=64 time=3.698 ms
64 bytes from 192.168.1.1: icmp_seq=9 ttl=64 time=3.862 ms
Request timeout for icmp_seq 10
Request timeout for icmp_seq 11
64 bytes from 192.168.1.1: icmp_seq=12 ttl=64 time=3.671 ms
64 bytes from 192.168.1.1: icmp_seq=13 ttl=64 time=3.766 ms
Request timeout for icmp_seq 14
64 bytes from 192.168.1.1: icmp_seq=15 ttl=64 time=3.997 ms
64 bytes from 192.168.1.1: icmp_seq=16 ttl=64 time=5.394 ms
Pinging the Mac from a PC, both are connected wirelessly, is even worst. Besides loss packets, the response time is pathetic. Sometimes it takes hundreds of millisecond.
Reply from 192.168.1.120: bytes=32 time=609ms TTL=64
Reply from 192.168.1.120: bytes=32 time=632ms TTL=64
Reply from 192.168.1.120: bytes=32 time=655ms TTL=64
Request timed out.
Reply from 192.168.1.120: bytes=32 time=97ms TTL=64
Reply from 192.168.1.120: bytes=32 time=1ms TTL=64
Reply from 192.168.1.120: bytes=32 time=93ms TTL=64
Request timed out.
Reply from 192.168.1.120: bytes=32 time=640ms TTL=64
Request timed out.
Note that this is not a wireless signal problem. I have moved the access point to only 15 feet away from the Mac with a direct line of sight. The test does marginally better at best. Besides, none of other wireless devices on the network have any connectivity problem.
After several false start using different suggestions from the Internet, I have found a config that dramatically improve the Mac's connectivity. I set the router to use Wireless-G only. The default of Linksys WRT110 is in "Mixed" mode. It includes some proprietary Linksys enhancement that none of my device can take advantage anyway. And it is not even Wireless-N. Setting it to Wireless-G only improve the Mac's connectivity with no loss of performance for other devices.
2010.01.26 [tech] - comments (0)
Post RDBMS database
The title of this precentation Data and Capital Markets in Money:Tech 2008 is not very obvious. It is actually an excellent presentation on the state-of-art of database design. Dr. Michael Stonebraker is behind a series of database innovation for several decades from INGRES and Postgres up to his current work on StreamBase Systems. In this talk he has criticized the traditional RDBMS as something woefully outdated. He has presented several new directions in database design that he think will soon upstage the current generation of RDBMS systems.
2009.09.28 [tech] - comments (0)
Exploring parallel programming
When I first started blogging, I have a lot of geeky posting about software development. The idea of blogging is to write on short subjects frequently. It is different from writing a long article, which is so burdensome many people end up not doing it at all. But in practice, I have not lived up to expectation of a prolific blogger. Sometimes the gap between postings can be a few months.
Anyway this is the quick posting on geeky subject that I think I should do more. Recently I am working on high performance computing and number crunching task. I have dipped into writing C extension for Python for performance. I got spectacular result from the exercise. It is the best of both world. Programming in Python supplement by C is still very agile and iterative. On the other hand C is giving the performance unachievable with pure Python.
I am started to look into the multiprocessing module introduced in Python 2.6. It is a library to run multiple python processes. With an interface similar to the threading module, it is positioning itself as an alternative approach to threading to circumvent the limitation of Python's GIL. I haven't give much thought to it initially. But looking closer, I find it is really more than just an alternative to threading. It is really a parallel programming framework. For example, the method Pool.map() is really a prototype of a mapreduce framework. I am quite excited to explore this module in the coming days.
Another module I am looking at is the ctypes module. It allows Python code to call C functions directly. Again I haven't thought much about it when it was introduced in Python 2.5. That's because I haven't have a use case of it. Now that I am writing C module, I begin to realize how revolutionary it is. C module delivery great performance. But writing one is a pain. Just the need to keep track of reference count may double the number of lines of your code. For me I am not building a reusable module for other people. It is one off function for me to do number crunching. Using ctypes may allow me easier access to C without the burden of actually writing a Python module.
2009.07.12 [python, tech] - comments (0)
Did My Virus Checker Kill My Computer?
One day my wife’s laptop won’t boot up all of a sudden. I rolled up my sleeves trying to fix it. This is a Dell Inspiron 710m with Windows XP. It turned out to be one of the toughest PC problems I have encountered in years. I have spent many hours using many different ways to repair it. I am still not sure what broke it in the first place. But the virus checker is my prime suspect. Did it kill my computer?
The first problem is Windows XP boots up to a blue screen saying “This application has failed to start because USER32.dll was not found”. First of all I used the BIOS diagnostic program to quickly check that the system do not have hardware problem. Then I found an Internet posting suggesting the virus checker may have make a huge mistake to consider the system file user32.dll as virus infected and has removed it, causing the system to become unusable.
This looks like a good lead. Sure enough I found user32.dll is missing from the \Windows\system32 directory. First of all I used an Ubuntu 8.1 CD to boot up the system. Then I mount the Windows NTFS with a command like:
sudo mount -t ntfs-3g -o nls=utf8,umask=000 /dev/sda2 /media/c
This posting shows more information about mounting a Windows drive on Linux. Then I just found the backup copy of user32.dll from the dllcache and copied it to \Windows\system32. This simple step solved the problem. My system was able to boot up.
Nevertheless this is only the start. The laptop wireless was off after reboot. Its LED indicator was off all the time. I tried various ways to turn it on without success. Then I come across a useful support page from Dell about How To Restore or Reinstall Microsoft® Windows® on a Dell™ Computer. It provides instruction to three different ways to restore the computer. The simplest way is to use the System Restore function provided by Windows itself. The system automatically backup its configuration regularly. I pick up a configuration saved before this instance happens and restored it. Bingo, the wireless LED light just comes back after reboot!
Unfortunately, the network was still not usable. In fact, even if I plugged in a network cable, it still cannot ping any external hostname. Furthermore I cannot ping this laptop's IP address from another computer. The only sign of life was the laptop can ping an external host if I give it the IP address. I make sure the Windows firewall if turned off. This still did not help. Then I found out the virus checker program also have a firewall function itself. I shut it down also. Bingo, then the network works! It sounds like the firewall was 100% effective in protecting the computer simply by disconnecting it from the outside world entirely.
At this point, the laptop is basically functional. But I still see some programs fail with a message saying that it maybe infected by virus. I think it is necessary to take a more drastic step. I used the second method in the Dell support page, the PC Restore function. It re-images the hard disk to restore the factory image. This necessitates backing all the data files because they will be wiped out in the process. But at least it can start from a clean slate without any residue problem.
So I have re-imaged the hard disk and restore the data files. Then I have to uninstall tons of useless programs the come pre-installed on a Dell PC. I also make sure I have uninstalled the hopelessly outdated virus checker Trend Micro ("PC-cillin"), which I suspect has caused me several days of misery.
2009.04.17 [tech] - comments (0)
Blog maintenance
When I first started creating this website and my blog, I wrote most of the software on top of Apache myself. I want this to be a learning experience as a webmaster. A very good learning experience it is, I did not have to put in too much time as the software is rather simple. But overtime, things need touch up here and there. Some reorganization is needed to make it friendly to search engine. The layout and presentation also need some fine-tuning. The trouble of building my own website is these maintenance task tend to be deferred and piled up.
[more...]
2007.09.06 [tech] - comments (0)
Please, what year?
Web discussion forums is often a place to find useful information. Search engine
make this valuable archives only one query away from us. However, I am surprised
at how many forum software still present their information badly. For example, I
found this from some slashdot derived forum:
[more...]
2006.11.25 [tech] - comments (0)
The Future of Web Apps - Day 2
-
Mike Arrington of TechCrunch started the second day by giving his take on startup winners and losers. He defines winners as who accomplished a liquidation event like IPO or acquisition. But look carefully at his list, which includes companies like myspace and skype, all of them got acquired. There was no IPO!
Further he think this list is hot for new startups:
[more...]
2006.09.15 [tech, futurewebapps-sf06] - comments (0)
The Future of Web Apps - Day 1
I was excited to be at this year's the Future of Web Apps summit. This two days conference has gathered many great speakers to look at some of the web's most successful sites and applications and practical advices on creating your own web app.
Here are some highlights from the first day.
Dick Hardt from Sxip take about identity 2.0. He's points about identity is based on past history, which in turn predict future behavior is well taken. He later demonstrated logging on to websites using different protocol such as OpenID, i-names and Infocard. Since all discussions remained at high level, it is not obvious to the uninitiated to understand what is the magic behind these new technology.
[more...]
2006.09.13 [tech, futurewebapps-sf06] - comments (0)
A hole in the wall helps educate India
Coming from a culture where we associate news with accidents, violent crimes and disasters, I have came out and start to question our perverse obsession on negative news. Not all news agencies are created equal though. The Christian Science Monitor, for example, put a lot of focuses on education. Here is one good piece of story about an experiment in India where they put
free computer stand in slums and found kids made good use of them. Surprising they found that with no supervision or instruction
the children "download and play audio and video, send and receive e-mail, chat, and so on," he says. They quickly move on to learn some English from English-language websites, read Indian newspapers, and even "look for jobs for their fathers,"
And it does not end there.
Many have changed their goals from say, rickshaw driver to engineer, and most now want to go to college.
Seeing this kind of heart warming notes really light up my day.
Thinking more about this, it is not all that surprising that kids can teach themselves about computer without any instructions. Afterall I learned much about computer and programming by myself before even set foot in the college, and all these without really spending much in gadgetry.
2006.06.01 [education, tech] - comments (0)
The Long Time Tail
I am a little behind in blogging. Last week I have went a few
separate but all very interesting events. Last Friday was the Long Now Foundation's seminar The Long Time
Tail by Chris Anderson with Will Hearst. Chris has introduced an
influential concept of long
tail. This time he has extended the concept in the time dimension.
The sales of an item would decline over time. Traditionally an item
would become unavailable or go out of print because it becomes
uneconomical to continues to sell it. But in the world of unlimited
virtual inventory with a searchable, discoverable front end, Chris
discovered that the demand, though diminished, would extend well into
the future. This is the same as the original long tail framework. And it
applies very well to the time dimension.
The business implication? A content archive has great and untapped
economic value. The implication to the consumers? We will enjoy a much
wider selection of quality works. And we will be less subject to
marketing hype that tries to push hits to make money in a flash of time.
Chris' presentation was full of studies of different markets. One
study of Netflix users says the customers are a lot more satisfy with
old movies than new movies. There is a logical explanation for this.
Good movies stand the test of time. I for one are very glad to find the
old collections available on the market again. For some time old Hong
Kong movies have been release in VCD format, a lower quality, lower cost
predecessor to DVD. Despite the lack of a searchable online front end
and generally available only in a uncategorized, flea market like retail
outlets, I still manage to amass a good collection great movies over
time. I have fond memory of many old movies I have seen one or two times
when they were released years ago. It is great to be able to enjoy them
again many years later. The old releases also reportedly turning a good
profit. Yet another confirmation that there is a demand of the long
tail.
2006.05.19 [tech, economics] - comments (0)
Book Market As Technology Trend Indicator
Tim O'Reilly posted some
analysis and visualizations of book sales data on his website. It
analyzed book sales trend in several dimensions. Java is shown on a
steady downward trend. On the other hand C# is clearly gaining share.
The idea is the state of book market would be a technology trend
indicator. Of course this is more likely to be a trailing indicator. Idea has
to be mature enough before it can turned into a book. And it would takes
months for it to be published and adopted in the mass market.
Nevertheless it gives a good picture what people are really buying into.
2006.04.20 [tech] - comments (0)
Live Clipboard Demo
Ray Ozzie made an extraordinary demo of Live
Clipboard in ETech. He illustrated a very simple concept to enable
users to pass data between web applications and the PC, by using the
clipboard! All this involves only defining the live clipboard data
format and simple enhancement to the applications.
He reminded us the clipboard was a great GUI innovation that allows
transfer of data between multiple applications. This is also a glimpse
of the power of the semantic web.
2006.03.09 [tech] - comments (0)
Book Review: Practical Internet Groupware
Practical Internet Groupware
by Jon Udell
In his book "Practical Internet Groupware", former BYTE magazine
editor Jon Udell layout an architecture that links human minds into
collaborative relationships. Base on his actual experience in building
BYTE's intranet as well as the magazine's public online services, he
gave his insight on the powerful use of Internet.
Among the many IT books I have read, this book stand out as sublime,
even avant-garde. Got a question? Search the Internet, send a follow up
email to folks you have never met. That's something many of us have
probably done without much thinking. Yet Jon would step back and reflect
on the dynamic that had happened. An ad-hoc workgroup was formed between
him and several person on one particular task. The collaboration was
unbounded by time, geography or corporate affiliation. He strived to
grasp the subtle interactions and to facilitate this flow of information
on the Internet.
People are lazy and do not like to learn or adapt to complex rules
impose by computer systems. On the other hand simple rules and clever UI
tweak can often make interactions spontaneous and effective. Use an
appropriate subject for a message is one good example. The author
discussed one of the oldest groupware on the Internet, the Usenet
newsgroups. He termed it conferencing and explained why it is a better
channel for some kind of interactions compare to email. Many of us who
get caught in lengthy email debate would be delighted to know there are
more effective way to conduct this kind of discussion. Indeed a seamless
integration of web, email, newsgroup and a searchable document database
are the components that make a formidable groupware application.
Unlike most IT books, he did not focus on any single platform,
computer language or a technology. Whether it is a tool from Microsoft
or its competitors, a freeware or a commercial product, he would use it
if he see fits. Throughout the book are short, unglamorous, but
nevertheless working code samples. Given I read this 6 years after its
1999 publishing date, many of the code or specific technologies are
already considered obsoleted. Yet the insight that stem from these early
system are just as relevant today. Think just what is the core component
of web 2.0 technology? User participation!
Perhaps nothing reveal more about this book than its front cover. The
'practical' in the 'Practical Internet Groupware' means everything is
derived from actual experience and real code rather than a theoretical
discussion. Yet it is in small print while the 'Internet Groupware' is
emphasize in the banner. That's because the code and the actual systems
are just starting points that spawn the exploration of threads that link
people into collaborative relationship. This is an immensely powerful
Internet application we have yet to master.
2006.02.26 [book, tech] - comments (0)
Wikipedia is a Long Tail Business
I hit upon an entry of my family name Tung 董 in
Wikipedia yesterday. What I saw doesn't delight me. It looks like a
mischievous teenage has put his twisted bio into the entry. I went
in to clean it out and added Tung Chee Hwa as
the sole representative of the Tung clan for now. Although it has only 3
lines, this is my first original content contributed to Wikipedia. And
a little something I have done for my family name. Furthermore I
am delight to find the prankish entry has only been up for 10 days
before I shot it down.
Wikipedia is frequently looked as a rival to traditional institutions
like Britannica. People like to pick out bad entries from Wikipedia and
complain how professionalism is being overran by amateurishness. I of
course have many counter arguments. But today I have
realized something more. If encyclopedia is to be a most comprehensive
reference of knowledge of all kind, then it is a long tail business! I
don't expect anyone would care enough to put an entry of my family name
into an encyclopedia (I have been generous to call Tung "a common
Chinese family name"). There is going to be many many knowledge
important only to a small group of people and few others. In the age of
information explosion Britannica would have a hard time to hire enough
experts to write about each and everything important and remain
economically viable. Just 4 years since inception Wikipedia already
claim more articles than Britannica. It is really no accident.
2005.11.11 [tech] - comments (0)
Open Source Development Platform
As a software developer I am a strong advocate of open source
software. They are used extensively both at my work and for my private
projects. In retrospect, open source platform, often referred to as
LAMP, has long past the stage being just an useful extension to
proprietary software. It has become my dominant development platform. If
I were to build a server application today, it would make very little
sense for me to consider Microsoft servers. Why choose a framework that
leave you with a single tools vendor. LAMP has proven to be
technical viable, cost nothing to experiment and distribute, and more
importantly I trust them because of its openness. Nowadays I need
little justification to pick LAMP over Microsoft.
What a big leap from just a few years ago when it looks like
Microsoft is going to take over the world.
2005.11.01 [software, tech] - comments (0)
Low cost startup
Reuters reports low-cost
computers and open source software making it cost less to invest in
a startup. I always think open source is an under appreciated
revolution. Not only is it redefining laws of economics and inspires
people to collaborate, now it also form the basis of a low-cost
computing platform and sprawl a new wave of innovations.
2005.04.03 [tech, business] - comments (0)
past articles »
|
|
|
|
|
|
BBC News
|
|
more »
|
|
|
|
Slashdot News for nerds, stuff that matters
|
|
more »
|
|
|
|
TechPsychic Tech Rumors and Invented News
|
|
more »
|
|
|
|
SF Gate
|
|
more »
|
|
|
|
Asia Times Online
|
|
more »
|
|
|
|