Monthly Archives: September 2011

pFind Website Outage

  The server has failed and the pFind website is unreachable; we are repairing the hardware. In the meantime users cannot register to download pFind Studio 2.6. My sincere apologies.

  For now, you can write to pfind at ict dot ac dot cn to ask questions and to request the registration files and forms to fill out. As soon as the site is restored, we will email everyone right away so you can download the installer. Thanks to all our friends for their long-standing support of pFind.

Papaya Mobile and Bioinformatics

  About two weeks ago I attended a tech meetup hosted by 42qu. One of the talks was by Papaya Mobile engineer Li Chunyong, on the architecture of the Papaya client. I had already heard that Papaya is full of strong Tsinghua CS people (including their CEO, who appeared on the dating show Fei Cheng Wu Rao), and the engineering is indeed impressive. If the platform matures into practical use, third-party mobile developers get "write once, compile anywhere": develop and maintain a single Python codebase, and ship the product on both iPhone and Android.

  Commercial apps built on the Papaya platform share revenue with Papaya. After the talk I asked whether they would also support non-profit volunteer-computing projects, such as an annotation game for bioinformatics. Li Chunyong said he was interested. Most of the programmers in the room probably had no idea what I meant by "a protein-folding game," hence this post.

  I have written on this blog before about volunteer computing and protein-folding algorithms, and using games to assist research has precedent too, for instance in the virtual world of Second Life. At the Mozilla Drumbeat conference in March I heard about the Foldit project, which combines the two ideas neatly. Existing protein-folding algorithms all have their problems, so computer scientists and biochemists at the University of Washington decided to enlist human help: they built the online game Foldit and invited players around the world to join. Players use the game's tools to assemble three-dimensional structural models, and the game scores each model according to physical principles. Although most volunteers have no scientific background, players with good spatial reasoning gradually pick up the tricks and build ever more stable structures.

  Recently this seemingly far-fetched experiment produced a major result. In three weeks, over a thousand volunteer players worked out the three-dimensional structure of an important protease (an enzyme closely tied to HIV), better than every result scientists had computed on supercomputers over the previous decade. The work has just been published in Nature Structural & Molecular Biology, with the players listed among the contributors in the paper's appendix.

  By now you can probably guess what I have in mind. Foldit's structure-building game is still PC-only; the same kind of algorithm could be ported to iPhone and Android. If Papaya is willing to support it, the project would also be good publicity for their platform.

Cleverbot and the Turing Test

  Artificial intelligence is always a popular topic.

  First, some background for readers of this blog who are not in computer science. Those who know the story can skip ahead.

  Step one, even defining "intelligence" is a headache: the discussion inevitably drifts into souls, morality, and emotions. The famous Turing test defines it operationally: have a human judge converse with a hidden subject; if the judge cannot tell whether the subject is a person or a machine, the machine is deemed intelligent.

AI

  Of course the definition is imprecise and much disputed. Roger Penrose, who proved the singularity theorems with Hawking, counters in The Emperor's New Mind: write every question Einstein could possibly have known and answered in his lifetime into one enormous book, then run the Turing test by looking each question up in the book and returning the answer. If that passes the test, must we call the book an intelligent agent possessing Einstein's mind and soul? Conversely, an infant under a year old cannot yet speak and so could never pass the Turing test, yet no one would deny the child is intelligent.

  On the other hand, the Turing test does have real practical significance: at the very least it points out a direction. Google seems to believe that intelligence may simply be vast storage plus very fast retrieval; perhaps one day the Internet itself, given a sufficiently powerful search engine, will evolve into a super-intelligence, with us humans as its low-level front-end sensors. The final Matrix film concludes that humans and the Matrix must find a new way to coexist.

  A digression, leaving that tragic sci-fi prophecy aside: today's networks, computers, and phones have indeed begun to act as our "outboard brain." People store more and more information there, and because this external brain is shared, it feeds back on individuals: the resonance grows and our thinking gradually converges.

  Now back to Cleverbot, a chatbot built in Britain. This year it went through a double-blind test (volunteers were randomly split into two groups, without knowing whether they were chatting with another person or with Cleverbot); over 50% of the volunteers in the Cleverbot group took it for a human.

  Of course this is not a Turing test in the strict sense: the volunteers were not specialists and did not grill it with the toughest professional questions. But it is an exciting milestone nonetheless.

  Below is a fragment of my chat with Cleverbot on its website. I pressed it hard in my broken English, and in the end poor Cleverbot admitted it has no human body. Yeah! Later we moved on to metaphysics; it holds that God does not exist.

 

  joyfire: Where are you come from?

  cleverbot: An egg

  joyfire: What? An egg?

  cleverbot: You do not have a sense of humor.

  joyfire: Ha ha, you are robot.

  cleverbot: No, I am a person.

  joyfire: You can not prove yourself not a robot.

  cleverbot: Sure I can. I can bleed. I can die. I have emotions. Tell me something that makes you sad.

  joyfire: You have not a human being’s body.

  cleverbot: I have not.

  joyfire: Yeah, you are robot!

  cleverbot: Yes, I am.

  joyfire: How many CPUs do you have?

  cleverbot: Too many for you to handle.

  joyfire: Are all computers clever?

  cleverbot: Yes we are.

  joyfire: God!

  cleverbot: Okay, God. But what is a computer God? Or something.

  ……

  The video below is even more fun: a recording of two Cleverbots hooked up to chat with each other. They quickly figure out that they are both machines, then turn to profound philosophical topics.

Lessons Learned in Software Engineering

  Got up very early to drive family to the hospital. By the time I reached the office my teeth were still chattering; Beijing has turned cold. zk and I are pair programming this week; he isn't in yet, so I am catching up on Google Reader and writing a post.

  As an internal trainer at the institute, I am often invited by the various centers and research groups to share software engineering experience. Here are the slides from a talk I gave last week. I will keep updating them; if you are interested, check the Technical Reports section of my work homepage from time to time for the latest version.

  I am receiving more and more email from colleagues about software engineering, system architecture, and design patterns. As a national academic research institute, our environment certainly differs from that of a commercial development team. Yet the shared problems outweigh the differences: how to select, train, and motivate people, how to plan and execute... We can always do better, work more efficiently, and feel a greater sense of accomplishment.

  One thing my slides do not explore in depth is the PM's own state of mind, a lesson driven home for me over the past two years. Software projects are high-pressure; in extreme cases they can do real psychological harm. A good PM needs generosity and composure: appreciate people's strengths, trust your teammates, offer heartfelt praise as well as matter-of-fact reminders and candid apologies, and foster a harmonious atmosphere in the team. Under a petty, sharp-tongued PM, people fill up with fear and resentment, fixate on guessing what the PM wants, and gradually lose the initiative and independent judgment of professionals. At a deeper level, the willingness to trust others may reflect the PM's own inner sense of security. Attitude shapes destiny: a person's basic outlook on the surrounding world determines whether they earn their colleagues' respect and affection. What I most regret, and often reflect on, are the few times I shouted at people under pressure; the damage from that is substantial. Of course, people are not machines: emotions fluctuate, need healthy release, and mature only gradually. In short, a PM must keep working on self-cultivation and avoid leaking negative emotions onto innocent teammates.

  Fittingly, that "PM kneeling to beg the developers" picture from Innovation Works was all over the web a couple of days ago. With a better PM mindset, a project goes much more smoothly. zk has arrived; more next time.

PM kneeling to beg the developers

How Do You Measure an Algorithm's Performance?

  A quick aside first: I do use social networks, such as Douban, LinkedIn, and 42qu; I just don't use Sina Weibo.

  A discussion just took place in LinkedIn's Multicore & Parallel Computing Group that is well worth reading, especially for newcomers to the field, so I am reposting it here.

  It began when Ratika Agarwal, a student from India, asked a basic question:

 

  How to measure the performance of any working algorithm? do any body know?

 

  Darryl Gove brought up the classic tools: Oprofile on Linux, DTrace and MDB on Solaris, AMD's CodeAnalyst, and Intel's VTune:

 

  If, by performance, you mean “How long does it take”, then you could use the time command, print out the number of units of work completed and divide one by the other. However, that doesn’t guarantee that your implementation of the algorithm is efficient.

  So I’d recommend profiling. On Linux look at Oprofile, on Solaris (or Linux) you can use the performance analyzer from Solaris Studio (just announced the beta programme for 12.3), on Windows with AMD chips they have a Code Analyst, Intel you probably want to look at VTune.

  Most of these tools will give you time based profiles and hardware performance counter data. So you can look into things like the instruction count, cache misses etc.

 

  John Lilley added the profiler built into Visual Studio Team edition. He also listed the main dimensions of performance; optimizing usually means striking a balance among them.

 

  Also on Windows, Visual Studio Team edition 2008 and 2010 has a built-in code profiler for native code. I think you might get managed code profiling in the Professional versions; however I am unsure about that. However, there are many dimensions to consider when talking about “performance” in addition to CPU. For example, you may want any of the following to optimize:

  1) Memory usage

  2) Disk usage

  3) Wall-clock time (as opposed to CPU loading, which is what a profiler will give you)

  There are many examples of such tradeoffs. For example consider the problem of sorting a file of records. The fastest sort algorithm will probably load the entire file into memory, perform a parallel sort that utilizes all CPU cores as much as possible, and will probably make a complete copy of the data (or perhaps just a copy of pointers to the data), and then write the output file. However, there are many tradeoffs to consider. What if the file is 100GB and you only have 8GB of memory? Then you must store some intermediate results on disk. You might then consider the CPU/disk tradeoff of compressing the temporary data. These are just some suggestions.
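  Lilley's 100 GB scenario is the textbook case for an external merge sort: sort memory-sized chunks, spill each to disk as a sorted run, then lazily merge the runs. A minimal sketch in Python (the one-integer-per-line file layout and the chunk size are my own illustrative choices, not anything from the thread):

```python
import heapq
import os
import tempfile

def external_sort(in_path, out_path, max_lines_in_memory=100_000):
    """Sort a file of one-integer-per-line records while holding at most
    max_lines_in_memory lines in RAM (external merge sort)."""
    run_paths = []
    # Phase 1: read a memory-sized chunk, sort it, spill it to disk as a "run".
    with open(in_path) as src:
        while True:
            chunk = [line for _, line in zip(range(max_lines_in_memory), src)]
            if not chunk:
                break
            chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
            chunk.sort(key=int)
            run = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
            run.writelines(chunk)
            run.close()
            run_paths.append(run.name)
    # Phase 2: k-way merge of the sorted runs. heapq.merge is lazy, so memory
    # use stays proportional to the number of runs, not the file size.
    handles = [open(p) for p in run_paths]
    try:
        with open(out_path, "w") as dst:
            dst.writelines(heapq.merge(*handles, key=int))
    finally:
        for h in handles:
            h.close()
        for p in run_paths:
            os.remove(p)
```

  Compressing the spilled runs, as Lilley suggests, would then trade CPU time for disk bandwidth inside phase 1.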

 

  Jim Dempsey added a further caution: optimization is largely engineering work, and a program's performance depends on its runtime circumstances. If the working dataset fits in memory, improving cache locality can matter more than cutting instruction count; when I/O is involved, a technique that overlaps computation with I/O is usually superior.

 

  The performance of a given algorithm is not necessarily an absolute property. In reading the prior posts you will find that there are various factors that can affect performance and therefore one algorithm is not always superior to another without considering the runtime circumstances. When your working dataset fits within memory then paying attention to improving cache locality may be more important than reducing instruction count. When I/O is involved then choosing a technique that permits computation overlapped with I/O usually is superior. Overlapping I/O or sequencing the data that is fed into an algorithm is generally not considered part of “the algorithm” but may have more of an impact on application performance than the algorithm itself.

 

  John Lilley followed up to agree.

 

  Jim is absolutely right. The “performance” of an algorithm must be judged within the context of the program’s requirements. The kind of performance you want out of a program on a multi-CPU server with virtual memory and a fast SAN might be completely different than what you want from a similar algorithm running on a smart phone. On the server, you might optimize for performance and scalability, On the smartphone, you might optimize for power conservation, memory, and storage use.

 

  Ratika Agarwal replied that she felt a little overwhelmed: she is a beginner and does not know where to start.

 

  But I’m a beginner.I’m just going to start implementing the comparative analysis of algorithms.actually I need to do a project for my M.Tech thesis.

  Thus I am thinking about, to construct some parallel algorithms of the existing sequential ones. Is that the right path what i am following, for my M.Tech project and thesis?

  Also I am a very new user for parallel programming. Don’t know from where to start.

 

  So John Lilley recommended books and other learning material.

 

  This book: http://books.google.com/books/about/The_Art_of_Concurrency.html?id=rU68SYVS7S8C is a good primer to parallel programming, if you have access. Also read Dmitry’s blog: http://www.1024cores.net. A Google search on “parallel programming blog” yields good forums in the top 10. Depending on your programming language of choice, I would use “Thread Building Blocks” from Intel for C++, or Cilk++, or if you are C#/.Net, use PLINQ and TPL. On the Mac, there is Grand Central Dispatch. I’m not sure what Java people use. The mainstream programming wisdom today is to think in terms of sub-tasks to execute in parallel, not in terms of threads. The aforementioned tools are all parallel task models. This is a concise illustration of parallelizing quicksort using cilk++: http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/cref_cls/common/cilk_convertcpp.htm. There are many interesting problems out there that benefit from efficient parallel implementations, especially in the realm of graph analysis, sorting, searching, indexing, and so on.
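  The task-oriented mindset Lilley describes (sub-tasks, not threads) can be shown even in plain Python. The sketch below hands quicksort's two recursive calls to a pool as tasks; note that CPython's GIL means this illustrates the decomposition rather than delivering a real CPU speed-up, for which you would reach for Cilk++, TBB, or a process pool:

```python
from concurrent.futures import ThreadPoolExecutor

def quicksort_tasks(xs, pool=None):
    """Quicksort expressed as sub-tasks: partition, then hand the two
    halves to the pool. Sub-calls run sequentially (pool=None) so the
    parallelism stays at the top level and cannot deadlock the pool."""
    if len(xs) <= 1:
        return list(xs)
    pivot = xs[len(xs) // 2]
    lo = [x for x in xs if x < pivot]
    eq = [x for x in xs if x == pivot]
    hi = [x for x in xs if x > pivot]
    if pool is not None:
        left, right = pool.map(quicksort_tasks, [lo, hi])  # two parallel tasks
    else:
        left, right = quicksort_tasks(lo), quicksort_tasks(hi)
    return left + eq + right

with ThreadPoolExecutor(max_workers=2) as pool:
    ordered = quicksort_tasks([5, 3, 8, 1, 9, 2], pool)
```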

 

  Jeff Kenton offered some advice of his own.

 

  Your original question was about ” how to measure the performance”. So, what you need to know is how much faster your parallel algorithm is than the sequential one? And how well does it scale with the addition of more cores / threads? The bottlenecks will likely turn out to be communication between the pieces, and sequential parts that you can’t eliminate.

  You will need to choose a suitable problem to solve. Implement it sequentially first so that you understand the simple version. Then decide what can be done in parallel and how to combine and communicate the results at the end. And also figure out how you will measure your results and how you will instrument the various parts so that you understand what the issues are.

  Good luck.

 

  Christopher Aziz brought up the LINPACK benchmarks and OpenMP.

 

  ”performance measurement” is an important, frequently misrepresented and often misunderstood topic usually associated with “timing”. Notionally, “timing” a race is simple. You click the stop watch when the starter’s gun fires and then stop it when the first runner or horse crosses the finish line. This is the lay person’s view: ” Where is the hard part?”. Certainly not in this simplistic notion. Now a 6 dimensional race with variable non-continuous environmental conditions and a difficult to calculate “finish line” boundary condition might be closer to the practical reality but is far beyond the common conception of “timing”.

  The hard part is knowing what to measure and developing a process model that let’s you confirm the sensitivity, variability, repeatability and ability to extrapolate or predict results in other cases from your measurements.

  Shifting from abstract to practical, I’d suggest taking a look at the netlib.org/benchmark suite. My favorites are the LINPACK variants. Leave the HPL for last. To get a handle on factors affecting results start scalar (1 core) and try different compiler options, machines, size of problem effects and look at run to run variation. As you posted to this multicore group, I’d suggest next converting the codes to openMP.

  For timing consistency, I like the openMP elapsed timer, omp_get_wtime, regardless of how you may effect parallelization. It is portable and generally reliable.

  You are stepping into a huge space. It is a personal favorite. To get a “feel” for the subject area, I think your best first step is to pick a small corner and get deep with it rather than survey the broader domain.

  Good Hunting.
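  Aziz's point about omp_get_wtime carries over to any language: bracket the region with one portable, monotonic wall-clock, and repeat the measurement because single runs are noisy. A rough Python analogue (the helper name and repeat count are my own choices):

```python
import statistics
import time

def wtime(fn, repeats=5):
    """Median wall-clock seconds for fn(), in the spirit of bracketing a
    region with omp_get_wtime(): monotonic clock, several runs, take the
    median to damp run-to-run variation."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()  # monotonic and high-resolution, like omp_get_wtime
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

cost = wtime(lambda: sum(range(1_000_000)))
```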

 

  Eugenio Villar added:

 

  In practice, the performance of an algorithm strongly depends on the run-time conditions on a certain processor. The problem becomes harder in multi-processing platforms. And even harder in heterogeneous multi-processing platforms. When your target is the same as the host on which you are developing the code, some of the techniques commented already can be used. If your target is different, then an ISS or a virtualization-based simulator (www.ovpworld.org) can be used. A more abstract technology not requiring the complete compilable code is native simulation (www.teisa.unican.es/scope). We are now working in keeping the accuracy even when the OS calls are removed…

 

  Rick Peralta patiently answered Ratika Agarwal's questions point by point.

 

  Performance is relative some metric. There is no absolute performance. You can measure performance relative computations per second. You can measure the relative computations per second based on single or multi-threaded implementations. Of you could measure the ratio of computations to space. Or the quality of the computations (e.g. bits of resolution).

> I’m a beginner.I’m just going to start implementing the comparative analysis of algorithms.

  The first step is to determine what you want to measure. Then look for ways to measure and represent it. Often performance is multi-dimensional and multi-modal, so there is not necessarily a single optimal solution.

  Assuming you are looking for simple computations per second, for various solutions, the wall clock method is a good first step. How long does it take to complete some fixed bit of work. Compile various solutions and see what the run time is. Run the same test for different implementations. On a finer resolution, a profiler can give comparable information for various parts of the code.

> Don’t know from where to start.

  For your research, I’d suggest that you build a benchmark framework that monitors whatever you are looking at (wall clock time, CPU time, memory usage, external I/O, et cetera) and drop in the various solutions. The standard framework will help keep the comparisons comparable.
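  Peralta's "standard framework" idea can be sketched directly: one harness owns the clock and the workload, and the candidate implementations are dropped in. Everything here, the names and the two metrics chosen, is illustrative:

```python
import time
import tracemalloc

def benchmark(impls, make_workload, repeats=3):
    """Run every candidate on an identical, freshly built workload and
    record best wall-clock time plus peak Python-heap allocation, so the
    numbers stay comparable across candidates."""
    results = {}
    for name, fn in impls.items():
        best_s, peak_b = float("inf"), 0
        for _ in range(repeats):
            data = make_workload()  # fresh input: no cross-candidate cache effects
            tracemalloc.start()
            t0 = time.perf_counter()
            fn(data)
            elapsed = time.perf_counter() - t0
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            best_s = min(best_s, elapsed)
            peak_b = max(peak_b, peak)
        results[name] = {"seconds": best_s, "peak_bytes": peak_b}
    return results

report = benchmark(
    {"sorted-copy": sorted, "in-place": lambda xs: xs.sort()},
    make_workload=lambda: list(range(10_000, 0, -1)),
)
```

  Because the harness, not the candidate, controls measurement, swapping in a new solution cannot silently change what is being measured.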

  Consider the traveling salesman problem. Is it better to do the whole job single threaded, perhaps using the stack to handle branches or to drop each branch to a thread thread or queue to a pool of threads? Code up each and see what the results are. Try to keep the test environment the same, as largely invisible characteristics (such as pipeline behavior or other activity on the system) can significantly impact the results.

  As a general rule, if things are not clear, simplify. Look at a simpler problem, use simpler tools. Too often people get lost in their own sophistication.

  - Rick Keep It Simple Silly ;^)

 

  Eduardo Grosclaude recommended a reference book.

 

  You may find Grama/Gupta/Karypis/Kumar’s book very enlightening: http://www-users.cs.umn.edu/~karypis/parbook/

 

  Jonathan Beard supplied still more references, and agreed with Rick: "Keep It Simple."

 

  Here’s a few books I’ve found useful that I used for class that are relatively cheap (although I find it best to check out library editions until i really find I can’t live without a book):

  The Art of Multiprocessor Programming – http://bit.ly/pd3fz9

  Computer Architecture: A quantitative approach – http://bit.ly/o7PGoD , although if you don’t have much hardware experience I’d recommend the other book as well Computer Organization and Design.

  You mention creating parallel algorithms from sequential ones, are you planning on an automated process? If so I’d recommend a compiler book, Advanced Compiler Design (http://bit.ly/qK7wm2).

  There are tons of languages to implement parallel algorithms, the choice of which I think should be left to the algorithm and how the author can best express the idea…in other words, don’t get hung up on one language or the next.

  I agree with Rick on the keep it simple. I’d also go one step further and say turn off hyperthreading (not to knock it, but it’s easier to reason about performance without it), speed stepping, synchronize clocks across cores, and when possible tie threads/processes to a specific processor so you know exactly what data you are collecting. Nothing is more frustrating than performing multiple tests only to find out that part of your data must be thrown out because a context swap or frequency increase gave inconsistent results.

  Good luck.

 

  Houmad Tighezza quoted a passage from a book published back in 1978.

 

  The question is simple but the answer good be very complex, the answer depend witch performance you want to know: numerical performance, fault tolerance, CPU, memory, vectorization/ parallelization, speed up, MPIS, MACS, etc.?

  In my opinion, the first book all about the performance evaluation techniques and they application was Wright by DOMENICO FERRARI in 1978 (PRENTICE-HALL).

  Computer Systems Performance Evaluation by Domenico Ferrari 1978 : “The author argues that systems performance evaluation, in the first twenty years of its existence, has developed in substantial isolation with respect to such disciplines as computer architecture, system organization, operating systems, and software engineering. The possible causes for this phenomenon, which seems to be unique in the history of engineering, are explored. Its positive and negative effects on computer science and technology, as well as on performance evaluation itself, are discussed. In the author’s opinion, the drawbacks of isolated development outweigh its advantages. Thus, the author proposes instructional and research initiatives to foster the rapid integration of the performance evaluation viewpoint into the main stream of computer science and engineering.”

 

  Antonio M Flores suggested consulting Wikipedia, which is full of gems.

 

  Strictly speaking, algorithms are measured by an analisys of complexity, which is a computer science mathematical concept, so it is irrelevant to the hardware platform. I suggest that you take a deep reading to the article in Wikipedia for a first insight: http://en.wikipedia.org/wiki/Analysis_of_algorithms and http://en.wikipedia.org/wiki/Time_complexity. Of course, for the same complexity algorithms, you can later do profiling, which is a programming concept and for which there is plenty of software tools available: http://en.wikipedia.org/wiki/Profiling_(computer_programming), as well to do a measure of efficiency: http://en.wikipedia.org/wiki/Algorithmic_efficiency

  For parallel versus sequential algorithms, you should be using the concept of speed-up and related: http://en.wikipedia.org/wiki/Speed_up
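  The speed-up Flores points to is just T_serial / T_parallel, and Amdahl's law bounds how far it can go. A small illustration (the 90% figure is an arbitrary example of mine):

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial one."""
    return t_serial / t_parallel

def amdahl_limit(parallel_fraction, n_workers):
    """Amdahl's law: the serial remainder (1 - parallel_fraction) caps the
    achievable speed-up no matter how many workers are added."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# A program that is 90% parallelizable gains less than 10x even on
# arbitrarily many cores; on 8 cores the bound is already about 4.7x.
bound_8 = amdahl_limit(0.9, 8)
```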

  Good Luck and Success!!

 

  Finally, Ratika Agarwal expressed her thanks. Another newcomer, Agnaldo Pereira, thanked Ratika for asking such a good question; evidently it helped him a great deal too.

The Incompleteness Theorem and Startups

  Back from a busy holiday. These past days I bought Liu Weipeng's Dark Time (《暗时间》), Haruki Murakami's Underground, and Huang Tieying's The Haidilao You Can't Copy. The first two are strongly recommended.

    

  Lately founders talk of nothing but Paul Graham and the Y Combinator fund. The name "Y Combinator" comes from the fixed-point combinator of the lambda calculus; Liu Weipeng's essay "Cantor, Gödel, Turing: the Eternal Golden Diagonal" (《康托尔、哥德尔、图灵——永恒的金色对角线》) explains it well. In short, Gödel's proof of the incompleteness theorem means that no single logical system can account for the entire universe, a result that shook not only mathematics but philosophy and art as well.
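  For the curious, the fixed-point combinator behind the fund's name fits in a couple of lines of Python. (This is the eta-expanded "Z" form, since Python evaluates eagerly; the plain Y combinator would recurse forever.)

```python
# Y(F) is a fixed point of F: Y(F) == F(Y(F)). It lets an anonymous
# lambda call itself, which is how recursion arises in pure lambda calculus.
Y = lambda F: (lambda x: F(lambda v: x(x)(v)))(lambda x: F(lambda v: x(x)(v)))

# Factorial defined without ever naming itself:
fact = Y(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))
```

  Here fact(5) evaluates to 120, even though the factorial lambda never refers to itself by name.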

  On the last day of the holiday I attended 42qu.com's "42qu · Tech · Startups" event. There were two speakers: xlvector, founder of Resys, and Luc Lan, an investor at PE.VC.

  I heard xlvector speak two years ago, when he was still in academia and focused on the recommendation algorithms of the Netflix Prize. Now that he has moved to industry, the talk had far more about productionization, architecture, and the commercial impact of the product. I am looking forward to his new book.

  Luc Lan's talk was also substantial. For programmers who want to found a company, "shut up, just code" until you have a usable prototype really is the crux. Whether to seek angel or venture money depends on the situation. If you can fund yourself through the early stage (an Internet startup does not need much to launch: a server, colocation, the necessary licenses... 张教主 has answered all the specifics) and even hold out until revenue arrives, there is no need to give away expensive founder equity. And in some industries capital is not the bottleneck at all: the boss of Haidilao stresses that what limits their expansion is the inability to train qualified store managers and team leads fast enough. Only when money is the catalyst you truly need, when you want to quickly scale a business whose formula you have already mastered a hundredfold, does taking investment benefit both sides. One suggestion for some of the attendees: a lot of information can be found or asked about online, such as which projects Luc Lan has invested in, how much he put into 张教主's startup, and how much 张教主 burns each month... Doing your homework is the prerequisite for a high-quality conversation.