Some Research Topics for Intern Students
Zhenjiang Hu welcomes
excellent students to do short-term intern research at NII on the following specific
Systematic Parallel Programming using Hadoop
Hadoop is a
software framework that supports data-intensive distributed
applications, which enables applications to work with thousands of
nodes and petabytes of data.
One key idea of Hadoop
is MapReduce, the
de facto standard for large scale data-intensive applications.
It can be seen as
parallel programming model with two skeletons: map and reduce. By
making use of the skeletal parallelism, users can program their
applications with ready-made skeletons without considering low level
details of parallelization, data distribution, load balancing and
Despite the simplicity of MapReduce, it is still a challenge
for a programmer to systematically solve his or her (nontrivial) problems.
Consider the maximum prefix sum problem of a sequence.
For instance, if the input sequence is
3, -1, 4, 1, -5, 9, 2, -6, 5
the maximum of the prefix sums should be 13 to which the underlined prefix corresponds.
It is not obvious how to solve this problem
efficiently with MapReduce (and we encourage the reader to
pause to think how to solve this). Such problems
widely exist in the real world, e.g,
financial time series analysis.
We have proposed two systematic approach to solving problems with
MapReduce. One idea is to ask the users to write two efficient
sequential programs, one for solving the problem itself and the other
for solving the inverse problem, from which an efficient MapReduce program
can be automatically generated. The related work can be found in
the following paper:
The other approach is a new framework to synthesize efficient
MapReduce parallel program for a general class of problems which can
be specified in the following generate-test-and-aggregate form:
aggregate . test . generate
Problems that match this specification can be solved by first
generating possible solution candidates, then keeping those candidates
that pass a test of a certain condition, and finally selecting a valid
solution or making a summary of valid solutions with an aggregating
We have shown that this framework can be efficiently implemented
on Hadoop as a Java library. Users of our library can
concentrate on specifying naive generate-test-and-aggregate algorithms
in Java, and obtain efficient MapReduce programs for free.
Generate, Test, and Aggregate --- A Calculation-based Framework for Systematic Parallel Programming with MapReduce,
22nd European Symposium on Programming
Tallinn, Estonia, March 24 - April 1, 2012. pp. 254-273.
A Generate-Test-Aggregate Parallel Programming Library,
2013 International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2013),
conjunction with PPoPP 2013, Shenzhen, China, February 23, 2013.
The purpose of this intern research is to evaluate and extend
the above approaches through more practical examples.
In addition, we may go further to see how computation on trees/graphs can
be systematically mapped to MapReduce. As the first step, we would
like to see how tree skeletons or graph queries can be efficiently implemented with
MapReduce. One related paper is:
Bidirectional Model Transformation in Software Engineering
transformations provide a novel mechanism for synchronizing and
maintaining the consistency of information between input and
output. They consist of a pair of well-behaved transformations:
forward transformation is used to produce a target view from a
source, while the backward transformation is used to reflect
modification on the view to the source.
This pair of forward and
backward transformations should satisfy certain bidirectional
properties. Bidirectional transformations are indeed pervasive and
can be seen in many interesting applications, including the
synchronization of replicated data in different formats,
presentation-oriented structured document development,
interactive user interface design,
coupled software transformation,
and the well-known view updating mechanism which has been intensively studied in the
J. Nathan Foster,
James F. Terwilliger,
Bidirectional Transformations: A Cross-Discipline Perspective,
International Conference on Model Transformation
ETH Zurich, Switzerland, June 29-July 3 2009.
LNCS 5563, Springer. pp.260-283.
Dagstuhl Seminar on Bidirectional Transformations,
SIGMOD Record, Vol.40, No.1 2011.
The purpose of this intern research is to investigate how to use the
bidirectional graph (model) transformation system GRoundTram in software engineering, say for co-evolution
(synchronization) of models and codes
software development. An application can be found in the following paper.
Maintaining Invariant Traceability through Bidirectional Transformations,
34th International Conference on Software Engineering
Zurich, Switzerland, Kune 2-9, 2012. pp.540-550.
Adaptive (Incremental) Bidirectional Graph Transformationl
The objective of this intern project is to make use of the
computing atechnique to improve efficiency of bidirectional
transformation in general, and of bidirectional graph transformation
idea about bidirectional graph transformation is discussed in the
As another reference, the following paper discusses about an implementation of
the combinator libary X/Inv in Haskell for bidirectional tree transformation.
Bidirectionalizing Graph Transformations,
15th ACM SIGPLAN International Conference on Functional Programming
Baltimore, Maryland, USA, September 27-29, 2010.
Maintained by Zhenjiang Hu. Last modified on 2011.