Some Research Topics for Intern Students
Zhenjiang Hu welcomes
excellent students to do short-term intern research at NII on the following specific
topics:
Systematic Parallel Programming using Hadoop
Hadoop is a
software framework that supports data-intensive distributed
applications, which enables applications to work with thousands of
nodes and petabytes of data.
One key idea of Hadoop
is MapReduce, the
de facto standard for large scale data-intensive applications.
It can be seen as
a skeletal
parallel programming model with two skeletons: map and reduce. By
making use of the skeletal parallelism, users can program their
applications with ready-made skeletons without considering low level
details of parallelization, data distribution, load balancing and
fault tolerance.
Despite the simplicity of MapReduce, it is still a challenge
for a programmer to systematically solve his or her (nontrivial) problems.
Consider the maximum prefix sum problem of a sequence.
For instance, if the input sequence is
3, -1, 4, 1, -5, 9, 2, -6, 5
the maximum of the prefix sums should be 13 to which the underlined prefix corresponds.
It is not obvious how to solve this problem
efficiently with MapReduce (and we encourage the reader to
pause to think how to solve this). Such problems
widely exist in the real world, e.g,
financial time series analysis.
We have proposed two systematic approach to solving problems with
MapReduce. One idea is to ask the users to write two efficient
sequential programs, one for solving the problem itself and the other
for solving the inverse problem, from which an efficient MapReduce program
can be automatically generated. The related work can be found in
the following paper:
The other approach is a new framework to synthesize efficient
MapReduce parallel program for a general class of problems which can
be specified in the following generate-test-and-aggregate form:
aggregate . test . generate
Problems that match this specification can be solved by first
generating possible solution candidates, then keeping those candidates
that pass a test of a certain condition, and finally selecting a valid
solution or making a summary of valid solutions with an aggregating
computation.
We have shown that this framework can be efficiently implemented
on Hadoop as a Java library. Users of our library can
concentrate on specifying naive generate-test-and-aggregate algorithms
in Java, and obtain efficient MapReduce programs for free.
The purpose of this intern research is to evaluate and extend
the above approaches through more practical examples.
Tree Computation with MapReduce
The purpose of this intern research follows similar motivation as the
above on systematic parallel programming on
Hadoop. It would be interesting to see how computation on trees can
be systematically mapped to MapReduce. As the first step, we would
like to see how tree skeletons can be efficiently implemented with
MapReduce. One related paper is:
Co-Evolution of Models and Codes using Bidirectional Transformation
Bidirectional
transformations provide a novel mechanism for synchronizing and
maintaining the consistency of information between input and
output. They consist of a pair of well-behaved transformations:
forward transformation is used to produce a target view from a
source, while the backward transformation is used to reflect
modification on the view to the source.
-
Krzysztof Czarnecki,
J. Nathan Foster,
Zhenjiang Hu,
Ralf Lammel,
Andy Schurr,
James F. Terwilliger,
Bidirectional Transformations: A Cross-Discipline Perspective,
International Conference on Model Transformation
(ICMT 2009),
ETH Zurich, Switzerland, June 29-July 3 2009.
LNCS 5563, Springer. pp.260-283.
-
Zhenjiang Hu,
Andy Schurr,
Perdita Stevens,
James Terwilliger,
Dagstuhl Seminar on Bidirectional Transformations,
SIGMOD Record, Vol.40, No.1 2011.
pp.35-39.
This pair of forward and
backward transformations should satisfy certain bidirectional
properties. Bidirectional transformations are indeed pervasive and
can be seen in many interesting applications, including the
synchronization of replicated data in different formats,
presentation-oriented structured document development,
interactive user interface design,
coupled software transformation,
and the well-known view updating mechanism which has been intensively studied in the
database community.
The purpose of this intern research is to investigate how to use the
bidirectional graph (model) transformation system GRoundTram for co-evolution
(synchronization) of models and codes
in model-driven
software development.
A Generic GUI for Supporting Bidirectional Model-driven Software Development
As discussed above, we have develop a bidirectional graph transformation
system GRoundTram with a simple GUI, which can deal with
edge-labelled graphs in the dot format.
The purpose of this intern research is to develop a generic GUI (in Java) upon the current
GRoundTram system, which can deal with various kinds of graphs such as UML, feature models,
control flow graphs, etc. This needs us to express relationship between the edge-labelled graphs
and other graphs. For instance, the relationship between the edge-labelled graphs and the
UML graphs is discussed in the following paper:
-
Isao Sasano,
Zhenjiang Hu,
Soichiro Hidaka,
Kazuhiro Inaba,
Hiroyuki Kato,
Keisuke Nakano,
Toward bidirectionalization of ATL with GRoundTram,
International Conference on Model Transformation
(ICMT 2011),
Zurich, Switzerland, June 27-28, 2011. LNCS 6707. pp.138-151.
Maintained by Zhenjiang Hu. Last modified on 2011.