Indiana University

 

Active Consulting Projects

HPA works with AVL on dental images

Dental caries is an infectious, communicable disease in which acid-forming bacteria found in dental plaque destroy teeth. Scientists evaluate its treatment by analyzing microfocus computed tomography (μ-CT) images collected from tooth specimens over time. Each tooth undergoes a five-phase longitudinal evaluation, and each phase generates one thousand high-resolution images, so a single tooth yields about five thousand images and the overall data volume is very large. HPA and AVL are working together to use HPC resources to segment the images and identify the region of interest (ROI). The ParaView high-performance visualization software is also used to gain a qualitative understanding of the data.

Network Monitoring Framework

Scientists are dealing with ever-growing volumes of experimental data that need processing. Computing power is not always concentrated at a single institution, state, or country, and collaboration increasingly spans continents, so data must be transferred efficiently from one place to another. Many protocols and tools are available, but users often cannot spare the time and energy to analyze the network and work out the best approach.

The Network Monitoring Framework we are developing helps users run a set of tests quickly, either a fixed number of times or repeatedly over a month or longer, to determine the behavior of the network. It is a simple tool containing a suite of tests based on the most frequently used network and data transfer protocols and tools, such as Ping, Iperf and SCP. It can also test distributed filesystems such as RFS and Lustre, if needed.

The framework currently stores its data in RRD (round-robin) databases so that the user can easily generate graphs for a set of dates. RRDtool is a round-robin database for time series data. It can plot graphs from the stored data and can be called from Bash, Perl and other scripts. We aim to provide a more flexible data-storage and plotting solution in the future.
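The round-robin idea can be illustrated with a short Python sketch. This is not RRDtool's API or file format, and it is not the framework's code. The class name `RoundRobinArchive`, its methods, and the sample values are hypothetical. The sketch only shows the core property of such a store: it has a fixed capacity, and new samples overwrite the oldest ones.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store of (timestamp, value) samples; the oldest are overwritten."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest item once it is full.
        self.samples = deque(maxlen=capacity)

    def update(self, timestamp, value):
        """Record one measurement, e.g. a ping round-trip time in ms."""
        self.samples.append((timestamp, value))

    def fetch(self, start, end):
        """Return the retained samples with start <= timestamp <= end."""
        return [(t, v) for t, v in self.samples if start <= t <= end]

    def average(self, start, end):
        """Mean value over a time window, or None if the window is empty."""
        values = [v for _, v in self.fetch(start, end)]
        return sum(values) / len(values) if values else None


# Hypothetical round-trip times (ms) sampled every 60 seconds.
archive = RoundRobinArchive(capacity=3)
for ts, rtt_ms in [(0, 10.0), (60, 12.0), (120, 11.0), (180, 15.0)]:
    archive.update(ts, rtt_ms)

print(archive.fetch(0, 180))     # the sample at t=0 has been overwritten
print(archive.average(60, 120))
```

A real RRD additionally consolidates samples onto a fixed time step and can keep several archives at different resolutions; the sketch omits both.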

So far, we have deployed the framework between IU and TU Dresden and between IU and NOAO in Tucson. We plan to deploy it to more endpoints, both to gather network statistics and to test and improve the framework.

HPA, ZIH and NCGAS conduct detailed performance study of Trinity

The High Performance Applications group, together with the National Center for Genome Analysis Support and the Center for Information Services and High Performance Computing at Technische Universität Dresden, is conducting a detailed performance study of Trinity. Trinity is a novel method for efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. It delivers very good results at the cost of a long runtime. To speed up the analysis, we are examining each step of the Trinity workflow and how it can be optimized for running on a cluster of large-memory nodes such as IU's Mason cluster. Initial results show a performance increase of up to 30%, just from properly installing and configuring Trinity to take full advantage of the features of a modern HPC system.

HPA assists with the upgrade of Dr. Mike Jolly's Fortran 77 code

Professor Mike Jolly and the PDE Group in Applied Mathematics and Computation are investigating how energy is transferred across length scales in a turbulent fluid flow, with the support of a grant from the National Science Foundation. Their Fortran-based code computes an approximate solution of the 2D Navier-Stokes equations. These are partial differential equations (PDEs) whose solution depends on both time and space. The length scales are approximately inversely proportional to the indices in the code's 2D arrays.

This research presents three computational challenges. Patterns must be averaged over long time intervals. As precision increases, the length of the time steps must decrease. Finally, each experiment must be run under multiple force configurations. The HPA group is assisting with the upgrade of this code from a Fortran 77 style to more efficient, modern Fortran constructs and is exploring thread-level parallelism.
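For reference, the incompressible 2D Navier-Stokes equations are commonly written as below. The notation is the standard textbook one and is not taken from the group's code: velocity u, pressure p, kinematic viscosity ν, body force f, and density normalized to one. The last line reads the array indices as wavenumbers k, which assumes a Fourier-type discretization on a periodic domain of side L; under that assumption the length scale associated with an index shrinks as the index grows.

```latex
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
  &= -\nabla p + \nu\,\Delta\mathbf{u} + \mathbf{f}, \\
\nabla\cdot\mathbf{u} &= 0, \\
\ell_{\mathbf{k}} &\sim \frac{L}{|\mathbf{k}|}, \qquad \mathbf{k} = (k_1, k_2).
\end{aligned}
```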

Leveraging SDSC’s Dash to Enable Genomics Research

Researchers from the Lynch Lab in the IUB Biology department focus their research “on mechanisms of evolution at the gene, genomic, and phenotypic levels, with special attention being given to the roles of mutation, random genetic drift, and recombination.” The researchers needed to assemble relatively large gene sequences, and the assembly software requires a large amount of memory to do so. Since the group’s memory requirements exceeded the current capabilities at IU, the HPA group helped the researchers acquire compute time on the TeraGrid’s Dash machine. Dash is a virtual shared-memory machine that uses ScaleMP’s vSMP Foundation software and provides an aggregate of 768 GB of shared memory. Access to this system has allowed the Lynch group to proceed with their assemblies.

Using COMSOL on IU Supercomputers

Several groups in the IUB Chemistry and Physics departments use the multiphysics modeling and simulation software COMSOL. For some problems, their computational needs were beginning to exceed their resources. To aid their efforts, the High Performance Applications group obtained a trial license of COMSOL to install and evaluate on the Quarry cluster. Working closely with researchers in the Chemistry department, HPA successfully installed and tested several configurations of the COMSOL software on Quarry: interactive client/server mode, batch mode, and cluster MPI mode. With these tests in hand, COMSOL users can be confident that the Quarry supercomputer can cover departmental computational shortfalls.

HPA works with Prof. Mu-Hyun Baik in the IUB Chemistry department

Prof. Mu-Hyun Baik and his group in the IUB Chemistry department use computational quantum chemistry in their research on artificial photosynthesis, reaction pathways of the cancer drug cisplatin, and the chemistry involved in Alzheimer's disease. These studies involve extensive calculations of the structure of complex molecules, and of reaction mechanisms that are difficult or impossible to observe experimentally.

The quantum chemistry codes they use can be quite complex, and learning how to compile and run them on large parallel computers can be difficult and time-consuming. Such is the case with the two principal code packages the Baik Group uses, Jaguar and MOLCAS. With Jaguar especially, the problems of compiling the code and getting it to run correctly and efficiently have become major roadblocks to progress. HPA is partnering with the group to take responsibility for these issues. We are installing and maintaining Jaguar and MOLCAS on IU's platforms, and assisting in recompiling, testing and benchmarking as the Baik Group makes changes to the codes. We are also using Vampir to analyze the parallel performance of these codes.

This is a project where establishing a collaborative relationship is important. In addition to installing and maintaining Jaguar and MOLCAS, we are helping with production runs. By becoming familiar with the Baik Group's computational procedures and problems, we are solving some of the difficult problems of maintaining and running these complex codes on IU's parallel machines. In addition, results from our Vampir analysis may help the Jaguar and MOLCAS developers improve their codes' performance.

Close collaboration between HPA and the Baik Group is enabling Prof. Baik and his group to concentrate on doing chemistry. Helping them use IU's HPC resources more efficiently is reducing the time and cost for them to obtain research results.

HPA evaluates github:fi

The High Performance Applications group provides support for researchers at Indiana University and users of the TeraGrid. This support includes migration of code between computing platforms; profiling, tracing and optimization to improve application performance; and parallelization of existing serial codes. During such projects, we have found that source code management can often be improved by using revision control systems such as SVN or Git. Github:fi takes this a step further by providing a rich web-based collaboration platform. HPA is working with the Research Technologies Core Services group to evaluate how such a tool can be used in a university setting. A brief overview presentation is available here.

HPA assists in processing thousands of skull images

A new faculty member has brought thousands of skull images from his previous institution. He has begun a project to "average" these images to create a template human skull. Given this skull template, individual images can be compared against it to determine the variability of human skulls. Eventually, an atlas of templates will be created, giving an average skull for each region of the earth. The resulting atlas and variability maps have applications in anthropology, archeology and forensics.

HPA assisted in installing the analysis programs that carry out this work and is currently developing a workflow system to automate the processing of this large number of images.
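As a rough illustration of what "averaging" images into a template and a variability map can mean, the Python sketch below computes a pixel-wise mean and standard deviation. It is not the project's analysis software. The function name and the tiny four-pixel "images" are hypothetical, and the sketch assumes the images have already been aligned to a common coordinate frame, which in practice is the hard part.

```python
from math import sqrt


def template_and_variability(images):
    """Return the pixel-wise mean and population standard deviation.

    Each image is a flat list of intensities. All images must have the same
    length and are assumed to be already aligned (registered) to one another.
    """
    if not images:
        raise ValueError("need at least one image")
    n_pixels = len(images[0])
    if any(len(img) != n_pixels for img in images):
        raise ValueError("all images must have the same number of pixels")

    count = len(images)
    # Template: average intensity at each pixel position.
    mean = [sum(img[i] for img in images) / count for i in range(n_pixels)]
    # Variability map: spread of intensities around the template.
    std = [
        sqrt(sum((img[i] - mean[i]) ** 2 for img in images) / count)
        for i in range(n_pixels)
    ]
    return mean, std


# Three tiny hypothetical "images" of four pixels each.
scans = [
    [0.0, 2.0, 4.0, 6.0],
    [2.0, 2.0, 4.0, 8.0],
    [4.0, 2.0, 4.0, 10.0],
]
template, variability = template_and_variability(scans)
print(template)
print(variability)
```

Pixels where all scans agree get a variability of zero, while pixels that differ between scans stand out in the variability map.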

SPEC Updates MPI2007 Benchmark

SPEC, the Standard Performance Evaluation Corporation, has released a new version of its SPEC MPI2007 benchmark that adds a large data suite designed for systems with 64 to 2048 cores.

SPEC MPI2007, developed by SPEC's High-Performance Group (SPEC/HPG), measures the performance of parallel computing systems and clusters running actual end-user Message-Passing Interface (MPI) applications. It provides performance metrics that can be used to compare different hardware architectures (SMP, NUMA, clusters) and interconnects, processors, memory hierarchy, compilers, and MPI implementations.

SPEC MPI2007 V2.0 was developed by SPEC members AMD, Argonne National Laboratory, Fujitsu, IBM, Indiana University, Intel, Platform Computing, QLogic, SGI, Sun, and Technische Universität Dresden. HPA was involved with testing the new version on Linux and Windows systems.

 
Managing Massive Data Sets with an ODI Image Analysis Pipeline

The One Degree Imager (ODI) is the flagship of the WIYN Consortium's new instrument initiatives. The combination of its large size and its capability for electronic image stabilization makes ODI a unique and very competitive instrument. ODI is sensitive to visible light and features a one-thousand-megapixel camera, an impressive number compared with the eight megapixels of an average digital camera. The camera will cover a one-square-degree field of view, allowing ODI to capture vast areas of sky, more than four times the area of the full moon, in a single image. UITS HPA has been working with the IU Astronomy department to develop an innovative software system to handle the massive influx of data produced by the new instrument. This software pipeline will leverage several of IU's cutting-edge systems, including the Data Capacitor, as well as IU's involvement in the TeraGrid.

Developing a genetic/physiology/environmental/disease based model to predict drug exposure

This model development comprises three phases:

  • Text mining from published data for prior knowledge integration
  • Multi-compartment-model development for genetic/physiology/environmental/disease based prediction
  • Experimental data based prediction validation

Impact: this modeling system is a translational tool for correlating medication with clinical outcome.

The entire development process includes translating the R code into C, porting the C code to a Linux cluster, and then parallelizing it with OpenMP.
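To make the term "multi-compartment model" concrete, here is a minimal Python sketch of a generic two-compartment model integrated with forward Euler steps. It is not the project's model. The function name, the rate constants `k10`, `k12`, `k21`, the dose, and the step size are all hypothetical values chosen for illustration, and the project's genetic, physiological, environmental and disease covariates are not represented.

```python
def simulate_two_compartment(dose, k10, k12, k21, dt, steps):
    """Forward-Euler integration of a two-compartment model after an IV bolus.

    k10: elimination rate from the central compartment
    k12: transfer rate from central to peripheral
    k21: transfer rate from peripheral to central
    Returns a list of (central, peripheral) drug amounts, one per time point.
    """
    central, peripheral = dose, 0.0
    history = [(central, peripheral)]
    for _ in range(steps):
        d_central = -(k10 + k12) * central + k21 * peripheral
        d_peripheral = k12 * central - k21 * peripheral
        central += dt * d_central
        peripheral += dt * d_peripheral
        history.append((central, peripheral))
    return history


# Hypothetical rate constants (per hour) and dose (mg), for illustration only.
trajectory = simulate_two_compartment(
    dose=100.0, k10=0.1, k12=0.05, k21=0.02, dt=0.01, steps=1000
)
final_central, final_peripheral = trajectory[-1]
print(final_central, final_peripheral)
```

If many such simulations are run independently, for example one per parameter set, that outer loop is the kind of work that could plausibly be spread across threads in a C and OpenMP version; whether the project parallelizes at that level is not stated in the text above.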