mklz-fps

mklz Fast Parallel Sort


Project maintained by firephinx Hosted on GitHub Pages — Theme by mattgraham

Project Checkpoint

Summary

We are planning on competing in 15-210’s Sorting Competition by Professor Guy Blelloch. In order to compete effectively, we are going to implement different parallel comparison-based sorts in a few languages and optimize for performance on the provided 72-core machine as well as run tests on a variety of other machine such as the unix machines, gates machines, latedays cluster, and Xeon Phis.

Background

Sorting is a classical problem in computer science. The challenge with sorting these days is correctly optimizing the performance of different sorting algorithms on increasingly parallel and distributed machines.

This parallel sorting competition was staged by Professor Guy Blelloch and the rest of the 210 team to incentivize students to help work on this real-world problem by writing efficient parallel sorting algorithms in a variety of languages, and especially garbage-collected languages. In addition, they would like to test how their SML work-stealing scheduler holds up against highly optimized solutions from Carnegie Mellon students.

The Challenge

The main source of the challenge will be managing memory access and bandwidth requirements and creating a way for the sorting algorithm to adapt quickly and easily to each machine’s configuration. Sorting is a problem with low arithmetic intensity; thus, making intelligent design choices based on the architecture of the target machine (e.g. caches, hyperthreading, etc.) will be crucial to getting good performance.

We will also be coding in higher-level languages which, unlike C, will probably not expose as much low-level control over the machine to us. We will need to make use of our 418 knowledge to coax these high-level languages into giving us good performance.

Resources

We will do initial testing on the 20 core unix machines and the Gates machines.

Our “starter code” consists of the two reference implementations that Blelloch’s team provided. We will need to modify the given C++ code or install gcc 4.9 in order to get it to run on the GHC machines though.

There is also substantial existing literature on parallel sorting algorithms. We will refer to those such as this paper on Robust Massively Parallel Sorting.

We will need access to the Xeon Phi machines in addition to the four days’ worth of Aware access that Blelloch’s team will give us.

Goals and Deliverables

Plan to achieve

Hope to achieve

Planned Demo

Platform Choice

The platform has been already been determined for us by the rules of the contest, but we have chosen the programming languages that we will use, which are Go, Java, and C++ because they provide us with the most flexibility for parallelism.

We will be creating a C++ version with CUDA and/or ISPC as stated in our stretch goals because sorting is a bandwidth-intensive problem, and GPUs have much higher bandwidth currently compared to CPUs.

Project Checkpoint

Progress

So far, Matt has implemented a parallel Sample Sort algorithm in Go that sorts 100 million pairs in 1.9 seconds. Kevin is still working on the parallel sorting algorithm in Java and has tested the performance of the built in parallel sort algorithm. We have analyzed the performance of their C++ code on the unix machines and have cemented our focus on Go as our main submission into the parallel sorting competition. We will be submitting our Go implementation soon to Guy Blelloch in order to unlock access to the Aware machine and begin testing both implementations on the Aware machines.

Potential Issues

We don’t foresee any issues with completing the project. The only problem might be submitting 2 languages to the sorting competition because there are many projects due on the week of May 4th. However, we anticipate being able to complete all of our planned tasks by the class competition on May 12th.

Schedule

Week Tasks Progress
Apr 10 - Apr 16 Review literature on parallel sorting algorithms
Get the provided C++ code to compile and run on GHC machines and then analyse the performance of the provided C++ code
Choose at least 2 garbage-collected languages and get them working on the GHC machines (set up the environment so we are able to compile and run parallel code)
Completed 4/14
Completed 4/25
Completed 4/14
Apr 17 - Apr 23 Implement first attempt at parallel sorting in the garbage-collected languages of our choice
Analyse initial results from the first attempt and choose a language to focus on for our main contest submission
Completed 4/23
Completed 4/25
Apr 24 - Apr 30 Do write-up for project checkpoint (due Apr 25)
Focus on optimizing the parallel sort in the language of our choice (Matthew - Go, Kevin - Java)
Submit our progress so far to Guy Blelloch’s team in order to request access to Aware machine
Completed 4/25
Completed 5/4
Completed 5/2
Apr 31 - May 5 Test and optimize on the Aware machine
Submit to the 210 sorting competition (due May 4)
Completed 5/4
Completed 5/4
May 6 - May 12 Write benchmarking code with CUDA thrust
Prepare for final presentation
Completed 5/11
Completed 5/11