Dr. Nikolas Askitis

Specializing in Data Structures, Sorting, and Scalable Genome Repeat Detection

Email: email
Feel free to contact me if you have any questions regarding my publications or software (in particular for the latest addition: RepMaestro).
I plan to release the source code for my data structures just as soon as I'm done with my publications.  Until then, the binaries
are available for evaluation.  My C.V. can be provided upon request.


PhD Thesis: Efficient Data Structures for Cache Architectures (2007)     [ PDF ]  
1321 downloads since Jan 2008. Check out the real-time stats here

Doctoral Citations:

Dr Askitis has developed dramatic improvements to some of the most fundamental string data structures used in computation by redesigning them for use on current computer architectures. His new string algorithms are much faster and require much less space than their predecessors, some of which have been the subject of research for over fifty years. These improvements have the potential to greatly improve the efficiency of a wide range of computing applications including databases, web search and data processing.


Publications and Software

Legal note:

The software provided is for non-commercial use only. By using my software, you accept the terms and conditions outlined in the README file (found in the software package). Please contact me if you want to use or modify the software in a product or commercial/research environment. Thank you. The software is also periodically updated; check back for updates.

  1. N. Askitis and R. Sinha,   RepMaestro: Scalable Repeat Detection on Disk-based Genome Sequences    [ PDF ]   [ Software ]   [ Pre-compiled HG19 datasets ]  
    Published in the prestigious Bioinformatics Oxford Journal, Advance Access, doi:10.1093/bioinformatics/btq433, July 2010.

  2. N. Askitis,   OzSort 2.0: Sorting up to 252GB for a Penny     [ PDF ]
    Winner of the 2010 Sort Benchmark PennySort contest. Entry was refereed and published on the International Sort Benchmark Home Page in May 2010 (medal to be awarded at ACM SIGMOD 2010).

  3. N. Askitis and R. Sinha,   Engineering Scalable, Cache and Space Efficient Tries for Strings       [ PDF ]   [ Software ]   [ Datasets ]
    Published in the prestigious International VLDB Journal, published online, 2010. DOI: 10.1007/s00778-010-0183-9 ISSN: 1066-8888

  4. N. Askitis and J. Zobel,   B-tries for Disk-based String Management       [ PDF ]   [ Software ]   [ Datasets ]
    Published in the prestigious International VLDB Journal, Vol. 18, No. 1, p. 157, January 2009.

  5. N. Askitis,   Fast and Compact Hash Tables for Integer Keys     [ PDF ]   [ Software ]   [ Datasets ]
    32nd Australasian Computer Science Conference, Wellington, New Zealand, January 2009. Vol 91. Page 113. ISBN 978-1-920682-72-9

  6. N. Askitis and R. Sinha,   OzSort: Sorting over 246GB for a Penny    
    Refereed and published on the International Sort Benchmark Home Page (medal awarded at ACM SIGMOD 2009). The first Australian team to win the PennySort competition.

  7. R. Sinha and N. Askitis,   OzSort: Sorting 100GB for less than 87kJoules    
    Refereed and published on the International Sort Benchmark Home Page (medal awarded at ACM SIGMOD 2009). The first Australian team to win the JouleSort competition, and the first team to demonstrate the use of standard (non-laptop) computer components to deliver competitive, energy-efficient large-scale sorting.

  8. N. Askitis and J. Zobel,  Redesigning the string hash table, burst trie and BST to exploit cache
    Initial submission September 16th 2007, candidate for publication in the ACM Journal of Experimental Algorithmics. Initial feedback received (finally) in late July 2010, with just a few minor corrections.
    Manuscript is now under second stage review. 
    • RMIT Technical Report TR-08-4 is available upon request. See PhD Thesis for more details.

  9. N. Askitis and R. Sinha,   HAT-trie: A Cache-conscious Trie-based Data Structure for Strings      [ PDF ]   [ Software ]   [ Datasets ]
    In Proceedings of the 30th Australasian Computer Science Conference, Ballarat, Australia, January 2007.

  10. N. Askitis and J. Zobel,   Cache-conscious Collision Resolution in String Hash Tables      [ PDF ]   [ Software ]   [ Datasets ]
    String Processing and Information Retrieval: 12th International Conference, Buenos Aires, Argentina, November 2-4, 2005.

Pre-processed HG19 dataset

You can download the complete pre-processed human chromosome sequence (hg19) below.  The software package includes the following files
for each chromosome in hg19, all of which are fully compatible with the latest version of RepMaestro: 
8n unsigned arrays can also be provided upon request. These files are compressed with the freely available bzip2 and are hosted on 
a high-bandwidth server.  The total download size (and disk-space requirement) is about 31GB, though you can select and download individual chromosomes.
This service will be online soon.  
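
Since the files are bzip2-compressed and large, it may help to decompress them in a streaming fashion rather than loading a whole chromosome into memory. The following is only a minimal sketch using Python's standard bz2 module; the function name decompress_bz2 and the file names in the usage note are hypothetical and are not part of the RepMaestro package.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src: Path, dst: Path, chunk_size: int = 1 << 20) -> int:
    """Stream-decompress the bzip2 file `src` into `dst`.

    Data is copied in chunks of `chunk_size` bytes, so memory use stays
    small even for multi-gigabyte inputs. Returns the size in bytes of
    the decompressed output file.
    """
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst.stat().st_size
```

For example, decompress_bz2(Path("chr1.example.bz2"), Path("chr1.example")) would write the decompressed file next to the archive (both file names are placeholders). The command-line tool offers the same result with bzip2 -d, or bzip2 -dk to keep the compressed copy.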
 

String datasets