PhD Thesis:
Efficient Data Structures for Cache Architectures (2007)
[ PDF ]
1321 downloads since Jan 2008. Check out the real-time stats here.
Doctoral Citations:
Dr Askitis has developed dramatic improvements to some of the
most fundamental string data structures used in computation by redesigning
them for use on current computer architectures. His new string algorithms
are much faster and require much less space than their predecessors, some
of which have been the subject of research for over fifty years. These
improvements have the potential to greatly improve the efficiency of a wide
range of computing applications including databases, web search and data
processing.
Publications and Software
Legal note:
The software provided is for non-commercial use only. By using my software, you accept the terms and conditions outlined in the README file (found in the software package).
Please contact me if you want to use or modify the software in a product or commercial/research environment. Thank you.
The software is also periodically updated; check back for updates.
- N. Askitis and R. Sinha,
RepMaestro: Scalable Repeat Detection on Disk-based Genome Sequences
[ PDF ]
[ Software ] [ Pre-compiled HG19 datasets ]
Published in the prestigious journal Bioinformatics (Oxford Journals), Advance Access, doi:10.1093/bioinformatics/btq433, July 2010.
- N. Askitis,
OzSort 2.0: Sorting up to 252GB for a Penny
[ PDF ]
Winner of the 2010 Sort Benchmark PennySort contest. The entry was refereed and published on the International Sort Benchmark Home Page,
May 2010 (medal to be awarded at ACM SIGMOD 2010).
- N. Askitis and R. Sinha,
Engineering Scalable, Cache and Space Efficient Tries for Strings
[ PDF ]
[ Software ] [ Datasets ]
Published in the prestigious International VLDB
Journal, Published Online, 2010. DOI: 10.1007/s00778-010-0183-9 ISSN: 1066-8888
- N. Askitis and J. Zobel,
B-tries for Disk-based String Management
[ PDF ] [ Software ] [ Datasets ]
Published in the prestigious International VLDB
Journal, Vol. 18, No. 1, Pg 157, January 2009.
- N. Askitis,
Fast and Compact Hash Tables for Integer Keys
[ PDF ] [ Software ] [ Datasets ]
32nd Australasian Computer Science Conference,
Wellington, New Zealand, January 2009. Vol 91. Page 113. ISBN 978-1-920682-72-9
- N. Askitis and R. Sinha,
OzSort: Sorting over 246GB for a Penny

Refereed and published on the International Sort Benchmark Home Page (medal awarded at ACM SIGMOD 2009). The first Australian team to win the PennySort competition.
- R. Sinha and N. Askitis,
OzSort: Sorting 100GB for less than 87kJoules

Refereed and published on the International Sort Benchmark Home Page (medal awarded at ACM SIGMOD 2009). The first Australian team to win the JouleSort competition, and the first team to demonstrate the application of standard
(non-laptop) computer components to deliver competitive energy-efficient large-scale sorting.
- N. Askitis and J. Zobel,
Redesigning the string hash table, burst trie and BST to exploit cache
Initial submission September 16th 2007, as a candidate for publication in the ACM Journal of Experimental Algorithmics. Initial feedback was received (finally) in late
July 2010, with just a few minor corrections.
The manuscript is now under second-stage review.
- RMIT Technical Report TR-08-4 is available upon request. See PhD Thesis for more details.
- N. Askitis and R. Sinha,
HAT-trie: A Cache-conscious Trie-based Data Structure for Strings
[ PDF ]
[ Software ] [ Datasets ]
In Proceedings of the 30th Australasian Computer Science
Conference, Ballarat, Australia, January 2007.
- N. Askitis and J. Zobel,
Cache-conscious Collision Resolution in String Hash Tables
[ PDF ]
[ Software ] [ Datasets ]
String Processing and Information Retrieval: 12th
International Conference, Buenos Aires, Argentina, November
2-4, 2005.
Pre-processed HG19 dataset
You can download the complete pre-processed human chromosome sequence (hg19) below. The package includes the following files
for each chromosome in hg19, all of which are fully compatible with the latest version of RepMaestro:
- One-line plain-text file (.dat) (with no FastA headers).
- FastA header file (.header)
- 4n unsigned suffix array
- 4n unsigned LCP array
- 1n unsigned BWT array
- 9n unsigned SLB array
8n unsigned arrays can also be provided upon request. These files are compressed using the popular freeware bzip2 and are hosted on
a high-bandwidth server. The total download size/disk-space requirement is about 31GB, though you can select and download individual chromosomes.
This service will be online soon.
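
For readers unfamiliar with the suffix, LCP and BWT arrays listed above, the following is a minimal sketch of what each one holds for a toy sequence. It is illustrative only and is not RepMaestro's construction code: the arrays are built naively, the function names are hypothetical, and the reading of "4n" as four bytes (one 32-bit unsigned integer) per sequence character, along with the little-endian byte order, is an assumption. The actual on-disk layout is described in the software package, not here.

```python
import struct


def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b, n = sa[k - 1], sa[k], 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def bwt(text, sa):
    """Character preceding each sorted suffix (wraps around for the suffix at 0)."""
    return "".join(text[i - 1] for i in sa)


def pack_u32(values):
    """Pack integers as 32-bit unsigned values: 4 bytes each (byte order assumed)."""
    return struct.pack(f"<{len(values)}I", *values)


text = "banana$"            # "$" is a sentinel that sorts before every letter
sa = suffix_array(text)
print(sa)                   # [6, 5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 0, 1, 3, 0, 0, 2]
print(bwt(text, sa))        # annb$aa
print(len(pack_u32(sa)))    # 28 bytes, i.e. 4 bytes per character of text
```

Under that assumption, a suffix array for a chromosome of n bases occupies about 4n bytes, while the BWT needs only one byte per base (1n). The naive sort above is far too slow for genome-scale input and is shown only to make the definitions concrete.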