toil.test.sort.restart_sort

A demonstration of Toil. Sorts the lines of a file into ascending order using a parallel merge sort. This version is intentionally buggy: it does not include restart(), so that it can be used for testing.

Module Contents

Functions

setup(job, inputFile, N, downCheckpoints, options)

Sets up the sort.

down(job, inputFileStoreID, N, path, downCheckpoints, ...)

Input is a file, a subdivision size N, and a path in the hierarchy of jobs.

up(job, inputFileID1, inputFileID2, path, options[, ...])

Merges the two files and places them in the output.

sort(file)

Sorts the given file.

merge(fileHandle1, fileHandle2, outputFileHandle)

Merges together two files maintaining sorted order.

copySubRangeOfFile(inputFile, fileStart, fileEnd)

Copies the range (in bytes) between fileStart and fileEnd to the given output file handle.

getMidPoint(file, fileStart, fileEnd)

Finds the point in the file to split.

makeFileToSort(fileName[, lines, lineLen])

main([options])

Attributes

defaultLines

defaultLineLen

sortMemory

toil.test.sort.restart_sort.defaultLines = 1000
toil.test.sort.restart_sort.defaultLineLen = 50
toil.test.sort.restart_sort.sortMemory = '600M'
toil.test.sort.restart_sort.setup(job, inputFile, N, downCheckpoints, options)

Sets up the sort. Returns the FileID of the sorted file.

toil.test.sort.restart_sort.down(job, inputFileStoreID, N, path, downCheckpoints, options, memory=sortMemory)

Input is a file, a subdivision size N, and a path in the hierarchy of jobs. If the range is larger than the threshold N, it is divided recursively and a follow-on job is created to merge the results back together; otherwise the file is sorted and placed in the output.
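The divide-then-merge recursion described above can be sketched without Toil. This is a minimal in-memory illustration, not the module's implementation: the function name `merge_sort_lines` is hypothetical, the threshold counts lines rather than bytes, and the two halves are processed sequentially rather than as child jobs with a follow-on merge job.

```python
import heapq


def merge_sort_lines(lines, threshold):
    """Sort a list of lines by recursive subdivision (hypothetical helper).

    Ranges no larger than `threshold` are sorted directly; larger ranges are
    split in half, each half is sorted recursively, and the results are merged.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if len(lines) <= threshold:
        # Small enough: sort directly (the "else" branch of the description).
        return sorted(lines)
    mid = len(lines) // 2
    left = merge_sort_lines(lines[:mid], threshold)
    right = merge_sort_lines(lines[mid:], threshold)
    # Stands in for the follow-on job that merges the two sorted halves.
    return list(heapq.merge(left, right))


result = merge_sort_lines(["d\n", "a\n", "e\n", "c\n", "b\n"], threshold=2)
```

In a Toil workflow the two recursive calls would presumably run as separate jobs, which is where the parallelism comes from; the sketch only shows the control flow.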

toil.test.sort.restart_sort.up(job, inputFileID1, inputFileID2, path, options, memory=sortMemory)

Merges the two files and places them in the output.

toil.test.sort.restart_sort.sort(file)

Sorts the given file.
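A minimal sketch of sorting a file's lines, assuming the file fits in memory and is rewritten in place. The documentation does not say whether sort() works in place, so that is an assumption, and the helper name `sort_lines_in_place` is hypothetical.

```python
import os
import tempfile


def sort_lines_in_place(path):
    """Read all lines, sort them, and overwrite the file (hypothetical helper)."""
    with open(path) as handle:
        lines = handle.readlines()
    lines.sort()
    with open(path, "w") as handle:
        handle.writelines(lines)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "unsorted.txt")
    with open(demo_path, "w") as handle:
        handle.write("pear\napple\nfig\n")
    sort_lines_in_place(demo_path)
    with open(demo_path) as handle:
        sorted_text = handle.read()
```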

toil.test.sort.restart_sort.merge(fileHandle1, fileHandle2, outputFileHandle)

Merges together two files maintaining sorted order.

All handles must be text-mode streams.
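The merge step can be illustrated with a standard two-way merge over text-mode streams. This is a sketch under the assumption that both inputs are already sorted and that every line ends with a newline; the name `merge_sorted_streams` is hypothetical and is not the module's function.

```python
import io


def merge_sorted_streams(handle1, handle2, out_handle):
    """Merge two sorted text streams line by line into out_handle."""
    line1 = handle1.readline()
    line2 = handle2.readline()
    # readline() returns "" at end of file, which ends the loop.
    while line1 and line2:
        if line1 <= line2:
            out_handle.write(line1)
            line1 = handle1.readline()
        else:
            out_handle.write(line2)
            line2 = handle2.readline()
    # One stream is exhausted; copy whatever remains of the other.
    while line1:
        out_handle.write(line1)
        line1 = handle1.readline()
    while line2:
        out_handle.write(line2)
        line2 = handle2.readline()


left = io.StringIO("apple\ncherry\n")
right = io.StringIO("banana\ndate\n")
merged = io.StringIO()
merge_sorted_streams(left, right, merged)
```

Only one line from each input is held in memory at a time, which is why a merge like this suits files too large to load whole.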

toil.test.sort.restart_sort.copySubRangeOfFile(inputFile, fileStart, fileEnd)

Copies the range (in bytes) between fileStart and fileEnd to the given output file handle.
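Extracting a byte range from a file can be sketched as follows. The signature above takes no output handle, so this sketch assumes the bytes are simply returned to the caller; that is an assumption, and `read_byte_range` is a hypothetical name. The file is opened in binary mode so that offsets are exact byte positions.

```python
import os
import tempfile


def read_byte_range(path, start, end):
    """Return the bytes of `path` in the half-open range [start, end)."""
    with open(path, "rb") as handle:
        handle.seek(start)
        data = handle.read(end - start)
    if len(data) != end - start:
        raise ValueError("requested range extends past the end of the file")
    return data


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as handle:
        handle.write(b"alpha\nbravo\ncharlie\n")
    # Bytes 6..11 hold the second line, including its newline.
    chunk = read_byte_range(sample_path, 6, 12)
```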

toil.test.sort.restart_sort.getMidPoint(file, fileStart, fileEnd)

Finds the point in the file to split. Returns an int i such that fileStart <= i < fileEnd.
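One way to choose such a split point is to jump to the middle of the range and advance to the next line boundary, so that neither half contains a partial line. The sketch below assumes the returned offset is the position of a newline byte and that every line in the range ends with a newline; both are assumptions, and `find_split_point` is a hypothetical name.

```python
import os
import tempfile


def find_split_point(path, start, end):
    """Return the offset i of a newline byte with start <= i < end."""
    with open(path, "rb") as handle:
        mid = (start + end) // 2
        handle.seek(mid)
        line = handle.readline()  # remainder of the line containing `mid`
        if line and mid + len(line) < end:
            return mid + len(line) - 1
        # No usable boundary after the midpoint: fall back to the end of
        # the first line in the range.
        handle.seek(start)
        line = handle.readline()
        return start + len(line) - 1


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as handle:
        handle.write(b"alpha\nbravo\ncharlie\n")  # 20 bytes in total
    split = find_split_point(sample_path, 0, 20)
```

With this input the midpoint is byte 10, inside "bravo", and the next newline sits at byte 11, so the range splits into bytes 0 to 11 and bytes 12 to 19.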

toil.test.sort.restart_sort.makeFileToSort(fileName, lines=defaultLines, lineLen=defaultLineLen)

toil.test.sort.restart_sort.main(options=None)