In external sorting, what is a 'run' in the context of multiway merge sort?
The total number of sorted files
A chunk of the data that is sorted in memory and written out as a sorted sequence
A single element in the unsorted data
The final merged and sorted output
How does the 'k-way merge' in multiway merge sort relate to disk I/O efficiency?
Higher 'k' always leads to the fewest I/O operations, regardless of data size
Lower 'k' reduces memory usage but might increase disk I/O
'k' represents the number of sorting algorithms used, not the I/O impact
The optimal 'k' is independent of the available memory size
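The link between runs, 'k', and the number of merge passes in the two questions above can be sketched in Python. This is a minimal illustration, not a production external sort: the input is passed in as an in-memory list for brevity, and the helper names (write_runs, read_run, merge_pass, external_sort) and the one-integer-per-line file layout are assumptions made for the example.

```python
import heapq
import os
import tempfile


def write_runs(values, run_size, directory):
    """Sort run_size-element chunks in memory and write each one to disk as a run."""
    paths = []
    for start in range(0, len(values), run_size):
        run = sorted(values[start:start + run_size])  # in-memory sort of one chunk
        path = os.path.join(directory, f"run_{len(paths)}.txt")
        with open(path, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        paths.append(path)
    return paths


def read_run(path):
    """Stream one sorted run from disk, one integer per line."""
    with open(path) as f:
        for line in f:
            yield int(line)


def merge_pass(paths, k, directory, pass_no):
    """Merge groups of up to k runs into longer runs; one pass reads and writes all data once."""
    merged = []
    for i in range(0, len(paths), k):
        out = os.path.join(directory, f"pass{pass_no}_{len(merged)}.txt")
        with open(out, "w") as f:
            for v in heapq.merge(*(read_run(p) for p in paths[i:i + k])):
                f.write(f"{v}\n")
        merged.append(out)
    return merged


def external_sort(values, run_size, k):
    """Return (sorted values, number of merge passes) for a k-way merge."""
    with tempfile.TemporaryDirectory() as d:
        paths = write_runs(values, run_size, d)
        passes = 0
        while len(paths) > 1:
            paths = merge_pass(paths, k, d, passes)
            passes += 1
        result = list(read_run(paths[0])) if paths else []
    return result, passes
```

With 100 values and a run size of 10 there are 10 initial runs. Merging them 2 at a time takes 4 passes over the data, while merging all 10 at once takes 1 pass, at the cost of keeping 10 input streams open (and buffered) at the same time.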
Why are distributed systems often well-suited for implementing parallel sorting algorithms?
Distributed systems automatically choose the optimal sorting algorithm
Distributed systems inherently prevent data races in parallel processing
Network latency is negligible in modern distributed systems
They provide a natural way to divide data and processing across multiple nodes
What is the significance of the minimum run size ('minrun') parameter in Timsort's implementation?
It controls the maximum depth of recursion allowed during the merge process, limiting space complexity.
It specifies the minimum number of elements that will trigger the use of Timsort; smaller datasets are sorted using a simpler algorithm.
It sets the threshold for switching from Merge sort to Quicksort during the sorting process.
It sets the minimum length of a run; natural runs shorter than this are extended to 'minrun' elements using Insertion sort.
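One common way to pick 'minrun' can be sketched as follows. The rule below follows the one described in CPython's list-sort notes as commonly presented (take the top six bits of n and add 1 if any lower bit is set); the function name compute_minrun is an assumption for this example, and other implementations may use different constants.

```python
def compute_minrun(n):
    """Pick a minimum run length for an array of n elements (CPython-style rule)."""
    r = 0  # becomes 1 if any 1-bit is shifted off below
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r
```

Arrays shorter than 64 elements get minrun equal to their length, so they are effectively handled as a single run. For larger arrays the result falls between 32 and 64, chosen so that n divided by minrun is at or just under a power of two, which tends to keep the later merges balanced.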
What is the space complexity of Timsort in its typical implementation?
O(n) - Linear space
O(log n) - Logarithmic space
O(1) - Constant space
O(n log n) - Log-linear space
Which of the following scenarios would be an ideal use case for external sorting?
Generating a leaderboard from a massive online gaming database
Sorting a small array of integers within a mobile app
Sorting a list of recently accessed files by timestamp
Reordering a linked list in a real-time graphics engine
How does parallel merge sort leverage multiple cores for improved performance?
It assigns each element to a separate core for independent sorting
It employs a different sorting algorithm on each core for diversity
It uses a single core for sorting but multiple cores for data I/O
It divides the data, sorts sub-arrays concurrently, then merges the results
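The divide, sort concurrently, then merge pattern can be sketched as below. This is a simplified illustration under stated assumptions: the function name parallel_merge_sort and the workers parameter are made up for the example, and a thread pool is used only to keep the sketch self-contained. In standard CPython builds, threads do not speed up CPU-bound sorting because of the global interpreter lock, so a process pool would be the usual substitute for real multi-core work.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_merge_sort(values, workers=4):
    """Split values into chunks, sort the chunks concurrently, then merge them."""
    if len(values) < 2:
        return list(values)
    workers = max(1, workers)
    chunk = -(-len(values) // workers)  # ceiling division: elements per chunk
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(pool.map(sorted, parts))  # each chunk sorted independently
    return list(heapq.merge(*sorted_parts))  # k-way merge of the sorted chunks
```

The final merge here is sequential; more elaborate variants also parallelize the merge step.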
How does Timsort identify and leverage existing sorted subsequences ('runs') within the input data?
It performs a preliminary pass over the data using a hash table to mark sorted elements.
It iterates through the data, detecting sequences where elements are in ascending (non-decreasing) or strictly descending order.
It recursively divides the array until it reaches sub-arrays of size 1, which are inherently sorted.
It uses a divide-and-conquer approach to identify the median of the data and splits runs based on that.
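Run detection of this kind can be sketched in a few lines. This is an illustrative simplification rather than any library's actual code; the names count_run and natural_runs are assumptions for the example. Descending runs must be strictly descending so that reversing them in place cannot reorder equal elements, which preserves stability.

```python
def count_run(a, lo):
    """Return (end, descending) for the natural run a[lo:end] starting at index lo."""
    hi = lo + 1
    if hi >= len(a):
        return len(a), False  # a single trailing element is a run by itself
    if a[hi] < a[lo]:
        # Strictly descending run: every element is smaller than the previous one.
        while hi + 1 < len(a) and a[hi + 1] < a[hi]:
            hi += 1
        return hi + 1, True
    # Non-decreasing run: equal neighbours are allowed.
    while hi + 1 < len(a) and a[hi + 1] >= a[hi]:
        hi += 1
    return hi + 1, False


def natural_runs(a):
    """Split a into natural runs, reversing descending ones in place.

    Returns a list of (start, end) index pairs, each covering one ascending run.
    """
    runs = []
    lo = 0
    while lo < len(a):
        end, descending = count_run(a, lo)
        if descending:
            a[lo:end] = a[lo:end][::-1]  # turn the descending run into an ascending one
        runs.append((lo, end))
        lo = end
    return runs
```

For the input [1, 2, 2, 5, 4, 3, 1, 7, 8] this finds the runs (0, 4), (4, 7), and (7, 9), and the middle run 4, 3, 1 is reversed to 1, 3, 4.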
In parallel quick sort, what is the impact of choosing a pivot element on performance?
Only a randomly chosen pivot guarantees optimal parallel efficiency
The pivot should always be the first element in each partition
Pivot selection is irrelevant in a parallel context
A poorly chosen pivot can lead to unbalanced workloads across cores
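The effect of pivot choice on workload balance can be made concrete with a small sketch. The helper names partition_sizes and median_of_three are assumptions for this example, and median-of-three is shown only as one common heuristic, not as the required strategy.

```python
def partition_sizes(values, pivot):
    """Return how many elements fall below and above the pivot."""
    left = sum(1 for v in values if v < pivot)
    right = sum(1 for v in values if v > pivot)
    return left, right


def median_of_three(values):
    """Pick the median of the first, middle, and last elements as the pivot."""
    first, mid, last = values[0], values[len(values) // 2], values[-1]
    return sorted((first, mid, last))[1]
```

On the already sorted input 1..101, using the first element as the pivot yields partitions of sizes 0 and 100, so one core would receive all the remaining work. The median-of-three pivot is 51, which yields two partitions of 50 elements each.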
Which of these applications is LEAST likely to benefit significantly from parallel sorting?
Real-time fraud detection in financial transactions
Analyzing large-scale genomic data for disease research
Climate modeling simulations on a supercomputer
Sorting a small list of contacts in a mobile phone app