Why does choosing the number of ways (k) in a multiway merge sort involve a trade-off?
Higher ways reduce disk I/O but increase memory usage.
Lower ways improve cache locality but decrease sorting speed.
Higher ways simplify the algorithm but limit dataset size.
Lower ways are faster for small datasets but slower for large ones.
Which of the following scenarios would be an ideal use case for external sorting?
Sorting a small array of integers within a mobile app
Reordering a linked list in a real-time graphics engine
Generating a leaderboard from a massive online gaming database that does not fit in main memory
Sorting a list of recently accessed files by timestamp
In external sorting, what is a 'run' in the context of multiway merge sort?
A single element in the unsorted data
A portion of the data that is sorted in memory
The final merged and sorted output
The total number of sorted files
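To make the term 'run' concrete, here is a minimal sketch of the run-generation phase. The function name `make_runs` and the `run_size` parameter are hypothetical, and the runs are kept in Python lists for illustration, whereas a real external sort would write each one to disk.

```python
from itertools import islice


def make_runs(values, run_size):
    """Split an iterable into sorted runs of at most run_size items each."""
    it = iter(values)
    runs = []
    while True:
        # Read only as many items as "fit in memory" (run_size is an assumption).
        chunk = list(islice(it, run_size))
        if not chunk:
            break
        chunk.sort()        # in-memory sort of one chunk
        runs.append(chunk)  # a real external sort would write this run to disk
    return runs


runs = make_runs([9, 4, 7, 1, 8, 2, 6, 3, 5], run_size=4)
print(runs)
```

Each inner list is one run; the merge phase then combines the runs into the final sorted output.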
What is a potential drawback of using a high number of ways (e.g., 1024-way) in a multiway merge sort for external sorting?
Higher complexity in managing the merging of numerous runs.
Reduced efficiency in handling datasets with high entropy.
Decreased performance due to excessive disk I/O operations.
Significantly increased memory consumption for buffering.
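As a rough back-of-the-envelope illustration of how buffering scales with the number of ways, the sketch below assumes a simple model with one input buffer per run plus one output buffer, all of the same size. The function name and the 1 MiB buffer size are assumptions chosen for the example, not values from any particular system.

```python
MIB = 1024 * 1024


def merge_buffer_bytes(k, buffer_bytes):
    """Memory for a k-way merge under the assumed model:
    k input buffers (one per run) plus one output buffer."""
    return (k + 1) * buffer_bytes


for k in (2, 32, 1024):
    print(k, merge_buffer_bytes(k, MIB) // MIB, "MiB")
```

Under this model, memory grows linearly with k, which is why very wide merges may force smaller buffers or more memory.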
What is a potential use case for parallel sorting in a distributed system?
Sorting data within a single process on a web server.
Sorting sensor data collected from multiple devices in real-time.
Sorting the files in a directory on a personal computer.
Sorting the contents of a small in-memory database table.
How does the 'k-way merge' in multiway merge sort relate to disk I/O efficiency?
'k' represents the number of sorting algorithms used, not the I/O impact
Higher 'k' always leads to the fewest I/O operations, regardless of data size
Lower 'k' reduces memory usage but might increase disk I/O
The optimal 'k' is independent of the available memory size
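A k-way merge can be sketched with a min-heap that holds one candidate element per run. This is an illustrative in-memory version: the name `k_way_merge` is hypothetical, and each list stands in for a sorted run that an external sort would stream from disk through its own buffer.

```python
import heapq

_END = object()  # sentinel marking an exhausted run


def k_way_merge(runs):
    """Merge k sorted runs into one sorted list using a min-heap."""
    iters = [iter(run) for run in runs]
    heap = []
    for i, it in enumerate(iters):
        first = next(it, _END)
        if first is not _END:
            heap.append((first, i))  # (value, index of the run it came from)
    heapq.heapify(heap)

    merged = []
    while heap:
        value, i = heapq.heappop(heap)   # smallest head among all runs
        merged.append(value)
        nxt = next(iters[i], _END)       # refill from the same run
        if nxt is not _END:
            heapq.heappush(heap, (nxt, i))
    return merged


merged = k_way_merge([[1, 4, 7, 9], [2, 3, 6, 8], [5]])
print(merged)
```

The heap never holds more than k entries, so the per-element cost is O(log k), while the number of runs read at once (and hence the buffers needed) grows with k.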
How does parallel merge sort leverage multiple cores for improved performance?
It employs a different sorting algorithm on each core for diversity
It divides the data, sorts sub-arrays concurrently, then merges the results
It assigns each element to a separate core for independent sorting
It uses a single core for sorting but multiple cores for data I/O
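The divide, sort concurrently, then merge structure can be sketched as follows. The function name and the `workers` parameter are hypothetical. This version uses a thread pool so that it runs anywhere without extra setup; in CPython, threads mostly illustrate the structure, and a process pool would typically be swapped in to use multiple cores for CPU-bound sorting.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_merge_sort(data, workers=4):
    """Split data into chunks, sort the chunks concurrently, then merge."""
    if not data:
        return []
    chunk = -(-len(data) // workers)  # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is sorted independently of the others.
        sorted_parts = list(pool.map(sorted, parts))
    # Final step: merge the sorted chunks into one sorted sequence.
    return list(heapq.merge(*sorted_parts))


result = parallel_merge_sort([5, 3, 8, 1, 9, 2, 7], workers=3)
print(result)
```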
Why are distributed systems often well-suited for implementing parallel sorting algorithms?
They provide a natural way to divide data and processing across multiple nodes
Network latency is negligible in modern distributed systems
Distributed systems inherently prevent data races in parallel processing
Distributed systems automatically choose the optimal sorting algorithm
What is the space complexity of Timsort in its typical implementation?
O(1) - Constant space
O(n) - Linear space
O(n log n) - Log-linear space
O(log n) - Logarithmic space
What is the primary advantage of using a multiway merge sort over a standard two-way merge sort in external sorting?
Minimized disk I/O operations
Simplified implementation
Reduced memory consumption
Improved time complexity in all cases
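The effect of the merge width on disk I/O can be estimated by counting merge passes, since each pass reads and writes the whole dataset once. The sketch below assumes every pass merges up to k runs at a time; the function name `merge_passes` is hypothetical.

```python
def merge_passes(num_runs, k):
    """Number of merge passes needed to reduce num_runs sorted runs to one,
    merging up to k runs at a time (equivalent to ceil(log_k(num_runs)))."""
    if k < 2:
        raise ValueError("k must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // k)  # ceiling division: runs left after a pass
        passes += 1
    return passes


for k in (2, 32, 1024):
    print(k, merge_passes(1024, k))
```

With 1024 initial runs, a two-way merge needs 10 passes over the data, a 32-way merge needs 2, and a 1024-way merge needs 1.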