Many modern CPUs (e.g. Intel, AMD), GPUs and TPUs include SIMD instructions. Following the motto “don’t pay for what you do not use”, I would rather have more cores. Even the ARM-based servers (Graviton, Ampere Altra) have NEON vector instructions, though those smaller vectors seem to let them fit more cores (among other trade-offs). Even on consumer chips, vector instructions take up a noticeable amount of die space [1].

The workload looks like this:

- Throughput matters more than latency.
- The problem is embarrassingly parallel and can be divided to suit any number of cores.
- Cores could be register or stack machines.
- Cores need access to >100 MB.
- The algorithm does not depend on the order of communication between cores, but inter-core communication is required (shared memory or message passing is fine).
- The workload is branch-rich, so SIMD/SIMT is not possible.
- The workload is mostly integer instructions.
- The workload is memory bound, not compute bound.

Can you think of any off-the-shelf or rentable hardware better suited to this workload than many-core ARM chips with vector instructions I won’t use?

[1] https://www.igorslab.de/wp-content/uploads/2021/12/alder_lake_die_2-980×495.png

Notes on hardware already considered:

- GPUs tend to have huge vector widths.
- Xeon Phi has huge vectors.
- https://www.greenarraychips.com/ ’s cores are too small, have too little RAM, and are not available in large machines.
- Adapteva chips are not taped out on a competitive node.
- Sunway SW26010 has huge vectors.
- Graphcore has huge vectors.
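To make the workload shape concrete, here is a hypothetical sketch (not the actual code, and the graph layout, constants, and function names are all made up for illustration): each independent work item chases pointers through a table, takes a data-dependent branch on every step, and uses only integer arithmetic. It is memory bound, branch rich, and embarrassingly parallel across start indices.

```python
NODES = 4096

def build_graph(seed=12345):
    """Deterministic pseudo-random pointer table so the sketch is self-contained."""
    graph = []
    s = seed
    for _ in range(NODES):
        s = (s * 1664525 + 1013904223) % 2**32  # LCG step
        next_a = s % NODES
        s = (s * 1664525 + 1013904223) % 2**32
        next_b = s % NODES
        graph.append((next_a, next_b, s & 0xFF))  # two successors + a payload
    return graph

def walk(graph, start, steps):
    """One work item: an integer-only, branchy pointer chase."""
    acc, cur = 0, start
    for _ in range(steps):
        next_a, next_b, value = graph[cur]
        acc += value
        # Data-dependent branch: SIMD lanes / SIMT warps would diverge
        # here on every step, serializing the whole vector.
        cur = next_a if acc & 1 else next_b
    return acc

if __name__ == "__main__":
    graph = build_graph()
    # Each start index is an independent work item; a real run would hand
    # slices of this range to as many cores as the machine has.
    total = sum(walk(graph, start, 1000) for start in range(64))
    print(total)
```

A vector unit buys nothing here: successive loads depend on the previous branch outcome, so the useful hardware resources are core count, cache, and memory bandwidth.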
Story Published at: August 16, 2022 at 12:57PM