David Mitlyng

Weekly Takeaways - October 28, 2024

Updated: Nov 1

The Other AI Elephant in the Room

For all the talk about the benefits of Artificial Intelligence (AI), there is an ugly downside. Most people are concerned about AI as a humanity killer: gaining sentience and running amok, being used to develop a super virus, or commanding killer drone swarms.

But there is another elephant in the room - and it is really hungry. AI has a nearly insatiable appetite for data, manpower, energy, and water.

For the most part, the AI industry has responded with a brute force solution: building more data centers that are powered by nuclear power plants. But there is a more elegant solution: better timing synchronization that allows distributed databases to efficiently send and receive data and optimizes power-hungry AI.

It may seem non-intuitive, but timing precision reduces surge events in databases, eliminates centralized nodes, and reduces the effort required to work with the database. Meta and NVIDIA found that a synchronization improvement of 80x made the distributed database run 3x faster - "an incredible performance boost."

And better timing helps address a new problem created by more powerful AI models: power surges.

AI requires a lot of processors working together, each of which has small power surges of its own. To ensure that these surges aren't aligned, these parallel processors need to de-sync via sync (see below).

AI (or, actually, our demand for better AI) is an insatiable beast. Timing can help keep it trim.

Last Week's Theme: Any Port in a Storm


Industry News


Conferences


The More You Know...

"Accurate timing is a new compute glue" for data centers being developed for AI.

Over the past decade there has been a trend of data centers moving "from centralized systems to more decentralized and now more distributed systems" to improve resiliency, scalability and efficiency.

But this also creates new issues that precise synchronization can help mitigate:

  • Congestion reduction - complex deep learning models typically use "all to all" collective operations, which is "one of the most expensive workloads in large data centers." Creating accurate transmission windows and communication sequences can "put some order to the chaos," reducing data congestion that leads to packet drop and retransmission.

  • Data consistency - data sequencing, high frequency telemetry and performance analysis require accurate time stamping for consistency, ordering, debugging and monitoring.

  • Efficiency - storage and databases use accurate timing to accelerate cache coherency and reduce database locking time. Better synchronization shrinks the Window of Uncertainty, which means less data traffic, higher server utilization, and reduced costs.

  • Latency reduction - precision timing streamlines data processing across infrastructure sites.

  • Security - synchronization enhances security measures, ideally leveraging a PNT parent signal that cannot be spoofed or jammed (unlike GNSS).
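
To make the Window of Uncertainty idea a little more concrete, here is a minimal Python sketch. It assumes each server knows a bound on its own clock error (`uncertainty_s`, a hypothetical parameter) and uses a simplified commit rule of waiting out twice that bound before releasing a lock. All names are illustrative and are not taken from any product or talk mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """A timestamp expressed as the range the true time must lie in."""
    earliest: float
    latest: float


def now_interval(local_clock_s: float, uncertainty_s: float) -> TimeInterval:
    # Assumption: the true time is within +/- uncertainty_s of the local clock.
    return TimeInterval(local_clock_s - uncertainty_s,
                        local_clock_s + uncertainty_s)


def definitely_before(a: TimeInterval, b: TimeInterval) -> bool:
    # Two events can be safely ordered only if their intervals do not overlap.
    return a.latest < b.earliest


def commit_wait_s(uncertainty_s: float) -> float:
    # Simplified rule (an assumption of this sketch): hold the lock for the
    # full width of the uncertainty window before making a write visible.
    return 2 * uncertainty_s


# Two writes stamped 1.5 ms apart on different servers.
loose_a = now_interval(1.0000, uncertainty_s=1e-3)
loose_b = now_interval(1.0015, uncertainty_s=1e-3)
tight_a = now_interval(1.0000, uncertainty_s=1e-4)
tight_b = now_interval(1.0015, uncertainty_s=1e-4)
```

With a 1 ms bound the two writes cannot be ordered (`definitely_before(loose_a, loose_b)` is false), so the database has to coordinate or wait. With a 0.1 ms bound the same two writes can be ordered directly, and the lock held under this toy rule shrinks in proportion to the bound.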

With the advent of AI-centric data centers, there is an additional need for precise synchronization.

Processing the data in Large Language Models (LLMs) as fast as possible requires horizontal expansion with a lot of machines working together. These machines and processes tend to run differently, which creates inefficiencies and tail latency that engineers strive to remove.

But this optimization creates other problems.

Individually, these machines have small power spikes that, if aligned, can lead to catastrophe. It wasn't a problem until recently, but as the size of LLMs has grown, so has the number of machines that "require a lot of parallelization," causing power "spikes showing up close to each other."

So data centers need to be synchronized to de-sync these potentially catastrophic power spikes, thereby allowing them to run more efficiently and consume less power and water.
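
The "de-sync via sync" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular data center does it: every machine spikes once per period for a fixed number of ticks, all machines share an accurate common clock, and each machine is given a staggered start offset. The function names and the numbers are hypothetical.

```python
from typing import List


def staggered_offsets(n_machines: int, period_ticks: int) -> List[int]:
    # Spread the machines' spike start times evenly across one period.
    # This only works if every machine agrees on when the period starts,
    # which is where the shared, accurate clock comes in.
    return [(i * period_ticks) // n_machines for i in range(n_machines)]


def peak_spiking_machines(offsets: List[int],
                          period_ticks: int,
                          spike_ticks: int) -> int:
    # Count, tick by tick, how many machines are spiking at once and
    # return the worst case over one period.
    peak = 0
    for t in range(period_ticks):
        active = sum(1 for off in offsets
                     if (t - off) % period_ticks < spike_ticks)
        peak = max(peak, active)
    return peak


# Eight machines, an 80-tick period, and a 10-tick spike per machine.
aligned = [0] * 8                      # every machine spikes together
staggered = staggered_offsets(8, 80)   # spikes spread across the period

aligned_peak = peak_spiking_machines(aligned, 80, 10)
staggered_peak = peak_spiking_machines(staggered, 80, 10)
```

In this toy setup the aligned schedule has all eight machines spiking on the same ticks, while the staggered schedule never has more than one spiking at a time. The tighter the clocks agree, the narrower each machine's slot can be without spikes drifting into one another.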

 

 
