Friction, Morning Routine, RoCE Meta Paper, AWS RNG, Slurm, Rail-Optimized, 800VDC, Phyllo, Approach

Manson AI: You need friction

AI Agent: narrow focus: goal, proof, steps

Bear Grylls’ Morning Routine: Cold exposure (you never get used to it), barefoot, strength training, 30 minutes

RoCE networks for distributed AI training at scale: I finally managed to read the paper! Although two years is an eternity in the AI world, I think it is still interesting.

1) Network Topology: backend network only for GPUs (RDMA NICs), non-blocking. Frontend network: data ingestion, checkpointing, logging.

Pod = AI zone
 leaf = RTSW, DAC cables, shallow buffer
 spine = CTSW, deep buffers. fiber between leaf-spine.
SuperSpine = ATSW, oversubscribed, connects AI zones

intra-node -> NVLink
RoCE: CPU offload, standard Ethernet

the collective communication library serves as the software abstraction between training workloads and the NIC
                                 schedules verbs calls over QPs (Queue Pairs)
 parallelism strategy determines the collective: allreduce, allgather, alltoall
 choice of logical topology:

------------------

2) Routing: the workload has low-entropy flows (few flows), burstiness and elephant flows -> ECMP performs badly (it hashes the 5-tuple: src/dst IP, src/dst UDP port, protocol)
--

 RTSW uplinks 1:2 under-subscribed! -> expensive (short-term)
 1) QP scaling: hash on the destination QP of the RoCE packet, using the UDF capability in the switch, to increase entropy -> Enhanced ECMP -> short-term
 2) Central TE controller -> long-term: CP: real-time topology of the end-to-end cluster,
                                        flow matrix (flow bps) + CSPF (constrained SPF),
                                        results written into the switches’ dataplane
                                     DP: TE overrides the default BGP routing policy in the leaf. Uses an Exact Match table.
                           Not good with multiple link failures. Doesn’t scale
 3) Flowlet switching: tries to improve on 1 and 2. Hardware-assisted scheme. Spreads packets of a flow over different ECMP ports
     out-of-order: move a flow to another port only after an idle gap of about 1/2 RTT
     load-aware path assignment: better than TE
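
The low-entropy problem and the QP scaling idea above can be illustrated with a small Python sketch. This is not how a switch ASIC works internally: CRC32 stands in for the hardware hash, and the number of uplinks, the IP addresses, the UDP source port and the QP numbers are all made-up values for illustration. Only the RoCEv2 UDP destination port (4791) is real.

```python
import zlib
from collections import Counter

UPLINKS = 8  # hypothetical number of ECMP uplinks on a leaf switch


def ecmp_port(fields, n_ports=UPLINKS):
    """Pick an uplink by hashing header fields (CRC32 stands in for the ASIC hash)."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % n_ports


def spread(flows, use_qp):
    """Count how many flows land on each uplink."""
    counts = Counter()
    for src, dst, sport, dst_qp in flows:
        fields = [src, dst, sport, 4791, 17]  # 4791 = RoCEv2 UDP port, 17 = UDP
        if use_qp:
            fields.append(dst_qp)  # the extra field a UDF would expose to the hash
        counts[ecmp_port(fields)] += 1
    return counts


# One GPU pair, same UDP source port, 16 queue pairs (all values hypothetical)
flows = [("10.0.0.1", "10.0.1.1", 49152, qp) for qp in range(16)]

plain = spread(flows, use_qp=False)
enhanced = spread(flows, use_qp=True)
print("5-tuple only:", dict(plain))
print("5-tuple + QP:", dict(enhanced))
```

With the 5-tuple alone, every packet of that GPU pair hashes to the same uplink. Adding the destination QP to the hash input lets the same traffic use several uplinks.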

------------------
                     
3) Transport: congestion management. Started with DCQCN. Packet drops on ACK/NACK can cause a prolonged Local ACK Timeout (LAT)
--
 Tuning DCQCN was not great (strict ECN thresholds -> minimize PFC, which can lead to head-of-line blocking)

 200G: we stayed with relaxed ECN marking, allowing for buffer build-up in the CTSW, while keeping default DCQCN settings.
 400G: we proceeded without DCQCN, just PFC for flow control
 re-design of the collective library: two-stage copy
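
To make "strict" vs "relaxed" ECN marking a bit more concrete, here is a minimal Python sketch of the usual RED-style marking curve that DCQCN relies on in the switch: no marking below Kmin, a linear ramp up to Pmax at Kmax, and always mark above Kmax. The threshold values are invented for illustration and are not the ones from the paper.

```python
def ecn_mark_probability(queue_kb, kmin_kb, kmax_kb, pmax):
    """RED-style ECN marking probability for a given egress queue depth."""
    if queue_kb <= kmin_kb:
        return 0.0                      # queue is short: never mark
    if queue_kb > kmax_kb:
        return 1.0                      # queue is long: always mark
    # linear ramp from 0 at Kmin to Pmax at Kmax
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


# Hypothetical profiles: strict marks early, relaxed lets the buffer build up
strict = dict(kmin_kb=100, kmax_kb=400, pmax=0.2)
relaxed = dict(kmin_kb=1000, kmax_kb=4000, pmax=0.05)

for depth in (50, 250, 2000, 5000):
    s = ecn_mark_probability(depth, **strict)
    r = ecn_mark_probability(depth, **relaxed)
    print(f"queue {depth:>5} KB  strict={s:.3f}  relaxed={r:.3f}")
```

With the relaxed profile a 250 KB queue is not marked at all, so senders are not slowed down and the deeper buffer absorbs the burst.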

------------------

4) Operations:
 Change QoS priority of Clear to Send (CTS) messages. In the RTSW ASIC, modify the DSCP marking for ACK messages
 Tuning VOQ in CTSW
 observability: OOS: out of sequence.
                Link flaps
                Local ACK timeouts (LAT)
                PFC watchdog: catch any long-duration PFC pause (>200ms)
                buffer utilization in RTSW
                reachability (pings)
                constant latency monitoring, loaded and unloaded (catch regressions)
                baselines!!!

Perplexity: Hosting Qwen on Blackwell:

AWS RNG – Random Graph Network: The paper is totally out of my league, but the concept looks brutal. With an operations hat on, how do you troubleshoot it? (ping, traceroute, link congestion, data flow patterns, etc.)

Slurm: I like the “Slurm vs. Kubernetes” part

Slurm Workload Manager (short for Simple Linux Utility for Resource Management) has become a cornerstone of large-scale computing. Originally created in the early 2000s to support large-scale high-performance computing (HPC) environments, Slurm is now widely recognized as the de facto scheduler for HPC clusters. Today, it orchestrates jobs across thousands of servers and GPUs in some of the world’s most advanced computing environments. 

Interview Question: 512 GPUs, non-blocking (full bisection) and 2xUFM! I really liked this. I think for once I understand the rail-optimized design (fat-tree = leaf-spine). Just break one leaf-spine link, beautiful!!!
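
A back-of-the-envelope version of that sizing exercise, as a Python sketch. The 8 GPUs per server and the 64-port switch radix are my assumptions, not numbers from the interview question, so treat the result as one possible answer.

```python
def rail_optimized_plan(gpus=512, gpus_per_server=8, switch_ports=64):
    """Size a two-tier, non-blocking, rail-optimized leaf-spine fabric.

    gpus_per_server and switch_ports are assumptions for illustration.
    """
    servers = gpus // gpus_per_server        # each server has one NIC per rail
    rails = gpus_per_server                  # rail k connects GPU k of every server
    down = switch_ports // 2                 # non-blocking leaf: half the ports face GPUs
    up = switch_ports - down                 # the other half face the spines
    leaves_per_rail = -(-servers // down)    # ceiling division
    leaves = leaves_per_rail * rails
    spines = -(-(leaves * up) // switch_ports)
    return {
        "servers": servers,
        "rails": rails,
        "leaves": leaves,
        "spines": spines,
        "leaf_down": down,
        "leaf_up": up,
        "links_per_leaf_spine_pair": up // spines,
    }


plan = rail_optimized_plan()
print(plan)

# Break one leaf-spine link: that leaf now has fewer uplinks than downlinks
ratio = plan["leaf_down"] / (plan["leaf_up"] - 1)
print(f"oversubscription on the affected leaf: {ratio:.3f}:1")
```

Under these assumptions the fabric needs 16 leaves and 8 spines, and a single broken leaf-spine link is enough to make that leaf slightly oversubscribed (32 downlinks over 31 uplinks).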

800VDC: Next step in electrical infra in DC space.

Phyllo by hand

Approaching women: curiosity, not performance. Practice. Be at peace with discomfort and awkwardness. Rejection as learning

Genghis Khan

Very interesting book. In the Western world we know a lot about the Roman Empire, Alexander the Great, etc., but we don’t often look to Asia. Yet Genghis Khan and the Mongol empires that followed shaped much of world society at the time and up to now.

He was focused on meritocracy. Part of his war strategy was the elimination of the aristocracy of the conquered land. There was a very strong focus on integration. The Mongols never imposed their culture, and there was full freedom of religious belief. They were brutal in war but never cruel. Torture was common in Europe and other empires; for them, it was against their beliefs.

They had a very clear war strategy: travel light, strike fast. They carried little luggage and kept a basic diet. They cleared the path for their horses for advancing and returning, so they destroyed agriculture along their conquest routes.

It is interesting how Genghis Khan’s empire crumbled after his death because he didn’t manage his family properly. Still, the new kingdoms kept a balance for a long time.

And something that reminded me of Rome: they had to keep expanding the empire just to keep the capital happy… They introduced paper money, and women ruled while the men were fighting… and their campaigns lasted years!

Trade was critical for the Mongols. They reached Hungary and the Balkans. They traded slaves with Venice and Genoa.

The climate was critical for their success. When the weather became warmer, their pastures in Mongolia were less productive and they had fewer horses, so the base of their strength was undermined.

They were masters of propaganda, spreading fear so that conquering was easier. The empire was based on a good army, good propaganda and good administration (just think of the sheer size of the empire). They founded public education.

The Mongols unified China. I didn’t know that. They founded Beijing and started the Forbidden City. They created the Chinese identity, but they followed Mongol customs behind the curtains.

And the end of it was the Plague. The Plague stopped commerce and the movement of people. Without the fluid transit of people and goods, they couldn’t keep it together.

And it is really shocking how bad a reputation the Mongols have been given, after their incredible empire and success.

Portokalopita

This is a cake I wanted to try after I visited Greece with my friends. I never got the name of the cake, but after that holiday I did some research and I think I found the name: portokalopita. I got a recipe, but then I did nothing.

So finally, I tried. The taste is similar but the execution is not great. Mine is too runny.

Ingredients

Syrup

  • 1 1/2 cups orange juice (just squeeze it from oranges… I didn’t do it)
  • 1/2 cup water
  • 1 1/2 cups sugar
  • 1 cinnamon stick

Cake

  • 180g phyllo sheets (I think I need double)
  • 4 eggs
  • 1/2 cup sugar
  • 1 cup olive oil
  • 2 tsp vanilla extract
  • 200g yogurt
  • 2 tsp baking powder
  • orange zest from 2 medium oranges

Instructions

  • Preheat the oven to 120C.
  • Place the phyllo sheets on a tray. Cut them into slices so they fit easily. Put them in the oven until hard and crunchy. Turn them over when needed. Remove from the oven and let them cool down.
  • Make the syrup. In a small saucepan mix the orange juice, water, sugar and cinnamon. Bring to the boil, then reduce the heat and simmer for 7 minutes. Set aside to cool.
  • Set the oven to 180C.
  • In a large bowl add the eggs, sugar, oil and vanilla. Beat until frothy.
  • In a smaller bowl, mix the yogurt and baking powder and set aside for 2-3 minutes. Add the yogurt to the egg mix.
  • Gradually add the orange zest and crumbled phyllo to the egg mix.
  • Grease an oven dish and pour in the cake mix. Even it out.
  • Bake for 35-40 minutes or until the top is dark golden.
  • Remove from the oven, make a few slashes with a knife and immediately drizzle the syrup over it slowly.
  • Let the cake sit for 2-3 hours.
  • Keep in the fridge for 1h before serving.

The result:

I think there was too much syrup and my cake mix needed more phyllo

I will try this next time

Cowboy commandments

This is a tiny book I found in a toilet during a holiday a couple of years ago. I bought it a while ago and can’t find it anymore, so this is one from the same author.

Fall in love (only when you can’t help it)

Don’t forget that there are always consequences

When you get bucked off, get back on

Skin your own deer

If it breaks, fix it.

Never cut what you can untie

If you make a mess, clean it up

Talk less and say more

Never betray a Trust

Make apologies, not excuses

Don’t get even, get over it

Don’t waste good money on cheap boots

Help what you can; endure what you can’t

Do it Today, tomorrow is promised to no one.

Never miss a chance to Dance

Act right, behave yourself, do your job, and things will turn out all right

Take care of your knees; you are going to need them all your life

Do your best, that’s all.

AWS SRD, MRC, OSPF, Manson, JEPA, Ultra Ethernet, NCCL, Calypso, GTC2026, ChiNOG12, TRAP, SWAP

AWS owns its networking stacks:

NVIDIA releases MRC: Multipath Reliable Connection – I assume they need to do something to compete with Ultra Ethernet

OSPF shutdown router: I would have to test this. In my opinion, the key point is that although the router’s type-1 LSA is still in the neighbors’ LSDB, SPF ignores it. Still quite interesting; you always learn/re-learn something.

Mark Manson – 10y therapy: I like the very beginning, and then when Chris says you have to go through shit to understand those rules and appreciate them.

JEPA (Joint Embedding Predictive Architecture): LeCun against LLM. Proof.

Microsoft Ultra Ethernet: The first is a bit more interesting as you can see the flows

NCCL cheat codes

Tech Field Day 2025 – Arista AI

GTC2026 CoreRabbit.

GTC2026 Summary

ChiNOG12 Petr Lapukhov

AWS SRD: Why AWS doesn’t use InfiniBand

Calypso Submarine Cable: Interesting to visualize your global network infrastructure

TRAP: How to remember/learn

  • Test: desirable difficulties. Testing helps to retain.
  • Retain: reviewing timing (RemNote)
  • Associate: with something you already know
  • Perform: Use it, build.

Linux SWAP: Because I have many open tabs, sometimes I freeze my laptop. It is a bit old, but it has 8GB of RAM.

When Chrome spikes memory, the kernel may struggle to reclaim fast enough, leading to:

  • Many processes waking → “procs” spike
  • Heavy disk I/O → swap/page reclaim
  • System stalls → direct reclaim + possible OOM pressure

As I use ZFS, the recommendation is not to create extra swap on it. So I created just 1G from my main volume group:

Create logical volume:

sudo lvcreate -L 1G -n swap athens-vg

Format + enable:

sudo mkswap /dev/athens-vg/swap
sudo swapon /dev/athens-vg/swap

Persist:

echo '/dev/athens-vg/swap none swap sw 0 0' | sudo tee -a /etc/fstab

result:


# swapon --show
NAME       TYPE      SIZE USED PRIO
/dev/dm-2  partition 976M   0B   -1
/dev/zram0 partition 3.8G 2.7G  100
# 
# lvs
  LV      VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home    athens-vg -wi-ao----  22.00g                                                    
  root    athens-vg -wi-ao---- <27.94g                                                    
  storage athens-vg -wi-ao---- 186.00g                                                    
  swap_1  athens-vg -wi-ao---- 976.00m                                                    
# 

I also installed earlyoom (it avoids full system lockups by killing memory hogs earlier) and zram-tools (it uses compressed RAM as swap)

root@athens:/boot# dpkg -l | grep zram
ii zram-tools 0.3.7-1 all utilities for working with zram
root@athens:/boot# dpkg -l | grep earlyoom
ii earlyoom 1.9.0-1 amd64 Early OOM Daemon
# systemctl status earlyoom
● earlyoom.service - Early OOM Daemon
     Loaded: loaded (/usr/lib/systemd/system/earlyoom.service; enabled; preset: enabled)
     Active: active (running) since Thu 2026-04-30 08:19:50 BST; 1 week 3 days ago
 Invocation: e3dc68822a604ac6befe12fa2f44b650
       Docs: man:earlyoom(1)
             https://github.com/rfjakob/earlyoom
   Main PID: 1176 (earlyoom)
      Tasks: 1 (limit: 10)
     Memory: 628K (max: 50M, available: 49.3M, peak: 3M)
        CPU: 10.274s
     CGroup: /system.slice/earlyoom.service
             └─1176 /usr/bin/earlyoom -r 3600

May 08 09:06:59 athens earlyoom[1176]: mem avail:  4112 of  6185 MiB (66.48%), swap free: 3678 of 4899 MiB (75.09>
May 08 10:06:59 athens earlyoom[1176]: mem avail:  3741 of  6029 MiB (62.06%), swap free: 3996 of 4899 MiB (81.57>
May 08 15:21:37 athens earlyoom[1176]: mem avail:  3821 of  6005 MiB (63.63%), swap free: 3546 of 4899 MiB (72.39>
May 08 22:52:18 athens earlyoom[1176]: mem avail:  3909 of  5998 MiB (65.17%), swap free: 3719 of 4899 MiB (75.92>
May 09 09:46:15 athens earlyoom[1176]: mem avail:  3855 of  6055 MiB (63.66%), swap free: 3994 of 4899 MiB (81.53>
May 09 17:39:36 athens earlyoom[1176]: mem avail:  3211 of  5856 MiB (54.83%), swap free: 4054 of 4899 MiB (82.75>
May 09 18:39:37 athens earlyoom[1176]: mem avail:  2400 of  5291 MiB (45.37%), swap free: 3890 of 4899 MiB (79.41>
May 09 21:15:33 athens earlyoom[1176]: mem avail:  1925 of  4791 MiB (40.18%), swap free: 4398 of 4899 MiB (89.79>
May 09 22:15:34 athens earlyoom[1176]: mem avail:  2899 of  6040 MiB (48.00%), swap free: 4074 of 4899 MiB (83.17>
May 10 09:00:46 athens earlyoom[1176]: mem avail:  2563 of  5991 MiB (42.79%), swap free: 4315 of 4899 MiB (88.08>
# 
# sudo systemctl status zramswap
● zramswap.service - Linux zramswap setup
     Loaded: loaded (/usr/lib/systemd/system/zramswap.service; enabled; preset: enabled)
     Active: active (exited) since Thu 2026-04-30 08:19:50 BST; 1 week 3 days ago
 Invocation: 9f4e1a2534c8409292782da4512fbae9
       Docs: man:zramswap(8)
   Main PID: 1198 (code=exited, status=0/SUCCESS)
   Mem peak: 3.8M
        CPU: 58ms

Apr 30 08:19:50 athens systemd[1]: Starting zramswap.service - Linux zramswap setup...
Apr 30 08:19:50 athens zramswap[1248]: Setting up swapspace version 1, size = 3.8 GiB (4113920000 bytes)
Apr 30 08:19:50 athens zramswap[1248]: no label, UUID=0728b3a9-007b-4d71-8255-009f509bca63
Apr 30 08:19:50 athens systemd[1]: Finished zramswap.service - Linux zramswap setup.
# 
# zramctl
NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4           3.8G 886.9M 264.6M 449.4M         [SWAP]
# 
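
To see what zram is actually buying me, the numbers in the zramctl output above can be turned into ratios. A small Python sketch, using only the three values shown (DATA is the uncompressed size, COMPR the compressed size, TOTAL the memory zram really uses including its own overhead). The helper name is mine.

```python
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size):
    """Convert a zramctl-style size such as '886.9M' into bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


# Values copied from the zramctl output above
data, compr, total = (to_bytes(s) for s in ("886.9M", "264.6M", "449.4M"))

compression_ratio = data / compr      # how well lz4 squeezed the swapped pages
effective_ratio = data / total        # same, but counting zram's memory overhead
saved_mib = (data - total) / 1024**2  # RAM freed compared with keeping pages uncompressed

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"effective ratio  : {effective_ratio:.2f}x")
print(f"RAM saved        : {saved_mib:.1f} MiB")
```

So at that moment lz4 was compressing at roughly 3.35x, but once the overhead is counted the real gain is closer to 2x, about 437 MiB of RAM.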

Scale-up vs Scale-out: I still keep forgetting the difference

The Man Who Solved The Market

Interesting book about the “start” of quant trading by Jim Simons. Funny that he was a heavy smoker, yet quick, sharp and active till the end. He was a great mathematician and a code breaker! I didn’t know anything about Renaissance. In part, it reminds me of the book by Edward O. Thorp. It was weird that, with so much tech and so many algorithms developed, in key moments he didn’t trust them. It reminds me of Nassim Taleb and the black swans. Still, he never crashed and always made money. I always feel uneasy with this subject. Is it moral? The thing that surprised me the most was the connections of members of his company with Donald Trump and the Brexit referendum, although he himself supported and financed Democrats.

Nepal, Hobby, Alibaba, Sewing, HRT, Potato Salad, SCION, Jets, Quantum, Octopus, Peel, Virginia, Fairwater, Microfluidics, Energy Storage, InfiniBand HPC, Lose a nuke, ECH, NCCL for Network Engineers, RoCE Lossless, Nanog93 Networking AI

Ridgeline VII Nepal: Totally crazy, the views, the descent, the speed…

In the era of AI, take a hobby

Tofu + greens curry

Alibaba Crypto AI Agent

How a sewing machine works

HRT GTC2026 Modern Resource Responsible AI Factory: Interesting to see all the ways to save watts. The emphasis is on copper and water cooling.

German Potato Salad

SCION Routing: I didn’t know that it was used in production

Jet Engines: crazy, amazing

Quantum companies I didn’t know about:

qant.com (Germany)

ionq (USA)

Octopus can rewire RNA: Incredible animal.

Eat the peel: After reading this, I started eating the kiwi’s peel. Not going back. I would like to do something with banana and orange peels, but I don’t want to use a ton of sugar either. I will try my spicy banana bread with peel next time. Orange peels you can dry and keep as an aromatic.

Virginia Air Space Museum: I was there 3 years ago, I think. Amazing. You have an SR-71 Blackbird, a Concorde, a Space Shuttle, etc. Totally worth visiting.

Fairwater: This is already old news in the AI datacenter world. But still interesting at high level.

Microfluidics:  “Tiny channels are etched directly on the back of the silicon chip, creating grooves that allow cooling liquid to flow directly onto the chip and more efficiently remove heat”. 

Liquid air energy storage:

The process works in three stages. First, air is taken in from the surroundings and cleaned. Second, the air is repeatedly compressed until it is at very high pressure. Third, the air is cooled until it becomes liquid, using a multi-stream heat exchanger: a device that includes multiple channels and tubes carrying substances at different temperatures, allowing heat to be transferred between them in a controlled way.

“The energy that we’re pulling from the grid is powering this charging process,” says Cetegen.

When the grid needs extra energy, the liquid air is put to work. It is pumped out of storage and evaporated, becoming a gas again. It is then used to drive turbines, generating electricity for the grid. Afterwards, the air is released back into the atmosphere.

InfiniBand HPC: This is a good intro to InfiniBand; it helped me to refresh the training I did two years ago

Designing HPC Cluster InfiniBand: It seems more practical, as you have the different types of deployment based on the required nodes. Avoid credit loops.

Lose a nuke: I need to visit Palomares one day

Encrypted Client Hello (ECH): I am quite naive; I would have thought that, with most browsers compatible, full support would be easy.

NCCL for Network Engineers: Good explanation. I would like to get more real/practical experience on these things.

RoCE Lossless: Good explanation of ECN and PFC with Arista example.

Nanog93 Networking AI: Interesting overview. I am still missing the low-level details.

Supercommunicators

Very good ebook.

Three types of conversation

  • Do you want to be helped? – What’s this really about? Decision Making mindset. Lean into data and reasoning

What does everyone want? How will we make choices together?

How to figure out what this is really about? First, recognize that this is a negotiation. Next determine what does everyone want? Then how will we make choices together?

  • Do you want to be hugged? – How do we feel? Emotional mindset. Lean into stories and compassion

Ask questions -> creates vulnerability -> triggers emotional contagion -> elicits connection -> prompt more questions ….

In a conflict, we learn why we are fighting by discussing emotions.

In a conflict, we draw out emotions by proving we are listening.

In a conflict, we prove we are listening by looping for understanding: ask questions, summarize what you heard, ask if you got it right.

In a conflict, focus on controlling:

  1. Yourself
  2. Your environment
  3. The conflict boundaries

Check mood and energy!

  • Do you want to be heard? – Who are we? Social mindset

We all possess social identities that shape how we speak and hear.

How to talk about who we are:

  1. Draw out multiple identities
  2. Put everyone on equal footing
  3. Create a new group by building on existing identities

Be aware of this loop:

Telling someone they belong to a group they abhor -> triggers identity threat -> causes defensiveness -> prompts counter-attacks -> leads to telling someone they belong to a group they abhor -> loop

Before discussion:

  1. What do you hope to accomplish?
  2. How will this conversation start?
  3. What obstacles might emerge?
  4. When those obstacles appear, what’s the plan?
  5. What are the benefits of this dialogue?

Rules:

  1. Pay attention to what kind of conversation is occurring (above)
  2. Share your goals, and ask what others are seeking: prepare for the conversation. Ask many questions!
  3. Ask about others’ feelings, and share your own
  4. Explore if identities are important to this discussion.

The examples from “The Big Bang Theory” (how do you hear emotions no one says aloud?), the court case, gun ownership, anti-vax, COVID, the football team, Netflix (no rules), etc. are really good.

The Algebra of Wealth

I read this ebook after watching this video some time ago.

This is his definition of Wealth

Wealth = Focus + (Stoicism * Time * Diversification)

Stoicism

This is the personality/philosophical part. You need to define and build your character. Take into account that luck is important (the world only pays attention to the outliers). Exercise is important, make more decisions, create interdependency (the people around you who will make you better). Difficulties will always be there; go through them with enthusiasm (W. Churchill).

Focus

Focus on your talent, not your passion! Leave the passion for the weekend (baking, climbing, etc.). And then the question: what is your talent? Myers-Briggs test, Gallup CliftonStrengths.

(I am INTJ it seems)

Be loyal to people not companies.

The importance of: real estate, professional jobs (plumber, electrician, etc.), prune + invest in your hobbies

Time

It is limited; you can’t buy more. Make it count. Focus on compounding vs inflation. You need to be ruthless in your time management. Income, spend, invest: what gets measured gets managed. Roll with the punches.
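
A quick Python sketch of why compounding vs inflation matters over time. The numbers (10,000 invested, 7% yearly return, 3% yearly inflation, 30 years) are hypothetical and only there to show the shape of the effect; they are not from the book.

```python
def future_value(principal, annual_rate, years):
    """Value of a lump sum after compounding once per year."""
    return principal * (1 + annual_rate) ** years


def real_value(nominal, inflation, years):
    """Convert a future amount into today's purchasing power."""
    return nominal / (1 + inflation) ** years


# Hypothetical inputs for illustration only
start, ret, infl, years = 10_000, 0.07, 0.03, 30

invested = future_value(start, ret, years)
invested_real = real_value(invested, infl, years)
cash_real = real_value(start, infl, years)

print(f"invested, nominal  : {invested:,.0f}")
print(f"invested, real     : {invested_real:,.0f}")
print(f"left in cash, real : {cash_real:,.0f}")
```

The same starting amount either grows in real terms or quietly loses purchasing power to inflation, and the gap only becomes large because of the number of years.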

Diversification

Focus your time to maximize your current income. Diversify your investments to maximize your long-term wealth.

Long-term active investment doesn’t beat the market.

Books:

7 Habits of Highly Effective People – Covey

Designing your life – Bill Burnett

How to quit – Annie Duke

A Random Walk Down Wall Street

Rice paper rolls

It is something I wanted to do. I saw this video some time ago. Finally, I gave it a go, but I mainly used this video.

Ingredients

  • Carrot sticks
  • Zucchini sticks
  • Cucumber sticks
  • Chopped onion
  • Grated cheese (I didn’t have noodles)
  • Rice paper
  • Water
  • Seaweed paper
  • Sauce: mustard, soy sauce, pinch of paprika
  • Olive oil to fry

Process

  • Wet one rice paper
  • Add a bit of each vegetable and cheese. Roll and put on a plate
  • Add seaweed paper and vegetables. Roll and put on a plate
  • In a medium-hot pan, fry some rolls if you want
  • Prepare the sauce mix
  • Eat!

Mine don’t look very pretty but they are very tasty!

And it was faster than I thought!