2025-08-07T13:53:50Z INFO 47449 [root]: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile --framework=XLA /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.hlo_module.pb --output /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.neff --target=trn1 --auto-cast=none --model-type=transformer '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma' --lnc=1 -O1 '--internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true' --logfile=/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/log-neuron-cc.txt --verbose=35
2025-08-07T13:53:50Z INFO 47449 [root]: NeuronX Compiler version 2.20.9961.0+0acef03a
Python version 3.10.12
HWM version 2.20.0.9961+0acef03a
NumPy version 1.26.4
Running on AMI ami-040348201d80b58ad
Running in region usw2-az4
2025-08-07T13:53:50Z INFO 47514 [root]: XLA detected
2025-08-07T13:53:50Z INFO 47514 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper
2025-08-07T13:53:50Z INFO 47514 [root]: Intermediate files stored in /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5, output in /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0
2025-08-07T13:53:50Z INFO 47514 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1
2025-08-07T13:53:50Z INFO 47514 [pipeline.Pipeline.0]: Processing input #0
2025-08-07T13:53:50Z INFO 47514 [pipeline.Pipeline.0]: Running pipeline Pipeline.0
2025-08-07T13:53:50Z INFO 47514 [pipeline.Pipeline.0]: Starting job job.HLOToTensorizer.0
2025-08-07T13:53:50Z INFO 47514 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1
2025-08-07T13:53:50Z INFO 47514 [job.HLOToTensorizer.0]: Processing input #0
2025-08-07T13:53:50Z INFO 47514 [job.HLOToTensorizer.0]: IR signature: d89b9e073981a0b1b7d0bbd0a24f147e9df13c5706d9d6be9971b857124c9496 for model.MODULE_f4171003694760566af4+a9cd68fb.hlo_module.pb
2025-08-07T13:53:50Z INFO 47514 [job.HLOToTensorizer.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.hlo_module.pb --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --layers-per-module=1 --partition --emit-tensor-level-dropout-ops --modular-flow-mac-threshold=10 --verify-hlo=true --native-to-custom-softmax --partitioner-opts='--transformer'
2025-08-07T13:53:51Z INFO 47514 [job.HLOToTensorizer.0]: DEBUG: needsModular_PreSplit? Yes. macCnt 447256207360 threshold 4398046511104 num non-trivial Ops 3875
INFO: Number of Native SoftmaxDx's detected and replaced: 0
INFO: Number of Native Softmax's detected and replaced: 38
Pre-Partition Pre-Opt Histogram:
total HLO instructions: 10629
reshape            2091  19.67%  ################################################################
broadcast          1735  16.32%  #####################################################
convert            1281  12.05%  #######################################
transpose          1268  11.93%  ######################################
constant            819   7.71%  #########################
parameter           475   4.47%  ##############
slice               445   4.19%  #############
add                 365   3.43%  ###########
multiply            328   3.09%  ##########
dot                 326   3.07%  #########
get-tuple-element   295   2.78%  #########
select              255   2.40%  #######
compare             222   2.09%  ######
call                186   1.75%  #####
concatenate         148   1.39%  ####
tuple                73   0.69%  ##
scatter              73   0.69%  ##
negate               72   0.68%  ##
all-reduce           72   0.68%  ##
divide               39   0.37%  #
custom-call          38   0.36%  #
iota                  7   0.07%
gather                6   0.06%
all-gather            3   0.03%
reduce                3   0.03%
sine                  1   0.01%
cosine                1   0.01%
power                 1   0.01%
maximum               1   0.01%
INFO: IoStatistics: total inputs: 475
INFO: IoStatistics: total outputs: 73
INFO: IoStatistics: total passthrough tensors: 0
INFO: IoStatistics: total outputs read from: 0
INFO: IoStatistics: total redundant outputs: 0
INFO: IoStatistics: total ifmap size (KiB): 8072795
INFO: IoStatistics: total ofmap size (KiB): 73728
INFO: IoStatistics: total must-alias size (KiB): 73728
INFO: IoStatistics: total may-alias size (KiB): 0
INFO: HloMacCount has found 447256199168
INFO: Traffic has found 8343919789
INFO: AIF 107.21
Pre-Partition Post-Op Histogram:
total HLO instructions: 6623
reshape            1424  21.50%  ################################################################
convert             992  14.98%  ############################################
transpose           941  14.21%  ##########################################
constant            523   7.90%  #######################
parameter           475   7.17%  #####################
broadcast           410   6.19%  ##################
dot                 325   4.91%  ##############
custom-call         223   3.37%  ##########
multiply            219   3.31%  #########
add                 219   3.31%  #########
get-tuple-element   151   2.28%  ######
slice               147   2.22%  ######
concatenate         146   2.20%  ######
select              110   1.66%  ####
compare              76   1.15%  ###
scatter              73   1.10%  ###
negate               72   1.09%  ###
all-reduce           72   1.09%  ###
gather                6   0.09%
iota                  5   0.08%
all-gather            3   0.05%
reduce                3   0.05%
pad                   2   0.03%
sine                  1   0.02%
divide                1   0.02%
tuple                 1   0.02%
maximum               1   0.02%
rng                   1   0.02%
cosine                1   0.02%
INFO: Found memory bound graph
DEBUG: needsModular_PreSplit? Yes. macCnt 447256199168 threshold 4398046511104 num non-trivial Ops 2702
DEBUG: transformer model
INFO: Partitioner configs:ModularFlow BO LBL SA ConcatGraphs: 1 MaxDisj:2 MaxSep:4 LPM:1
INFO: Markers NOT detected
Potential split-points stats: #CC 75 #AR 72 #AG 3 #BN 0 nClamp 0
DEBUG: needsModular_SplitFinder? Yes.
ModuleSplitter initial partitioning... #parts 75
ModuleSplitter initial partitioning... Done.
INFO: Num of unique Module Definitions: 6
DEBUG: DefMap: 0 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 73 74
New disjoint wave: start 2 len 70 NumReps: 35 macs 434529894400
INFO: Attempting to identify and split optimizer at end
First non-zero-mac/used part from the end is 73
Not enough zero-mac parts. skip
INFO: Optimized 0 all-reduce split instructions
INFO: Number of splitPoints: 37
ModuleSplitter initial partitioning... #parts 37
ModuleSplitter initial partitioning... Done.
Remat: gather-iota 0 matches, 0 ops rematted
INFO: Alias legality verification of partitions PASSED.
INFO: No transposable_weight_idx attrs found
INFO: Peak intermediate memory demand is at Partition 1. Num live intermediates at peak is 9 and memory usage is 4276228 bytes.
INFO: Please refer to LiveRangeReport_PostHloPart.txt for detailed intermediate lifetime info.
DEBUG: DefMap: 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 36
Wrote HLO netlist to hlo_netlist.json
Wrote graph partitions in debug_info_hlo_partitions.json
Processing partition 0
INFO: Number of Native SoftmaxDx's detected and replaced: 0
INFO: Number of Native Softmax's detected and replaced: 0
Replaced 0 dropout sequences with OffloadedDropout
INFO: HloMacCount has found 2751463424
INFO: Traffic has found 671184678
INFO: AIF 8.20
HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert cosine custom-call dot gather get-tuple-element iota multiply negate parameter reshape scatter select sine slice transpose tuple
Invoking RemoveOptimizationBarriers pass
Processing partition 1
INFO: Number of Native SoftmaxDx's detected and replaced: 0
INFO: Number of Native Softmax's detected and replaced: 0
Replaced 0 dropout sequences with OffloadedDropout
INFO: HloMacCount has found 12415139840
INFO: Traffic has found 201429540
INFO: AIF 123.27
HLO Ops used in computation: add all-reduce broadcast compare concatenate constant convert custom-call dot get-tuple-element multiply negate parameter reshape scatter select slice transpose tuple
Invoking RemoveOptimizationBarriers pass
Processing partition 2
INFO: Number of Native SoftmaxDx's detected and replaced: 0
INFO: Number of Native Softmax's detected and replaced: 0
Replaced 0 dropout sequences with OffloadedDropout
INFO: HloMacCount has found 9974841344
INFO: Traffic has found 776497739
INFO: AIF 25.69
HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert custom-call divide dot gather get-tuple-element iota maximum multiply pad parameter reduce reshape rng scatter select slice transpose tuple
Invoking RemoveOptimizationBarriers pass
2025-08-07T13:53:51Z INFO 47514 [job.HLOToTensorizer.0]: IR signature: e64db8b38636f1e1de70247cb8ef599fe398def409ef42d3756559fd5fc4b0dd for sg0000/HLOToTensorizer
2025-08-07T13:53:51Z INFO 47514 [job.HLOToTensorizer.0]: IR signature: a31a130bd31092d77c4f2c3afb2624ee27b06f887e825fbab973844c57b282ba for sg0001/HLOToTensorizer
2025-08-07T13:53:51Z INFO 47514 [job.HLOToTensorizer.0]: IR signature: 5686d00929ecdf5c48dcae30b42e946122da025387070f32d1ea0a1c34518e99 for sg0002/HLOToTensorizer
2025-08-07T13:53:51Z INFO 47514 [job.HLOToTensorizer.0]: Job #0 finished
2025-08-07T13:53:51Z INFO 47514 [pipeline.Pipeline.0]: Finished job job.HLOToTensorizer.0
2025-08-07T13:53:51Z INFO 47514 [pipeline.Pipeline.0]: Starting job job.Frontend.0
2025-08-07T13:53:51Z INFO 47514 [job.Frontend.0]: Job Frontend len(in_states) 1
2025-08-07T13:53:51Z INFO 47514 [job.Frontend.0]: Processing input #0
2025-08-07T13:53:51Z INFO 47514 [job.Frontend.0]: Start model loading
2025-08-07T13:53:51Z INFO 47514 [job.Frontend.0]: Start tensorization
2025-08-07T13:53:51Z INFO 47514 [job.Frontend.0]: Num jobs: 128
2025-08-07T13:53:51Z USER 47514 [root/Tensorizer/Tensorizer]: Running Tensorizer
2025-08-07T13:53:51Z INFO 47514 [Tensorizer]: Max workers: 3
2025-08-07T13:53:51Z INFO 48703 [Tensorizer]: Building model from Penguin script "penguin.py.000000"...
2025-08-07T13:53:51Z INFO 48704 [Tensorizer]: Building model from Penguin script "penguin.py.000001"...
2025-08-07T13:53:51Z INFO 48705 [Tensorizer]: Building model from Penguin script "penguin.py.000002"...
2025-08-07T13:53:51Z INFO 48704 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op
2025-08-07T13:53:51Z INFO 48703 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/DoNothing]: Running DoNothing
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/DoNothing]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/DoNothing]: Running DoNothing
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/DoNothing]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/DoNothing]: Running DoNothing
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/DoNothing]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/TransformConvOp]: Running TransformConvOp
2025-08-07T13:53:51Z INFO 48704 [sg0001/Tensorizer/TransformConvOp]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.002 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.003 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TransformConvOp]: Running TransformConvOp
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TransformConvOp]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.014 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.003 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LowerTensorOp]: Running LowerTensorOp
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.007 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LowerTensorOp]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.029 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.012 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.008 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.007 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.037 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TensorOpSimplifier]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.004 seconds
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates
2025-08-07T13:53:51Z INFO 48703 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.006 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/CanonicalizeIR]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.002 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LegalizeCCOpLayout]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.002 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AffinePredicateResolution]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/EliminateDivs]: Running EliminateDivs
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/EliminateDivs]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Running Simplifier
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Finished (changed=True)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.013 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TCTransform]: Running TCTransform
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TCTransform]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False)
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.001 seconds
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm
2025-08-07T13:53:51Z INFO 48705 [sg0002/Tensorizer/ExpandBatchNorm]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TCTransform]: Running TCTransform
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TCTransform]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.003 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/EliminateDivs]: Running EliminateDivs
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/EliminateDivs]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TensorOpTransform]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.020 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LateLowerTensorOp]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.005 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.026 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MemcpyElimination]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.005 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.027 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.018 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TCTransform]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.003 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TCTransform]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.016 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Rematerialization]: Running Rematerialization
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Rematerialization]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.005 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Rematerialization]: Rematerialization finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Running Simplifier
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.030 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.011 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Running Delinearization
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.005 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.010 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.009 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.037 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.019 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Running Simplifier
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.003 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Running LICM
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Running Delinearization
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.008 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/SimplifySlice]: Running SimplifySlice
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/SimplifySlice]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.001 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Running LICM
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Running Simplifier
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.006 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.109 seconds
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.004 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Running LICM
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/PadElimination]: Running PadElimination
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/PadElimination]: Finished (changed=False)
2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: Finished (changed=True)
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/PadElimination]: PadElimination finished after 0.000 seconds
2025-08-07T13:53:52Z INFO 48705 
[sg0002/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.030 seconds 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Rematerialization]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.004 seconds 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.014 seconds 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.007 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: 
Simplifier finished after 0.003 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.002 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TCTransform]: Running TCTransform 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.008 seconds 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.004 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.004 seconds 2025-08-07T13:53:52Z 
INFO 48705 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48703 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.006 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Recompute]: Running Recompute 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Recompute]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Recompute]: Recompute finished after 0.000 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds 2025-08-07T13:53:52Z INFO 48705 [Tensorizer]: After optimization: 38 statements 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DoNothing]: Running DoNothing 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DoNothing]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MutateDataType]: Running MutateDataType 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MutateDataType]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/MutateDataType]: MutateDataType finished after 0.002 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds 
2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.003 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TileCCOps]: Running TileCCOps 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `All gather output tensor check failed` 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TileCCOps]: in float32 (512,) %'all_gather.2' = AllGatherOp-149 AllGather_add(float32 (256,) %'add.11', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.8855 | hlo_id: 101 | , id = 149 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=2048 is not above min_allgather_tile_size_in_bytes=8388608` 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TileCCOps]: in uint32 (512,) %'all_gather.3' = AllGatherOp-165 AllGather_add(uint32 (256,) %'add.12', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.8990 | hlo_id: 110 | , id = 165 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TileCCOps]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.003 seconds 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/LowerTensorOp]: Running LowerTensorOp 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/TileCCOps]: TileCCOps finished after 0.007 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/LowerTensorOp]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48704 
[sg0001/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.012 seconds 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.007 seconds 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.031 seconds 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/TensorOpSimplifier]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.009 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.005 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadCodeElimination]: Running 
DeadCodeElimination 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp 2025-08-07T13:53:52Z INFO 48705 [sg0002/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.007 seconds 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/CanonicalizeIR]: Finished (changed=True) 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout 2025-08-07T13:53:52Z INFO 48704 [sg0001/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.001 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AffinePredicateResolution]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/EliminateDivs]: Running EliminateDivs 2025-08-07T13:53:53Z INFO 48704 
[sg0001/Tensorizer/EliminateDivs]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.015 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TCTransform]: Running TCTransform 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/ExpandBatchNorm]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 
[sg0001/Tensorizer/TCTransform]: Running TCTransform 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/EliminateDivs]: Running EliminateDivs 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/EliminateDivs]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TensorOpTransform]: Running TensorOpTransform 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.038 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TensorOpTransform]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.007 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.034 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LateLowerTensorOp]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: LICM 
finished after 0.003 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.005 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.008 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.044 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.003 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48705 
[sg0002/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.009 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/ResolveAccessConflict]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LocalLayoutOpt]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.012 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.005 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/MemcpyElimination]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.104 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.004 seconds 2025-08-07T13:53:53Z 
INFO 48705 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LayoutPreprocessing]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.007 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/SimplifySlice]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.001 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:53Z INFO 
48703 [sg0000/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/PadElimination]: Running PadElimination 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/PadElimination]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.000 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.006 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.029 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Rematerialization]: Running Rematerialization 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Rematerialization]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Rematerialization]: Rematerialization finished after 0.003 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.009 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Running 
Delinearization 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.012 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.027 seconds 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.042 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.003 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TCTransform]: Running TCTransform 
2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TCTransform]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.001 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.006 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LICM]: LICM finished after 0.003 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.005 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion 2025-08-07T13:53:53Z INFO 48703 
[sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.031 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Recompute]: Running Recompute 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Recompute]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.000 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48703 [Tensorizer]: After optimization: 26 statements 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DoNothing]: Running DoNothing 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:53Z INFO 48703 
[sg0000/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.004 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TileCCOps]: Running TileCCOps 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=1048576 is not above min_allgather_tile_size_in_bytes=8388608` 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TileCCOps]: in bfloat16 (4096, 128) %'all_gather.1' = AllGatherOp-46 AllGather_add(bfloat16 (2048, 128) %'transpose.1', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((4096, 128), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.47 | hlo_id: 19 | , id = 46 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TileCCOps]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.007 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/SimplifySlice]: Running SimplifySlice 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/SimplifySlice]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/TileCCOps]: TileCCOps finished after 0.006 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=True) 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:53Z INFO 48704 [sg0001/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.011 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 
48703 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.008 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp 2025-08-07T13:53:53Z INFO 48703 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) 2025-08-07T13:53:53Z INFO 48705 [sg0002/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.006 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.560 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InferNonlocalTensors]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.016 seconds 
2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/ParAxesAnnotation]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/PadElimination]: Running PadElimination 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/PadElimination]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/PadElimination]: PadElimination finished after 0.000 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.010 seconds 2025-08-07T13:53:54Z INFO 48703 
[sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.007 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.015 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: 
Simplifier finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.008 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/TCTransform]: Running TCTransform 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) 
2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.005 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.120 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.035 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Recompute]: Running Recompute 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Recompute]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Recompute]: Recompute finished after 0.000 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 
[sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [Tensorizer]: After optimization: 25 statements 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DoNothing]: Running DoNothing 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DoNothing]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/MutateDataType]: Running MutateDataType 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/MutateDataType]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/MutateDataType]: MutateDataType finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Running Simplifier 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/TileCCOps]: Running TileCCOps 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/TileCCOps]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/TileCCOps]: TileCCOps finished after 0.006 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.007 seconds 2025-08-07T13:53:54Z INFO 48703 
[sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.249 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.013 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.054 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertLocalTransposes]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.011 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp 
2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.062 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.012 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ResolveAccessConflict]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.007 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.568 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48704 
[sg0001/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LocalLayoutOpt]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.020 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.005 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.010 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.005 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/PGTiling]: Running PGTiling 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 585 of IO tensor {'CrossPassTensor': ''}bfloat16 
%input471|NC|(128, 32) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 586 of IO tensor {'CrossPassTensor': ''}bfloat16 %input472|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 587 of IO tensor {'CrossPassTensor': ''}bfloat16 %input470|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 588 of IO tensor {'CrossPassTensor': ''}bfloat16 %input469(32, 2, 128, 24, 128) is not sorted, index list (w/ AG ids): [(10, 'AG54'), (15, 'AG52'), (11, 'AG53')] 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 589 of IO tensor {'CrossPassTensor': ''}bfloat16 %input474|NC|(128, 32) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 540 of IO tensor {'CrossPassTensor': ''}bfloat16 %input473|NC|(75968, 32, 128) is not sorted, index list (w/ AG ids): [(14, 'AG59'), (13, 'AG50')] 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.005 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Running Delinearization 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.019 seconds 2025-08-07T13:53:54Z INFO 48705 
[sg0002/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutPreprocessing]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.004 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/PComputeCutting]: Running PComputeCutting 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/PComputeCutting]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.005 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/BFComputeCutting]: Running BFComputeCutting 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/BFComputeCutting]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.002 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LoopSplitting]: Running LoopSplitting 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LoopSplitting]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.000 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/MacroGeneration]: Running MacroGeneration 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.035 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48705 
[sg0002/Tensorizer/MacroGeneration]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.008 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.119 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.027 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/PGTiling]: PGTiling finished after 0.162 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferNonlocalTensors]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertIOTransposes]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.031 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.015 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/InsertOffloadedTransposes]: 
InsertOffloadedTransposes finished after 0.003 seconds 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose 2025-08-07T13:53:54Z INFO 48705 [sg0002/Tensorizer/DramToDramTranspose]: Finished (changed=False) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.258 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InsertLocalTransposes]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.008 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.391 seconds 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ParAxesAnnotation]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48703 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=True) 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.086 seconds 2025-08-07T13:53:54Z INFO 48704 [sg0001/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertLocalTransposes]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.005 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.008 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.016 seconds 
2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 1.630 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingProfiler]: Running TilingProfiler 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 20 MACROS WITH LARGEST INSTRUCTION COUNTS: 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 19008: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 19008: matmul_128x128x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 384: matmul_128x128x512 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 48: simd128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 32: simd128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 2: indirect_load128x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1: simd1x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1: simd1x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1: simd1x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1: reduce128x1x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1: simd1x128 2025-08-07T13:53:55Z INFO 48705 
[sg0002/Tensorizer/TilingBottleneck]: 1: reduce128x1x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingBottleneck]: 1: indirect_load32x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingProfiler]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.003 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.005 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PGTiling]: Running PGTiling 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.117 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.003 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 702 of IO tensor {'CrossPassTensor': ''}bfloat16 %input79|NC|(128, 32) is not sorted, index list (w/ AG ids): [(16, 'AG82'), (15, 'AG83')] 
2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 703 of IO tensor {'CrossPassTensor': ''}bfloat16 %input83(4, 4, 128, 32, 2, 64) is not sorted, index list (w/ AG ids): [(16, 'AG82'), (15, 'AG83')] 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 704 of IO tensor {'CrossPassTensor': ''}bfloat16 %input81(4, 128, 32, 2, 64) is not sorted, index list (w/ AG ids): [(16, 'AG82'), (15, 'AG83')] 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 705 of IO tensor {'CrossPassTensor': ''}bfloat16 %input78|NHWC|(4, 128, 32, 128) is not sorted, index list (w/ AG ids): [(16, 'AG82'), (15, 'AG83')] 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 459 of IO tensor {'CrossPassTensor': ''}bfloat16 %input77(16, 128, 4, 4, 2, 128) is not sorted, index list (w/ AG ids): [(9, 'AG94'), (6, 'AG90'), (7, 'AG89'), (11, 'AG93'), (13, 'AG92')] 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 698 of IO tensor non_local bfloat16 %all_gather.1(32, 128, 128) is not sorted, index list (w/ AG ids): [(16, 'AG82'), (8, 'AG84')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.041 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.007 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:55Z INFO 48705 
[sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.006 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PComputeCutting]: Running PComputeCutting 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.006 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.000 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.005 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/PGTiling]: Running PGTiling 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 669 of IO tensor {'CrossPassTensor': ''}bfloat16 %input86|NC|(128, 32) is not sorted, index list (w/ AG ids): [(15, 'AG85'), (12, 'AG86')] 2025-08-07T13:53:55Z INFO 48704 
[sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 670 of IO tensor {'CrossPassTensor': ''}bfloat16 %input87|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(15, 'AG85'), (12, 'AG86')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 671 of IO tensor {'CrossPassTensor': ''}bfloat16 %input85|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(15, 'AG85'), (12, 'AG86')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 672 of IO tensor {'CrossPassTensor': ''}bfloat16 %input84(32, 2, 128, 24, 128) is not sorted, index list (w/ AG ids): [(7, 'AG90'), (14, 'AG88'), (8, 'AG89')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 673 of IO tensor {'CrossPassTensor': ''}bfloat16 %input90|NC|(128, 32) is not sorted, index list (w/ AG ids): [(15, 'AG85'), (12, 'AG86')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 674 of IO tensor {'CrossPassTensor': ''}bfloat16 %input94(4, 4, 128, 32, 2, 64) is not sorted, index list (w/ AG ids): [(15, 'AG85'), (12, 'AG86')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 679 of IO tensor {'CrossPassTensor': ''}bfloat16 %input92(4, 128, 32, 2, 64) is not sorted, index list (w/ AG ids): [(15, 'AG85'), (12, 'AG86')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 680 of IO tensor {'CrossPassTensor': ''}bfloat16 %input89|NHWC|(4, 128, 32, 128) is not sorted, index list (w/ AG ids): [(15, 'AG85'), (12, 'AG86')] 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 447 of IO tensor {'CrossPassTensor': ''}bfloat16 %input88(16, 128, 4, 4, 2, 128) is not sorted, index list (w/ AG ids): [(2, 'AG100'), (0, 
'AG96'), (1, 'AG95'), (3, 'AG99'), (4, 'AG98')] 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/InferNeuronTensor]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.035 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.020 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.004 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running 
SimplifyMacroPredicates 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.102 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 0.260 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.006 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.010 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.004 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/DataLocalityOpt]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.025 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 1.214 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 20 MACROS WITH LARGEST INSTRUCTION COUNTS: 2025-08-07T13:53:55Z INFO 
48703 [sg0000/Tensorizer/TilingBottleneck]: 512: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 128: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 128: matmul_128x128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 32: simd32x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 32: matmul_128x128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 16: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: indirect_load128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: rmsnorm128x512x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: simd128x256 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: simd128x256 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: simd128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: simd128x64 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: simd128x64 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingBottleneck]: 4: simd128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.010 seconds 2025-08-07T13:53:55Z 
INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.009 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.006 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/PComputeCutting]: Running PComputeCutting 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/PComputeCutting]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.066 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 20 MACROS WITH LARGEST INSTRUCTION COUNTS: 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 19008: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 19008: matmul_128x128x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 594: transpose_128x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 384: dma128x512 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 384: matmul_128x128x512 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 48: dma128x4096 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 
48: dma128x4096 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 48: simd128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: simd128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 4: dma128x1024 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 2: indirect_load128x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 1: dma128x32 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 1: simd1x1 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/DMATilingProfiler]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.034 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.005 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.006 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:55Z INFO 48703 
[sg0000/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.007 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.001 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.011 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.004 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.008 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 
[sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.008 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.006 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/BFComputeCutting]: Running BFComputeCutting 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/BFComputeCutting]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/LoopSplitting]: Running LoopSplitting 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/LoopSplitting]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.000 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/MacroGeneration]: Running MacroGeneration 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.004 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/RewriteWeights]: Running RewriteWeights 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/RewriteWeights]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.003 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/ReshapeWeights]: Running ReshapeWeights 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/ReshapeWeights]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.001 
seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.002 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.101 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 20 MACROS WITH LARGEST INSTRUCTION COUNTS: 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 512: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: dma32x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: dma128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: dma32x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: simd32x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: 
matmul_128x128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: dma128x4096 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: matmul_128x128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 4: indirect_load128x512 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 4: rmsnorm128x512x128 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 4: simd128x256 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/MacroGeneration]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.010 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/InferInitValue]: Running InferInitValue 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.092 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/InferInitValue]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/PGTiling]: PGTiling finished after 0.579 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/InferInitValue]: InferInitValue finished after 0.024 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertIOTransposes]: 
Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.018 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.004 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.003 seconds 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.007 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyTensor]: Running SimplifyTensor 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyTensor]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48704 [sg0001/Tensorizer/DramToDramTranspose]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.005 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SundaISel]: Running SundaISel 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.009 seconds 2025-08-07T13:53:55Z INFO 48703 
[sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.009 seconds 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SundaISel]: Finished (changed=True) 2025-08-07T13:53:55Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/SundaISel]: SundaISel finished after 0.042 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.000 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.024 seconds 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.003 seconds 2025-08-07T13:53:55Z 
INFO 48705 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange 2025-08-07T13:53:55Z INFO 48705 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.009 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.008 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.006 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion 2025-08-07T13:53:56Z INFO 48705 
[sg0002/Tensorizer/NeuronLoopFusion]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.001 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.003 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.010 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.009 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/InferInitValue]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.006 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/FactorizeBlkDims]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/InferInitValue]: InferInitValue 
finished after 0.026 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.029 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.009 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SimplifyTensor]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.006 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.008 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SundaISel]: Running SundaISel 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 1.519 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingProfiler]: Running TilingProfiler 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 20 MACROS WITH LARGEST INSTRUCTION COUNTS: 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 512: matmul_128x128x128 
2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 384: matmul_128x128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 128: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 128: matmul_128x128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 48: simd128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 32: simd128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 32: matmul_128x128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 16: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 16: softmax128x1x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 16: transpose_128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 4: rmsnorm128x512x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 4: simd64x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingBottleneck]: 4: simd64x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingProfiler]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.025 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronValueNumbering]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 
[sg0000/Tensorizer/SundaISel]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.012 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 0.041 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.010 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.000 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.037 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/InferNeuronTensor]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 
[sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.040 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.014 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.007 
seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/LICM]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.007 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.003 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.008 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/VectorizeDMA]: Running VectorizeDMA 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/VectorizeDMA]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.001 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True) 2025-08-07T13:53:56Z 
INFO 48705 [sg0002/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.041 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.005 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.001 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/DeConcat]: Running DeConcat 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/DeConcat]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/DeConcat]: DeConcat finished after 0.001 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Running 
FactorizeThreadAxesInFreeDims 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/PartialSimdFusion]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/TritiumFusion]: Running TritiumFusion 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/TritiumFusion]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.031 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.012 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/DataLocalityOpt]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.121 seconds 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler 2025-08-07T13:53:56Z INFO 48704 
[sg0001/Tensorizer/PostDLOTilingBottleneck]: 20 MACROS WITH LARGEST INSTRUCTION COUNTS: 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 512: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 384: dma128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 384: matmul_128x128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: dma128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: matmul_128x128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 48: dma128x4096 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 48: dma128x4096 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 48: simd128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: simd128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: transpose_128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x128x128 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x512 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 32: matmul_128x128x512 2025-08-07T13:53:56Z INFO 48704 
[sg0001/Tensorizer/PostDLOTilingBottleneck]: 16: dma128x4096 2025-08-07T13:53:56Z INFO 48704 [sg0001/Tensorizer/DMATilingProfiler]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.004 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/DeConcat]: Running DeConcat 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/DeConcat]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.001 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.006 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult 
2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/VectorizeMatMult]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.011 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/PartialLoopFusion]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.021 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.006 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.009 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.011 seconds 
2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.005 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LowerTranspose]: Running LowerTranspose 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LowerTranspose]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.020 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.012 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LowerBroadcast]: Running LowerBroadcast 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LowerBroadcast]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.002 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LateNeuronInstComb]: Finished (changed=True) 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.008 seconds 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast 2025-08-07T13:53:56Z INFO 48703 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.008 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/SplitAccGrp]: Running SplitAccGrp 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/SplitAccGrp]: Finished (changed=False) 2025-08-07T13:53:56Z INFO 48705 
[sg0002/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.001 seconds 2025-08-07T13:53:56Z INFO 48705 [sg0002/Tensorizer/SpillPSum]: Running SpillPSum 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/SpillPSum]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.004 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/SpillPSum]: SpillPSum finished after 0.010 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.010 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.011 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.010 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds 
2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.028 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.010 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/RewriteWeights]: Running RewriteWeights 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/RewriteWeights]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.002 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SpillPSum]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/ReshapeWeights]: Running ReshapeWeights 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/ReshapeWeights]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.001 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates 2025-08-07T13:53:57Z 
INFO 48704 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.006 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/InferInitValue]: Running InferInitValue 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.012 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/InferInitValue]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/InferInitValue]: InferInitValue finished after 0.031 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.010 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SimplifyTensor]: Running SimplifyTensor 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SimplifyTensor]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.031 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.001 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.004 
seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LowerIntrinsics]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.008 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.304 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/InlineNativeKernels]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.006 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LICM]: Running LICM 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LICM]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.031 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.009 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LegalizeType]: Running LegalizeType 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LegalizeType]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LICM]: LICM finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SundaISel]: Running SundaISel 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LegalizeType]: LegalizeType 
finished after 0.013 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SundaISel]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/SundaISel]: SundaISel finished after 0.042 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.036 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.000 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.035 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates 2025-08-07T13:53:57Z INFO 48703 
[sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.004 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/TensorInitialization]: Running TensorInitialization 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.007 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.007 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.001 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DataStreaming]: Running DataStreaming 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DataStreaming]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 
[sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.039 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.002 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.002 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopFusion]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.015 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLoopInterchange]: 
NeuronLoopInterchange finished after 0.001 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.096 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.005 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FactorizeBlkDims]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.005 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.005 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/InferPSumTensor]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.002 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 273.865us (31.758MiB, est bw: 121.595GB/s, 51.078% of tot. 
time) for bfloat16<32 x 16260> TongaSB partitions[2] bfloat16 (4, 8, 32, 16260) %'all_gather.1_nostride_1562'(init=0.0)[c0_980,4c1_981_0+c1_981_1,i0.32,i1.16260] = load bfloat16<32 x 16260> non_local bfloat16 (32, 16384) %'all_gather.1'[i0.32,32c0_980+16c1_981_0+i1.16260+4c1_981_1] # id=1137, src_id=None, , attrs={'can_read_uninit': True}, instances=32 # dl = tensor_op_name: UnnamedModule | hlo_id: 1 | [[i0.32];[i1.16260]] -> [[i0.32];[i1.16260]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 74.128us (16.000MiB, est bw: 226.329GB/s, 13.825% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (4, 4, 128, 4096) %'input83_local_1033'[i48_0_1297,i32_0_0_1,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (4, 4, 128, 4096) %'input83'[i48_0_1297,i32_0_0_1,i0.128,i1.4096] # id=1179, src_id=None, , instances=16 # dl = tensor_op_name: _dot.2 | hlo_id: 34 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 74.128us (16.000MiB, est bw: 226.329GB/s, 13.825% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (4, 2, 128, 16, 512) %'input77_local_1070'[i122_0_0_0_1076_0,i122_0_0_0_1,i0.128,i3.16,i1.128+256p_1673+128i2.2] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (8, 2, 128, 16, 2, 128) %'input77'[2i122_0_0_0_1076_0+i122_0_0_0_1,p_1673,i0.128,i3.16,i2.2,i1.128] # id=1283, src_id=None, , instances=16 # dl = tensor_op_name: _dot.3 | hlo_id: 145 | [[i0.128];[i1.128, i2.2, i3.16]] -> [[i0.128];[i1.128, i2.2, i3.16]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 22.961us (1.000MiB, est bw: 45.668GB/s, 4.282% of tot. 
time) for bfloat16<128 x 128> TongaSB partitions[1] bfloat16 (32, 128, 128) %'custom-call.226.1457'[i29_0_1019,i0.128,i1.128] = load bfloat16<128 x 128> non_local bfloat16 (32, 16384) %'all_gather.1'[i29_0_1019,128i0.128+i1.128] # id=1174, src_id=None, , instances=32 # dl = tensor_op_name: _custom-call.226 | hlo_id: 27 | [[i0.128];[i1.128]] -> [[i0.128];[i1.128]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 19.507us (4.000MiB, est bw: 215.017GB/s, 3.638% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (4, 128, 4096) %'input81_local_1046'[i59_0_0_1845,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (4, 128, 4096) %'input81'[i59_0_0_1845,i0.128,i1.4096] # id=1224, src_id=None, , instances=4 # dl = tensor_op_name: _dot.1 | hlo_id: 82 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 19.507us (4.000MiB, est bw: 215.017GB/s, 3.638% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[0] bfloat16 (128, 32, 512) %'input78_local_1059'[i0.128,i2.32,128p_1604+i1.128] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (4, 128, 32, 128) %'input78'[p_1604,i0.128,i2.32,i1.128] # id=1278, src_id=None, , instances=4 # dl = tensor_op_name: _dot | hlo_id: 131 | [[i0.128];[i1.128, i2.32]] -> [[i0.128];[i1.128, i2.32]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 7.534us (512.000KiB, est bw: 69.590GB/s, 1.405% of tot. 
time) for bfloat16<128 x 128> non_local bfloat16 (4, 4, 128, 128) %'transpose.1'[T_i12_0_954,T_i12_1_954_0_1848_1849,i1.128,i0.128] = store bfloat16<128 x 128> TongaSB partitions[1] bfloat16 (4, 128, 512) %'950.1676'[T_i12_0_954,i1.128,i0.128+128T_i12_1_954_0_1848_1849] # id=1398, src_id=None, , instances=16 # dl = tensor_op_name: transpose.1_pftranspose_950 | hlo_id: 16 | [[i1.128];[i0.128]] -> [[i1.128];[i0.128]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 7.337us (1.000MiB, est bw: 142.912GB/s, 1.368% of tot. time) for bfloat16<32 x 4096> {'IntermediateTensor': ''}bfloat16 (128, 32, 128) %'intermediate1'(init=0.0)[32i0_0_0_992+i2.32,i0.32,i1.128] = store bfloat16<32 x 4096> TongaSB partitions[1] bfloat16 (4, 32, 32, 128) %'UnnamedModule.1678'[i0_0_0_992,i0.32,i2.32,i1.128] # id=1139, src_id=None, , instances=4 # dl = tensor_op_name: UnnamedModule | hlo_id: 1 | [[i0.32];[i1.128, i2.32]] -> [[i0.32];[i1.128, i2.32]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 4.953us (1.000MiB, est bw: 211.705GB/s, 0.924% of tot. time) for bfloat16<128 x 1024> non_local bfloat16 (524288,) %'dot.4-buffer-1868'[1024i122_0_0_0_1076_0+4096i0.128+i1.1024] = store bfloat16<128 x 1024> TongaSB partitions[1] bfloat16 (4, 128, 1024) %1077[i122_0_0_0_1076_0,i0.128,i1.1024] # id=1286, src_id=None, , instances=4 # dl = tensor_op_name: _dot.3 | hlo_id: 145 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 3.836us (512.000KiB, est bw: 136.670GB/s, 0.715% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[0] bfloat16 (128, 2048) %'transpose.1_pftranspose_950'[i0.128,i1.2048] = indirect_load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (151936, 2048) %'input76'[i0.128,i1.2048] generic generic_dims:[0] generic_addrs: int32<128 x 1> TongaSB partitions[0] int32 (128, 1) %'gather.41.1674'[i0.128,0] # id=1135, src_id=None, , attrs={'mode': OOBMode.ERROR}, instances=1 # dl = tensor_op_name: _gather.41 | hlo_id: 16 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.308 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/WeightCoalescing]: Running WeightCoalescing 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/WeightCoalescing]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.005 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.008 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.002 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.010 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/CCOpFusion]: 
CCOpFusion finished after 0.016 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.004 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronInstComb]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.075 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/RelaxPredicates]: Running RelaxPredicates 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/RelaxPredicates]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.036 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronValueNumbering]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.014 seconds 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/TensorInitialization]: Running TensorInitialization 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:57Z INFO 48705 [sg0002/Tensorizer/TensorInitialization]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronInstComb]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48703 
[sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.031 seconds 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit 2025-08-07T13:53:57Z INFO 48703 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.013 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/VectorizeDMA]: Running VectorizeDMA 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/VectorizeDMA]: Finished (changed=True) 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.003 seconds 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:57Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.001 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DeConcat]: Running DeConcat 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DeConcat]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DeConcat]: DeConcat finished after 0.001 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 
0.001 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/PartialSimdFusion]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.011 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/TritiumFusion]: Running TritiumFusion 2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.013 seconds 2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.003 seconds 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/TritiumFusion]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.043 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.005 seconds 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished 
after 0.011 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/VectorizeMatMult]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.003 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion 2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/PartialLoopFusion]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.119 seconds 2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro 2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/ExpandISAMacro]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48703 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.046 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.011 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48703 [Tensorizer]: BirCodeGen estimate #instances=1466 in sg0000 2025-08-07T13:53:58Z INFO 48703 [Tensorizer]: IR signature: 3964da3cc9cc122ff21f31e2c8bf756c8441b11f6ecb814c2922ac3920edc847 for nc00/sg0000/TensorizerBIR 2025-08-07T13:53:58Z INFO 48703 [Tensorizer]: Weights total number of bytes: 139520 2025-08-07T13:53:58Z INFO 48703 [Tensorizer]: Successfully built model. 
2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.011 seconds 2025-08-07T13:53:58Z INFO 48705 [sg0002/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.004 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerTranspose]: Running LowerTranspose 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerTranspose]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.010 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerBroadcast]: Running LowerBroadcast 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerBroadcast]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateNeuronInstComb]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.018 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SplitAccGrp]: Running SplitAccGrp 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SplitAccGrp]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.001 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SpillPSum]: Running SpillPSum 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SpillPSum]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SpillPSum]: SpillPSum finished after 0.013 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LowerIntrinsics]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 
[sg0001/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.032 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/InlineNativeKernels]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizeType]: Running LegalizeType 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizeType]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizeType]: LegalizeType finished after 0.004 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.006 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/InferPSumTensor]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.023 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/WeightCoalescing]: Running WeightCoalescing 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/WeightCoalescing]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.015 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/RelaxPredicates]: Running RelaxPredicates 
2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/RelaxPredicates]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.003 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/TensorInitialization]: Running TensorInitialization 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/TensorInitialization]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/ExpandISAMacro]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.003 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.005 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMALocalityOpt]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DataStreaming]: Running DataStreaming 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DataStreaming]: Finished (changed=False) 2025-08-07T13:53:58Z 
INFO 48704 [sg0001/Tensorizer/DataStreaming]: DataStreaming finished after 0.003 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SFKVectorizer]: Running SFKVectorizer 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SFKVectorizer]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.111 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateLegalizeInst]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.004 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CoalesceCCOp]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.005 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.001 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Running DMAProfiler 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 25.347% of tot. 
time) for bfloat16<128 x 3072> TongaSB partitions[3] bfloat16 (4, 2, 2, 128, 24, 512) %'input84_local_886'[i15_0_0_892_0,i15_0_0_1,c1_880,i0.128,i2.24,i1.128+128p_1295] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (8, 4, 2, 128, 24, 128) %'input84'[2i15_0_0_892_0+i15_0_0_1,p_1295,c1_880,i0.128,i2.24,i1.128] # id=1044, src_id=None, , instances=64 # dl = tensor_op_name: _dot.6 | hlo_id: 49 | [[i0.128];[i1.128, i2.24]] -> [[i0.128];[i1.128, i2.24]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 219.783us (48.000MiB, est bw: 229.006GB/s, 24.073% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (2, 24, 128, 4096) %'input85_local_867'[i10_0_0,i10_0_1,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input85'[i10_0_0,i10_0_1,i0.128,i1.4096] # id=1035, src_id=None, , instances=48 # dl = tensor_op_name: _dot.4 | hlo_id: 39 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 219.783us (48.000MiB, est bw: 229.006GB/s, 24.073% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (2, 24, 128, 4096) %'input87_local_876'[i12_0_0,i12_0_1,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input87'[i12_0_0,i12_0_1,i0.128,i1.4096] # id=1038, src_id=None, , instances=48 # dl = tensor_op_name: _dot.5 | hlo_id: 30 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 74.128us (16.000MiB, est bw: 226.329GB/s, 8.119% of tot. 
time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (4, 4, 128, 4096) %'input94_local_906'[i41_0_1140,i25_0_0_1,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (4, 4, 128, 4096) %'input94'[i41_0_1140,i25_0_0_1,i0.128,i1.4096] # id=1058, src_id=None, , instances=16 # dl = tensor_op_name: _dot.9 | hlo_id: 67 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 74.128us (16.000MiB, est bw: 226.329GB/s, 8.119% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (4, 2, 128, 16, 512) %'input88_local_963'[i115_0_0_0_969_0,i115_0_0_0_1,i0.128,i3.16,i1.128+256p_1319+128i2.2] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (8, 2, 128, 16, 2, 128) %'input88'[2i115_0_0_0_969_0+i115_0_0_0_1,p_1319,i0.128,i3.16,i2.2,i1.128] # id=1113, src_id=None, , instances=16 # dl = tensor_op_name: _dot.10 | hlo_id: 165 | [[i0.128];[i1.128, i2.2, i3.16]] -> [[i0.128];[i1.128, i2.2, i3.16]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 19.507us (4.000MiB, est bw: 215.017GB/s, 2.137% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (4, 128, 4096) %'input92_local_919'[i52_0_0_1504,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (4, 128, 4096) %'input92'[i52_0_0_1504,i0.128,i1.4096] # id=1079, src_id=None, , instances=4 # dl = tensor_op_name: _dot.8 | hlo_id: 102 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 19.507us (4.000MiB, est bw: 215.017GB/s, 2.137% of tot. 
time) for bfloat16<128 x 4096> TongaSB partitions[0] bfloat16 (128, 32, 512) %'input89_local_952'[i0.128,i2.32,128p_1306+i1.128] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (4, 128, 32, 128) %'input89'[p_1306,i0.128,i2.32,i1.128] # id=1108, src_id=None, , instances=4 # dl = tensor_op_name: _dot.7 | hlo_id: 151 | [[i0.128];[i1.128, i2.32]] -> [[i0.128];[i1.128, i2.32]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 0.641% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[0] bfloat16 (128, 4096) %'816.1259'[i0.128,i1.4096] = load bfloat16<128 x 4096> non_local bfloat16 (128, 4096) %'add.4'[i0.128,i1.4096] # id=1141, src_id=None, , instances=1 # dl = tensor_op_name: add.4_pftranspose_816 | hlo_id: 17 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 0.641% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[0] bfloat16 (128, 4096) %'820.1264'[i0.128,i1.4096] = load bfloat16<128 x 4096> non_local bfloat16 (524288,) %'all_reduce.1-buffer-1533'[4096i0.128+i1.4096] # id=1150, src_id=None, , instances=1 # dl = tensor_op_name: all_reduce.1_pftranspose_820 | hlo_id: 52 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 4.953us (1.000MiB, est bw: 211.705GB/s, 0.543% of tot. 
time) for bfloat16<128 x 1024> non_local bfloat16 (524288,) %'dot.7-buffer-1531'[1024i15_0_0_892_0+4096i0.128+i1.1024] = store bfloat16<128 x 1024> TongaSB partitions[1] bfloat16 (4, 128, 1024) %893[i15_0_0_892_0,i0.128,i1.1024] # id=1047, src_id=None, , instances=4 # dl = tensor_op_name: _dot.6 | hlo_id: 49 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.004 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/OptimizeNKIKernels]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.018 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/StaticProfiler]: Running StaticProfiler 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/StaticProfiler]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.003 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SplitAPUnionSets]: Finished (changed=True) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.010 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 
[sg0001/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.002 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.004 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/BirCodeGenLoop]: Finished (changed=False) 2025-08-07T13:53:58Z INFO 48704 [sg0001/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.031 seconds 2025-08-07T13:53:58Z INFO 48704 [Tensorizer]: BirCodeGen estimate #instances=5137 in sg0001 2025-08-07T13:53:58Z INFO 48704 [Tensorizer]: IR signature: d5acbe7a0b31f8ac95815686da7c851afebbcfdd5cbaa985cad9f402d630f11e for nc00/sg0001/TensorizerBIR 2025-08-07T13:53:58Z INFO 48704 [Tensorizer]: Weights total number of bytes: 139264 2025-08-07T13:53:58Z INFO 48704 [Tensorizer]: Successfully built model. 
2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SimplifyNeuronTensor]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 1.038 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMALocalityOpt]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.005 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DataStreaming]: Running DataStreaming 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DataStreaming]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DataStreaming]: DataStreaming finished after 0.037 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SFKVectorizer]: Running SFKVectorizer 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SFKVectorizer]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.288 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/LateLegalizeInst]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.014 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/CoalesceCCOp]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.014 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished 
after 0.009 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Running DMAProfiler 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 2.705ms (594.000MiB, est bw: 230.258GB/s, 73.507% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (594, 128, 4096) %'695.1078'[i31_0,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (75968, 4096) %'input473'[128i31_0+i0.128,i1.4096] # id=1077, src_id=None, , instances=594 # dl = tensor_op_name: input473_pftranspose_695 | hlo_id: 90 | if -128i31_0-i0.128+75967 >= 0 [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 6.288% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[3] bfloat16 (4, 2, 2, 128, 24, 512) %'input469_local_766'[i15_0_0_772_0,i15_0_0_1,c1_760,i0.128,i2.24,i1.128+128p_2159] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (8, 4, 2, 128, 24, 128) %'input469'[2i15_0_0_772_0+i15_0_0_1,p_2159,c1_760,i0.128,i2.24,i1.128] # id=930, src_id=None, , instances=64 # dl = tensor_op_name: _dot.256 | hlo_id: 59 | [[i0.128];[i1.128, i2.24]] -> [[i0.128];[i1.128, i2.24]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 219.783us (48.000MiB, est bw: 229.006GB/s, 5.972% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (2, 24, 128, 4096) %'input470_local_747'[i10_0_0,i10_0_1,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input470'[i10_0_0,i10_0_1,i0.128,i1.4096] # id=921, src_id=None, , instances=48 # dl = tensor_op_name: _dot.254 | hlo_id: 49 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. 
DMA time: 219.783us (48.000MiB, est bw: 229.006GB/s, 5.972% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[2] bfloat16 (2, 24, 128, 4096) %'input472_local_756'[i12_0_0,i12_0_1,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input472'[i12_0_0,i12_0_1,i0.128,i1.4096] # id=924, src_id=None, , instances=48 # dl = tensor_op_name: _dot.255 | hlo_id: 40 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 191.807us (297.000KiB, est bw: 1.586GB/s, 5.212% of tot. time) for float32<1 x 128> {'no_delinear': '0'}non_local float32 (1, 75968) %'convert.59'[0,128i31_0+i0.128] = store float32<1 x 128> TongaSB partitions[1] float32 (594, 1, 128) %'dot.257.1088'[i31_0,0,i0.128] # id=1086, src_id=None, , instances=594 # dl = tensor_op_name: _dot.257 | hlo_id: 90 | if -128i31_0-i0.128+75967 >= 0 [[];[i0.128]] -> [[];[i0.128]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 22.647us (296.758KiB, est bw: 13.418GB/s, 0.615% of tot. time) for float32<1 x 15194> TongaSB partitions[1] float32 (5, 1, 15194) %'custom-call.411.1157'[i1,0,i0.15194] = load float32<1 x 15194> {'no_delinear': '0'}non_local float32 (1, 75968) %'convert.59'[15194i1+i0.15194] # id=1152, src_id=None, , instances=5 # dl = tensor_op_name: _custom-call.411 | hlo_id: 93 | if -15194i1-i0.15194+75967 >= 0 [[];[i0.15194]] -> [[];[i0.15194]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 0.159% of tot. 
time) for bfloat16<128 x 4096> TongaSB partitions[0] bfloat16 (128, 4096) %'699.2138'[i0.128,i1.4096] = load bfloat16<128 x 4096> non_local bfloat16 (128, 4096) %'add.9'[i0.128,i1.4096] # id=1052, src_id=None, , instances=1 # dl = tensor_op_name: add.9_pftranspose_699 | hlo_id: 27 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 0.159% of tot. time) for bfloat16<128 x 4096> TongaSB partitions[0] bfloat16 (128, 4096) %'703.2143'[i0.128,i1.4096] = load bfloat16<128 x 4096> non_local bfloat16 (524288,) %'all_reduce.3-buffer-2750'[4096i0.128+i1.4096] # id=1061, src_id=None, , instances=1 # dl = tensor_op_name: all_reduce.3_pftranspose_703 | hlo_id: 62 | [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 4.953us (1.000MiB, est bw: 211.705GB/s, 0.135% of tot. time) for bfloat16<128 x 1024> non_local bfloat16 (524288,) %'dot.14-buffer-2748'[1024i15_0_0_772_0+4096i0.128+i1.1024] = store bfloat16<128 x 1024> TongaSB partitions[1] bfloat16 (4, 128, 1024) %773[i15_0_0_772_0,i0.128,i1.1024] # id=933, src_id=None, , instances=4 # dl = tensor_op_name: _dot.256 | hlo_id: 59 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 0.109% of tot. 
time) for bfloat16<128 x 4096> non_local bfloat16 (128, 32, 128) %'convert.57'[i0.128,i2.4+4i3.8,i1.128] = store bfloat16<128 x 4096> TongaSB partitions[0] bfloat16 (128, 8, 512) %'707.2553'[i0.128,i3.8,i1.128+128i2.4] # id=1065, src_id=None, , instances=1 # dl = tensor_op_name: convert.57_pftranspose_707 | hlo_id: 70 | [[i0.128];[i1.128, i2.4, i3.8]] -> [[i0.128];[i1.128, i2.4, i3.8]] 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.012 seconds 2025-08-07T13:53:59Z INFO 48705 [sg0002/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DoNothing]: Running DoNothing 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:53:59Z INFO 
48705 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.001 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 
[cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.002 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.003 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished 
(changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.004 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds 2025-08-07T13:53:59Z INFO 48705 
[cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] 2025-08-07T13:53:59Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DoNothing]: Running DoNothing 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) 2025-08-07T13:54:00Z 
INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics 
2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.002 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/NeuronSimplifyPredicates]: 
NeuronSimplifyPredicates finished after 0.003 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.003 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp 2025-08-07T13:54:00Z INFO 48705 
[cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. 
time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.526 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.019 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/StaticProfiler]: Running StaticProfiler 2025-08-07T13:54:00Z WARNING 48705 [sg0002/Tensorizer/StaticProfiler]: matmul-based transposes inserted by penguin takes up 79.86 percent of all matmul computation 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/StaticProfiler]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.013 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/SplitAPUnionSets]: Finished (changed=True) 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.106 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 
[sg0002/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.013 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.085 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds 2025-08-07T13:54:00Z INFO 48705 [sg0002/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop 2025-08-07T13:54:01Z INFO 48705 [sg0002/Tensorizer/BirCodeGenLoop]: Finished (changed=False) 2025-08-07T13:54:01Z INFO 48705 [sg0002/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.462 seconds 2025-08-07T13:54:01Z INFO 48705 [Tensorizer]: BirCodeGen estimate #instances=96499 in sg0002 2025-08-07T13:54:01Z INFO 48705 [Tensorizer]: IR signature: 0635d3edaf75e9c19039f712ad29cf07e67b796ed4064732e2e600d5c9f5e9ff for nc00/sg0002/TensorizerBIR 2025-08-07T13:54:01Z INFO 48705 [Tensorizer]: Weights total number of bytes: 135176 2025-08-07T13:54:01Z INFO 48705 [Tensorizer]: Successfully built model. 
2025-08-07T13:54:01Z USER 47514 [root/Tensorizer/Tensorizer]: Tensorizer finished after 9.739 seconds 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: End tensorization 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input76 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input0 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input79 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input83 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input82 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input1 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input81 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input80 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input78 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input77 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input4 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input2 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input5 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input86 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input87 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input85 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input84 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input90 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input94 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input93 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input92 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input91 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input89 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input88 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input6 
2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input2 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input7 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input471 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input472 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input470 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input469 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input474 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input1 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input473 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Network input: input3 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: wrote bir.json 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: wrote tensor_map.json 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: wrote bir.json 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: wrote tensor_map.json 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: wrote bir.json 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: wrote tensor_map.json 2025-08-07T13:54:01Z INFO 47514 [job.Frontend.0]: Job #0 finished 2025-08-07T13:54:01Z INFO 47514 [pipeline.Pipeline.0]: Finished job job.Frontend.0 2025-08-07T13:54:01Z INFO 47514 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0 2025-08-07T13:54:01Z INFO 47514 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0 2025-08-07T13:54:01Z INFO 47514 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: BackendDriver has 3 states with 1 core LNC 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: BackendDriver MT cwd: /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5 2025-08-07T13:54:01Z INFO 47514 [job.BIRLinker.1]: Creating directory sgLnk/sg00 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: StateId sg00 Dir 
/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/sg00 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: StateId sg01 Dir /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/sg01 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: StateId sg02 Dir /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/sg02 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: Number of subgraphs to link: 3 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: lnkState: {"model": ["/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "bir.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "state_dir": "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/sgLnk/sg00", "state_id": "sgLnk"} 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: BackendDriver in_state.num_states 3 with 1 core LNC 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: Executing /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/log-neuron-cc.txt -o walrus_bir.out.json --enable-call-graph --enable-mt-backend --link-subgraphs sg00,sg01,sg02 --link-dir sgLnk/sg00 --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-lower-bound 0.14 --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 --repeat-load-thres 4 
--enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --relaxed-order=true --enable-anti-dependence-reduction=false --num-semaphores-per-queue 16 --numcores 1 --act-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --enable-internal-partitioner --dge-levels scalar_dynamic_offset,vector_dynamic_offsets,io --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.neff 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: Working directory is /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: propagate_exit=True 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: use_logger=False 2025-08-07T13:54:01Z INFO 47514 [job.WalrusDriver.0]: expose_stderr=True 2025-08-07T13:54:01Z INFO 49129 [Logging]: Logging to ../log-neuron-cc.txt at level 'INFO' 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: max_allowed_parallelism=128 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: Loading module from sg00/bir.json 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: Loading module from sg01/bir.json 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: Loading module from sg02/bir.json 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: Backend driver mtBackend: true numModules: 3 Cwd: "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5" 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: DynamicDMA is enabled 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: DynamicDMA levels being enabled: io, 
scalar_dynamic_offset, vector_dynamic_offsets, 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: Modular flow call graph is enabled 2025-08-07T13:54:01Z INFO 49129 [BackendDriver]: Internal partitioner is enabled 2025-08-07T13:54:01Z USER 49129 [BackendPassManager]: Running mod_parallel_pass 2025-08-07T13:54:01Z INFO 49129 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=663 blocks=3 instructions=1178 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: Running do_nothing 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=190 blocks=1 instructions=139 Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: do_nothing finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 67mb, ru_maxrss: 197mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 190 memory location(s), 1 block(s), and 139 instruction(s). 
Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=190 blocks=1 instructions=139 Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: Running do_nothing 2025-08-07T13:54:01Z USER 49129 (sg02) [ModuleForkPass]: Running do_nothing 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=135 blocks=1 instructions=75 Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: do_nothing finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 67mb, ru_maxrss: 197mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 135 memory location(s), 1 block(s), and 75 instruction(s). Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=135 blocks=1 instructions=75 Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z WARNING 49129 [birverifier::InstVisitor]: (sg00) Non - output memory location with no reader: {convert.282.1710}@SB<0,0>(1x2)#Internal DebugInfo: 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=338 blocks=1 instructions=964 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg02) [ModuleForkPass]: do_nothing finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 67mb, ru_maxrss: 197mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 338 memory location(s), 1 block(s), and 964 instruction(s). 
Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg02) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=338 blocks=1 instructions=964 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: birverifier finished after 0.003 seconds 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 67mb, ru_maxrss: 197mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 190 memory location(s), 1 block(s), and 139 instruction(s). Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: birverifier finished after 0.015 seconds 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 90mb, ru_maxrss: 197mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 135 memory location(s), 1 block(s), and 75 instruction(s). Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z USER 49129 (sg02) [ModuleForkPass]: birverifier finished after 0.134 seconds 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 238mb, ru_maxrss: 238mb (delta=41mb) 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 338 memory location(s), 1 block(s), and 964 instruction(s). Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 2025-08-07T13:54:01Z USER 49129 [BackendPassManager]: mod_parallel_pass finished after 0.136 seconds 2025-08-07T13:54:01Z INFO 49129 [BackendPassManager]: curr_vmrss: 230mb, ru_maxrss: 238mb (delta=41mb) 2025-08-07T13:54:01Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 663 memory location(s), 3 block(s), and 1178 instruction(s). 
Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 [BackendPassManager]: Running subgraph_parallel_pass 2025-08-07T13:54:01Z INFO 49129 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=663 blocks=3 instructions=1178 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg00) [SubgraphForkPass]: Running lnc_verifier 2025-08-07T13:54:01Z INFO 49129 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=190 blocks=1 instructions=139 Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z USER 49129 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg00) [SubgraphForkPass]: curr_vmrss: 231mb, ru_maxrss: 238mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 190 memory location(s), 1 block(s), and 139 instruction(s). Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z USER 49129 (sg02) [SubgraphForkPass]: Running lnc_verifier 2025-08-07T13:54:01Z USER 49129 (sg01) [SubgraphForkPass]: Running lnc_verifier 2025-08-07T13:54:01Z INFO 49129 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=135 blocks=1 instructions=75 Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z USER 49129 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg01) [SubgraphForkPass]: curr_vmrss: 231mb, ru_maxrss: 238mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 135 memory location(s), 1 block(s), and 75 instruction(s). 
Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z INFO 49129 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=338 blocks=1 instructions=964 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg02) [SubgraphForkPass]: curr_vmrss: 231mb, ru_maxrss: 238mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 338 memory location(s), 1 block(s), and 964 instruction(s). Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 2025-08-07T13:54:01Z USER 49129 [BackendPassManager]: subgraph_parallel_pass finished after 0.001 seconds 2025-08-07T13:54:01Z INFO 49129 [BackendPassManager]: curr_vmrss: 231mb, ru_maxrss: 238mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 663 memory location(s), 3 block(s), and 1178 instruction(s). 
Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 [BackendPassManager]: Running mod_parallel_pass 2025-08-07T13:54:01Z INFO 49129 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=663 blocks=3 instructions=1178 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: Running expand_replication 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=190 blocks=1 instructions=139 Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z INFO 49129 (sg00) [ExpandReplication]: Found 0 replicated matmults 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: expand_replication finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 231mb, ru_maxrss: 238mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 190 memory location(s), 1 block(s), and 139 instruction(s). 
Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: Running unroll 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=190 blocks=1 instructions=139 Max writers: 2 Max Readers: 10 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: INFO (Unroll) Start unrolling at Thu Aug 7 13:54:01 2025 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: Running expand_replication 2025-08-07T13:54:01Z USER 49129 (sg02) [ModuleForkPass]: Running expand_replication 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=135 blocks=1 instructions=75 Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z INFO 49129 (sg01) [ExpandReplication]: Found 0 replicated matmults 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: expand_replication finished after 0.000 seconds 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 231mb, ru_maxrss: 238mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 135 memory location(s), 1 block(s), and 75 instruction(s). 
Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: Running unroll 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=135 blocks=1 instructions=75 Max writers: 2 Max Readers: 8 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: INFO (Unroll) Start unrolling at Thu Aug 7 13:54:01 2025 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=338 blocks=1 instructions=964 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z INFO 49129 (sg02) [ExpandReplication]: Found 0 replicated matmults 2025-08-07T13:54:01Z USER 49129 (sg02) [ModuleForkPass]: expand_replication finished after 0.001 seconds 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 232mb, ru_maxrss: 238mb (delta=0mb) 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 338 memory location(s), 1 block(s), and 964 instruction(s). 
Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z USER 49129 (sg02) [ModuleForkPass]: Running unroll 2025-08-07T13:54:01Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=338 blocks=1 instructions=964 Max writers: 191 Max Readers: 475 2025-08-07T13:54:01Z INFO 49129 (sg02) [Unroll]: INFO (Unroll) Start unrolling at Thu Aug 7 13:54:01 2025 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: INFO (Unroll) DONE unrolling Thu Aug 7 13:54:01 2025 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: sg0000 Instruction count after Unroll: 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Total count: 1466 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Matmult: 930 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: GenericCopy: 136 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Load: 112 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: TensorTensor: 90 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: TensorScalarPtr: 82 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Activation: 53 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Save: 28 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: DMACopy: 10 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Memset: 9 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: TensorReduce: 8 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: TensorScalarAffineSelect: 4 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: CollectiveCompute: 2 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Reciprocal: 1 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Iota: 1 2025-08-07T13:54:01Z INFO 49129 (sg00) [Unroll]: Unrolled DGE count with Dynamic AP: 9 2025-08-07T13:54:01Z USER 49129 (sg00) [ModuleForkPass]: unroll finished after 0.016 seconds 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 257mb, ru_maxrss: 257mb (delta=19mb) 2025-08-07T13:54:01Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 562 memory location(s), 1 block(s), and 1466 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: INFO (Unroll) DONE unrolling Thu Aug 7 13:54:01 2025 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: sg0001 Instruction count after Unroll: 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Total count: 5137 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Matmult: 4467 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Load: 213 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: GenericCopy: 126 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: TensorScalarPtr: 102 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: TensorTensor: 92 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Activation: 92 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Save: 10 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: DMACopy: 10 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Memset: 10 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: TensorReduce: 8 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Select: 4 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: CollectiveCompute: 2 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Reciprocal: 1 2025-08-07T13:54:01Z INFO 49129 (sg01) [Unroll]: Unrolled DGE count with Dynamic AP: 8 2025-08-07T13:54:01Z USER 49129 (sg01) [ModuleForkPass]: unroll finished after 0.052 seconds 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 292mb, ru_maxrss: 292mb (delta=54mb) 2025-08-07T13:54:01Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 755 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: INFO (Unroll) DONE unrolling Thu Aug 7 13:54:01 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: sg0002 Instruction count after Unroll: 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Total count: 50754 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Matmult: 42227 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: GenericCopy: 6026 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Load: 786 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Save: 614 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Max: 224 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: MaxIndex: 224 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: MatchReplace: 217 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: TensorScalarPtr: 214 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: TensorTensor: 81 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Activation: 69 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Gather: 35 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Memset: 12 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: TensorReduce: 8 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: StreamShuffle: 4 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: CollectiveCompute: 3 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Select: 3 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Reciprocal: 3 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Iota: 2 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: DMACopy: 2 2025-08-07T13:54:02Z INFO 49129 (sg02) [Unroll]: Unrolled DGE count with Dynamic AP: 1 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: unroll finished after 0.485 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 518mb, ru_maxrss: 518mb (delta=280mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9713 memory location(s), 1 block(s), and 50754 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: mod_parallel_pass finished after 0.498 seconds 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=280mb) 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 11030 memory location(s), 3 block(s), and 57357 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: Running subgraph_parallel_pass 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=11030 blocks=3 instructions=57357 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg00) [SubgraphForkPass]: Running dead_code_elim 2025-08-07T13:54:02Z INFO 49129 (sg00) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=562 blocks=1 instructions=1466 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg01) [SubgraphForkPass]: Running dead_code_elim 2025-08-07T13:54:02Z USER 49129 (sg02) [SubgraphForkPass]: Running dead_code_elim 2025-08-07T13:54:02Z INFO 49129 (sg01) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=755 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg02) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=9713 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg00) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy 
removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z USER 49129 (sg00) [SubgraphForkPass]: dead_code_elim finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z USER 49129 (sg01) [SubgraphForkPass]: dead_code_elim finished after 0.004 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg02) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z USER 49129 (sg02) [SubgraphForkPass]: dead_code_elim finished after 0.049 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: subgraph_parallel_pass finished after 0.051 seconds 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10905 memory location(s), 3 block(s), and 57356 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: Running mod_parallel_pass 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=10905 blocks=3 instructions=57356 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: birverifier finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: birverifier finished after 0.006 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: birverifier finished after 0.047 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: mod_parallel_pass finished after 0.049 seconds 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10905 memory location(s), 3 block(s), and 57356 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: Running subgraph_parallel_pass 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=10905 blocks=3 instructions=57356 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg00) [SubgraphForkPass]: Running lnc_verifier 2025-08-07T13:54:02Z INFO 49129 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z USER 49129 (sg02) [SubgraphForkPass]: Running lnc_verifier 2025-08-07T13:54:02Z USER 49129 (sg01) [SubgraphForkPass]: Running lnc_verifier 2025-08-07T13:54:02Z INFO 49129 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [SubgraphForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: subgraph_parallel_pass finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10905 memory location(s), 3 block(s), and 57356 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 [BackendPassManager]: Running mod_parallel_pass 2025-08-07T13:54:02Z INFO 49129 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=10905 blocks=3 instructions=57356 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running instruction_reorder 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running instruction_reorder 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running instruction_reorder 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: instruction_reorder finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running psum_legalization 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: psum_legalization finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running legalize_cce_dma 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running pre_opts 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreOpts]: Skipped. No pre-opt passes enabled 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: pre_opts finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running error_injector 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z WARNING 49129 (sg00) [ErrorInjector]: Unrecognized injected error value "0" 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: error_injector finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running vn_splitter 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: instruction_reorder finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running psum_legalization 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: psum_legalization finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running legalize_cce_dma 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 2025-08-07T13:54:02Z INFO 49129 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [VNSplitterPass]: INFO (VerticalFusion) Time: 0 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [VNSplitterPass]: INFO (ShrinkDN) Time: 0 seconds 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: vn_splitter finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running constant_propagate 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: legalize_cce_dma finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running pre_opts 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreOpts]: Skipped. No pre-opt passes enabled 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: pre_opts finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running error_injector 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z WARNING 49129 (sg01) [ErrorInjector]: Unrecognized injected error value "0" 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: error_injector finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running vn_splitter 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 1 2025-08-07T13:54:02Z INFO 49129 (sg01) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) 
[ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 2025-08-07T13:54:02Z INFO 49129 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [VNSplitterPass]: INFO (VerticalFusion) Time: 0 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [VNSplitterPass]: INFO (ShrinkDN) Time: 0 seconds 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: vn_splitter finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running constant_propagate 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: constant_propagate finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running lower_ac 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: lower_ac finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running input_dma_coalescing 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: input_dma_coalescing finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running remat_optimization 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [RematOpt]: Removed 0 remat instructions 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: remat_optimization finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running early_peephole_opts 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [EarlyPeepholeOpts]: PeepholeOpts enabled? ActivationAccumulate: true 2025-08-07T13:54:02Z INFO 49129 (sg00) [EarlyPeepholeOpts]: Activation Accumulate: 0 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: early_peephole_opts finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running coalesce_multichannel_cc_ops 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running infer_stream_ids 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running pre_sched 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: Start PRE scheduling 2 cores: 1 at: Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Start... 
2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Found 1 Splits CCs 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: Grouped CCs to 1 clusters. 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Done. 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: Start split live ranges Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: No split opportunities: 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: End split live ranges Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: Strt remove redundncies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_redundant_memsets 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_redundant_memsets: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_redundant_loads 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_redundant_loads: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: End remove redundncies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: Start DCE Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 
2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: End DCE Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: Start build flow dependencies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Start build fdeps. Invocation: 1Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Allocs: 518 instructions: 1465 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: constant_propagate finished after 0.005 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running lower_ac 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. 
2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: lower_ac finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running input_dma_coalescing 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: input_dma_coalescing finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running remat_optimization 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Build fdeps inserted 3724 edges 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Done build fdeps 3724 Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: End build flow dependencies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: Start remove useless insts Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove_useless_insts 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: remove Useless Instructions: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: End remove useless insts Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: Start scratchpad optimization Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: End scratchpad optimization Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [RematOpt]: Removed 0 remat instructions 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: remat_optimization finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running early_peephole_opts 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [EarlyPeepholeOpts]: PeepholeOpts enabled? 
ActivationAccumulate: true 2025-08-07T13:54:02Z INFO 49129 (sg00) [PreSched]: DONE PRE scheduling Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: pre_sched finished after 0.007 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running tensor_copy_elim 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [EarlyPeepholeOpts]: Activation Accumulate: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyElim]: Tensor CP elimination: 0 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: early_peephole_opts finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: instruction_reorder finished after 0.012 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running coalesce_multichannel_cc_ops 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running infer_stream_ids 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running pre_sched 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: Start PRE scheduling 2 cores: 1 at: Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Start... 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running psum_legalization 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Found 2 Splits CCs 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: Grouped CCs to 2 clusters. 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running dynamic_dma_setup 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 519 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running runtime_memory_reservation 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=519 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 363mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 519 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running coloring_allocator_psum 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=519 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Done. 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: Start split live ranges Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: allocating PSUM 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: No split opportunities: 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: main loop 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: End split live ranges Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: Strt remove redundncies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_redundant_memsets 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: renumber locations 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: size = 96 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: build_no_bitmap start 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: 50% PSUM demand before spilling 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: PSUM high-water mark = 4 tensors 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: found 77 edges 2025-08-07T13:54:02Z INFO 49129 
(sg00) [PSUM_Allocator]: mean: 1.60417 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: median: 2.38206 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: adjacency vectors require 616 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: build_no_bitmap done 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: find costs 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_redundant_memsets: 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_redundant_loads 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_redundant_loads: 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: End remove redundncies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: Start DCE Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: simplify interference graph 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: initialize low and high 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: lo = 96 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: hi = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: inf = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: total = 96 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: simplify 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: new candidates = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: select ranges 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: End DCE 
Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: no more spills 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: PSUM score = 0 (lower is better) 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles 2025-08-07T13:54:02Z INFO 49129 (sg00) [PSUM_Allocator]: 50% PSUM utilization after allocation 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: coloring_allocator_psum finished after 0.004 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 519 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running dma_optimization_psum 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=519 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: dma_optimization_psum finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 519 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running address_rotation_psum 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=519 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 19 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: Start build flow dependencies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Start build fdeps. Invocation: 2Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 4 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Allocs: 704 instructions: 5137 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: psum_legalization finished after 0.006 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 32 PSUM Banks 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: address_rotation_psum finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 519 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running coloring_allocator_sb 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=519 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 76338944 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 6933 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 2703362 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 879 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 791040 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 343 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running legalize_cce_dma 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: allocating SB 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: main loop 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: renumber locations 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: size = 394 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: find partners 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: found 62 accumulation groups 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: largest = _dot-t1112 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: tensors = 33 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: requires 40960 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: expanding partners 2025-08-07T13:54:02Z INFO 49129 []: find first defs for local 2025-08-07T13:54:02Z INFO 49129 []: find first defs for global 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: find loads 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: 1 pin count 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: 89 remat count 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: build interference graph 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: pass 1 int-tree 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Num intervals 394 Num locations 394 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: IntervalTree Build Done 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: info.neighbors init Done 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: info.neighbors partners Done 2025-08-07T13:54:02Z INFO 49129 (sg00) 
[SB_Allocator]: IntervalTree readback Done 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: edge: 9861 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: mean: 50.0558 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: median: 59.0874 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: find costs 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: best-of-n loop, heuristic = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: simplify interference graph 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: initialize safe & unsafe 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: safe = 279 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: unsafe = 100 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: inf = 14 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: total = 393 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: simplify 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 90 #Pinned 0 #Safe 0 minCost 0.00254569 maxCost 0.0530634 locations 394 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: new candidates = 14 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: select ranges 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: legalize_cce_dma finished after 0.007 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Total: 393 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Spilled: 0.000 (0) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Allocated: 1.000 (393) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Rover zone: 0.735 (289) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Pre-rover zone: 0.122 (48) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Post-rover zone: 0.142 (56) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Slice zone: 0.000 (0) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: 
Blocks nothing: 0.084 (33) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Blocks medium: 0.038 (15) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Visited until medium blocking (mean): 0.328 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Visited until medium blocking (median): 0.353 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Visited until medium blocking (p95): 0.434 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Blocks tall: 0.878 (345) 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Visited until tall blocking (mean): 0.895 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Visited until tall blocking (median): 1.000 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Visited until tall blocking (p95): 1.000 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: Success 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running pre_opts 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: SB spills = 0 tensors 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: size = 0 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: remats = 0 tensors 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: unpinned = 0 tensors 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: size = 0 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: SB score = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: spilling from SB cost about 0 cycles 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Build fdeps inserted 15463 edges 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned 2025-08-07T13:54:02Z INFO 49129 (sg00) [SB_Allocator]: pinning saved approximately 9010 cycles 2025-08-07T13:54:02Z INFO 49129 (sg00) 
[SB_Allocator]: 0% SB utilization after allocation 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Done build fdeps 15463 Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: End build flow dependencies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: Start remove useless insts Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove_useless_insts 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreOpts]: Skipped. No pre-opt passes enabled 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: pre_opts finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 76338944 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 6933 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 2703362 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 879 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 791040 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 343 bytes 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: coloring_allocator_sb finished after 0.009 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 519 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=519 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running error_injector 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: remove Useless Instructions: 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: End remove useless insts Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: Start scratchpad optimization Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z WARNING 49129 (sg02) [ErrorInjector]: Unrecognized injected error value "0" 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: error_injector finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: End scratchpad optimization Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: address_rotation_sb finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 519 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running dma_optimization_sb 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=519 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running vn_splitter 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 79042306, 53.1233% input load, 1.43024% output write, 45.4464% spill/reload [sg0000] 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: removed 0 identical load 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat 2025-08-07T13:54:02Z INFO 49129 (sg02) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 14 2025-08-07T13:54:02Z INFO 49129 (sg02) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: sub-graph will get execute 1 times 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced 
input/const loading DMA traffic 0, 0% out of total dma traffic(4.19899e+07) 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 (sg01) [PreSched]: DONE PRE scheduling Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: pre_sched finished after 0.020 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) 
[DMAOptimizationBase]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running tensor_copy_elim 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyElim]: Tensor CP elimination: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: average loaded DMA size 6933 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: average saved DMA size 879 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 76338944 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 6933 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 2703362 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 879 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 0, 0% out of total dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 79042306, 53.1233% input load, 1.43024% output write, 45.4464% spill/reload 
[sg0000] 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 76338944 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 6933 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 2703362 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 879 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 791040 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 343 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 4870 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DMA optimization re-enable optimization 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: dma_optimization_sb finished after 0.005 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 3 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 12 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.003 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 704 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running dynamic_dma_setup 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=704 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 705 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running runtime_memory_reservation 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=705 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 705 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running coloring_allocator_psum 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=705 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 14 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: allocating PSUM 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: main loop 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: renumber locations 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: size = 192 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 23 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: build_no_bitmap start 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: 100% PSUM demand before spilling 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: PSUM high-water mark = 8 tensors 
2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: found 144 edges 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: mean: 1.5 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: median: 0.781706 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: adjacency vectors require 1152 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: build_no_bitmap done 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: find costs 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: address_rotation_sb finished after 0.005 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running coloring_allocator_dram 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:02Z INFO 49129 (sg00) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: reserved space = 670693126 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: spill space = 3670016 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: aligned spill space = 3670016 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: dram space = 107374182400 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: renumber locations 2025-08-07T13:54:02Z INFO 49129 (sg00) 
[DRAM_Allocator]: size = 4 2025-08-07T13:54:02Z INFO 49129 []: find first defs for local 2025-08-07T13:54:02Z INFO 49129 []: find first defs for global 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: Num intervals 4 Num locations 4 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: IntervalTree Build Done 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: info.neighbors init Done 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: IntervalTree readback Done 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: simplify interference graph 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: initialize low and high 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: lo = 4 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: hi = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: total = 4 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: simplify 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: new candidates = 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: select ranges 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: CC buffer size limit 524288000 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: allreduce_dram_hwm 3670016 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: Real CC buffer size 3670016 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: DRAM hwm after allocation: 3670016 2025-08-07T13:54:02Z INFO 49129 (sg00) [DRAM_Allocator]: DRAM allocation successful 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: coloring_allocator_dram finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running address_rotation_dram 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: Runtime page size at 512MB 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DRAM hwm before rotation 3670016 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: allreduce buffer size 524288000 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: allreduce hwm 3670016 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: Real CC buffer size 3670016 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DRAM hwm after rotation 3670016 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: address_rotation_dram finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running tensorcopy_accel 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyAccel::Impl]: Running peephole optimization pass 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyAccel::Impl]: Accelerated 0 out of 144 tensorcopy in Function: sg0000 average acceleration factor: -nan 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: tensorcopy_accel finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running peephole_opts 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: peephole_opts finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running lower_kernel 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [LowerKernel]: Started running LowerKernel 2025-08-07T13:54:02Z INFO 49129 (sg00) [LowerKernel]: Start of kernel lowering pass, number of insts: 1465, number of allocs: 518 2025-08-07T13:54:02Z INFO 49129 (sg00) [LowerKernel]: Scan BKs time (s): 6.3e-05 2025-08-07T13:54:02Z INFO 49129 (sg00) [LowerKernel]: Lower BKs time (s): 1.1e-05 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: lower_kernel finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running lower_nki_kernel 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running dynamic_dma_cleanup 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: birverifier finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running dynamic_dma_scan 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: dynamic_dma_scan finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running build_fdeps 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Start build fdeps. Invocation: 3Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Allocs: 518 instructions: 1465 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Build fdeps inserted 3724 edges 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Done build fdeps 3724 Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: build_fdeps finished after 0.003 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running remove_redundancies 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [RemoveRedundancies]: remove_clobbered_writes 2025-08-07T13:54:02Z INFO 49129 (sg00) [RemoveRedundancies]: remove_clobbered_writes: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [RemoveRedundancies]: remove_useless_insts 2025-08-07T13:54:02Z INFO 49129 (sg00) [RemoveRedundancies]: remove Useless Instructions: 0 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: remove_redundancies finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: simplify interference graph 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: initialize low and high 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: lo = 192 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: hi = 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: inf = 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: total = 192 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: simplify 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: new candidates = 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: select ranges 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: no more spills 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: PSUM score = 0 (lower is better) 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles 2025-08-07T13:54:02Z INFO 49129 (sg01) [PSUM_Allocator]: 100% PSUM utilization after allocation 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: coloring_allocator_psum finished after 0.019 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) 
[ModuleForkPass]: Output has 1 module(s), 1 function(s), 705 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running dma_optimization_psum 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=705 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: dma_optimization_psum finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.009 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 705 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running address_rotation_psum 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=705 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running tensor_copy_elim 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg02) [ShrinkDN]: INFO (ShrinkDN): Shrunk 2 nodes. Total savings 14336 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyElim]: Tensor CP elimination: 0 2025-08-07T13:54:02Z INFO 49129 (sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running prefetch_scheduling_before_sched 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running post_sched 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Start PosT ScheD 3 sunda Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 19 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 4 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 13 PSUM Banks 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: address_rotation_psum finished after 0.011 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 705 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running coloring_allocator_sb 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=705 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 195236352 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 7160 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 3145730 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2728 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 266240 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: allocating SB 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: main loop 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: renumber locations 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: size = 480 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: find partners 2025-08-07T13:54:02Z INFO 49129 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 2025-08-07T13:54:02Z INFO 49129 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: found 180 accumulation groups 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: largest = _dot.6-t1000_i5 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: tensors = 50 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: 
requires 61440 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: expanding partners 2025-08-07T13:54:02Z INFO 49129 (sg02) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.011 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.015 seconds 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: vn_splitter finished after 0.041 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 364mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running constant_propagate 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 []: find first defs for local 2025-08-07T13:54:02Z INFO 49129 []: find first defs for global 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: find loads 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: 1 pin count 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: 127 remat count 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: build interference graph 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: pass 1 int-tree 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Time-aware hwm post-sched 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Num intervals 480 Num locations 480 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: IntervalTree Build Done 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: info.neighbors init Done 2025-08-07T13:54:02Z INFO 
49129 (sg01) [SB_Allocator]: info.neighbors partners Done 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: IntervalTree readback Done 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: edge: 18048 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: mean: 75.2 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: median: 69.4763 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: find costs 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: best-of-n loop, heuristic = 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: simplify interference graph 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: initialize safe & unsafe 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: safe = 209 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: unsafe = 152 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: inf = 118 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: total = 479 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: simplify 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 118 #Pinned 0 #Safe 0 minCost 0.00361083 maxCost 0.0825967 locations 480 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: new candidates = 95 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: select ranges 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Total: 479 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Spilled: 0.000 (0) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Allocated: 1.000 (479) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Rover zone: 0.733 (351) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Pre-rover zone: 0.006 (3) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Post-rover zone: 0.261 (125) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Slice zone: 0.000 (0) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Blocks nothing: 0.000 (0) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Blocks medium: 0.000 (0) 
2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Blocks tall: 1.000 (479) 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Visited until tall blocking (mean): 0.998 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Visited until tall blocking (median): 1.000 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Visited until tall blocking (p95): 1.000 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: Success 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: SB spills = 0 tensors 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: size = 0 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: remats = 0 tensors 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: unpinned = 0 tensors 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: size = 0 bytes/partition 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: SB score = 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: spilling from SB cost about 0 cycles 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: pinning saved approximately 9010 cycles 2025-08-07T13:54:02Z INFO 49129 (sg01) [SB_Allocator]: 0% SB utilization after allocation 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 195236352 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 7160 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 3145730 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2728 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 266240 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: 
coloring_allocator_sb finished after 0.013 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 366mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 705 memory location(s), 1 block(s), and 5137 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=705 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: address_rotation_sb finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 366mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 705 memory location(s), 1 block(s), and 5137 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running dma_optimization_sb 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=705 blocks=1 instructions=5137 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 198382082, 97.3572% input load, 0.528565% output write, 2.11426% spill/reload [sg0001] 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: removed 0 identical load 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: adjusted 0 DMACopy remat 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: adjusted 0 DMACopy remat 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Time-aware simulation time: 417023 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: sub-graph will get execute 35 times 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 65536, 0.0330352% out of total dma traffic(1.93139e+08) 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Done PosT ScheD Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: post_sched finished after 0.029 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: 
curr_vmrss: 366mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running expand_scheduling_units 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 366mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: average loaded DMA size 7226 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: average saved DMA size 2728 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM 
bytes loaded 195170816 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 7226 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 40 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 3145730 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 2728 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 27 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 65536, 0.0330352% out of total dma traffic 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 198316546, 97.3563% input load, 0.52874% output write, 2.11495% spill/reload [sg0001] 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 195170816 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 7226 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 3145730 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 2728 bytes 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 19 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 266240 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average 
DMAcopyed DMA size 130 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 6573 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DMA optimization re-enable optimization 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: dma_optimization_sb finished after 0.012 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 366mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5135 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=702 blocks=1 instructions=5135 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 3 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 2 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 4 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 20 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 71 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 44 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 
2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: address_rotation_sb finished after 0.013 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 11 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 366mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 9 Sb address 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.006 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} 2025-08-07T13:54:02Z INFO 49129 (sg00) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: address_rotation_sb finished after 0.013 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5135 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running coloring_allocator_dram 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=702 blocks=1 instructions=5135 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:02Z INFO 49129 (sg01) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running dep_opt 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Start build fdeps. Invocation: 4Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Allocs: 518 instructions: 1465 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: reserved space = 201462280 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: spill space = 5242880 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: aligned spill space = 5242880 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: dram space = 107374182400 bytes 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: renumber locations 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: size = 5 2025-08-07T13:54:02Z INFO 49129 []: find first defs for local 2025-08-07T13:54:02Z INFO 49129 []: find first defs for global 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Build fdeps inserted 3649 edges 2025-08-07T13:54:02Z INFO 49129 (sg00) [build_flow_deps]: Done build fdeps 3649 Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: dep_opt finished after 0.003 seconds 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: Running report_stats 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: Num intervals 5 Num locations 5 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: IntervalTree Build Done 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: info.neighbors init Done 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: IntervalTree readback Done 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: simplify interference graph 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: initialize low and high 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: lo = 5 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: hi = 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: total = 5 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: simplify 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: new candidates = 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: select ranges 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: CC buffer size limit 524288000 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: allreduce_dram_hwm 4194304 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: Real CC buffer size 4194304 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: DRAM hwm after allocation: 5242880 2025-08-07T13:54:02Z INFO 49129 (sg01) [DRAM_Allocator]: DRAM allocation successful 2025-08-07T13:54:02Z INFO 49129 (sg00) [ReportStats]: Data Movement Statistics: sg0000
┌──────────────┬────────────────────────────┬───────┬───────────┐
│ Instruction  │ Kind                       │ Count │ Bytes     │
├──────────────┼────────────────────────────┼───────┼───────────┤
│ DMACopy      │ ExternalInput -> Internal  │ 1     │ 622329856 │
│ DMACopy      │ Internal -> ExternalOutput │ 8     │ 8388608   │
│ DMACopy      │ Internal -> Output         │ 1     │ 2097152   │
│ Load         │ Const -> Internal          │ 3     │ 37120     │
│ Load         │ ExternalInput -> Internal  │ 45    │ 41952768  │
│ Load         │ Internal                   │ 32    │ 1048576   │
│ Load (Spill) │ Internal                   │ 32    │ 33300480  │
│ Save         │ Internal                   │ 20    │ 1572864   │
│ Save         │ Internal -> Output         │ 8     │ 1130498   │
└──────────────┴────────────────────────────┴───────┴───────────┘
2025-08-07T13:54:02Z INFO 49129 (sg00) [ReportStats]:
┌─────────────────────┬───────┐
│ Bytes per partition │ Count │
├─────────────────────┼───────┤
│ 2                   │ 3     │
│ 4                   │ 1     │
│ 32                  │ 1     │
│ 64                  │ 1     │
│ 128                 │ 1     │
│ 256                 │ 52    │
│ 512                 │ 1     │
│ 2048                │ 4     │
│ 4096                │ 1     │
│ 8192                │ 44    │
│ 32520               │ 32    │
│ 262144              │ 8     │
│ 1048576             │ 2     │
└─────────────────────┴───────┘
2025-08-07T13:54:02Z INFO 49129 (sg00) [ReportStats]: MM Stats: #MatMults 930 #MatMult-Transposes 72 2025-08-07T13:54:02Z INFO 49129 (sg00) [ReportStats]: IO Tensor size combined: 668476932 2025-08-07T13:54:02Z INFO 49129 (sg00) [ReportStats]: IO Tensor Statistics:
┌────────────────────┬────────────────┬──────────┬──────────────┐
│ Largest IO Tensors │ Kind           │ Src Type │ Size (Bytes) │
├────────────────────┼────────────────┼──────────┼──────────────┤
│ input76            │ ExternalInput  │ bfloat16 │ 622329856    │
│ input77            │ ExternalInput  │ bfloat16 │ 16777216     │
│ input83            │ ExternalInput  │ bfloat16 │ 16777216     │
│ input78            │ ExternalInput  │ bfloat16 │ 4194304      │
│ input81            │ ExternalInput  │ bfloat16 │ 4194304      │
│ input5             │ ExternalInput  │ bfloat16 │ 1048576      │
│ input4             │ ExternalInput  │ bfloat16 │ 1048576      │
│ output1            │ ExternalOutput │ bfloat16 │ 1048576      │
│ output2            │ ExternalOutput │ bfloat16 │ 1048576      │
│ input79            │ ExternalInput  │ bfloat16 │ 8192         │
└────────────────────┴────────────────┴──────────┴──────────────┘
2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: coloring_allocator_dram finished after 0.004 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg00) [ReportStats]: Large (Internal) Tensor Statistics:
┌───────────────────────┬──────────┬──────────┬──────────────┐
│ Largest Tensors       │ Kind     │ Src Type │ Size (Bytes) │
├───────────────────────┼──────────┼──────────┼──────────────┤
│ input78_local_1059    │ Internal │ bfloat16 │ 4194304      │
│ input77_local_1070_i6 │ Internal │ bfloat16 │ 2097152      │
│ input77_local_1070_i4 │ Internal │ bfloat16 │ 2097152      │
│ input77_local_1070_i3 │ Internal │ bfloat16 │ 2097152      │
│ input77_local_1070_i5 │ Internal │ bfloat16 │ 2097152      │
│ input77_local_1070_i1 │ Internal │ bfloat16 │ 2097152      │
│ DynamicDMAScratchLoc  │ Internal │ uint8    │ 2097152      │
│ input77_local_1070_i7 │ Internal │ bfloat16 │ 2097152      │
│ input77_local_1070_i2 │ Internal │ bfloat16 │ 2097152      │
│ input77_local_1070_i0 │ Internal │ bfloat16 │ 2097152      │
└───────────────────────┴──────────┴──────────┴──────────────┘
2025-08-07T13:54:02Z USER 49129 (sg00) [ModuleForkPass]: report_stats finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5135 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running address_rotation_dram 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=702 blocks=1 instructions=5135 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s).
Max writers: 32 Max Readers: 72 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: Runtime page size at 512MB 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DRAM hwm before rotation 5242880 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: allreduce buffer size 524288000 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: allreduce hwm 4194304 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: Real CC buffer size 4194304 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DRAM hwm after rotation 5242880 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: address_rotation_dram finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5135 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running tensorcopy_accel 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=702 blocks=1 instructions=5135 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyAccel::Impl]: Running peephole optimization pass 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyAccel::Impl]: Accelerated 4 out of 136 tensorcopy in Function: sg0001 average acceleration factor: 1 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: tensorcopy_accel finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5135 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running peephole_opts 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=702 blocks=1 instructions=5135 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: peephole_opts finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running lower_kernel 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [LowerKernel]: Started running LowerKernel 2025-08-07T13:54:02Z INFO 49129 (sg01) [LowerKernel]: Start of kernel lowering pass, number of insts: 5139, number of allocs: 702 2025-08-07T13:54:02Z INFO 49129 (sg01) [LowerKernel]: Scan BKs time (s): 0.001154 2025-08-07T13:54:02Z INFO 49129 (sg01) [LowerKernel]: Lower BKs time (s): 5e-06 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: lower_kernel finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running lower_nki_kernel 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running dynamic_dma_cleanup 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: birverifier finished after 0.003 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running dynamic_dma_scan 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: dynamic_dma_scan finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running build_fdeps 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Start build fdeps. Invocation: 5Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Allocs: 702 instructions: 5139 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Build fdeps inserted 15483 edges 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Done build fdeps 15483 Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: build_fdeps finished after 0.009 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running remove_redundancies 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [RemoveRedundancies]: remove_clobbered_writes 2025-08-07T13:54:02Z INFO 49129 (sg01) [RemoveRedundancies]: remove_clobbered_writes: 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [RemoveRedundancies]: remove_useless_insts 2025-08-07T13:54:02Z INFO 49129 (sg01) [RemoveRedundancies]: remove Useless Instructions: 0 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: remove_redundancies finished after 0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 367mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.033 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 368mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running tensor_copy_elim 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyElim]: Tensor CP elimination: 0 2025-08-07T13:54:02Z INFO 49129 (sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.003 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 369mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running prefetch_scheduling_before_sched 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 369mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running post_sched 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Start PosT ScheD 3 sunda Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: constant_propagate finished after 0.147 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 370mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running lower_ac 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Time-aware hwm post-sched 2025-08-07T13:54:02Z INFO 49129 (sg02) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. 
2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: lower_ac finished after 0.009 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 371mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running input_dma_coalescing 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg02) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: input_dma_coalescing finished after 0.019 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 372mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running remat_optimization 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Time-aware simulation time: 33366410 2025-08-07T13:54:02Z INFO 49129 [post_scheduler]: Done PosT ScheD Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: post_sched finished after 0.091 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 372mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running expand_scheduling_units 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: expand_scheduling_units finished after 0.000 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 372mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 43 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 16 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 1 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 27 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 4 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 7 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z INFO 49129 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: 
address_rotation_sb finished after 0.037 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 372mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.021 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} 2025-08-07T13:54:02Z INFO 49129 (sg01) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.002 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running dep_opt 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 6Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Allocs: 702 instructions: 5139 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Build fdeps inserted 15350 edges 2025-08-07T13:54:02Z INFO 49129 (sg01) [build_flow_deps]: Done build fdeps 15350 Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: dep_opt finished after 0.014 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: Running report_stats 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg01) [ReportStats]: Data Movement Statistics: sg0001 ┌─────────────┬────────────────────────────┬───────┬───────────┐ │ Instruction │ Kind │ Count │ Bytes │ ├─────────────┼────────────────────────────┼───────┼───────────┤ │ DMACopy │ Input -> Internal │ 1 │ 3145728 │ │ DMACopy │ Internal -> ExternalOutput │ 8 │ 8388608 │ │ DMACopy │ Internal -> Output │ 1 │ 2097152 │ │ Load │ Const -> Internal │ 2 │ 36864 │ │ Load │ ExternalInput -> Internal │ 204 │ 192954880 │ │ Load │ Input -> Internal │ 3 │ 81920 │ │ Load │ Internal │ 2 │ 2097152 │ │ Save │ Internal │ 8 │ 2097152 │ │ Save │ Internal -> Output │ 2 │ 1048578 │ └─────────────┴────────────────────────────┴───────┴───────────┘ 2025-08-07T13:54:02Z INFO 49129 (sg01) [ReportStats]: ┌─────────────────────┬───────┐ │ Bytes per partition │ Count │ ├─────────────────────┼───────┤ │ 2 │ 3 │ │ 32 │ 1 │ │ 64 │ 2 │ │ 128 │ 1 │ │ 256 │ 3 │ │ 2048 │ 8 │ │ 6144 │ 64 │ │ 8192 │ 139 │ │ 262144 │ 8 │ │ 1048576 │ 5 │ └─────────────────────┴───────┘ 
2025-08-07T13:54:02Z INFO 49129 (sg01) [ReportStats]: MM Stats: #MatMults 4467 #MatMult-Transposes 122 2025-08-07T13:54:02Z INFO 49129 (sg01) [ReportStats]: IO Tensor size combined: 197149188 2025-08-07T13:54:02Z INFO 49129 (sg01) [ReportStats]: IO Tensor Statistics: ┌────────────────────┬────────────────┬──────────┬──────────────┐ │ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ ├────────────────────┼────────────────┼──────────┼──────────────┤ │ input87 │ ExternalInput │ bfloat16 │ 50331648 │ │ input84 │ ExternalInput │ bfloat16 │ 50331648 │ │ input85 │ ExternalInput │ bfloat16 │ 50331648 │ │ input88 │ ExternalInput │ bfloat16 │ 16777216 │ │ input94 │ ExternalInput │ bfloat16 │ 16777216 │ │ input92 │ ExternalInput │ bfloat16 │ 4194304 │ │ input89 │ ExternalInput │ bfloat16 │ 4194304 │ │ output4 │ ExternalOutput │ bfloat16 │ 1048576 │ │ input7 │ ExternalInput │ bfloat16 │ 1048576 │ │ input6 │ ExternalInput │ bfloat16 │ 1048576 │ └────────────────────┴────────────────┴──────────┴──────────────┘ 2025-08-07T13:54:02Z INFO 49129 (sg01) [ReportStats]: Large (Internal) Tensor Statistics: ┌──────────────────────┬──────────┬──────────┬──────────────┐ │ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ ├──────────────────────┼──────────┼──────────┼──────────────┤ │ input89_local_952 │ Internal │ bfloat16 │ 4194304 │ │ input84_local_886_i6 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i3 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i2 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i4 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i5 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i8 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i7 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i1 │ Internal │ bfloat16 │ 3145728 │ │ input84_local_886_i0 │ Internal │ bfloat16 │ 3145728 │ └──────────────────────┴──────────┴──────────┴──────────────┘ 2025-08-07T13:54:02Z USER 49129 (sg01) [ModuleForkPass]: report_stats finished after 
0.001 seconds 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:02Z INFO 49129 (sg02) [RematOpt]: Removed 0 remat instructions 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: remat_optimization finished after 0.089 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running early_peephole_opts 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg02) [EarlyPeepholeOpts]: PeepholeOpts enabled? ActivationAccumulate: true 2025-08-07T13:54:02Z INFO 49129 (sg02) [EarlyPeepholeOpts]: Activation Accumulate: 0 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: early_peephole_opts finished after 0.023 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running coalesce_multichannel_cc_ops 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.007 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running infer_stream_ids 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: infer_stream_ids finished after 0.006 seconds 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 373mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9683 memory location(s), 1 block(s), and 50754 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z USER 49129 (sg02) [ModuleForkPass]: Running pre_sched 2025-08-07T13:54:02Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=9683 blocks=1 instructions=50754 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: Start PRE scheduling 2 cores: 1 at: Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Start... 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Found 1 Splits CCs 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: Grouped CCs to 1 clusters. 
2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts 2025-08-07T13:54:02Z INFO 49129 [LayerSpiller]: LayerSpill: Done. 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: Start split live ranges Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: Num_Splits: 1 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: End split live ranges Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: Strt remove redundncies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: remove_redundant_memsets 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: remove_redundant_memsets: 0 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: remove_redundant_loads 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: remove_redundant_loads: 0 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: End remove redundncies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: Start DCE Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: End DCE Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [PreSched]: Start build flow dependencies Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 7Thu Aug 7 13:54:02 2025 2025-08-07T13:54:02Z INFO 49129 (sg02) [build_flow_deps]: Allocs: 9685 instructions: 50756 2025-08-07T13:54:03Z INFO 49129 (sg02) [build_flow_deps]: Build fdeps inserted 177894 edges 2025-08-07T13:54:03Z INFO 49129 (sg02) [build_flow_deps]: Done build fdeps 177894 Thu Aug 7 13:54:03 2025 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: End build flow dependencies Thu Aug 7 13:54:03 2025 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: Start remove useless insts Thu Aug 7 13:54:03 2025 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: remove_useless_insts 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: remove Useless Instructions: 0 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: End remove useless insts Thu Aug 7 13:54:03 2025 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: Start scratchpad optimization Thu Aug 7 13:54:03 2025 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: End scratchpad optimization Thu Aug 7 13:54:03 2025 2025-08-07T13:54:03Z INFO 49129 (sg02) [PreSched]: DONE PRE scheduling Thu Aug 7 13:54:03 2025 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: pre_sched finished after 0.505 seconds 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 381mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50756 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: Running tensor_copy_elim 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=9685 blocks=1 instructions=50756 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z INFO 49129 (sg02) [TensorCopyElim]: Tensor CP elimination: 1 2025-08-07T13:54:03Z INFO 49129 (sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:03Z INFO 49129 (sg02) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:03Z INFO 49129 (sg02) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys 2025-08-07T13:54:03Z INFO 49129 (sg02) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.120 seconds 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 381mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9684 memory location(s), 1 block(s), and 50755 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: Running dynamic_dma_setup 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=9684 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 381mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50755 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: Running runtime_memory_reservation 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=9685 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 381mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50755 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: Running coloring_allocator_psum 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=9685 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: allocating PSUM 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: main loop 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: renumber locations 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: size = 6075 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: build_no_bitmap start 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: 100% PSUM demand before spilling 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: PSUM high-water mark = 8 tensors 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: found 16718 edges 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: mean: 5.50387 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: median: 6.99538 2025-08-07T13:54:03Z INFO 49129 (sg02) 
[PSUM_Allocator]: adjacency vectors require 133744 bytes 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: build_no_bitmap done 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: find costs 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: simplify interference graph 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: initialize low and high 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: lo = 6075 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: hi = 0 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: inf = 0 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: total = 6075 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: simplify 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: new candidates = 0 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: select ranges 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: no more spills 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: PSUM score = 0 (lower is better) 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles 2025-08-07T13:54:03Z INFO 49129 (sg02) [PSUM_Allocator]: 100% PSUM utilization after allocation 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: coloring_allocator_psum finished after 0.268 seconds 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50755 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: Running dma_optimization_psum 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=9685 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z INFO 49129 (sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions 2025-08-07T13:54:03Z INFO 49129 (sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: dma_optimization_psum finished after 0.052 seconds 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50755 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: Running address_rotation_psum 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=9685 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z INFO 49129 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks 2025-08-07T13:54:03Z INFO 49129 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 1 PSUM Banks 2025-08-07T13:54:03Z INFO 49129 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 1 PSUM Banks 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: address_rotation_psum finished after 0.338 seconds 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50755 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z USER 49129 (sg02) [ModuleForkPass]: Running coloring_allocator_sb 2025-08-07T13:54:03Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=9685 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 775793182 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 7946 bytes 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 2414602 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 1480 bytes 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 8196 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 248 bytes 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:03Z INFO 49129 (sg02) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: allocating SB 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: main loop 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: renumber locations 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: size = 3571 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: find partners 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: found 6071 accumulation groups 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: largest = _dot.256-t851_i4 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: tensors = 50 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: requires 61440 bytes/partition 2025-08-07T13:54:03Z INFO 49129 (sg02) [SB_Allocator]: expanding partners 2025-08-07T13:54:03Z INFO 49129 []: find first defs for local 2025-08-07T13:54:04Z INFO 49129 []: find first defs 
for global 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: find loads 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: 1 pin count 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: 710 remat count 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: build interference graph 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: pass 1 int-tree 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Num intervals 3571 Num locations 3571 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: IntervalTree Build Done 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: info.neighbors init Done 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: info.neighbors partners Done 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: IntervalTree readback Done 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: edge: 29963 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: mean: 16.7813 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: median: 10.398 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: find costs 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: best-of-n loop, heuristic = 0 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: simplify interference graph 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: initialize safe & unsafe 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: safe = 3267 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: unsafe = 126 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: inf = 177 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: total = 3570 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: simplify 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 122 #Pinned 0 #Safe 0 minCost 0.00361083 maxCost 1.29711 locations 3571 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: new candidates = 98 
2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: select ranges 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Total: 3570 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Spilled: 0.000 (0) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Allocated: 1.000 (3570) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Rover zone: 0.928 (3313) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Pre-rover zone: 0.007 (25) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Post-rover zone: 0.064 (228) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Slice zone: 0.001 (4) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Blocks nothing: 0.057 (203) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Blocks medium: 0.003 (9) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Visited until medium blocking (mean): 0.654 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Visited until medium blocking (median): 0.711 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Visited until medium blocking (p95): 0.737 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Blocks tall: 0.941 (3358) 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Visited until tall blocking (mean): 0.797 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Visited until tall blocking (median): 0.989 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Visited until tall blocking (p95): 1.000 2025-08-07T13:54:04Z INFO 49129 (sg02) [SB_Allocator]: Success 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: SB spills = 0 tensors 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: size = 0 bytes/partition 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: remats = 0 tensors 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: unpinned = 0 tensors 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: size = 0 bytes/partition 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: SB score = 0 2025-08-07T13:54:28Z INFO 49129 
(sg02) [SB_Allocator]: spilling from SB cost about 0 cycles 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: pinning saved approximately 9010 cycles 2025-08-07T13:54:28Z INFO 49129 (sg02) [SB_Allocator]: 0% SB utilization after allocation 2025-08-07T13:54:28Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 775793182 2025-08-07T13:54:28Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 7946 bytes 2025-08-07T13:54:28Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 2414602 2025-08-07T13:54:28Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 1480 bytes 2025-08-07T13:54:28Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 8196 2025-08-07T13:54:28Z INFO 49129 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 248 bytes 2025-08-07T13:54:28Z USER 49129 (sg02) [ModuleForkPass]: coloring_allocator_sb finished after 24.880 seconds 2025-08-07T13:54:28Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:28Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50755 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:28Z USER 49129 (sg02) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:28Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=9685 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:28Z USER 49129 (sg02) [ModuleForkPass]: address_rotation_sb finished after 0.132 seconds 2025-08-07T13:54:28Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 383mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:28Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9685 memory location(s), 1 block(s), and 50755 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:28Z USER 49129 (sg02) [ModuleForkPass]: Running dma_optimization_sb 2025-08-07T13:54:28Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=9685 blocks=1 instructions=50755 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 778207784, 99.3792% input load, 5.14002e-07% output write, 0.620817% spill/reload [sg0002] 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: removed 0 identical load 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: sub-graph will get execute 1 times 2025-08-07T13:54:28Z INFO 49129 (sg02) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions 
2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(7.73377e+08) 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 6 spill/reload instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 6 spill/reload memory locations 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload memory locations 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 4100, 0.0848642% out of total spill/reload dma traffic 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 2 spill/reload instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [remove_memset_spill]: 
removed 1 spill/reload memory locations 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: average loaded DMA size 7957 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: average saved DMA size 1608 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 775790876 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 7957 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 2412296 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 1608 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 512, 0.0105977% out of total spill/reload dma traffic 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 4612, 0.000592644% out of total dma traffic 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 778203172, 99.3798% input load, 5.14005e-07% output write, 0.620228% spill/reload [sg0002] 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 775790876 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 7957 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 2412296 2025-08-07T13:54:29Z INFO 49129 (sg02) 
[DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 1608 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 8196 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 248 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 7858 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: DMA optimization re-enable optimization 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: dma_optimization_sb finished after 0.330 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50748 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=9677 blocks=1 instructions=50748 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 302 Sb address 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 271 Sb address 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 165 Sb address 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 225 Sb address 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: address_rotation_sb finished after 0.295 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50748 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running coloring_allocator_dram 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=9677 blocks=1 instructions=50748 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:29Z INFO 49129 (sg02) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: reserved space = 775473690 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: spill space = 4513540 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: aligned spill space = 4554752 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: dram space = 107374182400 bytes 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: renumber locations 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: size = 18 2025-08-07T13:54:29Z INFO 49129 []: find first defs for local 2025-08-07T13:54:29Z INFO 49129 []: find first defs for global 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: Num intervals 18 Num locations 18 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: IntervalTree Build Done 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: info.neighbors init Done 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: IntervalTree readback Done 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: simplify interference graph 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: initialize low and high 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: lo = 18 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: hi = 0 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: total = 18 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: simplify 
2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: new candidates = 0 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: select ranges 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: CC buffer size limit 524288000 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: allreduce_dram_hwm 2113536 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: Real CC buffer size 2113536 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: DRAM hwm after allocation: 3162112 2025-08-07T13:54:29Z INFO 49129 (sg02) [DRAM_Allocator]: DRAM allocation successful 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: coloring_allocator_dram finished after 0.075 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50748 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running address_rotation_dram 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=9677 blocks=1 instructions=50748 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: Runtime page size at 512MB 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: DRAM hwm before rotation 3162112 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: allreduce buffer size 524288000 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: allreduce hwm 2113536 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: Real CC buffer size 2113536 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: DRAM hwm after rotation 3162112 2025-08-07T13:54:29Z INFO 49129 (sg02) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: address_rotation_dram finished 
after 0.038 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50748 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running tensorcopy_accel 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=9677 blocks=1 instructions=50748 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [TensorCopyAccel::Impl]: Running peephole optimization pass 2025-08-07T13:54:29Z INFO 49129 (sg02) [TensorCopyAccel::Impl]: Accelerated 0 out of 6038 tensorcopy in Function: sg0002 average acceleration factor: -nan 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: tensorcopy_accel finished after 0.005 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50748 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running peephole_opts 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=9677 blocks=1 instructions=50748 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [PeepholeOpts]: PeepholeOpts enabled? 
Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: peephole_opts finished after 0.017 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running lower_kernel 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [LowerKernel]: Started running LowerKernel 2025-08-07T13:54:29Z INFO 49129 (sg02) [LowerKernel]: Start of kernel lowering pass, number of insts: 50751, number of allocs: 9677 2025-08-07T13:54:29Z INFO 49129 (sg02) [LowerKernel]: Scan BKs time (s): 0.002976 2025-08-07T13:54:29Z INFO 49129 (sg02) [LowerKernel]: Lower BKs time (s): 5e-06 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: lower_kernel finished after 0.004 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running lower_nki_kernel 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.003 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running dynamic_dma_cleanup 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.005 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running birverifier 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: birverifier finished after 0.041 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running dynamic_dma_scan 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: dynamic_dma_scan finished after 0.006 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 386mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running build_fdeps 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [build_flow_deps]: Start build fdeps. Invocation: 8Thu Aug 7 13:54:29 2025 2025-08-07T13:54:29Z INFO 49129 (sg02) [build_flow_deps]: Allocs: 9677 instructions: 50751 2025-08-07T13:54:29Z INFO 49129 (sg02) [build_flow_deps]: Build fdeps inserted 177889 edges 2025-08-07T13:54:29Z INFO 49129 (sg02) [build_flow_deps]: Done build fdeps 177889 Thu Aug 7 13:54:29 2025 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: build_fdeps finished after 0.168 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 390mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running remove_redundancies 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [RemoveRedundancies]: remove_clobbered_writes 2025-08-07T13:54:29Z INFO 49129 (sg02) [RemoveRedundancies]: remove_clobbered_writes: 0 2025-08-07T13:54:29Z INFO 49129 (sg02) [RemoveRedundancies]: remove_useless_insts 2025-08-07T13:54:29Z INFO 49129 (sg02) [RemoveRedundancies]: remove Useless Instructions: 0 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: remove_redundancies finished after 0.018 seconds 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 390mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z USER 49129 (sg02) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:29Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:29Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:29Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} 2025-08-07T13:54:29Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:30Z USER 49129 (sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.224 seconds 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 417mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:30Z USER 49129 (sg02) [ModuleForkPass]: Running tensor_copy_elim 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:30Z INFO 49129 (sg02) [TensorCopyElim]: Tensor CP elimination: 0 2025-08-07T13:54:30Z INFO 49129 (sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:30Z USER 49129 (sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.047 seconds 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 417mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:30Z USER 49129 (sg02) [ModuleForkPass]: Running prefetch_scheduling_before_sched 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:30Z USER 49129 (sg02) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 417mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:30Z USER 49129 (sg02) [ModuleForkPass]: Running post_sched 2025-08-07T13:54:30Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:30Z INFO 49129 [post_scheduler]: Start PosT ScheD 3 sunda Thu Aug 7 13:54:30 2025 2025-08-07T13:54:30Z INFO 49129 [post_scheduler]: Time-aware hwm post-sched 2025-08-07T13:54:31Z INFO 49129 [post_scheduler]: Time-aware simulation time: 5657487 2025-08-07T13:54:31Z INFO 49129 [post_scheduler]: Done PosT ScheD Thu Aug 7 13:54:31 2025 2025-08-07T13:54:31Z USER 49129 (sg02) [ModuleForkPass]: post_sched finished after 1.336 seconds 2025-08-07T13:54:31Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 455mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:31Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:31Z USER 49129 (sg02) [ModuleForkPass]: Running expand_scheduling_units 2025-08-07T13:54:31Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:31Z USER 49129 (sg02) [ModuleForkPass]: expand_scheduling_units finished after 0.006 seconds 2025-08-07T13:54:31Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 455mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:31Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:31Z USER 49129 (sg02) [ModuleForkPass]: Running address_rotation_sb 2025-08-07T13:54:31Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:31Z INFO 49129 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 3740 PSUM Banks 2025-08-07T13:54:31Z INFO 49129 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 3864 PSUM Banks 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 1 PSUM Banks 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 9 Sb address 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 31 Sb address 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 30 Sb address 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 4 Sb address 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 106 Sb address 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:32Z INFO 49129 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 2025-08-07T13:54:32Z USER 49129 
(sg02) [ModuleForkPass]: address_rotation_sb finished after 0.955 seconds 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 456mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:32Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} 2025-08-07T13:54:32Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.168 seconds 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: Running anti_dependency_analyzer 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: Batch size: 1000 2025-08-07T13:54:32Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} 2025-08-07T13:54:32Z INFO 49129 (sg02) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.036 seconds 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: Running dep_opt 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z INFO 49129 (sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 9Thu Aug 7 13:54:32 2025 2025-08-07T13:54:32Z INFO 49129 (sg02) [build_flow_deps]: Allocs: 9677 instructions: 50751 2025-08-07T13:54:32Z INFO 49129 (sg02) [build_flow_deps]: Build fdeps inserted 174221 edges 2025-08-07T13:54:32Z INFO 49129 (sg02) [build_flow_deps]: Done build fdeps 174221 Thu Aug 7 13:54:32 2025 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: dep_opt finished after 0.254 seconds 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: Running report_stats 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z INFO 49129 (sg02) [ReportStats]: Data Movement Statistics: sg0002 ┌─────────────┬────────────────────────────┬───────┬───────────┐ │ Instruction │ Kind │ Count │ Bytes │ ├─────────────┼────────────────────────────┼───────┼───────────┤ │ DMACopy │ Input -> Internal │ 1 │ 3145728 │ │ DMACopy │ Internal │ 1 │ 1048576 │ │ Load │ Const -> Internal │ 4 │ 34824 │ │ Load │ ExternalInput -> Internal │ 760 │ 773341708 │ │ Load │ Internal │ 19 │ 2414344 │ │ Save │ Internal │ 610 │ 2412292 │ │ Save │ Internal -> ExternalOutput │ 1 │ 4 │ └─────────────┴────────────────────────────┴───────┴───────────┘ 2025-08-07T13:54:32Z INFO 49129 (sg02) [ReportStats]: ┌─────────────────────┬───────┐ │ Bytes per partition │ Count │ ├─────────────────────┼───────┤ │ 2 │ 1 │ │ 4 │ 9 │ │ 8 │ 2 │ │ 16 │ 3 │ │ 64 │ 2 │ │ 256 │ 2 │ │ 512 │ 594 │ │ 1024 │ 14 │ │ 2048 │ 6 │ │ 6144 │ 64 │ │ 8192 │ 693 │ │ 60768 │ 1 │ │ 60776 │ 4 │ │ 1048576 │ 3 │ └─────────────────────┴───────┘ 2025-08-07T13:54:32Z INFO 49129 (sg02) 
[ReportStats]: MM Stats: #MatMults 42227 #MatMult-Transposes 19699 2025-08-07T13:54:32Z INFO 49129 (sg02) [ReportStats]: IO Tensor size combined: 773341712 2025-08-07T13:54:32Z INFO 49129 (sg02) [ReportStats]: IO Tensor Statistics: ┌────────────────────┬────────────────┬──────────┬──────────────┐ │ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ ├────────────────────┼────────────────┼──────────┼──────────────┤ │ input473 │ ExternalInput │ bfloat16 │ 622329856 │ │ input469 │ ExternalInput │ bfloat16 │ 50331648 │ │ input472 │ ExternalInput │ bfloat16 │ 50331648 │ │ input470 │ ExternalInput │ bfloat16 │ 50331648 │ │ input474 │ ExternalInput │ bfloat16 │ 8192 │ │ input471 │ ExternalInput │ bfloat16 │ 8192 │ │ input1 │ ExternalInput │ int32 │ 512 │ │ input3 │ ExternalInput │ float32 │ 12 │ │ output0 │ ExternalOutput │ int32 │ 4 │ └────────────────────┴────────────────┴──────────┴──────────────┘ 2025-08-07T13:54:32Z INFO 49129 (sg02) [ReportStats]: Large (Internal) Tensor Statistics: ┌───────────────────────┬──────────┬──────────┬──────────────┐ │ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ ├───────────────────────┼──────────┼──────────┼──────────────┤ │ input469_local_766_i4 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i3 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i7 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i5 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i1 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i6 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i9 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i8 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i2 │ Internal │ bfloat16 │ 3145728 │ │ input469_local_766_i0 │ Internal │ bfloat16 │ 3145728 │ └───────────────────────┴──────────┴──────────┴──────────────┘ 2025-08-07T13:54:32Z USER 49129 (sg02) [ModuleForkPass]: report_stats finished after 0.013 seconds 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, 
ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:32Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z USER 49129 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 2025-08-07T13:54:32Z USER 49129 [BackendPassManager]: mod_parallel_pass finished after 30.750 seconds 2025-08-07T13:54:32Z INFO 49129 [BackendPassManager]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:32Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10897 memory location(s), 3 block(s), and 57355 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z USER 49129 [BackendPassManager]: Running assign_trigger_engine 2025-08-07T13:54:32Z INFO 49129 [BackendPassManager]: Inputs to assign_trigger_engine: modules=3 functions=3 allocs=10897 blocks=3 instructions=57355 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:32Z INFO 49129 (sg00) [AssignTriggerEngine]: Assigned trigger engine for 85 DMA instructions. Moved 65 DMA instructions to CC's engines. 2025-08-07T13:54:32Z INFO 49129 (sg01) [AssignTriggerEngine]: Assigned trigger engine for 10 DMA instructions. Moved 2 DMA instructions to CC's engines. 2025-08-07T13:54:33Z INFO 49129 (sg02) [AssignTriggerEngine]: Assigned trigger engine for 614 DMA instructions. Moved 4 DMA instructions to CC's engines. 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: assign_trigger_engine finished after 0.036 seconds 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10897 memory location(s), 3 block(s), and 57355 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: Running subgraph_parallel_pass 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=10897 blocks=3 instructions=57355 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg00) [SubgraphForkPass]: Running lower_local_collectives 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [SubgraphForkPass]: lower_local_collectives finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [SubgraphForkPass]: Running extend_shared_lifetimes 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg01) [SubgraphForkPass]: Running lower_local_collectives 2025-08-07T13:54:33Z USER 49129 (sg00) [SubgraphForkPass]: Running dead_code_elim 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [SubgraphForkPass]: lower_local_collectives finished after 0.000 seconds 2025-08-07T13:54:33Z USER 49129 (sg02) [SubgraphForkPass]: Running lower_local_collectives 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [SubgraphForkPass]: Running extend_shared_lifetimes 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [SubgraphForkPass]: Running dead_code_elim 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [SubgraphForkPass]: lower_local_collectives finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [SubgraphForkPass]: Running extend_shared_lifetimes 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [SubgraphForkPass]: Running dead_code_elim 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z INFO 49129 (sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:33Z INFO 49129 (sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:33Z USER 49129 (sg00) [SubgraphForkPass]: dead_code_elim finished after 0.004 seconds 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg01) [SubgraphForkPass]: dead_code_elim finished after 0.005 seconds 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z INFO 49129 (sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions 2025-08-07T13:54:33Z USER 49129 (sg02) [SubgraphForkPass]: dead_code_elim finished after 0.046 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: subgraph_parallel_pass finished after 0.049 seconds 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10897 memory location(s), 3 block(s), and 57355 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: Running assign_hwdge_engine 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=3 functions=3 allocs=10897 blocks=3 instructions=57355 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: assign_hwdge_engine finished after 0.010 seconds 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10897 memory location(s), 3 block(s), and 57355 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: Running mod_parallel_pass 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=10897 blocks=3 instructions=57355 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: Running alloc_queues 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z INFO 49129 (sg00) [AllocQueues]: DMACopy transpose will be triggered from multiple engines 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: Running alloc_queues 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: Running alloc_queues 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z INFO 49129 (sg01) [AllocQueues]: DMACopy transpose will be triggered from multiple engines 2025-08-07T13:54:33Z INFO 49129 (sg00) [AllocQueues]: Alloc Queue info: ┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ │ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ ├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ │ qSPIO0 │ input │ SP │ 16 │ 1 │ │ qPoolIO0 │ input │ Pool │ 16 │ 1 │ │ qSPSpillReload0 │ data │ SP │ 16 │ 3 │ │ qActSpillReload0 │ data │ Activation │ 16 │ 20 │ │ qPoolSpillReload0 │ data │ Pool │ 16 │ 64 │ │ qPoolDynamic │ dynamic │ Pool │ 16 │ 61 │ └───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: alloc_queues finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) 
[ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: Running chain_dma_transposes 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: chain_dma_transposes finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z INFO 49129 (sg02) [AllocQueues]: DMACopy transpose will be triggered from multiple engines 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: Running prefetch_scheduling_after_sched 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). 
Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: Running lower_control 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z INFO 49129 (sg01) [AllocQueues]: Alloc Queue info: ┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ │ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ ├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ │ qSPIO0 │ input │ SP │ 16 │ 1 │ │ qPoolIO0 │ input │ Pool │ 16 │ 1 │ │ qSPSpillReload0 │ data │ SP │ 16 │ 3 │ │ qActSpillReload0 │ data │ Activation │ 16 │ 8 │ │ qPoolSpillReload0 │ data │ Pool │ 16 │ 1 │ │ qPoolDynamic │ dynamic │ Pool │ 16 │ 217 │ └───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: alloc_queues finished after 0.001 seconds 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: Running chain_dma_transposes 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: chain_dma_transposes finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: Running prefetch_scheduling_after_sched 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: Running lower_control 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z INFO 49129 (sg00) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: lower_control finished after 0.002 seconds 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: Running dep_reduction 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=518 blocks=1 instructions=1465 Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Start Dependency Reduction 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Processing async instrs... 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Processing secondary edges per engine... 
2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 1338 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Processing redundant descendants, Done. Num edges removed 1462 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Processing async instrs, Done. Num edges removed 1462 2025-08-07T13:54:33Z INFO 49129 (sg01) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: lower_control finished after 0.005 seconds 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: Running dep_reduction 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=702 blocks=1 instructions=5139 Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Start Dependency Reduction 2025-08-07T13:54:33Z INFO 49129 (sg02) [AllocQueues]: Alloc Queue info: ┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ │ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ ├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ │ qSPIO0 │ input │ SP │ 16 │ 5 │ │ qPoolIO0 │ input │ Pool │ 16 │ 1 │ │ qSPSpillReload0 │ data │ SP │ 16 │ 19 │ │ qActSpillReload0 │ data │ Activation │ 16 │ 603 │ │ qPoolSpillReload0 │ data │ Pool │ 16 │ 7 │ │ qDVESpillReload0 │ data │ DVE │ 16 │ 4 │ │ qPoolDynamic │ dynamic │ Pool │ 16 │ 757 │ └───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: alloc_queues finished after 0.009 seconds 2025-08-07T13:54:33Z INFO 49129 
(sg01) [DepReduction]: Processing async instrs... 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Processing secondary edges per engine... 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: Running chain_dma_transposes 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: chain_dma_transposes finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: Running prefetch_scheduling_after_sched 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: Running lower_control 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 5803 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Num Async removed: 0 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Finished dependency reduction: 9132 removed, new total 510 2025-08-07T13:54:33Z INFO 49129 (sg00) [DepReduction]: Finished Dependency Reduction 2025-08-07T13:54:33Z USER 49129 (sg00) [ModuleForkPass]: dep_reduction finished after 0.010 seconds 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 518 memory location(s), 1 block(s), and 1465 instruction(s). Max writers: 32 Max Readers: 72 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Processing redundant descendants, Done. Num edges removed 6010 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Processing async instrs, Done. Num edges removed 6010 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Num Async removed: 0 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Finished dependency reduction: 31089 removed, new total 853 2025-08-07T13:54:33Z INFO 49129 (sg01) [DepReduction]: Finished Dependency Reduction 2025-08-07T13:54:33Z USER 49129 (sg01) [ModuleForkPass]: dep_reduction finished after 0.032 seconds 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 702 memory location(s), 1 block(s), and 5139 instruction(s). 
Max writers: 48 Max Readers: 384 2025-08-07T13:54:33Z INFO 49129 (sg02) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: lower_control finished after 0.071 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 460mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: Running dep_reduction 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=9677 blocks=1 instructions=50751 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Start Dependency Reduction 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Processing async instrs... 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Processing secondary edges per engine... 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 44613 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Processing redundant descendants, Done. Num edges removed 46001 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Processing async instrs, Done. 
Num edges removed 46001 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Num Async removed: 0 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Finished dependency reduction: 308387 removed, new total 15007 2025-08-07T13:54:33Z INFO 49129 (sg02) [DepReduction]: Finished Dependency Reduction 2025-08-07T13:54:33Z USER 49129 (sg02) [ModuleForkPass]: dep_reduction finished after 0.621 seconds 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: curr_vmrss: 472mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 9677 memory location(s), 1 block(s), and 50751 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: mod_parallel_pass finished after 0.742 seconds 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: curr_vmrss: 452mb, ru_maxrss: 518mb (delta=0mb) 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Output has 3 module(s), 3 function(s), 10897 memory location(s), 3 block(s), and 57355 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [BackendPassManager]: Running nc_parallel_pass 2025-08-07T13:54:33Z INFO 49129 [BackendPassManager]: Inputs to nc_parallel_pass: modules=3 functions=3 allocs=10897 blocks=3 instructions=57355 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z USER 49129 [CoreForkPass]: Running bir_linker 2025-08-07T13:54:33Z INFO 49129 [CoreForkPass]: Inputs to bir_linker: modules=3 functions=3 allocs=10897 blocks=3 instructions=57355 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:33Z INFO 49129 (sgLnk) [BirLinker]: bir_linker cwd: 2025-08-07T13:54:33Z INFO 49129 (sgLnk) [BirLinker]: Num intermediates 111 2025-08-07T13:54:33Z INFO 49129 (sgLnk) [BirLinker]: Num Module Definitions 3 2025-08-07T13:54:33Z INFO 49129 (sgLnk) [BirLinker]: Linking to a call-graph structure 2025-08-07T13:54:33Z INFO 49129 (sgLnk) [BirLinker]: Added a new SpillReload Que qPoolPIOParam0 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [BirLinker]: tensor_map verification successful. 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [BirLinker]: Writing updated tensor_map /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/sgLnk/sg00/tensor_map.json 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [BirLinker]: PostLink Stats: #MatMults 199502 #MatMult-Transposes 24041 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [BirLinker]: Total Intermediate MMTs 1190 #out: 1120 #inp: 70 #symmetric: 0 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [BirLinker]: Total Intermediate IOs with MMTs: 37 #out: 35 #inp: 2 #both: 0 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [BirLinker]: releasing pre-link modules 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [BirLinker]: linking Done. 
2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: bir_linker finished after 0.788 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 716mb, ru_maxrss: 716mb (delta=198mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running postlnk_dma_report 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to postlnk_dma_report: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DMAReport]: DMA Report: Bytes loaded or saved 1055562024, 95.5358% input load, 0.206438% output write, 4.25772% spill/reload 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: postlnk_dma_report finished after 0.007 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). 
Max writers: 594 Max Readers: 19699
2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running report_stats
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to report_stats: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: Data Movement Statistics: main
┌─────────────┬──────┬───────┬───────┐
│ Instruction │ Kind │ Count │ Bytes │
└─────────────┴──────┴───────┴───────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]:
┌─────────────────────┬───────┐
│ Bytes per partition │ Count │
└─────────────────────┴───────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: Data Movement Statistics: sg0000
┌──────────────┬────────────────────────────┬───────┬───────────┐
│ Instruction  │ Kind                       │ Count │ Bytes     │
├──────────────┼────────────────────────────┼───────┼───────────┤
│ DMACopy      │ ExternalInput -> Internal  │ 1     │ 622329856 │
│ DMACopy      │ Internal -> ExternalOutput │ 8     │ 8388608   │
│ DMACopy      │ Internal -> Output         │ 1     │ 2097152   │
│ Load         │ Const -> Internal          │ 3     │ 37120     │
│ Load         │ ExternalInput -> Internal  │ 45    │ 41952768  │
│ Load         │ Internal                   │ 32    │ 1048576   │
│ Load (Spill) │ Internal                   │ 32    │ 33300480  │
│ Save         │ Internal                   │ 20    │ 1572864   │
│ Save         │ Internal -> Output         │ 8     │ 1130498   │
└──────────────┴────────────────────────────┴───────┴───────────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]:
┌─────────────────────┬───────┐
│ Bytes per partition │ Count │
├─────────────────────┼───────┤
│ 2                   │ 3     │
│ 4                   │ 1     │
│ 32                  │ 1     │
│ 64                  │ 1     │
│ 128                 │ 1     │
│ 256                 │ 52    │
│ 512                 │ 1     │
│ 2048                │ 4     │
│ 4096                │ 1     │
│ 8192                │ 44    │
│ 32520               │ 32    │
│ 262144              │ 8     │
│ 1048576             │ 2     │
└─────────────────────┴───────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: Data Movement Statistics: sg0001
┌─────────────┬────────────────────────────┬───────┬───────────┐
│ Instruction │ Kind                       │ Count │ Bytes     │
├─────────────┼────────────────────────────┼───────┼───────────┤
│ DMACopy     │ Input -> Internal          │ 1     │ 3145728   │
│ DMACopy     │ Internal -> ExternalOutput │ 8     │ 8388608   │
│ DMACopy     │ Internal -> Output         │ 1     │ 2097152   │
│ Load        │ Const -> Internal          │ 2     │ 36864     │
│ Load        │ ExternalInput -> Internal  │ 204   │ 192954880 │
│ Load        │ Input -> Internal          │ 3     │ 81920     │
│ Load        │ Internal                   │ 2     │ 2097152   │
│ Save        │ Internal                   │ 8     │ 2097152   │
│ Save        │ Internal -> Output         │ 2     │ 1048578   │
└─────────────┴────────────────────────────┴───────┴───────────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]:
┌─────────────────────┬───────┐
│ Bytes per partition │ Count │
├─────────────────────┼───────┤
│ 2                   │ 3     │
│ 32                  │ 1     │
│ 64                  │ 2     │
│ 128                 │ 1     │
│ 256                 │ 3     │
│ 2048                │ 8     │
│ 6144                │ 64    │
│ 8192                │ 139   │
│ 262144              │ 8     │
│ 1048576             │ 5     │
└─────────────────────┴───────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: Data Movement Statistics: sg0002
┌─────────────┬────────────────────────────┬───────┬───────────┐
│ Instruction │ Kind                       │ Count │ Bytes     │
├─────────────┼────────────────────────────┼───────┼───────────┤
│ DMACopy     │ Input -> Internal          │ 1     │ 3145728   │
│ DMACopy     │ Internal                   │ 1     │ 1048576   │
│ Load        │ Const -> Internal          │ 4     │ 34824     │
│ Load        │ ExternalInput -> Internal  │ 760   │ 773341708 │
│ Load        │ Internal                   │ 19    │ 2414344   │
│ Save        │ Internal                   │ 610   │ 2412292   │
│ Save        │ Internal -> ExternalOutput │ 1     │ 4         │
└─────────────┴────────────────────────────┴───────┴───────────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]:
┌─────────────────────┬───────┐
│ Bytes per partition │ Count │
├─────────────────────┼───────┤
│ 2                   │ 1     │
│ 4                   │ 9     │
│ 8                   │ 2     │
│ 16                  │ 3     │
│ 64                  │ 2     │
│ 256                 │ 2     │
│ 512                 │ 594   │
│ 1024                │ 14    │
│ 2048                │ 6     │
│ 6144                │ 64    │
│ 8192                │ 693   │
│ 60768               │ 1     │
│ 60776               │ 4     │
│ 1048576             │ 3     │
└─────────────────────┴───────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: MM Stats: #MatMults 47624 #MatMult-Transposes 19893
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: IO Tensor size combined: 9981007404
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: IO Tensor Statistics:
┌────────────────────┬───────────────┬──────────┬──────────────┐
│ Largest IO Tensors │ Kind          │ Src Type │ Size (Bytes) │
├────────────────────┼───────────────┼──────────┼──────────────┤
│ input76_sg0000     │ ExternalInput │ bfloat16 │ 622329856    │
│ input473_sg0002    │ ExternalInput │ bfloat16 │ 622329856    │
│ input76            │ ExternalInput │ bfloat16 │ 622329856    │
│ input473           │ ExternalInput │ bfloat16 │ 622329856    │
│ input131           │ ExternalInput │ bfloat16 │ 50331648     │
│ input109           │ ExternalInput │ bfloat16 │ 50331648     │
│ input98            │ ExternalInput │ bfloat16 │ 50331648     │
│ input153           │ ExternalInput │ bfloat16 │ 50331648     │
│ input87            │ ExternalInput │ bfloat16 │ 50331648     │
│ input175           │ ExternalInput │ bfloat16 │ 50331648     │
└────────────────────┴───────────────┴──────────┴──────────────┘
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ReportStats]: Large (Internal) Tensor Statistics:
┌─────────────────────────────┬──────────┬──────────┬──────────────┐
│ Largest Tensors             │ Kind     │ Src Type │ Size (Bytes) │
├─────────────────────────────┼──────────┼──────────┼──────────────┤
│ input89_local_952_sg0001    │ Internal │ bfloat16 │ 4194304      │
│ input78_local_1059_sg0000   │ Internal │ bfloat16 │ 4194304      │
│ input84_local_886_i2_sg0001 │ Internal │ bfloat16 │ 3145728      │
│ input84_local_886_i5_sg0001 │ Internal │ bfloat16 │ 3145728      │
│ input84_local_886_i1_sg0001 │ Internal │ bfloat16 │ 3145728      │
│ input84_local_886_i3_sg0001 │ Internal │ bfloat16 │ 3145728      │
│ input84_local_886_i4_sg0001 │ Internal │ bfloat16 │ 3145728      │
│ input84_local_886_i7_sg0001 │ Internal │ bfloat16 │ 3145728      │
│ input84_local_886_i6_sg0001 │ Internal │ bfloat16 │ 3145728      │
│ input84_local_886_i0_sg0001 │ Internal │ bfloat16 │ 3145728      │
└─────────────────────────────┴──────────┴──────────┴──────────────┘
2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: report_stats finished after 0.014 seconds
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb)
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]:
Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running coloring_allocator_dram_post_lnk 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Local 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: reserved space = 8342039572 bytes 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: spill space = 75579464 bytes 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: aligned spill space = 75726848 bytes 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: renumber locations 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: size = 111 2025-08-07T13:54:34Z INFO 49129 []: find first defs for local 2025-08-07T13:54:34Z INFO 49129 []: find first defs for global 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: Num intervals 111 Num locations 111 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: IntervalTree Build Done 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: info.neighbors init Done 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: IntervalTree readback Done 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: simplify interference graph 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: initialize low and high 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: lo = 111 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: hi = 0 
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: total = 111 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: simplify 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: new candidates = 0 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 5242880 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: select ranges 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 5242880 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: Real CC buffer size 5242880 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 10579968 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [DRAM_Allocator]: DRAM allocation successful 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: coloring_allocator_dram_post_lnk finished after 0.042 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_post_lnk 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_post_lnk finished after 0.028 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running lower_dynamic_dma 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: lower_dynamic_dma finished after 0.007 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). 
Max writers: 594 Max Readers: 19699
2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running legalize_dynamic_dma
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 1 DGE instructions
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 1 DGE instructions were scanned
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [LegalizeDynamicDMA]:
┌───────────┬───────────────────────────────┬────────────────────────────┐
│ Sub-Pass  │ Illegal Instructions Detected │ New Instructions Generated │
├───────────┼───────────────────────────────┼────────────────────────────┤
│ Peeling   │ 0                             │ 0                          │
│ Unrolling │ 0                             │ 0                          │
│ Splitting │ 0                             │ 0                          │
└───────────┴───────────────────────────────┴────────────────────────────┘
2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: legalize_dynamic_dma finished after 0.020 seconds
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb)
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s).
Max writers: 594 Max Readers: 19699
2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running lower_dma
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to lower_dma: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699
2025-08-07T13:54:34Z INFO 49129 (sgLnk) [LowerDMA]: lower_dma metrics start
IO
  Copy (DGE/DMA)
    128 partition : 7935/7935 (100% DGE)
    power-of-2 partition : 7976/8017 (99.4886% DGE)
    > 3 dimensional : 0/0
    non-integer desc size : 0/0
    total : 7976/8017 (99.4886% DGE)
  Cast (DGE/DMA)
    128 partition : 147/147 (100% DGE)
    power-of-2 partition : 147/148 (99.3243% DGE)
    > 3 dimensional : 0/0
    non-integer desc size : 0/0
    total : 147/148 (99.3243% DGE)
Spill/Reload
  Copy (DGE/DMA)
    128 partition : 0/487 (0% DGE)
    power-of-2 partition : 0/1140 (0% DGE)
    > 3 dimensional : 0/0
    non-integer desc size : 0/0
    total : 0/1140 (0% DGE)
  Cast (DGE/DMA)
    128 partition : 0/0
    power-of-2 partition : 0/0
    > 3 dimensional : 0/0
    non-integer desc size : 0/0
    total : 0/0
CopyMode
  CCE : 36
  Transpose : 0
  Replicate : 0
Dynamic (DGE/DMA)
  scalar : 1/1 (100% DGE)
  vector : 289/289 (100% DGE)
Opcode
  ReadVarAddr : 0
  IndirectLoad : 0
  IndirectSave : 0
  IndirectSaveAccumulate : 0
  DstReduceDGE : 0
lower_dma metrics end
2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: lower_dma finished after 0.048 seconds
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb)
2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s).
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running expand_all_engine 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: expand_all_engine finished after 0.009 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running alloc_semaphores 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: alloc_semaphores finished after 0.046 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57415 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running expand_inst_late 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=4 allocs=11556 blocks=4 instructions=57415 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: expand_inst_late finished after 0.044 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57439 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running seq_inst_opt 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=4 allocs=11556 blocks=4 instructions=57439 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [SeqInstOpt]: Removing 10 unnecessary InstRegisterMove instruction(s) from Block1 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [SeqInstOpt]: Removing 7 unnecessary InstRegisterMove instruction(s) from Block1 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: seq_inst_opt finished after 0.006 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 57422 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running lower_sync 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to lower_sync: modules=1 functions=4 allocs=11556 blocks=4 instructions=57422 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: lower_sync finished after 0.018 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59204 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running lower_act 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to lower_act: modules=1 functions=4 allocs=11556 blocks=4 instructions=59204 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: lower_act finished after 0.007 seconds 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: curr_vmrss: 406mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z USER 49129 [CoreForkPass]: Running lower_dve 2025-08-07T13:54:34Z INFO 49129 [CoreForkPass]: Inputs to lower_dve: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:34Z INFO 49129 (sgLnk) [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json 2025-08-07T13:54:35Z USER 49129 [CoreForkPass]: lower_dve finished after 0.071 seconds 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: curr_vmrss: 411mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [CoreForkPass]: Running lower_ap 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: Inputs to lower_ap: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [CoreForkPass]: lower_ap finished after 0.011 seconds 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: curr_vmrss: 411mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [CoreForkPass]: Running coloring_allocator_reg 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: allocating REG 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: main loop iteration 1 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: allocating REG 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: main loop iteration 1 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: renumber registers 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: size = 3 2025-08-07T13:54:35Z INFO 49129 []: find first defs for local reg 2025-08-07T13:54:35Z INFO 49129 []: find first defs for global reg 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: live range analysis 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: find costs 2025-08-07T13:54:35Z INFO 49129 
(sgLnk) [REG_Allocator]: simplify interference graph 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: initialize low and high 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: lo = 3 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: hi = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: inf = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: total = 3 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: simplify 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: new candidates = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: select ranges 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: no more spills 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: REG score = 0 (lower is better) 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: 0% REG utilization after allocation 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: allocating REG 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: main loop iteration 1 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: renumber registers 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: size = 1 2025-08-07T13:54:35Z INFO 49129 []: find first defs for local reg 2025-08-07T13:54:35Z INFO 49129 []: find first defs for global reg 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: live range analysis 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: find costs 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: simplify interference graph 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: initialize low and high 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: lo = 1 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: hi = 0 
2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: inf = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: total = 1 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: simplify 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: new candidates = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: select ranges 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: no more spills 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: REG score = 0 (lower is better) 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: 0% REG utilization after allocation 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: Allocating functions 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [ColoringAllocator::Rep]: linearize and check 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: allocating REG 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: main loop iteration 1 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: renumber registers 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: size = 4 2025-08-07T13:54:35Z INFO 49129 []: find first defs for local reg 2025-08-07T13:54:35Z INFO 49129 []: find first defs for global reg 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: live range analysis 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: find costs 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: simplify interference graph 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: initialize low and high 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: lo = 4 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: hi = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: inf = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: total = 4 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: simplify 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: new 
candidates = 0 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: select ranges 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: no more spills 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: REG score = 0 (lower is better) 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [REG_Allocator]: 0% REG utilization after allocation 2025-08-07T13:54:35Z USER 49129 [CoreForkPass]: coloring_allocator_reg finished after 0.079 seconds 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: curr_vmrss: 414mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [CoreForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: nc_parallel_pass finished after 1.299 seconds 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: curr_vmrss: 414mb, ru_maxrss: 716mb (delta=198mb) 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: Running mod_parallel_pass 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [ModuleForkPass]: Running birverifier 2025-08-07T13:54:35Z INFO 49129 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [ModuleForkPass]: birverifier finished after 0.063 seconds 2025-08-07T13:54:35Z INFO 49129 [ModuleForkPass]: curr_vmrss: 420mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [ModuleForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: mod_parallel_pass finished after 0.065 seconds 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: curr_vmrss: 420mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: Running subgraph_parallel_pass 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [SubgraphForkPass]: Running lnc_verifier 2025-08-07T13:54:35Z INFO 49129 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds 2025-08-07T13:54:35Z INFO 49129 [SubgraphForkPass]: curr_vmrss: 420mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [SubgraphForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: subgraph_parallel_pass finished after 0.002 seconds 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: curr_vmrss: 420mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: Running mod_parallel_pass 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [ModuleForkPass]: Running codegen 2025-08-07T13:54:35Z INFO 49129 [ModuleForkPass]: Inputs to codegen: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Total compiler allocated DRAM tensors: 0.00985336 GB 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Total un-allocated DRAM tensors by kind: 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: ┌────────────────┬─────────────┐ │ TensorKind │ Size (GB) │ ├────────────────┼─────────────┤ │ ExternalInput │ 7.6285 │ │ ExternalOutput │ 0.0703125 │ │ Const │ 0.000101335 │ └────────────────┴─────────────┘ 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Total runtime managed DRAM tensors: 7.69892 GB 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Instruction Stats: 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: ┌─────────────────────┬───────┐ │ Opcode │ Count │ ├─────────────────────┼───────┤ │ MATMUL │ 47625 │ │ LDWEIGHTS │ 47474 │ │ ACTIVATE │ 6708 │ │ EVENT_SEMAPHORE │ 1782 │ │ UNKNOWN(0xd4) │ 1035 │ │ PSEUDO_DMA_TRIGGER │ 742 │ │ MATCH_VALUE_LOAD │ 441 │ │ FIND_INDEX8 │ 224 │ │ MAX8 │ 224 │ │ MATCH_REPLACE8 │ 217 │ │ UNKNOWN(0xd3) │ 185 │ │ TENSOR_SCALAR_ADDR │ 151 │ │ TENSOR_TENSOR │ 150 │ │ POOL_BUFFER_LOAD │ 99 │ │ GATHER │ 99 │ │ UNKNOWN(0x8b) │ 97 │ │ MEMSET │ 30 │ │ UNKNOWN(0xda) │ 25 │ │ TENSOR_SCALAR │ 23 │ │ PSEUDO_BRANCH_LABEL │ 20 │ │ UNKNOWN(0x8a) │ 16 │ │ TENSOR_REDUCE │ 16 │ │ UNKNOWN(0xd2) │ 15 │ │ ACT_TABLE_LOAD │ 12 │ │ COPY │ 11 │ │ CAST │ 10 │ │ UNKNOWN(0xcf) │ 10 │ │ PSEUDO_DMA_REARM │ 10 │ │ UNKNOWN(0x8d) │ 8 │ │ UNKNOWN(0xd9) │ 7 │ │ UNKNOWN(0xe8) │ 7 │ │ 
RECIPROCAL │ 5 │ │ MOVE │ 4 │ │ UNKNOWN(0x92) │ 4 │ │ STREAM_SHUFFLE │ 4 │ │ LOAD_MASK_SELECT │ 4 │ │ IOTA │ 3 │ │ ALU_OP │ 2 │ │ UNKNOWN(0xe5) │ 2 │ │ TENSOR_SCALAR │ 1 │ │ PSEUDO_TENSOR_LOAD │ 1 │ │ RNG │ 1 │ └─────────────────────┴───────┘ 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: ┌────────────┬───────┐ │ Engine │ Count │ ├────────────┼───────┤ │ Unassigned │ 0 │ │ GPSIMD │ 2332 │ │ Scalar │ 8112 │ │ Tensor │ 95421 │ │ SyncDMA │ 0 │ │ Vector │ 1565 │ │ Sync │ 94 │ │ All │ 0 │ └────────────┴───────┘ 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Total instructions: 107524 (0.00640893 GB) 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Total DynamicDMA instruction count: 1035 2025-08-07T13:54:35Z USER 49129 (sgLnk) [Codegen]: isa_gen finished after 0.410 seconds 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Number of DMA descriptors on each queue instance: ┌───────────────────────────┬────────────────┐ │ Queue Instance │ RT Descriptors │ ├───────────────────────────┼────────────────┤ │ qActSpillReload0_defId_0 │ 5120 │ │ qActSpillReload0_defId_1 │ 2048 │ │ qActSpillReload0_defId_2 │ 2476 │ │ qDVESpillReload0_defId_2 │ 8 │ │ qPoolIO0 │ 2 │ │ qPoolPIOParam0 │ 72 │ │ qPoolSpillReload0_defId_0 │ 10240 │ │ qPoolSpillReload0_defId_1 │ 256 │ │ qPoolSpillReload0_defId_2 │ 1030 │ │ qSPIO0 │ 18442 │ │ qSPSpillReload0_defId_0 │ 514 │ │ qSPSpillReload0_defId_1 │ 768 │ │ qSPSpillReload0_defId_2 │ 1054 │ └───────────────────────────┴────────────────┘ Total descriptors: 42030 (0.000626296 GB) 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Number of DMA engines used by each queue: ┌───────────────────┬──────────────────────┐ │ Queue │ DMA Engines │ ├───────────────────┼──────────────────────┤ │ qPoolDynamic │ 16 │ │ qSPSpillReload0 │ 16 │ │ qSPIO0 │ 16 │ │ qActSpillReload0 │ 16 │ │ qPoolSpillReload0 │ 16 │ │ qPoolIO0 │ 16 │ │ qDVESpillReload0 │ 16 │ │ qPoolPIOParam0 │ 16 │ ├───────────────────┼──────────────────────┤ │ TOTAL │ 128 (must be <= 176) │ 
└───────────────────┴──────────────────────┘ 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Tensors with largest descriptor count: ┌────────────────────────────┬──────────┬──────────┬──────────────────┐ │ Tensor Name │ Kind │ Src Type │ Descriptor Count │ ├────────────────────────────┼──────────┼──────────┼──────────────────┤ │ dot.7-buffer-1531_sg0001 │ Internal │ bfloat16 │ 4 │ │ dot.14-buffer-2748_sg0002 │ Internal │ bfloat16 │ 4 │ │ 950.1676_i3_sg0000 │ Internal │ bfloat16 │ 4 │ │ dot.4-buffer-1868_sg0000 │ Internal │ bfloat16 │ 4 │ │ dot.11-buffer-1536_sg0001 │ Internal │ bfloat16 │ 4 │ │ transpose.1_sg0000 │ Internal │ bfloat16 │ 16 │ │ all-reduce.531.1544_sg0001 │ Internal │ bfloat16 │ 35 │ │ add.4_sg0001 │ Internal │ bfloat16 │ 36 │ │ all_gather.1_sg0000 │ Internal │ bfloat16 │ 64 │ │ convert.59_sg0002 │ Internal │ float32 │ 599 │ └────────────────────────────┴──────────┴──────────┴──────────────────┘ 2025-08-07T13:54:35Z USER 49129 (sgLnk) [Codegen]: dma_desc_gen finished after 0.024 seconds 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Estimated peak DRAM usage: 7.71581 GB 2025-08-07T13:54:35Z INFO 49129 (sgLnk) [Codegen]: Generating debug info 2025-08-07T13:54:35Z WARNING 49129 (sgLnk) [Codegen]: Found 127 instructions with more than 100 dependencies. For each such instruction, skipping writing more than 100 dependencies into the built-in NEFF debug info to prevent excessive compile time and NEFF size. For those instructions, the Neuron profiler will not display the skipped dependencies. 2025-08-07T13:54:35Z USER 49129 (sgLnk) [Codegen]: debug_info_gen finished after 0.146 seconds 2025-08-07T13:54:35Z USER 49129 [ModuleForkPass]: codegen finished after 0.597 seconds 2025-08-07T13:54:35Z INFO 49129 [ModuleForkPass]: curr_vmrss: 468mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [ModuleForkPass]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). 
Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: mod_parallel_pass finished after 0.600 seconds 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: curr_vmrss: 468mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: Running neff_packager 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Inputs to neff_packager: modules=1 functions=4 allocs=11556 blocks=4 instructions=59216 Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.9-1124_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_1405_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0000_t1879_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0001_identity_1144_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0001_t1547_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.24_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.25_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.26-809-913_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: FileDeDuper file not found value_sg0002_identity_1055_CRSM.npy 2025-08-07T13:54:35Z INFO 49129 [NeffPackager]: Const File de-dup saved 0 KB of memory footprint 2025-08-07T13:54:35Z WARNING 49129 [NeffFileWriter]: writeKelp missing file 
/local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.169490.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json 2025-08-07T13:54:35Z INFO 49129 [NeffFileWriter]: Neff will be written to: /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.neff 2025-08-07T13:54:35Z INFO 49129 [NeffFileWriter]: IR signature: a8a3756d7053d8c4542e3fd51c392fbe for neff artifacts 2025-08-07T13:54:35Z USER 49129 [BackendPassManager]: neff_packager finished after 0.088 seconds 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: curr_vmrss: 469mb, ru_maxrss: 716mb (delta=0mb) 2025-08-07T13:54:35Z INFO 49129 [BackendPassManager]: Output has 1 module(s), 4 function(s), 11556 memory location(s), 4 block(s), and 59216 instruction(s). Max writers: 594 Max Readers: 19699 2025-08-07T13:54:35Z INFO 49129 [BackendDriver]: HBM scratchpad usage summary (post-allocation): ┌──────┬───────────┬────────────────────────────────────────────────────────────┬─────────────┐ │ Core │ Subgraph │ Description │ Value │ ├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ │ nc00 │ sg00 │ Peak scratchpad usage: local │ 0.003418 GB │ │ nc00 │ sg00 │ Total size of allocated tensors: local │ 0.003418 GB │ │ nc00 │ sg01 │ Peak scratchpad usage: local │ 0.004883 GB │ │ nc00 │ sg01 │ Total size of allocated tensors: local │ 0.004883 GB │ │ nc00 │ sg02 │ Peak scratchpad usage: local │ 0.002945 GB │ │ nc00 │ sg02 │ Total size of allocated tensors: local │ 0.004242 GB │ │ nc00 │ Max │ Peak scratchpad usage: local │ 0.004883 GB │ │ nc00 │ Post-link │ Peak scratchpad usage after intermediate tensor allocation │ 0.009853 GB │ │ nc00 │ Post-link │ Total size of allocated intermediate tensors │ 0.070526 GB │ ├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ │ Max │ Max │ Peak scratchpad usage 
│ 0.009853 GB │ │ Max │ Max │ Peak scratchpad usage (page-aligned) │ 0.500000 GB │ └──────┴───────────┴────────────────────────────────────────────────────────────┴─────────────┘ 2025-08-07T13:54:35Z INFO 49129 [BackendDriver]: Backend completed successfully, tearing down. 2025-08-07T13:54:36Z INFO 47514 [job.WalrusDriver.0]: new_lnkState: {"model": ["/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "cached_wavegraph": "walrus_bir.out.json", "state_dir": "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/sgLnk/sg00", "state_id": "sgLnk"} 2025-08-07T13:54:36Z INFO 47514 [job.WalrusDriver.0]: MTBackend: completed successfully. 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0 2025-08-07T13:54:36Z INFO 47514 [job.BIRLinker.0]: Replay this job by calling: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile --framework XLA --state '{"model": ["/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "cached_wavegraph": "walrus_bir.out.json", "state_dir": "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/sgLnk/sg00", "state_id": "sgLnk"}' --pipeline BIRLinker 2025-08-07T13:54:36Z INFO 47514 [job.BIRLinker.0]: BIRLinker cwd: /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5 2025-08-07T13:54:36Z INFO 47514 [job.BIRLinker.0]: Linking already done. 
2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Starting job job.Kelper.0 2025-08-07T13:54:36Z INFO 47514 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Finished job job.Kelper.0 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0 2025-08-07T13:54:36Z INFO 47514 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1 2025-08-07T13:54:36Z INFO 47514 [job.NeffWrapper.0]: Processing input #0 2025-08-07T13:54:36Z INFO 47514 [job.NeffWrapper.0]: Start NeffWrapper 2025-08-07T13:54:36Z INFO 47514 [job.NeffWrapper.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.hlo_module.pb --neff /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/model.MODULE_f4171003694760566af4+a9cd68fb.neff --io_transposes /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/io_transposes.json --output /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/wrapped_neff.hlo --netlist /home/ubuntu/qwen3/context_encoding_model/_tp0_bk0/neuronxcc-vebk23i5/hlo_netlist.json 2025-08-07T13:54:36Z INFO 47514 [job.NeffWrapper.0]: There are no io transposes nor zero-sized parameters. Output will not be produced. Hlo neff wrapper finished successfully. Have a wonderful day :D 2025-08-07T13:54:36Z INFO 47514 [job.NeffWrapper.0]: Job #0 finished 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Finished pipeline Pipeline 2025-08-07T13:54:36Z INFO 47514 [pipeline.Pipeline.0]: Job #0 finished 2025-08-07T13:54:36Z INFO 47449 [root]: Subcommand returned with exitcode=0