🔥 UPGRADE: Kai Scales to 30B! 🔥

We are incredibly excited to announce that the Kai-30B-Instruct model and its official Space are now LIVE! 🚀

If you've been following the journey from Kai-0.35B to Kai-3B, you know we're rethinking how models reason. Tired of verbose, slow Chain-of-Thought (CoT) outputs that flood your screen with self-talk? So are we.

Kai-30B-Instruct scales up our Adaptive Dual-Search Distillation (ADS) framework. By bridging classical A* heuristic search with continuous gradient descent, we use an information-theoretic log-barrier to prune high-entropy reasoning paths during training. The result? Pure implicit reasoning: the model executes structured logic, arithmetic carries, and branch selections as a reflex in a single forward pass, with no external scaffolding required.

At 3B, we observed a phase transition where the model achieved "logical crystallization". Now, at 30B, we are giving the ADS regularizer the massive representational capacity it needs to tackle higher-order symbolic abstractions and complex reasoning tasks.

🧪 Test Kai yourself in our new Space: NoesisLab/Kai-30B-Instruct
📦 Model Weights: NoesisLab/Kai-30B-Instruct

Bring your hardest math, logic, and coding benchmarks. We invite the community to stress-test the limits of the penalty wall! 🧱💥
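For readers curious what an information-theoretic log-barrier on reasoning entropy can look like in practice, here is a minimal, hypothetical PyTorch sketch. The function name, the `entropy_budget` parameter, and the exact formulation are illustrative assumptions on our part, not the actual ADS implementation:

```python
import torch
import torch.nn.functional as F

def log_barrier_entropy_penalty(logits: torch.Tensor,
                                entropy_budget: float = 2.0,
                                eps: float = 1e-6) -> torch.Tensor:
    """Illustrative log-barrier on per-token predictive entropy.

    NOTE: `entropy_budget` and this formulation are assumptions for
    illustration only; the real ADS regularizer is not public.
    """
    # Per-token predictive entropy H(p) = -sum_v p_v * log p_v over the vocab.
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # shape: (batch, seq)

    # Log-barrier: the cost -log(budget - H) blows up as entropy approaches
    # the budget, acting as a "penalty wall" against high-entropy paths.
    slack = (entropy_budget - entropy).clamp_min(eps)
    return -torch.log(slack).mean()

# Hypothetical training step: total loss = task loss + barrier penalty.
# loss = ce_loss + 0.01 * log_barrier_entropy_penalty(model_logits)
```

The intuition: unlike a soft entropy penalty, a barrier term grows without bound near the budget, so training is pushed hard toward low-entropy (confident, single-path) token distributions.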
PHP-Code-Large is a large-scale corpus of PHP source code comprising more than 12 million lines. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, software engineering automation, and static program analysis for the PHP ecosystem.
By providing a high-volume, language-specific corpus, PHP-Code-Large enables systematic experimentation in PHP-focused model training, domain adaptation, and downstream code understanding tasks.
PHP-Code-Large addresses the need for a dedicated PHP-only dataset at substantial scale, enabling focused research across backend systems, CMS platforms, APIs, and full-stack PHP environments.
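As a starting point, here is a minimal sketch of streaming the corpus with the Hugging Face `datasets` library, as one would for pretraining-style consumption. The bare repo id and the `content` field name are assumptions; consult the dataset card for the actual Hub path and record schema:

```python
from datasets import load_dataset

# Streaming avoids downloading all 12M+ lines of PHP up front.
# The repo id "PHP-Code-Large" and the "content" field are hypothetical;
# check the dataset card for the real Hub path and schema.
ds = load_dataset("PHP-Code-Large", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["content"][:200])  # first 200 chars of one PHP source file
    if i >= 2:
        break
```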
Ethos: In our team at UT Austin, we train students to become full-stack researchers—and increasingly, designers of the systems that do research. Our students learn to carry projects end-to-end: from idea generation and theory to data creation, analysis, and iterative refinement across diverse subfields. Using modern AI (including agentic workflows) and scalable computation, students build reproducible pipelines that can ingest and update planetary-scale data—like satellite imagery and other high-dimensional sources. But the goal isn’t tool use for its own sake: students learn to set the objectives, constraints, and evaluation standards that guide these systems through large spaces of hypotheses, while grounding results in causal inference and careful measurement. The outcome is scholarship that can rigorously test policy counterfactuals and translate evidence into durable, responsible improvements in societal well-being.
We welcome students at every stage to engage with projects—from motivated high-schoolers to undergraduates, graduate students, and those from highly non-traditional backgrounds.