Lately I've found myself increasingly drawn to studying "primitive" CLI operations.
Compared with the much-hyped MCP and Skills, teaching AI to understand and use the CLI is actually more feasible, more explainable, and more powerful at the code level.
I recently deployed a website for my OpenSoul project on Vercel. In the past, this would have cost me real cognitive effort and time: figuring out how to operate the Vercel dashboard and reading piles of documentation (sharper people might feed the docs to an AI and have it summarize reliable steps).
But once ChatGPT mentioned that Vercel ships a CLI, I simply asked Copilot in VS Code to install it, stated my requirements clearly, and it quickly handled everything else. The only thing I actually had to do was log in to Vercel and create a key.
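For the curious, the whole operation boils down to a handful of commands. Here's a minimal sketch of the kind of thing the agent ran, assuming Node.js is available and a deploy token has been created in the dashboard (the wrapper script is my illustration, not Copilot's actual output):

```python
import os
import subprocess

def run(cmd: list[str]) -> None:
    """Echo and run one CLI command, failing loudly like a bash step would."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

token = os.environ["VERCEL_TOKEN"]  # the one manual step: create this key in Vercel

run(["npm", "install", "--global", "vercel"])          # install the Vercel CLI
run(["vercel", "link", "--yes", "--token", token])     # attach this dir to a project
run(["vercel", "deploy", "--prod", "--token", token])  # ship to production
```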
This reminded me of an interview I read a while back with the creator of Claude Code. The reason Claude Code never built a front-end UI and the like is precisely that he believes we should pour most of our energy into the most meaningful interaction logic.
So in an era of ever-stronger AI, perhaps what we really need is to pick back up the tools we once used with the sole goal of getting things done, back when compute was tight. What do you think?
Recently I've come across some community practices where skills empower agents, and they've inspired some thoughts on the future of agents that I'd like to share.
Hugging Face (hf) recently built a kernel-related skill and got good results on two tests. AI infrastructure is a genuinely high-barrier field: it requires juggling many variables in pursuit of peak performance. Yet when the expertise of that field is internalized into a single skill, the change is dramatic.
Perhaps skills are the capability we need to refocus on.
Many AI products today pour effort into redundant work like prompt engineering and workflow building. What they fail to account for is that the essential capabilities of large models (context, memory, collaboration, logical reasoning) keep improving, so much of that complexity isn't really needed.
I believe "the greatest truths are the simplest" is the only answer here.
Right now, most tasks can be accomplished with a command-line tool, a skill, and one or more large models. No other complex logic is needed: just bash.
Maybe there isn't that much we need to do. Find a sufficiently vertical field, internalize its knowledge into a skill (other forms work too), expose every channel you can think of as a command-line interface, and let AI loose on as many tasks as possible.
Then leave the rest to AI. The power of bash is beyond your imagination. @AdinaY @burtenshaw @clem @evalstate
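To make the shape of this concrete, here's a minimal sketch of that loop: one skill file, one model, and bash. `call_model` is a stand-in for whatever LLM API you use, and the skill file path is illustrative:

```python
import subprocess

def call_model(system: str, transcript: list[str]) -> str:
    """Hypothetical LLM call: returns the next bash command, or 'DONE'."""
    raise NotImplementedError("wire in your model provider here")

def run_agent(task: str, skill_path: str, max_steps: int = 20) -> None:
    # The "skill" is nothing exotic: distilled domain knowledge as plain text,
    # loaded straight into the system prompt.
    with open(skill_path) as f:
        skill = f.read()
    system = f"{skill}\n\nSolve the task by emitting one bash command per turn."
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        command = call_model(system, transcript).strip()
        if command == "DONE":
            break
        # Everything else is just bash: run the command, feed back the output.
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        transcript.append(f"$ {command}\n{result.stdout}{result.stderr}")

# run_agent("profile and optimize this CUDA kernel", "skills/kernels.md")
```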
I recently open-sourced an AI emotional-companion product built on openclaw, called OpenSoul.
On this platform, you can create a "soulmate" that matches your personality and configure the skills and tools you want it to have, as well as the platforms it can integrate with (Telegram, Discord, etc.). You can even create group chats, inviting several agents and your friends to talk about recent events, discuss projects together, and so on.
On one hand, I hope its memory mechanism, self-feedback-and-iteration mechanism, and modeling of the user's emotions let it keep you better company in daily life. On the other, I hope its skills, tools, and handling of complex task scenarios help you get your work done better.
The product has taken shape, but many areas still need adjustment and optimization. I'd love to lean on the community's strength to do AI emotional companionship well.
Friends in the community, I've had some new ideas recently.
A while ago I came across an analysis by two a16z investors. Over the past year, 2025, ChatGPT tried to push new AI features into fields like shopping, but the results were not good.
I think the root cause is the user's mindset, or rather, the user's interaction logic in vertical fields. ChatGPT's most prominent, most distinctive feature is that all-encompassing dialogue box, which is also the common flaw of many homogeneous AI products today (as if without a dialogue box, the AI's capabilities were sealed away). It adapts to many scenarios, but in more vertical ones it comes across as very dull.
Ask yourself: in a shopping scenario, would you rather have the image-and-text waterfall feed of Xiaohongshu, or ChatGPT's monotonous search box? The answer was obvious from the start.
In every vertical scenario, the interaction logic was already mature before AI arrived, and the user experience it delivers is not something a single dialogue box can replace.
If we want to build a good AI product in a vertical field, we should think harder about how to embed AI's power silently into the existing interaction, iterating continuously toward a better user experience. @lilianweng @clem @AdinaY
I recently built a tool for AI-assisted learning: the user enters any knowledge point, an agent writes animation storyboards for it, a coding agent turns the storyboards into animation code, and the Manim engine renders the result into a video.
The overall effect is similar to 3blue1brown's videos. My hope is that a tool like this lets anyone learn freely from videos of 3b1b quality.
But I've recently hit a problem with the video content: geometric figures, symbols, and so on are hard to place in the right spots, i.e., a positioning problem. I tried extracting frames from the generated video and submitting them to a VLM for review to flag visual issues, then repeatedly revising the prompts to improve generation quality, but the results were unsatisfactory.
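One partial mitigation I've been exploring (not a full fix): constrain the coding agent to Manim's relative-placement primitives instead of absolute coordinates, so overlaps become structurally harder to produce. A small sketch using Manim Community's layout helpers; the scene itself is illustrative:

```python
from manim import *  # Manim Community edition

class RelativeLayout(Scene):
    """Build the layout from relative constraints rather than absolute coordinates."""
    def construct(self):
        circle = Circle()
        square = Square()
        # arrange() spaces siblings with a uniform buffer, so they cannot overlap.
        diagram = VGroup(circle, square).arrange(RIGHT, buff=1.0).move_to(ORIGIN)
        # next_to() anchors the label a fixed distance from the circle,
        # wherever the circle ended up after arrange().
        label = MathTex("x^2 + y^2 = r^2").next_to(circle, UP, buff=0.3)
        self.play(Create(diagram), Write(label))
```

Paired with the VLM review loop, asking the model to emit next_to/arrange edits rather than fresh coordinates might converge faster, though I haven't verified that.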
I wonder if anyone has good methods for solving this positioning problem.