informal

AI on academic research

Author: Benson
March 5, 2026 00:00
AI usage in academic research is one of the best references for AI applications.

Terence Tao

The earliest application of AI to academic research that I saw came from Terence Tao, who used cutting-edge AI products and shared how he used them in the early stages of AI development. I had doubts at that time, because AI was still a little bit dumb sometimes and academic research is the hardest and most sacred area. But what followed proved that AI is very useful in academic research.

Terence uses AI to solve many of the most complex problems in mathematics.
In 2026, Tao co-founded sair with other professors to scale scientific discovery with AI and to ground AI in science.

Autoresearch

Another example is autoresearch from karpathy. This project uses AI to optimize parameters, code, and even models in order to automate the iteration process. It's not a perfect project because it hides some details of the iteration process, and some of them might be buggy or a black box. But nobody can fully explain how an LLM works either. It's still a black box because it has too many parameters, yet it eventually works. This is good evidence that autoresearch might work in a similar way to an LLM. If we insist on a white box, we can still check the iteration logs to see what happened and revise things if something went wrong. This example shows big progress in the agentic ability of AI.

My research

I have also been using codex for my research paper over the past two months. It helps with brainstorming, investigating new ideas, coding, validating ideas, and writing reports. AI cannot make decisions for me; I'm the one who points out the direction of the next step. I enjoyed the process because it's smooth. I spent most of my time thinking and didn't have to do other work, which is enjoyable. Of course, the premise is that the AI can follow my instructions correctly. For example, I found gemini flash 3.0 is much worse than gpt 5.4, and gemini 3.1 pro is still a little bit worse.

Summary

AI's application to academic research is evidence that AI is worth trying in any field. What humans should do is the things above the line of AI capability.

How AI Will Change the Mobile Ecosystem

Author: Benson
February 24, 2026 00:00
How mobile development will be next year

There are many mobile development platforms for non-technical users nowadays, like Rork and Lovable. They are not perfect now but will become much better in a year, just like how ChatGPT changed in the year after it was published, and like how much vibe coding agents have progressed in the last year. These platforms can generate everything we need to submit to the App Store and Google Play Store, like app packages, screenshots, privacy policies, and descriptions.

Maybe next year, everyone can build and publish one app in 3 hours. I don't mean a professionally designed app like Photoshop or Figma, but it's very possible for a standard music app or management app.

Z-Library & Copyright

Many users use Z-Library to download books rather than paying for them, but they reject it when someone wants to clone their app. What is the reason? I think it's because we're developers, and we make up the majority of voices on the internet. We take advantage of Z-Library, but we lose our advantage with an application copy. This means copying applications will become more like Z-Library if we ignore our attitude.

Moreover, how do we define copyright for an app? Unlike a book, it's hard to identify whether an app violates copyright or not.

Based on the two arguments above, I think copying will happen more in the future; some users will choose the cheaper ones because of the price.

Media news

Another example is the shift from newspapers to UGC platforms. Nowadays, we obtain information from lots of websites other than official news media, like Hacker News, Twitter, and personal blogs. It takes zero cost. Oh, it's not zero: we have to pay for cloud costs through advertisements. It's not much, just like the cost to build a new app next year.

Nowadays, there is too much information generated by non-professional writers, and most users read information from them over professional news media.

New media didn't kill news, but made the ecosystem richer. It might be the same in the future for applications.

Better market: Games

We usually use apps to solve a problem; it's different from reading books. We usually don't want to explore new apps when we don't have a demand. Books are more like games: we like to explore new games for fun. So, I think AI will have more influence on games, like other creativity fields.

Summary

AI will bring strong creativity to the app ecosystem. The costs will become lower for users, the market will become more competitive for creators, and the ecosystem will become richer for stores.

Look ahead

Author: Benson
February 23, 2026 00:00
ChatGPT was published in November 2022; that was 3 years and 3 months ago. Looking back, it's hard to believe how much it has changed computer science and programming.

How AI changed programming

When ChatGPT was first published, we were surprised that it could answer unlimited questions. After that, we started to use it for every question while ridiculing it; we shared its hallucinations with colleagues and friends whenever we hit a ridiculous answer. But that didn't change how often people used it.

The appearance of cursor marked a new stage of AI in programming. I didn't try cursor at first. I thought VS Code would win in the end, as in past editor competitions. Of course, I now think I was totally wrong.

The next stage is coding agents, like claude code, antigravity and codex. We don't need to write code in the editor window; it's like a premium version of the autocomplete in cursor. I can see that they have become better and better over the past half year. Right now, I have become too lazy to write code myself. I haven't used claude code much, but based on my experience with antigravity and codex, the coding agent is perfect, but I know the boundary of AI's capacity, and I know that it can edit where I want it to. I don't have to change code myself as long as the AI coding agent can locate the right place based on my instructions. When I open VS Code now, it feels like a product of the last century.

It's hard to predict

This reminds me that in 2019 some people said that year would be the best year of the next 10 years. Nobody believed that, but it was true. We usually don't believe things that haven't happened; it's hard to predict the future. But it's much better if we realize that such things are hard to believe; then we are more likely to predict part of the future. We can also be more willing to believe in new products.

The influence AI brings to the mobile ecosystem

AI has significantly reduced the cost of mobile development. There are more and more mobile development platforms now. They automate the whole process of mobile development: you don't need to write one line of code or open an editor. All you need to do is chat and validate the result.

The interesting question is how this will change the mobile ecosystem in the future. Today, I was reminded of the influence e-books had on paper books. Of course, it's not exactly the same. One similarity is that the new method reduces the cost significantly. How much are we willing to pay the creator of an application if its cost is low? Of course, some users will pay for beautiful appearance, future support and trust, but most users pay for the current app, not for anything else. If we copy a currently great app at low cost, how many users will choose the low-priced one? It's hard to say. What about apps that need a backend? It depends on what data the backend has: copyrighted data, or just the users' own data.

The world is changing faster than we thought. Keep looking ahead.

Goodbye 2025

Author: Benson
January 13, 2026 00:00
Previous Plan

My first blog post was a summary of 2021; that was the start of this blog. It's a little hard to believe that I have been writing this blog for five years. I like this feeling.

Looking back at the plan for 2025, it was very simple:

English: fluent
Exercise
New Products
Read

My English is much better than it was at the start of 2025. One of the most important reasons is that I came to Canada and started studying here. Not bad: I think I did well on this goal even though I'm not very fluent.

Speaking of exercise, I didn't exercise much during the majority of 2025, but I have exercised regularly recently, which makes me feel I finished this task. Yeah, people usually care more about what happened recently than what happened in the past. That's why we need to record what happened and what we are feeling now.

Regarding new products, I really built a lot of products this year. Most of them are useless; only I care about them. In particular, I built a lot recently. This is the first time I have strongly felt how productive I am with AI. I like the feeling of producing.

I didn't read much this year, including recently. I was too busy. In addition, I lost the interest and passion to read old information when I wanted to build something new.

Others in 2025

I like the new era of AI; I think it brings more productivity to everyone. We can create more tools and products as we want. In the past, some of the reasons I liked AI came from my work; now I have touched the power of AI myself. Recently I want to automate everything that can be done by AI, and it's possible. It seems that I have been too excited recently. I don't want to stay in this state for a long time; it's not good for my body. I need a better balance.

Antigravity was the pivot that made me realize the power of AI, because I had so many tokens to use for programming. That made me understand why it's harder for poor people to earn money: people with more resources to use have more space to use their creativity.

After arriving in Canada, I changed my eating habits a lot. I don't want to eat until I'm full now; it's not healthy for my body. I also want to eat simpler food, meaning food that is easy to make. I learned this from Naval's books.

I also started to buy stocks at the end of 2025. I don't have much money to invest, but it's a new start.

2026

What I want to achieve:

Health: Eating, Exercising, Sports
Products: Build some great products (APP, Saas, Paper)
Investment: Save money to prepare

Annual blog

2024
2023
2022
2021

Introduction of Fraud detection

Author: Benson
September 15, 2025 00:00
In short, the target of fraud detection is to detect fraudulent traffic and filter or block it. What makes this interesting and difficult is that what we are detecting is humans; defending against humans can be as hard as you can imagine.

Strategy

Rules

Most fraudulent traffic is produced purely by machines/devices without any human involved, like DDoS. Rules can usually handle this kind of fraudulent traffic because it is so obvious. The core idea in designing a rule is to find a dimension and a feature, like the count of traffic from the same IP. What we need is to find a proper dimension and valid features for the current case.

Models

After we block the bulk of fraudulent traffic, fraud rings will usually add more human-like features to their traffic, or create the behaviour entirely by hand. What they care about is ROI, just like we do. If the profit is high enough for the manual operation time, it will be worth it. We need more features and signals to detect fraudulent traffic when it is more likely produced by humans; that's why we need statistical algorithms and machine learning models:

Anomaly detection
Linear regression
Tree-based models
Graph-based models
Deep learning models
A mixture of the above

Fraud simulation

We can simulate fraud rings attacking our business, which can improve anti-fraud quality and help us estimate the cost of breaking our anti-fraud products.

Architecture

Online

An online service usually returns a result within 50ms. Errors in an online product might block all users of the business, and this sometimes really happens because the service is designed to block users. So we should make sure we don't give false positive results to the business.

Streaming

We need a streaming system to calculate feature values; these features will be used in rules and models.

Offline

Some complex algorithms cannot be implemented in online and streaming systems. We can keep high-productivity iteration in the offline stage, and we can also try various methods without considering efficiency. In addition, we can detect the hardest fraudulent traffic in the offline stage, which is beneficial for online model training.

Risk alert

Risk alerts can guard against false negatives. We can lower the thresholds of online rules and models to create risk alert methods; it's also a great way to apply low-accuracy algorithms.

Monitor service

Monitor the traffic that we judge fraudulent to guard against false positives. Monitor and analyse feedback from the business; they have a clearer view of this traffic.

Summary

Fraud detection is a mixed area that needs engineering, statistics, algorithms and security. It's a very interesting topic if you like it. It will become more challenging under the influence of AI, because there is less and less difference between humans and AI.
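To make the "dimension and feature" idea from the Rules section concrete, here is a minimal sketch of a rule that counts requests per IP in a sliding time window. The class name `IpRateRule`, the window length and both thresholds are hypothetical values chosen for illustration, not taken from any real system. The lower `alert_threshold` reflects the Risk alert idea of reusing the same rule with a reduced threshold.

```python
from collections import defaultdict, deque


class IpRateRule:
    """Sliding-window rule: dimension = IP, feature = request count in the window.

    All names and default values here are illustrative assumptions.
    """

    def __init__(self, window_seconds=60, block_threshold=100, alert_threshold=50):
        self.window_seconds = window_seconds
        self.block_threshold = block_threshold
        self.alert_threshold = alert_threshold
        # One queue of request timestamps per IP.
        self._events = defaultdict(deque)

    def check(self, ip, timestamp):
        """Record one request and return 'block', 'alert' or 'allow'."""
        events = self._events[ip]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        count = len(events)
        if count >= self.block_threshold:
            return "block"
        if count >= self.alert_threshold:
            # Lower threshold: flag for review instead of blocking.
            return "alert"
        return "allow"


rule = IpRateRule(window_seconds=10, block_threshold=5, alert_threshold=3)
decisions = [rule.check("1.2.3.4", t) for t in range(5)]
print(decisions)
```

In a real deployment the counting would more likely live in the streaming system, with the online service only reading the feature value, so that the online path stays within its latency budget.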

Hacker News to Kindle

Author: Benson
December 17, 2025 00:00
Why kindle is better

I usually read on my kindle in bed before I fall asleep. I think the big advantage of reading on a kindle is that it leads to fewer distractions. When I use a computer and browse websites, it's so easy to do other things when a notification pops up, like from a text messaging app or gmail. In contrast, when I read on the kindle, I only focus on the kindle and nothing else.

In the past, I only read books on the kindle. But recently, I found that I can send the weekly edition of The Economist to the kindle to read news. But the weekly update cannot meet my needs, because my wife spends much more time getting ready to fall asleep. I realized that I can send more news to the kindle to read. It's a much better choice than reading on the computer or phone.

HN2Kindle

The best choice is Hacker News, which is the most popular news source for tech. So I built a project to send Hacker News content to the kindle.

I think a kindle or a book is definitely a better way to obtain information than a computer or phone; it leads to fewer distractions and improves efficiency. The problem with books and the kindle was slow updates, but that can be solved by a third-party service. We can solve this problem as long as we can find high-quality data, even AI-generated content.

Disadvantage

I just found a disadvantage of reading news in the evening: it makes me excited, and as a result it's hard to fall asleep. But it doesn't matter if you read during the day.

Ideas come from new things

I usually get a new idea when I try something new, like learning about entrepreneurship, taking a driving test, the university ddl-alert, or reading news on the kindle. Most things I do are not my choice, but that doesn't matter; I can get a new idea whenever I do something new. So I should try new things in life to get new ideas.

Another method is mining demand from other people, like on reddit or quora. That requires more ability to imagine what others are experiencing, and it costs more time to find the valuable ideas in the whole market.
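The post doesn't describe how HN2Kindle works internally, so the following is only a rough sketch of one way such a pipeline could look. It assumes the public Hacker News Firebase API (`topstories.json` and `item/<id>.json`) and assumes delivery by emailing an HTML attachment to a Send-to-Kindle address. The function names, file name and addresses are all hypothetical.

```python
import json
from email.message import EmailMessage
from html import escape
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_top_stories(limit=10):
    """Fetch the top stories from the public HN API (needs network access)."""
    with urlopen(f"{HN_API}/topstories.json") as resp:
        ids = json.load(resp)[:limit]
    stories = []
    for story_id in ids:
        with urlopen(f"{HN_API}/item/{story_id}.json") as resp:
            stories.append(json.load(resp))
    return stories


def build_digest_html(stories):
    """Render story dicts (title, url, score) as one simple HTML page."""
    items = []
    for story in stories:
        title = escape(story["title"])
        url = escape(story.get("url", ""), quote=True)
        score = int(story.get("score", 0))
        items.append(f'<li><a href="{url}">{title}</a> ({score} points)</li>')
    body = "".join(items)
    return f"<html><body><h1>Hacker News digest</h1><ol>{body}</ol></body></html>"


def build_kindle_email(stories, sender, kindle_address):
    """Build (but do not send) an email with the digest attached as HTML."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = "Hacker News digest"
    msg.set_content("Digest attached.")
    msg.add_attachment(build_digest_html(stories), subtype="html",
                       filename="hn-digest.html")
    return msg
```

Sending the message is left out on purpose; with the standard library it would go through `smtplib`, using whatever mail account is approved as a sender for the Kindle address.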

Another project

Author: Benson
November 28, 2025 00:00
Project

The last month went fast, because I spent almost all of it on a new project. Right now, I have finished the first relatively mature version of the project. Of course, I still built some features that I intended to implement later; I just couldn't help optimizing that part when I was thinking about it. I really enjoyed the feeling of putting all of myself into something that I'm totally responsible for and like. Even so, I must pause the project because winter is coming. Many things in Canada stop in winter, and outdoor activities become rare. There are not enough clients for my new project.

Reflection

Yeah, I don't want to share more details about this project now, but I want to write a reflection on it. Success for a project is very, very hard. It is influenced by lots of factors, and you can succeed only when you have all the requirements. Generally, the most important factor for a project is not you; the element with the largest influence on the project is not the people who participate in it. It may be something you cannot decide, like the economic trend or government policy. But the owner of the project is responsible for identifying these crucial and necessary elements. For example, I should have known that there are few or almost zero clients for my project during the winter season; then I could have spent less effort than I did on the project.

I also learned some great things from the project:

Build the entire service for users rather than half a service. Clients prefer to purchase an entire product at a high price over half a service at a low price.
Reaching your target clients is the hardest part. Don't be shy when you are reaching out; do anything that can help you reach the clients.
Publish the first experimental version as soon as possible, to get feedback ASAP and start a positive cycle.

I also learned from someone else to publish the first mature version as late as possible, but I didn't have an opportunity to apply that in my project.

Influence in life

I always put everything else away when I have a great project to participate in, and I'm happy while doing the project. But after I failed and quit the project, I had lost so much. I need to face the consequences of my carelessness in life, like the relationship with my partner and the poor assignments that I finished without enough care. Projects fail in the end a lot, so I should balance my life better when I meet my next project. More specifically, I should be more at peace when I meet the next great project; that can help me to counter the challenges in life.

How to improve English

Author: Benson
October 11, 2025 00:00
English proficiency

I thought my English would improve naturally after staying in an English-speaking country for some time. This is definitely wrong. English proficiency improves only when I practice it. "Changing naturally" is just a cover for real life.

Recently, I took two interviews, and they were the direct events that made me realize I still have huge weaknesses in English proficiency. The first was a basic recorded-video English interview; I just introduced myself, and I failed. The second was a coding interview for a project; the problem was reading the materials. I read so slowly that I didn't have enough time for coding.

How to improve

When I was preparing for a standardized English test in the past, I used ChatGPT a lot. I used it most for speaking: it speaks like a native and can talk with me all the time. Talking is the most direct and efficient way to improve speaking ability; talking is practice for speaking. In addition, I also used ChatGPT to correct the sentences and essays I wrote, and to build some automatic tools for listening.

After I passed the exam, I used it less and less for learning English. I don't have a clear target for learning English now, so I don't have a clear plan or dedicated time for it. As a result, I never used ChatGPT for learning English after that.

None of this changes the fact that ChatGPT is the best tool to help people learn English. As The Almanack of Naval, which I was reading recently, says, opinions cannot influence facts; we should know what is really a fact without the influence of any kind of emotion.

Back to the original topic. On the input side, we can just practice more listening and reading; there are so many materials we can use. For writing, we can practice by ourselves and let ChatGPT correct misspellings and obvious grammar errors. For speaking, I can also practice with ChatGPT, and correctness can be improved through writing. As a result, I can learn English all by myself without help from others.

Difficulty

It's easy to execute with a clear plan, but I realised that I looked for excuses to avoid the effort I should put into learning English, just like the old saying. This gives me a deeper understanding that we need to recognise as many facts in life as possible; humans are naturally good at finding excuses for what they do.

Hope I can execute the plan well.

PopTranslate

Author: Benson
September 14, 2025 00:00
Introduction

PopTranslate is a translation chrome extension. Main features:

Pop up translated content immediately after selecting content, without an extra click
Show an English dictionary entry when selecting a word
Show translated content in the top-right corner

User experience

The sub window shows quickly after selecting, ensuring a real-time experience for users
Users can click anywhere outside the sub window to close the pop-up window, so it's easy to close

Availability

I use the free google translate API for the translation service and the free dictionaryapi for the dictionary service. This project costs me nothing, so this extension can stay available for a long time without extra investment.

Last day in netease

Author: Benson
August 22, 2025 00:00
Today is my last day at netease. I don't have as much feeling as I did at baidu. On the one hand, I spent much more time at baidu, almost five years; on the other hand, that was my first job, and the first time is always different.

What surprised me was yesterday: the boss and everyone in my small group had a farewell lunch. The leader and the boss were both friendly even though I only stayed here a little over four months. This is a friendly team. We don't have the rules of a big tech company, like how to flatter leaders or say words that are expected but that we don't mean. In the middle of the lunch conversation, we talked about the flattery atmosphere at baidu; what surprised me is that some people think the flattery issue is more serious than I thought. At the last moment, I suddenly remembered that we hadn't had a toast, so I asked for one. The boss jokingly said that the ex-baiduer wanted a toast at the end, and I realised that too. Actually, it's usually not my job at ordinary dinners.

What I won't forget is the freedom and flexibility in this netease team. We don't have to care about reporting relationships at all in daily work and chatting, which makes the atmosphere great, maybe perfect.

On the way back from the restaurant to the company, I was told a lot about the last era of the Internet and the companies of that time. There were many opportunities in that era and so many products they could build; some of them became totally successful. I always like stories from history, especially when told by a person who experienced it himself/herself. It's like the Qin emperor teaching me the history of the Qin dynasty.

Better idea between Copilot-typed and CLI-typed assistant

Author: Benson
August 12, 2025 00:00
Copilot-typed

GitHub Copilot, cursor, GPT-codex, Windsurf and trae are representative Copilot-typed tools. Their function is to help users complete code automatically when necessary. Of course, they also provide interactive conversation with code as the content, which is a common feature. This was the most direct way to use LLMs in coding when they appeared.

CLI-typed

Claude-code is the first CLI-typed coding assistant. After Gemini-CLI joined the war among CLI-typed coding assistants, cursor-CLI was also published on 8th August. CLI-typed tools are the representative products of the vibe coding era, which means people can free themselves from editors. All they need to do is talk with the LLM and express their thoughts and opinions.

I think the step from Copilot-typed to CLI-typed (vibe coding) is too big to be workable; an LLM cannot turn an idea or demand into a product by itself. The main disadvantage of CLI-typed tools (vibe coding) is that an LLM cannot solve complex projects because it's not good at designing the solution. But developers are good at that, especially junior developers. It's not necessary to use an LLM to solve every task.

Better tool

A better product should give junior developers more space to express their design ideas, and also give users more control. Some users really like and enjoy the feeling of control, and this can also help the LLM write better code because the instructions are clearer. More specifically, I think that with the new ideal tool developers could design their project at any depth, from generalities to details, and the LLM could write code based on the developers' design. In addition, there are more advantages to this approach:

The developers' design can serve as ideal documentation of the project.
It's easy to save checkpoints of the project during development.
Developers are less likely to be driven mad by multi-round conversations.

Maybe this can be built on top of CLI-typed tools, but I'm not sure how to proceed in detail.

Gemini-cli

Author: Benson
July 7, 2025 00:00
Gemini-cli

Gemini-cli is a command line tool powered by the Gemini-2.5-pro model. It's a similar product to Claude-code by Anthropic, but free. What's more, Google open-sourced all the code of Gemini-cli on github, and it received 20k stars in only one day and 40k in three days, a scene that hasn't happened for a long time.

I used Gemini-cli to generate a project that can transfer models from huggingface to modelscope; I had needed this feature for a long time. As a result, I completed this project in three hours, which surprised me. Gemini-cli is good at searching for various APIs and using them, which would consume a lot of time for a human.

Gemini-cli is strong for me because:

It can add/delete/modify files from the command line, which creates more space for the LLM to use its ability.
It executes code and receives the output after executing, iterating on the code one run after another.
When it meets a name that is already used, it switches to a new name for testing and restores the name before finishing the code. A small but interesting design.

Gemini-cli really spends a lot of tokens, which shows Google's determination on AI.

hf-ms-transfer

I'm naturally unwilling to start a new project or update an existing one. It's hard to start, but it's easy to continue after starting. In my opinion, this is the difference between nature and maturity. This also happened with the hf-ms-transfer project, even though it wasn't hard to finish. I think this is why many people always say to just do it rather than talk a lot; it's reasonable. It's something everyone should overcome.

LLM Post-Training experience

Author: Benson
June 23, 2025 00:00
Prompt

The prompt is the most direct way to influence the response. Tips for a good prompt:

Give clear instructions about our demand
Provide necessary context, role, tone and format
Guide the LLM to output its reasoning process before the final answer
More instructions, fewer constraints
An example can ensure the structure is the same as the example

The purpose of the prompt in post-training is to build the best reasoning structure in the response; training can optimize the other detailed content of the response.

One shot learning

One example in the prompt (one shot) can ensure the output structure is the same as the example.
In binary-class tasks, one example will probably bias the answer toward the one in the example.
In binary-class tasks, two examples will probably make the answer unstable.

Experience / Conclusion

Model

The size of the model to train is related to the information volume of the datasets
A larger model needs more information volume to fine-tune
We can use a small model to test whether the solution is feasible at low cost
A smaller model has better response stability
The amount of data is positively correlated with model performance
The quality of data is positively correlated with model performance

Training process

The purpose of training is to ensure performance on the test dataset increases with a stable trend and range.

Ensure the loss/reward curve and the performance on the test dataset change with the same trend
If performance on the test dataset doesn't increase as expected, overfitting / reward hacking has occurred.
If loss cannot decrease as expected, there is something wrong in the training dataset
Adjust the learning rate and regularization penalty by observing the loss curve over training steps
If loss decreases slowly, raise the LR.
If the loss curve is unstable, lower the LR.
When overfitting occurs, raise the regularization penalty.
If loss cannot improve in late stages, try to lower it.
Verify ideas with clean controlled experiments
Repeat the exact same experiment to exclude the influence of randomness
Make the LLM output its intermediate reasoning process before the final answer
For a specific task, put as much logic as possible in rules rather than in the prompt
A thinking reward is valid and necessary in RL
A model reward, even a multi-model reward, is helpful in RL

Multi-stage training

The purpose of the dataset is to provide information for the model to learn. In the late stages, the model already knows more than before, so more new information should be sent to the model. So, in the late stages, we should increase information diversity.

How to increase information diversity:

Put hard samples in late training stages
Increase temperature in late training stages for GRPO
Select samples that have unstable results for GRPO

Reference

Google prompt engineering
Six Key Elements of AI Agent Prompt Engineering
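The tip about selecting samples with unstable results for GRPO can be sketched as follows. This is a minimal illustration, assuming binary (0/1) rewards and several sampled rollouts per prompt. The function names and data layout are hypothetical. The second function shows why such samples matter: GRPO computes advantages relative to the group mean, so a group whose rollouts all receive the same reward yields all-zero advantages and contributes no learning signal.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_unstable_prompts(rollout_rewards):
    """Keep prompts whose rollouts are neither all correct nor all wrong.

    rollout_rewards: dict mapping a prompt id to a list of 0/1 rewards.
    Returns a dict mapping the kept prompt ids to their pass rate.
    """
    selected = {}
    for prompt_id, rewards in rollout_rewards.items():
        if not rewards:
            continue
        pass_rate = sum(rewards) / len(rewards)
        if 0.0 < pass_rate < 1.0:
            selected[prompt_id] = pass_rate
    return selected


rewards_by_prompt = {
    "always_right": [1, 1, 1, 1],
    "unstable": [0, 1, 0, 1],
    "always_wrong": [0, 0, 0, 0],
}
print(select_unstable_prompts(rewards_by_prompt))
print(group_advantages(rewards_by_prompt["always_right"]))
```

The same pass rate could also serve as a rough difficulty score when deciding which hard samples to move into the later training stages.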

Papers I read recently about LLM application

Author: Benson
June 22, 2025 00:00
How much do LLM memorize?

Key definitions:
unintended memorization: memorizing a specific dataset
generalization (intended memorization): what the model contains about the true data-generation process
Calculation method: information entropy and mutual information
Double descent appears at the point where unintended memorization turns into generalization
GPT models store 3.6 bits of data per parameter
The value for float32 is 9% higher than for float16

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Trade-off between pre-trained model size and inference time (inference length)
Performance can outperform a 14x larger model
Performance is better on easy and medium problems; questions are judged easy, medium or hard based on the pass rate
Two ways to increase inference-time compute:
best-of-N: sample N outputs in parallel and choose the best one based on a learned verifier or reward function
revise response: revise the original response

Prolonged Reinforcement Learning

temperature: increase temperature to avoid entropy collapse
decoupled clip to increase exploration space
dynamic sampling: erase samples that are all right or all wrong
calculate loss at token level instead of sample level (DAPO)
KL-regularization and reference model reset

Illusion of thinking

For Hanoi tasks:
Lower performance on simple questions for reasoning models than for general models, because they reach a wrong answer while thinking even after already getting the correct answer [overthinking]
Better performance on medium-level questions
Zero performance on hard questions

Gemini 2.5 tech report

dataset: ensure dataset quality; filter and drop duplicates
post-training: verifiable rewards and model-based generative rewards to provide sophisticated and scalable feedback signals
verifiable reward
model-based: more sophisticated and scalable feedback signals
update LR-method to improve the stability of training
result: learning in complex space
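The best-of-N strategy from the test-time compute notes above can be written in a few lines. This is a sketch only: `generate` and `score` stand in for a model's sampling call and a learned verifier or reward function, and both are hypothetical placeholders rather than any real API. The candidates are sampled one after another here for simplicity, whereas the notes describe sampling them in parallel.

```python
def best_of_n(prompt, generate, score, n=4):
    """Sample n candidate answers and return the one the verifier scores highest.

    generate(prompt) -> str          (placeholder for a model sampling call)
    score(prompt, answer) -> float   (placeholder for a verifier / reward model)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))


# Toy usage: the "model" returns canned answers and the "verifier"
# prefers answers numerically close to 4.
canned = iter(["3", "5", "4", "7"])
best = best_of_n(
    "What is 2 + 2?",
    generate=lambda prompt: next(canned),
    score=lambda prompt, answer: -abs(int(answer) - 4),
    n=4,
)
print(best)
```

The cost grows linearly with n, which is the trade-off against simply using a larger model that the paper title refers to.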

Weekly-#4 First insight into LLM acceleration

作者 Benson
2024年9月15日 00:00
Product process

PopTranslate
I finally found the root cause of my Chrome extension being rejected: I had put the api_key in the .gitignore file, so it did not upload successfully to the Chrome extension store with the zip file. I actually found the root cause through the crx file generated for the Chrome extension. After I pointed this out, the Google extension team also replied to my email and gave the same reason, which made me feel valued.

I also published this product on several media platforms. Reddit left the deepest impression: I pushed my project so hard there that it banned my account after five hours. I deserved it. So I registered another account, and I need to build it up for several days before I can express myself more freely. The results of publishing have not fully shown yet, so I will summarize them next week.

unsloth
I learned about "Fast Cross Entropy Loss" from the unsloth blog and already understand the theory. I still need to prove it with my own code. The most important thing I learned this week is that I finally understand why LLM acceleration projects can work. Most LLM code is designed and written for general usage, yet by now loss functions and transformers are used millions of times in the same form. As a result, we can make the code and theory more specific so that they fit the current network and task well. Changing a general function or theory into a specific one can always improve the efficiency of training and fine-tuning.

YouTube
I uploaded two videos about solutions to LeetCode problems this week. It has plenty of benefits, for example practicing my English speaking, preparing for English interviews, and trying to be a YouTuber.

Daily
- 9.9: running, prepare Hugging Face token
- 9.10: reproduce code, but the result is different; the cause may be the random seed
- 9.11: read related HN posts and blogs about unsloth
- 9.12: read the fast_crossEntropyLoss code to understand why it works
- 9.13: tried to solve another issue but failed
- 9.14: publish PopTranslate
- 9.15: rest

Personal life

Reading
- alphaxiv: it's perfect for both idea and value
- Front-end UI: many great designs, even TypeScript
- So many post tools with AI

Exercise
I ran five times this week, half an hour each time. It's a good start. If I can keep this up, maybe I should reward myself with some equipment.

Thought
1) unsloth also references some other papers. There is plenty of work that can be done on this topic. It's hard, but I think it's worth it.
2) Publishing a new project is a great thing. I can receive thanks from others, even if it's a small thank-you. My own project feels like a child of mine, which is totally different.

This week
- Focus on unsloth: less than expected
  - PR
  - Reproduce
  - Learn the theory of acceleration and produce an essay
- Reading: spend some time on papers
- Exercise: running in the mornings: better than expected
- Social account

Next week
- Focus on unsloth
  - Validate my thoughts
  - Know more
  - Try new thoughts
- Reading more
- Exercise
- Produce more valuable content
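Going back to the unsloth note in this post: as far as I understand it, a specialized cross entropy loss rests on the identity loss = logsumexp(logits) - logits[label], which skips building the full softmax vector. The sketch below only checks that identity in plain Python. It is not the unsloth kernel, and the function names are my own.

```python
import math
from typing import Sequence


def logsumexp(logits: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) by subtracting the max first."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_naive(logits: Sequence[float], label: int) -> float:
    """General-purpose path: build the whole softmax, then take -log of one entry."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[label])


def cross_entropy_fused(logits: Sequence[float], label: int) -> float:
    """Specialized path: one reduction and one lookup, no softmax vector."""
    return logsumexp(logits) - logits[label]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    print(cross_entropy_naive(logits, 0), cross_entropy_fused(logits, 0))
```

Both functions agree on ordinary inputs, but the specialized one does less work and also survives very large logits where the naive `math.exp` call overflows. That is a small instance of the "general to specific" point above.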

Weekly-#3 PopTranslate

Author: Benson
September 8, 2024 00:00
Product process

PopTranslate
I finally finished this project. It works on my computer and my girlfriend's computer, and I like it. The sad news is that my extension was rejected twice in Google's review. I didn't know the reason and asked for a further review.

Other product attempts

1) Text-to-voice in-browser
After learning about transformers.js, which aims to bridge the gap between the web and LLMs, I tried it for this project, but it failed again. I also met some errors that I could not solve in a short time. As for the reason, transformers.js is not mature enough, and deeper knowledge and techniques in the audio field are necessary to solve them. Maybe I will do that if I have free time in the future. Consequently, I put this project on hold again.

2) Analysis of 500 startups supported by YC
Due to the lack of fresh ideas, I analysed the 500 startups supported by YC in 2024. Some conclusions are below.
- 50%+ are LLM or AI related based on the title; the real value is higher than that.
- 64% belong to the B2B category. This is the major area of AI application.
- 10% belong to the healthcare category, the second biggest part.
- Education is the smallest category, at 1.2%. There will be more opportunity if AI becomes more accurate.

Projects that are great and suitable for me
- unsloth: LLM fine-tuning acceleration
- gpt-pilot: making AI develop like a real developer; this will be the next stage of AI coding
- transformers.js: bridges the gap between the web and LLMs, from Hugging Face

3) LLM fine-tuning acceleration
I submitted my first PR to this project and it got merged. It happened to be the 1000th PR, just a coincidence. The open source version of this project can accelerate the fine-tuning stage of most open source models by 2x; the pro version provides more support, with a best performance of 30x. It is worth more exploration.

Daily
- 9.2: PopTranslate
- 9.3: Publish PopTranslate
- 9.4: Reading
- 9.5: Reading
- 9.6: Try to reproduce a specific issue of unsloth
- 9.7: Submit PR to unsloth. Try transformers.js
- 9.8: Rest

Personal life

Reading
- limu's speech at shjt university: five-star recommendation
- AI tools help you grow fans
- Greppability is an underrated code metric: useful and practical for developers, especially those who work in big companies
- Interest-based communities have great growth: I have the same sense; interest is the ultimate way of socializing
- Cost of LLM will continue to decrease (Andrew Ng): Moore's law will work in the latest field
- Raise 1B with one html (ssi)
- Photos have more power than text in terms of creation
- Next product of OpenAI will charge 2k/month?
- Architecture is the main problem when training 400B+ models (limu)

Exercise
Three times this week, sad news. But I started running in the morning, beginning on Monday. Keep it up.

Thought
1) Creating a new product is a hard problem. I realized this more after I started doing it.
2) Creating entertainment content is a difficult task, even though TikTok is so popular all over the world. Hot content always needs everything: fresh ideas, captured details, and so on. This is hard to create with AI. In other words, AI is not smart enough to produce content that can attract consumers. Consequently, AI is suitable for small business, which is the major usage of AI.
3) I'm tired, but I enjoy it.

Next week
- Focus on unsloth
  - PR
  - Reproduce
  - Learn the theory of acceleration and produce an essay
- Reading: spend some time on papers
- Exercise: running in the mornings
- Social account

Weekly-#2 The failure of my first product

Author: Benson
September 1, 2024 00:00
Product process

Voice correction
From Monday to Wednesday, I finished the first version of my first product, voice correction. It still has several drawbacks due to my computer's limited performance. Cloud machines are expensive, which is the main reason I am not continuing this project for now. The first version is usable by me or my girlfriend, but I am too shy to share it publicly.

There are several problems.
- Some speech is occasionally not converted to text because of the computer's limited performance. A better computer can alleviate this issue.
- The order of the text is not exactly the same as the speech, because I did not reorder it after converting speech to text with multiple threads. I already have the related techniques, and I will do that if I restart this project in the future.
- The performance of the voice-to-text and text correction models is not always good. A bigger model and a better machine can reduce this issue.

I still learned a lot from this project, for example:
- The definition of async functions and how to use them in Python, which is essential in streaming or speaking scenarios
- The price of machines and how much LLMs need them
- How to use a Discord bot, which is powerful
- The potential of web-gpu, which makes lots of LLM products possible

Other product attempts

Collection of web-tools
A collection of web-tools. I define a web-tool as a product that does not rely on a service. This kind of product has low cost and is easy to copy, and it has fewer outages or related issues. After exploring for several hours, I found an open source product named it-tools (https://it-tools.tech/) which is great. I don't need to build that again.

I moved my sight to web-tools for AI products. It is also not an ideal product because most LLM products cannot run web-only.
- Text-based: there is a related product named mlc-llm, which is fine.
- Image-based: most image models are 12B, which is too big to run in-browser
- Video-based: even less feasible compared to image-based
- Audio-based
  - voice-to-text: whisper.wasm is great
  - text-to-voice: there is no related product for now; I can do that.

Text-to-voice in-browser
Influenced by my collection of web-tools idea, I want to make a text-to-voice product just like whisper.wasm. There are several advantages.
- No service is necessary. I can host it on GitHub Pages, so it can be free for the public.
- It runs only in-browser, so there are no privacy concerns.
- It is useful for the public; for example, I need it when I prepare for my IELTS test.

Direct translation extension
Inspired by the pop-up windows of an English learning platform, I want to make a translation extension that shows the translation after the user double-clicks the content he/she selects. Compared to other related products, it shows the translation directly with only one operation; other products need two operations to trigger translation. I have now finished the front-end code, and I need to add the function that requests the translation API. I forgot to push code to GitHub again; I don't know if it is essential at the initial stage of a product.

Personal life

IELTS
The review of my IELTS speaking score has a result: I got the score that I need. This is great because I can put more attention on my personal projects.

Reading
I didn't read as much as last week because my products did not go very well.
- The CEO of Telegram was arrested by the French government. Telegram has always been a weird product: it is famous for security but does not enable e2e by default, and it has so many powerful functions that are unrelated to security, like bots and channels.
- The small tools in WhatsApp: there are several very simple products that bring in so much profit. Profit is not always related to complexity.
- Remove background using web-gpu: I think this kind of product is powerful. It would be great if web-gpu could support more LLMs, like the models on Hugging Face.

Exercise
I started to do small exercises this week, at least five times. This is a good start.

Next week
Plan for next week
- Product
  - realize the direct translation extension
  - publish it
  - generate more ideas
- Reading more
- Exercise
- Take part in social groups
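On the out-of-order text problem from the voice correction notes in this post: one simple way to keep transcripts in speech order while still using several threads is to rely on `Executor.map`, which returns results in input order no matter which thread finishes first. A minimal sketch, assuming a hypothetical `transcribe` function standing in for the real speech-to-text call:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def transcribe(chunk: str) -> str:
    # Hypothetical stand-in for a speech-to-text call.
    # Longer chunks take longer, so threads finish out of order.
    time.sleep(0.01 * len(chunk))
    return chunk.upper()


def transcribe_in_order(chunks: Sequence[str], workers: int = 4) -> List[str]:
    """Transcribe chunks in parallel but return the text in the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # not in the order the worker threads complete.
        return list(pool.map(transcribe, chunks))


if __name__ == "__main__":
    print(transcribe_in_order(["hello there", "a", "bc"]))
```

If results have to be consumed as soon as each one completes, tagging every chunk with its index and sorting at the end is the other common option.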

Weekly-#1 First week of indie development

Author: Benson
August 25, 2024 00:00
Background
When I received my IELTS score on 8.19, I still had not got the score I need for the speaking part. But the difference between this time and the last several times is that I think I had already tried my best in the speaking part. So I don't want to continue preparing for IELTS, because it is expensive and full of random factors. Meanwhile, my study visa has not been approved yet, and I confirmed that I triggered a security investigation, which takes more time than a regular visa application. I will very likely delay my master's program if there is no surprise next week. It is weird that IRCC needs so much time to process the visa application; maybe I need other preparations after next week.

As a result, for English: first, I have already asked for a score review of the IELTS speaking part and submitted the application to Cambridge English. In addition, I will change the exam from IELTS to Duolingo, whose price is more reasonable and which is relatively easy to pass if I have at least two months to prepare.

More importantly, I am able to start my personal development career right now. This is the first week for me to develop by myself toward my personal goal, even though I quit my job over three months ago.

Product

Idea source
When I prepared for the IELTS speaking test, I asked an online English teacher for help. I found that the best service the teacher gave me was showing the grammar issues in my speaking. Actually, I can produce sentences long enough to answer the questions because I have already practiced these topics several times. My major problem is grammar issues, like past or present tense, singular or plural; I made so many of these kinds of mistakes. So I want to make a product that points out the mistakes that people learning English make. I will make it as a Discord bot first. Maybe I will build a web app in the future.

Realize
- receive an audio file or real-time speaking audio
- convert audio to a text message
- correct the errors in sentences
- display the errors and corrections

Process
Right now, as a Discord bot, it can receive audio messages and output what I want it to display. I am trying to realize the real-time speaking input, which is an area I am unfamiliar with; consequently, I need more time to learn some new knowledge.

Obtained new information
- Receiving voice data is sensitive for platforms: there is no official API in discord.py to receive real-time voice data, so I need help from an extension. Luckily, there is this kind of tool.
- Running LLMs web-only is possible: it's not suitable for every LLM, but it is a low-cost way to achieve some specific products. There are also great tools that support it, like web-gpu.
- There are plenty of small LLMs, which are friendly for personal developers or small businesses.
- The Discord API is so friendly. My love for Discord is increasing as I know this platform more and more.
- Cloud machines are expensive. I need to spend almost 200 every month if I want to run the product in the future.

Alternative product
Several products are great but there are always some drawbacks.
- All of them are expensive
- None of them provides the best correction display format
- Some of them cannot keep the correction function for a long time

Day records
- 8.19: organize my thoughts about this product
- 8.20: explore text-to-speech and text correction models
- 8.21: try a JS-only solution and give up; start to connect the Discord bot; settle on the text correction model and start development
- 8.22: realize the whole process through audio segment input
- 8.23: connect real-time speaking and save data, but it doesn't match the whole project. Another solution is needed.
- 8.24: rest

Reading
Reading helps me maintain a connection with the latest developments in technology; meanwhile, it gives me fresh information.
- levelsio: an indie hacker who pursues extreme freedom
  - made numerous products but only several of them achieved success, which is enough for him to live and pursue his goal without much life-related pressure
  - most of his successful products are related to AI, which brings lots of opportunities to the internet
- DeskHub: a hardware product for GitHub bars, so fun
- Learning English through printing subtitles with YouTube
- Run Python code in JavaScript: I believe I will use this tool in the future
- Build products on several platforms through the same code
- What product are we building
  - PDF to markdown: it uses an LLM, which costs so many resources
  - discord music bot: for fun
  - real-time face swap video: this is useful in xxx fields, weird
- Wrote two solutions for LeetCode; I want to make some videos if I have extra time

Thought
- I am happy that I have started my product right now.
- The passion for a new product will diminish and fluctuate with time; it is strongest in the period before you start to realize it. It is hard and crucial to maintain the balance between passion and clear thought.
- I should push code to GitHub; it feels like I am taking part in a huge open source project.
- I am too heavy; I need to lose weight.
- I started to write English content like this essay; it has felt natural and comfortable to keep writing since I started.

Plan for next week
- Product
  - realize the real-time speaking
  - make a landing page
  - publish it in a Discord channel
- Spend some time on exercise
- Some time for the Duolingo test
- Reading and generating new ideas for the next product
- Record videos for LeetCode solutions

2024.8.25

Migrating from Slack to Discord

Author: Benson
July 19, 2024 00:00
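Background
A Slack study-abroad group I joined earlier has a lot of valuable experience and files in its history, but the free version of Slack only shows messages from the last three months. Recently a community member brought this up and suggested that the group owner migrate the history to Discord. The owner had also downloaded the history, so everything was ready. As it happened, I had just finished my IELTS exam on 7.16, so I could work on this during the couple of days of waiting for the result.

Migration
I had looked into this before: there is a Slack-to-Discord migration tool that works really well, almost seamlessly. What amazed me is that the reply list under a message is also migrated in its original format, so the migration is almost lossless. With this key point covered, the user experience after migration is largely guaranteed, and the other problems felt less important.

For the concrete steps, just follow the path given in the original author's GitHub. The troublesome part was how to add the bot to the Discord server with certain permissions. I searched for a long time, and it turns out you have to construct a link yourself, with the bot's permission information in the link's parameters. Visit the link, click confirm, and the bot is added. This user experience is really a bit awkward. Maybe I just didn't find the right way.

Main problem
During the actual migration there was only one big problem: every time the program had run for 3-5 hours, it disconnected because of the network (or some other reason; it looked like a network problem to me). The whole program first initializes the Discord client and then sends the messages one by one to the Discord server through the bot. Since it was a network problem, the first plan was to retry, but retrying only the small "send message" step did not help. It kept reporting the same error, and here I really could not find the cause.

So I could only add a retry at the outermost layer, that is, re-initialize the client after each failure. This solves the failure problem. The new work it introduces is a cache (let's call it a cache) that records how far the migration had got at the last failure, so that this run continues from that position and avoids migrating again.

So the final version of the code, compared with the original, adds the outer retry and cache logic, and the migration counts as a success. I opened a PR with the final code, but the added feature feels a bit crude, so it most likely won't be merged. Leaving it there as a heads-up for those who come later is fine too.

Result
The supported features are as follows.
- Replies: the replies under each message are migrated in the original format, seamlessly. Great.
- Content: besides text and links, images and attachments display normally; files that are too large cannot be migrated.
- Users: avatars and usernames are migrated. Impressive.
- Time: the send time on the new platform is the migration time, not the original send time. As a workaround, each message on the new platform is prefixed with the real send time, which is a feature of the original tool.
- Attachments: a message with attachments is sent as two messages on the new platform, the first with the text other than the attachment and the second with the attachment.

Detailed record
The process took 2 days in total, and the program ran for about 24 hours. The detailed debugging process is below.
- 7.17 14:00 Failed after running one hour; unstable network; switched to a Hong Kong server and reran.
- 7.17 21:00 Failed after running 3.5 hours; unstable network; added 10 retries to the await, 5 minutes each.
- 7.18 07:40 Failed after running 4 hours with ten thousand messages migrated; the run time and the number migrated were about the same as last time. The problem was that the retry added last time did not cover enough, so I widened its scope and reran at 9:45, watching memory and CPU.
- 7.18 12:32 Errored at the college application group, 2024-03-18; not the same position as the last error, so not a content problem.
  - Changed it to skip after erroring for more than 10 minutes and reran, starting at 12:32.
  - Conclusions: 1) memory is stable, not a memory problem 2) the position differs from the last error, not a content problem
- 7.18 15:13 Errored at the community main channel, 2022-11-01; caused by a logic error in the exception code; fixed and reran at 15:36.
  - Error message: looks like an unstable network problem
  - Conclusion: a rerun succeeds, not a content problem
- 7.18 20:24 Disconnected at 18:47 after running three hours; unknown reason.
- 7.19 8:38 Added the outer retry and cache and ran from the main channel; it finished after 10 hours.
  - 72,943 messages in total. 134 messages failed to migrate after the community main channel; an estimated 300 or so overall, less than 0.5%.

Acknowledgments
- Thanks to the original author for the code
- Thanks to the group owner for the opportunity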
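The outer retry plus "cache" described in this post can be sketched roughly as follows. This is not the code from the PR: the real tool talks to Discord asynchronously, while this sketch is synchronous and only shows the control flow. `migrate`, `make_client`, the `send` method, and the JSON checkpoint file are hypothetical names chosen for illustration.

```python
import json
import os
import time
from typing import Any, Callable, Sequence


def load_checkpoint(path: str) -> int:
    """Return the index of the next message to migrate (0 if no checkpoint yet)."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["next_index"]
    return 0


def save_checkpoint(path: str, next_index: int) -> None:
    with open(path, "w") as f:
        json.dump({"next_index": next_index}, f)


def migrate(messages: Sequence[str],
            make_client: Callable[[], Any],
            checkpoint_path: str,
            max_restarts: int = 10,
            wait_seconds: float = 0.0) -> int:
    """Send all messages, rebuilding the client and resuming after each failure."""
    for _ in range(max_restarts + 1):
        start = load_checkpoint(checkpoint_path)
        client = make_client()  # outer retry: a fresh client for every attempt
        try:
            for i in range(start, len(messages)):
                client.send(messages[i])
                save_checkpoint(checkpoint_path, i + 1)  # remember progress
            return len(messages)
        except ConnectionError:
            time.sleep(wait_seconds)  # back off, then restart from the checkpoint
    raise RuntimeError("gave up after too many restarts")
```

The checkpoint is written after every successful send, so a restart never resends a message that already went through. The price is one small file write per message, which is negligible next to a network call.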