During the COVID lockdown, the only way we communicated with each other was over social media, because we couldn’t go out and meet up with people. Suddenly we had a lot of time on our hands too, and one of the groups I had joined was “Bears be Baking”, where bears would post pictures of the baking creations they were making with all of that time.
I did not bake at this point (and I think the video makes this clear). However, after ordering a few items from Amazon, who were still delivering during the pandemic, I wanted to use my iPhone camera to record a video for the group (at this point it just contained pictures). So I recorded around 45 minutes of footage, terribly edited it down to around 30 minutes, posted it on YouTube and sent it round to a few friends and family. It had 76 views at the time I posted this article.
So, let’s start with the AI capabilities now built directly into YouTube itself. On re-uploading the video, the AI offered me 3 generated thumbnails for the upload, as shown below.



So first off, none of the content above appears in my video. I assume it’s YouTube’s algorithm spicing up my thumbnails so they look the way users would want them to look in order to click on my content. They are worrying, though, in that they represent content I haven’t made. I don’t get covered in cake mix like in the first image, that’s not a picture of the cake I made or a cake stand I own, and I had no idea I could look that angry making a cake, but YouTube users must want me to want to murder the cake! You can also see that Roperto has gone and sat himself back on a shelf that wasn’t in the original kitchen, probably to get away from my murderous vibe!
The original video can be seen on this link here. It includes lots of stuff I wanted to remove, such as me stammering a lot and plenty of long pauses, so it takes 30 minutes to watch me make a cake so simple that you don’t want to watch me do it in the first place! But will I get that angry, or cover myself in flour!?
“Descript” is the reason I wanted to make this post, and I uploaded the video above into its service. Descript is a modern audio‑ and video‑editing software company best known for its AI‑powered, text‑based editing tools. At its core, Descript lets you edit audio and video simply by editing text, making production dramatically faster for podcasters, YouTubers, and other digital creators. It’s an AI/ML‑driven media creation platform offering tools to record, edit, transcribe, collaborate, and publish audio and video content. It began as an audio‑only editor but expanded into full video editing as creators increasingly work across multiple formats.
Key Services & Features
- Text‑based editing — Edit audio/video by editing the transcript; deleting a word in text deletes it from the media.
- Automatic transcription — AI‑generated transcripts for podcasts, interviews, and videos.
- Multitrack recording & editing — For podcasts, voiceovers, and video projects.
- Screen recording — Capture tutorials, demos, and presentations.
- AI voice cloning & overdub — Generate or correct voice lines without re‑recording.
- Video editing tools — Cuts, captions, green‑screen removal, templates, and more.
- Collaboration features — Shared projects, comments, and versioning for teams.
- Publishing & exporting — Direct sharing to YouTube, podcast platforms, and social media.
These features reflect Descript’s belief that the traditional divide between audio and video editing tools is outdated, given that creators often produce both. Descript’s origins trace back to its initial mission: make audio editing as easy as editing a document. The company started with a simple but revolutionary idea — turn recorded audio into editable text, allowing creators to manipulate sound by manipulating words. This innovation became the foundation of the platform and later expanded into video editing as creator needs evolved.
The company’s growth aligns with the explosion of digital content creation, as YouTube now has over 2.7 billion active users. AI tools now allow automated re-dubbing and removal of content. You can see a list of the AI tools available in the video here.

I wanted to test 3 things and how they impacted the video:
- Take away any “err” and “umm” sounds from my speech
- Reduce any long pauses in the audio
- Recommend cuts to the scripted content that I could then review and take out
When you upload a video to the system, it transcribes ALL of the audio from the people speaking and presents you with a video timeline. As you select a point in the timeline, it shows you the word I’m saying at that moment, and I can go in and remove any words I want, which cuts that part of the video away. The remove filler words option does a lot of the work for you, and it can then look at the gaps between words and reduce them too. I ran both of these and then exported my video back to YouTube with its enhancements.
The video inevitably jumps a lot, as it has taken out the erms, the pauses and even some of my unfunny jokes (for example, what Roperto does with eggs). But given that it did all of this work automatically, it took away over 7 minutes of content that was simply “chuff”. On a 30 minute video that is about 23% of the content, editing that I cannot deny it needed and that I’d never be able to do on my own using the video tools I have.
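To make the idea a bit more concrete, here is a minimal Python sketch of how filler-word removal and gap shortening could work on a transcript with word timings. This is only my illustration of the principle, not Descript’s actual implementation or API: the `Word` class, the `FILLERS` list, the `plan_cuts` function and the 500 ms pause limit are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Assumed filler list and pause limit; a real tool will have its own.
FILLERS = {"um", "umm", "uh", "er", "err", "erm"}
MAX_GAP_MS = 500


@dataclass
class Word:
    text: str
    start_ms: int  # when the word starts in the recording
    end_ms: int    # when the word ends in the recording


def plan_cuts(words, max_gap_ms=MAX_GAP_MS):
    """Return a sorted list of (start_ms, end_ms) ranges to cut from the media."""
    cuts = []
    # 1. Cut every filler word for its full duration.
    for w in words:
        if w.text.lower().strip(".,!?") in FILLERS:
            cuts.append((w.start_ms, w.end_ms))
    # 2. Trim any silence between neighbouring words down to max_gap_ms.
    for prev, nxt in zip(words, words[1:]):
        if nxt.start_ms - prev.end_ms > max_gap_ms:
            cuts.append((prev.end_ms + max_gap_ms, nxt.start_ms))
    return sorted(cuts)


def removed_ms(cuts):
    """Total duration removed by the planned cuts."""
    return sum(end - start for start, end in cuts)


# Hypothetical transcript fragment with made-up timings.
words = [
    Word("So", 0, 300),
    Word("umm", 500, 900),
    Word("crack", 2400, 2800),
    Word("the", 2800, 2900),
    Word("eggs", 2900, 3400),
]

cuts = plan_cuts(words)
print(cuts)              # [(500, 900), (1400, 2400)]
print(removed_ms(cuts))  # 1400
```

Each cut is a hard jump in the picture as well as the sound, which is why an automatically trimmed video ends up looking jumpy. The sketch also does not re-check the silence left on either side of a removed filler, which a real editor would presumably tidy up.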
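As a quick sanity check on that percentage, here is the arithmetic as a tiny Python sketch. The 30 and 7 are the rounded figures from this post (the cut was actually “over 7 minutes”), so the result is approximate, and the variable names are just for illustration.

```python
original_minutes = 30  # approximate length of the original video
removed_minutes = 7    # "over 7 minutes", rounded down for the estimate

share_removed = removed_minutes / original_minutes * 100
remaining_minutes = original_minutes - removed_minutes

print(f"Removed about {share_removed:.0f}% of the video")
print(f"Roughly {remaining_minutes} minutes left to watch")
```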

