Project SuccinctCut🎥🧹✂
About The Project

Preface
We are building a web audio and video editing service that allows users to:
- Remove utterances and other disfluencies from the original clip
- Optimize clip length by slicing the video at parts where there is substantial silence
- Transcribe the audio with captioning [ on the video ] and a fully formatted copy.
In exchange for:
- Some of your computer’s CPU resources
- Time
Performance
on i5-8600 CPU desktop computer
| Video size (MB) | Video duration (min) | Final video duration (min) | Time taken for audio analysis (min) | Time taken for video editing (min) |
| --- | --- | --- | --- | --- |
| 21 | 1.04 | 0.53 | 0.4 | 11 |
| 130 | 6.44 | 5.04 | 3.53 | 73 |
More info at https://github.com/Ennnm/succinct-cut
Planning Documentation
Problem Statement(s)
<aside>
🧑🏾💻
We understand that there are limitations (in terms of performance, speed and memory) that we have to contend with. The scope of the app is to allow users to at least complete the optimization of a standard YouTube-length video (13 to 14 minutes) without substantial loss of performance.
</aside>
A common pain point for developers and content creators, especially those who do product reviews or presentations on webcam, is having to remove glaring disfluencies (think of the uhmms and ahhhs), reduce the number of pauses during scene transitions, and produce some form of automatic transcription and captioning for the recorded video.
The typical workflow, without any premium or paid software, is to open the video editing application bundled with Mac or Windows, painstakingly identify the disfluencies, and insert the captions manually. The goal of this application is to provide a simple and efficient way to shorten this workflow without a significant drop in video/audio quality.
Requirements
[ Base / MVP (70%) ]
- [x] User can upload video into the browser and be able to crop, cut and shift video content chronologically
- [x] App will extract the audio in the background, send the audio to a speech-to-text machine learning model and get the full transcript
- [ ] App will show the transcript, line-separated by speakers, pauses and other utterances
- [x] App will show the timestamps and the cuts to be made on the video file, as well as the portions to be removed
- [x] Once the transcript is done, scrolling across the video/audio timeline shows the specific text within the transcript that's synchronized with the audio / video
- [ ] User can edit and change the text, audio and video accordingly
- [x] App will splice and merge the files and allow the user to download
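The cut list described in the requirements above (timestamps of the portions to remove, and the portions that remain) can be sketched as a small interval computation. This is a minimal sketch, not the project's actual code: the `Span` type and the `keepSpans` function are hypothetical names, and it assumes the removal timestamps are given in seconds.

```typescript
// A time span in seconds within the source clip (hypothetical type).
interface Span {
  start: number;
  end: number;
}

// Given the spans marked for removal (for example detected silences or
// filler words), return the spans of the clip that should be kept.
function keepSpans(removals: Span[], duration: number): Span[] {
  // Sort a copy by start time so overlapping removals are handled in one pass.
  const sorted = removals.slice().sort((a, b) => a.start - b.start);
  const kept: Span[] = [];
  let cursor = 0; // end of the removed region seen so far

  for (const r of sorted) {
    // Clamp each removal to the clip boundaries.
    const start = Math.max(0, r.start);
    const end = Math.min(duration, r.end);
    if (end <= start) continue; // ignore empty or out-of-range spans
    if (start > cursor) kept.push({ start: cursor, end: start });
    cursor = Math.max(cursor, end);
  }
  // Keep whatever remains after the last removal.
  if (cursor < duration) kept.push({ start: cursor, end: duration });
  return kept;
}
```

The kept spans are what would then be spliced and merged into the downloadable file; overlapping removals are merged implicitly by the moving cursor.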
[ Comfortable (100%) ]
- [x] loading bar/ suspension loading for processing of video
- [ ] Meaningful transcription
- [ ] user tools for cutting, speeding up segments of video
- [ ] server-side storage of assets
- [x] login auth
- [ ] server-side ffmpeg processing of assets
- [ ] making video nft on santa site
- [ ] apply naive bayes!
User Flow Diagram

Pipeline of our working MVP.
Pipeline
Wireframes
Video Editor

We wanted to create a full-stack application where users can manually edit their videos while the transcription service runs in the background. Users should be able to edit, splice and cut the videos.
Wireframes
Components
- Editing Timeline
- Transcription
Transcription

Since we’re using a Speech-To-Text service, it would make sense for users to edit the returned transcription and embed it in the final video edit. This allows more customisation and adds utility to the tool.
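One way to carry an edited transcript into the final video is to serialise it as a caption file. The sketch below assumes the SubRip (SRT) format and a hypothetical `Caption` shape with start and end times in seconds; the source does not state which caption format the project uses, so treat this as an illustration only.

```typescript
// One caption line with its timing in seconds (hypothetical shape).
interface Caption {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Convert seconds to the SRT timestamp form HH:MM:SS,mmm.
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Serialise captions as numbered SRT entries separated by blank lines.
function toSrt(captions: Caption[]): string {
  return (
    captions
      .map(
        (c, i) =>
          `${i + 1}\n${toSrtTimestamp(c.start)} --> ${toSrtTimestamp(c.end)}\n${c.text}`
      )
      .join("\n\n") + "\n"
  );
}
```

Because the captions are plain data until serialised, the user's text edits only need to update the `text` field before the file is generated.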
otterai clone
Components
Possible Stack
- Frontend
- Backend
- Data Pipeline
Proof of Concept
GitHub - wongsn/ffmpeg-transcribe: Use ffmpeg to extract audio and then transcribe using IBM Watson STT
https://github.com/wongsn/otteraiclone
ffmpeg
- 17 s to cut a 1 min video into 32 pieces
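Cutting a clip into pieces with ffmpeg amounts to one invocation per segment. The sketch below only builds the argument lists and does not run ffmpeg; `buildCutArgs`, the `Segment` type and the `part_N.mp4` output names are hypothetical, and the source does not say which options the proof of concept actually used. It relies on the standard `-ss` (seek), `-i` (input) and `-t` (duration) options.

```typescript
// A segment of the source video to extract, in seconds (hypothetical type).
interface Segment {
  start: number;
  end: number;
}

// Build one ffmpeg argument list per segment. Placing -ss before -i seeks
// in the input, and -t limits the output to the segment's duration.
function buildCutArgs(input: string, segments: Segment[]): string[][] {
  return segments.map((seg, i) => [
    "-ss", seg.start.toFixed(3),
    "-i", input,
    "-t", (seg.end - seg.start).toFixed(3),
    `part_${i}.mp4`, // hypothetical output naming scheme
  ]);
}
```

Each list can be passed to whichever ffmpeg runner is in use. Adding `-c copy` would skip re-encoding and run faster, at the cost of cuts snapping to keyframes.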
Miscellaneous
Test Resources
Resources
Retrospective
Future Works
- A better UI and more functionality
https://www.loom.com/share/c709ded22cac48b18ab2b69e7af22e7a