Project SuccinctCut🎥🧹✂

About The Project

Preface

We are building a web audio and video editing service that allows users to:

Remove filler utterances and other disfluencies from the original clip
Shorten the clip by cutting the video wherever there is substantial silence
Transcribe the audio, with captions on the video and a fully formatted copy of the transcript

In exchange for:

Some of your computer’s CPU resources
Time

Performance

Measured on a desktop computer with an i5-8600 CPU:

Video size (MB)	Video duration (min)	Final video duration (min)	Audio analysis time (min)	Video editing time (min)
21	1.04	0.53	0.4	11
130	6.44	5.04	3.53	73
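
To read these rows as throughput, each processing time can be divided by the source duration. The sketch below does that under one loud assumption: that the durations are decimal minutes (the table does not say whether 1.04 means 1.04 min or 1 min 4 s). The names `ROWS` and `editing_ratio` are made up for this illustration.

```python
# Rows copied from the table above:
# (size_mb, duration_min, final_duration_min, analysis_min, editing_min)
# Assumption: every duration is in decimal minutes.
ROWS = [
    (21, 1.04, 0.53, 0.4, 11),
    (130, 6.44, 5.04, 3.53, 73),
]


def editing_ratio(row):
    """Minutes of video editing spent per minute of source video."""
    _, duration, _, _, editing = row
    return editing / duration


for row in ROWS:
    print(f"{row[1]} min source: {editing_ratio(row):.1f} min of editing per source minute")
```

Under that assumption both rows come out at roughly 10 to 11 minutes of editing per minute of source video, which suggests editing time grows about linearly with clip length over this very small sample.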

More info at https://github.com/Ennnm/succinct-cut

Planning Documentation

Problem Statement(s)

<aside> 🧑🏾‍💻

We understand that there are limitations (in terms of performance, speed and memory) that we have to contend with. The scope of the app is to let users at least complete the optimization of a standard YouTube-length video (13 to 14 minutes) without a substantial loss of performance.

</aside>

A common pain point for developers and content creators, especially those who do product reviews or presentations on webcam, is having to remove glaring disfluencies (think of the uhmms and ahhhs), reduce the number of pauses during scene transitions, and get a (sort-of) automatic transcription and captioning of the recorded video.

The typical workflow, without any premium or paid software, is to open the video editing application bundled with Mac or Windows, painstakingly identify the disfluencies, and insert the captions manually. The goal of this application is to provide a simple and efficient way to shorten this workflow without a significant drop in video or audio quality.

Requirements

[ Base / MVP (70%) ]

[x] User can upload video into the browser and be able to crop, cut and shift video content chronologically
[x] App will extract the audio in the background, send the audio to a speech-to-text machine learning model and get the full transcript
[ ] App will show the transcript, line-separated by speakers, pauses and other utterances
[x] App will show the timestamps and the cuts to be made on the video file, as well as the portions to be removed
[x] Once the transcript is done, scrolling across the video/audio timeline shows the specific text within the transcript that's synchronized with the audio / video
[ ] User can edit and change the text, audio and video accordingly
[x] App will splice and merge the files and allow the user to download
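
The "timestamps and cuts" requirement above can be pictured as turning word-level timestamps into a list of ranges to keep. This is only a minimal sketch, not the project's actual logic: the `Word` shape, the `FILLERS` set and the `MAX_PAUSE` threshold are all hypothetical, and a real speech-to-text response would need to be mapped into this shape first.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the clip
    end: float    # seconds from the start of the clip


# Hypothetical values chosen for illustration only.
FILLERS = {"uh", "um", "uhm", "ah", "er", "hmm"}
MAX_PAUSE = 0.75  # silences longer than this (in seconds) are cut out


def keep_segments(words, fillers=FILLERS, max_pause=MAX_PAUSE):
    """Return (start, end) ranges to keep, dropping fillers and long silences."""
    segments = []
    force_new = True  # start a fresh segment at the beginning and after every filler
    for w in words:
        if w.text.lower().strip(".,!?") in fillers:
            force_new = True  # never merge across a removed filler
            continue
        if not force_new and w.start - segments[-1][1] <= max_pause:
            # Short pause: extend the current segment to cover this word.
            segments[-1] = (segments[-1][0], w.end)
        else:
            segments.append((w.start, w.end))
        force_new = False
    return segments


words = [
    Word("So", 0.0, 0.4),
    Word("um", 0.5, 0.9),
    Word("we", 1.0, 1.3),
    Word("build", 1.4, 1.8),
    Word("apps.", 4.0, 4.5),
]
print(keep_segments(words))
```

Everything outside the returned ranges is a candidate for removal, so the same list can drive both the cut preview on the timeline and the final splice.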

[ Comfortable (100%) ]

[x] Loading bar / suspension loading while the video is processed
[ ] Meaningful transcription
[ ] User tools for cutting and speeding up segments of the video
[ ] Server-side storage of assets
[x] Login auth
[ ] Server-side ffmpeg processing of assets
[ ] Making video NFT on santa site
[ ] Apply naive Bayes!

User Flow Diagram

Pipeline of our working MVP.

Wireframes

Video Editor

We wanted to create a full-stack application where users can manually edit their videos while the transcription service runs. Users should be able to edit, splice and cut the videos.

Components

Editing Timeline
Transcription

Transcription

Since we’re using a speech-to-text service, it makes sense to let users edit the returned transcription and embed it in the final video edit. This makes the tool more customisable and more useful.
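
One common way to carry an edited transcript alongside a video is a SubRip (`.srt`) caption file. The sketch below formats timed caption lines as SRT; the `(start, end, text)` tuple shape and the function names are assumptions made for this example, not the project's data model.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm form used by SRT files."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(captions):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 1.5, "Hello"), (2.0, 3.25, "world")]))
```

Because the captions stay as plain text until export, a user can correct the wording first and regenerate the file afterwards.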

otterai clone

Components

Player
Metainformation

Possible Stack

Frontend
Backend
Data Pipeline

Proof of Concept

GitHub - wongsn/ffmpeg-transcribe: Use ffmpeg to extract audio and then transcribe using IBM Watson STT

https://github.com/wongsn/otteraiclone

ffmpeg

In our test, ffmpeg took 17 s to cut a 1-minute video into 32 pieces.
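
For context, cutting and rejoining with the ffmpeg CLI usually means one command per piece plus a final concat step. The sketch below only builds the argument lists and does not run anything. The file names (`part000.mp4`, `parts.txt`) and helper names are hypothetical, and it assumes stream copy (`-c copy`) is acceptable, which makes cuts snap to keyframes rather than exact timestamps.

```python
def cut_command(src, start, end, dst):
    """ffmpeg arguments that copy the start..end range of src into dst without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}", "-to", f"{end:.3f}",  # input-side seek range, in seconds
        "-i", src,
        "-c", "copy",
        dst,
    ]


def concat_list(parts):
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in parts)


def concat_command(list_path, dst):
    """ffmpeg arguments that join the files named in list_path into dst."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", dst]


def plan(src, segments, dst, list_path="parts.txt"):
    """Return (cut commands, list-file text, merge command) for the kept segments."""
    parts = [f"part{i:03d}.mp4" for i in range(len(segments))]
    cuts = [cut_command(src, s, e, p) for (s, e), p in zip(segments, parts)]
    return cuts, concat_list(parts), concat_command(list_path, dst)


cuts, listing, merge = plan("in.mp4", [(0.0, 0.4), (1.0, 1.8)], "out.mp4")
for cmd in cuts + [merge]:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` once the list file has been written. If frame-accurate cuts matter more than speed, re-encoding instead of stream copy is the usual trade-off.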

Miscellaneous

Test Resources

Audio
Video

Resources

Retrospective

Future Works

A better UI and more functionality

https://www.loom.com/share/c709ded22cac48b18ab2b69e7af22e7a