Project SuccinctCut🎥🧹✂


About The Project

Preface

We are building a web audio and video editing service that allows users to:

  1. Remove filler utterances and other disfluencies from the original clip
  2. Optimize clip length by cutting the video at parts where there is substantial silence
  3. Transcribe the audio, producing captions on the video and a fully formatted transcript
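
The first two features both reduce to the same step: given time ranges to cut (silences or filler words), work out which parts of the clip to keep. The sketch below is only an illustration under stated assumptions, not the project's actual implementation; the names `merge_intervals` and `segments_to_keep` are hypothetical, and the cut ranges are assumed to arrive as `(start, end)` pairs in seconds.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), hypothetical format


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching cut ranges into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def segments_to_keep(duration: float, cuts: List[Interval]) -> List[Interval]:
    """Return the parts of a clip of length `duration` that are not cut."""
    keep: List[Interval] = []
    cursor = 0.0
    for start, end in merge_intervals(cuts):
        # Clamp each cut to the clip boundaries.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


# A 10 s clip with a pause at 2-4 s (reported as two overlapping ranges)
# and trailing silence from 8 s onward.
print(segments_to_keep(10.0, [(2.0, 3.0), (2.5, 4.0), (8.0, 10.0)]))
```

The kept segments can then be handed to whatever tool performs the actual cutting and concatenation.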

In exchange for:

  1. Some of your computer’s CPU resources
  2. Time

Performance

Measured on a desktop computer with an i5-8600 CPU:

| Video size (MB) | Video duration (min) | Final video duration (min) | Audio analysis time (min) | Video editing time (min) |
| --- | --- | --- | --- | --- |
| 21 | 1.04 | 0.53 | 0.4 | 11 |
| 130 | 6.44 | 5.04 | 3.53 | 73 |
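
To put the two measurements above in perspective, the short calculation below expresses total processing time as a multiple of the original clip length. It assumes the figures are decimal minutes (the table does not say whether 1.04 means 1.04 minutes or 1 min 4 s), and the helper name `processing_ratio` is made up for this illustration.

```python
# Figures copied from the two runs above; interpreted as decimal minutes (assumption).
runs = [
    {"duration_min": 1.04, "analysis_min": 0.4, "editing_min": 11},
    {"duration_min": 6.44, "analysis_min": 3.53, "editing_min": 73},
]


def processing_ratio(run: dict) -> float:
    """Total processing time divided by the original clip duration."""
    return (run["analysis_min"] + run["editing_min"]) / run["duration_min"]


for run in runs:
    print(f'{run["duration_min"]} min clip: {processing_ratio(run):.1f}x real time')
```

Under that assumption, both runs take roughly 11 to 12 times the clip's length to process, with video editing accounting for almost all of it.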

More info at https://github.com/Ennnm/succinct-cut

Planning Documentation


Problem Statement(s)

<aside> 🧑🏾‍💻

We understand that there are limitations (in terms of performance, speed and memory) that we have to contend with. The scope of the app is to allow users to at least complete the optimization of a standard YouTube-length video (13 to 14 minutes) without substantial loss of performance.

</aside>

A common pain point for developers and content creators, especially those who record product reviews or presentations on a webcam, is removing glaring disfluencies (think of the "uhms" and "ahhs"), reducing pauses during scene transitions, and getting a (sort-of) automatic transcription and captioning of the recorded video.

The typical workflow, without any premium or paid software, is to open the video editing application bundled with Mac or Windows, painstakingly identify the disfluencies, and insert the captions manually. The goal of this application is to provide a simple and efficient way to shorten this workflow without a significant drop in video or audio quality.

Requirements

[ Base / MVP (70%) ]

[ Comfortable (100%) ]

User Flow Diagram

Pipeline of our working MVP.

Pipeline

Wireframes

Video Editor

We wanted to create a full-stack application where users can manually edit their videos while the transcription service runs. Users should be able to edit, splice and cut the videos.

Wireframes

Components

Transcription

Since we’re using a Speech-To-Text service, it would make sense for users to edit the returned transcription and embed it in the final video edit. This allows more customisation and utility in the tool.
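
One way to embed an edited transcription in a video is to write it out as a SubRip (`.srt`) caption file. The sketch below assumes the edited captions are available as `(start_seconds, end_seconds, text)` tuples; that input shape and the function names are hypothetical, not something the Speech-To-Text service or this project is known to provide.

```python
from typing import List, Tuple

Caption = Tuple[float, float, str]  # (start_seconds, end_seconds, text), assumed shape


def format_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: List[Caption]) -> str:
    """Render captions as SRT: numbered blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 1.5, "Hello and welcome."), (1.5, 4.0, "Today we review a webcam.")]))
```

Because the captions stay as plain text until this step, any edits the user makes to the transcription carry straight through to the final file.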

otterai clone

Components

Possible Stack

Proof of Concept

GitHub - wongsn/ffmpeg-transcribe: Use ffmpeg to extract audio and then transcribe using IBM Watson STT
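
As a rough illustration of the audio-extraction half of that proof of concept, the sketch below builds an ffmpeg command that drops the video stream and writes mono audio. It is not taken from the linked repository: the function name, file paths and the 16 kHz sample rate are assumptions chosen for the example, and the upload to the Speech-To-Text service is left out.

```python
import subprocess
from typing import List


def build_extract_audio_command(
    video_path: str, audio_path: str, sample_rate: int = 16000
) -> List[str]:
    """Build an ffmpeg command that extracts mono audio from a video file."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to one audio channel
        "-ar", str(sample_rate), # resample (16 kHz is an assumed default)
        audio_path,
    ]


command = build_extract_audio_command("input.mp4", "audio.wav")
print(" ".join(command))

# To actually run it (requires ffmpeg on PATH and a real input file):
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.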

https://github.com/wongsn/otteraiclone

ffmpeg

Miscellaneous


Test Resources

Resources

Retrospective

Future Works

  1. A better UI and more functionality

https://www.loom.com/share/c709ded22cac48b18ab2b69e7af22e7a