Transcribing audio, i.e., converting audio files into text, can seem challenging at first. Where do you start? Do you have to type everything yourself? What tools are available?
Don’t worry: In this step-by-step guide, we’ll show you in simple language how beginners can transcribe audio — clearly structured and practical.
We’ll present two methods:
- The classic manual approach
- The modern automatic method using AI tools
This allows you to choose the path that best suits your time, budget, and accuracy needs.
Preparation: Save and Select Your Audio Recording
Before you can begin, you need to make sure your audio file is ready:
- Make or obtain a recording: If recording yourself (e.g. an interview), use a quality mic or recording device to ensure clear audio.
- Get a usable format: For existing audio (e.g., podcast, meeting), ensure it's in a standard file format like MP3, WAV, or M4A.
- Make it accessible: Save the audio file on your computer. It should be easy to open and play. If the audio is from YouTube or similar, you may need to download the audio first (be mindful of copyright laws).
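If you have many files, a quick script can tell you which ones are already in a common format. The following is only a minimal sketch: the function name and the set of extensions are assumptions made for illustration, and a matching extension does not guarantee that the file itself is intact.

```python
from pathlib import Path

# Formats named in this guide; extend the set if your tool accepts more.
COMMON_AUDIO_FORMATS = {".mp3", ".wav", ".m4a"}


def is_common_audio_format(filename: str) -> bool:
    """Return True if the file extension is one of the common formats."""
    return Path(filename).suffix.lower() in COMMON_AUDIO_FORMATS


print(is_common_audio_format("interview.MP3"))  # True
print(is_common_audio_format("meeting.ogg"))    # False
```

Files that fail the check can usually be converted with an audio converter before you continue.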
Decide: Manual or Automatic?
Now choose your method:
- For short recordings or when maximum accuracy is critical → Manual transcription
- For longer recordings or limited time → Automatic transcription
We’ll walk you through both options — just follow the steps that match your choice.
Option 1: Manual Transcription
Step 1: Prepare Your Workplace and Tools
(Skip to Step 5 if using the automatic method.)
Good preparation is key for manual transcription. Here’s what you need:
- Computer/laptop: You’ll need this to play the audio and type your transcript. A large screen or two monitors is helpful, but a regular laptop works, too. Avoid phones — they make the task harder.
- Headphones: Use quality headphones to hear all details clearly. Laptop speakers often won’t cut it, especially if the audio is quiet or noisy.
- Playback software: Install an audio player that lets you play and pause with keyboard shortcuts and jump back a few seconds. Options include:
- VLC Media Player (free, hotkeys supported)
- Express Scribe (popular for manual transcription)
- oTranscribe (web-based, with integrated text field)
- Text editor: Open a place to write. Use Word, Google Docs, Notepad, or a built-in text field (e.g., in oTranscribe). Keep audio and text windows side by side. Choose a readable font and optionally enable line numbers.
- Optional – foot pedal: If you have a USB foot pedal, set it up. It lets you control playback with your foot so your hands stay on the keyboard. Not necessary for beginners, but useful if available.
- Create a quiet environment: Work in a quiet space. Close distracting apps like email or chat. Let others know you shouldn’t be disturbed. Concentration is key when transcribing.
✅ Now you're ready. Headphones on? Audio file open? Fingers on the play/pause key? Great — let’s transcribe!

Step 2 (manual): Listen to the Recording in Full Once
Before you start typing wildly, here’s a pro tip for better results: Listen to the entire audio once — without transcribing.
Why?
This gives you a feel for:
- Topics and context: You’ll understand the content and flow of the conversation. Fewer surprises.
- Speaker identification: You’ll recognize different voices — helpful when marking speakers later.
- Difficult areas: You’ll notice unclear or tricky passages in advance and can stay alert.
- Technical terms and names: You’ll pick up on names or terms — write them down with tentative spelling to save time later.
Yes, this first round costs as much time as the recording itself, but it often saves more time later. You transcribe more confidently and precisely. You can also check the sound quality or adjust playback settings.
So: sit back (or stay alert), press play, and listen all the way through. Have a sip of water and relax — the hard part starts next.
Step 3 (manual): Transcribe — Type Out Piece by Piece
Now the real task: Write down what you hear. Here’s how:
Play the audio and type along as best you can. At first, the speaker will talk faster than you can type — that’s normal. Write until you feel you’re falling behind.
Pause (or rewind): Press pause (e.g. ESC) when needed. In a good player, you can jump back a few seconds. A proven method: Let the audio run ~10 seconds, then rewind ~5 seconds to catch the end again. This overlap helps you not miss anything.
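To see how the "play 10 seconds, rewind 5 seconds" rhythm covers a recording, here is a small sketch that lists the resulting listening windows. The function name and default values are assumptions chosen to match the numbers above; no audio player is controlled here, the code only does the arithmetic.

```python
def playback_windows(total_seconds, play=10.0, rewind=5.0):
    """Return (start, end) windows; each starts `rewind` seconds before the last one ended."""
    if rewind >= play:
        raise ValueError("rewind must be shorter than play, or you never move forward")
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + play, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        # Step back so the end of this window is heard again in the next one.
        start = end - rewind
    return windows


for start, end in playback_windows(30):
    print(f"listen from {start:.0f}s to {end:.0f}s")
```

For a 30-second clip this yields five windows, so every passage after the first five seconds is heard twice. That is the price of the overlap, and it is why manual transcription takes several times the length of the recording.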
Proceed in stages: Work through the audio step by step — sentence by sentence or phrase by phrase. Frequent breaks at first are normal. With time, you’ll handle longer segments.
Punctuation and paragraphing: Improve readability as you go:
- Add punctuation where it fits.
- Start new paragraphs with topic shifts or speaker changes.
- Mark speaker changes clearly (e.g. Interviewer: ... Answer: ...).
It’s fine if the phrasing isn’t perfect — you’ll refine it later. Focus first on getting the words right.
Marking incomprehensible parts: If a word is unclear after multiple listens, mark it as [unintelligible] or ___, ideally with a timestamp (e.g. [unintelligible 05:32]). Don’t get stuck — keep going.
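If your player shows the position in seconds, a tiny helper can produce the tag in a consistent form. This is just an illustrative sketch; the function name is an assumption, and the tag format follows the example above.

```python
def unintelligible_tag(seconds) -> str:
    """Format a position in seconds as '[unintelligible MM:SS]'."""
    # For recordings longer than an hour, the minutes simply count past 59.
    minutes, secs = divmod(int(seconds), 60)
    return f"[unintelligible {minutes:02d}:{secs:02d}]"


print(unintelligible_tag(332))  # [unintelligible 05:32]
```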
Fillers and sounds: Depending on the purpose, include “uh,” “hm,” or laughter, etc. For scientific interviews, usually yes; for summaries, usually no. As a beginner, write everything down first. It’s easier to delete than to re-listen.
Save continuously: Crucial! Save your transcript regularly. Nothing is worse than losing 30 minutes of work. Hit Ctrl+S often or use auto-save tools (like Google Docs).
Take it section by section. It takes patience. For long recordings, split the work across sessions: maybe 20 minutes today, the rest tomorrow. Focus on quality over speed. You can skim paragraphs as you go, but full correction happens in Step 4.
Continue until you’ve written down the full recording. Well done — the hardest part is over! What you now have is your raw transcript.

Step 4 (manual): Revise and Proofread
Now you have a first draft of the transcript. It’s likely still rough or contains errors — totally normal. Now it’s time for correction and fine-tuning:
Short break: Step away for 5 minutes. Rest your ears and hands. With distance, mistakes are easier to spot.
Listen again and read along: Play the recording again (or just specific segments) and read your transcript at the same time. Are the words correct? Did you mishear anything? Fix these mistakes.
You probably don’t need to replay the whole file — focus on flagged parts like [unintelligible].
Spelling and format: Now, without the audio, review your text:
- Fix typos
- Add correct punctuation
- Choose consistent spelling (e.g. “Okay” or “Ok”)
- Ensure speaker names are correct and consistently used
- Add paragraph breaks for readability
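Spotting inconsistent spellings by eye is tedious in long transcripts. As a rough aid, you could count how often each variant appears and then standardize on one. The sketch below assumes you already know which variants to look for; the function name and the sample text are made up for illustration.

```python
import re
from collections import Counter


def count_variants(text: str, variants: list[str]) -> Counter:
    """Count whole-word, case-sensitive occurrences of each spelling variant."""
    counts = Counter()
    for variant in variants:
        pattern = r"\b" + re.escape(variant) + r"\b"
        counts[variant] = len(re.findall(pattern, text))
    return counts


sample = "Okay, let's start. Ok. Is that okay? Ok, fine."
for variant, count in count_variants(sample, ["Okay", "Ok", "okay"]).items():
    print(variant, count)
```

If more than one variant has a non-zero count, pick one spelling and apply it with your editor's search-and-replace.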
Clarify unclear sections: For each [unintelligible] tag:
- Try again with good headphones and high volume
- If still unclear, either leave the tag (fine for internal use) or make an educated guess in brackets (e.g. “[project name unintelligible]”). Either way, mark clearly that the passage is not verbatim.
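In a long transcript it helps to list every open marker before you start replaying audio. The following sketch searches a transcript for bracketed tags containing the word "unintelligible" and for runs of underscores. The pattern and function name are assumptions for illustration; adjust them if you use a different marker.

```python
import re

# Matches tags like [unintelligible 05:32] or [project name unintelligible],
# and placeholders made of three or more underscores.
TAG_PATTERN = re.compile(r"\[[^\]]*unintelligible[^\]]*\]|_{3,}")


def find_unclear_passages(transcript: str):
    """Return (line_number, marker) pairs for every unclear passage."""
    results = []
    for line_no, line in enumerate(transcript.splitlines(), start=1):
        for match in TAG_PATTERN.finditer(line):
            results.append((line_no, match.group(0)))
    return results


sample = (
    "Interviewer: How did [unintelligible 05:32] go?\n"
    "Answer: We called it ___ at first.\n"
    "Interviewer: And the [project name unintelligible]?"
)
for line_no, marker in find_unclear_passages(sample):
    print(f"line {line_no}: {marker}")
```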
Optional — get feedback: For critical transcripts (e.g. academic interviews), ask someone else to cross-check. Fresh ears may catch what you missed. For casual transcripts, this isn’t necessary.
Now your transcript should be clean and complete. Save the final version — you’ve just turned audio into text manually.
If it’s for publication, apply final formatting (fonts, layout, dialogue table, etc.). But the core work is now done.
(Note: Yes, it’s a lot of work. In the next section, we’ll show how to make things faster with AI tools — but knowing the manual steps always helps, especially when proofreading AI transcripts.)
Option 2: Automatic Transcription
Step 5 (for automatic transcription): Select an Appropriate Tool
If you prefer to let AI do the work, you’ll need a tool. There are many (see our article “The Best Tools of 2025”). For beginners, here’s an overview:
Online transcription services: Tools like Transcriptor, Trint, Sally, etc. let you upload audio and deliver a transcript within minutes.
- Advantages: No installation, fast results, often free trials
- Disadvantages: Internet connection required; check the provider’s privacy rules (especially with US-based services)
Google Docs (voice input): Did you know that Google Docs has a dictation feature? You could play the audio into your mic and let Google Docs transcribe it. Works better for live speech than for pre-recorded files. Worth trying, but not ideal for long files.
Open-source tools (like Whisper): If you’re tech-savvy, OpenAI Whisper (or tools like Whisper GUI, MacWhisper) runs locally with great results. For beginners, setup may be tricky. But if you’re curious: these tools are offline and privacy-friendly. You’ll need a decent machine, though.
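For the curious, a local Whisper run can look roughly like the sketch below. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed; the function names `transcribe_locally` and `format_segments`, the file name, and the choice of the "base" model are illustrative assumptions, not recommendations.

```python
def format_segments(segments):
    """Turn Whisper-style segments into lines like '[00:05] text'."""
    lines = []
    for seg in segments:
        minutes, secs = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{secs:02d}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(audio_path, model_name="base", language=None):
    """Transcribe a file on your own machine; language=None lets Whisper detect it."""
    # Imported here so the helper above also works without the package installed.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    return format_segments(result["segments"])


# Example call (hypothetical file name, needs the package and a real audio file):
# print(transcribe_locally("interview.mp3", language="de"))
```

Larger models tend to be more accurate but need more memory and time, which is why a decent machine matters.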
Beginner tip: Use a web service with a free trial. For example, Sally offers 4 weeks for free.
Example with Sally:
- Go to the site and create an account
- Navigate to “Upload recording”
- Select your file and start uploading
Step 6 (automatic): Upload a File and Have AI Transcribe
This step is surprisingly easy:
- Load audio file into the tool: As mentioned earlier, select your file in the chosen service. Make sure it’s the correct audio file. Confirm the upload.
- Set language (if necessary): Some tools detect the language automatically; others require you to manually select (e.g. “German”). Ensure the correct language is set to optimize recognition.
- Start transcription: Click the appropriate button, often labeled “Transcribe” or “Start”. The AI now processes your audio. How long this takes depends on the service and the length of the file; a 1-hour recording usually takes just a few minutes.
- Wait patiently: You can do something else in the meantime, but stay nearby. Most services display progress or send an email once the transcript is ready.
- Get transcript: When the process is done, an editor typically opens in your browser and displays the transcript. Alternatively, you can download the file (e.g. as a Word document).
Congratulations — you’ve generated an automatic transcript in a fraction of the time manual typing would’ve taken. But beware: raw AI output isn’t perfect. Continue to Step 7!
Step 7 (automatic): Review and Correct the AI Transcript
Even though AI gets a lot right, you shouldn’t blindly trust the text. Now you step in to improve transcript quality:
- Read the transcript carefully: Identify obvious mistakes. Names are a common source: the AI might write “Mr. Meier” as “Mr. Mayer.” Foreign words and terms like “COVID-19” are also prone to errors. Get a quick sense of where things went wrong.
- Play problematic passages: Tools like Sally let you click on words to hear the corresponding audio. Use this: If a sentence seems odd, listen and correct what was really said.
- Formatting and names: If the AI labeled speakers as “Speaker 1,” “Speaker 2,” etc., replace them with real names (if known). Also, add paragraphs to improve readability. Some tools auto-recognize speakers; verify and correct if needed.
- Check technical terms: AI often mishears specialized language. For example, “transcription rule” could become “transcribing regularly.” Watch for odd phrases. In some tools (like Sally), you can even define technical terms in advance to improve recognition.
- Remove filler words if needed: Depending on your goal, you may want to delete “um,” “so to speak,” repeated words, etc. Some tools clean these automatically; others leave them in. If you’re aiming for a polished transcript, clean up these bits.
- Mark unclear areas: As in manual transcription, if the AI flagged sections as “[inaudible]” or similar, listen again. If you still can’t understand it, leave the marker or replace with “[unintelligible]”.
- Final run-through: Ideally, listen to the full audio once more, perhaps at 1.5x or 2x speed, while reading along. This lets you quickly catch and fix big mistakes. Most of the transcript should be solid, but this pass serves as your quality check.
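If you have exported the transcript as plain text, two of the cleanup steps above (renaming speakers and removing fillers) can be partly automated. The sketch below is a rough aid under several assumptions: speaker labels stand at the start of a line followed by a colon, the filler list is an example, and the names "Anna" and "Ben" are invented. It does not repair commas or capitalization left behind by a removed filler, so read the result through afterwards.

```python
import re

FILLERS = ["um", "uh", "hm"]  # example list; adapt to your recording


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace labels such as 'Speaker 1:' at the start of a line with real names."""
    out = []
    for line in transcript.splitlines():
        for label, name in names.items():
            if line.startswith(label + ":"):
                line = name + line[len(label):]
                break
        out.append(line)
    return "\n".join(out)


def remove_fillers(text: str, fillers=FILLERS) -> str:
    """Delete whole-word fillers plus a directly following comma and spaces."""
    alternatives = "|".join(re.escape(f) for f in fillers)
    pattern = r"\b(?:" + alternatives + r")\b,?[ \t]*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r" {2,}", " ", cleaned).strip()


raw = "Speaker 1: So we um started in May.\nSpeaker 2: That was uh quite early."
named = rename_speakers(raw, {"Speaker 1": "Anna", "Speaker 2": "Ben"})
print(remove_fillers(named))
```

Keep a copy of the unedited transcript so you can go back if a replacement removes too much.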
After these corrections, your transcript should be reliable and clear. Now save or export the final version — and you’ve successfully created a finished transcript with AI and your own expertise.
Save, Export, and Use
Whether you worked manually or automatically, once your transcript is complete, here’s what to do next:
Save the end result: Preferably as a Word document (.docx) or PDF, depending on what suits your needs. That way, you can reopen and reuse it easily. Give the file a clear name (e.g., “Transcript Interview Reed 2025.docx”).
Make a backup: Save a copy on a USB stick, in the cloud, or email it to yourself. Better safe than sorry.
Usage: Now you can make use of your transcript. For example:
- Search: Use Ctrl+F to find keywords, a major advantage over raw audio. You’ll locate statements quickly.
- Quote: Need it for a report or article? Simply copy and paste the relevant text.
- Share: Forward it to colleagues, your professor, or others who need access.
- Generate subtitles: If the transcript is from a video, you can export it as a subtitle file (e.g., SRT). Many services offer this format directly.
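If your tool does not export subtitles, the SRT format is simple enough to write yourself: each entry has a running number, a time range in the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the text, and a blank line. The sketch below assumes you have your transcript as (start, end, text) tuples with times in seconds; the function names and sample data are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 332.5 -> '00:05:32,500'."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # Joining with a newline leaves one blank line between entries.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Hello."), (2.5, 5.0, "Welcome back.")]))
```

Save the output with the extension `.srt` and the same base name as the video so that most players pick it up automatically.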
Observe data protection: If the content is sensitive, remember to delete files after use or protect them properly. Especially when using cloud services, it’s a good idea to delete audio and text from your online account if you don’t want them stored permanently.
Now you’ve successfully transcribed audio! Whether you typed it yourself or let AI do the heavy lifting, you’ve turned spoken words into written text.
Transcribing Audio for Beginners — A Few Final Tips
- Start with short recordings to get a feel for the process. Transcribing a 5-minute voice memo is great practice.
- Increase gradually to longer recordings so you don’t overwhelm yourself.
- Be patient with yourself: Transcribing is a skill. With just a few tries, you’ll become faster and more accurate. Promise!
We hope this step-by-step guide has helped ease your concerns and shown you that transcribing is absolutely doable.
Good luck and enjoy creating your own transcripts! By the way, you can try our service Sally for free.