Converting OneNote to Markdown for Obsidian

After evaluating Obsidian and considering a move from OneNote, I had to figure out how to migrate my content. That is the topic of this post: migrating my notes.

As of 2024-03-18, the auto-conversion process has been a major pain and largely unsuccessful. I will try to highlight the key points as I understand them in case you are in the same boat and trying to migrate. The export options have degraded over time, which has made things difficult. In particular, I think the disappearance of .docx as an export format really hurt the process, since a lot of the existing options seem to depend on it.

This whole experience is solid confirmation that my choice to get away from proprietary formats is right for me. But it's a personal decision. While I absolutely love the tool, what is their next change going to be, and will I be able to move at that point if that change doesn't work for me? Over the years there have already been a few WTF moments where I thought I had lost everything. At least with open formats I don't risk losing access to my own data or having to pay some fee.

Backup and Fail-safe

Using the OneNote app downloaded and installed locally (not 365, not web), I was able to export notebooks to PDFs. They look fairly intact to me, and I feel I can always go back to these if I need to. In fact, this may be enough for you: just start Obsidian fresh. Out with the old and in with the new. This removes all technical cruft and ensures the most compatible, smooth path forward. That is, if you can live without those notes in Obsidian or are willing to rewrite them.

The Easiest - Technically Speaking

If you just want something easy and don't mind it being time-consuming, good old copy & paste does a pretty good job compared to most. In OneNote, Ctrl-A to select all and Ctrl-C to copy; in Obsidian, Ctrl-V to paste. Just be sure to double-check that everything comes over. Images were often borked: often MOST of the images were replaced with garbled text. But this seemed to be the result of some formatting issue in the .md file, and once I fixed the first broken image, all the ones below it were fixed too. Sometimes I was able to simply backspace at the right spot to fix the issue, but I could not find a reliable spot to backspace. So I suspect that if we could fix the paste, or find a reliable fix for the image formatting issues, we would have a fairly simple path. So this is easy but time-consuming. That said, I don't fully trust ANY of the methods, so if the data is important you had better review it in detail anyway.
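A rough starting point for finding that first broken spot automatically is to scan the pasted .md file for lines that look like binary junk. This is only a sketch: the idea that a garbled image shows up as a run of control, private-use or U+FFFD replacement characters is an assumption, not confirmed OneNote or Obsidian behavior, and THRESHOLD is an arbitrary value to tune against your own notes.

```python
"""Flag lines in a pasted Markdown note that look like garbled image data.

Heuristic (an assumption, not documented OneNote/Obsidian behavior): a line
is suspicious when a large share of its characters are control, unassigned,
private-use, or surrogate code points, or the replacement character U+FFFD.
"""
import sys
import unicodedata
from pathlib import Path

THRESHOLD = 0.3  # arbitrary cut-off; tune it against your own notes
BAD_CATEGORIES = {"Cc", "Cn", "Co", "Cs"}  # control, unassigned, private use, surrogate


def garbage_ratio(line: str) -> float:
    """Fraction of characters in the line that look like binary junk."""
    if not line:
        return 0.0
    bad = sum(
        1
        for ch in line
        if ch == "\ufffd"
        or (ch != "\t" and unicodedata.category(ch) in BAD_CATEGORIES)
    )
    return bad / len(line)


def suspicious_lines(text: str, threshold: float = THRESHOLD) -> list[int]:
    """1-based numbers of lines whose garbage ratio reaches the threshold."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if garbage_ratio(line) >= threshold
    ]


if __name__ == "__main__":
    for name in sys.argv[1:]:
        # errors="replace" turns undecodable bytes into U+FFFD, which is counted
        content = Path(name).read_text(encoding="utf-8", errors="replace")
        for number in suspicious_lines(content):
            print(f"{name}:{number}")
```

Run it as `python find_garbled.py MyPastedNote.md` (both names hypothetical). Because fixing the first broken image often fixed the rest, it is worth re-running after each fix.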

The Most Complete - Not all Text

pdf2docx: Of all the methods, this one preserved the tables and images best, and the text is arguably the easiest part to copy reliably. So this might be a winner. But there were obvious blocks of missing text. YMMV.
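pdf2docx is a Python library, so this route can be scripted. Here is a minimal sketch, assuming `pip install pdf2docx` and pandoc on your PATH. It uses pdf2docx's `Converter(...)`, `.convert(...)` and `.close()` calls, then hands the .docx straight to pandoc. Pandoc reads .docx natively, which would skip the LibreOffice .odt round trip; if that gives worse results, save as .odt first. The `./attachments` folder and the space-stripping are illustrative choices. This does not recover the missing text blocks; those still need copying from the PDF.

```python
"""PDF -> .docx with pdf2docx, then .docx -> Markdown with pandoc.

A sketch: assumes `pip install pdf2docx` and pandoc on PATH. Missing text
blocks still have to be copied over from the PDF by hand.
"""
import subprocess
import sys
from pathlib import Path


def docx_path_for(pdf: Path) -> Path:
    """Put the .docx next to the PDF, with spaces removed from the name."""
    return pdf.with_name(pdf.stem.replace(" ", "") + ".docx")


def pandoc_command(docx: Path, media_dir: str = "./attachments") -> list[str]:
    """pandoc reads .docx directly; images are extracted under media_dir."""
    return ["pandoc", "--extract-media", media_dir, str(docx),
            "-o", str(docx.with_suffix(".md"))]


def convert(pdf: Path) -> Path:
    from pdf2docx import Converter  # imported here so the helpers work without it

    docx = docx_path_for(pdf)
    converter = Converter(str(pdf))
    try:
        converter.convert(str(docx))  # all pages by default
    finally:
        converter.close()
    subprocess.run(pandoc_command(docx), check=True)
    return docx.with_suffix(".md")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(convert(Path(name)))
```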

The Most Complete - No Tables

There is probably a better way to do this, but this is what has yielded the best results so far in terms of having the most complete and accurate copy of the original data. It still lost my tables, though, so if the tables are more important you might favor other methods. Hold on to your hats, folks. The ride is about to get bumpy. See the various sections below for more details. What I did:

  1. Export PDFs: On Windows, via the OneNote app (not the online Office version). I exported each notebook as a single PDF. Highlight the notebook to export > File > Export > Notebook > PDF.
  2. Add a Cover Page to the PDF: Open the PDF in LibreOffice Draw. Add a new page 1, grab a screenshot, and paste it into page 1. Export as PDF with (1) Lossless compression selected and (2) Reduce image resolution unchecked (unless you want the images smaller). I added "Draw" to the filename (or used a new folder) and removed spaces: OrigFilenameDraw.pdf. Technically you can skip this step, but for mine I lost the first page because Calibre converted it into the cover page of the "book". YMMV. When I did add the "cover" page, it showed up twice, so I may be misunderstanding what is going on. Better twice than losing things.
  3. Import PDF into Calibre: For some reason this failed until I used the file manager's right-click Open In > Other Application > E-book Viewer.
  4. Calibre Convert Book: Select the title > Convert > Individually > Output format: DOCX (upper right) > OK. I don't see anything for image quality.
  5. Save DOCX to Disk: Select the title > right-click Format "DOCX" > Save DOCX to disk.
  6. Open .docx, Save as .odt: Open the .docx file in LibreOffice Writer. Save As and change the type to .odt.
  7. Convert ODT to MD using Pandoc: pandoc --extract-media ./<foldername_for_images> filename.odt -o filename.md, e.g. pandoc --extract-media ./attachments AWSServerlessDraw.odt -o AWSServerlessDrawOdt.md. (The input file is positional; pandoc's -i is short for --incremental, not "input", so it isn't needed.)
  8. Copy to Vault: Via the file system, I copied the image directory and the .md file into my vault, inside a folder called OneNoteConverted. They automatically loaded into Obsidian for review.
  9. Deal with Tables: As I said in the opening paragraph of this section, this method still doesn't handle tables well. It might make sense to also use one of the methods below that handles tables well and copy those over, using the original PDFs as your reference for both. Perhaps pdf2docx.
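Steps 7 and 8 are the easiest to automate when you have many notebooks. Here is a sketch, assuming the .odt files from step 6 sit in one folder and pandoc is on your PATH. SOURCE, VAULT and the helper names are hypothetical placeholders to adjust. It passes pandoc an argument list rather than a shell string, so filenames with spaces need no quoting.

```python
"""Steps 7-8 for a whole folder: .odt -> .md with pandoc, then copy into the vault.

A sketch: SOURCE and VAULT are hypothetical paths to adjust, and pandoc
must be on PATH.
"""
import shutil
import subprocess
from pathlib import Path

SOURCE = Path("converted")                          # hypothetical: folder holding the .odt files
VAULT = Path.home() / "Vault" / "OneNoteConverted"  # hypothetical vault subfolder
MEDIA = "attachments"                               # image folder, as in step 7's example


def pandoc_args(odt: Path, media: str = MEDIA) -> list[str]:
    """Arguments for: pandoc --extract-media <media> <name>.odt -o <name>.md"""
    return ["pandoc", "--extract-media", media, odt.name,
            "-o", odt.with_suffix(".md").name]


def convert_all(source: Path = SOURCE, vault: Path = VAULT) -> None:
    vault.mkdir(parents=True, exist_ok=True)
    for odt in sorted(source.glob("*.odt")):
        # Run inside the source folder so the image links pandoc writes stay relative.
        subprocess.run(pandoc_args(odt), cwd=source, check=True)
        shutil.copy2(source / odt.with_suffix(".md").name, vault)
    media_dir = source / MEDIA
    if media_dir.is_dir():
        # Copy the extracted images next to the notes so the relative links should resolve.
        shutil.copytree(media_dir, vault / MEDIA, dirs_exist_ok=True)


if __name__ == "__main__":
    convert_all()
```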

More Details and Conversion Methods

Loosely in order from best (top) to worst:

  • Calibre (brief; more below) - Start with PDFs exported from the OneNote app. Watch for image quality reduction at each step and set it to lossless or 100%. Open in LibreOffice Draw (to add a cover page). Open the PDF in Calibre; if it won't read it, open it in the Calibre E-book Viewer from the file manager first. Convert individually.
  • Copy and Paste - I've had decent success with just copying and pasting. I use the online onenote.com from Linux, but any OneNote should work. Scroll the entire page so it loads, Ctrl-A to highlight everything, Ctrl-C to copy, and Ctrl-V (or right-click) to paste. Some images come over as images and some as garbled text. Be sure to switch to reading mode before deciding they are hosed; Ctrl-E to toggle views in Obsidian is your friend. The workaround is to carefully use a screenshot tool a la "Snipping Tool" in Windows to "flatten" the image (remove layers and just give a bitmap), paste that image into Obsidian, and then delete the bad text. After deleting the bad text version, toggle views and recheck, as the others often got better and I only had to fix the first bad one to fix them all. I suspect the issue is really just poor text formatting from special characters in the image. I was able to successfully fix a few by simply backspacing where the "colored bar on the left" starts. YMMV. If we could find a reliable way to detect this, we could write a macro or regexp fix. The bad part about the C&P process is that it takes a long time. It wouldn't be too bad if we did not have to fix images. The good part is that it works pretty well, all attachments go to the right place per your settings, and you know you get everything by going slow and being careful. But it's probably not feasible for large-scale data.
  • OneNote App Download - Windows. PDF export. I exported entire notebooks to PDF: Highlight the notebook to export > File > Export > Notebook > PDF. A decent fail-safe option for having a copy of your notebooks if you are willing to use Windows and install the OneNote app locally. This is NOT the online Office 365. Pay attention to portrait vs. landscape(?) and check afterwards that all content came through. I should point out that OneNote 2016 used to have a .docx export that no longer seems to be available in this version.
  • Obsidian Importer for OneNote - This was probably more trouble than it was worth for me, but give it a shot. Don't put images into the folders: it repeats ALL images in ALL folders. Start small and convert one section at a time, maybe even one page. I had a lot of things that looked converted but were empty or incomplete. Go slow and double-check everything. But since I had to put in that much effort, why not just copy and paste?
  • OneNote Online (onenote.com) - Go to a page, File > Print This Page. Sometimes this only prints the first page rather than the entire page? Using Ctrl-A then Ctrl-P also only lets me print a single page, as far as I can tell. Is this possibly a PostScript conversion for printing prior to saving as PDF, and if so, is it a better output? You can also export ".one" files if using onenote.com, but not if it was launched from OneDrive online. These can be used in online converters. TBH I still have trouble finding the export: onenote.com web > Notebook changer (dropdown at top left) > More Notebooks > right-click > Export Notebook. This outputs a .one file.
  • Online Converters to PDF - Meh. They take a long time, the site gets to see your content, and the output is only PDF. But there's no need to have Windows or install the OneNote app.
  • Online Converters to MD -
    • https://products.groupdocs.app/conversion/one-to-md - VERY slow. I tried a notebook with 2 sections, 1 of them empty. The other had only 1 page, with less than 1 KB of text and only 1 image. I gave up after about 20 minutes. A different .one file failed to convert. YMMV. There may be others out there.
  • Pandoc - Pandoc is a general-purpose document converter. Unfortunately it doesn't accept PDF as an input format. But if we can somehow get to one of the other input formats...
  • GitHub / node / python ... - OneNote to Markdown converters - Lots of poorly documented paths that ultimately got me nothing. Many require OneNote 2016, which is not the version available for download. Some require a Word download (which is only available via Office 365?). There are lots of possibilities here, but most have to be figured out and don't give enough details. They are not without risk either: you could always be downloading something nefarious. Caveat emptor.
    • https://gitlab.com/pagekey/edu/onenote-to-markdown - Neither the Python nor the PowerShell version worked on Windows 11 with the local OneNote app. It might require OneNote 2016. It might require Word installed locally so .docx export works. Neither is mentioned in the requirements. With the lack of troubleshooting in the docs, it's hard to guess what to do next. The Word download wanted a credit card for a "trial", so I bailed on that. YMMV. I think this is the video
    • https://github.com/theohbrothers/ConvertOneNote2MarkDown - This one has a lot of good error resolution tips if you do go down the Windows rabbit hole.
    • marker - This looked very promising, but I could not get it installed and working. Evidently Python dev has gotten a bit crazy. Poetry vs venv? pip vs pipx? WTH is going on, Python?!?! I gave up because I couldn't even get it to work.
    • pdf2docx - This actually worked pretty well and was fairly simple to install and use. The downside is that it was missing blocks of text, but the images and tables seemed intact. Since text is the easiest thing to copy and paste reliably, this might be another workaround: convert, then copy the missing text from the PDF. Then we should be able to use pandoc. Would a side-by-side tool help here, a la VSCode? Only if it can open both formats.
    • pdf2md - Text-only output. Tables are flattened to plain text. I used the node command line instead of npx to get it working.
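The Obsidian Importer's habit of repeating ALL images in ALL folders can at least be measured with a small script. This sketch hashes every image file under a folder and reports groups with identical content. The list of image extensions is an assumption. The script deliberately deletes nothing, because notes may link to any of the copies.

```python
"""Find duplicate image files under a vault folder by content hash.

A sketch: it only reports duplicates; removing them (and fixing any links
that point at the removed copies) is left to you.
"""
import hashlib
import sys
from collections import defaultdict
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg"}  # assumed set


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, read in chunks to keep memory flat."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_images(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to the image paths sharing it (groups of 2+ only)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for paths in duplicate_images(root).values():
        print("Duplicates:")
        for p in paths:
            print("  ", p)
```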

Programs

  • Calibre - This mostly works, but some text gets turned into images. I dragged the PDF into the view and tried to convert, but it didn't recognize the format as something it can use. I noticed "E-book Viewer", which is able to read the format. It loses some things like the table, but otherwise looks intact with pictures, though I see no way to export. Note that the "inspector" gives access to the HTML. Interestingly, after I used the viewer I tried to import again, and it now reads the PDF. Right-click (PDF) > Convert > Individually > Page Setup. There are no MD output formats. Using the default (Generic E-ink?), everything is there, but some of the text items look like they were turned into images; however, it is the first page and it says "cover". Can we trick it by giving it an actual cover? Right-click the blue Format: "EPUB" and save to disk. Pandoc can convert it with images: pandoc --extract-media ./AWSServerless AWSServerless.epub -o AWSServerless.md (file renamed to remove spaces; if there are spaces, use single or double quotes around filenames). Maybe we can use Scribus or LibreOffice Draw to add a cover page to the PDF: Draw > New Page > Screenshot > Export PDF. Calibre > Import. We DO get a cover page. Hmm, we actually get it twice. Convert to EPUB again. After finally dragging these into Obsidian, it's not perfect. There is a dropdown at the top of Convert for the output format, with DOCX/HTML etc., which may lead to more options.
  • LibreOffice Draw - The file opens up well here, but I found no converters from this format, and each piece of text is a completely separate draw canvas; it's too much work to highlight each block separately.
  • VSCode Plugins - VSCode will open the PDFs but shows the binary data. Searching for "pdf", I don't see anything relevant from the descriptions, but they are not always complete. Lots of Markdown to PDF, but not the reverse.
  • AbiWord - On Linux I installed AbiWord and the AbiWord to OpenDocument converter (abw2odt) via the package manager. It opened the PDF, but it only contains the text.
  • E-Book Editor by Kovid Goyal - no luck opening the file.
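The Calibre route can also be driven from the command line. Calibre ships an `ebook-convert` tool that picks the output format from the output file's extension, and pandoc reads EPUB. This sketch assumes both tools are on your PATH. It doesn't add the cover page discussed above, and whether the CLI conversion matches the GUI's Convert > Individually output is untested. Swapping `.epub` for `.docx` would mirror step 4 of the pipeline instead.

```python
"""PDF -> EPUB with Calibre's ebook-convert, then EPUB -> Markdown with pandoc.

A sketch: assumes both command-line tools are on PATH. It does not add a
cover page, and its output has not been compared with the Calibre GUI's.
"""
import subprocess
import sys
from pathlib import Path


def plan(pdf: Path) -> list[list[str]]:
    """Return the two commands for one PDF as argv lists (no shell quoting needed)."""
    stem = pdf.stem.replace(" ", "")      # space-free names, as used above
    epub = pdf.with_name(stem + ".epub")  # ebook-convert picks the format from this extension
    md = pdf.with_name(stem + ".md")
    media = pdf.with_name(stem)           # per-book image folder, like ./AWSServerless
    return [
        ["ebook-convert", str(pdf), str(epub)],
        ["pandoc", "--extract-media", str(media), str(epub), "-o", str(md)],
    ]


if __name__ == "__main__":
    for name in sys.argv[1:]:
        for command in plan(Path(name)):
            subprocess.run(command, check=True)
```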

File Formats

  • .docx export no longer seems to be supported. It seems like the missing piece of reliable glue here, and it's what many of the converters require. There might be a way to code against the core Windows object models, but that may require installing Word, or a dev license to include them in a binary package. But that may not be "allowed". For example, this C#(?) repo seems to talk about it, but I cannot figure out how to run it: onenote2md. Maybe it needs to be compiled, or perhaps this relates to dotnet System.Command.
  • PDFs - Quote: "PDF is a really, really bad format to use as input." I think the comment there about embedded images sometimes actually being OCR-embedded text also sounds like an explanation for the poorly pasted OneNote images.
  • .ONE - The file format seems to be documented.
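So much of the tooling hinges on .docx, and a .docx file is just a ZIP archive with embedded images normally stored under word/media/ inside it. That means images from a conversion where they survived (pdf2docx, or Calibre's DOCX output) can be pulled out with Python's standard library alone. In this sketch the output folder is up to you. Files are written flat, so extracting several documents into the same folder can overwrite same-named images like image1.png.

```python
"""List and extract the images embedded in a .docx file.

A .docx is a ZIP archive; embedded images normally live under word/media/.
"""
import sys
import zipfile
from pathlib import Path

MEDIA_PREFIX = "word/media/"


def media_members(archive: zipfile.ZipFile) -> list[str]:
    """Archive member names under word/media/, skipping directory entries."""
    return [
        name
        for name in archive.namelist()
        if name.startswith(MEDIA_PREFIX) and not name.endswith("/")
    ]


def extract_media(docx: Path, out_dir: Path) -> list[Path]:
    """Write each embedded image into out_dir (flat) and return the new paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx) as archive:
        for name in media_members(archive):
            target = out_dir / Path(name).name
            target.write_bytes(archive.read(name))
            written.append(target)
    return written


if __name__ == "__main__":
    # usage: python docx_media.py Notes.docx attachments   (hypothetical names)
    for path in extract_media(Path(sys.argv[1]), Path(sys.argv[2])):
        print(path)
```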

Other Ideas
- Use JavaScript to isolate and extract the live HTML web page generated by onenote.com's online views, then convert the HTML to MD. Not sure exactly where to begin here, but this seems like a highly interesting idea. Run with it, please.
- Power Automate on Windows
- AutoIt
- GUI-based UI automation. Perhaps something that can automate the UI might work around the embedded-image issues.
- Digging into the object model of the OneNote formats, using a Windows-based language to call the libraries and include them in a binary. That is, if you can get those without a paid developer subscription.
- Continue down the Github Rabbit Hole
- Try GitHub Copilot to see if AI could help generate a viable solution.
- Try other programming libraries to extract PDFs to MD
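A possible starting point for the first idea is to save the page's HTML from onenote.com (for example via the browser's developer tools), then convert it offline. Pandoc's HTML reader (`pandoc -f html -t gfm page.html -o page.md`) is the obvious first try. Below is a deliberately tiny stdlib-only converter, written in Python rather than JavaScript, to show the shape of the idea. Which tags OneNote actually emits is an unverified assumption, so expect to extend the tag handling.

```python
"""Tiny HTML -> Markdown converter for a page saved from onenote.com.

Handles only h1-h6, p, div, li, br, img, b/strong and i/em, and drops
script/style contents; other tags just pass their text through. Which
tags OneNote really emits is an unverified assumption.
"""
import sys
from html.parser import HTMLParser
from pathlib import Path

BLOCKS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED = {"script", "style"}


class MiniMarkdown(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []   # finished Markdown blocks
        self.current: list[str] = []  # pieces of the block being built
        self.skip_depth = 0           # > 0 while inside <script> or <style>

    def flush(self) -> None:
        """Close the current block, collapsing runs of whitespace."""
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag in BLOCKS:
            self.flush()
            if tag[0] == "h":  # h1..h6 -> "#".."######"
                self.current.append("#" * int(tag[1]) + " ")
            elif tag == "li":
                self.current.append("- ")
        elif tag == "br":
            self.flush()
        elif tag in {"b", "strong"}:
            self.current.append("**")
        elif tag in {"i", "em"}:
            self.current.append("*")
        elif tag == "img":
            alt = attributes.get("alt") or ""
            src = attributes.get("src") or ""
            self.current.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCKS:
            self.flush()
        elif tag in {"b", "strong"}:
            self.current.append("**")
        elif tag in {"i", "em"}:
            self.current.append("*")

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text not closed by a block tag
    return "\n\n".join(parser.blocks) + "\n"


if __name__ == "__main__":
    print(html_to_markdown(Path(sys.argv[1]).read_text(encoding="utf-8")), end="")
```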