title: Converting OneNote to Markdown for Obsidian
After evaluating Obsidian and considering a move from OneNote, I had to figure out how to migrate my content. That is the topic of this post: migrating my notes.
As of 2024-03-18, automatic conversion has been a major pain and largely unsuccessful. I will try to highlight the key points as I understand them, in case you are in the same boat and trying to migrate. OneNote's export options have degraded over time, which has made things difficult. In particular, I think the disappearance of .docx as an export format really hurt, because many of the existing conversion options seem to depend on it.
This whole experience confirms that my choice to move away from proprietary formats is right for me. But it's a personal decision. While I absolutely love the tool, what will their next change be, and will I be able to move at that point if that change doesn't work for me? Over the years there have already been a few WTF moments where I thought I had lost everything. At least with open formats I don't risk losing access to my own data or having to pay some fee.
Using the OneNote app downloaded and installed locally (not 365, not the web version), I was able to export notebooks to PDF. They look fairly intact to me, and I feel I can always go back to them if I need to. In fact, this may be enough for you: just start Obsidian fresh. Out with the old and in with the new. This removes all the technical cruft and gives you the most compatible, smoothest path forward, provided you can live without those notes in Obsidian or are willing to rewrite them.
If you just want something easy and don't mind it being time consuming, good old copy & paste does a pretty good job compared to most options: in OneNote, Ctrl-A to select all and Ctrl-C to copy, then Ctrl-V to paste into Obsidian. Just be sure to double-check that everything comes over. Images were often borked: often MOST of the images were replaced with garbled text. But this seemed to be the result of some formatting issue in the .md file, and once I fixed the first broken image, all the ones below it were fixed too. Sometimes I was able to fix the issue simply by backspacing at the right spot, but I could not find a reliable spot to backspace. So I suspect that if we could fix the paste, or find a reliable fix for the image formatting issue, we would have a fairly simple path. So this is easy but time consuming. That said, I don't fully trust ANY of the methods, so if the data is important you had better review it in detail anyway.
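As a rough review aid (not a fix), a small shell function can count the image embeds in a pasted note so you can compare the total with the images you see on the original OneNote page. This is a sketch under assumptions: `count_image_embeds` is a hypothetical name, and it assumes pasted images end up either as Obsidian `![[...]]` embeds or as standard `![alt](path)` links. Images that came over as garbled text are simply not counted, which is what makes a mismatch stand out.

```shell
# Hypothetical helper: count image embeds in one Markdown note.
# Assumption: images appear as Obsidian embeds (![[...]]) or as
# standard Markdown images (![alt](path)); anything else is not counted.
count_image_embeds() {
  # -o prints every match on its own line, so wc -l counts matches, not lines.
  # tr strips the padding some wc implementations add.
  grep -o -E '!\[\[[^]]+]]|!\[[^]]*]\([^)]+\)' "$1" | wc -l | tr -d ' '
}

# Usage: count_image_embeds "My Note.md"
```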
pdf2docx: Of all the methods, this one preserved tables and images the best, and the text is arguably the easiest to copy reliably. So this might be a winner. But there were obvious blocks of missing text. YMMV.
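The pdf2docx route can be chained with pandoc so the resulting .docx becomes Markdown in one step. A minimal sketch, assuming both tools are on PATH and that pdf2docx's command-line entry point takes the form `pdf2docx convert <input.pdf> <output.docx>` (as described in its README); `pdf_to_md_via_docx` is a hypothetical helper name, and images go to `./attachments` as in the .odt example in this post.

```shell
# Sketch: PDF -> DOCX with pdf2docx, then DOCX -> Markdown with pandoc.
# pdf_to_md_via_docx is a hypothetical helper name.
pdf_to_md_via_docx() {
  local pdf="$1"
  local base="${pdf%.pdf}"   # "My Notes.pdf" -> "My Notes"
  # Assumed CLI form: pdf2docx convert <input.pdf> <output.docx>
  pdf2docx convert "$pdf" "$base.docx" || return 1
  # Extract embedded images into ./attachments and write the Markdown file.
  pandoc --extract-media ./attachments "$base.docx" -o "$base.md"
}

# Usage: pdf_to_md_via_docx "My Notes.pdf"
```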
There is probably a better way to do this, but this is what has yielded the best results so far in terms of the most complete and accurate copy of the original data. It still lost my tables, though, so if tables matter more to you, you might favor other methods. Hold on to your hats, folks. The ride is about to get bumpy. See the various sections below for more details. What I did:
pandoc --extract-media ./<foldername_for_images> filename.odt -o filename.md
e.g. pandoc --extract-media ./attachments AWSServerlessDraw.odt -o AWSServerlessDrawOdt.md
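If you have many .odt files, the same pandoc command can be run in a loop. A sketch, assuming all the exports sit in one folder; `convert_all_odt` and `md_name` are hypothetical helper names. Quoting `"$f"` also takes care of filenames that contain spaces, so nothing has to be renamed first.

```shell
# Sketch: run the .odt -> Markdown pandoc conversion over a whole folder.
# md_name and convert_all_odt are hypothetical helper names.
# pandoc reads input files as positional arguments; its -i flag means
# --incremental (slide shows only), so it is not used here.

# "My Notebook.odt" -> "My Notebook.md"
md_name() {
  printf '%s\n' "${1%.odt}.md"
}

convert_all_odt() {
  (
    # Work inside the export folder so the ./attachments paths that pandoc
    # writes into the image links resolve relative to the .md files.
    cd "${1:-.}" || exit 1
    for f in *.odt; do
      [ -e "$f" ] || continue  # no .odt files: the glob stays unexpanded
      # Quoting "$f" keeps filenames with spaces intact.
      pandoc --extract-media ./attachments "$f" -o "$(md_name "$f")" ||
        printf 'conversion failed: %s\n' "$f" >&2
    done
  )
}

# Usage: convert_all_odt ./exports
```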
Loosely in order from best (top) to worst:
Copy & paste: In OneNote, press Ctrl-A to highlight everything and Ctrl-C to copy, then press Ctrl-V in Obsidian (or right-click > Paste) to paste. Some images come over as images and some as garbled text. Be sure to switch to reading mode before deciding they are hosed; Ctrl-E, which toggles views in Obsidian, is your friend. The workaround is to carefully use a screenshot tool such as the Windows Snipping Tool to "flatten" the image (remove the layers and leave just a bitmap), paste that image into Obsidian, and then delete the bad text. After deleting the bad text version, toggle views and recheck, as the others often got better; I usually only had to fix the first bad one to fix them all. I suspect the issue is really just poor text formatting caused by special characters in the image. I was able to fix a few simply by backspacing where the colored bar on the left starts. YMMV. If we could find a reliable way to detect this, we could fix it with a macro or a regexp. The bad part about the C&P process is that it takes a long time. It wouldn't be too bad if we did not have to fix images. The good part is that it works pretty well, all attachments go to the right place per your settings, and you know you get everything by going slow and being careful. But it is probably not feasible for large amounts of data.
.docx export: This was available in OneNote 2016 but no longer seems to be available in this version.
Printing: Ctrl-A then Ctrl-P only lets me print a single page, as far as I can tell. Is this possibly a PostScript conversion for printing before saving as PDF, and if so, is the output better?
.one export: You can also export ".one" files if you use onenote.com, but not if you launch OneNote from OneDrive online. These can be used in online converters. TBH I still have trouble finding the export: OneNote.com web > Notebook Changer (dropdown at top left) > More Notebooks > right-click > Export Notebook. This outputs a .one file.
EPUB to Markdown with pandoc:
pandoc --extract-media ./AWSServerless AWSServerless.epub -o AWSServerless.md
I renamed the file to remove spaces; if your filenames contain spaces, put single or double quotes around them. Maybe we can use Scribus or LibreOffice Draw to add a cover page to the PDF: Draw > New Page > Screenshot > Export PDF, then Calibre > Import. We DO get a cover page. Hmm, we actually get it twice. Convert to EPUB again. So after finally dragging these into Obsidian, it's still not perfect. At the top of Calibre's Convert dialog there is a dropdown for the output format, with DOCX, HTML, etc., which may lead to more options.
Other Ideas
- Use JavaScript to isolate and extract the live HTML that the OneNote.com online views generate, then convert that HTML to MD. I'm not sure exactly where to begin here, but this seems like a highly interesting idea. Run with it, please.
- Power Automate on Windows
- AutoIt
- GUI-based UI automation. Something that can drive the UI might work around the embedded-image issues.
- Dig into the object model of the OneNote formats, using a Windows-based language to call those libraries and build them into a binary. That is, if you can get them without a paid developer subscription.
- Continue down the GitHub rabbit hole
- Try GitHub Copilot to see if AI could help generate a viable solution.
- Try other programming libraries to extract PDFs to MD
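Two of the routes in this post can likewise be chained into pandoc from the command line: the Calibre PDF-to-EPUB path, using Calibre's `ebook-convert` tool instead of the GUI, and the HTML-to-Markdown idea, assuming you already have the page's HTML saved to a file (getting it out of OneNote.com is exactly the open question in that idea). This is a sketch, not a tested recipe: it assumes `ebook-convert` and pandoc are on PATH, and `pdf_to_md_via_epub` and `html_to_md` are hypothetical helper names.

```shell
# Sketch: two conversion routes chained into pandoc.
# Helper names are hypothetical; assumes Calibre's ebook-convert and pandoc are on PATH.

# PDF -> EPUB with Calibre's command-line converter, then EPUB -> Markdown.
pdf_to_md_via_epub() {
  local pdf="$1"
  local base="${pdf%.pdf}"
  # ebook-convert picks the output format from the output file's extension.
  ebook-convert "$pdf" "$base.epub" || return 1
  pandoc --extract-media ./attachments "$base.epub" -o "$base.md"
}

# Saved HTML page (expects a .html file) -> Markdown; pandoc infers the
# input and output formats from the file extensions.
html_to_md() {
  local html="$1"
  pandoc --extract-media ./attachments "$html" -o "${html%.html}.md"
}

# Usage: pdf_to_md_via_epub "AWS Serverless.pdf"; html_to_md page.html
```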