Using the Spotlight Importer to Extract Text from a PDF

Ever have a PDF that you need to grab a bunch of text from, but don’t want to spend a lot of time selecting and copy/pasting? On OS X, you can use mdimport, the Spotlight Importer command-line utility, to extract the text.

/usr/bin/mdimport -d2 /path/to/pdf/file.pdf > pdf.txt 2>&1

The output is the raw text that Spotlight uses to index the file, so you’ll need to do some cleanup, but your mouse hand will thank you.
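For a first pass at tidying the dump, something like the following might help. It is only a sketch: clean_text and pdf-clean.txt are names made up for illustration, and it does nothing specific to mdimport's output. It just strips trailing whitespace and squeezes runs of blank lines, so expect to do some hand editing afterwards.

```shell
# Hypothetical helper (not part of mdimport): tidy a raw text dump.
# Reads from stdin and writes the tidied text to stdout.
clean_text() {
  # sed strips trailing spaces and tabs from every line;
  # cat -s collapses runs of blank lines into a single blank line.
  sed -e 's/[[:space:]]*$//' | cat -s
}

# Example use, assuming pdf.txt was produced by the command above.
if [ -f pdf.txt ]; then
  clean_text < pdf.txt > pdf-clean.txt
fi
```

Keeping the cleanup as a separate step means the original pdf.txt stays around in case the tidied version drops something you wanted.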

So, What’s This Final Cut XML Stuff Good For?

Since I’ve been posting about Final Cut XML for a while, I occasionally get a comment along the lines of, “So this Final Cut XML stuff seems pretty cool, but I can’t imagine what kind of stuff I might do with it.” And I’ll admit, it can seem sort of esoteric. You’ve got this spec and cool new stuff like Apple Event support, but until you start to see practical examples, it’s hard to wrap your mind around it (although I’ve tried to start the ball rolling with things like my sequencing utility).

Here’s a practical example to start your brainstorming. Take David Thorpe’s Compressor API Framework, an Objective-C framework for programmatic access to Apple’s Compressor application, and combine it with the Apple Watch Folder example. What if you used the two to build a workflow that watched for things like markers indicating that an after-hours script should create…

  • automatic client approvals for your extranet
  • content for your web site
  • DVD content

...and that’s just one example. There’s really a ton of juice here, and the tools are there for technically minded folks, with help from people like David, to get a lot of “what ifs” off the ground.
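To make the marker-watching idea a little more concrete, here is a rough sketch of the scanning half of such a workflow. Treat everything in it as an assumption: it presumes the exported XML carries <marker> elements with a <name> child, the function names and the "approval" marker name are invented for illustration, and the step that would hand a file to Compressor is left as a plain echo rather than a real submission.

```shell
# Hypothetical helper: succeed if FILE contains a <marker> element
# whose <name> equals NAME. Usage: has_marker FILE NAME
has_marker() {
  awk -v want="$2" '
    /<marker>/   { in_marker = 1 }                      # entering a marker
    in_marker && index($0, "<name>" want "</name>") { found = 1 }
    /<\/marker>/ { in_marker = 0 }                      # leaving a marker
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Hypothetical scanner: print one line for each XML file in DIR that
# carries the named marker. Usage: scan_folder DIR NAME
scan_folder() {
  for xml in "$1"/*.xml; do
    [ -e "$xml" ] || continue        # folder has no XML files
    if has_marker "$xml" "$2"; then
      echo "queue: $xml"             # placeholder for the real hand-off
    fi
  done
}
```

Running something like scan_folder /path/to/exports approval from an after-hours job would list the sequences flagged for client approval. A real workflow would swap the echo for whatever submits the job, and would probably want a proper XML parser instead of line matching.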

