Tuesday, October 11, 2011

Getting the Text - Harder Than it Sounds, Way Easier Than I Thought

Hey all, Caty here, with some thoughts about the process of grabbing text for our ebook!

So, the three poems that we picked for our anthology are The Raven (Poe), Haunted Houses (Longfellow), and Darkness (Byron). While finding the text for the poems online, I tried a few different things before figuring out the most efficient approach. Because I've been spending my nights sleeping in McPherson Square as part of Occupy DC, I've had to put my parts of the project together on many different machines, in many different programs, at many times of day, and in many different kinds of places. Every time I forgot to e-mail something to myself, I had a whole new opportunity - as I came to see it - to find a better way to get the text than the way I had gotten it before.

My first "strategy" was to copy and paste the text and then mark it up in HTML by hand. This took a million years and was boring, and realizing I hadn't e-mailed myself the document when I went to work on it the next day was a huge bummer. My next idea was to grab the text and then load it into Sigil, using its WYSIWYG editor to put in the line breaks and such. This was kind of a pain, since a hard return in most WYSIWYG editors gives you a paragraph break rather than a line break - not ideal for coding poetry!

When the computer onto which I had downloaded Sigil - and, of course, my files - was donated to our tech team at McPherson Square, I had to start all over again, all over again. But this time I had a belated flash of brilliance - open up the poems in Firefox, find the div containing the poem text, and lift the entire div! This made things way easier. Combined with the discovery of Smultron's awesome "automatically create all that scary info at the beginning of an XML doc" feature, this technique cut a project that had taken me a couple of hours down to a matter of mere minutes!
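For anyone who would rather script the div-lifting trick than do it by hand in the browser, here is a rough sketch in Python using only the standard library. It is not what I actually did (I copied the div out of Firefox); the `DivLifter` class, the `lift_div` helper, and the `poem` id are all made up for illustration, and a real page will have its own id or class on the div, so check the page source first.

```python
from html.parser import HTMLParser


class DivLifter(HTMLParser):
    """Collect the markup of the first <div> whose id equals target_id."""

    def __init__(self, target_id):
        # Keep entities like &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id  # hypothetical id; depends on the page
        self.depth = 0              # how many <div>s deep we are inside the target
        self.parts = []             # pieces of markup collected so far
        self.done = False           # True once the target div has been closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Not inside the target yet: only react to the matching div.
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied through as written.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def lift_div(html, target_id):
    """Return the whole target div (tags included) as a string, or '' if absent."""
    lifter = DivLifter(target_id)
    lifter.feed(html)
    lifter.close()
    return "".join(lifter.parts)


# A tiny stand-in for a downloaded poem page.
page = (
    '<html><body><div id="nav">menu</div>'
    '<div id="poem"><p>Once upon a midnight dreary,<br />'
    'while I pondered, weak and weary,</p></div>'
    '<p>footer</p></body></html>'
)
print(lift_div(page, "poem"))
```

The nice part is the same as in the browser version: whatever line breaks and paragraph tags the site already put inside the div come along for free.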

This is another example of one of the most amazing features of the interwebs - the fact that so much of the work is already done for you. Some people might see that as a negative or a cheat of some kind, but I think that it speaks to the true nature of the web as a tool for creative expression. When you don't have to spend two hours putting <br /> at the end of every line in a 200-line poem, you have a lot more time for things like coming up with good metadata, beautiful CSS, etc.
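And if you do get stuck with plain text and no ready-made div, the line-break chore itself can be scripted. This is just a sketch under my own assumptions: the `poem_to_html` function is made up for illustration, it treats a blank line as a stanza break, wraps each stanza in a paragraph, and puts a `<br />` between the lines of a stanza rather than after the last one.

```python
from html import escape


def poem_to_html(text):
    """Turn plain-text verse into <p> stanzas with <br /> between lines."""
    stanzas = []   # finished stanzas, each a list of escaped lines
    current = []   # lines of the stanza being read
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Escape &, < and > but leave apostrophes and quotes alone.
            current.append(escape(line, quote=False))
        elif current:
            # A blank line closes the current stanza.
            stanzas.append(current)
            current = []
    if current:
        stanzas.append(current)
    return "\n".join("<p>" + "<br />\n".join(s) + "</p>" for s in stanzas)


sample = "first line\nsecond line\n\nnew stanza"
print(poem_to_html(sample))
```

The output still needs to be pasted into a proper XHTML file with the header info at the top, but it beats typing the same tag 200 times.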

Over and out!
