Friday, September 12, 2008

Unbreak Copied Text From PDF Documents

Posted by Martin in Windows, Software. Originally posted at: link
Users who copy and paste text out of PDF documents might have noticed that the pasted text keeps the line breaks of the original PDF document. This is usually unwanted. Removing the line breaks manually is not a big problem for short paragraphs, but it becomes tedious for longer texts.
Auto Unbreak is a small 22 Kilobyte tool that has only one purpose. It takes text copied from PDF documents, removes the line breaks, and then gives the user the option to copy the newly formatted text back to the clipboard.
Auto Unbreak is a portable application that can be executed from any location on a computer system. It ships with two files that define merge and exception rules, which might come in handy for users who deal with specifically formatted text.
The rule files can be edited in any text editor.
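The basic idea of removing line breaks from pasted PDF text can be illustrated with a short Python sketch. This is not Auto Unbreak's actual algorithm, and it does not read the tool's merge or exception rule files or touch the clipboard. The function name `unbreak` is hypothetical, and the sketch assumes that a blank line marks a paragraph boundary while a single line break inside a paragraph is just a leftover from the PDF layout.

```python
import re


def unbreak(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: blank lines separate paragraphs; single line breaks
    inside a paragraph are layout residue and become spaces.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split on one or more blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())

    joined = []
    for para in paragraphs:
        # Drop empty fragments and trim stray indentation on each line.
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        joined.append(" ".join(lines))

    # Keep one blank line between paragraphs in the result.
    return "\n\n".join(joined)


if __name__ == "__main__":
    pasted = "Users who want to copy\nand paste text out of\npdf documents\n\nSecond\nparagraph"
    print(unbreak(pasted))
```

A fuller version would probably also need to rejoin words hyphenated at a line end and to leave lists or headings alone, which is presumably the kind of case the tool's rule files are meant to cover.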
The developers' homepage has been suspended; please download the tool from this link. It is temporarily hosted here at Ghacks until the developers announce their new website.

Microsoft Office dumped by Science and Nature

Originally published by ZDNet Australia link 18 June 2007 10:03 AM
Aaron Tan, ZDNet Asia


Respected academic journals Science and Nature will no longer accept manuscripts written in Microsoft's Office 2007 suite.
The decision was made because the latest version of Word is no longer compatible with Mathematical Markup Language (MathML), the de facto standard for writing equations in text documents, according to recent notices posted on the Web sites of both Science and Nature journals. In Office 2007, Microsoft's own Office MathML (OMML) is used for equations.
"Because of changes Microsoft made in its recent Word release that are incompatible with our internal workflow, which was built around previous versions of the software, Science cannot, at present, accept any files in the new .docx format produced through Microsoft Word 2007, either for initial submission or for revision," Science journal stated on its site.
Likewise, Nature said: "It currently cannot accept files saved in Microsoft Office 2007 formats [because] equations and special characters -- for example, Greek letters -- cannot be edited and are incompatible with Nature's own editing and typesetting programs."
Murray Sargent, an Office software development engineer, noted on a Microsoft developer blog that Microsoft had looked at the need to maintain robust performance when it chose to integrate its OMML instead of MathML.
Sargent said: "Naturally there's been a lot of discussion as to why we even have OMML, since MathML is really good."
However, he said, the main problem is that Word needs to allow users to include Word-oriented features, such as images, comments, revision markings and formatting, in maths zones, but MathML is geared towards allowing only mathematical data in maths zones. Maths zones are areas in which users can input mathematical components and equations.
"A subsidiary consideration is the desire to have an XML [document] that corresponds closely to the internal [standard] format, aiding performance and offering readily achievable robustness," he said, adding that since both MathML and OMML are XML-based, they can be converted from one into the other. "So it seems you can have your cake and eat it too," Sargent said.
However, Science maintains that Word 2007 users should be aware that equations created with the default equation editor included in Microsoft Word 2007 and used in revisions will not be accepted by the academic journal, "even if the file is converted to a format compatible with earlier versions of Word".
Science said this is because "conversion will render equations as graphics and prevent electronic printing of equations and because the default equation editor packaged with Word 2007 -- for reasons that, quite frankly, utterly baffle us -- was not designed to be compatible with MathML".
Responding to the issues highlighted by Science and Nature, Sargent said in a separate blog posting that Word 2007's new mathematical facility is a huge improvement over previous approaches. "But anytime such big improvements occur, there can be, and evidently are, problems with upgrading," he said. "I think the trouble is well worth it in both user convenience, and the marvellous typographic quality."
Microsoft was unable to respond by press time.