Item5492: Black font tags are usually unwanted

Priority: Enhancement
Current State: Confirmed
Released In:
Target Release: n/a
Applies To: Extension
Component: WysiwygPlugin
Branches: master
Reported By: TWiki:Main.MartinCleaver
Waiting For:
Last Change By: CrawfordCurrie
When pasting from Word or OpenOffice I often get font tags in the output. This is very annoying, especially when they are black font tags or come wrapped in needless <pre> tags, something like this:

1 <pre><font color="#000000"><font size="2">Login details</font></font></pre>
1 <pre><font color="#000000"><font size="2">Web interface</font></font></pre>
1 <pre><font color="#000000"><font size="2">Hosting Account</font></font></pre>
1 <pre><font color="#000000"><font size="2">SSH level</font></font></pre>
1 <pre><font color="#000000"><font size="2">Web admin level</font></font></pre>
1 <pre><font color="#000000"><font size="2">domain set-up</font></font></pre>
1 <pre><font color="#000000"><font size="2">mysql interface</font></font></pre>

A couple of iterations through WYSIWYG and TML and, by the magic of, we end up with something like:
   1 <pre>%BLACK%All the functionality in mrjc-feedwordpress-filters &ndash; can we find something that does similar but:%ENDCOLOR%</pre>
      1 <pre>%BLACK%This might be to merge the functionality of mrjc-feedwordpress-filters into feedwordpress%ENDCOLOR%</pre>
         1 <pre>%BLACK%Do they have a wiki?%ENDCOLOR%</pre>
         1 <pre>%BLACK%Is the code base open yet?%ENDCOLOR%</pre>
      1 <pre>%BLACK%Has manageabilty by administrator%ENDCOLOR%</pre>
         1 <pre>%BLACK%Has a UI for both the manager and the user%ENDCOLOR%</pre>
         1 <pre>%BLACK%Able to select what items get filtered in%ENDCOLOR%</pre>
         1 <pre>%BLACK%User gets the opportunity to select what categories, keywords they want%ENDCOLOR%</pre>
         1 <pre>%BLACK%Should be able to retrospectively rescan the content.%ENDCOLOR%</pre>
      1 <pre>%BLACK%Can show a table of all blogs syndicated, the number of entries, whether the feed is automatically updating and the date of last update. Also the ability to kick off another update%ENDCOLOR%</pre>
      1 <pre>%BLACK%Can filter in extra categories / tags%ENDCOLOR%</pre>
         1 <pre>%BLACK%User should be able to state what tags should get added for all their postings%ENDCOLOR%</pre>
         1 <pre>%BLACK%Administrator should have the %ENDCOLOR%</pre>

If you can't see the %BLACK % tags in the above, see and have some related entries.


  1. Is there an existing sequence of TinyMCE button clicks that normalises this? Is this something that can be done with
  2. If it's not a job, where would code go to fix it?
  3. What approach should someone take? Is this a JavaScript task?
  4. How much work might it be to implement? I'd consider doing it myself or paying a student to do it if it's fairly trivial.


Crawford added: Irritating, isn't it? Unfortunately the only filtering mechanism I was able to implement stops at the "are there any attributes" level; it can't parse attributes to determine if they have any value (and frankly, in the above case, the decision can't be made without parsing and analysing the CSS as well). Add to this the fact that M$ make heavy use of "M$ private" attributes that are only meaningful to IE.....

My advice is to do what I have done, and develop whatever bespoke pre-filters work for you (and most especially your client's HTML, as it can differ quite widely depending on how Word is used). I personally use HTML::Parser to create an in-perl DOM tree that I run plug-in analysers on to remove such no-op constructs.
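
The tree-plus-analysers idea described above could look roughly like the following. This is only a sketch, written in Python with the standard library's html.parser rather than Perl's HTML::Parser, and it is not the code Crawford uses. The names Node, TreeBuilder, is_black_font, prune and clean are all hypothetical, and it assumes that a black font colour really is a no-op, which (as noted above) cannot be known without looking at the CSS.

```python
from html import escape
from html.parser import HTMLParser

VOID = {"br", "hr", "img"}  # tags with no closing tag (illustrative subset)

class Node:
    def __init__(self, tag=None, attrs=None, text=None):
        self.tag = tag            # None for text nodes and for the root
        self.attrs = attrs or []  # list of (name, value) pairs
        self.text = text          # set only on text nodes
        self.children = []

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.root = Node()
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closers are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(Node(text=data))

    def handle_entityref(self, name):
        self.handle_data(f"&{name};")

    def handle_charref(self, name):
        self.handle_data(f"&#{name};")

def is_black_font(node):
    """Hypothetical analyser: a <font> whose only attribute is a black colour."""
    if node.tag != "font" or len(node.attrs) != 1:
        return False
    name, value = node.attrs[0]
    return name == "color" and (value or "").lower() in {"#000000", "#000", "black"}

def prune(node, analysers):
    """Replace each child flagged by any analyser with that child's children."""
    kept = []
    for child in node.children:
        prune(child, analysers)
        if any(check(child) for check in analysers):
            kept.extend(child.children)
        else:
            kept.append(child)
    node.children = kept

def serialize(node):
    if node.text is not None:
        return node.text
    inner = "".join(serialize(child) for child in node.children)
    if node.tag is None:
        return inner
    attrs = "".join(f' {k}="{escape(v or "")}"' for k, v in node.attrs)
    if node.tag in VOID:
        return f"<{node.tag}{attrs}>"
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"

def clean(html, analysers=(is_black_font,)):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root, analysers)
    return serialize(builder.root)
```

Each analyser is just a function from a node to a yes/no answer, so client-specific rules (for example, one per Word private attribute) could be added to the analysers tuple without touching the tree code.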

-- TWiki:Main.MartinCleaver - 02 Apr 2008


> I'd consider doing it myself or paying a student to do if its fairly
> trivial.
Trivial to do trivially (regexes). Harder to do properly (HTML::Parser). Very, very difficult to do well (generic no-op detection and rewriting).
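
For a sense of what the "trivial" end of that scale might look like, here is a minimal regex sketch in Python (the plugin itself is Perl). The pattern and the function name strip_black_fonts are assumptions made for illustration. It only handles a <font> tag whose sole attribute is a black colour, as in the pasted sample at the top.

```python
import re

# Matches <font color="#000000">, <font color="#000"> or <font color="black">
# up to the nearest </font>. Assumption: no other attributes on the tag.
BLACK_FONT = re.compile(
    r'<font\s+color\s*=\s*["\']?(?:#000000|#000|black)["\']?\s*>(.*?)</font>',
    re.IGNORECASE | re.DOTALL,
)

def strip_black_fonts(html: str) -> str:
    """Unwrap black <font> tags, repeating until nothing more changes."""
    previous = None
    while previous != html:
        previous = html
        html = BLACK_FONT.sub(r"\1", html)
    return html
```

Note that the pattern pairs each opening tag with the nearest </font>, not with its true partner. That happens to give well-formed output because every closer is identical, but the regex has no idea what the tag is nested inside, which is why this counts as doing it trivially.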

-- TWiki:Main.MartinCleaver - 02 Apr 2008

Of course, you would also have to detect this case: Some blue some green some black some more green some more blue .
%BLUE% Some blue %GREEN% some green %BLACK% some black %ENDCOLOR% some more green %ENDCOLOR% some more blue %ENDCOLOR%
Tricky, huh?
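
One way to handle the nested case is to track the open colours on a stack and treat %BLACK% as redundant only when no other colour is in effect. The sketch below is in Python and makes several assumptions: the default text colour is black, %ENDCOLOR% closes the most recently opened colour, and the colour list is an illustrative subset. The function name strip_noop_black is hypothetical.

```python
import re

COLOURS = {"BLACK", "BLUE", "GREEN", "RED"}  # illustrative subset only
TOKEN = re.compile(r"%([A-Z]+)%")

def strip_noop_black(tml: str) -> str:
    """Drop %BLACK%...%ENDCOLOR% pairs that sit outside any other colour."""
    out = []
    stack = []  # one flag per open colour: True if its opener was dropped
    pos = 0
    for match in TOKEN.finditer(tml):
        name = match.group(1)
        if name not in COLOURS and name != "ENDCOLOR":
            continue  # some other macro; it is copied through with the text
        out.append(tml[pos:match.start()])
        pos = match.end()
        if name == "ENDCOLOR":
            dropped = stack.pop() if stack else False
            if not dropped:
                out.append(match.group(0))
        else:
            # Black is a no-op only if every enclosing colour was itself
            # dropped, i.e. the text would be the default colour anyway.
            noop = name == "BLACK" and all(stack)
            stack.append(noop)
            if not noop:
                out.append(match.group(0))
    out.append(tml[pos:])
    return "".join(out)
```

Run on the blue/green/black example above, this leaves the text unchanged, because the black run sits inside green. Run on a line such as `%BLACK%Do they have a wiki?%ENDCOLOR%`, it removes both macros. It does not catch black nested directly inside a black run that had to be kept.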

-- TWiki:Main.CrawfordCurrie - 04 Apr 2008

Another way would be to have a "strip fonts" control available to the user; that could work on either the entire or selected part of the textarea.
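
The logic behind such a control could be as small as the following Python sketch. The function name strip_fonts and the start/end offsets standing in for the textarea selection are hypothetical; a real control would live in the editor's JavaScript.

```python
import re

# Any opening or closing <font> tag, whatever its attributes.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)

def strip_fonts(text: str, start: int = 0, end: int = None) -> str:
    """Remove <font> tags from text[start:end]; the rest is left untouched."""
    if end is None:
        end = len(text)
    selection = FONT_TAG.sub("", text[start:end])
    return text[:start] + selection + text[end:]
```

With no offsets it cleans the whole text. With a selection it only touches that range, so a selection that covers an opening tag but not its closer will leave a stray </font> behind.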

I wondered whether the search and replace widget could be made available in plain text mode, but the regexes needed to remove fonts are probably beyond most users anyway.

-- TWiki:Main.MartinCleaver - 13 Apr 2008

I note that talked about an inverse problem: at some point WYSIWYG plugin was stripping font tags, supposedly because KEEP_WS was not set. (I say supposedly because KEEP_WS seems to influence whitespace).

-- TWiki:Main.MartinCleaver - 13 Apr 2008

At a JS level, this might help:

-- TWiki:Main.MartinCleaver - 13 Apr 2008

The plugin does strip font tags, if you tell it to. As I said, there are controls that allow selection of tags to strip based on what attributes they have, but you want something more; you want to strip tags based on the value of an attribute and that is much harder.

This is an enhancement, not a bug.

-- CrawfordCurrie - 16 Apr 2008

ItemTemplate

Summary Black font tags are usually unwanted
ReportedBy TWiki:Main.MartinCleaver
Codebase 4.1.2, 4.2.0
SVN Range TWiki-5.0.0, Sun, 09 Mar 2008, build 16496
AppliesTo Extension
Component WysiwygPlugin
Priority Enhancement
CurrentState Confirmed
Checkins distro:91888e717503
TargetRelease n/a
CheckinsOnBranches master
masterCheckins distro:91888e717503
Topic revision: r8 - 20 Jan 2015, CrawfordCurrie