Thursday, 7 May 2009

Strip Formatting on Paste using YUI Rich Text Editor

I’ve been using the Yahoo User Interface Library (YUI) in my web app, and one particularly cool component is the YUI Rich Text Editor: it’s cross-browser compatible, fully extensible, and best of all it’s free :)

For all its greatness, one thing I’ve struggled with is that if you copy & paste stuff from another app into the YUI Editor, all of the original formatting is maintained. Most of the time for us, the “other app” is Microsoft Word, which does a particularly heinous job of generating HTML from a formatted document. This almost always wreaks havoc if the user subsequently tries to change text styles in the editor, as the underlying HTML is a total mess.

So, the solution for us was to try and strip out all of the formatting when somebody pastes stuff into the Editor, resulting in nice clean HTML that plays well with the YUI Editor formatting functions. Now, unfortunately no such feature exists in the YUI Editor, and Googling around just led to dead ends, so I was left to build my own.

CleanPaste for YUI Rich Text Editor

Download from CodePlex

For information on what’s supported, please read the notes on CodePlex, as I will keep this updated as I make bug fixes etc.

To use the CleanPaste script, follow these steps:

  1. Ensure you’ve already installed the YUI components & created a Yahoo Editor on your page.
  2. Place the CleanPaste.js file somewhere in your project.
  3. Include the script in your page using the following code:

    <script type="text/javascript" src="CleanPaste.js"></script>

    Ensure the src attribute points to the directory where the CleanPaste.js script is located.
  4. In the JavaScript where you create your Yahoo Editor object, create an instance of the CleanPaste object, passing in the editor as the parameter:

    var myEditor = new YAHOO.widget.Editor('editor', myConfig);
    myEditor.render();

    var cleanPaste = new CleanPaste(myEditor);

That’s it, the editor should now strip the formatting out of pasted text.

Of course this is still under development, so if you have any problems or feedback please post on CodePlex and I will get back to you.

Cheers, Anthony.

64 comments:

Anonymous said...

Hello, my name is James Star. Works well on IE, but not Google Chrome. Appears to duplicate. Is this a major fix?

Anthony said...

Thanks for the feedback. I'll test in Chrome and see what I can do. Anthony.

Anthony said...

The issue with Google Chrome has been resolved, you can download the latest library from CodePlex. Anthony.

Anonymous said...

Anthony - you are brilliant :-)

Anonymous said...

is it possible to use this without including the entire YUI library? I primarily use jQuery, so it would be a big drag

Anthony said...

Yes, the CleanPaste script does not require any additional YUI libraries that the rich text editor doesn't already require. It really only needs the yahoo-dom-event library.

Anthony.

Andrei Iarus said...

Great job, Anthony :) Just a minor problem: could you replace the &nbsp; in the copied text with spaces?

Andrei Iarus said...

Hello again,
I have some problems with Firefox (current latest stable version: 3.0.10), as it does not strip the first ugly block. The problem appears when pasting from Word Viewer 2003. A patch/workaround to make this work in Firefox (and to handle other things like HTML comments) is to replace the last part of your code with this (notice the whitespace \s character class, which includes the newline character):

html = html.replace(/<(\/)*(\\?xml:|meta|link|span|font|del|ins|st1:|[ovwxp]:)((.|\s)*?)>/gi, ''); //Unwanted tags

Andrei Iarus said...

Something else:

probably you should consider also this rule:
html = html.replace(/(class|style|type|start)=(\w*)/gi, ''); // Unwanted attributes (class=CLASSNAME) (different from class="CLASSNAME")
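
To illustrate the two rules above, here is a minimal sketch of tag and attribute stripping applied to a made-up Word fragment. The function name `stripWordMarkup` and the sample input are hypothetical, and the patterns are simplified variants of the ones quoted above (using `[\s\S]` instead of `(.|\s)`, and handling quoted as well as unquoted attribute values). It is not the code shipped in CleanPaste.

```javascript
// Hypothetical helper: strips Word-specific tags and presentational attributes.
function stripWordMarkup(html) {
  return html
    // Remove unwanted opening/closing tags such as <span>, <font>, <o:p>, <st1:...>, <?xml:...>
    .replace(/<\/?(?:\?xml:|meta|link|span|font|del|ins|st1:|[ovwxp]:)[\s\S]*?>/gi, '')
    // Remove class/style/type/start attributes, whether quoted or unquoted
    .replace(/\s+(?:class|style|type|start)=(?:"[^"]*"|'[^']*'|[\w-]+)/gi, '');
}

var dirty = '<p class=MsoNormal style="margin:0"><span lang=EN-US>Hello</span><o:p></o:p></p>';
var clean = stripWordMarkup(dirty);
// clean is '<p>Hello</p>'
```

Note that the attribute rule works on the raw string, so it would also remove something like ` type=foo` if that appeared in ordinary text rather than inside a tag.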

Anthony said...

Hi Andrei, Thanks for contributing to the project. I've tested your code and committed a new release to CodePlex with the changes. Cheers, Anthony.

Dan Slack said...

Thanks for the script, really helped us out.

The only thing that I ran into was a timing issue in FF and IE when pasting.

No real testing, but, it seems when I pasted a large chunk of text, there was a chance that the timer would run before YUI had put the text into the container. This usually resulted in either a null container, or (strangely) an underscore (in FF).

Anyways, as a hack, I added a timer variable in the prototype, and wrapped the CleanPaste code in a try/catch. If an error was thrown, I incremented the timerCount, if it was less than 5 (an arbitrary value on my part), I reran the CleanPaste method. Seems to work fairly well.
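
The retry hack described above could be sketched roughly as follows. This is not the actual patch: `runWithRetry`, its parameters, and the injectable `customSchedule` argument are hypothetical names chosen for illustration, and a limit of 5 attempts would simply mirror the arbitrary value mentioned above.

```javascript
// Hypothetical helper: runs task(), and if it throws, tries again after delayMs,
// giving up once maxAttempts attempts have been made.
function runWithRetry(task, maxAttempts, delayMs, customSchedule) {
  // customSchedule is optional; by default retries are scheduled with setTimeout
  var schedule = customSchedule || function (fn, ms) { return setTimeout(fn, ms); };
  var attempts = 0;

  function attempt() {
    attempts += 1;
    try {
      task();
    } catch (err) {
      // The pasted content may not be in the container yet, so try again later
      if (attempts < maxAttempts) {
        schedule(attempt, delayMs);
      }
    }
  }

  attempt();
}
```

The cleaning routine would be passed in as `task`, for example `runWithRetry(cleanFn, 5, 50)`, where `cleanFn` stands for whatever function performs the clean-up.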

Anthony said...

Thanks for the feedback, I'll put this fix into the next release. Cheers, Anthony.

DMc said...

Anthony, ran across this while working on my project. I was able to get it running, but had a question. I copied some paragraphs from Word into the editor and alerted the editor's contents after the paste.

I still see some text like the following in some P tags:

P class=MsoNormal... text here etc

Are those Microsoft class names still supposed to show up in the filtered text?

Good job on the utility.

Thanks, DMc

Anthony said...

Things like class=MsoNormal should definitely be stripped out. If you paste your text into the example html file provided, does it also not work?

If it doesn't work, can you please add an issue using the issue tracker on CodePlex, and attach the Word file that doesn't work so I can try to resolve it.

DMc said...

Anthony, false alarm. I figured out the issue and it was on my end. What does your utility do that YUI doesn't do when setting the filterWord configuration attribute to true? I noticed your example also sets this attribute anyway, so I was just wondering.

Thanks, DMc

Anthony said...

The filterWord function built into the YUI editor only fires when a document is initially loaded into the editor; it does not fire when additional content is pasted in by the user.

Secondly, the filtering that the YUI component performs only removes a small amount of Word garbage, hence I had to add a bunch more filters to fill the gaps.

DMc said...

Awesome. Thanks again, DMc

Björn said...

Hi, thank you so much for this script. It really saved me here. It's very smart in that it only strips the garbage but leaves the good things (e.g. lists).

Mekon said...

Hi Anthony. Thanks for sharing this. It appears to be exactly what I'm looking for, but I can't find the download on CodePlex. Has it gone or am I looking in the wrong place?

Anthony said...

Hi. Click on the "Source Code" tab in CodePlex. From there you can download the latest copy. Cheers, Anthony.

JP said...

Hi Anthony,

This script is very useful, as there are very few or no resources available to achieve this result using just the YUI RTE.

I have come across a particular test case in which your script fails completely. The div with the id 'Cleaner' is not being inserted, and as a result the whole code fails.

If I insert some content in the editor textarea that starts with a p element like the following <p> </p> followed by the actual content of the editor that I want to load, the script fails.

Christian Sonne Jensen said...

Hi,

Can you show an example on how you listen to the paste event on the YUI editor?

Anthony said...

Hi JP, can you please attach a sample HTML file that causes the script to fail using the Issue tracker in CodePlex. I'll then take a look. Anthony.

Anthony said...

Hi csjtheman, unfortunately there is no single "paste" event supported by the rich text editor, so I had to use a number of hacks to achieve the same thing. You can see this in the init() function.

That said, I could probably get my script to expose a paste event of its own if you think that'd be useful...

Christian Sonne Jensen said...

Thank you for getting back so soon.

It would be extremely useful to me with a "paste" event, as I'm making a CMS system with the YUI editor as a base.

Right now, the users can paste all kinds of weird HTML into the editor, and it will be displayed like that in view mode, ignoring all my carefully crafted styles :-)

So I - for one - would very much like to see such an event.

Kind regards,

Christian Sonne Jensen

Anthony said...

Hi Christian, I've added 2 events to the script: OnBeforePaste and OnAfterPaste. You can see how to use them in the Example.htm file provided with the script.

Anthony.

Unknown said...

Works great, thanks!

Unknown said...

Hey great script! But if I copy this code:

{intro assets}
The following questions are about assets and loans.

The space between 'and loans' is gone.

This happens once in every sentence.

Any ideas?

Anthony said...

I tested your sentence above but it appeared to work fine for me. Can you please attach the original HTML as a file using the Issue tracker in CodePlex and I'll try it again.

Anthony.

Unknown said...

Thanks Anthony, it worked like magic :) ..

Anonymous said...

Anthony, great job on the script! I have found an issue: the line that stops the context menu for Mozilla browsers seems to be causing a problem in IE7, where it clears all the content when you right-click in the editor.

Anthony said...

Thanks for spotting this, I'll take a look and see what I can do. Anthony.

GuruFocus said...

When I copy a paragraph from Word, the sentences automatically wrap. How do I fix that problem?

Anthony said...

Hi GuruFocus, sorry I don't understand the problem. Perhaps you could take a screenshot of the problem and create a new issue on the CodePlex site. I'll then take a look.

Cheers,
Anthony.

GuruFocus said...

Thank you for the quick response. I copied a paragraph from MS Word and pasted it into the editor, and some formatting info is left over. (Your blog does not allow me to post it, as it treats it as HTML code.)

Also, if I copy a paragraph like:

MICHAEL HARTNETT: Yes, I think a little bit. Everything looks, technically, a little extended. But I’d still be a cyclical bull. I think the bottom line this year is, for me, the fundamental valuations are not so important. You just had an unprecedented bear market, an unprecedented macro meltdown, an unprecedented sort of policy response and I think you're in the midst of unprecedented rally in risk of the back of very oversold levels. And I think that we're not at the end of that risk rally and we won't be until the central banks end their quantitative easing policies.

It became:

MICHAEL HARTNETT: Yes, I think a little bit. Everything
looks, technically, a little extended. But I’d still be a cyclical bull. I
think the bottom line this year is, for me, the fundamental valuations are not
so important. You just had an unprecedented bear market, an unprecedented macro
meltdown, an unprecedented sort of policy response and I think you're in the
midst of unprecedented rally in risk of the back of very oversold levels. And I
think that we're not at the end of that risk rally and we won't be until the
central banks end their quantitative easing policies.

You can see it created a lot of new lines.

Maybe you can test it here:

http://www.gurufocus.com/test/test_yui_bbcode.php


Hope you can help. Thanks!

GuruFocus said...

Hi, Anthony!

I was using Firefox. When I use IE, it works perfectly.

Can you help with Firefox?

GuruFocus said...

Anthony,

If I use CleanPaste in IE, it works fine. But in Firefox, the style, xml, and object codes are still there. Am I doing something wrong?

Any help is gratefully appreciated.

Unknown said...

Thanks for this script!

If you select some text in the editor and apply formatting to it (let's say, bold) and then copy and paste that, it is inserted into the RTE as a strong tag. At that point you can't unapply the formatting, since the editor is expecting a bold tag. Of course it will save it as a strong tag when you submit anyway...

In any case, fixed this by adding html = this.Editor._cleanIncomingHTML(html) after your html = this.Editor.cleanHTML(html);

Regards,
McKinley

Unknown said...

This code seems to cause a problem if some text has been selected and the user left-clicks to copy it.
It then replaces the selected text with an underscore, which is far from ideal.
Any suggestions on how to avoid this behaviour?

SimonH said...

Great Stuff. Many thanks for this. I found that I got better results using the multi line replace described here:

http://wolfram.kriesing.de/blog/index.php/2008/javascript-multiline-replace

Anthony said...

Hi GuruFocus, sorry for the long delay. I tested the paragraph you sent me in Firefox (v3.0), however I did not experience the same problem. Perhaps you can attach a sample word document to the Issue Tracker in CodePlex and I'll try that.

Regarding your second problem about xml nodes etc still being there, it sounds like you have not correctly initialised the clean paste script. There is a sample HTML file included with the script, try pasting your text into that and see if the problem still occurs.

Cheers,
Anthony.

Anthony said...

Hi Lars, sorry for the delay. Can you tell me which browser & version you are using where the problem occurs?

Thanks, Anthony.

Unknown said...

Hey Anthony,

Using the latest FF and have confirmed it in FF 3.0 as well, I have identified that it happens when you use the contextmenu event.

Anthony said...

Thanks Lars. I tested in FF 3.0.15 (Windows) by left-clicking on some selected text, however it worked fine for me.

Are you on a different platform? Does the problem occur for you in the Example.htm provided with the script? Or only in your implementation?

Anthony.

Unknown said...

Anthony,

Thanks for your efforts with this they are most appreciated.

I'm getting similar behavior to Lars. I'm using IE 8.0.6 on WinXP with your unchanged example.htm
Select a word and right click it - the word is replaced with an underscore on a new line. Using the keyboard (Ctrl+X) works okay

FF 3.5.3 the right click is ignored
Chrome 3.0.1 - When I right click, the word is deselected and the context menu is not displayed
Opera 10.01 - Seems to work okay, although when I cut a word out using the context menu the text to the left of the cut word is shifted to the right effectively indenting the line?? weird??

All these browsers work okay in an RTE without CleanPaste.

Cheers Al

Anthony said...

Thanks Allan. There is code in the script to try and disable the context menu, as it contains a Paste item which would bypass the script.

I'll take another look and see what's going on.

Anthony.

Unknown said...

Anthony,

It appears that with IE8 the onbeforepaste event fires on right click.

All the modern browsers that I've tested, except Opera 10, support the onpaste event:

IE 8.06 - Yes
IE 7 - Yes
IE 6 - No
FF 3.5.3 - Yes
Chrome 3.0.1 - Yes
Opera 10.01 - No
Safari (on a Mac) - Yes

Would it be possible to test for onpaste event support before disabling the context menu?

Cheers Al
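
A check along the lines suggested above might look like the following sketch. It uses the common `'on…' in element` feature test plus an attribute-based fallback for older browsers; `supportsPasteEvent` is a hypothetical name, and which element CleanPaste would need to test (presumably something inside the editor's iframe) is an assumption.

```javascript
// Hypothetical helper: returns true if the element appears to support the "paste" event.
function supportsPasteEvent(element) {
  if ('onpaste' in element) {
    return true;
  }
  // Fallback: some older browsers only expose the handler property
  // once the corresponding attribute has been set on the element
  if (typeof element.setAttribute === 'function') {
    element.setAttribute('onpaste', 'return;');
    var supported = typeof element.onpaste === 'function';
    if (typeof element.removeAttribute === 'function') {
      element.removeAttribute('onpaste');
    }
    return supported;
  }
  return false;
}
```

In a page this could be called as `supportsPasteEvent(document.createElement('div'))`, and the context menu would only be disabled when it returns false.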

Paul Yu said...

Anthony. I'm not a js programmer. I'm trying to use your code in an app that has multiple YUI editors on a page. I can see the CleanPaste.js in the header, but when I paste into the editor the text is not getting cleaned.

Anthony said...

Hi Paul. See the 4 steps at the top of this post. For each YUI editor you need to add this line of javascript:

var cleanPaste = new CleanPaste(myEditor);

Where "myEditor" is the variable name of the editor.

Cheers,
Anthony.
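
For a page with several editors, the line above can be repeated in a loop. A minimal sketch, assuming the editors have already been created and rendered; `attachCleanPasteToAll` is a hypothetical helper, and the constructor is passed in as a parameter only so that the sketch can run without YUI loaded.

```javascript
// Hypothetical helper: creates one CleanPaste instance per editor.
// editors        - array of YUI editor objects
// CleanPasteCtor - the CleanPaste constructor from CleanPaste.js
function attachCleanPasteToAll(editors, CleanPasteCtor) {
  var cleaners = [];
  for (var i = 0; i < editors.length; i++) {
    // Each editor needs its own CleanPaste instance
    cleaners.push(new CleanPasteCtor(editors[i]));
  }
  return cleaners;
}
```

On a real page the call would be something like `attachCleanPasteToAll([editorA, editorB], CleanPaste)`, where `editorA` and `editorB` are the variables holding your editors.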

Anonymous said...

Hey Anthony,

first I'd like to thank you for this great addon for the YUI RTE. It's very bad how often people copy and paste some texts from Word documents into online Richtext editors.

I have an enhancement for one regular expression found in the CleanHTML method of the CleanPaste addon:

At line 127 the RegExp might be extended to

html = html.replace(//gim, ''); // HTML comments

This will include conditional comments and multi line code in between comment lines.

Tested it in Firefox 3.6

Best regards,

tommy

Francisco said...

Regarding the comments stripping, that regexp worked much better for me

/<(?:--[\s\S]*?--\s*)?>\s*/gi

Regards.
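
For readers who want a starting point for comment stripping, here is a generic sketch rather than the exact pattern from either comment above. The function name `stripHtmlComments` is hypothetical. It removes standard `<!-- ... -->` comments, including multi-line ones and the conditional comments Word emits inside them; it does not handle constructs such as `<![if ...]>` that are not wrapped in `<!-- -->`.

```javascript
// Hypothetical helper: removes HTML comments from a string of markup.
function stripHtmlComments(html) {
  // [\s\S] matches any character including newlines, so multi-line comments are covered;
  // the lazy *? stops at the first closing "-->"
  return html.replace(/<!--[\s\S]*?-->/g, '');
}

var sample = '<p>a</p><!--[if gte mso 9]><xml>\n<w:WordDocument></w:WordDocument>\n</xml><![endif]--><p>b</p>';
var stripped = stripHtmlComments(sample);
// stripped is '<p>a</p><p>b</p>'
```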

Anonymous said...

does it support simpleeditor as well?

Anthony said...

Yes.

Lost Soul ( Nitin Pande ) said...

Heya .. thanks a lot for this script!
One issue I am facing is with IE8: if I do a CTRL+V (it works with the context menu paste) and paste text into an editor that already has some text, it gives a JS error saying "Object required" (line 85). Also, sometimes if there is text already and I paste some text, the whole text gets repeated, so each time I do a paste the text in the editor doubles.

kevin collins said...

Thanks for the great project here!

I experience the following error with Safari and Chrome (webkit?) from line 85 of CleanPaste.js using the example html ->

"TypeError: Result of expression 'container' [null] is not an object."

---

I'll dig into it, but want to see if anyone has travelled this path already :)

Thanks again!

Cron said...

Thanks Anthony - this looks cool. Do you have or know of a standalone version that will work with other editors?

Anonymous said...

Hi, I found that in IE, when you right-click, the text in the RTE is removed. In other browsers right-click is disabled.

David said...

I love you. This just saved me a few days of work!

Gordon said...

For those with IE8 errors on line 85, switching to a setInterval method seemed to work for me. It looks like IE8 was taking too long to create the container div and didn't make it before the setTimeout occurred.

My code looks something like this:

var self = this; // assumed to be the CleanPaste instance; "this" is not preserved inside the callbacks
var handlePaste = function() {
    var container = self.Editor._getDoc().getElementById("Cleaner");
    var sourceText = container.innerHTML;
    var cleanText = cleanHTML(sourceText);
    var newText = document.createElement('span');
    ...
};
// Poll every 10 ms until the container div exists, then process the paste
var containerCreatedInterval = window.setInterval(function() {
    if (self.Editor._getDoc().getElementById("Cleaner")) {
        window.clearInterval(containerCreatedInterval);
        handlePaste();
    }
}, 10);

Gordon said...

The call to execCommand('inserthtml', "<div id='Cleaner'>_</div>") seems to fail if you paste text from Word into the middle of an existing paragraph. Any idea why?

Anonymous said...

Hi Anthony,
I am facing an issue with the CleanPaste utility. I have modified the CleanPaste library to remove most of the tags, but the content I copy from a Word document is not kept on the same line as it should be; it is broken into multiple lines.

These are the tags I am using:

// Remove additional MS Word content
// html = html.replace(/<(\/)*(\\?xml:|meta|link|span|font|del|ins|st1:|[ovwxp]:)((.|\s)*?)>/gi, ''); // Unwanted tags
// html = html.replace(/(class|style|type|start)=("(.*?)"|(\w*))/gi, ''); // Unwanted attributes
// html = html.replace(//gi, ''); // Style tags
// html = html.replace(//gi, ''); // Script tags
// html = html.replace(//gi, ''); // HTML comments
alert("HTML"+html);
html = html.replace(/<(\w[^>]*) class=([^ |>]*)([^>]*)/gi, "<$1$3") ;
html = html.replace( /<(\w[^>]*) style="([^\"]*)"([^>]*)/gi, "<$1$3" ) ;
html = html.replace( /\s*style="\s*"/gi, '' );
html = html.replace( /]*>\s* \s*<\/SPAN>/gi, '' ) ;
var re = new RegExp("(]*>.*?)(<\/P>)","gi") ;
html = html.replace( re, "" ) ;
html = html.replace( /]*><\/SPAN>/gi, '' ) ;
html = html.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3") ;
html = html.replace( /(.*?)<\/SPAN>/gi, '$1' ) ;

html = html.replace(/\s*<\/o:p>/g, "") ;
html = html.replace(/.*?<\/o:p>/g, " ") ;
html = html.replace( /\s*mso-[^:]+:[^;"]+;?/gi, "" ) ;
html = html.replace( /\*mso-[^:]+:[^;"]+;?/gi, "" ) ;
html = html.replace(//gi, '');
html = html.replace(//gi, '');
html = html.replace(//gi, '');
html = html.replace(/<(\/)*(\\?xml:|meta|link|span|font|del|ins|st1:|[ovwxp]:)((.|\s)*?)>/gi, '');
html = html.replace(/<(a){1}.*?>/i,'');
html = html.replace(//gi, ''); // Style tags
html = html.replace(/(class|style|type|start)=("(.*?)"|(\w*))/gi, ''); // Unwanted attributes
alert("HTML after Final"+html);
//html = html.replace( /<[^<>]+>/g, ''); //remove all tags



The Word file I am copying from has the content below, where the text is broken into lines. In the editor it shows on a single line, but when I look at it in the alert box or in the PDF, it is broken into multiple lines.

So need your help..


Single sign on from Portal to Lotus is achieved through SAP Logon ticket, which is issued by SAP Portal and stored as browser cookie, which is accept by a lotus Domino(DSAPI Filter).


2. Implementation

The solution implements the approach which makes use of LtpaToken to SSO from EP to domino servers running on non-windows platform.

a) The DSAPI filter needs to be installed only on the Domino locator server and not
on each and every domino server in the landscape.

b) A single lotus transport needs to be created in the portal corresponding to the
Domino locator server, since locating the mail server of the user is handled
internally.

Domino Side Configuration

1. The landscape can have multiple domino servers out of which one has to be the Domino Locator server.

Anonymous said...

Hi Anthony,
I am facing one strange issue. I am pasting some text from a Word document into the YUI Editor, and I have used the CleanPaste utility. In IE, the text I paste is replaced by the unformatted text, but in Firefox it duplicates the content that is already inside the editor.

So need your help.

Regards
Kam

Anonymous said...

Thanks for your hard work with this, it is incredibly useful.

Cherry said...

CleanPaste works fine and strips the formatting, but once the cleaned content is inserted into the RTE (rich text editor), if you select the text and right-click, the content disappears and is replaced by the character specified in execCommand(). It would be great if someone could help me with this. myid(charan.cse@gmail.com)