Twitter message parsing


In a Magnolia CMS integration, I had to display tweets selected by a few attributes (tag, Twitter search, Twitter user id...). For this I used Twitter4J, a beautiful Java API for interacting with Twitter.

Fetching the tweets was really easy: about 15 minutes of simple Java and Magnolia implementation.

The critical part was parsing the text of every tweet to pick out '@' mentions, '#' hashtags and links, so that they become clickable and can be styled with CSS.

For this I wrote a FreeMarker function, and I want to share it here for future reference and for other users.

This is the function:

 
[#function parseTwitterText text]
 
	[#assign innerText = text]
 
	[#assign twitterUserRegexp = r'@(\w+)'] 
	[#assign twitterUserReplaceStr = r'<a href="http://twitter.com/$1" target="_blank">@$1</a>' ]
	[#assign twitterHashRegexp = r'\s#(\w+)']
	[#assign twitterStartingHashRegexp = r'^#(\w+)']
	[#assign twitterHashReplaceStr = r'&nbsp;<a href="http://search.twitter.com/search?q=%23$1" target="_blank">#$1</a>']
	[#assign twitterLinksRegexp = r'((http|https|ftp)?:\/\/([-\w\.]+)+(\/[\w\/\.*]*)?(\?\S+)?(\&\S+)?(\#\S+)?)']
	[#assign twitterLinksReplaceStr = r'<a href="$1" target="_blank">$1</a>' ]
 
	[#if text?has_content]
 
		[#assign innerText = innerText?replace(twitterLinksRegexp, twitterLinksReplaceStr, 'r')]
		[#assign innerText = innerText?replace(twitterUserRegexp, twitterUserReplaceStr, 'r')]
		[#assign innerText = innerText?replace(twitterHashRegexp, twitterHashReplaceStr, 'r')]
		[#assign innerText = innerText?replace(twitterStartingHashRegexp, twitterHashReplaceStr, 'r')]
 
	[/#if]
 
	[#return innerText /]
 
[/#function]

Just a few comments:

  1. Links must be replaced first: otherwise the URLs inside the anchors generated for @mentions and #hashtags would themselves match the link pattern and be wrapped a second time.
  2. Regular expressions must be written as raw strings (r'xyz'), so that FreeMarker does not process the backslashes as its own escape sequences.
  3. I built the regular expressions from a couple of examples found on the web (1 :: 2)
  4. My code was inspired by Justin Shacklette's article: Parsing Twitter with RegExp
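
The ordering issue in the first comment can be reproduced outside a template. FreeMarker's regular expression replace is backed by Java's regex engine, so the following is a minimal sketch in plain Java that reuses the same patterns and replacement strings. The class name TweetLinker and the methods linksFirst, linksLast and mentionsAndHashes are hypothetical names chosen for this illustration; they are not part of Magnolia, FreeMarker or Twitter4J.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the FreeMarker function in plain Java.
final class TweetLinker {

    // Patterns copied from the FreeMarker function (backslashes doubled for Java).
    static final Pattern LINKS = Pattern.compile(
        "((http|https|ftp)?:\\/\\/([-\\w\\.]+)+(\\/[\\w\\/\\.*]*)?(\\?\\S+)?(\\&\\S+)?(\\#\\S+)?)");
    static final Pattern USER = Pattern.compile("@(\\w+)");
    static final Pattern HASH = Pattern.compile("\\s#(\\w+)");
    static final Pattern STARTING_HASH = Pattern.compile("^#(\\w+)");

    // Replacement strings copied from the FreeMarker function.
    static final String LINKS_REPL = "<a href=\"$1\" target=\"_blank\">$1</a>";
    static final String USER_REPL =
        "<a href=\"http://twitter.com/$1\" target=\"_blank\">@$1</a>";
    static final String HASH_REPL =
        "&nbsp;<a href=\"http://search.twitter.com/search?q=%23$1\" target=\"_blank\">#$1</a>";

    // Replaces @mentions, then hashtags preceded by whitespace, then a leading hashtag.
    public static String mentionsAndHashes(String text) {
        String out = USER.matcher(text).replaceAll(USER_REPL);
        out = HASH.matcher(out).replaceAll(HASH_REPL);
        return STARTING_HASH.matcher(out).replaceAll(HASH_REPL);
    }

    // Same order as the FreeMarker function: links first, then mentions and hashtags.
    public static String linksFirst(String text) {
        return mentionsAndHashes(LINKS.matcher(text).replaceAll(LINKS_REPL));
    }

    // Reversed order, shown only to demonstrate the problem.
    public static String linksLast(String text) {
        return LINKS.matcher(mentionsAndHashes(text)).replaceAll(LINKS_REPL);
    }

    public static void main(String[] args) {
        System.out.println(linksFirst("hi @rule"));
        System.out.println(linksLast("hi @rule"));
    }
}
```

With the reversed order, the http://twitter.com/rule URL inside the generated mention anchor is matched by the link pattern, so the second line printed contains an anchor nested inside an href attribute. With links replaced first, the mention produces a single clean anchor.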

This is a test FTL template that exercises the function:

 
[#assign testTexts = [
	"this is a simple text",
	"this is a simple text with @rule. and a point", 
	"this is a simple text with @rule and a word", 
	"this is a simple text with @rule", 
	"@rule this is a simple text with 2 @rules",
	"this is a simple text with #rule. and a point", 
	"this is a simple text with #rule and a word", 
	"this is a simple text with #rule", 
	"#rule this is a simple text with 2 #rules",
	"this is a simple text mixed rules #rule @rules",
	"this contains a url: http://www.domain.com?test=true",
	"this contains all items: http://www.domain.com, @rule and #rule, repeated: @rule and #rule", 
	"this contains all items: http://domain.com, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element/, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element.html, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element.html?test=true, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element.html?test=true&test=false, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element.html?test=true&test=false#id, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element.html?test=true#id, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com/element.html#id, @rule and #rule, repeated: @rule and #rule",
	"this contains all items: http://www.domain.com#id, @rule and #rule, repeated: @rule and #rule"]]
[#list testTexts as text]
<p>${text} :: ${parseTwitterText(text)}</p>
[/#list]

If you cannot copy and paste this code, you can download the sample code here.


Written by: Matteo Pelucco
