Magnolia CMS green bars… why not red?

Do you have a lot of paragraphs on a page, and do the Magnolia CMS green bars, despite the little space they take in the user interface, feel overwhelming? You can tell them apart by applying a few CSS classes and a little jQuery.

The first step is to wrap the Magnolia edit bar in an outer div. This makes your code more flexible:

<div class="mgnlEditBar mgnlPage">[@cms.editBar /]</div>

I then used the following CSS rules to highlight the edit bars of in-page properties (such as intros and article kickers), so that they are easier to find.

/** MGNL RED BARS ********************/
.mgnlEditBar.mgnlPage .mgnlControlBar, 
.mgnlEditBar.mgnlPage .mgnlControlBarSmall{
	border-color: #FFCCCC #000 #000 #FFCCCC !important;
}
 
.mgnlEditBar.mgnlPage .mgnlControlBar td, 
.mgnlEditBar.mgnlPage .mgnlControlBarSmall td {
	background-image: none !important; 
	background-color: #B70000 !important;	
}
 
.mgnlEditBar.mgnlPage .mgnlControlBar_PUSHED td, 
.mgnlEditBar.mgnlPage .mgnlControlBarSmall_PUSHED td {
    background-color: #FF9999 !important;
    border-color: #FFCCCC #FF9999 #FF9999 #FFCCCC !important;
    border-style: solid !important;
    border-width: 1px !important;
}
 
.mgnlEditBar.mgnlPage .mgnlControlButton, 
.mgnlEditBar.mgnlPage .mgnlControlButtonSmall, 
.mgnlEditBar.mgnlPage .mgnlControlButtonTransparent, 
.mgnlEditBar.mgnlPage .mgnlControlButtonTransparentSmall {
	border-color: #FFCCCC #000 #000 #FFCCCC !important;
	background-color: #B70000 !important;
}
 
.mgnlEditBar.mgnlPage .mgnlControlButton.down, 
.mgnlEditBar.mgnlPage .mgnlControlButtonSmall.down {
	border-color: #000 #FFCCCC #FFCCCC #000 !important; 
}

And yes, you also need a little jQuery (or similar), because the Magnolia JS (control.js) has some hard-coded colors inside. Add this at the end of your page:

// Toggle a "down" class while a control button is pressed, so the CSS rules can restyle it.
$('.mgnlControlButton, .mgnlControlButtonSmall')
  .mousedown(function() { $(this).addClass("down"); })
  .mouseup(function() { $(this).removeClass("down"); })
  .mouseleave(function() { $(this).removeClass("down"); });

And that's it. Remember to change the “moving shadow” image as well: it is a simple background image bound to the div with id “mgnlMoveDivShadow”:

#mgnlMoveDivShadow {
    background-image: url("../../.resources/admin-images/mgnlMoveShadow.gif") !important; 
}

Regione Lombardia tax portal: full of holes!

To carry out some checks, I had to visit the Regione Lombardia website and use its tax portal (Portale tributi).
It had been a long time since I had witnessed such a mess. For the umpteenth time I saw first-hand how taxpayers' money has been THROWN to the wind to build a service that is:

  • sloppy
  • of poor quality (as far as the pages that display correctly are concerned)
  • incredibly badly made (graphics from the 80s were probably better)
  • buggy (in the computing sense of the term, i.e. full of defects)
  • unable to solve the citizen's problem

Below is the account of my dreadful browsing session.

During registration, at the point of confirming by entering the health-care beneficiary code (printed on the elusive regional services card), I get an application error. For the technically minded, it is a TRIVIAL buffer overflow error (configuration!!!). Scandalous. How on earth did they test the registration step?

Well, let's try to log out (which here is mysteriously called logoff):

I give up on creating a new user (my father-in-law) and log in again, this time with my own details. I had in fact registered a few days earlier, again to check some data. My credentials had even allowed me to pay a tax, but this time, surprise:

Unknown user?? How come?? I was using the portal until a few days ago! Oh well, I try again.. nothing, not even the second time (by the way, did you notice the graphics?). Resigned, and also a bit annoyed, I start filling in the contact form (I notice that we have gone from Spring to a plain JSP page).

And here come the surprises: the page shows “blobs” with a question mark (once again, a trivial encoding error, the ABC of software engineering and one of the first things to test when releasing an application):

But the surprises do not end here! Look what happens when I try to enter my municipality (with that “handy” widget -4.0):

Oh yes: the municipalities, instead of being reduced by the recent cuts to bureaucracy and public spending, have actually doubled!

Sadly, this is yet another demonstration of how public projects are often completely out of control, entrusted to incompetent people (both in direction / management and in the final implementation), and in the end turn out to be utterly useless.

    How to set up a clustered repository for Magnolia in 5 minutes.

    One of the big advantages of using Magnolia is the reliability of the underlying database, provided by Apache Jackrabbit.

    Since version 1.6, Jackrabbit makes it possible to build a “clustered” repository. This option, combined with the Author/Public dualism of Magnolia CMS, can bring a lot of advantages:

    • content saved on public is reflected instantly on author, and vice versa
    • no need to activate content

    Let's see how to configure a clustered repository between a Magnolia Author instance and a Magnolia Public instance.

    Ingredients:

    • Magnolia > 4.3
    • MySQL database
    • 5 minutes of time

    Let’s start.

    First of all, we need to install Magnolia as usual. Assuming it is up and running, we now stop the web server.

    Magnolia uses WEB-INF/config/default/repositories.xml to define repositories.
    Open it and add the following:

    <!-- magnolia shared repository -->
    <Repository name="magnoliaShared" provider="info.magnolia.jackrabbit.ProviderImpl" loadOnStartup="true">
            <param name="configFile" value="${magnolia.repositories.jackrabbit.shared.config}" />
            <param name="repositoryHome" value="${magnolia.repositories.home}/magnoliaShared" />
            <!-- the default node types are loaded automatically
            <param name="customNodeTypes" value="WEB-INF/config/repo-conf/nodetypes/magnolia_nodetypes.xml" />
            -->
            <param name="contextFactoryClass" value="org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory" />
            <param name="providerURL" value="localhost" />
            <param name="bindName" value="${magnolia.webapp}Shared" />
            <workspace name="usersShared" />
    </Repository>

    and add a Map node inside the RepositoryMapping:

    <Map name="usersShared" repositoryName="magnoliaShared" workspaceName="usersShared" />

    Now open the magnolia.properties file, located in the WEB-INF/config/default folder, and add this line:

    magnolia.repositories.jackrabbit.shared.config=WEB-INF/config/repo-conf/jackrabbit-bundle-mysql-shared-search.xml

    The last step is to create a MySQL database and put the right settings in the jackrabbit-bundle-mysql-shared-search.xml file. Assuming you have MySQL running on localhost:3306 with userShared as schema name, user and password, the config file should be:

     
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.4//EN" "http://jackrabbit.apache.org/dtd/repository-1.4.dtd">
    <Repository>
    	<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    		<param name="path" value="${rep.home}/repository" />
    	</FileSystem>
    	<Security appName="Jackrabbit">
    		<AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"></AccessManager>
    		<LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
    			<param name="anonymousId" value="anonymous" />
    		</LoginModule>
    	</Security>
    	<Workspaces rootPath="${rep.home}/workspaces"
    		defaultWorkspace="default" />
    	<Workspace name="default">
    		<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    			<param name="path" value="${wsp.home}/default" />
    		</FileSystem>
    		<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
    			<param name="driver" value="com.mysql.jdbc.Driver" />
    			<param name="url" value="jdbc:mysql://localhost:3306/userShared" />
    			<param name="schema" value="mysql" /><!-- warning, this is not the schema name, it's the db type	-->
    			<param name="user" value="userShared" />
    			<param name="password" value="userShared" />
    			<param name="schemaObjectPrefix" value="${wsp.name}_" />
    			<param name="externalBLOBs" value="false" />
    		</PersistenceManager>
    		<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    			<param name="path" value="${wsp.home}/index" />
    			<param name="useCompoundFile" value="true" />
    			<param name="minMergeDocs" value="100" />
    			<param name="volatileIdleTime" value="3" />
    			<param name="maxMergeDocs" value="100000" />
    			<param name="mergeFactor" value="10" />
    			<param name="maxFieldLength" value="10000" />
    			<param name="bufferSize" value="10" />
    			<param name="cacheSize" value="1000" />
    			<param name="forceConsistencyCheck" value="false" />
    			<param name="autoRepair" value="true" />
    			<param name="analyzer"
    				value="org.apache.lucene.analysis.standard.StandardAnalyzer" />
    			<param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
    			<param name="respectDocumentOrder" value="true" />
    			<param name="resultFetchSize" value="2147483647" />
    			<param name="extractorPoolSize" value="3" />
    			<param name="extractorTimeout" value="100" />
    			<param name="extractorBackLogSize" value="100" />
    			<param name="textFilterClasses"
    				value="org.apache.jackrabbit.extractor.MsWordTextExtractor,
                   org.apache.jackrabbit.extractor.MsExcelTextExtractor,
                   org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
                   org.apache.jackrabbit.extractor.PdfTextExtractor,
                   org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
                   org.apache.jackrabbit.extractor.RTFTextExtractor,
                   org.apache.jackrabbit.extractor.HTMLTextExtractor,
                   org.apache.jackrabbit.extractor.PlainTextExtractor,
                   org.apache.jackrabbit.extractor.XMLTextExtractor" />
    		</SearchIndex>
    	</Workspace>
    	<Versioning rootPath="${rep.home}/version">
    		<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    			<param name="path" value="${rep.home}/workspaces/version" />
    		</FileSystem>
    		<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
    			<param name="driver" value="com.mysql.jdbc.Driver" />
    			<param name="url" value="jdbc:mysql://localhost:3306/userShared" />
    			<param name="schema" value="mysql" /><!-- warning, this is not the schema name, it's the db type	-->
    			<param name="user" value="userShared" />
    			<param name="password" value="userShared" />
    			<param name="schemaObjectPrefix" value="version_" />
    			<param name="externalBLOBs" value="false" />
    		</PersistenceManager>
    	</Versioning>
     
    	<!-- clustering: author: node0, public1: node1, public2: node2, ... -->
    	<Cluster id="node0" syncDelay="2000">
    		<Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    			<param name="revision" value="${rep.home}/revision.log" />
    			<param name="driver" value="com.mysql.jdbc.Driver" />
    			<param name="url" value="jdbc:mysql://localhost:3306/userShared" />
    			<param name="schema" value="mysql" /><!-- warning, this is not the schema name, it's the db type	-->
    			<param name="user" value="userShared" />
    			<param name="password" value="userShared" />
    			<param name="schemaObjectPrefix" value="journal_" />
    		</Journal>
    	</Cluster>
    </Repository>

    IMPORTANT: each cluster node MUST have a unique ID.
    In our example we use id="node0" for author and id="node1" for public.

    This should be enough! Let me know if there are any points missing!

    LoggingFilter in Magnolia: a simple way to monitor things

    There are moments during web development when things, for whatever reason, start to go wrong.

    In a recent project, we started to have charset encoding issues with request headers (injected by an SSO system and a reverse proxy). So, how can we inspect what goes through the request, or better, through the Magnolia filter chain?

    First option: enable the Tomcat RequestDumper valve or filter. This is not good enough, because Magnolia has a “nested” filter chain and things sometimes change between one step of the chain and the next. The Tomcat RequestDumper only dumps the request as it arrives from the client, not after it has been dispatched to the Magnolia filter chain. It is also not “hot”: we need to restart Tomcat.

    Second option: inside a Magnolia page/template/paragraph we can use ${ctx.requestParam} to dump it. But again, this is not what we want (headers are not exposed, and we need to know the exact parameter name).

    So, what can we do? Developing a custom LoggingFilter can help us.

    First, write the LoggerFilter Java class.

    package com.tinext.ecp.magnolia.common.filters;
     
    import info.magnolia.cms.filters.AbstractMgnlFilter;
     
    import java.io.IOException;
    import java.util.Enumeration;
     
    import javax.servlet.FilterChain;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
     
    import org.apache.commons.lang.StringUtils;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
     
    public class LoggerFilter extends AbstractMgnlFilter {
     
    	private static final Logger log = LoggerFactory.getLogger(LoggerFilter.class);
     
    	private static final String DEBUG = "debug";
    	private static final String INFO = "info";
    	private static final String WARN = "warn";
     
    	// filter settings
    	private String logLevel = DEBUG;
    	private boolean logRequest = true;
    	private boolean logHeaders = true;
    	private boolean logParameters = true;
     
    	@Override
    	public void doFilter(final HttpServletRequest request, final HttpServletResponse response, final FilterChain chain) throws IOException, ServletException {
     
    		if (isEnabled()) {
     
    			log("**************************************** LoggerFilter ****************************************");
     
    			if (logRequest) {
    				logRequest(request);
    			}
     
    			log("**************************************** ------------ ****************************************");
    		}
     
    		// important
    		chain.doFilter(request, response);
     
    	}
     
    	private void logRequest(final HttpServletRequest request) {
    		if (null != request) {
     
    			// request parameters (the cast compiles whether the Servlet API returns a raw or a typed Enumeration)
    			if (logParameters) {
    				for (final Enumeration<?> e = request.getParameterNames(); e.hasMoreElements();) {
    					final String name = (String) e.nextElement();
    					final String value = request.getParameter(name);
    					log("Request parameter [{}] has value of [{}]", name, value);
    				}
    			}
     
    			// request headers (logged through log(...) so that the configured logLevel is honoured)
    			if (logHeaders) {
    				for (final Enumeration<?> e = request.getHeaderNames(); e.hasMoreElements();) {
    					final String name = (String) e.nextElement();
    					final String value = request.getHeader(name);
    					log("Request header [{}] has value of [{}]", name, value);
    				}
    			}
     
    		} else {
    			log("Request is null");
    		}
     
    	}
     
    	private void log(final String msg) {
    		log(msg, new Object[] {});
    	}
     
    	private void log(final String msg, final Object o1, final Object o2) {
    		log(msg, new Object[] { o1, o2 });
    	}
     
    	private void log(final String msg, final Object[] args) {
    		if (StringUtils.isNotEmpty(msg)) {
    			if (StringUtils.equals(logLevel, WARN)) {
    				log.warn(msg, args);
    			} else if (StringUtils.equals(logLevel, INFO)) {
    				log.info(msg, args);
    			} else {
    				log.debug(msg, args);
    			}
    		}
    	}
     
    	public String getLogLevel() {
    		return logLevel;
    	}
     
    	public void setLogLevel(final String logLevel) {
    		this.logLevel = logLevel;
    	}
     
    	public boolean isLogRequest() {
    		return logRequest;
    	}
     
    	public void setLogRequest(final boolean logRequest) {
    		this.logRequest = logRequest;
    	}
     
    	public boolean isLogHeaders() {
    		return logHeaders;
    	}
     
    	public void setLogHeaders(final boolean logHeaders) {
    		this.logHeaders = logHeaders;
    	}
     
    	public boolean isLogParameters() {
    		return logParameters;
    	}
     
    	public void setLogParameters(final boolean logParameters) {
    		this.logParameters = logParameters;
    	}
     
    }

    Now we need to configure the filter:

    Pay attention: a wrong configuration at this point can seriously break (read: make unavailable!!!) your Magnolia instance.
    Best practice: copy an existing filter node from its position to a safe place (e.g. an existing /modules/xyz). Configure it there and then, preferably after a backup, move it back under /server/filters.

    With this approach, you can set the log level (it defaults to debug, but you can set it to warn or info) and select what to log (request, headers, parameters…). Changes take effect at runtime, and the bypasses node avoids logging every one of the (many!!) requests to Magnolia AdminCentral.

    This is the output:

    2011-10-26 10:13:05,249 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: **************************************** LoggerFilter ****************************************
    2011-10-26 10:13:05,249 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [host] has value of [www.mysite.local:8090]
    2011-10-26 10:13:05,249 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [user-agent] has value of [Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1]
    2011-10-26 10:13:05,249 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [accept] has value of [text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
    2011-10-26 10:13:05,249 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [accept-language] has value of [it-it,it;q=0.8,en-us;q=0.5,en;q=0.3]
    2011-10-26 10:13:05,249 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [accept-encoding] has value of [gzip, deflate]
    2011-10-26 10:13:05,249 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [accept-charset] has value of [ISO-8859-1,utf-8;q=0.7,*;q=0.7]
    2011-10-26 10:13:05,250 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [connection] has value of [keep-alive]
    2011-10-26 10:13:05,250 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [referer] has value of [http://www.mysite.local:8090/magnoliaAuthor/.magnolia/installer/status]
    2011-10-26 10:13:05,250 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [cookie] has value of [JSESSIONID=310FC959199EE6F3FD8494532643BD66]
    2011-10-26 10:13:05,250 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [pragma] has value of [no-cache]
    2011-10-26 10:13:05,250 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: Request header [cache-control] has value of [no-cache]
    2011-10-26 10:13:05,250 DEBUG om.tinext.ecp.magnolia.common.filters.LoggerFilter: **************************************** ------------ ****************************************

    Hope it helps!!!

    PS: it is similar to (but maybe simpler than) what info.magnolia.debug.DumpHeadersFilter does, but in this case we can have different behaviours in different environments just by changing the filter configuration. For instance:

    • Env A: test
      • logLevel = debug
      • logRequest = true
    • Env B: pre-prod
      • logLevel = info
      • logRequest = true
      • logHeaders = false (because we don’t need them on that env)
    • Env C: prod
      • logLevel = warn (because log4j has been set to be “silent”)
      • logRequest = true
      • logHeaders = true
      • logParameters = false (because we don’t need them on that env)

    On the other hand, the DumpHeadersFilter class also lets you print headers after the filter chain has ended (for that reason, I will update my LoggerFilter!!!)

    Rotating Messages in Magnolia (with text placeholders)

    In a project, an external resource repository is unavailable from 00:05 to 02:30 every day. You need to inform your users about this. How can you do it?

    Simple way: using a static disclaimer text

    More complex way: a scheduler with a variable.. and so on..

    Building a new Magnolia component: Rotating Messages!

    What should the requirements be (in this case it is the project scope that drives them)?

    1. i18n based (in a standard Magnolia way)
    2. flexible
    3. configurable (ideally at runtime!!)
    4. extensible
    5. usable in an existing running environment (plug-and-play, transparent!)

    So, let's proceed!

    First, we need a configuration mechanism. A Magnolia Module already has one, so use it!

    With this setup, we can change the rotation policy (because SimpleRotatingMessage is an implementation of the RotatingMessage interface) and we are free to develop new messages by implementing the ScheduledMessage interface.

    For Java people, the RotatingMessage interface simply declares a method public String getMessage(). The implementing class SimpleRotatingMessages uses the ScheduledMessage interface, which has a simple method public boolean isValid(). The first valid message found in the configuration is the one used.
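A minimal Java sketch of the two contracts just described, under stated assumptions: only getMessage() and isValid() come from the text above. The getText() accessor, the DailyWindowMessage and FirstValidRotatingMessage classes and the injected clock are hypothetical names added for illustration, and the real module classes may look quite different.

```java
import java.time.LocalTime;
import java.util.List;
import java.util.function.Supplier;

/** Contract described above: returns the message to display right now. */
interface RotatingMessage {
    String getMessage();
}

/** Contract described above, plus a hypothetical getText() accessor. */
interface ScheduledMessage {
    boolean isValid();

    String getText(); // assumption: this accessor is not named in the post
}

/** Hypothetical message valid every day between two times (from inclusive, to exclusive). */
final class DailyWindowMessage implements ScheduledMessage {
    private final LocalTime from;
    private final LocalTime to;
    private final String text;
    private final Supplier<LocalTime> clock; // injected so the window can be tested

    DailyWindowMessage(LocalTime from, LocalTime to, String text, Supplier<LocalTime> clock) {
        this.from = from;
        this.to = to;
        this.text = text;
        this.clock = clock;
    }

    @Override
    public boolean isValid() {
        LocalTime now = clock.get();
        return !now.isBefore(from) && now.isBefore(to);
    }

    @Override
    public String getText() {
        return text;
    }
}

/** Stand-in for the rotation policy: the first valid message in configuration order wins. */
final class FirstValidRotatingMessage implements RotatingMessage {
    private final List<ScheduledMessage> messages;

    FirstValidRotatingMessage(List<ScheduledMessage> messages) {
        this.messages = messages;
    }

    @Override
    public String getMessage() {
        for (ScheduledMessage message : messages) {
            if (message.isValid()) {
                return message.getText();
            }
        }
        return ""; // nothing to show outside every window
    }
}
```

With a window of 00:05 to 02:30, a clock reading 01:00 yields the outage text and a clock reading 12:00 yields an empty string. A window that crosses midnight would need extra handling that this sketch leaves out.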

    Now, we want as little impact as possible on the existing dialog / paragraph setup.

    So, we need a new paragraph:

    Notice that we use a new modelClass, RotatingTextImageModel, extending the STK TextImageModel… the ftl step further on shows why…

    At this point we need to define a dialog:

    Using inheritance, we reuse an existing dialog, based upon stkTextImage, and we provide only a new selection field, rotatingMessage. This field automatically loads its values from the configuration point mentioned before: when you change the configuration, the field options are up to date.

    Now, we need a simple change in ftl:

    Please notice the model.getParsedString(stringToParse) method. It is the only difference between the original STK TextImageModel and our new RotatingTextImageModel.
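To illustrate what such a method could do, here is a hedged Java sketch of placeholder substitution. Only the method name getParsedString comes from the text above. The PlaceholderParser class, the {rotatingMessage} token and the Supplier that provides the current message are assumptions made for this example, since the post does not show the real placeholder syntax.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the placeholder handling in RotatingTextImageModel. */
final class PlaceholderParser {

    // Assumed token; the real placeholder syntax is not shown in the post.
    static final String PLACEHOLDER = "{rotatingMessage}";

    private final Supplier<String> currentMessage;

    PlaceholderParser(Supplier<String> currentMessage) {
        this.currentMessage = currentMessage;
    }

    /** Replaces every placeholder occurrence with the current message; text without a placeholder is returned untouched. */
    String getParsedString(String stringToParse) {
        if (stringToParse == null || !stringToParse.contains(PLACEHOLDER)) {
            return stringToParse;
        }
        String message = currentMessage.get();
        return stringToParse.replace(PLACEHOLDER, message == null ? "" : message);
    }
}
```

Routing the text through a model method, rather than reading the content property directly, is what gives the template a hook where the substitution can happen.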

    And now use it:

    with this result:

    If we change the configs, the final output is:

    And that’s it.

    Drawbacks:

    1. Content is stored in locale.properties files. The project requires this, but in other environments / projects you may want your users to edit the text. This can easily be done with a group-paragraph pattern where you define each value with its validity period. You lose flexibility and extensibility, but you gain in user friendliness
    2. You have to change the ftl. This is because the original STK templates use direct content access (e.g.: ${content.text}) instead of invoking the model class.
    3. You need to exclude the page from the cache, or configure the browser cache policy and the cache policy to force a flush when the message has to change. Thanks Jan!
    4. Any other? Please tell me!

    Basic i18n in Magnolia: quick tutorial

    I18n in Magnolia is quite simple. This tutorial will show how to manage multilanguage sites using Magnolia CMS (from v4.3.1).

    Magnolia supports internationalization (i18n) for Dialogs, Paragraphs and Templates. I18n is performed not only for content but also for application labels, like a form “send” button.
    To enable i18n for content, simply edit the Paragraph / Template Dialog and add the property “i18n=true” to the fields you want to be internationalized:

    Developers do not have to manage anything inside the template files, because i18n is handled by Magnolia in a transparent way. So just write “${content.errorTitle}” and this will output the current language content.
    For labels, just enable i18n by adding the property “i18nBasename” to a Paragraph / Template definition:

    At this point, use this code fragment inside the template file to render a label: ${i18n['place.here.your.label.key']}
    Magnolia will search the Java classpath for a file called messages_xy.properties (where xy is the current language) placed in the package /info/magnolia/module/form.
    That file should be something like this:

    Magnolia will look inside this file to find a matching key to output.
    I18n settings are configured in the Site definition:

    This configuration says that for site ecp-sil the available locales are French, German and English. The default locale is French and it is shown when users visit http://mysite.com. If they visit http://mysite.com/de/ or http://mysite.com/en/ they will see the localized version of the site. If no content has been produced for the en or de version, the content of the locale defined by the fallbackLocale property is used instead. The same applies to labels.
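The behaviour described above can be sketched in plain Java. This is only an illustration of the rules (locale taken from the first path segment, default locale otherwise, fallback when a value is missing), not Magnolia's implementation. The I18nSketch class and its methods are hypothetical, and the choice of French as fallback locale is an assumption, since the text does not say which locale fallbackLocale points to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of locale selection and fallback; not Magnolia code. */
final class I18nSketch {

    static final List<String> LOCALES = Arrays.asList("fr", "de", "en");
    static final String DEFAULT_LOCALE = "fr";
    static final String FALLBACK_LOCALE = "fr"; // assumption: the post does not name the fallback locale

    /** Picks the locale from the first path segment ("/de/news" gives "de"), otherwise the default. */
    static String localeFor(String path) {
        if (path == null) {
            return DEFAULT_LOCALE;
        }
        String[] segments = path.split("/");
        // For paths starting with "/", segments[0] is the empty string before the first slash.
        if (segments.length > 1 && LOCALES.contains(segments[1])) {
            return segments[1];
        }
        return DEFAULT_LOCALE;
    }

    /** Returns the value stored for the locale, or the fallback locale's value when it is missing or empty. */
    static String valueFor(Map<String, String> valuesByLocale, String locale) {
        String value = valuesByLocale.get(locale);
        return (value != null && !value.isEmpty()) ? value : valuesByLocale.get(FALLBACK_LOCALE);
    }
}
```

Under these assumptions, "/de/news" resolves to de, "/" resolves to fr, and a field with only a French value is served in French to English visitors.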

    VirtualBox: cloning a virtual machine

    You have a running VirtualBox virtual machine and you want to clone it, so that some dirty test cannot break it.
    You need to clone the hard disk (.vdi); then you can create a new virtual machine, attaching the freshly cloned disk when asked.

    There is a command line tool (VBoxManage) which lets you clone a hard disk:

    C:\Users\matteo>"c:\Program Files\Oracle\VirtualBox\VBoxManage.exe" clonevdi "Windows XP.vdi" "Windows XP test.vdi"
    0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

    That’s it. Simple. Quick.

    Web developer SEO quick reference / cheatsheet

    As a web developer, SEO (Search Engine Optimization) is a MUST.
    While reading the web in search of the latest SEO guidelines, I summarized them in a document and decided to turn it into a blog post.

    Introduction: KeyPhrases
    KeyPhrases production steps

    1. Produce your content, following Web Content best practices and guidelines
    2. Fragment your content into pages. The main rule is: one topic per page.
    3. For each page produce a Keyword Phrase [KeyPhrase]. Each page should map to exactly one KeyPhrase and be optimized for it.
    4. The KeyPhrase should appear in the first and last sentence of the content produced for that page
    5. Prefer Long Tail Keywords [LTK*]
    6. Check (and keep checking over time) that the Keyword Saturation Rate [KSR] stays between 3% and 7%:
      [KSR = K repetition in page / n. of words in K / n. of words in page]

    LTK*: acronym for Long Tail Keyword.
    An LTK is a KeyPhrase of more than 3 words that aims to pin down the subject of the KeyPhrase more precisely. E.g.: given a KeyPhrase like “Italian hotels”, an LTK could be “Italian cheap hotels Livorno”. An LTK reduces the number of competitors and increases the conversion rate.

    On-Page SEO Best Practices

    You should:

    1. Semantic markup usage and good front end best practices

    Heading

    • <title>: the most important tag for SEO. It is used in Search Engine Result Pages (SERPs) as the page title. Put your KeyPhrase in it, following this best practice: “KeyPhrase – Category – Website Title”. In any case, place the KeyPhrase as early as you can. It must be shorter than 65-70 characters.
    • <meta name="description">: used by Search Engines to provide a short description in SERPs. It should again contain the KeyPhrase (which will be rendered in bold). It must be shorter than 155 characters.
    • <meta name="keywords">: this meta is not “sufficient”, but it is still “necessary” and can help indexing. Put the main KeyPhrase and other content-related keywords in it.
    • JS / CSS resources: do not use inline stylesheets / scripts. Use external files and try to compress them in order to reduce the overall page size. Use CSS sprites to avoid a large number of HTTP requests

    Body

    • Headings: use <h1>, <h2> and <h3> tags where it makes sense, only one triple per page. Put your KeyPhrase inside <h1>
    • Images: <img> tag requires alt and title. Even longdesc can be useful in order to increase the image indexing rate. Include at least one image per page and assign alt text with the KeyPhrase.
    • Links: the <a> tag must be given a descriptive Anchor Text (the inner tag text) and a title attribute. Put the target KeyPhrase inside the Anchor Text and the title attribute. Use rel="nofollow" for external links (see the robots indications for extra values). Pay attention to automatic navigation (menus) and try to override the “defaults”
    • Avoid unnecessary code: use as few classes as you can, avoid inline CSS / JS and old-school JS calls (onclick, onmouseover..)

    Content:

    • Use <b> (<strong> is better) and <em> around in-content KeyPhrase.

    Validate your markup against:

    • W3C’s validation tool (http://validator.w3.org)
    • Use a link validator / checker tool to locate broken links. Try to prevent broken links on the server side

    2. Use friendly URLs

    3. Avoid Client Side and Server Side errors

    • 4xx and 5xx family
    • Scripting errors (wrong included files, missing files, runtime scripting engine errors…)

    4. Create a custom 404 that re-orients the user

    5. Respect indexing limits:

    • Title length: no more than 70 characters
    • Meta description: no more than 155 characters
    • Page size: no more than 150 kB (without images, css and other attachments)
    • Gzip when possible! Compress / minify CSS / JS, trying to strike a good balance between performance and bandwidth
    • Page length: between 250 and 1000 words.
    • Number of links: no more than 100 unique links
    • Parameters in URL: no more than 2.
    • URL length: no more than 4 levels of depth. Use URL rewriting / virtual URI techniques if you need more depth
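Several of the limits above lend themselves to an automated check before publishing. The following Java sketch is one possible way to do it. The IndexingLimits class and its method names are hypothetical, the thresholds are the ones from the list, and the page size limit is left out because it depends on how the page is served.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-publication check against the limits listed above. */
final class IndexingLimits {

    /** Returns one readable problem per violated limit; an empty list means every check passed. */
    static List<String> check(String title, String metaDescription, int wordCount, int uniqueLinks, String url) {
        List<String> problems = new ArrayList<>();
        if (title.length() > 70) problems.add("title is longer than 70 characters");
        if (metaDescription.length() > 155) problems.add("meta description is longer than 155 characters");
        if (wordCount < 250 || wordCount > 1000) problems.add("page length is outside 250-1000 words");
        if (uniqueLinks > 100) problems.add("more than 100 unique links");
        if (countParameters(url) > 2) problems.add("more than 2 URL parameters");
        if (pathDepth(url) > 4) problems.add("URL is deeper than 4 levels");
        return problems;
    }

    /** Counts the "&"-separated parameters in the query string. */
    static int countParameters(String url) {
        String query = URI.create(url).getRawQuery();
        return (query == null || query.isEmpty()) ? 0 : query.split("&").length;
    }

    /** Counts the non-empty path segments, e.g. "/hotels/livorno" has depth 2. */
    static int pathDepth(String url) {
        String path = URI.create(url).getPath();
        int depth = 0;
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    depth++;
                }
            }
        }
        return depth;
    }
}
```

A check like this could run in a build step or in an editorial preview, reporting the list of problems instead of blocking publication.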

    6. Progressive enhancement

    • Animations and menus: provide a basic layout and try it with JS disabled. All the contents / links must be accessible. Use “onload / ondomready” JS events to add a behavior layer on top of the basic structure.
    • Flash Replacement / Flash detection: display HTML for users without Flash (spiders!).
    • <script>: use the <noscript> tag. Useful for AJAX loading, provide a link to the AJAX loaded text.
    • Make sure your site is navigable without JavaScript: sitemap.xml is not enough.
    • Create text transcriptions of video / audio and provide it when Flash player is not available

    7. Provide meta information for spiders:

    • robots.txt
    • sitemap.xml
      • loc, lastmod, changefreq, priority
      • notify all major search engines that support the format of its location
    • meta robots for each page
      • noindex, nofollow, noarchive [noodp, noydir, nosnippet]
    • link info: <a rel="nofollow">Anchor Text</a>

    8. If your site is a blog, ping all major blog indexing services when your new content is published

    9. Viral marketing over social networks

    • There are a lot of tools for sharing content across many networks (like AddThis / ShareThis).
    • Use proprietary marketing factors (e.g.: Facebook og-meta) if you want to heavily target a particular viral marketing network

    10. Take care of domain:

    • Be careful when domain aliases and subdomains serve the same page: you lose (split) authority. Prefer the canonical form, www.domain.com, and redirect (301) the others to it.
    • Choose a domain name that contains your KeyPhrase
    • If you move your site, provide a custom sitemap.xml to tell spiders about the move
    • Defend your authority
      • Respect the oldies: an older domain has more authority than a younger one.
      • Watch out for Google bombing and other viral attacks
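The canonical-form rule can be sketched as a small function that decides whether a request needs a 301. The name `canonicalRedirect` and the returned object shape are assumptions for this example; in a real setup the same rule usually lives in the web server or URL-rewriting configuration.

```javascript
// Returns { status: 301, location } when the request host is not the canonical
// one, or null when the request can be served as-is.
function canonicalRedirect(requestUrl, canonicalHost) {
  const url = new URL(requestUrl);
  if (url.hostname === canonicalHost) return null;
  url.hostname = canonicalHost; // keep path and query string, change only the host
  return { status: 301, location: url.toString() };
}
```

For example, `canonicalRedirect('http://domain.com/page?x=1', 'www.domain.com')` asks for a permanent redirect to the same path on www.domain.com, so the aliases stop splitting authority.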

    11. Provide the right HTTP status code:

    • 200: OK → “hey, the page is still there”
    • 301: moved permanently → “hey, the page has moved permanently to this location, reindex it and update your reference link” (remember to provide the new URL)
    • 302: moved temporarily → “hey, the page has been temporarily moved to this URL, but it will probably be moved back later” (remember to provide the temporary URL!)
    • 404: not found → “Sorry, the page is not available here or anywhere else. But it may be back later. Can you come back?”
    • 410: gone → “Sorry, forget about this page. It has been deleted”
    • 503: service unavailable → “Oops! I cannot help you right now.. please come back again..” (but patience has limits..)
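The mapping above fits in a few lines of code. This is a sketch only: `page.state`, `page.newUrl` and `responseFor` are hypothetical names, and how your application knows the state of a page is up to you.

```javascript
// page.state is a hypothetical field with one of:
// 'ok' | 'moved' | 'moved-temporarily' | 'deleted' | 'down' (anything else = missing)
function responseFor(page) {
  switch (page.state) {
    case 'ok':
      return { status: 200 };
    case 'moved':
      return { status: 301, location: page.newUrl }; // always send the new URL
    case 'moved-temporarily':
      return { status: 302, location: page.newUrl }; // always send the temporary URL
    case 'deleted':
      return { status: 410 };
    case 'down':
      return { status: 503 };
    default:
      return { status: 404 };
  }
}
```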

    You shouldn’t:

    1. Stop spidering

    • Content behind input forms
    • Session ID’s in URLs
    • Pages restricted by cookies
    • Frames
    • Logins
    • Links or Content in images or Flash. Try to use some image replacement techniques
    • Splash pages: the home page is probably your website's highest-ranking page!
    • Ajax content: content is king, and it needs to be present on your page.

    2. Non-semantic markup usage

    • Avoid <table> for layout purposes: a table is indexed as content, not as a layout element.
    • Avoid inline CSS / JS
    • Avoid HTML structure overhead (too many tags, attributes, classes): be concise!

    3. Non textual content (Images, Flash, Ajax):

    • A spider reads text. You can simulate a spider using a text browser, like Lynx.
    • Provide textual content for non-textual media (images, video, audio, flash..)

    4. Spam search engine

    • Keyword stuffing: it happens when you repeat the same keywords too often in the page / site
    • Using the same anchor text every time for backlinks (at least for internal ones)
    • Manipulative linking:
      • Massive link exchange campaigns
      • Incestuous or self-referential links (link networks)
      • Paid links
      • Blog / Forum / Directory spamming
    • Black Hat practices:
      • Keywords not relevant to content
      • Search engine cloaking: detecting spiders (by User-Agent) and serving them different content is a bad idea, although serving a different version of the site per medium (e.g. a mobile version) is still fine.
    • Low value pages
      • Copied content
      • Dynamically varied content
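To get a feeling for keyword stuffing, you can measure how much of a text is made of one keyword. A minimal sketch follows. `keywordDensity` is a hypothetical name, the word splitting only handles ASCII letters and digits, and no "safe" threshold is given because this article does not state one.

```javascript
// Share of the words in `text` that are exactly `keyword` (case-insensitive).
// Returns a number between 0 and 1.
function keywordDensity(text, keyword) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  if (words.length === 0) return 0;
  const wanted = keyword.toLowerCase();
  const hits = words.filter(word => word === wanted).length;
  return hits / words.length;
}
```

For example, in the text "SEO tips, seo tools" the keyword "seo" accounts for 2 words out of 4.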

    5. Use the same (or a similar) title on every page: <title> should be dynamic and content-dependent

    Off-Page SEO Best Practices

    1.    Produce good quality content

    • Make good content
      • One topic per page
      • Include content on your site that serves your audience’s needs and interests.
      • Vary your content type (images, videos, text..)
      • Create a home page or section page that effectively funnels your traffic into the content-rich areas of your site
      • Derive KeyPhrases from your content (not the other way round!)
      • Place KeyPhrase in strategic page locations
      • Don’t copy
      • Defend your content against copiers
    • Make it accessible
      • Make sure that each page is accessible by at least one static text link
    • Make it sharable (backlinks!)
    • Make it indexable (text!)

    2.    Do competitive research and try to obtain as many of your competitors' backlinks as you can.

    3.    Push press releases (offline / online): you will gain authority

    4.    Backlink to your social world: when you post somewhere (blogs, guestbooks, forums, social networks..) about your site, remember to post a backlink. Feel authorized to do it!

    5.    Invite people to post on your blog, if you have one

    6.    Online and Offline marketing / promotion

    • Directories
    • Campaigns
    • Link exchange (do not spam search engine!)
    • Partnerships

    SEO Sources, SEO tools, SEO related sites:

    1.    Top SEO resources

    • SEOBook.com
    • SearchEngineLand.com
    • SEOmoz.org
    • SEOTrainingDojo.com

    2.    Search Engine Webmaster Tools:

    3.    Search Engine Commands: http://www.searchcommands.com/

    4.    Rank Checkers:

    5.    Other tools

    Top SEO API resources and tools

    While developing a project for my team, I spent a few days searching the web for the best, most suitable and affordable SEO (Search Engine Optimization) API package, in order to build a customer-based SEO tool.

    My top requirements for the statistics data, given a page URL, were:

    • page rank
    • link analysis
    • content analysis
    • competitor analysis
    • keyword analysis

    Out there, the SEO battle is very confusing: different opinions, trends, ideas, guidelines.. Anyway, this is what I found.

    1. Web SEO Analytics (http://www.webseoanalytics.com)
    pro: the most complete in this list, good price, it relies on different search engines
    cons: daily limits on requests / reports

    2. Alexa Web Information Service (http://aws.amazon.com/awis)
    pro: fairly complete, request-based pricing (you pay for what you really use)
    cons: the data comes from Alexa only

    3. SEOmoz Site Intelligence API (http://www.seomoz.org/api)
    pro: good API, free version available, good documentation
    cons: almost exclusive focus on link analysis

    4. OpenCalais API (http://www.opencalais.com)
    pro: stable and robust API, great keyword suggestion tools
    cons: focused only on content / keyword analysis

    5. Wordsfinder API (http://www.wordsfinder.com)
    pro: keyword extractor and traffic estimator API
    cons: not free, little data

    6. AlchemyAPI (http://www.alchemyapi.com/api)
    pro: good API, free version available, good documentation
    cons: almost exclusive focus on keyword analysis and content analysis

    7. Majestic SEO Enterprise API (https://www.majesticseo.com/support/subscriptions/enterprise-api)
    pro: based on a good web-based GUI service
    cons: little data and few statistics, basically focused on link analysis

    8. Compete API (http://developer.compete.com)
    pro: it provides graphical icon indicators for the statistics
    cons: exclusive focus on domain analysis

    9. Ginzametrics API (http://ginzametrics.com/api.html)
    pro: ranking and keyword analysis, good interface, HTTP-based, free of charge
    cons: still in beta at the time of writing (2011-03-04)

    I did not consider the search engines' proprietary APIs, for three main reasons:

    1. They rely only on that search engine's data
    2. They are in constant development
    3. I would have to do more work on my tool, aggregating many API packages (one for each search engine) instead of a few

    Another big reason is that search engines are going to shut down / deprecate their free search APIs (for obvious reasons!).

    MySQL dump, recovery, restore…

    During a recovery procedure for a production environment I faced a MySQL subtask: I needed to restore a production DB from a backup virtual machine.
    So, I had 2 MySQL DB servers:

    • DB Live (ip: 192.168.0.1, schema: webapp)
    • DB Recovery (ip: 192.168.0.2, schema: webapp)

    The first step was to delete all the tables from the production schema. MySQL does not provide any command like “DROP ALL TABLES”, but I found a good Unix one-liner:

    mysql --host='192.168.0.1' --user=root --password=pass -BNe "show tables" webapp | awk '{print "drop table " $1 ";"}' | mysql --host='192.168.0.1' --user=root --password=pass webapp

    The second step was to bring all the recovery data into the live production server. For various reasons, I had to do this in one single step, without any intermediate .sql dump file to export / restore. Again, a good Unix command helped me:

    mysqldump --host='192.168.0.2' --user=root --password=pass --single-transaction --flush-logs --hex-blob --max_allowed_packet=512M  webapp | mysql --host='192.168.0.1' --user=root --password=pass webapp

    And the job is done.