E-commerce site first glance and basic vocabulary
The footprint of an e-commerce website is a strategic element for understanding how it is perceived, how people connect to it and what they first understand when they reach it.
There are zillions of analytics tools, web agency products and paid apps, all offering to give a picture of the reputation of a target website.
There are even free online diagnostic websites that list key data summarizing all the technical key points, mashing up multiple components to build scoreboards, piling up numbers and showing off their ability to run computations. It is a normalized approach to very generic targets (one size fits all), with an acrobatic focus on grabbing as much attention as possible.
Our last Thursday hacking night was built around the very small, private elements that can alert an e-commerce site owner that the website is not in line with expectations.
Our main motto here at Cruxbase is to build layers on extremely simple, versatile components to reach a focused target.
The goals for the night were:
- How do (soon-to-be) customers get their first glance of the website when using a search engine?
- How does the vocabulary of the first displayed page synthesize the "spirit" of the e-commerce site?
We did quite an exhaustive search of the tools already available, which provide a wide range of numbers based on multiple data flows: from backlink counts to SNS analysis, from loading speed to word counting. You name your idea, you get a tool.
Some are based on Google components (with the insane registration and connection process… but wanting a Google Analytics data flow makes you very humble and ready for voluntary servitude, thanks to Étienne de La Boétie).
Some provide a full set of open source applications; others deliver a paid product (for a few dollars) based on the same components.
You use what you decide to use. It is what we may call the « auberge espagnole » metaphor.
We decided to select two major components as the skeleton of the night's hack to meet our goals:
- Search autocomplete (suggestions)
- Site page scraping and word density
Keyword suggestions (autocomplete)
The search engine (Google, Bing…) starts suggesting words based on what has been typed. First, it speeds up typing (only a few keystrokes are needed to reach the full name); second, it gives a synthesized view of the community's usual typing behaviour.
This step is very interesting because it provides words associated with the site name, from geography to global sentiment. Managed in a smart semantic dictionary, these words easily give a first trend (re-running suggestions on the alternatives returned enhances the process).
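As an illustration of the re-suggestion idea, here is a minimal Python sketch (Python is our choice for these examples, not the tooling used during the night). It assumes a `fetch` function, supplied by you, that returns the suggestions for a query; the names `expand_suggestions` and `associated_words` and the limits `max_depth` and `max_terms` are hypothetical.

```python
from collections import Counter, deque
from typing import Callable, Dict, Iterable, List


def expand_suggestions(seed: str,
                       fetch: Callable[[str], Iterable[str]],
                       max_depth: int = 2,
                       max_terms: int = 50) -> Dict[str, int]:
    """Breadth-first re-suggestion: each suggestion is fed back in as a query.

    Returns {suggestion: depth at which it first appeared}.
    """
    seen: Dict[str, int] = {}
    queue = deque([(seed.lower(), 0)])
    while queue and len(seen) < max_terms:
        query, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not re-suggest beyond the requested depth
        for suggestion in fetch(query):
            term = suggestion.strip().lower()
            if not term or term == seed.lower() or term in seen:
                continue
            seen[term] = depth + 1
            if len(seen) >= max_terms:
                break
            queue.append((term, depth + 1))
    return seen


def associated_words(seed: str, suggestions: Iterable[str]) -> List[str]:
    """Words found next to the site name in the suggestions, most frequent first."""
    seed_words = set(seed.lower().split())
    counts = Counter(
        word
        for suggestion in suggestions
        for word in suggestion.lower().split()
        if word not in seed_words
    )
    return [word for word, _ in counts.most_common()]


# Usage with canned data standing in for a real suggestion call.
CANNED = {
    "acme shop": ["acme shop paris", "acme shop reviews"],
    "acme shop paris": ["acme shop paris opening hours"],
}
found = expand_suggestions("acme shop", lambda q: CANNED.get(q, []))
print(associated_words("acme shop", found))
```

Passing the fetch function in as a parameter keeps the loop independent of any particular search engine and makes it easy to replay stored answers.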
Both Google and Bing provide a simple call which returns either XML or JSON answers that can be stored, evaluated and relaunched.
Just note the « hl » parameter, which provides suggestions in the targeted language (suggestions may be different for a French speaker).
One call returns an XML-formatted answer; the other returns a JSON-formatted string.
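A rough Python sketch of building such a call and reading both answer formats could look like this. Only the « hl » parameter comes from the text above: the endpoint `suggest.example.com` and the `q` parameter name are placeholders, the JSON layout assumed is the OpenSearch suggestions array (`["query", ["s1", "s2"]]`), and the XML layout assumed is one `<suggestion data="..."/>` element per suggestion. Check the actual answers of the engine you query before relying on either parser.

```python
import json
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the engine's suggestion URL.
SUGGEST_ENDPOINT = "https://suggest.example.com/complete"


def build_suggest_url(query: str, hl: str = "en",
                      endpoint: str = SUGGEST_ENDPOINT) -> str:
    """Build the suggestion URL; 'q' is an assumed name, 'hl' is the language."""
    return f"{endpoint}?{urlencode({'q': query, 'hl': hl})}"


def parse_json_suggestions(payload: str) -> List[str]:
    """Read a JSON answer shaped like ["query", ["s1", "s2", ...], ...]."""
    data = json.loads(payload)
    if not isinstance(data, list) or len(data) < 2 or not isinstance(data[1], list):
        return []
    return [str(item) for item in data[1]]


def parse_xml_suggestions(payload: str) -> List[str]:
    """Read an XML answer carrying <suggestion data="..."/> elements."""
    root = ET.fromstring(payload)
    return [el.attrib["data"] for el in root.iter("suggestion") if "data" in el.attrib]


# Usage on canned answers (no network access needed).
print(build_suggest_url("acme shop", hl="fr"))
print(parse_json_suggestions('["acme shop", ["acme shop paris", "acme shop avis"]]'))
print(parse_xml_suggestions(
    '<toplevel><CompleteSuggestion><suggestion data="acme shop paris"/>'
    '</CompleteSuggestion></toplevel>'
))
```

Keeping URL building and parsing as separate pure functions means the answers can be stored as raw text and re-evaluated later without calling the engine again.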
There are lots of other suggestion engines based on search strings, run by Yahoo, Baidu, Rednano, AltaVista… But our main commitment was just to catch what users see when typing the company or product name: to capture the first feeling.
Displayed Page Vocabulary
After selecting the right link (based on the suggestions), the (soon-to-be) customer lands on a first page, which represents the first contact with the e-commerce site. This first contact may be very focused (a landing page), very specific (a product page) or global (the home page). Either way, there is an imperative need to capture all the text, comments, titles, tags and rich snippets in order to synthesize the vocabulary on offer and then judge whether it is fitting and appropriate.
It is simply a question of scraping the page, extracting the valuable data (getting rid of the HTML tagging or not), then organizing efficient storage to filter out parasite words, check for duplicates (singular/plural) and retrieve or rebuild grouped words.
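The extraction and filtering steps can be sketched in Python with the standard library only. This is an illustration under assumptions, not the code used during the night: the stop-word list is a tiny sample, the plural folding is deliberately naive (it only strips a final "s"), and the names `TextExtractor`, `singularize` and `page_vocabulary` are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List

# Tiny sample list (assumption): a real run needs a full list per language.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "with"}


class TextExtractor(HTMLParser):
    """Collects visible text, comments, and alt/title/meta-content attributes."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        for name, value in attrs:
            if value and (name in ("alt", "title") or (tag == "meta" and name == "content")):
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:  # ignore script and style bodies
            self.chunks.append(data)

    def handle_comment(self, data):
        self.chunks.append(data)


def singularize(word: str) -> str:
    """Naive plural folding, only meant to merge pairs like 'shoe'/'shoes'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def page_vocabulary(html: str) -> List[str]:
    """Return the page's words, lowercased, without stop words, plurals folded."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, accents included
    return [singularize(w) for w in words if w not in STOP_WORDS and len(w) > 1]


sample = "<title>Red Shoes</title><p>The red shoe and the boots</p><img alt='Leather boots'>"
print(page_vocabulary(sample))
```

Fetching the page itself is left out on purpose, so the same function works on a live download or on a stored copy.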
Frequency analysis and sorting then give the keywords to be followed and enhanced. All this work is done using very common PHP classes or files (such as the open source class.html2Text.php and class.keywordDensity.php…).
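For readers who prefer to see the counting itself, here is a small Python sketch of frequency analysis and sorting. It does not reproduce the PHP classes named above: the function name `keyword_density`, its parameters and the density formula (share of all word groups, in percent) are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def keyword_density(words: Iterable[str], top: int = 10,
                    group_size: int = 1) -> List[Tuple[str, int, float]]:
    """Rank single words (group_size=1) or grouped words (2, 3...) by frequency.

    Returns (term, count, density in percent of all groups), most frequent
    first; ties are broken alphabetically so the ranking is stable.
    """
    tokens = list(words)
    groups = [
        " ".join(tokens[i:i + group_size])
        for i in range(len(tokens) - group_size + 1)
    ]
    total = len(groups)
    if total == 0:
        return []
    counts = Counter(groups)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, n, round(100 * n / total, 1)) for term, n in ranked[:top]]


vocabulary = ["red", "shoe", "sale", "red", "shoe", "boot", "leather", "boot"]
print(keyword_density(vocabulary, top=3))
print(keyword_density(vocabulary, top=1, group_size=2))
```

Running the same function with `group_size=2` or `3` is one simple way to rebuild grouped words from an already filtered word list.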
What else?
Once the vocabulary is extracted, the magic can start! Going back to keyword suggestions, it is now possible to check the competitors, the market impact and the associations. The real marketing research and analysis business can now start, based on these two very simple but powerful tools. Best of all, it can be fully automated with real-time, rule-based analysis of the results.
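To give an idea of what rule-based analysis could mean here, the Python sketch below compares what the site owner expects with what was observed. The two rules, the `Alert` structure and the function name `check_expectations` are hypothetical; real rules would depend on the site and its market.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alert:
    rule: str          # name of the rule that fired
    terms: List[str]   # the words that triggered it


def check_expectations(expected: Iterable[str],
                       unwanted: Iterable[str],
                       page_terms: Iterable[str],
                       suggestions: Iterable[str]) -> List[Alert]:
    """Apply two simple rules to the page vocabulary and the suggestions."""
    expected_set = {t.lower() for t in expected}
    unwanted_set = {t.lower() for t in unwanted}
    page_set = {t.lower() for t in page_terms}
    suggested_set = {w for s in suggestions for w in s.lower().split()}

    alerts: List[Alert] = []

    # Rule 1: words the owner expects but that the page never uses.
    missing = sorted(expected_set - page_set)
    if missing:
        alerts.append(Alert("expected_missing_on_page", missing))

    # Rule 2: words the owner dreads that show up in search suggestions.
    dreaded = sorted(unwanted_set & suggested_set)
    if dreaded:
        alerts.append(Alert("unwanted_in_suggestions", dreaded))

    return alerts


for alert in check_expectations(
    expected=["shoe", "leather", "delivery"],
    unwanted=["scam", "complaint"],
    page_terms=["red", "shoe", "boot", "leather"],
    suggestions=["acme shop paris", "acme shop scam"],
):
    print(alert.rule, alert.terms)
```

Run on a schedule against stored suggestions and page vocabularies, a check like this is one way to turn the two components into an automatic warning for the site owner.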