BASED ON COPYRIGHT
Search engine algorithms rely far too much on incoming links and other fairly easily corruptible measures to determine site ranking. Any algorithm changes made by the search engines are soon picked up by SEO (search engine optimization) experts, who then adjust their websites to retain top ranking. Whilst this is good business for the SEO industry, simpler, better algorithms would improve things for users.
Search engines are slowly moving away from arcane measures such as meta tags towards a situation where content is king. Whilst this is undoubtedly a good thing, it is just as easy to corrupt as the current measures. So what is needed is a search engine algorithm that relies exclusively on content and which is incorruptible. That’s easy to say of course, but not so easy to develop and implement. Many times when you do a search you tend to get several results that display exactly the same content on half a dozen websites. Most of the time, the original author of that content is not given recognition. This is partly because it is often impossible to determine who the original author was.
The key therefore is to tighten up on copyright. If you develop original material and publish it then you own the copyright to that material. The principle of copyright is well established and most countries in the world have laws to protect it. As with any law, it is very difficult to pursue copyright thieves over international boundaries because laws are restricted to the country that implemented them. So what is needed is an international way of registering, protecting and enforcing copyright. Again, that’s easy to say but not so easy to implement.
Much original material is already available on the internet. Google’s self-imposed task of indexing all the world’s books provides an invaluable store of original material against which to compare new material for copyright infringements. The new mechanism that is required is a way of registering and validating the copyright on new material before it is published. This isn’t as hard as it sounds, although the amount of computer hardware required shouldn’t be underestimated. Before an article is published on the internet or in printed form it should be submitted to the copyright database. This establishes the time of original registration, which is the key to copyright. The copyright database then compares the new article with previous material and establishes originality. Algorithms already exist which can detect plagiarism, so this is not a particularly difficult task, although it may take some time.
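As a rough illustration of the registration and comparison steps described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the names `CopyrightDatabase`, `register` and `earliest_match`, the use of overlapping five-word "shingles" with Jaccard similarity as the plagiarism test, and the 0.5 threshold. A real system would need a far more robust comparison and a trusted time source.

```python
import hashlib
from dataclasses import dataclass


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


@dataclass
class Registration:
    author: str
    text: str
    registered_at: float  # time of registration, seconds since the epoch
    digest: str           # SHA-256 fingerprint of the submitted text


class CopyrightDatabase:
    """Hypothetical in-memory stand-in for the proposed copyright database."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # assumed cut-off for "same material"
        self.records = []

    def register(self, author: str, text: str, registered_at: float) -> Registration:
        """Record who submitted the text and when."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        record = Registration(author, text, registered_at, digest)
        self.records.append(record)
        return record

    def earliest_match(self, text: str):
        """Return the earliest registration similar to text, or None."""
        matches = [r for r in self.records
                   if similarity(r.text, text) >= self.threshold]
        return min(matches, key=lambda r: r.registered_at, default=None)
```

In this sketch the earliest registration of sufficiently similar text is treated as the original, which mirrors the point that the time of registration is the key to copyright.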
The new search engine algorithm proposed here then uses the copyright database to index new material and establish the original author as having a higher ranking than any copyright thieves. Any web site that refers to or uses someone else’s original material then has two choices. If they provide some form of standard attribution to the original then they are indexed but at a lower ranking than the original. If they don’t provide attribution, then they can be simply dropped from the search engine altogether. This effectively enforces copyright on the internet as anyone that refuses to acknowledge it simply doesn’t appear in the results. If they’re not appearing in the results, it makes the theft pointless.
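The ranking rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration for pages carrying the same piece of content: the `Page` fields, the `relevance` score and the `rank` function are hypothetical, and the flags are assumed to come from a lookup in the copyright database.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float       # score from some other content-based ranking step
    is_original: bool      # copyright database lists this page as the original
    has_attribution: bool  # page credits the original in a standard way


def rank(pages):
    """Order pages by the proposed rule and return their URLs.

    The original always ranks above attributed copies; copies without
    attribution are dropped from the results altogether.
    """
    kept = [p for p in pages if p.is_original or p.has_attribution]
    # Sort originals first, then by relevance within each group.
    kept.sort(key=lambda p: (p.is_original, p.relevance), reverse=True)
    return [p.url for p in kept]
```

For example, given an original, a credited copy and an uncredited copy, `rank` returns the original first, the credited copy second, and omits the uncredited copy even if its raw relevance score is the highest.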
One beneficial side effect of this algorithmic change is that all those millions of (usually useless) directory websites disappear from the top rankings. Link farms, made-for-AdSense sites and other internet trash would also disappear.
The copyright database must be independent of any search engine provider and must provide checks and balances to allow disputed copyright to be challenged and resolved. Everyone must have access to the database so they can check the status of their original material and also to test whether any new material they are considering would fall foul of any previous writings. Each language should have its own database with links between them to detect unauthorized translations. The maintenance of each language’s database should be the responsibility of the country or countries that use that language. This will be a significant problem for the English speaking world of course, but less so for other languages.
Will such an approach become a reality? Will someone step up to the plate and implement this idea or something similar? If it does come to pass, just remember that you heard it here first, unless of course I ripped off the idea from someone else.
Comment by David Sawers on February 25th 2007
Google has just announced the use of a copyright database to detect pirated videos posted on YouTube so they can be removed. This is an essential step in cleaning up YouTube and perhaps presages the wider use of such copyright-detecting technology.
Comment by Kjell Bleivik on March 18th 2007
"Search engine algorithms rely far too much on incoming links and other fairly easily corruptible measures to determine site ranking".
I fully agree. SEO is a large and growing industry where so-called SEO "experts" are experts in getting IPL's to their clients' sites. If a company outlines the job of the SEO "expert", submitting etc. is nearly always quoted as an important part of the job.
"Any algorithm changes made by the search engines are soon picked up on by SEO (search engine optimization) experts who then adjust their websites to retain top ranking. Whilst this is good business for the SEO industry; simpler, better algorithms would improve things for users".
Some of the best "experts" are ahead of the SE algorithms.
"So what is needed is a search engine algorithm that relies exclusively on content and which is incorruptible. That’s easy to say of course, but not so easy to develop and implement".
Yes, easy to say but difficult to implement. That is why portals and directories developed by professionals will be more and more important. University sites have site search functionality.
"Many times when you do a search you tend to get several results that display exactly the same content on half a dozen websites. Most of the time, the original author of that content is not given recognition. This is partly because it is often impossible to determine who the original author was".
That is a big problem. That is why XML-driven sites will be more important in my view. Metadata, "data about data", is a very important part of a good XML document. One way to do this is as proposed by Thomas Myer in his excellent book: "No Nonsense XML Web Development With PHP".
He proposes the following structure:
<article id="MyIdNo1">
  <author>My name</author>
  <headline>How to ...</headline>
  <description>This article ...</description>
  <pubdate>2007-03-18</pubdate>
  <copyright>My company</copyright>
  <keywords> ... </keywords>
  <body><![CDATA[ ... ]]></body>
</article>
I have defined my own aspects of Web 2, see my forum ForumNorway dot com. XML-driven sites will be an important part of Web 2. XML documents can be styled in different ways. The latest version of the Opera browser already translates XML documents to plain text. Your task is to style them.
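To illustrate how metadata in an article structure like the one quoted above could be read by a program, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the attribute on `<article>` is meant to be `id`; the function name `read_metadata` and the choice of fields are made up for the example and are not taken from the book.

```python
import xml.etree.ElementTree as ET

# Sample document following the structure quoted above (placeholder values).
ARTICLE_XML = """<article id="MyIdNo1">
  <author>My name</author>
  <headline>How to ...</headline>
  <description>This article ...</description>
  <pubdate>2007-03-18</pubdate>
  <copyright>My company</copyright>
  <keywords>...</keywords>
  <body><![CDATA[ ... ]]></body>
</article>"""


def read_metadata(xml_text: str) -> dict:
    """Extract the authorship-related metadata from an article document."""
    root = ET.fromstring(xml_text)
    fields = ("author", "pubdate", "copyright")
    # findtext returns None for a missing element, so fall back to "".
    meta = {name: (root.findtext(name) or "").strip() for name in fields}
    meta["id"] = root.get("id", "")
    return meta


print(read_metadata(ARTICLE_XML))
```

A search engine or copyright database could use fields like these as a claim of authorship, although the claim would still need to be checked against the time of registration.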
"The key therefore is to tighten up on copyright. If you develop original material and publish it then you own the copyright to that material. The principle of copyright is well established and most countries in the world have laws to protect it. As with any law, it is very difficult to pursue copyright thieves over international boundaries because laws are restricted to the country that implemented them. So what is needed is an international way of registering, protecting and enforcing copyright. Again, that’s easy to say but not so easy to implement".
Yes, that is the difficult and important task, but will this be a new, more formal Wiki, or will it be a new content portal? The problem is to make it globally acceptable, like DMOZ, but without the DMOZ bias towards English-speaking countries and its roots in the USA.
"The new mechanism that is required is a way of registering and validating the copyright on new material before it is published. This isn’t as hard as it sounds, although the amount of computer hardware required shouldn’t be underestimated".
Computer hardware and, not least, man-hours to assure the quality of the content. Should this be paid or free work? Who will do it for free?
"Before an article is published on the internet or in printed form it should be submitted to the copyright database. This establishes the time of original registration which is the key to copyright".
Do not underestimate the time involved. Compare the time it takes from submitting a site to DMOZ until it is included. An article may be history before it is published.
"The copyright database then compares the new article with previous material and establishes originality".
The SEs should be better at this. It should be possible for them to tell the original from a copy. I noted the following: I published a post on WPW. A month later, I found an article with the exact KW string on a member's site, published later, referring to my post and saying that this was nothing new. The SEs should have an internal database of content "stealers" and "copyists."
"Algorithms already exist which can detect plagiarism so this is not a particularly difficult task, although it may take some time".
Personally, I am not impressed by the SEs' ability to detect this.
"The new search engine algorithm proposed here then uses the copyright database to index new material and establish the original author as having a higher ranking than any copyright thieves. Any web site that refers to or uses someone else’s original material then has two choices. If they provide some form of standard attribution to the original then they are indexed but at a lower ranking than the original. If they don’t provide attribution, then they can be simply dropped from the search engine altogether. This effectively enforces copyright on the internet as anyone that refuses to acknowledge it simply doesn’t appear in the results. If they’re not appearing in the results, it makes the theft pointless".
Excellent points, but they may be difficult to implement.
"One beneficial side effect of this algorithmic change is that all those millions of (usually useless) directory websites disappear from the top rankings. Link farms, made for Adsense sites and other internet trash would also disappear".
There are some good ones that I would not be without, like business.com and similar.
"The copyright database must be independent of any search engine provider and must provide checks and balances to allow disputed copyright to be challenged and resolved. Everyone must have access to the database so they can check the status of their original material and also to test whether any new material they are considering would fall foul of any previous writings. Each language should have its own database with links between them to detect unauthorized translations. The maintenance of each language’s database should be the responsibility of the country or countries that use that language. This will be a significant problem for the English speaking world of course, but less so for other languages".
Independent in a world where money speaks is not easy.
The English-speaking world is dominating the www more and more, but I think I am lucky that I read Danish, Norwegian and Swedish without problems.
"Will such an approach become a reality? Will someone step up to the plate and implement this idea or something similar? If it does come to pass, just remember that you heard it here first, unless of course I ripped off the idea from someone else".
I think there are already similar content management systems on the internet. Thomas Myer describes how to develop one in the above-mentioned book. There is already an Australian SE, FactBites. What about Oracle Secure Enterprise Search and grid computing? If implemented on a large scale, that may be a new SE with better content.
The task for an SE is to identify and index content globally at electronic speed. We can only hope that it will be better at detecting spam, scams, plagiarism etc.
Kjell Bleivik 2007-03-18
Comment by David Sawers on March 18th 2007
Thank you for your valuable comments, Kjell.
A couple of clarifications on my original article are in order.
When I state "Before an article is published on the internet or in printed form it should be submitted to the copyright database", I didn't mean that the publisher has to wait for verification of copyright before publishing the material. As long as the time of copyright submission is registered a few seconds before actual publication, processing of that information can be done later, when the search engine reads the material and compares it with information in the copyright database.
You ask how the copyright database should be administered. "Should this be paid or free work? Who will do it for free?" I think the search engine companies should do the bulk of the work in order to provide better search engine results. I would also not underestimate the knowledge and willingness of internet users to freely provide some of their time to validate results. Wikipedia grew rapidly because of such input. But they also quickly discovered the problem of vested interests corrupting and censoring articles.
Comment by Kjell Bleivik on March 18th 2007
Great clarification.
Comment by DV on May 11th 2007
Thanks Kjell for pointing me to this article. Please look at what got brainstormed from this idea, especially the tip about levels of content authority (my eZineArticles post).