Duplicate Content Checking
I wrote this simple utility in C# to check whether a given document has already been published anywhere on the internet. I built it for a friend who runs an information portal and works with many freelance writers. The freelancers write on a wide range of topics, which makes it very difficult for him to check whether a piece was copied from another website. He asked me for help, and I wrote this simple application using the Google SOAP API.
How it works:
This application works on the assumption that Google has crawled the web and stored its content in its index. The freelance writers send their content to my friend by email. The application splits the input document into lines of text and queries the Google search index for each line through the search API (it is like searching for a string in Google, but automated) to check whether that text is already present in the index. If it is already present, the content is a duplicate (already published on the internet by someone); otherwise it is a fresh piece (more or less).
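The split-and-query idea can be sketched roughly as follows. This is a minimal illustration in Python, not the C# code of the actual utility. The names `split_into_lines`, `check_document`, and `fake_search` are hypothetical, and the `search` parameter stands in for whatever search API call is used; no real Google API is called here.

```python
from typing import Callable, List, Tuple


def split_into_lines(document: str) -> List[str]:
    """Split the input document into non-empty lines, trimming whitespace."""
    return [line.strip() for line in document.splitlines() if line.strip()]


def check_document(
    document: str, search: Callable[[str], List[str]]
) -> List[Tuple[str, List[str]]]:
    """Query the search function once per line and collect the URLs found.

    `search` is an assumed stand-in for the real search API: it takes a
    line of text and returns the URLs where that text was found.
    """
    results = []
    for line in split_into_lines(document):
        urls = search(line)
        results.append((line, urls))
    return results


def fake_search(query: str) -> List[str]:
    """Hypothetical in-memory 'index' used only to demonstrate the flow."""
    known = {
        "the quick brown fox jumps over the lazy dog": ["http://example.com/fox"],
    }
    return known.get(query.lower(), [])


if __name__ == "__main__":
    text = "The quick brown fox jumps over the lazy dog\nAn original sentence"
    for line, urls in check_document(text, fake_search):
        status = "duplicate" if urls else "not found"
        print(f"{status}: {line}")
```

Passing the search function in as a parameter keeps the line-splitting logic separate from the search backend, so the same loop would work with any API that returns matching URLs.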
How to use it
- Copy the text you want to check into the Input Text area.
- Click the Start button.
- The result text area prints the results line by line, with any URLs found in the Google index shown in parentheses. (If an output line has no URL in parentheses, that text was not found in the Google index, so that particular line is not a duplicate.)
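To make the output convention concrete, here is a small Python sketch of how one result line might be formatted. The exact layout the application uses is not shown in this post, so the format below (text followed by comma-separated URLs in parentheses) is an assumption, and `format_result` is a hypothetical helper name.

```python
from typing import List


def format_result(line: str, urls: List[str]) -> str:
    """Return the checked line, followed by matching URLs in parentheses.

    Assumed layout: a line with no matches is printed as-is, which marks
    it as not found in the index.
    """
    if urls:
        return f"{line} ({', '.join(urls)})"
    return line


if __name__ == "__main__":
    print(format_result("Some copied sentence", ["http://example.com/page"]))
    print(format_result("Some original sentence", []))
```

With this layout, any output line ending in a parenthesized URL is a likely duplicate, and a bare line is one that was not found.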
The application can be found here. Download it, and leave a comment if you find it useful.