User Guide for Skrapling

1. What is Skrapling?

Skrapling is an easy-to-use online tool for extracting content from websites. It is especially useful for Language Service Providers (LSPs) and translators who need a quick way to pull content from a website.

In only a few steps, you can extract and export files in different formats: Word, PDF, Excel, plain text, etc. In addition, you can extract files in any of the languages related to the main URL.

2. Why Is It Useful?

Skrapling has been developed to improve workflows and processes within the content and translation industry. The time of boring copying and pasting has come to an end. Now you can simply do a skrape!

With Skrapling, quoting a job takes a few minutes instead of hours. The tool is very useful for translation, language alignment and copywriting projects: any situation where the exact web content is needed in another format for editing purposes.

3. Vouchers

Before you can start skraping content, you need credits in your Skrapling account. You get them by buying voucher credits on the website. The credits will then appear in your Skrapling dashboard.

The number of voucher credits you need equals the number of URLs you want to skrape on a specific website. If your voucher does not cover all the URLs, you can top up by buying more credits: go to the website and buy a new voucher, and its credits will be added to the credits left on your current voucher. If you need help, send an email to info@skrapling.com and we will be happy to assist you.
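
The credit arithmetic described above can be sketched in a few lines of Python. This is only an illustration, assuming one credit per URL as stated above. The function names and the numbers are hypothetical and are not part of Skrapling.

```python
def credits_to_buy(urls_needed: int, credits_left: int) -> int:
    """Extra credits required, assuming one credit per URL."""
    if urls_needed < 0 or credits_left < 0:
        raise ValueError("counts must be non-negative")
    return max(0, urls_needed - credits_left)


def balance_after_top_up(credits_left: int, new_voucher: int) -> int:
    """A new voucher is added to the remaining credits; it does not replace them."""
    return credits_left + new_voucher


# Hypothetical numbers: a site with 847 URLs and 500 credits left.
print(credits_to_buy(847, 500))        # 347
print(balance_after_top_up(500, 350))  # 850
```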

You can read more about our subscriptions and pricing here.

4. How to Do a Skrape

Want to get going? This is how you start skraping content in one or multiple languages:

  • Log in to your Skrapling account.
  • Go to your dashboard and select “New Job”.

  • Type in the main URL that you want to extract content from and press “Go”. Skrapling will scan the website for languages. This may take a few minutes.

  • Now, you will see the detected language(s) on the page. In this example, the main language is English and two alternative languages have been found.
  • Select the languages you want to extract by clicking the buttons, so they turn green.

  • Click “Confirm Language Selection”.
  • A new screen will appear. You now have two options to choose from: “Scan sitemap” and “Start Skrape”, as shown below.

  • Select “Scan sitemap” if you want to get an overview of the URLs related to the main or root URL before skraping. This will not use any credits.

    Bear in mind that not all websites have a sitemap configured. In those cases, Skrapling won’t be able to perform a sitemap scan.

    Please note that the sitemap scan may not work properly if you do not use the root URL. It also depends on how well the site's sitemap has been built. If you get an error, click “Go back” and choose “Start Skrape”.

    For example:

    • https://www.comunicatranslations.com is a valid URL for the sitemap scan.
    • https://www.comunicatranslations.com/es is NOT a valid URL to perform the sitemap scan.
  • Select “Start Skrape” to skrape all of the URLs related to the main URL.

  • If you do not want to skrape all of the URLs related to the main URL, you can select which ones to include or exclude in the “Advanced” tab before clicking “Start Skrape”, like this:

  • Write a word from the URLs or sections you do not want to skrape in the “Exclude” field, e.g. “blog” or “news”. Then click “Start Skrape”. At the top, you will see an overview of your selected languages and the number of URLs found for each.

  • Once the skrape is complete, it will look like this:

  • Be aware that the skraping process will continue even if your internet connection is cut or you turn off your computer. Once you start a skrape, it runs on an external server, so it is not affected by what happens on your machine. You can stop the process at any time with the actions button in the top right corner.
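
Two of the rules in the steps above, the root-URL requirement for the sitemap scan and the word-based “Exclude” field, can be sketched in Python. This is a rough illustration only. Both function names are hypothetical, the plain case-insensitive substring match is an assumption about how the exclude words might be applied, and Skrapling's own matching may differ. The example.com URLs are placeholders.

```python
from urllib.parse import urlparse


def is_root_url(url: str) -> bool:
    """True when the URL has a scheme and host but no path, query or fragment."""
    parts = urlparse(url)
    has_host = bool(parts.scheme and parts.netloc)
    return has_host and parts.path in ("", "/") and not parts.query and not parts.fragment


def apply_exclude(urls: list[str], exclude_words: list[str]) -> list[str]:
    """Keep only URLs that contain none of the exclude words.

    Assumption: a simple case-insensitive substring match on the whole URL.
    """
    words = [w.lower() for w in exclude_words if w]
    return [u for u in urls if not any(w in u.lower() for w in words)]


# The two examples from the guide:
print(is_root_url("https://www.comunicatranslations.com"))     # True
print(is_root_url("https://www.comunicatranslations.com/es"))  # False

# Placeholder URLs to show the exclude words "blog" and "news":
found = [
    "https://www.example.com/",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/news/2020",
    "https://www.example.com/services",
]
print(apply_exclude(found, ["blog", "news"]))
```

Note that a substring match is broad: under this assumption the word “news” would also exclude a URL containing “newsletter”.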

4.1 Exporting a List of URLs for Clients

Before you download the content, you might want to ask the client exactly which URLs they want translated. For that, you can export a list of all the URLs on the website. This is particularly useful for LSPs working with international clients who need their website(s) translated. With the exported list, the client gets an overview of their content and can choose which URLs they want translated.

In this section, we will show you how to quickly export an Excel file with all the URLs in one or more languages.

  • Go to your dashboard in Skrapling.
  • Choose the project and the language for which you want to get the list of URLs. In this example, a total of 847 URLs in five different languages have been found, but you can select just one of the languages if you want.

  • Click the “Actions” button and choose “Export URL list to Excel”. All of the language-specific URLs will be exported into an Excel file.

  • Once ready, click the download button. A ZIP file containing an Excel sheet for each language will be created.

  • This is how the Excel file will look:

  • The idea is that the client can remove URLs from the list and send the Excel file back to you. You can then upload the Excel file to Skrapling, so that only the URLs on the list will be skraped.
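
The effect of uploading the client's trimmed list can be pictured with a small Python sketch. It is an illustration under assumptions, not Skrapling's actual logic. The function name is hypothetical, the URLs are placeholders, and ignoring surrounding whitespace and a trailing slash when comparing is a choice made for this example.

```python
def urls_to_skrape(found: list[str], client_list: list[str]) -> list[str]:
    """Return the found URLs that are still on the client's list, in original order."""

    def norm(url: str) -> str:
        # Assumption: ignore surrounding whitespace and one trailing slash.
        return url.strip().rstrip("/")

    approved = {norm(u) for u in client_list if u.strip()}
    return [u for u in found if norm(u) in approved]


# Placeholder data: the client removed the "blog" URL from the exported list.
found = [
    "https://www.example.com/",
    "https://www.example.com/blog",
    "https://www.example.com/services",
]
client_list = ["https://www.example.com", " https://www.example.com/services/ "]

print(urls_to_skrape(found, client_list))
```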

5. How to Export the Content

You can now download the entire content. Whether you have uploaded the final URL list or just want to download all URLs found during the skrape, the procedure is the same:

  • Click on the “Actions” button. A list of export options will appear.
  • Select “Export content”.

  • Choose your preferred file format. Bear in mind what you will use the files for (for alignment, select “Word (docx)” and “one file per URL” for best results).

  • If the website contains duplicated content, it can be filtered in the exported documents, so that these segments are easier to identify when setting up projects in CAT tools.
  • Apart from the duplicated text filter, there is a group of filters for meta title, description, URL and word count that lets you hide or show this information before exporting. They are located right below the duplicated text filter.
  • Once ready, click “Export” and another screen will appear. Click the button under the “Status” tab to download a ZIP file.

The files will be extracted and put in individual language folders, as shown below:

When you open these folders, you will see a list of documents, in this case Word files, that you can now use for your linguistic projects.
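
If you want to check the downloaded ZIP file before unpacking it, the folder-per-language layout described above can be inspected with Python's standard zipfile module. This is a sketch under assumptions: the function name is hypothetical, and the folder and file names in the demo (such as "en" and "home.docx") are made up, since the real names depend on your project.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def files_by_language(zip_source) -> dict[str, list[str]]:
    """Group the file names in an archive by their top-level folder."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            if name.endswith("/") or len(path.parts) < 2:
                continue  # skip directory entries and files outside a folder
            groups[path.parts[0]].append(path.name)
    return {lang: sorted(files) for lang, files in groups.items()}


# Demo with an in-memory archive standing in for the downloaded ZIP file.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("en/home.docx", b"")
    archive.writestr("en/about.docx", b"")
    archive.writestr("es/inicio.docx", b"")

for language, files in files_by_language(demo).items():
    print(language, len(files), files)
```

For a real download, you would pass the path of the ZIP file instead of the in-memory demo object.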

6. Understanding Error Messages

Sometimes, error messages will appear while a website is being skraped. You can click the red button in the upper right corner to see a list of the errors:

You can also go to the “Logs” tab and get a full overview of the skraping process for each website language:

Please note that Skrapling is not able to skrape all websites. Some websites are protected against crawlers, some require a user login, and others use outdated code that Skrapling can't read.

If you don't understand an error message, you can always send us an email and we will try to help. If the error is due to a bug in Skrapling, we will contact our developer and it will most likely be fixed within a few days (depending on the severity of the bug).

7. Need Any Help? Contact Us Here

If you are having any trouble using our service, or if you have any questions, please do not hesitate to get in touch with us! Send us an email at info@skrapling.com or fill in our form here.

Happy Skraping!
