Spiders: Web crawling is a large part of how content scrapers work. A spider like Googlebot starts by crawling a single web page, then follows the links it finds to download more pages (a minimal crawler sketch appears after this list).
Shell Scripts: You can build content scrapers from the Linux shell, using tools such as GNU Wget to download content (see the Wget sketch after this list).
HTML Scrapers: Similar to shell scripts, and very common, this type of scraper fetches a website's HTML and parses its structure to find the data it wants (see the parsing sketch after this list).
Screen Scrapers: A screen scraper is any program that captures data from a website by replicating the behavior of a human user browsing it in a web browser (see the browser-automation sketch after this list).
Human Copy: This is when a person manually copies content from your website. If you’ve ever published online, you may have noticed that plagiarism is rampant. Once the initial flattery wears off, the reality that someone is profiting off your work sets in.
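To make the spider idea concrete, here is a minimal crawler sketch using only Python's standard library. The start URL, page limit, and simple breadth-first queue are illustrative assumptions; a real crawler like Googlebot also respects robots.txt, throttles politely, and runs at a vastly larger scale.

```python
# Minimal crawler sketch: fetch a start page, collect its links, and
# download each linked page once. The start URL and page limit are
# illustrative placeholders.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=10):
    seen, queue, pages = set(), [start_url], {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        # Resolve relative links and queue the absolute HTTP(S) ones.
        for link in collector.links:
            absolute = urljoin(url, link)
            if absolute.startswith(("http://", "https://")):
                queue.append(absolute)
    return pages


if __name__ == "__main__":
    downloaded = crawl("https://example.com")
    print(f"Downloaded {len(downloaded)} pages")
```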
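The Wget sketch below shows the shell-script approach, wrapped in Python's subprocess module so it can be automated. The flags are standard Wget options, but the URL, depth, and output directory are placeholders.

```python
# Sketch: drive GNU Wget from a script to mirror one level of a site.
# The URL, depth, and output directory are illustrative placeholders.
import subprocess


def mirror_site(url, depth=1, out_dir="downloaded"):
    subprocess.run(
        [
            "wget",
            "--recursive",                    # follow links on each page
            f"--level={depth}",               # limit recursion depth
            "--no-parent",                    # stay below the starting path
            "--wait=1",                       # pause between requests
            f"--directory-prefix={out_dir}",  # where to save the files
            url,
        ],
        check=True,  # raise if wget exits with an error
    )


if __name__ == "__main__":
    mirror_site("https://example.com")
```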
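For the HTML scraper, here is a sketch that assumes the third-party BeautifulSoup library; the URL and the "article h2" selector are hypothetical and would need to match the target page's actual markup.

```python
# Sketch of an HTML scraper: parse a page's structure and pull out data.
# Assumes BeautifulSoup is installed (pip install beautifulsoup4); the URL
# and the "article h2" selector are hypothetical.
from urllib.request import urlopen

from bs4 import BeautifulSoup


def scrape_headlines(url):
    html = urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    # Walk the parsed tree and extract the text of matching elements.
    return [tag.get_text(strip=True) for tag in soup.select("article h2")]


if __name__ == "__main__":
    for headline in scrape_headlines("https://example.com"):
        print(headline)
```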
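And for screen scraping, a sketch that drives a real browser with Selenium, assuming Selenium 4 and Chrome are installed; again the URL and selector are hypothetical. Because the browser renders the page just as it would for a person, this approach also picks up content that only appears after JavaScript runs.

```python
# Sketch of a screen scraper: drive a real browser the way a person would.
# Assumes Selenium 4 and Chrome are installed; the URL and CSS selector
# are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By


def scrape_like_a_user(url):
    driver = webdriver.Chrome()
    try:
        driver.get(url)  # load the page in a real browser session
        # Read the content after the browser has rendered it.
        elements = driver.find_elements(By.CSS_SELECTOR, "article h2")
        return [element.text for element in elements]
    finally:
        driver.quit()


if __name__ == "__main__":
    for item in scrape_like_a_user("https://example.com"):
        print(item)
```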