org.xenbase.scraper
Class Scraper_JCellBio
java.lang.Object
org.xenbase.scraper.BasicScraper
org.xenbase.scraper.Scraper_JCellBio
public class Scraper_JCellBio
- extends BasicScraper
Method Summary
java.lang.String
getRedirURL(java.lang.String url)
Because we are using URLs from PubMed and because each journal
publisher's website is different, we need to follow a series of HTTP
301 redirects, then search the resulting page to find the URL of the full
article.
ScrapedData
scrape(java.lang.String url)
This is the function that takes the URL (produced by getRedirURL)
and returns the images and captions of that article.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Scraper_JCellBio
public Scraper_JCellBio()
getRedirURL
public java.lang.String getRedirURL(java.lang.String url)
throws java.lang.Exception,
java.lang.Error
- Description copied from class:
BasicScraper
- Because we are using URLs from PubMed and because each journal
publisher's website is different, we need to follow a series of HTTP
301 redirects, then search the resulting page to find the URL of the full
article. For the same reason, this function needs its own implementation
for each journal publisher website.
- Specified by:
getRedirURL
in class BasicScraper
- Parameters:
url - URL to the full article, as obtained from PubMed
- Returns:
- String - the actual URL of the full journal article
- Throws:
java.lang.Exception
java.lang.Error
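The redirect-following step described above can be sketched roughly as follows. This is a minimal illustration and not the Scraper_JCellBio implementation: the class RedirectSketch, the Hop type, the fetch function, and the maxHops limit are all hypothetical names introduced here, and the HTTP request itself is abstracted behind a function so the loop can be read (and exercised) without a network connection. The publisher-specific search of the final page for the full-article link is not shown.

```java
import java.net.URI;
import java.util.function.Function;

public class RedirectSketch {

    /** Hypothetical stand-in for one HTTP response: status code plus Location header (null if absent). */
    public static final class Hop {
        final int status;
        final String location;

        public Hop(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /**
     * Follows 3xx redirects from startUrl until a non-redirect response is seen.
     * "fetch" is an assumed callback that performs one request without
     * auto-following redirects and reports the status and Location header.
     */
    public static String followRedirects(String startUrl, Function<String, Hop> fetch, int maxHops) {
        String current = startUrl;
        for (int i = 0; i < maxHops; i++) {
            Hop hop = fetch.apply(current);
            boolean isRedirect = hop.status >= 300 && hop.status < 400 && hop.location != null;
            if (!isRedirect) {
                return current; // landed on the final page
            }
            // Location may be relative, so resolve it against the current URL.
            current = URI.create(current).resolve(hop.location).toString();
        }
        throw new IllegalStateException("Too many redirects starting from " + startUrl);
    }
}
```

In a real scraper the fetch callback would wrap an HTTP client with automatic redirect handling turned off; the hop limit guards against redirect loops.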
scrape
public ScrapedData scrape(java.lang.String url)
throws java.lang.Exception,
java.lang.Error
- Description copied from class:
BasicScraper
- This is the function that takes the URL (produced by getRedirURL)
and returns the images and captions of that article. It is the core of the
scraper; because each journal lays out its web pages differently, different
string parsing is done for different journals.
- Specified by:
scrape
in class BasicScraper
- Returns:
- ScrapedData - the object containing all the images and captions
- Throws:
java.lang.Exception
java.lang.Error
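As an illustration of the kind of journal-specific string parsing that scrape performs, here is a small sketch that pulls image URLs and captions out of an HTML string. Everything in it is an assumption made for the example: the FigureParseSketch and Figure names are hypothetical, the markup pattern (an img tag followed by a p element with class "caption") is invented and is not the Journal of Cell Biology page layout, and the result is a plain list rather than the ScrapedData class, whose API is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FigureParseSketch {

    /** Hypothetical holder for one image URL and its caption text. */
    public static final class Figure {
        public final String imageUrl;
        public final String caption;

        public Figure(String imageUrl, String caption) {
            this.imageUrl = imageUrl;
            this.caption = caption;
        }
    }

    // Assumed markup: <img src="..."> immediately followed by <p class="caption">...</p>.
    private static final Pattern FIGURE = Pattern.compile(
            "<img[^>]*\\bsrc=\"([^\"]+)\"[^>]*>\\s*<p class=\"caption\">(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every image/caption pair found in the given HTML, in document order. */
    public static List<Figure> parseFigures(String html) {
        List<Figure> figures = new ArrayList<>();
        Matcher m = FIGURE.matcher(html);
        while (m.find()) {
            // Drop any inline tags (bold, italics, links) left inside the caption.
            String caption = m.group(2).replaceAll("<[^>]+>", "").trim();
            figures.add(new Figure(m.group(1), caption));
        }
        return figures;
    }
}
```

Regex matching like this is brittle against markup changes, which is consistent with the note above that each journal needs its own parsing.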