|JSTOR - Data for Research (DfR)||Provides a self-service system for text mining. By creating a free DfR account you can download the metadata, word frequencies, citations, key terms, and N-grams of up to 1,000 documents. To get larger datasets or a type of data not available through the main site, contact JSTOR directly: email@example.com (Guide for using DfR)|
|SpringerLink - free (through library subscription)||You can download subscribed and open access content for TDM purposes directly from the SpringerLink platform.
Full-text content can be accessed via friendly URLs. PDF: http://link.springer.com/[DOI].pdf or HTML (when available): http://link.springer.com/[DOI].html
Content can be downloaded via a web browser or with an HTTP GET request from a scripting tool. Researchers are asked to be considerate and limit their download rate to a reasonable level.
Please read Springer's instructions for more detailed information.
More information is also available from the Springer Nature API Portal and BioMed Central.|
|National Center for Biotechnology Information||Multiple collections of articles and abstracts from the National Library of Medicine. Click "Tools" or "Web APIs" on the top menu to learn more about accessing the data.|
|PLOS Search API (Public Library of Science)||Gives developers access to rich data that can be flexibly integrated into applications for the web, desktop or mobile devices.|
|HathiTrust||A partnership of academic and research institutions, offering a collection of millions of titles digitized from libraries around the world. See HathiTrust Datasets and the HathiTrust Research Center (HTRC) for information about text and data mining.
A newer HTRC derived dataset, Extracted Features 2.0, is also available with documentation. HTRC Extracted Features 2.0 is the most current version of a derived dataset consisting of metadata and data elements extracted from volumes in the HathiTrust Digital Library. The dataset is composed of 17+ million JSON files representing a snapshot of the HathiTrust corpus from February 2020.|
|Digital Public Library of America (DPLA)||Access to DPLA API Codex, Bulk Download, Technical Documentation, and Sample Code and Libraries.|
|Internet Archive||Instructions for developers and those interested in bulk download and API access.|
|Chronicling America (API)||Provides access to information about historic newspapers and select digitized newspaper pages. Search America's historic newspaper pages from 1789-1963 or use the U.S. Newspaper Directory to find information about American newspapers published from 1690 to the present.|
|Gale Eighteenth Century Collections Online (ECCO)||Available at the Text Creation Partnership (ECCO-TCP). Online searching at the partnership is free, but text mining the collection requires a fee to access the "raw" data.|
|Early English Books Online (EEBO) - free||The Text Creation Partnership creates standardized, accurate XML/SGML encoded electronic text editions of early print books. Phase I is freely available (25,000 titles).|
|Oxford English Dictionary and Oxford University Press (Oxford Scholarship Online) - free||Oxford accommodates TDM for non-commercial use. Researchers are not required to request permission for non-commercial text mining. If you have any questions, please email Data.Mining@oup.com|
|LexisNexis Academic (now Nexis Uni) - free/fee-based||Not specifically designed for text mining, but because text files can be downloaded many articles at a time, mining is possible. For larger datasets, contact LexisNexis to inquire about using or purchasing their "Data as a Service". LexisNexis also offers a personal consultation service for mining with its Bulk Content API.|
|CAP API (from Harvard Law Library) - free||The Caselaw Access Project API, also known as CAPAPI, serves all official US court cases published in books from 1658 to 2018. The collection includes over six million cases scanned from the Harvard Law Library shelves.|
|Adam Matthew Databases - free/fee-based||Source of unique primary-source digital collections that are readily available for mining. Researchers may contact Adam Matthew directly at firstname.lastname@example.org to discuss data mining requests or projects. See the linked page for more information.|
|Govinfo Bulk Data Repository||Bills, regulations, rules, and papers of the presidents of the U.S.|
|Royal Society of Chemistry (RSC)||Researchers should send the following information to email@example.com one month before the activity begins:
Start date, completion date, institution, crawler IP address, crawler user agent, types of content (HTML / PDF), institution contact email, and researcher contact email.|
|Wiley databases||Instructions can be found at the linked website.|
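
The SpringerLink entry above describes building full-text URLs from a DOI and fetching them with an HTTP GET at a considerate rate. The following is a minimal Python sketch of that idea, not Springer's official client. It uses the URL pattern quoted in the table; the function names, the 5-second delay, and the placeholder DOIs are illustrative assumptions, and Springer's own instructions should be checked for the actual rate expectations.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable

# URL pattern as given in the SpringerLink entry: http://link.springer.com/[DOI].pdf
BASE_URL = "http://link.springer.com"


def springer_url(doi: str, fmt: str = "pdf") -> str:
    """Build the friendly full-text URL for a DOI ('pdf' or 'html')."""
    if fmt not in ("pdf", "html"):
        raise ValueError("fmt must be 'pdf' or 'html'")
    return f"{BASE_URL}/{doi.strip()}.{fmt}"


def http_get(url: str) -> bytes:
    """Fetch a URL with a plain HTTP GET and return the response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(
    dois: Iterable[str],
    fmt: str = "pdf",
    delay_seconds: float = 5.0,  # assumed "reasonable" pause; not a documented limit
    fetch: Callable[[str], bytes] = http_get,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bytes]:
    """Download each DOI in turn, pausing between requests."""
    results: Dict[str, bytes] = {}
    for index, doi in enumerate(dois):
        if index > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results[doi] = fetch(springer_url(doi, fmt))
    return results


if __name__ == "__main__":
    # Placeholder DOI for illustration only; replace with DOIs your library can access.
    print(springer_url("10.1007/placeholder-doi"))
```

The `fetch` and `sleep` parameters are injectable so the loop can be tried without network access; in real use the defaults perform the GET requests and the pauses.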