Discussion Forum—A Way with Words, a fun radio show and podcast about language

Discussion Forum (Archived)

Some N-grams info
Guest
2016/04/23 - 4:18pm

The Google Ngram Viewer only includes n-grams that appear in at least 40 books. That means a phrase you search for might exist in the corpus but fall below that threshold, so it returns nothing. Other corpora may have a similar 40-citation cutoff. It looks like a restriction meant to keep the dataset from getting too large.
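
To make the threshold idea concrete, here is a rough sketch of how a minimum-book cutoff could drop a phrase that really is in the corpus. This is only an illustration: the record layout, the `filter_ngrams` helper, and the toy data are all made up, and it is not how Google actually builds its dataset.

```python
from collections import defaultdict

MIN_BOOKS = 40  # the cutoff mentioned above; everything else here is hypothetical


def filter_ngrams(records, min_books=MIN_BOOKS):
    """Keep only n-grams that occur in at least `min_books` distinct books.

    `records` is an iterable of (ngram, book_id) pairs (an assumed layout).
    Returns a dict mapping each surviving n-gram to its distinct-book count.
    """
    books_per_ngram = defaultdict(set)
    for ngram, book_id in records:
        books_per_ngram[ngram].add(book_id)
    return {
        ngram: len(book_ids)
        for ngram, book_ids in books_per_ngram.items()
        if len(book_ids) >= min_books
    }


# Toy data: one phrase found in 50 books, another found in only 12.
records = [("common phrase", i) for i in range(50)]
records += [("rare phrase", i) for i in range(12)]

kept = filter_ngrams(records)
print(kept)
```

The phrase found in 12 books is present in the raw records but never makes it into the filtered result, which is the situation where a search comes back empty.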

BYU hosts an impressive set of corpora that appear to complement, and in some ways even surpass, Google's Ngram Viewer.
Links on each corpus description page detail the differences and strengths of most of them.
The soap opera corpus is an unexpected and interesting one.

http://corpus.byu.edu/
corpora, size, queries = better resources, more insight

http://corpus.byu.edu/coca/x.asp?r1=&w=600&h=1024
CORPUS OF CONTEMPORARY AMERICAN ENGLISH
The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English.
The corpus contains more than 520 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words each year from 1990-2015 and the corpus is also updated regularly (the most recent texts are from December 2015). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language.

http://corpus.byu.edu/coha/x.asp?r1=&w=600&h=1024
CORPUS OF HISTORICAL AMERICAN ENGLISH
The Corpus of Historical American English (COHA) is the largest structured corpus of historical English.
COHA allows you to quickly and easily search more than 400 million words of text of American English from 1810 to 2009. You can see how words, phrases and grammatical constructions have increased or decreased in frequency, how words have changed meaning over time, and how stylistic changes have taken place in the language. It's a lot more than just frequency charts for individual words and phrases (like with Google Books / Culturomics) -- although those types of searches can be done here as well, and yield essentially the same results as Google Books.

http://corpus.byu.edu/bnc/x.asp?r1=&w=600&h=1024
BYU-BNC: BRITISH NATIONAL CORPUS
This website allows you to quickly and easily search the 100 million word British National Corpus (1970s-1993). The BNC was originally created by Oxford University Press in the 1980s - early 1990s, and now exists in various versions on the web.

http://corpus.byu.edu/soap/
CORPUS OF AMERICAN SOAP OPERAS: 100 MILLION WORDS, 1990-2012
(for very informal language)
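
Since these corpora are very different sizes, raw hit counts from one aren't directly comparable with another. A common workaround is to convert counts to hits per million words. Here is a minimal sketch, assuming the rounded corpus sizes quoted above; the `per_million` helper and the hit counts are made up for illustration.

```python
# Approximate corpus sizes in words, rounded from the descriptions above.
CORPUS_SIZES = {
    "COCA": 520_000_000,
    "COHA": 400_000_000,
    "BNC": 100_000_000,
    "SOAP": 100_000_000,
}


def per_million(hits, corpus):
    """Convert a raw hit count into hits per million words for `corpus`."""
    return hits * 1_000_000 / CORPUS_SIZES[corpus]


# Hypothetical raw counts for the same phrase in two corpora.
coca_rate = per_million(5200, "COCA")
soap_rate = per_million(1000, "SOAP")
print(f"COCA: {coca_rate:.1f} per million, SOAP: {soap_rate:.1f} per million")
```

With these invented numbers, 5,200 hits in COCA and 1,000 hits in the soap opera corpus work out to the same rate, even though the raw counts look very different.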
