Public Data Sets on AWS

Wikipedia Extraction (WEX)

A processed dump of the English language Wikipedia

Submitted By: Santiago@AWS  
US Snapshot ID (Linux/Unix): snap-1781757e
US Snapshot ID (Windows): snap-a6957ccf
Size: 66GB
Creation Date: 02/24/2009
Last Updated: 02/24/2009
License: GNU Free Documentation License 1.3
Source: Freebase

The Freebase Wikipedia Extraction (WEX) is a processed dump of the English-language Wikipedia. The wiki markup for each article is transformed into machine-readable XML, and common relational features such as templates, infoboxes, categories, article sections, and redirects are extracted in tabular form. Freebase WEX is provided as a set of database tables in TSV format for PostgreSQL, along with tables that map Wikipedia articles to Freebase topics and to their corresponding Freebase Types.
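
As a rough illustration of working with tab-separated tables like these outside a database, the sketch below reads rows into dictionaries. It assumes the files follow PostgreSQL's plain-text dump convention, where `\N` marks a NULL value; that is an assumption, not something stated on this page. The column names and sample rows are hypothetical and do not reflect the actual WEX schema, and backslash escapes other than `\N` are not decoded.

```python
import io


def read_tsv_rows(handle, columns):
    """Yield one dict per line of a tab-separated dump.

    Assumes one record per line, fields separated by tabs, and the
    literal two-character sequence \\N standing for NULL.
    """
    for line_number, line in enumerate(handle, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_number}: expected {len(columns)} fields, "
                f"got {len(fields)}"
            )
        yield {
            name: (None if value == r"\N" else value)
            for name, value in zip(columns, fields)
        }


# Hypothetical column names and sample data, for illustration only.
SAMPLE_COLUMNS = ("id", "name", "redirect_target")
SAMPLE_DATA = "1\tAlan Turing\t\\N\n2\tTuring, Alan\tAlan Turing\n"

rows = list(read_tsv_rows(io.StringIO(SAMPLE_DATA), SAMPLE_COLUMNS))
for row in rows:
    print(row)
```

To read a real file, pass an open file handle in place of the `io.StringIO` object, together with the column names taken from the table definitions shipped with the data set.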

Semantic extraction by freebase.com, using data from Wikipedia.org. Snapshots prepared by the infochimps.org team using community curated metadata. Released under the GNU Free Documentation License.
