Lists in Wikipedia
Wikipedia contains a lot of lists …
- Content reproduced from other articles
- Manually created
- Possibly outdated and prone to copy-and-paste errors
Can we find better solutions?
Do Categories help?
- English Wikipedia makes strong use of categories
- Many different uses (see Jakob Voss' paper)
- Classification by properties:
  - Category:Novels by Nick Hornby
  - Category:Lists of comedy-drama television series episodes
  - Category:1981 births < Category:1980s births < Category:20th century births
  - Category:Rivers of the Caribbean
  - Category:Bridges across the River Gipping
Properties in Semantic MediaWiki
- Shouldn't property values like «birthdate» just be entered directly?
- Compare Personendaten on German Wikipedia
- Semantic MediaWiki (SMW) introduces properties into MediaWiki:
  - written inline using link syntax [[property::value]]
  - like categories, properties have their own page namespace
  - many built-in datatypes (text, page, number, date, …)
- see demo
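To make the inline syntax concrete, here is a minimal Python sketch of how [[property::value]] annotations could be pulled out of page text. It is not SMW's own parser: the regular expression, the function name extract_properties, and the sample sentence are all assumptions made for illustration, and nested brackets are not handled.

```python
import re
from collections import defaultdict

# Matches [[property::value]] and [[property::value|label]].
# Assumption: no nested brackets inside an annotation.
ANNOTATION = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_properties(wikitext):
    """Return a dict mapping each property name to the list of its values."""
    properties = defaultdict(list)
    for name, value in ANNOTATION.findall(wikitext):
        properties[name.strip()].append(value.strip())
    return dict(properties)


# Hypothetical page text; the person and place are made up.
sample = (
    "Jane Example was born on [[birthdate::1 January 1981]] "
    "in [[birthplace::Exampletown|her home town]]."
)
print(extract_properties(sample))
```

Plain links such as [[Exampletown]] and category links such as [[Category:1981 births]] contain no double colon, so the pattern leaves them alone.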
Retrieving Semantic Information
Property information accessible in many ways:
Problems?
Problem 1: Performance
«But it will slow down Wikipedia even further!»
- All features can be switched on or off
- Query complexity can be restricted
- Large-scale tests on Wikipedia-size datasets
- Architecture for enabling outsourcing of complex computations
Problem 2: Conceptualisation
«No one knows which properties should be used. It will be chaos!»
- Similar to categories, so similar community mechanisms are required
- Queries help to consolidate properties
- Option to allow only selected properties
- Some chaos does not hurt much in a wiki
Problem 3: Editing complexity
«All this new stuff could repel contributors!»
- It is really just :: and [[...]]
- No new user interfaces
- Property syntax can be hidden in templates
- Compare complexity to formatting images, adding references, inserting templates, …
- WYSIWYG editing is the future
Applications
Eliminating redundancy
- Less redundancy, less replicated information
- Simpler information access for users
- Less maintenance effort for lists
Quality control
- Data can be checked directly
- Search for omissions, e.g. «Which cities do not have a population number?»
- Cross-check data items, e.g. «Is every country's capital also located in that country?»
- Compare data to external sources (government records, IMDB, library databases, …)
- Compare Wikipedias in different languages
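The two example checks above can be sketched in a few lines of Python once property values are available as data. This is only an illustration under stated assumptions: the pages dictionary, the property names "located in", "population" and "capital", and both helper functions are hypothetical and not part of SMW.

```python
# Hypothetical property data, as it might be collected from annotated pages.
pages = {
    "City A": {"located in": "Country X", "population": 500000},
    "City B": {"located in": "Country X"},  # population is missing
    "City C": {"located in": "Country Y", "population": 120000},
    "Country X": {"capital": "City A"},
    "Country Y": {"capital": "City B"},  # inconsistent: City B is in Country X
}


def missing_property(pages, marker, required):
    """Titles of pages that carry `marker` but lack `required`."""
    return sorted(
        title
        for title, props in pages.items()
        if marker in props and required not in props
    )


def misplaced_capitals(pages):
    """(country, capital) pairs where the capital is not located in the country."""
    problems = []
    for title, props in pages.items():
        capital = props.get("capital")
        if capital is None:
            continue
        if pages.get(capital, {}).get("located in") != title:
            problems.append((title, capital))
    return sorted(problems)


print(missing_property(pages, "located in", "population"))
print(misplaced_capitals(pages))
```

With the sample data, the first check reports City B and the second reports the pair (Country Y, City B).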
Multi-lingual Access
- Articles, categories, and properties all have pages
- Pages have interlanguage links
- Simple translation of semantic data possible
- Browse content of Wikipedia editions in all languages
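Because subjects, properties, and page-valued data all have pages, a statement could in principle be translated by following interlanguage links term by term. The Python sketch below illustrates that idea only; the links table, its German titles, and the functions translate and translate_fact are assumptions for this example, not an SMW interface.

```python
# Hypothetical interlanguage links: page title -> {language code: title there}.
links = {
    "Germany": {"de": "Deutschland"},
    "capital of": {"de": "Hauptstadt von"},
    "Berlin": {"de": "Berlin"},
}


def translate(term, lang, links):
    """Follow the interlanguage link if one exists, else keep the original term."""
    return links.get(term, {}).get(lang, term)


def translate_fact(fact, lang, links):
    """Translate a (subject, property, value) triple term by term."""
    return tuple(translate(term, lang, links) for term in fact)


fact = ("Berlin", "capital of", "Germany")
print(translate_fact(fact, "de", links))
```

Terms without an interlanguage link fall through unchanged, so a partially linked statement still yields a readable, if mixed-language, result.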
Outlook
Software status
- Modular extension of MediaWiki
- Current release is SMW 0.7
- Translations in 8 languages (De, En, Es, Fr, He, Pl, Ru, Sk)
- Used in many wikis, both public and private
- Active user community
Release plan
- SMW 1.0 expected by the end of September (at least RC1)
- Major new features:
  - Simplified annotation (just «Properties» instead of «Relations» and «Attributes»)
  - Simplified datatype system
  - More expressive queries
  - Faster
  - More extensible code base
- Future development in upcoming research projects
Related projects
- Semantic Forms extension
- METAVID
- Halo Project
Next steps
- Finish SMW 1.0
- Find a Wikipedia edition to enable it first
«Be bold in enabling extensions!»