First presentation to Canonical Debate Lab

Presentation to Canonical Debate Lab. A good starting point for overall goals, notes below.

State of HyperKnowledge 2020-06

Agenda

Motivations for HK
Specific goals
State of solutions
Applicability to CD

Motivation

Ultimate goal: deliberative democracy. Better conversations about social goals and means.
Local and global organizing. Fractal democracy.
Many projects can gain from collective intelligence, but valid objections by experts and/or the community get drowned out by trolls & disinformation. Make it clear when a position has no backing, or is attacked by an argument that is not itself credibly attacked.
Build bridges between silos so we can learn from diverse communities and diverse points of view.

Specific goals

Low barrier to entry
- Structuring information is a skill. People without that skill have useful things to contribute. Start with informal conversation.
Large-scale conversations
- Fractal views: how to get an intelligible synthesis of what is said on specific issues. The conversation is too large to absorb. Requires structure.
- Composite view: easier to show multiple points of view on graph than text.
  - Text is still vital to make any one viewpoint intelligible.
- Many people use different vocabularies. Topic unification is key to gaining from diversity.
- All text is indexed by topic (semantic elements, below). The text content and reply hierarchies are important for navigation and communication, but sensemaking and exploration proceeds through topics.
Ecosystem of interoperable tools working on concept graphs
- Experiment with multiple views, moderation and voting techniques, etc. (Experiments because most naive approaches are known to fail.) There will never be one ultimate tool.
- Communities have conversations on platforms. Shifting platforms is hard. We need to build bridges.
- People need to have semi-private space even on public discourse, because they fear being held accountable for half-baked ideas. So they want to own their data. Hosting everything on one server is a non-starter.
  - Eg: TopicQuest interaction model leverages the observation that you can partially shield the conversation from flame wars if you contain the conflict within semi-private guild spaces.
Resilience to attacks
- argument barrage
- false equivalence
- disinformation pollution

Specific technical goals

HyperKnowledge map: Topic Map plus some aspects taken from Sowa's Conceptual graphs
- Topic: anything that can be talked about is a topic. N-ary associations between topics (with named roles) are topics.
  - Topics form a generalized hypergraph: an association can point to another association. Why is this necessary? Because any claim can be brought to scrutiny, and that includes claims about claims about claims.
- A key feature of topic maps is the possibility to do a posteriori topic merging when two topics are found to refer to the same thing/idea in a given context.
- The topic is not the term that designates it. (In Peircean terms, topics are signified, not signifiers.)
- Conceptual graph: Sowa includes a lot of formal logic which is out-of-scope for me, but includes the possibility to refer to a universe of discourse, i.e. a subgraph, as a unit. Important especially for hypothetical spaces (compare Fauconnier's mental spaces) or simply subjunctive clauses.
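A minimal sketch of the generalized hypergraph described above, where an association is itself a topic and can therefore fill a role in another association. The class names, the role names and the "t:"/"a:" identifiers are illustrative assumptions, not part of any HK specification.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Topic:
    id: str
    label: str = ""  # display name only; the topic is not its name


@dataclass(frozen=True)
class Association(Topic):
    # Named roles, each filled by the id of another topic (possibly an association).
    roles: Tuple[Tuple[str, str], ...] = ()


def nesting_depth(topic_id: str, graph: Dict[str, Topic]) -> int:
    """0 for a plain topic, 1 for a claim, 2 for a claim about a claim, and so on.

    Assumes the graph is acyclic; a cyclic graph would recurse forever.
    """
    topic = graph[topic_id]
    if not isinstance(topic, Association):
        return 0
    return 1 + max((nesting_depth(t, graph) for _, t in topic.roles), default=0)


graph: Dict[str, Topic] = {}
for t in [
    Topic("t:earth", "Earth"),
    Topic("t:flat", "flat"),
    Topic("t:alice", "Alice"),
    Association("a:1", "Earth is flat",
                roles=(("subject", "t:earth"), ("property", "t:flat"))),
    # A claim about a claim: the target role is filled by association a:1.
    Association("a:2", "Alice disputes a:1",
                roles=(("agent", "t:alice"), ("target", "a:1"))),
]:
    graph[t.id] = t
```

Here `nesting_depth("a:2", graph)` follows the target role into `a:1`, which is what lets any claim be brought to scrutiny in turn.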
Event sourcing, because we need to make stable point-in-time references to evolving concept graphs. But also we want to subscribe to changes.
Multi-source. A stream of events represents a perspective (personal or community.) How individual perspectives are unified in a community perspective depends on specific techno-social community process.
Git-inspired fork and merge semantics. A git of knowledge representation. Creating viewpoint streams should be as easy as creating branches in git.
Peer asymmetry. Global federation will be a server farm, but a small process in the browser is also a peer. So essential to define slices of the event stream, as we cannot assume we can swallow the firehose.
Federated query. Who has events (i.e. associations, i.e. claims...) about a given topic?
Some topics are "pure" concepts, i.e. do not represent a resource in the RDF sense, i.e. nobody "owns" a canonical representation. We're all building our perspective's view.
It should be possible to do federated queries that will retrieve topics independently of chosen identifiers. This includes identity-independence in atoms of compound statements.
Microservices can consume an event stream and enrich it with computations, either through an api or, better, in the stream. This allows for a distributed reactive architecture, with all the benefits and subtle pitfalls. We need to include cycle detection in the design.
Topics can be anchored in informal discourse (known as topic occurrence). We want to create content (text, media) that explains some topics with explicit embedded topic references. More importantly, topic references can be identified post hoc in text, video, etc., including on proprietary platforms.
- We need a data format for text that embeds possibly-overlapping anchors, and is somewhat resistant to edits.
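One possible shape for such anchors, sketched loosely after the quote-plus-context idea of the Web Annotation TextQuoteSelector (exact text with a prefix and a suffix). The anchors are stored outside the text, so they may overlap freely. The `Anchor` class and both helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Anchor:
    topic_id: str
    exact: str        # the anchored text itself
    prefix: str = ""  # a little context before it
    suffix: str = ""  # a little context after it


def make_anchor(text: str, start: int, end: int, topic_id: str,
                context: int = 10) -> Anchor:
    return Anchor(topic_id, text[start:end],
                  text[max(0, start - context):start],
                  text[end:end + context])


def resolve(anchor: Anchor, text: str) -> Optional[Tuple[int, int]]:
    """Find the anchored span again, preferring a match with its recorded context."""
    pos = text.find(anchor.prefix + anchor.exact + anchor.suffix)
    if pos >= 0:
        start = pos + len(anchor.prefix)
        return start, start + len(anchor.exact)
    # The context was edited: fall back to the first occurrence of the quote.
    pos = text.find(anchor.exact)
    if pos >= 0:
        return pos, pos + len(anchor.exact)
    return None  # the anchored text itself was edited away


original = "Fractal democracy needs local organizing."
a1 = make_anchor(original, 0, 17, "t:fractal-democracy")  # "Fractal democracy"
a2 = make_anchor(original, 8, 23, "t:needs")               # "democracy needs", overlaps a1
edited = "We think that " + original                        # an edit before both anchors
```

Both anchors still resolve in `edited` because they are located by content rather than by offset. An edit inside the quoted text itself makes `resolve` return `None`, which is where a fuzzier matcher would be needed.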
Ties with personal identity: tie together accounts on multiple platforms where content may happen, maybe look at distributed identity, etc.

Non-goals and anti-goals

Automated reasoning is out of scope.
- Having a data structure that makes automated reasoning easier is desired. We're looking at datalog variants as a model.
Canonical vocabulary is an anti-goal. Vocabularies evolve under the pressure of new and diverse needs.

Technical model

Viewpoint Streams: stream of events and/or state representing a viewpoint. A server may host multiple streams.
- Metadata representation of viewpoint stream must give endpoints below.
- Also meta-information, eg is this stream derived from / following another stream
Topic identifiers
- We need to refer to topics, so we need identifiers that uniquely refer to the topic.
- We can decide that two identifiers refer to a single topic. Two main use cases:
  - In a federation context, new topics are generated all the time, and we cannot expect a priori agreement on identifiers, so we will often deal with identifiers local to a viewpoint stream.
    - Some identifiers will be agreed upon by many streams (eg RDF vocabularies), but we should never assume they are universal.
  - Even within a single viewpoint stream, we may decide that two topics which were thought to be different refer to the same entity/concept, or vice versa. So it must be possible to merge or split topics after they were created. This is even more true with references to "foreign" topics (i.e. identifiers local to another viewpoint stream.)
- There is no guarantee that two perspectives will agree whether to merge any two given topics. (Topic merging can be used aggressively, e.g. "socialism is fascism.") So topic merging is local to a viewpoint stream.
  - I can make a federated query with one identifier and get replies using a different identifier.
- Each viewpoint stream should maintain a single canonical identifier for any topic; it should be easy to query for the canonical identifier given any identifier. (Useful for federated query, below.)
- Topic identifiers need not be locators (i.e. they could be URNs and not URLs). (Linked data prefers URLs as identifiers, but it assumes that there is a single owner for the resource, vs. a shared object of discourse.) On the other hand, canonical identifiers for events (though they are also topic identifiers) must be URLs, including the viewpoint stream URL, and it should be possible to infer a partial order of events from event identifiers alone.
  - There is no requirement that the URL be DNS-based. Servers are fragile. Distributed Web identifiers are URLs too, in that you can use them to retrieve data.
- Further, people will often refer to topics through ambiguous linguistic expressions. Those names can refer to the topic, sometimes uniquely, but generally this is a many-to-many relationship, which is likely to be fragile. Ideally, when dealing with text, names should be replaced by identifiers as soon as possible. Linguistic names can be used in topic queries, but expect multiple answers.
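The rule that merging is local to a viewpoint stream, with one canonical identifier per topic, can be sketched as a small union-find structure held by each stream. This is only an illustration: the class name and the "urn:alice:"/"urn:bob:" identifiers are made up, and plain union-find does not support the split operation mentioned above.

```python
class TopicIdentity:
    """Equivalence classes of topic identifiers, local to one viewpoint stream."""

    def __init__(self) -> None:
        self._parent = {}

    def canonical(self, ident: str) -> str:
        """Return this stream's canonical identifier for any known alias."""
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            ident = self._parent[ident]
        return ident

    def merge(self, keep: str, other: str) -> None:
        """Declare that both identifiers name one topic; `keep` stays canonical."""
        self._parent[self.canonical(other)] = self.canonical(keep)


# Two streams see the same two identifiers but disagree about merging them.
stream_a = TopicIdentity()
stream_b = TopicIdentity()
stream_a.merge("urn:alice:climate-change", "urn:bob:global-warming")

in_a = stream_a.canonical("urn:bob:global-warming")  # resolves to Alice's identifier
in_b = stream_b.canonical("urn:bob:global-warming")  # stays as it was
```

A federated query can then carry whatever identifier the asker uses, and each stream translates it through its own `canonical` lookup before answering.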
Event types:
- Simple Graph Event:
  - Event identity: partially ordered, based on Stream URL
  - topic attribute binding or association
  - Event may replace other events
  - Event creator (may be computational)
  - truth value in [-1,1] (reference to Rönnbäck)
  - event creation time
  - event applicability time
    - the event time represents the moment the viewpoint stream becomes aware of a fact. It may refer to a real-world event with an entirely different temporality
  - Signed? Encrypted? Still thinking those through.
- Social Graph Event:
  - represents a social aggregate of simple graph events. A computational creator should have a representation of its algorithm.
  - Must contain a representation of multiple values and some statistical properties
  - Must be possible to obtain references to individual graph events, but probably indirectly (eg tree of last single event and previous social graph event), and reference does not imply access (e.g. of encrypted vote, possibly with zero-knowledge proof. How to do voting is a separate discussion.)
- Literal value creation (by reference for large documents) or modification event (e.g. text edition event, depends on literal type).
- High-level semantic event. Is decomposed to lower-level atomic events (graph or literal), but inserted in the event stream to help interpret the sequence of actions. Can also provide transactional or saga semantics. Can be ignored by event reducers (in the redux sense.)
- Merge event (events defined by a frozen slice of another stream are considered appended to the current stream). If merge events are themselves merged, materialize them explicitly. Events from the other stream may be copied into a new queue (e.g. to translate foreign topic names to local names.)
- Open issue: what of IP associated with some data? We may need to carry license metadata... But it can also be a vector of attack.
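The fields listed for a simple graph event could be carried by a record like the one below. This is a sketch under stated assumptions: the field names, the `<stream URL>/events/<sequence number>` identifier layout and the `precedes` helper are all hypothetical, chosen only to show how a partial order can be read from identifiers alone.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class SimpleGraphEvent:
    stream_url: str                   # the viewpoint stream that holds the event
    seq: int                          # position within that stream (assumed layout)
    creator: str                      # a person or a computational agent
    claim: Tuple[str, str, str]       # (topic, attribute or role, value or topic)
    truth: float = 1.0                # truth value in [-1, 1]
    replaces: Tuple[str, ...] = ()    # identifiers of events this one replaces
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    applies_at: Optional[datetime] = None  # applicability time, if different

    def __post_init__(self) -> None:
        if not -1.0 <= self.truth <= 1.0:
            raise ValueError("truth value must lie in [-1, 1]")

    @property
    def id(self) -> str:
        return f"{self.stream_url}/events/{self.seq}"


def precedes(a_id: str, b_id: str) -> Optional[bool]:
    """Order two events by identifier alone; None means they are incomparable."""
    a_stream, _, a_seq = a_id.rpartition("/events/")
    b_stream, _, b_seq = b_id.rpartition("/events/")
    if a_stream != b_stream:
        return None  # different streams: no order without a merge event
    return int(a_seq) < int(b_seq)


e1 = SimpleGraphEvent("https://a.example/stream", 1, "urn:alice",
                      ("t:earth", "shape", "t:round"), truth=0.9)
e2 = SimpleGraphEvent("https://a.example/stream", 2, "urn:alice",
                      ("t:earth", "shape", "t:round"), truth=1.0,
                      replaces=(e1.id,))
e3 = SimpleGraphEvent("https://b.example/stream", 1, "urn:bob",
                      ("t:earth", "shape", "t:flat"), truth=-1.0)
```

Events on the same stream are ordered by sequence number, while `precedes(e1.id, e3.id)` returns `None`: the order is partial, and only a merge event would relate the two streams.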
Open issue: Cache slices of foreign queues to avoid being dependent on foreign server? Think of a distributed redundancy mechanism? Probably build on IPFS or DAT here.
- This is a federation-level service, most clients could ignore this layer. But some requests have to fallback from server to federation queries.
API points for a viewpoint stream:
- API points sound like there should be a server, but the most important of these could be built as distributed web protocols.
  - topic identity: viewpoint stream must tell us if it knows a topic under another name locally. (I.e. which is the canonical name for that topic according to this stream)
  - stream discovery: what other streams do you know that have events about this topic?
    - Ideally such a query should be protocol-level, like a DHT
    - Must be alias-aware
    - Some ideas about using Bloom filters for this, which would make this API probabilistic
  - Queue slice query and subscription
    - Events can be obtained from a server, a distributed web technology (IPLD, hypercore, holochain...) etc. It should not matter.
    - Slices based on named (materialized) collections
    - Those collections should have path-based rules for monotonic growth
    - Must be possible to refer to collection state at time T (frozen collection) for merge events
    - Events since a known event time
    - Since this is a basis for reactive event chaining, we need loop detection and backpressure
- Those API points probably assume a server
  - Topic snapshots: all "head" events about a given topic (or its local alias.)
    - head events i.e. not replaced by a later event.
    - Computed and cached periodically for faster retrieval of past state
  - Propose events to be merged (push)
  - Propose to merge events from (a slice of) another datasource.
    - Open question: how to validate such events? What should consistency checks look like? What's the flow of accepting/refusing events?
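For the stream discovery point above, the Bloom-based idea is read here as a Bloom filter, which is an assumption. A stream could publish such a filter over the topic identifiers it holds events about, aliases included, and a peer could test it before sending a real query. The class, its parameters and the identifiers are illustrative only.

```python
import hashlib


class BloomFilter:
    """Set summary with possible false positives but no false negatives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3) -> None:
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive several bit positions by salting one cryptographic hash.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


# A stream advertises every identifier it knows for its topics, aliases included,
# which is one way to keep discovery alias-aware.
advert = BloomFilter()
for ident in ["urn:alice:climate-change", "urn:bob:global-warming"]:
    advert.add(ident)
```

A `True` from `might_contain` only means "worth asking", since unrelated identifiers can collide. A `False` is definitive, which is what makes the API probabilistic in one direction only.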
Thanks to social graph events and ordering given by explicit event invalidation, the reduction of events to a state is highly sequence-independent and can be treated as a CRDT. We should aim for CRDT properties for literal modification events as well.
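A small sketch of why explicit invalidation makes the reduction order-independent. The state is a pair of grow-only sets (events seen, events replaced), similar in spirit to a two-phase set, and the head events are their difference. The function names and the dictionary shape of an event are assumptions made for this example.

```python
from itertools import permutations

EMPTY = (frozenset(), frozenset())  # (ids of events seen, ids of events replaced)


def apply_event(state, event):
    """Fold one event into the state; set union makes the order irrelevant."""
    seen, replaced = state
    return (seen | {event["id"]}, replaced | set(event["replaces"]))


def heads(state):
    """Head events: seen, and not replaced by any other known event."""
    seen, replaced = state
    return seen - replaced


def reduce_events(events):
    state = EMPTY
    for event in events:
        state = apply_event(state, event)
    return state


events = [
    {"id": "e1", "replaces": []},
    {"id": "e2", "replaces": ["e1"]},  # e2 invalidates e1
    {"id": "e3", "replaces": []},
]

# Every delivery order reduces to the same set of head events.
results = {heads(reduce_events(order)) for order in permutations(events)}
```

Because union is commutative, associative and idempotent, replaying or reordering events cannot change the result, which is the CRDT property the paragraph above relies on. Literal modification events such as text edits would need a richer structure than this.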
Possibility: The claim (association/attribute) within the event structure is an addressable topic, and should have a content-addressable hash identity (as in IPLD). If the local topic identities follow a canonical scheme, those claims will be equally normalized. It becomes easier to make queries for such higher-order topics across streams using the same canonical scheme. (Literals are also treated as hashes in this case)
- Even otherwise, a query for such a compound topic would ship the topic, and translation to local names can be made by the receiving server.
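The content-addressing possibility can be illustrated by hashing a claim after its role fillers have been rewritten to canonical identifiers. This sketch uses a SHA-256 digest of sorted JSON as a stand-in; it is not the actual IPLD encoding, and `claim_hash`, `canonicalizer` and the "urn:" identifiers are hypothetical.

```python
import hashlib
import json


def claim_hash(kind, roles, canonical):
    """Hash a claim after mapping its kind and role fillers to canonical ids."""
    normalized = {
        "kind": canonical(kind),
        "roles": sorted([role, canonical(filler)] for role, filler in roles.items()),
    }
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def canonicalizer(aliases):
    """Build a lookup that leaves unknown identifiers unchanged."""
    return lambda ident: aliases.get(ident, ident)


# Two streams use different local names but share one canonical scheme.
alice = canonicalizer({"urn:alice:co2": "urn:shared:carbon-dioxide",
                       "urn:alice:ghg": "urn:shared:greenhouse-gas"})
bob = canonicalizer({"urn:bob:CO2": "urn:shared:carbon-dioxide",
                     "urn:bob:greenhouse": "urn:shared:greenhouse-gas"})

h_alice = claim_hash("instanceOf",
                     {"instance": "urn:alice:co2", "class": "urn:alice:ghg"}, alice)
h_bob = claim_hash("instanceOf",
                   {"class": "urn:bob:greenhouse", "instance": "urn:bob:CO2"}, bob)
```

Both streams arrive at the same hash for the same claim, so a query for the higher-order topic "this claim" can be matched across streams without shipping and translating the whole structure.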
Base vocabulary close to OWL: sameAs, instanceOf, subclassOf, subPropertyOf, etc.
- If a topic represents a document, make that an explicit property (refersTo?). This is very different from LD, where a resource URL leads to the resource's representation.
- Interop with RDF is a half-hearted goal, but pointing to a named claim means we cannot have transparent round-trip. Still, the subset of RDF that fits in HK should be usable fairly transparently. Because Wikidata has claim identifiers (only one level), round-trip should be more robust.
View component architecture, fed by ecosystem microservices. (WebComponent based?)
Concretely, which DB? That's an implementation detail, but I want to abstract many considerations. This must live on either server or client, so ideally should be composable from KV stores. Basically for each stream, we have the event queue, the name equivalence classes, and an index (KV-store) with (topicID, time) pointing to the event IDs. (I may want a single physical event queue for multiple viewpoint sources' virtual queues; then index by (sourceId, topicId, time).)
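A minimal in-memory sketch of the (topicID, time) index, emulating the range scan of an ordered KV store with a sorted list and `bisect`. The class and method names are illustrative, and times are plain integers here for simplicity.

```python
import bisect


class TopicTimeIndex:
    """Ordered index from (topic id, time) to event ids, as in an ordered KV store."""

    def __init__(self) -> None:
        self._keys = []  # sorted list of (topic_id, time, event_id) tuples

    def put(self, topic_id: str, time: int, event_id: str) -> None:
        bisect.insort(self._keys, (topic_id, time, event_id))

    def events_since(self, topic_id: str, since: int):
        """Event ids about one topic with time >= since, in time order."""
        # (topic_id, since) sorts just before every (topic_id, since, event_id).
        start = bisect.bisect_left(self._keys, (topic_id, since))
        found = []
        for key_topic, _, event_id in self._keys[start:]:
            if key_topic != topic_id:
                break  # left this topic's key range
            found.append(event_id)
        return found


index = TopicTimeIndex()
index.put("t:a", 1, "e1")
index.put("t:b", 2, "e9")
index.put("t:a", 3, "e3")
index.put("t:a", 2, "e2")

recent = index.events_since("t:a", 2)  # ["e2", "e3"]
```

Prefixing each key with a source identifier would give the (sourceId, topicId, time) variant for several virtual queues sharing one physical queue.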
Open questions: identity (DID?), permission model, collection data structures (besides obvious membership association), front-end component architecture, text anchoring (though that is likely to build on WebAnnotation)...

Applicability to CD

Need to refer to foreign references (for evidence) -> topic reference
Multiple sources of evidence -> topic merge
Will probably maintain a single (canonical) map of which topics are merged (vs maintaining multiple contentious merges in different viewpoint streams.)
- Must think of a mechanism to handle contentious unification. But decision by fiat viable in context.
Claims already have local identity, which is not based on component topics. Probably keep it that way in near future, can be added as unification.
Argument edges (Claim applications) are treated as nodes, as a single level of reification.
- Note: two levels, but different levels than wikidata, which names entities and claims, while CD names claims and arguments (and is considering entities).
- How much do I need higher levels? Can the argument on argument always be reified? What about property assignments? Maybe to discuss whether two arguments are conflicting... Can that be done socially?
Is social aggregate part of CD representation? (probably computed vs reified)

References

Information technology - common logic (cl): a framework for a family of logic-based languages. Technical Report ISO/IEC 24707:2018(E), ISO/IEC, Geneva, 07 2018.
Felix Dietze, André Calero Valdez, Johannes Karoff, Christoph Greven, Ulrik Schroeder, and Martina Ziefle. That's so meta! Usability of a Hypergraph-based Discussion Model, in Proceedings of the International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, 2017
Gilles Fauconnier. Mental Spaces: Aspects of Meaning Construction in Natural Language. Cambridge University Press, New York, NY, 1994. isbn: 978-0521449496
Ben Goertzel, Cassio Pennachin, and Nil Geisweiller. Engineering General Intelligence, volume 5-6 of Atlantis Thinking Machines. Atlantis Press, 1 edition, 2014. isbn: 978-9462390294
Lars Rönnbäck. Modeling conflicting, unreliable, and varying information. 12 2018.
John F. Sowa. Handbook of Knowledge Representation, chapter Conceptual Graphs, pages 213--237. Elsevier, 2008. isbn: 9780444522115
Patrick Durusau, Steven R. Newcomb, and Robert Barta. Topic maps reference model. ISO standard 13250-5 CD, 11 2007.
Linas Vepštas. Sheaves: A topological approach to big data. CoRR, abs/1901.01341, 2019.
Chris Gebhardt, InfoCentral