SWAT4HCLS virtual hackathon

Welcome to the virtual hackathon of SWAT4HCLS 2021

This hackathon is a two-day event, hosted on January 15th and January 19th 2021. During these two days, participants will be able to discuss, hack and/or collaborate on various topics related to the Semantic Web in the life sciences.


We will use the following platforms for online collaboration:


The opening and wrapping up sessions will be hosted on zoom: https://us02web.zoom.us/j/84103126982?pwd=QzVSenhRTHZFQk9paEZ6ZEhjVXJlZz09


On Discord we have created a SWAT4HCLS section where text and voice conversations can be hosted; within the voice channels it is possible to share screens. The SWAT4HCLS section can be accessed by following this link. A set of channels has already been created.


Inspired by the previous edition of the BioHackathon, we are also hosting the event on Remo.

Live streaming of presentations and tutorials

It is possible to present results or give a short tutorial or demo. If you are interested in this opportunity, please reach out to us on Discord to discuss a streaming session. These sessions will be streamed live and will remain available online afterward.


The event is free, but registration is required. Please register through Eventbrite.


Friday January 15th

Times are CET (see calendar for other timezones)
| time | description | links |
| --- | --- | --- |
| 09:00 | welcome, introduction and pitches | Zoom |
| | hacking, collaborating, writing papers, etc. | Discord, Remo |
| 12:30 | report back | Zoom |
| | hacking, collaborating, writing papers, etc. | Discord, Remo |
| 17:00 | report back | Zoom |
| | hacking, collaborating, writing papers, etc. | Discord, Remo |
| 21:00 | wrapping up | Zoom |
| | hacking, collaborating, writing papers, etc. | Discord, Remo |

Tuesday January 19th

Times are CET (see calendar for other timezones)
| time | description | links |
| --- | --- | --- |
| 09:00 | welcome, introduction and pitches | Zoom |
| | hacking, collaborating, writing papers, etc. | |
| 12:30 | report back | Zoom |
| | hacking, collaborating, writing papers, etc. | |
| 17:00 | report back | Zoom |
| | hacking, collaborating, writing papers, etc. | |
| 21:00 | wrapping up | Zoom |
| | hacking, collaborating, writing papers, etc. | |


Pitches for what to work on are welcome. Please add your pitch below or in the following GDoc.

Title: Using FAIR in healthcare

SWAT4HCLS has recently expanded into the healthcare realm. Yet, despite efforts from many people, including myself, FAIR is still not widely used in healthcare. This has many causes, but making data FAIR remains one of the biggest bottlenecks to implementing FAIR. I would therefore like to brainstorm about strategies that would make this process easier and thereby gain more traction for FAIR in clinical practice.

Discord: chat and voice channels. Pitch by Rianne Fijten

Title: Creating subsets from Wikidata.

During past editions (2019–2020) of both the biohackathons and SWAT4HCLS we worked on extracting subsets. During the last virtual BioHackathon we were able to extract subsets from Wikidata using Shape Expressions. We would like to use the SWAT4HCLS hackathon to continue this work, i.e. finalizing the above-mentioned pipeline, but also deploying the workflow on a set of use cases.

Channel: chat. Pitch by Jose Emilio Labra Gayo

Continue collaboration around subsetting and layering of open knowledge graph data using Docker packaging, Wikidata subsetting, and schema-mapping strategies to make it easier to mix and match SPARQL databases and datasets. For example: extracting life-sciences-oriented Wikidata content, translating it to the Bioschemas/schema.org vocabulary, and mixing it with crawled Bioschemas data from proteomics websites. Discussion topics could include best practices and documentation for packaging (Docker etc.) the combined datasets alongside software such as SPARQL databases and query clients.

Channel: chat. Pitch by Dan Brickley

Title: Expressing Wikidata as indexed binary RDF

Pursuant to Wikidata subsets, the [HDT (Header Dictionary Triples)](https://www.rdfhdt.org/) binary RDF format provides random access to RDF data. This allows programs to work with large datasets using limited memory. The goal is to express Wikidata in HDT and develop tools (e.g. ShEx validators) to consume it without burdening the Blazegraph SPARQL server.

Channel: chat. Pitch by Eric Prud'hommeaux

YouTube: https://youtu.be/Wd_sSecDAE8

Title: GitHub Actions to check Semantic Web artifacts

This pitch is inspired by pull-request review work done in this semantic model repository. During the review process we performed systematic manual checks, such as validating the syntax of ShEx and Turtle files and removing redundant content and unused prefixes. In this hackathon we would like to explore GitHub Actions that automate some of these checks.

Channel: chat. Pitch by Rajaram Kaliyaperumal

YouTube: https://youtu.be/Ou4wBtSLbDQ

Title: Linking Complex Portal with other FAIR resources

Complex Portal is a resource that collects and curates knowledge about macromolecular complexes in model organisms. During this hackathon we propose a project to make the Complex Portal more interoperable/FAIR. This includes linking Complex Portal entries with linked data resources such as Wikidata, WikiPathways, Scholia and the (COVID-19) disease map projects. Please join us…

Channel: chat. Pitch by Birgit Meldal and Egon Willighagen

YouTube: https://youtu.be/g_YgVZgSzYQ

Title: Contextual phenotype identification in any text related to medical conditions

Help us collect health-related content from Twitter, Reddit and other social media. Especially during the COVID-19 pandemic, user-generated content is becoming the most relevant and up-to-date source of health information. We would like to use these data to benchmark our self-supervised phenotyping methodology. During the hackathon, we will present a demo assessment of our methodology on clinical data.

Channel: chat. Pitch by Vibhor Gupta

YouTube: https://youtu.be/8ebZ7q81wbw

Title: ShapePaths to Access Schemas and the Data They Validate

ShEx has a practical JSON structure which could in principle be accessed via some JSON path language (e.g. [this one](https://support.smartbear.com/alertsite/docs/monitors/api/endpoint/jsonpath.html)). However, the things we want to say with such a path language (identify triple constraints in a shape, create error reports that specify the navigation of shapes and properties that produced the error, extract triples and RDF terms from RDF graphs validated by parts of a schema identified by a path) call for a more specialized syntax that is both terse and intuitive. The goal is to continue development of the ShapePath language, driven by use cases.
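To illustrate the kind of navigation a ShapePath expression would state in one terse step, here is a plain-Python walk over a simplified ShExJ-like structure that collects all triple constraints (the field names follow ShExJ conventions, but the schema itself is made up for this sketch):

```python
def find_triple_constraints(node):
    """Recursively collect every object whose "type" is
    "TripleConstraint" -- the kind of query a ShapePath
    would express in a single short path."""
    found = []
    if isinstance(node, dict):
        if node.get("type") == "TripleConstraint":
            found.append(node)
        for value in node.values():
            found.extend(find_triple_constraints(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_triple_constraints(item))
    return found

# Simplified ShExJ-like schema (illustrative, not a real schema).
schema = {
    "type": "Schema",
    "shapes": [{
        "type": "Shape",
        "id": "http://example.org/UserShape",
        "expression": {
            "type": "EachOf",
            "expressions": [
                {"type": "TripleConstraint",
                 "predicate": "http://schema.org/name"},
                {"type": "TripleConstraint",
                 "predicate": "http://schema.org/email"},
            ],
        },
    }],
}

predicates = [tc["predicate"] for tc in find_triple_constraints(schema)]
print(predicates)
```

A generic recursive walk like this finds everything; the point of ShapePath is to select only the constraints reached along a named path through shapes and properties, which a blind traversal cannot express.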

Channel: chat. Pitch by Eric Prud'hommeaux

Title: Putting the UMLS Metathesaurus into a Wikibase instance

UMLS combines several well-known terminologies, such as ICD and SNOMED, into one big metathesaurus. Having triplestore and Elasticsearch capabilities behind such a rich resource could be of great value. The idea is to create a workflow Docker image that takes a downloaded UMLS release as input and runs in combination with a Wikibase Docker instance. The workflow image would create the necessary predicates and then load the whole UMLS Metathesaurus into the Wikibase instance.

Channel: chat. Pitch by Andreas Thalhammer

Title: Exploring grlc and Salad to align Web APIs with Linked Data

Web APIs are the most widespread way of enabling programmatic access to data on the Web, and Linked Data is the structured data underlying the Semantic Web. However, Web APIs usually rely on JSON or YAML documents, and implementing Web APIs around Linked Data is often a tedious and repetitive process. Recently, grlc and Salad have appeared to bridge this gap. grlc is a lightweight server that automatically translates SPARQL queries stored and documented in GitHub repositories into Linked Data APIs on the fly. Salad is a schema language that describes rules for preprocessing, structural validation, and link checking of documents, providing a bridge between document- and record-oriented data modeling and the Semantic Web. The goal in this hackathon is to play around with both tools, driven by use cases such as the alignment of the Query Builder Web API and the RDF patient registry data under development in the European Joint Programme on Rare Diseases (EJP RD) project. If you want to cook something interesting in this hackathon, join us!

Channel: chat. Pitch by Núria Queralt Rosinach and Rajaram Kaliyaperumal

YouTube: https://youtu.be/gK4bJ9xkZDY

Title: Adding logical structure to the COVID-19 epidemiology ontology

Rapid analysis of epidemiological data is necessary to monitor disease outbreaks and to allow public health institutions and governments to make timely evidence-based decisions. The COVID-19 pandemic brought into focus the need to efficiently find, access, share and re-use COVID-19 epidemiological data. Development of the COVID-19 epidemiology ontology started at BioHackathon-Europe 2020 (proposal 30), with the aim of making these data as FAIR as possible. Currently, it is a plain list of curated terms mapped to terms in OBO ontologies. The goal in this hackathon is to continue its development by defining and implementing axiom patterns. Please join!

Channel: chat. Pitch by Núria Queralt Rosinach

YouTube: https://youtu.be/gK4bJ9xkZDY

Title: Converting the Reactome database to JSON-LD and exploring its potential in Elasticsearch/Siren Investigate

Making datasets available in RDF format was at the heart of the Bio2RDF project. Now, with the availability of the JSON-LD standard and scalable NoSQL technologies (MongoDB, Neo4j, Elasticsearch), how can the Life Science Linked Data community benefit from these newer approaches? Our goal is to create a BioMart-like user interface built around the Reactome, ChEBI, UniProt and GO databases, all interlinked with URIs.

Technologies: Python, JSON-LD, Elasticsearch, Siren Investigate

Channel: chat. Pitch by François Belleau

YouTube: https://youtu.be/Wd_sSecDAE8