PhD position in programming languages and IoT at CITI Lab

Title

Programming language abstractions for the Internet of Things

Keywords

Programming languages, distributed systems, Internet of Things, middleware

Location

CITI-INRIA Laboratory, Université de Lyon, INSA de Lyon (http://www.citi-lab.fr/)

Dynamid research team (http://dynamid.citi-lab.fr/)

Funding

This PhD thesis will be supported by the Spie-ICS – INSA chair on IoT that will start in September 2016.

Start date

September 2016

Contact

Main supervisor: Dr Julien Ponge julien.ponge@insa-lyon.fr

More info: https://julien.ponge.org

Co-supervisor: Dr Frédéric Le Mouël frederic.le-mouel@insa-lyon.fr

More info: http://perso.citi.insa-lyon.fr/flemouel

Topic
Research project

The so-called Internet of Things marks the convergence of small connected devices (e.g., personal devices, body devices, wireless sensors) and the larger set of more traditional distributed applications accessed over standard Internet protocols. The “software is eating the world” mantra (http://www.wsj.com/articles/SB10001424053111903480904576512250915629460) is no lie, as more and more devices communicate with cloud-based services. Still, developing and integrating software remains largely a crafting exercise with mainstream programming languages, while research languages tend to be too impractical.

The architecture of modern applications is converging towards distributed services that expose standards-based interfaces. A service tends to fulfill a single functional purpose (e.g., storing some data / logs, providing authentication, and so on). In this setting an application shifts from a paradigm where it is made by assembling component libraries to a paradigm where many (distributed) processes form the application. Communications between such services are typically made using the general-purpose HTTP protocol, but more specific ones can be used when needed (MQTT for IoT devices, ZigBee in some wireless sensor networks, etc.). Given that distributed services rely on integration with other services through highly interoperable protocols, it makes sense to take advantage of many programming languages rather than follow a “one size fits all” approach.

Interestingly, the characteristics of distributed services deployed on cloud infrastructures are quite similar to those of (sensor) network gateways. Among many problems, these applications need to cope with concurrency due to network requests, and they have to bind data from/to network protocols. While middleware can be used to, say, automatically expose an HTTP service interface and perform data binding, or to provide concurrent programming abstractions, this remains orthogonal to programming language operational semantics and type systems.

The history of programming languages is paved with abstractions being moved from library support to first-class language constructs: memory management (e.g., Java, Self), threads (e.g., Java), actor models (e.g., Erlang, Scala), communicating sequential processes over co-routines (e.g., Go), etc. Still, even with a modern programming language the development of distributed services involves lots of boilerplate code (e.g., types for network message data binding) and there are few to no static checks beyond types, especially with respect to the correctness of concurrent code. As an example, the Go programming language only provides runtime race condition detection.
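To make the data-binding boilerplate concrete, here is a minimal sketch in Python of what a developer typically writes by hand to bind a network message to a typed value. The TemperatureReading message, its field names and the decode_reading / encode_reading helpers are hypothetical and only serve as an illustration; they are not part of any project mentioned in this announcement.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TemperatureReading:
    """Hypothetical sensor message; the field names are assumptions."""
    sensor_id: str
    celsius: float


def decode_reading(payload: bytes) -> TemperatureReading:
    """Bind a JSON payload to a typed message, checking each field by hand."""
    raw = json.loads(payload.decode("utf-8"))
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    sensor_id = raw.get("sensor_id")
    celsius = raw.get("celsius")
    if not isinstance(sensor_id, str):
        raise ValueError("sensor_id must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(celsius, bool) or not isinstance(celsius, (int, float)):
        raise ValueError("celsius must be a number")
    return TemperatureReading(sensor_id=sensor_id, celsius=float(celsius))


def encode_reading(reading: TemperatureReading) -> bytes:
    """Inverse binding, again written by hand."""
    return json.dumps(
        {"sensor_id": reading.sensor_id, "celsius": reading.celsius}
    ).encode("utf-8")
```

Every new message type needs a similar pair of functions, and nothing in the language checks that the decoder, the encoder and the type declaration stay consistent with each other.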

In practice, one can observe that the code of a typical application based on distributed services involves a significant share of message processing and network operations. The literature lacks successful languages that are both practical and suitable for these kinds of networked applications. The Scala programming language is a prime example of a language effort that initially tried to address the need for the development of “XML services” with the support of XML semi-structured data elements in the language. Still, Scala does not enforce a concurrency model, it does not provide network programming helpers, and it merely focused its efforts on a sophisticated type system. Funnel (Functional Nets) was a predecessor of Scala with first-class support for concurrency primitives based on join-calculus. Still, it proved impractical to use in real-world applications, just like other attempts at join-calculus in the ML / OCaml families.

An alternative to composing distributed applications using programming languages is to rely on some orchestration language such as BPEL and workflow execution engines. Behavioral protocols can be extracted from BPEL processes, which is useful for checking the correctness of distributed system compositions. Still, the limited expressiveness of workflow languages, combined with the complex tooling needed to develop, test and execute them, limits their wider adoption in favor of more traditional programming languages.

The main scientific goal of this PhD thesis is to investigate which abstractions shall be part of the next-generation programming languages in the age of the Internet of Things. We are especially interested in, but not limited to, abstractions that help cope with: concurrency, asynchronous programming, data processing, software dynamics, message passing, network membership discovery and distributed algorithms (e.g., consensus and transactions). Given the distributed / concurrent nature of the applications that we target, we are also interested in providing compilation-time assistance beyond classical type checking (e.g., deadlock detection, time-bound guarantees, operation sequence consistency, etc.). Last but not least: we also want the research outcomes to be practical.

Anticipated challenges

1. Establish an exhaustive state of the art on programming language and middleware abstractions. Consider which ones shall be part of a programming language, and which ones shall be relegated to library support, based on an extensive study of distributed services requirements.

2. Propose a programming language, either a new one or a derivative of an existing one like Eclipse Golo, a language developed at the CITI Lab. Formalize and prove the soundness and correctness of its type system and operational semantics. Classify the ranges of static checks that can be performed at compilation time. Devise which remaining checks shall be done at runtime. Discuss their algorithms.

3. Propose an implementation on top of the Java Virtual Machine or the LLVM code generation infrastructure with state-of-the-art performance. Develop a rigorous micro-benchmark test suite, and revisit some suitable larger benchmarks from popular references like http://benchmarksgame.alioth.debian.org/.

4. Validate the usefulness of the language for developing distributed applications, both in cloud and wireless sensor gateway settings. Provide metrics to evaluate programs against other languages. Perform a field study with practitioners to assess the practicability, suitability and learning curve of the language.

As the work will be conducted in a larger project as part of the Spie-INSA chair on IoT, the candidate will conduct experiments and share progress with other PhD students in systems, networking and radio communications. We will take advantage of our large IoT experimental room, as well as the FIT / CorteXlab testbeds (http://www.cortexlab.fr/).

Recruitment process
Expected skills

The candidate should have earned an MSc degree (or equivalent) in computer science and engineering. The candidate must have a strong background in distributed computing, from both theoretical and practical points of view, as well as a good grasp of programming language theory and implementation. The nature of this work requires strong software engineering skills. Knowledge of JVM internals or LLVM is a plus, as is exposure to a wide range of programming language families.

How to apply

Email a motivation letter

Full CV with projects and courses that could be related to the subject

Complete academic records (from Bachelor to MSc)

2 or 3 references

Applications will be reviewed as they arrive until a candidate is selected

References

Baptiste Maingret, Frédéric Le Mouël, Julien Ponge, Nicolas Stouls, Jian Cao and Yannick Loiseau. Towards a Decoupled Context-Oriented Programming Language for the Internet of Things. In Proceedings of the 7th International Workshop on Context-Oriented Programming (COP’2015) in conjunction with the European Conference on Object-Oriented Programming (ECOOP’2015). Prague, Czech Republic, July 2015.

Julien Ponge, Frédéric Le Mouël, and Nicolas Stouls. 2013. Golo, a dynamic, light and efficient language for post-invokedynamic JVM. In Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools (PPPJ ’13). ACM, New York, NY, USA, 153-158.

Julien Ponge. 2009. Model based analysis of time-aware web services interactions. PhD Thesis. University of New South Wales.

Martin Odersky. 2000. Functional Nets. In Proceedings of the 9th European Symposium on Programming Languages and Systems (ESOP ’00). Springer-Verlag, London, UK, 1-25.

Martin Odersky and Matthias Zenger. 2005. Scalable component abstractions. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA ’05). ACM, New York, NY, USA, 41-57.

Burak Emir, Sebastian Maneth, and Martin Odersky. 2006. Scalable programming abstractions for XML services. In Dependable Systems, Jürg Kohlas, Bertrand Meyer, and André Schiper (Eds.). Springer-Verlag, Berlin, Heidelberg, 103-126.

Rob Pike. 2012. Go at Google. In Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity (SPLASH’12). ACM, New York, NY, USA, 5-6.

Cédric Fournet and Georges Gonthier. 1996. The reflexive CHAM and the join-calculus. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL ’96). ACM, New York, NY, USA, 372-385.

Cédric Fournet, Georges Gonthier, Jean-Jacques Lévy, Luc Maranget, and Didier Rémy. 1996. A Calculus of Mobile Agents. In Proceedings of the 7th International Conference on Concurrency Theory (CONCUR ’96). Springer-Verlag, London, UK, 406-421.

Cédric Fournet, Cosimo Laneve, Luc Maranget, and Didier Rémy. 1997. Implicit Typing à la ML for the Join-Calculus. In Proceedings of the 8th International Conference on Concurrency Theory (CONCUR ’97). Springer-Verlag, London, UK, 196-212.

Chun Ouyang, Eric Verbeek, Wil M. P. van der Aalst, Stephan Breutel, Marlon Dumas, and Arthur H. M. ter Hofstede. 2007. Formal semantics and analysis of control flow in WS-BPEL. Sci. Comput. Program. 67, 2-3 (July 2007), 162-198.

Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04). Palo Alto, California, March 2004.