Modern programming languages: ByteCode and Virtual Machines


Modern programming languages: ByteCode and Virtual Machines
CSE 6329, Spring 2011
Christoph Csallner, UTA

Old days: No Virtual Machine
• Program in source language (you write): MyProg.cpp
• Source language specification: book, “The C++ programming language”
• Source-to-machine compiler: (old) Microsoft Visual Studio
• Program in machine code: MyProg.exe, an MS Windows x86 binary
• Machine instruction set
• Actual machine

Today: Virtual Machines popular
• Program in source language (you write): MyProg.java
• Source language spec: Java Specification
• Source-to-bytecode compiler: javac MyProg.java
• Program in bytecode: MyProg.class
• Bytecode language spec: JVM Specification
• Virtual machine: java MyProg
• Program in machine code (mostly): MyProg in MS Windows x86 binary
• Machine instruction set
• Actual machine

Program Analysis today
• Many programs are compiled to bytecode
  – A virtual machine executes the bytecode
• Bytecode has advantages over the source language
• Many program analyses analyze bytecode
  – Results are translated back to your original Java/C#/… source program
• Example program analyses that are very easy to use:
  – For Java: FindBugs: http://findbugs.sourceforge.net/
  – For C#: Pex for fun: http://www.pexforfun.com/

Big picture
• You write: MyProg.java
• Source-to-bytecode compiler, e.g.: javac, MS Visual Studio
• Bytecode: MyProg.class
• Program analysis on the bytecode, e.g.: FindBugs, Pex
• Virtual machine, e.g.: JVM, .NET runtime
• OS, machine code, e.g.: Windows x86

Why is bytecode good for Program Analysis?
Simple yet powerful
• Bytecode is simpler than the source language
  – Similar to compiler IR
  – Simplifies analysis
  – Java, C#, VB, F#, etc. are far more complex
• Retains most information of the source language
  – Similar to compiler IR
  – Enables meaningful analysis

Simple
• Fewer language elements = less “syntactic sugar”
• Example: Explicit loop constructs in Java
  – Sourcecode:
    • Which ones?

CONSTANT_Fieldref_info, CONSTANT_Methodref_info
• Reference to a field/method/constructor
• CONSTANT_Fieldref_info { // similar for all
  – u1 tag;
  – u2 class_index; // type declaring this member, a CONSTANT_Class_info
  – u2 name_and_type_index; // simple name and descriptor, a CONSTANT_NameAndType_info
  }

Wait, aren’t there many more details?
Yes, see the JVM Specification or ask in class, office hours, mailing list

Do you have a small example?
The simplest possible class file

Interface I in Java source code
• /**
   * @author csallner@uta.edu (Christoph Csallner)
   */
  public interface I { }

Interface I in Java bytecode (hex)
• cafebabe00000032000707000201000149070004
  0100106a6176612f6c616e672f4f626a656374
  01000a536f7572636546696c65
  010006492e6a617661
  06010001000300000000000000010005000000020006

Interface I in Java bytecode (hex)
• cafebabe 0000 0032 // Header
• 0007 // Constant Pool
  – 07 0002 // cp[1]
  – 01 0001 49 // cp[2]
  – 07 0004 // …
  – 01 0010 6a6176612f6c616e672f4f626a656374
  – 01 000a 536f7572636546696c65
  – 01 0006 492e6a617661 // cp[6]
• 0601 // Access Rights
• 0001 0003 000000000000 0001 0005000000020006

Header
• ca fe ba be // magic
• 00 00 // minor_version: 0
• 00 32 // major_version: 50

Constant Pool Count and first two items
• 00 07 // constant_pool_count: 7
• 07 // cp[1]: first cp item, index 1; cp_info tag 7 = Class
  – 00 02 // CONSTANT_Class name_index: 2
• 01 // cp[2], cp_info tag 1 = Utf8
  – 00 01 // CONSTANT_Utf8 length: 1 byte
  – 49 // only byte of Utf8 String value: “I”

Rest of Constant Pool
• 07 0004 // cp[3] = Class, points to item 4
• 01 0010 // cp[4] = Utf8, 16 bytes
  – 6a 61 76 61 2f 6c 61 6e 67 2f 4f 62 6a 65 63 74 // “java/lang/Object”
• 01 000a // cp[5] = Utf8, 10 bytes
  – 53 6f 75 72 63 65 46 69 6c 65 // “SourceFile”
• 01 0006 // cp[6] = Utf8, 6 bytes
  – 49 2e 6a 61 76 61 // “I.java”

Access rights, name, subclass relation, interfaces, fields, methods
• 06 01 // Access flags
  – 0x0001 = Public
  – 0x0200 = Interface
  – 0x0400 = Abstract
• 00 01 // this class: cp[1]
• 00 03 // super class: cp[3]
• 00 00 // interfaces count: zero
• 00 00 // fields count: zero
• 00 00 // methods count: zero

Attributes
• 00 01 // attributes count: one
• 00 05 // attributes[1], index of name; points to cp[5] = “SourceFile”
• 00 00 00 02 // length of attribute: 2 bytes
• 00 06 // index of source file; points to cp[6] = “I.java”

How can I …?
Low-level tool support

Editor for binary files
• Frhed for MS Windows, open source
  – http://frhed.sourceforge.net/

Class File Disassembler
• Part of JDK, usage: javap -verbose

[...]... Java source programs (and more)
• Bytecode retains most information of original source program
  – Allows automatic reconstruction of source from bytecode
  – “Dis-assembler” fast, powerful, and convenient

Accessible
• Several “dis-assembler” libraries provide a nice API to retrieve and even change bytecode
  – Beyond capability of Java or C# built-in reflection
  – BCEL and ASM for Java bytecode
  – ExtendedReflection...
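The byte-by-byte walkthrough of interface I earlier in this section can be replayed in code. The following is a minimal sketch, not part of the original slides: the class name `ClassFileHeaderDemo` and its helper methods are hypothetical, and it handles only the two constant pool tags that occur in this particular file (1 = Utf8, 7 = Class). It relies on `java.io.DataInputStream`, which reads multi-byte values in big-endian order and whose `readUTF` expects a two-byte length followed by the string bytes, the same layout as a CONSTANT_Utf8 entry.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class: decodes the 83-byte class file of interface I. */
class ClassFileHeaderDemo {

    // Hex dump of I.class as shown in the slides, split by structure.
    static final String I_CLASS_HEX =
        "cafebabe" + "0000" + "0032"                        // magic, minor, major
        + "0007"                                            // constant_pool_count
        + "070002"                                          // cp[1] Class -> cp[2]
        + "01000149"                                        // cp[2] Utf8 "I"
        + "070004"                                          // cp[3] Class -> cp[4]
        + "0100106a6176612f6c616e672f4f626a656374"          // cp[4] Utf8
        + "01000a536f7572636546696c65"                      // cp[5] Utf8
        + "010006492e6a617661"                              // cp[6] Utf8
        + "0601" + "0001" + "0003"                          // flags, this, super
        + "0000" + "0000" + "0000"                          // interfaces, fields, methods
        + "0001" + "0005" + "00000002" + "0006";            // SourceFile attribute

    /** Converts a hex string into bytes (two hex digits per byte). */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Reads header, constant pool, and class-level indices; one line per item. */
    static List<String> describe(byte[] classFile) {
        List<String> lines = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(classFile))) {
            lines.add(String.format("magic=%08x", in.readInt()));
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            lines.add("version=" + major + "." + minor);
            int cpCount = in.readUnsignedShort();
            lines.add("constant_pool_count=" + cpCount);
            // Valid constant pool indices run from 1 to cpCount - 1.
            for (int i = 1; i < cpCount; i++) {
                int tag = in.readUnsignedByte();
                if (tag == 1) {          // CONSTANT_Utf8: u2 length + bytes
                    lines.add("cp[" + i + "]=Utf8 \"" + in.readUTF() + "\"");
                } else if (tag == 7) {   // CONSTANT_Class: u2 name_index
                    lines.add("cp[" + i + "]=Class -> cp[" + in.readUnsignedShort() + "]");
                } else {
                    throw new IOException("tag " + tag + " is not handled in this sketch");
                }
            }
            lines.add(String.format("access_flags=0x%04x", in.readUnsignedShort()));
            lines.add("this_class=cp[" + in.readUnsignedShort() + "]");
            lines.add("super_class=cp[" + in.readUnsignedShort() + "]");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(fromHex(I_CLASS_HEX)).forEach(System.out::println);
    }
}
```

A real class file contains further constant pool tags (field, method, and name-and-type references, numeric constants, and so on), so a general reader would need one branch per tag; the JDK's `javap -verbose` already does this.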
  – ExtendedReflection (part of Pex) for .NET bytecode

Documented Standard
• Carefully designed and specified
  – Better than most compiler IR
• Java Virtual Machine specification
  – http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html
• .NET Virtual Machine specification
  – http://www.ecma-international.org/publications/standards/Ecma-335.htm

Shared Standard
• Shared standard among different languages...

STRUCTURE OF THE JAVA VIRTUAL MACHINE

Structure of the Java Virtual Machine = Sections of chapter 3 of JVM Spec
1. The class file format
2. Data types
3. Primitive types and values
4. Reference types and values
5. Runtime data areas
6. Frames
7. …

Class file format
• Standard format for Java bytecode
• JVM accepts bytecode only in class file format
• JVM Spec, Section...

... disassembler to help with bytecode
• Following mostly copied from Java virtual machine specification, 2nd edition: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html

References
• JVM Specification: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html
• Iain D. Craig, Virtual Machines, Springer, 2005, Chapter 3: http://www.amazon.com/Virtual-Machines-Iain-D-Craig/dp/1852339691

... x86 Linux machine

Bytecode: Shared intermediate language
• You write: MyProg.java
• You write: MyProg.cs
• You write: MyProg.ada
• Source-to-bytecode compiler
• Bytecode: MyProg.class
• Java virtual machine
  – MyProg in Windows x86: Windows machine
  – MyProg in Linux x86: Linux machine
  – MyProg in XYZ binary: cell phone

Many software engineering papers focus on combination of Java source with Java bytecode
• Probably...
Shared Standard
• Shared standard among different languages
  – Java, C#, VB, F#, etc. are all compiled to the same bytecode
  – Programs in many source languages can be checked with a single program analysis tool
• Shared standard among different operating systems
  – Cell phones, mainframes, etc. all run the same bytecode
  – Programs on many OS can be checked with a single tool

Old days: Typically no shared intermediate...

javac compiler implements our source-bytecode combination
• Comes with JDK
• Open source
• Translates any legal Java program into a Java bytecode program
• Eclipse compiler is an alternative
• You write: MyProg.java (Java spec) → javac → Bytecode: MyProg.class (JVM spec) → Java virtual machine → …

Overview
• Following overview gives a flavor
  – Slightly simplified: Details may differ from JVM
  – Omits several parts:...

... Java bytecode
• Probably easiest to understand
• Other combinations work similarly
• Well documented, many research papers
• Industrial-strength, but still relatively simple
  – C# started with Java-like features
  – But C# grew faster, more complex now
  – C++ more complex than Java
  – Other combinations more obscure

Simple
• Fewer language elements = less “syntactic sugar”
• Example: Explicit loop constructs in Java
  – Sourcecode: 4
    • while, do (“until”), basic for, enhanced for
  – Bytecode: 0
    • All 4 are mapped to jumps
  – Makes program analysis easier to implement

Powerful
• Still a non-trivial,...
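To make the loop example above concrete, here is a small sketch that writes the same summation with each of the four Java loop constructs. The class name `LoopForms` and its methods are made up for illustration. Compiling it with `javac LoopForms.java` and disassembling with `javap -c LoopForms` should show that none of the four methods contains a dedicated loop instruction; each body is built from comparisons, conditional branches, and `goto`. The exact instruction sequence depends on the compiler version, so no listing is reproduced here.

```java
/** Hypothetical example: one sum per Java loop construct. */
class LoopForms {

    /** while loop. */
    static int sumWhile(int[] a) {
        int sum = 0;
        int i = 0;
        while (i < a.length) {
            sum += a[i];
            i++;
        }
        return sum;
    }

    /** do-while loop; the body runs at least once, so guard the empty case. */
    static int sumDoWhile(int[] a) {
        if (a.length == 0) {
            return 0;
        }
        int sum = 0;
        int i = 0;
        do {
            sum += a[i];
            i++;
        } while (i < a.length);
        return sum;
    }

    /** Basic for loop. */
    static int sumFor(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    /** Enhanced for loop. */
    static int sumForEach(int[] a) {
        int sum = 0;
        for (int x : a) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumWhile(data));
        System.out.println(sumDoWhile(data));
        System.out.println(sumFor(data));
        System.out.println(sumForEach(data));
    }
}
```

For a bytecode-level analysis this means a single treatment of branch instructions covers all four source constructs.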
format
• Binary format
• Independent of hardware and OS
  – Fixes byte order (“endianness”), regardless of byte order of current machine
• Independent of actual files, despite the name
• Class may arrive at runtime as a byte array from elsewhere
  – From a class generator
  – From the web

Class/interface class file
• 1:1 mapping between (class or interface) and class file
  – Class file can define a class
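Two of the points above, the fixed byte order and a class arriving as a byte array rather than as a file, can be tried out directly. The sketch below is an illustration under assumptions: `ByteArrayClassDemo` and `BytesLoader` are hypothetical names, and the bytes are the class file of the empty interface I from this lecture, hard-coded as a hex string instead of coming from a class generator or the web. It uses `java.nio.ByteBuffer`, whose default order is big-endian on every platform, and the protected `ClassLoader.defineClass(String, byte[], int, int)` method.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo: treats a class file purely as a byte array. */
class ByteArrayClassDemo {

    // Class file of "public interface I { }" as shown earlier in the lecture.
    static final String I_CLASS_HEX =
        "cafebabe000000320007"
        + "070002" + "01000149" + "070004"
        + "0100106a6176612f6c616e672f4f626a656374"
        + "01000a536f7572636546696c65"
        + "010006492e6a617661"
        + "0601" + "00010003000000000000"
        + "00010005000000020006";

    /** Converts a hex string into bytes (two hex digits per byte). */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Reads the first four bytes as one big-endian int, independent of the host CPU. */
    static int magic(byte[] classFile) {
        return ByteBuffer.wrap(classFile).getInt();
    }

    /** Minimal class loader that exposes the protected defineClass method. */
    static final class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Defines interface I from the in-memory bytes; no I.class file is involved. */
    static Class<?> loadI() {
        return new BytesLoader().define("I", fromHex(I_CLASS_HEX));
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex(I_CLASS_HEX);
        System.out.println(Integer.toHexString(magic(bytes)));
        Class<?> c = loadI();
        System.out.println(c.getName() + " isInterface=" + c.isInterface());
    }
}
```

The loader here skips everything a production class loader would add, such as delegation policy, protection domains, and caching of already defined classes; it only shows that the JVM consumes the class file format from memory.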

Posted: 15/02/2016, 10:02
