Saturday 14 October 2017

MongoDB Architecture Introduction

Introduction

MongoDB is an open-source, document-oriented NoSQL database. It uses JSON-like (JavaScript Object Notation) documents, although the data is stored in the database in BSON (binary JSON) form. In MongoDB the document is the basic unit of storage. Calling MongoDB schemaless is misleading, because documents are still organized into collections and schema design remains very important; what MongoDB offers is a dynamic schema, in which documents in the same collection need not share the same fields.

  • 2007 - Company 10gen began developing MongoDB
  • 2009 - Shifted to an open-source development model and began offering commercial support and services
  • 2013 - 10gen changed its name to MongoDB Inc

Some Important Files in MongoDB

  • journal - Journal files, similar to redo logs, used for crash recovery
  • <database>.ns - The namespace file, which stores metadata information
  • <database>.0, <database>.1 - The data files that store the data

Why use MongoDB? Advantages


  1. It makes development tasks easier
  2. It can easily scale out horizontally
  3. Parallelism can be achieved at the server or hardware level
  4. Both structured and unstructured data can be stored

Nexus Architecture

MongoDB’s design philosophy is focused on combining the critical capabilities of relational databases with the innovations of NoSQL technologies. 

Expressive query language & secondary Indexes

Users should be able to access and manipulate their data in sophisticated ways to support both operational and analytical applications. Indexes play a critical role in providing efficient access to data, supported natively by the database rather than maintained in application code.
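To illustrate why a database-maintained secondary index helps, here is a minimal in-memory sketch in Python. It is not MongoDB's implementation; the `Collection` class and its method names are hypothetical, chosen only to show the idea that an index maps a field value directly to the matching documents so a query does not have to scan everything.

```python
from collections import defaultdict


class Collection:
    """Toy collection with optional secondary indexes (illustrative only)."""

    def __init__(self):
        self.docs = {}      # primary storage: _id -> document
        self.indexes = {}   # field name -> {field value -> set of _ids}

    def create_index(self, field):
        # Build the index from the documents that already exist.
        index = defaultdict(set)
        for _id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(_id)
        self.indexes[field] = index

    def insert(self, doc):
        _id = doc["_id"]
        self.docs[_id] = doc
        # The "database" keeps every index up to date on each write,
        # so application code never has to maintain it.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(_id)

    def find(self, field, value):
        if field in self.indexes:
            # Indexed lookup: jump straight to the matching _ids.
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # No index: fall back to scanning every document.
        return [d for d in self.docs.values() if d.get(field) == value]


if __name__ == "__main__":
    people = Collection()
    people.insert({"_id": 1, "Name": "Manjunath", "City": "Bangalore"})
    people.insert({"_id": 2, "Name": "Asha", "City": "Mysore"})
    people.create_index("City")
    print(people.find("City", "Bangalore"))
```

Both code paths in `find` return the same documents; the index only changes how much data has to be examined to find them.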

Strong consistency

Applications should be able to immediately read what has been written to the database. It is much more complex to build applications around an eventually consistent model, imposing significant work on the developer, even for the most sophisticated engineering teams.

Enterprise Management and Integrations

Databases are just one piece of application infrastructure, and need to fit seamlessly into the enterprise IT stack. Organizations need a database that can be secured, monitored, automated, and integrated with their existing technology infrastructure, processes, and staff, including operations teams, DBAs, and data analysts.

Flexible Data Model

NoSQL databases emerged to address the requirements for the data we see dominating modern applications. Whether document, graph, key-value, or wide-column, all of them offer a flexible data model, making it easy to store and combine data of any structure and allow dynamic modification of the schema without downtime or performance impact.

Scalability and Performance

NoSQL databases were all built with a focus on scalability, so they all include some form of sharding or partitioning. This allows the database to scale out on commodity hardware deployed on-premises or in the cloud, enabling almost unlimited growth with higher throughput and lower latency than relational databases.
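The idea of partitioning data across servers can be sketched as hash-based routing. This is a simplified illustration, not the sharding algorithm of MongoDB or any other product; the function name `shard_for` and the choice of MD5 as the hash are assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number in the range 0..num_shards-1."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # A stable hash (unlike Python's built-in hash()) gives the same
    # answer on every machine and in every process.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


if __name__ == "__main__":
    for user in ["alice", "bob", "carol", "dave"]:
        print(user, "-> shard", shard_for(user, 3))
```

Because every client computes the same shard for the same key, reads and writes for a key always reach the same server. A plain modulo scheme like this one has to move most keys when the number of shards changes, which is one reason real systems use more elaborate schemes.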

Always-On Global Deployments

NoSQL databases are designed for highly available systems that provide a consistent, high quality experience for users all over the world. They are designed to run across many nodes, including replication to automatically synchronize data across servers, racks, and data centers.

Tuesday 20 September 2016

Install Apache, PHP and Configure PHP Mongo Driver on Linux

This article demonstrates how to install Apache and PHP, and then how to configure the PHP Mongo driver. It covers only the PHP Mongo driver configuration; please refer to the link below for installing MongoDB on Linux.

How to install MongoDB on Linux system

So let's begin with the Apache installation.

Installing Apache

[root@pract1 ~]# yum install httpd
Loaded plugins: refresh-packagekit, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
.....
Total download size: 910 k
Is this ok [y/N]: y
Downloading Packages:
(1/2): httpd-2.2.15-54.0.1.el6_8.x86_64.rpm        | 832 kB     00:00
(2/2): httpd-tools-2.2.15-54.0.1.el6_8.x86_64.rpm  |  78 kB     00:00
.....
Updated:
  httpd.x86_64 0:2.2.15-54.0.1.el6_8

Dependency Updated:
  httpd-tools.x86_64 0:2.2.15-54.0.1.el6_8
.....
Complete!

Installing PHP

[root@pract1 ~]# yum install php php-pear php-devel gcc
Loaded plugins: refresh-packagekit, security
mongodb                      |  951 B     00:00
ol6_UEK_latest               | 1.2 kB     00:00
ol6_latest                   | 1.4 kB     00:00
Setting up Install Process
Resolving Dependencies
--> Running transaction check
.....
Total download size: 33 M
Is this ok [y/N]: y
Downloading Packages:
(1/17): cpp-4.4.7-17.el6.x86_64.rpm                 | 3.7 MB     00:03
(2/17): gcc-4.4.7-17.el6.x86_64.rpm                 |  10 MB     00:09
(3/17): gcc-c++-4.4.7-17.el6.x86_64.rpm             | 4.7 MB     00:04
(4/17): gcc-gfortran-4.4.7-17.el6.x86_64.rpm        | 4.7 MB     00:04
(5/17): libgcc-4.4.7-17.el6.i686.rpm                | 114 kB     00:00
(6/17): libgcc-4.4.7-17.el6.x86_64.rpm              | 103 kB     00:00
(7/17): libgfortran-4.4.7-17.el6.x86_64.rpm         | 267 kB     00:00
(8/17): libgomp-4.4.7-17.el6.x86_64.rpm             | 134 kB     00:00
(9/17): libstdc++-4.4.7-17.el6.x86_64.rpm           | 295 kB     00:00
(10/17): libstdc++-devel-4.4.7-17.el6.x86_64.rpm    | 1.6 MB     00:01
(11/17): openssl-1.0.1e-48.el6_8.1.x86_64.rpm       | 1.5 MB     00:01
(12/17): openssl-devel-1.0.1e-48.el6_8.1.x86_64.rpm | 1.2 MB     00:01
(13/17): php-5.3.3-48.el6_8.x86_64.rpm              | 1.1 MB     00:01
(14/17): php-cli-5.3.3-48.el6_8.x86_64.rpm          | 2.2 MB     00:01
(15/17): php-common-5.3.3-48.el6_8.x86_64.rpm       | 529 kB     00:00
(16/17): php-devel-5.3.3-48.el6_8.x86_64.rpm        | 512 kB     00:01
(17/17): php-pear-1.9.4-5.el6.noarch.rpm            | 393 kB     00:00
.....
Total                                                                                                                                                                1.0 MB/s |  33 MB     00:33
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
......
Installed:
  php.x86_64 0:5.3.3-48.el6_8                                   php-devel.x86_64 0:5.3.3-48.el6_8                                   php-pear.noarch 1:1.9.4-5.el6

Dependency Installed:
  php-cli.x86_64 0:5.3.3-48.el6_8                                                                 php-common.x86_64 0:5.3.3-48.el6_8

Updated:
  gcc.x86_64 0:4.4.7-17.el6

Dependency Updated:
  cpp.x86_64 0:4.4.7-17.el6                   gcc-c++.x86_64 0:4.4.7-17.el6    gcc-gfortran.x86_64 0:4.4.7-17.el6    libgcc.i686 0:4.4.7-17.el6               libgcc.x86_64 0:4.4.7-17.el6
  libgfortran.x86_64 0:4.4.7-17.el6           libgomp.x86_64 0:4.4.7-17.el6    libstdc++.x86_64 0:4.4.7-17.el6       libstdc++-devel.x86_64 0:4.4.7-17.el6    openssl.x86_64 0:1.0.1e-48.el6_8.1
  openssl-devel.x86_64 0:1.0.1e-48.el6_8.1

Complete!

Configure PHP Mongo driver

[root@pract1 ~]# pecl install mongo
WARNING: "pecl/mongo" is deprecated in favor of "channel:///mongodb"
downloading mongo-1.6.14.tgz ...
Starting to download mongo-1.6.14.tgz (210,095 bytes)
.............................................done: 210,095 bytes
118 source files, building
running: phpize
Configuring for:
PHP Api Version:         20090626
Zend Module Api No:      20090626
Zend Extension Api No:   220090626
Build with Cyrus SASL (MongoDB Enterprise Authentication) support? [no] :
building in /var/tmp/pear-build-rootDwggHq/mongo-1.6.14
running: /var/tmp/mongo/configure --with-mongo-sasl=no
.....
Build complete.
Don't forget to run 'make test'.
......
running: make INSTALL_ROOT="/var/tmp/pear-build-rootDwggHq/install-mongo-1.6.14" install
Installing shared extensions:     /var/tmp/pear-build-rootDwggHq/install-mongo-1.6.14/usr/lib64/php/modules/
running: find "/var/tmp/pear-build-rootDwggHq/install-mongo-1.6.14" | xargs ls -dils
521531    4 drwxr-xr-x 3 root root    4096 Sep 19 21:33 /var/tmp/pear-build-rootDwggHq/install-mongo-1.6.14
.....
532128 1784 -rwxr-xr-x 1 root root 1824969 Sep 19 21:33 /var/tmp/pear-build-rootDwggHq/install-mongo-1.6.14/usr/lib64/php/modules/mongo.so

Build process completed successfully
Installing '/usr/lib64/php/modules/mongo.so'
install ok: channel://pecl.php.net/mongo-1.6.14
configuration option "php_ini" is not set to php.ini location
You should add "extension=mongo.so" to php.ini
[root@pract1 ~]# 

Add MongoDB Extension and Verify

[root@pract1 ~]# vi /etc/php.ini
 Press i
 extension=mongo.so
 Press esc 
 :wq
[root@pract1 ~]#
Restart the Apache service
[root@pract1 ~]# service httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]
[root@pract1 ~]#

Now verify that the MongoDB driver has been configured for PHP

[root@pract1 html]# php -m | grep -i mongo
mongo      -- The output should be "mongo", which indicates that the PHP MongoDB driver has been configured successfully
[root@pract1 html]# 

Another way to verify is to create a PHP file with the content shown below
[root@pract1 ~]# vi /var/www/html/phpinfo.php
 Press i
 <?php
 phpinfo();
 ?>
 Press esc 
 :wq

Open a web browser and load the phpinfo.php file using the address below
http://<ipaddress or hostname>:<port>/phpinfo.php
Ex:-     http://192.168.56.101/phpinfo.php
The page shown in the browser should contain a mongo module section


Saturday 17 September 2016

Installing MongoDB on Linux

This article illustrates, step by step, how to install MongoDB on Linux. MongoDB is one of the most popular document-based NoSQL databases.


Add MongoDB Repository 

Let's log in to the server as the root user and add the MongoDB repository to our system. Go to the repository location

[root@pract1 ~]# cd  /etc/yum.repos.d

Create the MongoDB repository file using the vi editor
[root@pract1 yum.repos.d]# vi  mongodb.repo
Press i to switch to insert mode, then enter the following

[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1

Press the ESC key, then type
:wq
[root@pract1 yum.repos.d]#
Note:- :wq or :x saves the file and quits the vi editor


Begin MongoDB Installation

Now install MongoDB by running the command below

[root@pract1 yum.repos.d]# yum install mongodb-org
Loaded plugins: refresh-packagekit, security
mongodb                                  |  951 B     00:00
mongodb/primary                          |  45 kB     00:00
mongodb                                             279/279
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package mongodb-org.x86_64 0:2.6.12-1 will be installed
--> Processing Dependency: mongodb-org-shell = 2.6.12 for package: mongodb-org-2.6.12-1.x86_64
--> Processing Dependency: mongodb-org-server = 2.6.12 for package: mongodb-org-2.6.12-1.x86_64
--> Processing Dependency: mongodb-org-tools = 2.6.12 for package: mongodb-org-2.6.12-1.x86_64
--> Processing Dependency: mongodb-org-mongos = 2.6.12 for package: mongodb-org-2.6.12-1.x86_64
--> Running transaction check
---> Package mongodb-org-mongos.x86_64 0:2.6.12-1 will be installed
---> Package mongodb-org-server.x86_64 0:2.6.12-1 will be installed
---> Package mongodb-org-shell.x86_64 0:2.6.12-1 will be installed
---> Package mongodb-org-tools.x86_64 0:2.6.12-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================
 Package                    Arch         Version    Repository  Size
====================================================================
Installing:
 mongodb-org                x86_64      2.6.12-1   mongodb     4.6 k
Installing for dependencies:
 mongodb-org-mongos         x86_64      2.6.12-1   mongodb     6.9 M
 mongodb-org-server         x86_64      2.6.12-1   mongodb     9.1 M
 mongodb-org-shell          x86_64      2.6.12-1   mongodb     4.3 M
 mongodb-org-tools          x86_64      2.6.12-1   mongodb     90 M

Transaction Summary
====================================================================
Install       5 Package(s)

Total download size: 110 M
Installed size: 279 M
Is this ok [y/N]: y
Downloading Packages:
(1/5): mongodb-org-2.6.12-1.x86_64.rpm            | 4.6 kB     00:00
(2/5): mongodb-org-mongos-2.6.12-1.x86_64.rpm     | 6.9 MB     00:07
(3/5): mongodb-org-server-2.6.12-1.x86_64.rpm     | 9.1 MB     00:16
(4/5): mongodb-org-shell-2.6.12-1.x86_64.rpm      | 4.3 MB     00:05
(5/5): mongodb-org-tools-2.6.12-1.x86_64.rpm      |  90 MB     01:24
------------------------------------------------------------------------------
Total                                              968 kB/s | 110 MB     01:56
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : mongodb-org-server-2.6.12-1.x86_64          1/5
  Installing : mongodb-org-mongos-2.6.12-1.x86_64          2/5
  Installing : mongodb-org-tools-2.6.12-1.x86_64           3/5
  Installing : mongodb-org-shell-2.6.12-1.x86_64           4/5
  Installing : mongodb-org-2.6.12-1.x86_64                 5/5
  Verifying  : mongodb-org-shell-2.6.12-1.x86_64           1/5
  Verifying  : mongodb-org-tools-2.6.12-1.x86_64           2/5
  Verifying  : mongodb-org-mongos-2.6.12-1.x86_64          3/5
  Verifying  : mongodb-org-server-2.6.12-1.x86_64          4/5
  Verifying  : mongodb-org-2.6.12-1.x86_64                 5/5

Installed:
  mongodb-org.x86_64 0:2.6.12-1

Dependency Installed:
  mongodb-org-mongos.x86_64 0:2.6.12-1  mongodb-org-server.x86_64 0:2.6.12-1  mongodb-org-shell.x86_64 0:2.6.12-1  mongodb-org-tools.x86_64 0:2.6.12-1

Complete!
This completes the installation 


Verify the MongoDB installation 

[root@pract1 yum.repos.d]# rpm -ql mongodb-org-server
/etc/init.d/mongod
/etc/mongod.conf
/etc/sysconfig/mongod
/usr/bin/mongod
/usr/share/man/man1/mongod.1
/var/lib/mongo
/var/log/mongodb
/var/log/mongodb/mongod.log
/var/run/mongodb


Start the MongoDB service 

Let's check the status of the mongod service; if it is stopped, start it with the command shown below
[root@pract1 yum.repos.d]# service mongod status
mongod is stopped
[root@pract1 yum.repos.d]# service mongod start
Starting mongod:                                           [  OK  ]
[root@pract1 yum.repos.d]#

We will also check the log for any reported errors. In the log below everything looks fine
[root@pract1 yum.repos.d]# cat /var/log/mongodb/mongod.log
2016-09-16T18:21:09.609+0530 ***** SERVER RESTARTED *****
2016-09-16T18:21:09.612+0530 [initandlisten] MongoDB starting : pid=2791 port=27017 dbpath=/var/lib/mongo 64-bit host=pract1.localdomain
2016-09-16T18:21:09.612+0530 [initandlisten] db version v2.6.12
2016-09-16T18:21:09.612+0530 [initandlisten] git version: d73c92b1c85703828b55c2916a5dd4ad46535f6a
2016-09-16T18:21:09.612+0530 [initandlisten] build info: Linux build5.ny.cbi.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2016-09-16T18:21:09.612+0530 [initandlisten] allocator: tcmalloc
2016-09-16T18:21:09.612+0530 [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1" }, processManagement: { fork: true, pidFilePath: "/var/run/mongodb/mongod.pid" }, storage: { dbPath: "/var/lib/mongo" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
2016-09-16T18:21:09.637+0530 [initandlisten] journal dir=/var/lib/mongo/journal
2016-09-16T18:21:09.637+0530 [initandlisten] recover : no journal files present, no recovery needed
2016-09-16T18:21:09.785+0530 [initandlisten] allocating new ns file /var/lib/mongo/local.ns, filling with zeroes...
2016-09-16T18:21:10.037+0530 [FileAllocator] allocating new datafile /var/lib/mongo/local.0, filling with zeroes...
2016-09-16T18:21:10.037+0530 [FileAllocator] creating directory /var/lib/mongo/_tmp
2016-09-16T18:21:10.082+0530 [FileAllocator] done allocating datafile /var/lib/mongo/local.0, size: 64MB,  took 0.043 secs
2016-09-16T18:21:10.087+0530 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
2016-09-16T18:21:10.087+0530 [initandlisten]     added index to empty collection
2016-09-16T18:21:10.087+0530 [initandlisten] command local.$cmd command: create { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 numYields:0  reslen:37 301ms
2016-09-16T18:21:10.092+0530 [initandlisten] waiting for connections on port 27017
2016-09-16T18:22:09.793+0530 [clientcursormon] mem (MB) res:30 virt:456
2016-09-16T18:22:09.793+0530 [clientcursormon]  mapped (incl journal view):160
2016-09-16T18:22:09.793+0530 [clientcursormon]  connections:0
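
Besides reading the log, a quick way to confirm that mongod is accepting connections is to try opening a TCP connection to it. The sketch below uses only the Python standard library; the helper name `is_port_open` is hypothetical, and the host and port are the bindIp (127.0.0.1) and port (27017) reported in the log above. It only checks that something is listening, not that the listener is actually MongoDB.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Values taken from the mongod log: bindIp 127.0.0.1, port 27017.
    print("mongod reachable:", is_port_open("127.0.0.1", 27017))
```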


Perform Basic Tasks on MongoDB

In MongoDB we use the mongo shell to connect to the MongoDB database. Start the mongo shell as shown below

[root@pract1 yum.repos.d]# mongo
MongoDB shell version: 2.6.12
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
        http://docs.mongodb.org/
Questions? Try the support group
        http://groups.google.com/group/mongodb-user
>
********** By default mongo connects to the test database. Let's switch to a new database called mydb (it is created when data is first stored in it)
> use mydb
switched to db mydb
>
********** Now create a collection in mydb
> db.createCollection("mycollection");
{ "ok" : 1 }
********** Get the list of collections in the database
> show collections
mycollection
system.indexes
********** We will insert a document into the collection
> db.mycollection.insert({"Name":"Manjunath","City":"Bangalore"});
WriteResult({ "nInserted" : 1 })
*********** Let's check the documents stored in the collection
> db.mycollection.find();
{ "_id" : ObjectId("57dd17915aaf4c5e06833092"), "Name" : "Manjunath", "City" : "Bangalore" }
********** The command below shows the list of databases
> show dbs
admin  (empty)
local  0.078GB
mydb   0.078GB
test   0.078GB
>

Sunday 4 September 2016

Big Data and Hadoop Introduction

What is Big Data? Is it just a buzzword?

Big data is a volume of data that cannot be handled by a single server or machine. It is a collection of large data sets that cannot be processed using traditional computing techniques. Gartner defines big data as follows (the 3Vs definition):
"Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." Additionally, some organizations add a fourth V, "Veracity", to describe it.

  1. Volume:- Enterprise data grows exponentially, and preserving such large data sets is a big challenge. Data sets can grow from terabytes to petabytes and from petabytes to exabytes. This huge amount of data is what Volume refers to in big data.
  2. Velocity:- A large amount of data is generated every day. This rapid growth poses challenges for processing: large data sets have to be processed, and query results returned, as quickly as possible.
  3. Variety:- Many types of data are being generated. Consider social media, where different kinds of data are produced, such as documents, audio, videos and photos. Handling these different kinds of data is what Variety refers to in big data.
  4. Veracity:- This is the quality of the data that has been gathered, which affects how accurate the analysis can be.

How it all began?

Google published a paper in 2004 on a process called MapReduce. The MapReduce concept provides a parallel processing model that can process huge amounts of data. MapReduce splits a query and distributes the pieces across nodes, where they are processed in parallel (the Map step). The processed results are then gathered and delivered (the Reduce step). An implementation of the MapReduce framework was adopted by an Apache open-source project named Hadoop.
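
The Map and Reduce steps can be illustrated with the classic word-count example. The following Python sketch runs in a single process and is not Hadoop code; the function names are hypothetical, and the chunks are processed in an ordinary loop where a real cluster would process each chunk on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk is mapped in parallel
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big", "data hadoop"]))
```

The map function never needs to see more than its own chunk, which is what makes the work easy to spread across many machines.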

What is Hadoop?

Apache Hadoop is an open-source software framework for distributed storage and distributed processing. It is built to run on clusters of computers, mostly commodity hardware, and to work on very large data sets. Apache Hadoop includes a distributed file system known as HDFS. HDFS splits the input and stores the data on different nodes in the cluster, which lets the data be processed in parallel and makes the system fast and efficient.
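
How a file might be split and spread over nodes can be sketched as follows. This is a simplified illustration and not the HDFS placement policy: the function names, the tiny block size and the round-robin placement are all assumptions made for the example. The `replicas` parameter reflects the fact that HDFS keeps several copies of each block, which the text above does not go into.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replicas=3):
    """Assign each block to `replicas` distinct nodes, round-robin.

    Real HDFS uses a rack-aware policy; round-robin is only a stand-in.
    """
    copies = min(replicas, len(nodes))  # cannot have more copies than nodes
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(copies)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)
    print(blocks)
    print(place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replicas=2))
```

Because each block lives on more than one node, a task can be scheduled on any node that holds a copy, and the loss of a single node does not lose the data.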

Core Modules of Hadoop

Apache Hadoop framework is composed of the following modules:

  1. Hadoop Common:- The Java libraries and utilities needed by the other Hadoop modules
  2. Hadoop Distributed File System (HDFS):- A distributed file system that stores data on commodity hardware, providing very high aggregate bandwidth across the cluster
  3. Hadoop YARN:- YARN (Yet Another Resource Negotiator) is a resource management platform responsible for managing cluster resources in a Hadoop cluster
  4. Hadoop MapReduce:- The framework that assigns work to the nodes in a cluster; MapReduce programs are used for large-scale data processing

Advantages of Hadoop


  1. Scalability:- New nodes can be added as needed, without needing to change data formats
  2. Cost effective:- Hadoop brings massively parallel computing to commodity servers
  3. Flexible:- Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources
  4. Fault tolerant:- When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat

Monday 29 August 2016

What is NoSQL? Categories of NoSQL Databases

Introduction

NoSQL stands for "not only SQL". It refers to non-relational database technology, although some NoSQL databases support the SQL language as well. NoSQL databases are widely used behind real-time web applications, such as those of Google, Amazon and Facebook.

Why use NoSQL?

We use NoSQL to get advantages such as

  1. Simpler design
  2. Easier horizontal scaling across machines
  3. Better control over availability
  4. Cost effectiveness, as commodity hardware is used
  5. Better performance than relational database management systems for some workloads
In order to gain these advantages you will often have to compromise on consistency

Understanding CAP Theorem

The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  • Consistency (all nodes see the same data at the same time)
  • Availability (every request receives a response about whether it succeeded or failed)
  • Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
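
The trade-off can be made concrete with a toy model of two replicas. This Python sketch is only an illustration under simplifying assumptions: the class and mode names are hypothetical, writes always arrive at replica A, and a "partition" simply means the two replicas cannot exchange updates. In "CP" mode the store refuses writes during a partition (giving up availability); in "AP" mode it accepts them and lets the replicas diverge (giving up consistency).

```python
class TwoReplicaStore:
    """Toy two-replica store showing the choice forced by a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a = None
        self.replica_b = None
        self.partitioned = False

    def write(self, value):
        """Write via replica A; return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: both replicas are updated together.
            self.replica_a = value
            self.replica_b = value
            return True
        if self.mode == "CP":
            # Stay consistent by rejecting the write: not available.
            return False
        # AP: stay available by accepting locally: replicas now differ.
        self.replica_a = value
        return True

    def is_consistent(self):
        return self.replica_a == self.replica_b


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoReplicaStore(mode)
        store.write("v1")
        store.partitioned = True
        accepted = store.write("v2")
        print(mode, "accepted:", accepted, "consistent:", store.is_consistent())
```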


We can run NoSQL databases on a single server or on multiple commodity servers. They employ a distributed architecture with salient features such as
  • Commodity servers are used by many NoSQL databases
  • Commodity servers are put together to run as a single system
  • Provides redundant storage
  • Provides geographic distribution
  • Avoids a single point of failure, i.e. an outage on a single server will not bring the whole system down

Categories of NoSQL

  • Key value store
  • Columnar
  • Document Store
  • Graph Database
Relational Database:- It is a database model in which data is organised in rows and columns, with a unique key identifying each row or tuple. Some popular relational databases are Oracle, SQL Server, etc.

Key-Value Store:- The fundamental data model of a key-value store is the associative array (map or dictionary), where data is represented as a collection of key-value pairs. This model can be extended to a discretely ordered model that maintains keys in lexicographic order. This extension is computationally powerful and can efficiently retrieve selective key ranges. Some popular databases in this category include Memcached, Redis, etc.
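
The ordered extension described above can be sketched in a few lines of Python. This is an illustrative in-memory structure, not how any particular product is implemented; the class name `OrderedKVStore` and its methods are hypothetical. Keeping the keys sorted lets a range query locate its start and end by binary search rather than by examining every key.

```python
import bisect


class OrderedKVStore:
    """Key-value store that keeps its keys in lexicographic order."""

    def __init__(self):
        self._keys = []   # always sorted
        self._data = {}   # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = OrderedKVStore()
    for key, value in [("user:3", "c"), ("item:9", "z"), ("user:1", "a"), ("user:2", "b")]:
        store.put(key, value)
    # ";" is the character after ":" in ASCII, so this selects every "user:" key.
    print(store.range("user:", "user;"))
```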

Column-Oriented Database:- These databases work by creating collections of one or more key/value pairs that match a record. They do not need a pre-structured table to work with the data. Records come in the form of single or multiple columns holding the information, and the columns of each record can be different.

Document Store:- These are databases in which data is stored in the form of documents, usually in a format such as JSON or BSON. Each document possesses a unique key that identifies it in the database. There are various ways to organize these documents, such as collections, tags, etc.
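
A minimal sketch of the document model, using plain Python dictionaries as documents. The `DocumentStore` class is hypothetical and exists only for illustration; the `_id` field name follows the MongoDB convention, and the random UUID used as the generated key is an assumption of the example rather than what any real database does.

```python
import uuid


class DocumentStore:
    """Toy document store: named collections of dict documents keyed by _id."""

    def __init__(self):
        self.collections = {}  # collection name -> {_id -> document}

    def insert(self, collection, doc):
        stored = dict(doc)  # copy so the caller's dict is not modified
        # Every document gets a unique key if it does not bring its own.
        stored.setdefault("_id", uuid.uuid4().hex)
        self.collections.setdefault(collection, {})[stored["_id"]] = stored
        return stored["_id"]

    def get(self, collection, _id):
        return self.collections.get(collection, {}).get(_id)


if __name__ == "__main__":
    store = DocumentStore()
    # Documents in the same collection do not need the same fields.
    first = store.insert("people", {"Name": "Manjunath", "City": "Bangalore"})
    second = store.insert("people", {"Name": "Asha", "Languages": ["Kannada", "English"]})
    print(store.get("people", first))
    print(store.get("people", second))
```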

Graph Database:- These databases are designed to store related data that can be represented in the form of a graph. Consider social networking: person x is married to person y, person x is a cousin of person z, and person z is also a friend of person x. Other examples of graph data would be public transport links, road maps or network topologies.
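
The social-network example above can be written down as a small labelled graph. This Python sketch is only an illustration of the data model; the `Graph` class and the relation names are hypothetical, and a real graph database adds persistence, indexing and a query language on top of this idea.

```python
from collections import defaultdict


class Graph:
    """Directed graph whose edges carry a relationship label."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> list of (relation, other node)

    def relate(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbours(self, node, relation=None):
        """Nodes reachable from `node`, optionally filtered by relation."""
        return [
            target
            for rel, target in self.edges.get(node, [])
            if relation is None or rel == relation
        ]


if __name__ == "__main__":
    g = Graph()
    g.relate("x", "married_to", "y")
    g.relate("x", "cousin_of", "z")
    g.relate("z", "friend_of", "x")
    print(g.neighbours("x"))
    print(g.neighbours("z", "friend_of"))
```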