Author Archives: Omar Alsaeed

Uploading Images on Mobile – Security Issues and Considerations

What does uploading images on mobile mean?

Uploading pictures may refer to the process in which a single picture is selected and uploaded to a platform for a particular purpose, for example, a user posting an image to Facebook. On mobile, however, it can also refer to an automatic process in which multiple images are uploaded periodically to a fixed location (for examples, see this wiki page on automatic Android image upload techniques and resources). These locations tend to be cloud-based systems and platforms.

 

Overview

In this post we will look at why the images on your mobile devices need to be uploaded and kept safe. We will then view the process on various operating systems, along with the different use cases and the respective libraries, and finish with some security concerns and brief closing thoughts.

 

Why?

The importance of uploading pictures to a safe place becomes clear once you consider the following concerns. Basically, images can easily be lost if they are not backed up or duplicated in a safe place that is unlikely to crash or get hacked. Every image should therefore be uploaded from the device to such a place, and the process should be automated to spare the user the repetitive work of uploading each image separately after taking it.

This brings the need for a library that performs the task efficiently, such as Volley or Retrofit 2, or an equivalent library for your operating system.

 

Android

Android is the most common OS for smartphones all over the world, which also makes it a common target: it is quite likely that, at some point in your device’s lifespan, a malfunction or a malicious app will endanger your images.

You should therefore secure those images ahead of time, which can be done, for example, using Retrofit 2, one of the most popular networking libraries for Android. In a typical setup, the Android client application posts the file, and a server application (written in PHP, for example) receives it and stores it in a folder on the server. Android Volley can also be used as an effective tool for uploading images to a server.
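
To make the upload step more concrete, here is a minimal Python sketch (not Android code) of the kind of HTTP request that a client library such as Retrofit or Volley typically sends for a file upload: a multipart/form-data POST carrying the image bytes. The function names, the form field name "image", and any URL passed in are illustrative assumptions, not part of either library.

```python
import uuid
import urllib.request


def build_multipart_body(field_name, filename, data, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def make_upload_request(url, filename, data):
    """Prepare (but do not send) the POST request for one image.

    The field name "image" is an assumption; the server decides which name it expects.
    """
    body, content_type = build_multipart_body("image", filename, data)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

Passing the prepared request to urllib.request.urlopen would actually send it. On a PHP server, a file posted under the field name "image" would show up in $_FILES['image'].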

Glide, by contrast, is used to download and display pictures on Android devices rather than to upload them.

 

iOS

Having less fear of viruses doesn’t mean that things can’t simply be lost. iOS users are common clients of the various libraries and cloud platforms that keep their images safe. Some use them for sharing images with other users on a network, others for keeping a full-resolution copy of each image in a safe place.

Alamofire, or any other library or platform, can be used to upload images in bulk to a server, which is often implemented in PHP.

 

Windows

Windows still represents a major target for hackers and for people who might consider stealing your content and even blackmailing you for it later.

Users can rely on FileUpload or jQuery for their images. This operating system usually runs on desktop and laptop computers, which makes its upload tooling more computer-oriented than mobile-oriented.

 

For more details on the different methods for different Operating Systems, check this page.

Security Concerns

We all know that today’s world requires double-checking every process that involves valuable information, especially personal information. This is easier with libraries, apps, and platforms that let the user upload with ease once they are set up.

Some might question how accessible the storage provider is to third parties under certain circumstances. The answer can decrease or increase the user’s trust in the method and the application used.

Others say that securing personal information is definitely necessary and that keeping it in a safe place is a must.

A good practice is to always keep several copies of any important information on multiple devices or platforms, to preserve both the quality of the image and its privacy.

 

Did you know that you can also reverse search your photos?

When you have an image and want to know its origin or what the objects in it are, you can run a reverse image search from your own mobile device using Google, Bing, or CTRLQ.org. Apps can also be used: Veracity and Reversee are available to iOS users, and Search By Image is a free option for Android users.

 

For more info, and details for different Operating Systems, check this page.

 

Closing Thoughts

We saw why uploading images from mobile devices, especially Android devices, to a safe location is essential, and why a library that completes the process on its own is needed. We viewed the process on different operating systems, the libraries that can be used on each, and their different use cases. Finally, I presented some possible security issues.

I think it is clear that images should be kept safe and protected by uploading them to safe places.

 

References

Take a Photo and Upload it on Mobile Phones with HTML5

http://www.codepool.biz/take-a-photo-and-upload-it-on-mobile-phones-with-html5.html

 

How to Do a Reverse Image Search From Your Phone

https://www.pcmag.com/article2/0,2817,2492468,00.asp

 

Image Upload in Mobile

https://cloudinary.com/visualweb/display/IMMC/Image+Upload+in+Mobile

What is ETL and Is It Dead Or Alive?

Source:
www.informatec.com

Overview

What is ETL?

ETL is a process in data warehousing that is responsible for pulling data out of the source systems and placing it into a data warehouse. This process includes the performance of three tasks: Extracting the data, transforming the data, and then loading the data. We will see a detailed description of them later.

 

In this post, I will be discussing the concept of ETL, the process of ETL, with an explanation of the three phases, and speaking about the relevance that ETL has in today’s world. To get started with ETL, see this ETL wiki, which includes a collection of resources to get you acquainted with basic ETL and data integration concepts.

 

Data must be properly formatted and normalized in order to be loaded into these types of data storage systems, and ETL is used as shorthand to describe the three stages of preparing data. ETL also describes the commercial software category that automates the three processes.

 

ETL Process

In ETL, the following three tasks are performed:

 

Extract:

The extraction process is the first phase of ETL, in which the desired data is identified and extracted from one or more sources, including database systems and applications. The data is held in temporary storage where the subsequent two phases can be executed. Depending on the source system’s capabilities (for example, operating system resources), some transformations may take place during this extraction process. During extraction, validation rules are applied to test whether the data has the expected values essential to the data warehouse. Data that fails validation is rejected and further processed to discover why it failed and to remediate it if possible. Data extracted from the different source systems (SAP, ERP, other operational systems) is then converted into one consolidated data warehouse format that is ready for transformation processing.

 

Transform:

After the data is extracted, it is physically transported to the target system or to an intermediate system for further processing, where it is transformed. Transformations can include data cleansing, filtering, and formatting; re-sorting rows or columns of data; joining data from two values into one or, conversely, splitting data from one value into two; and applying any kind of simple or complex data validation. The scale of the data matters here, of course, which makes scalability essential in ETL.

 

Load:

The load phase moves the transformed data into the permanent target database. It can load data into data warehouses, into data repositories, and into other reporting applications as well.

 

Once loaded, the ETL process is complete, although many companies perform ETL regularly in order to keep the data warehouse up to date.
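
The three phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source is an in-memory CSV string, the target is an SQLite database, and the table name, column names, and sample rows are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row deliberately fails validation.
RAW_CSV = """id,name,signup_date,amount
1, Alice ,2018-01-05,10.50
2,Bob,2018-01-06,not_a_number
3,Carol,2018-01-07,7.25
"""


def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, convert types, and set aside rows that fail validation."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append(
                (int(row["id"]), row["name"].strip(), row["signup_date"], float(row["amount"]))
            )
        except ValueError:
            rejected.append(row)
    return clean, rejected


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run the three phases in order; returns (rows loaded, rows rejected)."""
    clean, rejected = transform(extract(text))
    load(conn, clean)
    return len(clean), len(rejected)
```

Calling run_etl(sqlite3.connect(":memory:")) loads the two valid rows and reports one rejected row, which mirrors the validation step described under Extract.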

 

The three phases of each ETL operation (extract, transform, and load) are usually tightly coupled. The data pipeline typically exists only virtually, in the memory of the machine performing the ETL, so each load from source to destination typically runs as one contiguous operation. However, ETL processes are typically grouped into logical units and executed either in sequence or in parallel.

 

What is the relevance of ETL today?

When creating a data warehouse, it is common for data from disparate sources to be brought together in one place so that it can be analyzed for patterns and insights. It would be great if data from all these sources had a compatible schema from the outset, but this is rare. ETL takes data that is heterogeneous and makes it homogeneous. Without ETL it would be impossible to analyze heterogeneous data in an automated method and derive business intelligence from it.

Further down, I will touch on the technology changes that are driving the emergence of ELT and compare ELT with ETL.

Because of ETL, analysts, business users, and data scientists could interact and query the data without disturbing business operations. More importantly, most ETL tools eliminated the need to code: They provide a drag, drop, click, and configure interface which breaks the transformation steps into distinct “stages”. ETL specialists could build step-wise routines that were easy to trace and debug without the need to understand code written by engineers.

 

ETL vs. ELT

Both methods are efficient and easy to perform, but ELT is the better one in my opinion, for the reasons below.

Some think that using ELT instead of ETL reduces cost. For example, it shortens time-to-market for changes and new initiatives, as SQL deployments take much less time than traditional code. Moreover, there is no need for a dedicated ETL infrastructure, which saves the company some cash as well.

Furthermore, ELT utilizes cloud-based databases better, as processing steps undertaken during off-hours are not billed as CPU hours. Finally, there is more flexibility in maintaining copies of operational data for audit purposes, and in storing intermediate processes as tables (Reusability, traceability etc.).

 

ETL can be contrasted with ELT (Extract, Load, Transform)  which transfers raw data from a source server to a data warehouse on a target server and then prepares the information for downstream uses. Moreover, many people think that ETL is outdated and ELT is now the relevant method in data integration.
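
As a rough illustration of the ELT order of operations, the sketch below loads raw rows into a staging table first and only then transforms them with SQL inside the target database. SQLite stands in for the data warehouse here, and the table names, columns, and the simple numeric filter are assumptions made for the example.

```python
import sqlite3


def run_elt(conn, raw_rows):
    """Load raw rows untouched, then transform them with SQL inside the database."""
    # Load: everything lands as text in a staging table, with no cleaning yet.
    conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)
    # Transform: the target database does the cleaning and the aggregation.
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT TRIM(customer) AS customer,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'
        GROUP BY TRIM(customer)
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
```

Because the raw table is kept, the transformation can be rerun or changed later without extracting the data again, which is the reusability and traceability point made above.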

 

For an insight on the topic, check this article.

 

Closing Thoughts

In conclusion, we saw that moving to an ELT paradigm has cost advantages that are directly measurable, but eliminating the intermediary ETL tool requires changes to processes and a careful rethink of how analytics is delivered in any company.

 

References

ETL Process

http://datawarehouse4u.info/ETL-process.html

What is Extract, Transform, Load (ETL)?

https://www.informatica.com/services-and-training/glossary-of-terms/extract-transform-load-definition.html#fbid=RjHRT3238g7

Overview of ETL

https://docs.oracle.com/cd/B19306_01/server.102/b14223/ettover.htm

ETL vs. ELT

https://blog.holistics.io/etl-vs-elt-how-elt-is-changing-the-bi-landscape/

What is AWS EBS?

Overview

What is Amazon  EBS?

Amazon Elastic Block Store (EBS) is an Amazon service that provides block-level storage volumes that can be attached to Amazon EC2 instances. In effect, it is a Storage Area Network, or SAN, in the cloud.

 

Before you dive in and for more details, check out this blog post with an introduction to AWS EBS and some information on how to manage EBS volumes.

 

In this post, I will be discussing the concept of Amazon EBS, the way it works, some of its common uses, and presenting some notes on its performance and security.

We will see how efficient and useful this tool is, and how its breadth of use cases lets it solve multiple issues for companies with large volumes of data.

 

How does it work?

The process is simple: you create a volume (from 1 GB to 1 TB in size according to the overview cited below; current volume types allow larger sizes), attach it to an instance as a device (e.g. /dev/sdj), format it, and you are good to go! You can then do several things with it, such as detaching it and attaching it to a different instance. You can also snapshot the volume to Amazon S3 and later restore the snapshot to a different volume.

As for pricing, AWS charges $0.05 per GB-month and $0.05 per 1 million I/O requests for EBS Magnetic volumes. Other volume types are available for different use cases, of course.
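
The pricing arithmetic can be sketched as follows, using only the two figures quoted above. Actual prices vary by region and change over time, so the constants and the function name are assumptions for illustration.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.05")    # USD, figure quoted in the text above
PRICE_PER_MILLION_IO = Decimal("0.05")  # USD, figure quoted in the text above


def monthly_magnetic_cost(size_gb, io_requests):
    """Estimate one month's cost of a magnetic volume from its size and request count."""
    storage = PRICE_PER_GB_MONTH * Decimal(size_gb)
    requests = PRICE_PER_MILLION_IO * Decimal(io_requests) / Decimal(1_000_000)
    return (storage + requests).quantize(Decimal("0.01"))
```

For example, a 100 GB volume serving 20 million I/O requests in a month comes to 100 × 0.05 + 20 × 0.05 = 6.00 USD under these figures.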

 

For more info, check the official documentation.

 

Common Use Cases

RAID

RAID is used when the customer wants higher network throughput and better IOPS than a single volume provides. In this scenario, the company configures a software-level RAID array across several volumes. Software RAID is supported by most operating systems and is used to boost the IOPS and network throughput of the resulting logical volume.

Pre-Warming

Pre-warming is another term for pre-initializing: reading every block of a volume once before putting it to use, which helps achieve high throughput from the first real access. It avoids the dramatic increase in latency that an I/O operation incurs the first time each block is accessed.
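
The idea of touching every block once can be shown with a short Python sketch. In practice, utilities such as dd or fio are commonly used for this; the function below is only an illustration, and its name, the default block size, and the use of a plain file path are assumptions.

```python
def read_every_block(path, block_size=1024 * 1024):
    """Read a file or block device from start to finish, discarding the data.

    Returns the number of blocks and the number of bytes read.
    """
    blocks = 0
    total = 0
    with open(path, "rb") as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:
                break
            blocks += 1
            total += len(chunk)
    return blocks, total
```

On an instance, the path would be the attached device (for example the /dev/... name chosen at attach time), and reading it usually requires root privileges.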

Increasing EBS Volume Size while in Use

Since the system is elastic, the size and the IOPS of a volume can be modified while it is in use, without an issue.

Previously, to modify the volume size, the customer had to stop the instance, make a snapshot of the volume, create a new volume from the snapshot, and attach it to the instance. Amazon has since announced in a tweet that an EBS volume’s size and type can be modified while the volume is in use, which reduces the work involved in extending the attributes of a volume.

CloudWatch Events for AWS EBS

AWS automatically provides data such as instance metrics and volume status checks via Amazon CloudWatch. This acts as a state notification system that can be used to monitor EBS volumes. Volumes can report several states, such as OK, Warning, Impaired, and Insufficient Data. A customer can configure CloudWatch alarms to trigger an SNS notification when a state change occurs.

 

For more uses and/or info on each of them, check this blog post.

 

Performance and Security

Reliability

An EBS volume has redundancy built in; in other words, it won’t fail if an individual drive fails or if an error occurs elsewhere. It is still less durable than S3, which stores multiple copies of the same data in different places. This means that snapshotting the data to S3 is essential in the long run, as the data will be securely saved there. EBS does replicate the data of a volume, but only within the volume’s own Availability Zone.

Performance

EBS volumes are network-attached disk storage and therefore occupy a share of the instance’s overall bandwidth. The performance exceeds expectations, with high sequential transfer rates reaching 120 MB per second. Moreover, this rate can be increased by attaching several EBS volumes to one instance, which in principle scales the transfer rate until the instance’s own bandwidth becomes the limit.

Security

Encryption and access control policies deliver a strong defense-in-depth security strategy for the data stored in the system. AWS also offers encryption for data at rest (boot and data volumes) using Amazon-managed keys or keys that customers create through the AWS Key Management Service, or KMS.

 

For more info, check the official description.

 

Closing Thoughts

In conclusion, EBS is amazingly simple and practical considering the range of complex options it opens up. Traditional forms of hosting seem dated next to this powerful cloud service. This tool represents a solution for many companies that need availability, speed, security, and flexibility in the platform they are using.

 

References

Amazon’s EBS Explained

https://www.rightscale.com/blog/cloud-industry-insights/amazons-elastic-block-store-explained

5 Functions of EBS Volumes You’re Not Using

https://cloud.netapp.com/blog/ebs-volumes-5-lesser-known-functions

6 AWS Cloud Storage Cost Use Cases: Part 1

https://www.cloudyn.com/blog/not-only-s3-6-aws-storage-cost-use-cases-part-1/

Storage Options in the AWS Cloud: Use Cases

https://media.amazonwebservices.com/AWS_Storage_Use_Cases.pdf

What Data Enrichment Means and Who Uses It

Overview

What is Data Enrichment?

Data Enrichment is a process used to enhance, refine or otherwise improve raw data. This idea and other similar concepts contribute to making data a valuable asset for almost any modern business or enterprise. It also shows the common imperative of proactively using this data in various ways.

 

This post will discuss the concept of Data Enrichment, its relevance to the world, and the people who should be thinking about it, then view the different use cases and present five possible tools for it.

 

Why is Data Enrichment relevant? And who needs it?

Data Enrichment has lately become an essential term in any company’s plan for success. As data has become abundant in our world, the problem that companies face isn’t the absence of data but low-quality data that doesn’t help them much in planning for the future. This is where Data Enrichment kicks in: it elevates the quality of the existing databases, to ensure a better plan for future customers simply by understanding the current customers better.

 

Answering the second part of the question is easier, as the answer is simple… Every company that wants to succeed in the market!

This comes from the fact that companies that appreciate the feedback from their customers tend to do better in the market, compared to companies that have low-quality data on their customers and their opinions. It is simple logic: having higher-quality data today means a greater understanding and more accurate predictions for tomorrow.

 

The Use Cases

Companies can, and should, use Data Enrichment for numerous reasons, some of which are introduced here.

Shorter lead capture forms:
Using a smart tool or platform to shorten the forms that a visitor to a site has to fill in enhances the visitor’s experience and therefore results in more visitors to the site.

More Personalization:

Customers tend to give better reviews to websites and online sellers when they feel talked to personally rather than generally. This point is really important, as Data Enrichment provides the grounds for this move: it adds personal detail to the data of each entry, enabling the company to personalize its next campaigns, ads, and requests.

Enabling Machine Learning technology:

Today’s world offers great yet simple tools that can perform sophisticated processes without manual effort. Artificial Intelligence can acquire data about customers in an enhanced, more personal way.

 

For more info on the use cases, check this list.

 

Tools used for Data Enrichment

LeadSpace

This powerful tool combines two features in one platform: lead scoring and data enrichment. It collects information on customers by scanning websites, social media, and so on, creating a rich database that needs only a few finishing touches, such as phone numbers and email addresses, to be ready. It is a platform that companies can rely on with confidence, as its accuracy is high. It also leaves room for the company to flexibly segment out a specific group among its customers.

 

Lusha

Approaching the game differently, Lusha offers a direct approach to enriching a company’s Salesforce leads by integrating and enriching social profiles, achieving minimal research time and higher response rates.

It integrates with numerous apps and platforms and is endorsed by specialists from big companies like T-Mobile and IBM.

 

InsideView

This tool can perform three processes on databases. The first is Data Cleanse, in which it deletes redundant and outdated data from the existing databases and then fills them with updated data as it comes in. The second is Lead Enrichment, which fills in the data fields that were left empty. The third is Customer Data Validation, which verifies the data of the entries and drops those that are inaccurate from the customers’ profiles.
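
The three kinds of operation just described (cleansing, enrichment, and validation) can be illustrated generically in Python. This is not InsideView’s API or method; the function names, the lead fields, the reference directory keyed by email domain, and the deliberately simple email pattern are all assumptions made for the sketch.

```python
import re

# Deliberately loose pattern: something@something.tld, with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def cleanse(leads):
    """Drop duplicate leads, keeping the first record seen per email (case-insensitive)."""
    seen, unique = set(), []
    for lead in leads:
        key = lead.get("email", "").strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append({**lead, "email": key})
    return unique


def enrich(leads, company_directory):
    """Fill empty or missing fields from a reference table keyed by email domain."""
    enriched = []
    for lead in leads:
        domain = lead["email"].rpartition("@")[2]
        merged = dict(lead)
        for field, value in company_directory.get(domain, {}).items():
            if not merged.get(field):
                merged[field] = value
        enriched.append(merged)
    return enriched


def validate(leads):
    """Keep only leads whose email looks structurally valid."""
    return [lead for lead in leads if EMAIL_RE.match(lead["email"])]
```

Chaining the steps as validate(enrich(cleanse(leads), directory)) removes the duplicates first, so the reference data is looked up only once per lead.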

 

LeadGenius

Another well-known name in the market, LeadGenius is known for being powerful in data verification and enrichment. This tool can fill in the holes left by other platforms; in other words, it integrates well with other platforms a company might be operating on. It connects to the company’s email marketing automation and makes the outbound process more efficient and less pricey than inbound, by contacting the right leads and providing a real-time response record.

A plus for this platform is that it places no upper limit on the number of users, which means that companies can scale up their databases without fear!

 

RingLead

Duplicate records, degraded prospect data, reporting inaccuracies, and account mismanagement are all common afflictions that degrade the value of a company’s data. RingLead is an excellent tool for companies affected by those issues, as it can “heal” and deal with massive databases. It can also automatically import prospects from LinkedIn!

 

Closing Thoughts

Viewing all of this shows how important Data Enrichment is as a fundamental stage in a company’s success. I discussed the meaning of the term, its relevance, and its importance to people, especially marketing managers, and presented several use cases for it. Moreover, we saw five of the most powerful and innovative tools that perform this process neatly and smartly.

Whether a company is trying to better understand its past campaigns, failed or successful, or to get a more accurate prediction of future ones, the company needs Data Enrichment. This is due to the importance of high-quality databases, simply because they represent the customers and what they actually want.

The world of data is entering a new level, where quantity is not the problem anymore, but the quality!

 

References

Data Enrichment

https://www.techopedia.com/definition/28037/data-enrichment

4 Data Enrichment Tools for Lead Generation

http://technologyadvice.com/blog/sales/data-enrichment-tools-lead-generation/

8 Use Cases for Data Enrichment

https://www.reachforce.com/blog/8-use-cases-for-enriched-data/

What is Data Enrichment?

https://blog.cvmsolutions.com/what-is-data-enrichment

The Data Warehouse Explained: Benefits, Challenges, and Predictions

Overview
What is a Data Warehouse?
A Data Warehouse, or DW, is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses store current and historical data so that analytical reports can later be produced from it. They are essential for any enterprise, as they act as a log that is fundamental to charting a better future path for the company. This article will provide an overview of the data warehouse concept, review its main processes and use cases, mention some benefits and challenges of the data warehouse, and suggest a few cool predictions for data warehouses in 2018.

Main Processes & Use Cases
New versions and updates of data warehouses reach the data industry every day. I am here to cover a few concepts related to data warehousing. To be clearer and more focused, we will view the main ones: BI, ETL, and analytics tools.
BI stands for Business Intelligence, the practice and tooling used to make fact-based, insightful decisions that can enhance a company’s performance. The data can be read from different sources, not exclusively from a data warehouse, to generate a report that brings the data to life.

ETL is the process in which data is extracted from the data source and brought into the data warehouse. It stands for Extraction, Transformation, and Loading.
Analytics tools offer three different types of analytics: descriptive, predictive, and prescriptive. Each type offers a different kind of information for the company.
The three types can be used differently and for distinct purposes. A company might wish to get a visualization of its data, in which case it can use Chartio, which is powerful and highly usable for data pros. The company might instead want to keep all its data in one place and get a descriptive report, in which case it can use Blendo, a tool that uses ETL to consolidate all the data in one place and analyzes it in an innovative way.
Benefits & Challenges
Data warehouses have advanced a lot since cloud systems came into use as a platform for applications. The shift meant that companies can rely on external platforms not only to store and manage their data for them but also to read, analyze, and organize it to generate reports that can warn the company of a coming downturn or suggest a path for improving its performance. Some of the benefits are improved communication within the company itself, upgraded security, lower costs for data storage and maintenance, and an increase in revenue driven by the evaluations that cloud data warehouses perform.
Some of the challenges that might come along using a data warehouse are:
Facing errors when combining inconsistent data from disparate sources.
Data quality challenges can come from duplicates, logic conflicts, and missing data.
Poor data quality can also generate faulty reports that might lead companies to take unproductive moves suggested by the reports.
All of those challenges can be spotted, recognized and fixed with an efficient system of data management and a supervised self-evaluation system.
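
As a small illustration of how such problems can be spotted automatically, the following Python sketch reports duplicates, missing values, and one example of a logic conflict. The function name, the field names ("ordered", "shipped"), and the rule that an order cannot ship before it is placed are assumptions made for the example.

```python
from collections import Counter


def quality_report(rows, key, required):
    """Summarize duplicate keys, missing required values, and a simple logic conflict."""
    key_counts = Counter(row.get(key) for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    missing = [
        (row.get(key), field)
        for row in rows
        for field in required
        if row.get(field) in (None, "")
    ]
    # Example logic rule (an assumption): an order cannot ship before it is placed.
    # ISO-formatted date strings compare correctly as plain strings.
    conflicts = [
        row.get(key)
        for row in rows
        if row.get("ordered") and row.get("shipped") and row["shipped"] < row["ordered"]
    ]
    return {"duplicates": duplicates, "missing": missing, "conflicts": conflicts}
```

A report like this could run before each load, so that faulty rows are reviewed instead of silently ending up in the reports built on the warehouse.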

Predictions
The future of technology is understandably hard to predict, but that doesn’t mean that observers can’t offer some insight into what the future might look like for data warehouses.
Over the past years, data warehouses have been revolutionized repeatedly, especially with the Cloud services being adopted by many platforms. New trends are mainly focused on Artificial Intelligence and Machine Learning, as they seem to be promising and fruitful. This can be groundbreaking for many companies, as this suggests that Data Warehouse platforms might be able to provide live dashboards in the future, helping businesses to monitor their performance at every second.

For more info check this blog.

Moreover, the Machine Learning part means that algorithms will be created to perform the analysis on the data supplied, and that these algorithms will advance and learn from their mistakes. Furthermore, they can be customized to each company: for example, an algorithm can be fed the company’s historical records with all the past data, so that it can see what works for this company and what doesn’t.
On the other hand, working on predictive analytics seems to be a top priority for some platforms and their clients. The aim is to enhance the way we understand the present and the past in order to predict the future with higher precision and reliability. This can have a big impact on the market in general, as companies would better understand their customers and use that to deliver better services where needed.

For a cool perspective, check this blog post!

Closing Thoughts
Data Warehousing is an essential step in the development of any company around the globe. It reads and analyzes data at massive scale to build a deeper understanding of the company’s performance, past mistakes, and achievements, and uses them to generate a reliable and precise model that makes safe predictions for the future and suggests new improvements in the company’s work. It involves several components: BI, ETL, and analytics tools, each with its own mechanism and uses.
Data Warehousing has valuable benefits that include better communication, higher security measures, and lower budgets for data management and storage. Some of the challenges might occur with inconsistent data, duplicates, logical contradictions, and poor-quality data. However, the future of DW seems to be bright, with lots of technologies being embedded in the modern platforms, including AI, machine learning, and predictive analytics.
Companies will realize, in the near future, the necessity of cloud data warehousing, as it simplifies the whole process and allows companies to use their resources in a more productive way.

References
An evaluation of the Challenges of Multilingualism in Data Warehouse Development
http://www.scitepress.org/Papers/2016/58584/index.html

Types of Analytics: descriptive, predictive, prescriptive analytics.
https://www.dezyre.com/article/types-of-analytics-descriptive-predictive-prescriptive-analytics/209
Top 10 Tools for a Dangerously Effective Data Stack in 2018
https://blog.panoply.io/the-top-10-tools-for-a-dangerously-effective-data-stack-in-2018
Enterprises Eye Big Benefits from Cloud Data Warehouse
https://blog.panoply.io/enterprises-eye-big-benefits-from-cloud-data-warehouses
7 Challenges to Consider when Building a Data Warehouse
http://www.onapproach.com/7-challenges-consider-building-data-warehouse/

What Is AWS S3 and How to Synchronize Data to S3?

 

Overview

What is AWS S3?

Amazon Simple Storage Service (Amazon S3) is a simple and efficient object storage platform from Amazon Web Services. It targets developers, putting web-scale computing within easy reach.

Amazon S3 has a straightforward web services interface that allows the user to store and retrieve any amount of data, at any time, from anywhere on the web. It offers any developer the same reliable and reasonably priced data storage infrastructure that Amazon uses to run its own global network of websites, an infrastructure designed to handle large volumes of data quickly.

 

In this article, I will discuss AWS S3, the options available for syncing data to S3, and the steps for syncing using the CLI. I will also mention the NetApp Cloud service as a tool for syncing files to an S3 bucket, where companies can use the AWS S3 data sync features to transfer any NFSv3 or CIFS file share to the S3 bucket.

 

What is it used for?

Today’s firms and companies need to be able to collect, store, and analyze their data, potentially at unparalleled scale, in a secure and simple way; this is where AWS S3 comes in. For more details, see the official documentation.

It was built specifically for developers and for this kind of lengthy process. AWS S3 is used for its durability, reliability, and scalability: it runs on the world’s largest global cloud infrastructure and is designed for 99.999999999% (eleven nines) durability. It also provides comprehensive security and compliance capabilities, including three different forms of encryption for its stored data.

Briefly, organizations can migrate all of their bulky, yet important, databases to S3, negating the need for an on-premise storage. It can be also used to store and distribute static web content, separately, since each stored object has its own URL. Moreover, it can be used for statistical purposes, where companies can sync their data to S3 and then sync it back to an on-premise system, to be able to gain a statistical report on their data.
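
To illustrate the point that each stored object has its own URL, here is a small Python sketch that builds the virtual-hosted-style address of an object. The function name and the bucket and key used in the usage note are hypothetical, and an object is only reachable at its URL if its permissions allow it.

```python
from urllib.parse import quote


def object_url(bucket, key, region=None):
    """Build the virtual-hosted-style HTTPS URL for an S3 object."""
    if region:
        host = f"{bucket}.s3.{region}.amazonaws.com"
    else:
        host = f"{bucket}.s3.amazonaws.com"
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    return f"https://{host}/{quote(key)}"
```

For a hypothetical bucket named my-site-assets, object_url("my-site-assets", "img/logo.png") yields https://my-site-assets.s3.amazonaws.com/img/logo.png, which a web page could reference directly as static content.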

 

Overview of the options available for syncing data to S3

There are several methods for syncing data from an on-premises system or another cloud system to S3, and vice versa. Which method to follow is a matter of choice, and it also depends on the amount of data that needs to be synced. For smaller amounts of data, less than a petabyte, three methods can be followed. The first is the Command Line Interface (CLI), which is explained in detail later. The second is the Import/Export method, which suits 16 terabytes of data or less. The third is the Storage Gateway, a hybrid cloud storage system in which the user keeps a copy of the data on their on-premises systems.

 

Steps for uploading files to AWS S3 using the CLI method

This method requires setting up an IAM user and then installing the AWS CLI on the device. After that, one can perform all sorts of actions on the data using CLI commands. Users can also write their own scripts for backing up files and retrieving them from the cloud. The process is explained in three separate steps.

Step 1:

Creating an AWS IAM user.

In this step, the user creates an IAM user account with administrative permissions by following the simple steps in the AWS Management Console, and finishes by saving the credentials file, as it will be needed when configuring the CLI in Step 2.

Step 2:

Install and Configure the AWS CLI.

After creating an IAM user, the customer needs to download and configure the CLI. The steps differ slightly between operating systems, but the process basically consists of downloading and running the installer, then configuring the AWS CLI from the command line.
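Configuring the CLI (for example with the aws configure command) stores the access keys in an INI-style credentials file, usually ~/.aws/credentials. As a rough sketch, the following checks that a profile in such a file contains both keys before any upload is attempted. The helper name and the sample key values are placeholders made up for illustration.

```python
import configparser
from typing import List

# The two entries a profile needs for key-based access.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(credentials_text: str, profile: str = "default") -> List[str]:
    """Return the required keys that the profile lacks (an empty list means it looks complete)."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.get(profile, key, fallback="").strip()]


# Placeholder content in the same layout as a credentials file.
SAMPLE = """
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = exampleSecretValue
"""

if __name__ == "__main__":
    print(missing_credential_keys(SAMPLE))
```

In practice the text would be read from the real credentials file rather than from a string.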

Step 3:

Using the AWS CLI with Amazon S3.

In this step, the user creates a bucket in S3 and fills it with data from a file or a database. The bucket acts as a container for the files the user puts in it, much like a folder. The bucket and its contents can then be listed, modified, or deleted through command-line instructions. Note that a bucket cannot be renamed after creation; its contents have to be copied to a new bucket instead.
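The basic bucket operations map onto a handful of aws s3 subcommands (mb, cp, ls, rm, rb). The sketch below only builds those commands as argument lists and prints them; it runs them only when execute=True is passed and the AWS CLI is installed and configured. The bucket name, file name, and helper names are placeholders chosen for illustration.

```python
import shlex
import subprocess
from typing import List


def make_bucket(bucket: str) -> List[str]:
    """Command that creates a bucket."""
    return ["aws", "s3", "mb", f"s3://{bucket}"]


def upload_file(local_path: str, bucket: str, key: str = "") -> List[str]:
    """Command that copies a local file into the bucket (empty key keeps the file name)."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


def list_bucket(bucket: str) -> List[str]:
    """Command that lists the contents of the bucket."""
    return ["aws", "s3", "ls", f"s3://{bucket}"]


def delete_object(bucket: str, key: str) -> List[str]:
    """Command that deletes a single object."""
    return ["aws", "s3", "rm", f"s3://{bucket}/{key}"]


def remove_bucket(bucket: str) -> List[str]:
    """Command that removes a bucket (it has to be empty first)."""
    return ["aws", "s3", "rb", f"s3://{bucket}"]


def run(command: List[str], execute: bool = False) -> None:
    """Print the command; run it only when execute is True."""
    print(shlex.join(command))
    if execute:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for cmd in (make_bucket("example-bucket"),
                upload_file("photo.jpg", "example-bucket"),
                list_bucket("example-bucket"),
                delete_object("example-bucket", "photo.jpg"),
                remove_bucket("example-bucket")):
        run(cmd)  # dry run: prints only
```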

 

For more details on using CLI commands with S3, see the official documentation.

 

Syncing data to S3 using the NetApp Cloud Sync service

As mentioned earlier, AWS supports three different forms of encryption for the data it stores, and it transfers that data with high reliability and tight security measures, backed by the company's firewalls, to ensure the safety of the data.

NetApp Cloud Sync is a tool that can be used to sync data to S3. Using this method, companies can sync their data from any NFSv3 or CIFS file share to an S3 bucket.

 

Steps of syncing:

First, we need to ensure that the three main components in this process are defined and configured: the NFS server, the Data Broker, and the S3 bucket.

Step 1:

Configure the NFS server, choosing an IP address or a hostname that the Data Broker can identify and access.

Step 2:

This step concerns setting up the Data Broker, the main application responsible for transferring data between the NFS volume and the S3 bucket. It needs access to the NFS server so that it can read and/or write data.

Step 3:

Once the system is installed on the local device, the installer sets up all the prerequisites and starts the services on the device. After this, the user needs to select the NFS share on the server that will be synced.

 

Performance & Security

Fast data transfer to AWS S3 is one of the major benefits of the Cloud Sync service.

This is due to the parallelization built into the service: it walks through directories and transmits files in parallel. Moreover, Cloud Sync keeps a catalog of the uploaded data, detects changed or unsynced files, and syncs them automatically, transferring only the changed data to keep the process fast and efficient.
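The idea of a catalog plus parallel processing can be illustrated with a small sketch. This is not how Cloud Sync is implemented internally; it is only a simplified model, assuming the catalog is a mapping from relative file path to a SHA-256 digest, with files hashed in parallel by a thread pool. All function names are made up for the example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root: str, workers: int = 4) -> Dict[str, str]:
    """Walk root and hash every file in parallel; keys are paths relative to root."""
    paths = [
        os.path.join(folder, name)
        for folder, _dirs, names in os.walk(root)
        for name in names
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(file_digest, paths))
    return {os.path.relpath(p, root): d for p, d in zip(paths, digests)}


def changed_files(catalog: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return new or modified files (sorted) by comparing digests with the catalog."""
    return sorted(path for path, digest in current.items()
                  if catalog.get(path) != digest)
```

A sync tool built on this model would store the result of scan() after each run and, on the next run, upload only the paths returned by changed_files().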

Administrators can run scripts through the CLI to sync only a chosen section of the data, and can automate the process completely by scheduling a regular sync. This makes syncing to S3 one of the leading methods for transferring large data sets from an on-premises system to the cloud, as it takes significantly less time than the other available methods.
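As a sketch of such a script, the function below builds an aws s3 sync command that can be limited to a section of the data through the CLI's --exclude and --include filters. The directory, bucket, and prefix are placeholders, and the function name is only illustrative. The command is built and printed here, not executed.

```python
import shlex
from typing import List, Sequence


def build_sync_command(source_dir: str, bucket: str, prefix: str = "",
                       include: Sequence[str] = (), delete: bool = False,
                       dry_run: bool = False) -> List[str]:
    """Build an `aws s3 sync` command for a local directory and a bucket prefix."""
    command = ["aws", "s3", "sync", source_dir, f"s3://{bucket}/{prefix}"]
    if include:
        # Exclude everything first, then re-include only the wanted patterns.
        command += ["--exclude", "*"]
        for pattern in include:
            command += ["--include", pattern]
    if delete:
        # Also remove objects in the bucket that no longer exist locally.
        command.append("--delete")
    if dry_run:
        # Ask the CLI to show what it would do without transferring anything.
        command.append("--dryrun")
    return command


if __name__ == "__main__":
    cmd = build_sync_command("/data/images", "example-bucket", "backup/",
                             include=["*.jpg"], dry_run=True)
    print(shlex.join(cmd))
```

To make the sync regular, a script like this can be called by the operating system's scheduler, for example cron on Linux or Task Scheduler on Windows.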

The transfer is also secure. The data flow is performed entirely by the Data Broker, which means that Cloud Sync only supervises the transfer, without having access to the data itself. Because the data stays within the company's security boundary at all times, it is much harder for an outsider to reach it. Finally, all communication with S3 goes through the secure API provided by Amazon.

 

Closing Thoughts

Companies and organizations with massive databases may well rethink their data strategies in light of AWS S3. The service offers a simple way to store data in the cloud, with solid performance, high speed, and tight security measures. Moreover, NetApp Cloud Sync gives users a flexible, reasonably priced way to sync their data.

 

References

Amazon, What is Amazon S3?

http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

Amazon, Uploading files to S3

https://aws.amazon.com/getting-started/tutorials/backup-to-s3-cli/

Amazon, S3 Product details

https://aws.amazon.com/s3/details/

Amazon, S3 CLI documentation

http://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html

NetApp, AWS S3 data sync with NetApp

https://cloud.netapp.com/blog/synchronizing-data-to-s3-cloud-sync