What is AWS Batch?

No one likes to wait. That’s particularly true when it comes to batch processing jobs for Big Data projects like genomic research, aircraft design and materials safety testing, and the massive data processing requirements of health and financial records.

For computer scientists, developers, engineers, or anyone who needs to run a batch processing job, the needs are even greater. Because the data involved is so massive -- often at petabyte scale -- jobs typically have to be queued and are constrained by the compute resources of the local, on-premise data center. An example of this might be a simulation to determine the safety of a new material to be used in a future car.

There are many variables -- the impact on the material, the temperature, and the speed of the driver, not to mention the chemical properties of the material itself. It’s an extraordinary Big Data effort, but there are also time-to-market considerations and project timelines.

Fortunately, with the advent of cloud computing services, there isn’t the same restriction in terms of waiting for the compute resources to become free enough to run batch processing jobs. AWS Batch allows companies, research institutions, universities, or any entity with massive data processing needs to run batch processing jobs without the typical on-premise restrictions.

Batch processing refers to a computing operation that runs multiple compute requests without the need for the user to initiate another process. The name comes from the early days of computing when end-users had to initiate every computing process one by one. With batch processing, you can queue the requests for processing and then allow the service to do the heavy lifting in terms of scheduling the requests, adjusting compute performance, and allocating the memory and storage needed to run the batch jobs. And, you can schedule multiple batch processing jobs to run concurrently, tapping into the true power of cloud computing.

Since this scheduling occurs automatically between AWS Batch and the related Amazon services you need -- such as Amazon EC2 (Elastic Compute Cloud) -- there is no need to configure any software for IT management or processing. AWS Batch coordinates the IT services you need for the project at hand without further intervention from the user.
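
To make the hands-off coordination concrete, here is a minimal sketch of submitting a job with the AWS SDK for Python (boto3). The job name, queue, job definition, and command are hypothetical placeholders for resources you would have defined beforehand; this is an illustration, not a prescribed setup.

    import boto3

    batch = boto3.client("batch")

    # Submit a job to an existing queue; AWS Batch handles scheduling,
    # provisioning EC2 capacity, and running the container without
    # further intervention from the user.
    response = batch.submit_job(
        jobName="material-safety-simulation",       # hypothetical job name
        jobQueue="simulation-queue",                # a queue created earlier
        jobDefinition="simulation-job-def:1",       # container image, vCPUs, memory
        containerOverrides={"command": ["python", "simulate.py", "--trial", "42"]},
    )

    print("Submitted job:", response["jobId"])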

For those with heavy demands for data processing, this allows the staff to focus more on the actual project management and business requirements, the results of the computations, queuing up more batch processing jobs, and analyzing the results and making decisions about what to do next. AWS Batch provides all of the necessary frameworks to do the batch processing.

Benefits of AWS Batch

A side benefit to using AWS for batch processing with AWS Batch is that you can take advantage of Spot Instances, spare Amazon EC2 capacity offered at a steep discount compared to On-Demand pricing. AWS Batch can be configured to run jobs on Spot Instances automatically whenever that spare capacity is available, which translates into significant savings across all of your batch processing -- with no extra work on your part.
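
For a sense of how Spot capacity gets wired in, the sketch below creates a managed compute environment that draws on Spot Instances; the environment name, vCPU limits, subnet, security group, and IAM roles are placeholders, and depending on your account setup additional fields may be required.

    import boto3

    batch = boto3.client("batch")

    # A managed compute environment backed by Spot capacity; AWS Batch
    # scales the instance fleet between minvCpus and maxvCpus as jobs arrive.
    batch.create_compute_environment(
        computeEnvironmentName="spot-batch-env",            # hypothetical name
        type="MANAGED",
        computeResources={
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,
            "maxvCpus": 256,
            "instanceTypes": ["optimal"],
            "subnets": ["subnet-0123456789abcdef0"],        # placeholder subnet
            "securityGroupIds": ["sg-0123456789abcdef0"],   # placeholder security group
            "instanceRole": "ecsInstanceRole",              # placeholder instance profile
        },
        serviceRole="AWSBatchServiceRole",                  # placeholder service role
    )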

Because cloud storage, performance, memory, servers, and the rest of the infrastructure are all provisioned automatically according to the batch processing requirements, and because the end-user doesn’t need to configure any of those compute resources, AWS Batch helps simplify the entire Big Data endeavor, especially in terms of coordination across AWS. That is often the hardest and most time-consuming part of a Big Data project, because the scientists and engineers who run the batch processing project are not necessarily experts in infrastructure or IT service management.

They don’t need to know about memory allocations, storage arrays, server configuration, or how these components inside a data center all work in tandem to produce the desired results.

Another benefit has to do with costs. When companies don’t have to manage and configure the compute environment for batch processing, they don’t have to take the time and expense needed to make sure it is all up and running 24x7 and they don’t have to purchase any of the equipment. Instead, AWS Batch automatically allocates the exact compute resources you need for that project, and you pay only for the compute resources you actually use. This is true for every batch processing job including the concurrent jobs you might run.

Not only does a company avoid the management chores and costs of running an on-premise data center, it also doesn’t have to coordinate the various services needed for batch processing. An example of this might be a massive genomic research project for drug discovery.

A pharmaceutical company might start out with basic batch processing needs and a minimal amount of storage, but as the project intensifies and the processing needs increase, the work can stall while the company coordinates the various services involved, such as storage, networking, endpoint security, or memory allocations. There’s a cost savings in not having to add, manage, and maintain those services, or make sure they are secure for every batch processing job.


What is Amazon Athena?

The answers companies need from their data can sometimes be elusive. We live in an age where data is in great abundance, especially with the expansion into cloud storage. But the tools to analyze and process that data are not always easy to use, readily accessible, or even that effective. The problem? Data has to reside somewhere, and most companies have to think about how it is stored, who will access it, how to make that secure, and most importantly how to make data access reliable and fast.

That’s where Amazon Athena can help. It’s a query service in that companies are able to run SQL queries against their data as though it resides in a local data center. It’s serverless in that you don’t have to manage the infrastructure at all or use database software to manage it. And, it’s extremely fast. Your staff can run SQL queries and expect results even on large datasets in a matter of seconds.

To use Amazon Athena, the data is first housed on Amazon S3 (Simple Storage Service), which is an object storage service that runs in the cloud. Amazon S3 is what makes the data accessible and safe to use, while Amazon Athena is the query service that provides the power to derive the results you need from the data. This means you don't need to concern yourself about designing databases.

One way to think about Athena? It’s somewhat similar to a Google search. You know the data is out there, but it’s often hard to find the data sets you actually need. A query is similar to a Google search in that you create the parameters for the SQL query you need to perform. The difference here is that you're using cloud computing services instead of a search engine.

Athena is not something that requires setup or configuration, which is typically the case with a local data store, where an ETL (Extract, Transform, Load) process prepares data in a database for a query by isolating the dataset. Instead, your query can run without ETL, which simplifies the process -- you run the query from an easy-to-use web console. You point to your data in S3, configure the schema, and start the query.

One example of how this might work involves a retail company that sells a large number of products with thousands or even millions of SKUs (stock-keeping units). A company might want to know if there are SKUs that should be retired. Normally, this might require preparing a complex ETL to configure and prep the data for SQL queries. Because of how the object storage works within Amazon S3 and because of the integration with other Amazon Web Services (such as the AWS Glue Data Catalog), the queries work without any prep.

This means companies can run a point-of-sale transactional query like the one related to retired SKUs or perform other queries faster and with better results.
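
To make the SKU example concrete, here is a hedged sketch using boto3’s Athena client; the database, table and column names, and the S3 results location are all hypothetical, and the SQL itself is standard Presto-style syntax.

    import boto3

    athena = boto3.client("athena")

    # Find SKUs with no sales in the past year -- candidates for retirement.
    # Athena reads the data directly from S3; nothing is loaded or moved first.
    query = """
        SELECT sku, MAX(sale_date) AS last_sold
        FROM sales_records
        GROUP BY sku
        HAVING MAX(sale_date) < current_date - interval '1' year
    """

    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "retail_db"},                         # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},  # placeholder bucket
    )

    print("Query execution id:", response["QueryExecutionId"])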

Benefits of Amazon Athena

As with most Amazon Web Services, the major benefit to using Amazon Athena is that it provides great flexibility in how you run queries without the added complexity. One example of this is with a pharmaceutical company using the cloud for genomic research. Your staff might decide to run multiple queries against the data set, but normally each one requires setup and configuration to create a cloud database that can accept the queries. With Athena, the staff can run multiple queries concurrently and trust that the results will be clean and accessible within seconds. These actionable results mean that companies have access to clean, reliable data to make better decisions and continue their research.

Another, related advantage of Athena is lower cost. Companies don’t have to manage the footprint required for the datasets, so if they run multiple queries or need to make decisions based on a vast trove of data, they don’t have to first improve the IT infrastructure or configure their data storage to handle the higher number of requests. Athena expands and retracts performance as needed for the queries at hand.

As mentioned earlier, Athena is flexible enough to handle a variety of tasks related to database queries. It runs standard SQL and supports standard data formats such as CSV, JSON, ORC, Avro, and Parquet. Athena uses Presto -- an open-source SQL query engine -- with ANSI SQL support, so it is not a proprietary query service users will have to learn from the ground up. Athena lets you run quick SQL queries but also supports more complex joins and arrays.

In the end, the power of Amazon Athena comes from the fact that it queries data directly in Amazon S3, so all of the benefits of that object storage platform carry over to Athena in terms of reducing complexity, providing the endpoint security and performance needed, and allowing companies to run multiple queries without having to manage or configure the infrastructure. Companies can focus more on the actual queries and results, not on the platform itself.


What is Amazon Aurora?

For large companies or those with massive data analytics needs, there are times when basic cloud computing services just won’t help. The open-source options might be too unreliable or not fast enough, the on-premise alternatives require too much maintenance, or there are just too many complex variables for an internal IT staff to worry about.

Amazon Aurora is a cutting-edge relational database that was built for the cloud and has the computing prowess to keep up with the most performance-driven data analytics projects. While a normal cloud database can run on open-source engines (including through Amazon’s own managed offering, RDS, or the Relational Database Service), Aurora is a major leap forward because it is essentially an enterprise-grade relational database that runs in the cloud, yet it still provides the same intuitive interface as Amazon RDS (and in fact is managed through RDS).

A relational database for enterprise use is a different beast from a normal relational database. The tables are far more complex, but most importantly there is a need for the exceptional speed, reliability, and security that Aurora provides. A pharmaceutical company may be creating a new prescription drug and there’s a need to develop it quickly. A government entity may be doing Big Data analytics on a new citywide infrastructure change, such as replacing bridges. An automaker may need to run analytics on the materials used in a new electric vehicle that will need to meet government standards yet be light enough for a better MPG rating.

One thing is clear: the needs are much greater than for normal cloud computing services. In some cases, a company may need up to 64 TB of data storage per database instance or continuous backup of all data, which means there is little margin for error. The reliability requirement might be 99.99% uptime. When the Big Data project is related to new drug discovery, the safety of human drivers in a new car, or the bridges in a city, compromise is not an option.

Interestingly, while the Amazon Aurora service is enterprise-grade in terms of performance, scaling, reliability, and security, it is not enterprise-grade in terms of cost. Companies pay a fraction of the cost for this service compared to what they would pay for an on-premise solution or for a competing product that requires a minimum number of instances.

In terms of speed, Amazon states that Aurora delivers up to five times the throughput of standard MySQL and three times that of standard PostgreSQL, at one-tenth the cost of commercial-grade databases.

Benefits of Amazon Aurora

Even with all of the power and performance, the three key benefits to using Aurora are related to simplicity, cost, and security. As mentioned, Aurora runs on top of Amazon RDS so it is the same web interface you might already be using. The heavy lifting and complexity when it comes to an enterprise-grade database in the cloud is usually related to the provisioning, maintenance, scaling, patching, backups, and updating that’s required, yet RDS handles all of that. For your staff, the initial setup looks and functions similar to an open-source database on RDS.

And, the database instances are all self-healing, auto-scaling, and fault-tolerant thanks to the connection between Aurora and Amazon S3 (Simple Storage Service), the object storage platform that works in tandem with the enterprise relational database instances.

Cost plays an important role here because normally scaling up your Big Data project would require an enormous investment in infrastructure. With Amazon Aurora, it’s possible to add up to 15 read replicas per database cluster simply by choosing that option. There is no infrastructure management, planning, or development involved to achieve this high-performance throughput. As you scale up, the underlying storage also grows automatically to meet the need, up to 64 TB per instance.
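
As a rough sketch of what that scaling looks like through the RDS API with boto3: create the cluster, add a writer, then add readers as load grows. The identifiers, instance class, and credentials below are placeholders, not recommended values.

    import boto3

    rds = boto3.client("rds")

    # Create an Aurora MySQL-compatible cluster; the underlying storage
    # grows automatically as the data does.
    rds.create_db_cluster(
        DBClusterIdentifier="analytics-cluster",    # hypothetical identifier
        Engine="aurora-mysql",
        MasterUsername="admin",
        MasterUserPassword="example-password-123",  # placeholder; store real credentials securely
    )

    # The first instance in the cluster is the writer...
    rds.create_db_instance(
        DBInstanceIdentifier="analytics-writer",
        DBInstanceClass="db.r5.large",              # example size
        Engine="aurora-mysql",
        DBClusterIdentifier="analytics-cluster",
    )

    # ...and each additional instance is a read replica (up to 15 per cluster).
    rds.create_db_instance(
        DBInstanceIdentifier="analytics-reader-1",
        DBInstanceClass="db.r5.large",
        Engine="aurora-mysql",
        DBClusterIdentifier="analytics-cluster",
    )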

Scaling down is just as important -- companies don’t lose the investment they made to handle the biggest projects while it sits idle waiting for the next massive deployment.

Endpoint security is a critical component of any Big Data project, especially in the age of data breaches and exposed user information that is often sold on the Dark Web. If a company like Ford is experimenting with Big Data projects involving materials or components inside a new and unannounced vehicle, and the data is hacked and exposed, it can be a major setback.

Aurora uses technologies like network isolation, encryption at rest using keys managed through AWS Key Management Service (KMS), and encryption of data in transit using SSL. It’s also important to note that, since Amazon Aurora uses S3 for continuous backups, that storage is also highly secure -- the underlying data used for the Big Data project is archived automatically. There is little opportunity for data leaks when the database itself and the storage are so closely linked.


What is Amazon EMR?

For any business that needs to analyze reams of data, there are several complex infrastructure challenges to consider. Massive amounts of data require massive storage, server performance has to be optimal, and there’s an array of networking and security concerns.

Fortunately, Amazon EMR (also known as Amazon Elastic MapReduce) is a service that can help with Big Data analysis needs for companies of all sizes. More than just about any other Amazon service, EMR is closely linked to other platforms to help with Big Data analytics, including Amazon EC2 (Elastic Compute Cloud) for renting virtual servers and Amazon Simple Storage Service (S3) for object storage. These products all work in tandem to help companies with the infrastructure and platform needs to run Big Data projects.

Imagine the alternative. For a company that is conducting genomic research, analyzing traffic data for a city, or building a vast machine learning initiative that uses artificial intelligence to analyze business data across a large company, the infrastructure would have to be deployed in a way that can handle all of that Big Data analytics -- the servers required, the online storage, the networking, and the security to deploy the frameworks you need to run the Big Data project.

Instead, EMR runs as a cloud computing service for deploying the frameworks without the related local, on-premise infrastructure management and deployment. With the ability to deploy Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto as services in the cloud, companies can then rely on EC2 and S3 to provide the autoscaling you need as the Big Data project evolves and gathers more and more data.

This is the real headache most companies face when analyzing a large amount of data. It’s often two simultaneous headaches. First, there are the project and business requirements (meaning, the reason the project is being conducted in the first place), the coding needed to make it all happen, the reporting and analysis deliverables, and all of the related project variables. That’s complex enough. Second, companies then also have to build the infrastructure required to handle a project of that magnitude. Known as petabyte-scale computing, it is a double-edged sword because it’s often true that the data scientists and programmers developing the actual Big Data project are not necessarily experts in IT infrastructure.

Like many Amazon Web Services, Amazon EMR runs as a service you manage remotely and auto-scales to meet your needs, so there is little to no management involved.
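
As a rough illustration of how little there is to manage, the boto3 sketch below launches a small cluster with Spark and Hive pre-installed; the cluster name, release label, instance types, log bucket, and IAM roles are example placeholders.

    import boto3

    emr = boto3.client("emr")

    # Launch a small cluster; EMR provisions the EC2 instances and wires
    # the chosen frameworks together.
    response = emr.run_job_flow(
        Name="genomics-analysis",                   # hypothetical cluster name
        ReleaseLabel="emr-6.10.0",                  # example release
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        LogUri="s3://example-emr-logs/",            # placeholder bucket
        JobFlowRole="EMR_EC2_DefaultRole",          # default role names; adjust to your account
        ServiceRole="EMR_DefaultRole",
    )

    print("Cluster id:", response["JobFlowId"])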

Benefits of Amazon EMR

Some of the most important benefits to consider with EMR are related to cost and reducing complexity. In terms of cost, as mentioned previously, there is no need to build your own clusters in a local data center that’s on-premise. A compute cluster that runs on EMR can cost as little as 15 cents per hour for 10 nodes. Companies pay per instance, which means you are not paying for the actual infrastructure to sit idle. There is a minimum charge of only one minute, and of course, companies can then scale up from there as they analyze more and more data and benefit from adding additional nodes.

That scaling is important because it means companies don’t have to make plans to retrofit an existing infrastructure. As Big Data needs change and evolve to meet business requirements, you can add dozens of additional clusters and nodes or even thousands. An example of this might be a pharmaceutical company that decides to analyze genomic data for drug discovery. The company starts with one product line and one project meant for genomic discovery, but then adds additional projects on more clusters to aid in the drug discovery.
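
That kind of growth can be a single API call rather than a hardware plan. A hedged sketch, assuming a running cluster whose ID is a placeholder: look up the core instance group and grow it.

    import boto3

    emr = boto3.client("emr")

    cluster_id = "j-EXAMPLE12345"                   # placeholder cluster id

    # Find the core instance group, then grow it from its current size to 20 nodes.
    groups = emr.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]
    core_group = next(g for g in groups if g["InstanceGroupType"] == "CORE")

    emr.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[{"InstanceGroupId": core_group["Id"], "InstanceCount": 20}],
    )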

As for reducing complexity, it’s possible to install and configure an EMR cluster in a few minutes. There is no provisioning, setup, or configuration -- which is amazing considering what is normally required to configure a Big Data cluster running on an Apache Hadoop framework. This means data engineers and scientists and even non-programmers in a company can start using EMR without knowing about back-end infrastructure management.

A final word about security: While data breaches are increasingly common and hard to predict or avoid, the benefit of EMR is that all security issues are handled by the service itself -- including server-side encryption, virtual private cloud access, and firewall configuration. This endpoint security happens “behind the scenes” and is part of EMR even for the most basic clusters.

In the end, Amazon EMR is intended to help companies focus on what they do best -- build the actual project and launch their reporting and analysis tools in the cloud without the inevitable infrastructure problems related to scaling up a project. Companies can work more on the actual deliverables for the analysis and reporting they need, not the back-end.


What is Amazon Lightsail?

When it comes to cloud computing, there are two main considerations. One is cloud storage -- the place to keep all of your files in a secure environment. The other is related to the infrastructure, which is how you run the applications and cloud databases for those apps.

Amazon Lightsail provides virtual servers that run in the cloud. The product complements Amazon EC2 (Amazon Elastic Compute Cloud), AWS’s more configurable compute service. There are dozens and dozens of other Amazon services and products, all part of Amazon Web Services (or AWS). However, Lightsail is the product that companies rely on for running applications without the complexity of running your own servers in a data center.

As such, Amazon Lightsail is a powerful product that can change how you do business. Because it is a virtual server, there are no concerns about scaling up or down as your needs change, no complex configurations to worry about, and no performance bottlenecks that result from poor planning and scaling in your own data center with your own servers.

Lightsail can help companies refocus efforts on the actual application development, business requirements, customer support, and other important tasks to help increase revenue and reach more customers, instead of on the computing environment used to run those apps. To say cloud computing and virtual servers have changed technology might even be an understatement, because many of the popular apps we all use every day would likely never have materialized without them.

Benefits of Amazon Lightsail

All of the benefits of cloud computing apply to Amazon Lightsail, and perhaps more so because it is a robust product that provides a host of features. One of the biggest benefits to Lightsail is that it is easy to use: IT staff can launch a new virtual server for your apps in minutes as opposed to days or weeks (as they build the required on-premise infrastructure). Apps can be pre-configured for Windows or Linux as a stack, and the management console looks like a business dashboard you might use to run financial reports.
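
Here is roughly what that minutes-long launch looks like with boto3; the instance name, Availability Zone, blueprint, and bundle IDs are illustrative values rather than the only options.

    import boto3

    lightsail = boto3.client("lightsail")

    # Launch a pre-configured virtual server: the blueprint picks the stack
    # (an OS or an app such as WordPress) and the bundle picks the size/price tier.
    lightsail.create_instances(
        instanceNames=["marketing-site-1"],         # hypothetical name
        availabilityZone="us-east-1a",              # example zone
        blueprintId="wordpress",                    # example pre-configured app stack
        bundleId="nano_2_0",                        # example smallest plan
    )

    # Bundles correspond to the published price tiers; you can list them:
    for bundle in lightsail.get_bundles()["bundles"]:
        print(bundle["bundleId"], bundle["ramSizeInGb"], "GB RAM")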

The way the pricing model works for cloud hosting is also an advantage. There is one clear price for the virtual server according to the memory allocation you choose, the number of processors, the application storage needed, and the data transfer allowance. For example, Amazon lists the price for a virtual server with 512 MB of memory, one processor, 20 GB of SSD storage, and 1 TB of data transfer as $3.50 per month. For twice the memory and data transfer, the cost is $5.00 per month, scaling up from there to a maximum of $160 per month (for a whopping 32 GB of memory, an 8-core processor, 640 GB of SSD storage, and a 7 TB transfer allowance). Those clear tiers are one of the best examples of how cloud computing lets you scale for growth.

Beyond that, Amazon Lightsail is all about reducing management complexity. Your staff doesn’t have to configure or maintain the network, the endpoint security features, user access, or even the server itself -- how it runs, its performance level, and when it needs to be shut down.

Another advantage has to do with extensibility. Often, a company will start with one or two virtual servers that run on Lightsail and configure extra cloud storage, but then expand into other service offerings. For example, a company might start with a business app that doesn’t handle any financial transactions, and then decide to use an additional service in AWS that extends the capability of its existing app. Or it might decide to connect multiple apps together and need services that sync databases between virtual servers.

Most of all, companies sometimes experience quick growth with apps that suddenly become much more popular, either with external customers or as internal corporate apps. The infrastructure you use for this might not be able to keep pace, but with Amazon Lightsail that’s not an issue. As your needs change, the app requirements scale up, and you add more services and more virtual servers, it is often a simple matter of adjusting a few settings within the console, which keeps the infrastructure management easy and simple.

This is a boon for companies that want to create innovative new products without having to think as much about the computing environment they will use. It removes one of the biggest hurdles to fast growth, which often involves improving compute resources.

In the end, Amazon Lightsail provides the computing platform companies need to be successful and grow quickly, or to scale down services and try new applications that meet the needs of customers in an ever-changing tech climate. It doesn’t impose the typical restriction of “living with what you have” and never adapting to what your customers actually want and need.


Why the face detection on the 2020 Subaru Legacy Limited is so amazing

In a matter of seconds, the 2020 Subaru Legacy Limited will recognize you and adjust the seat position, mirrors, climate settings, and even the last screen you used on the display.

I know this because, after jumping in the car several times over a week, I never had to think about the saved settings. When I asked a friend to jump in, the Legacy didn’t recognize him. Like a phone that senses who you are, the Legacy provides access to your favorite settings.

In the future, it won’t let anyone else drive the car.

That’s the ultimate goal for the autonomous driving future we’re all expecting, when cars can detect us, start driving where we want them to go, read our schedules and emails, and act as an automated agent of service to bring us where we need to go on time.

DriverFocus

For now, I liked how fast the Legacy works. A camera detects your face -- using the same system, called DriverFocus, that I wrote about in the Subaru Forester a while ago. Up to five users can save their settings, and it’s just a simple matter of tapping in your name. I was able to swap out my name and delete others, almost like I was using an iPad.

The Legacy sedan uses a massive new 11.6-inch nav screen, where I added myself as a user. The split screen can show Apple CarPlay or Android Auto in one portion with the climate settings on the lower half. When I jumped in to drive, I noticed my last settings for the seat heater and temperature level were the same. Many cars don’t do this -- they disable the seat heater, for example, every time you drive, so you have to enable it again.

Mirrors and seat position are typically another annoyance -- not really for those who drive their own car, but when you share a car. Every time you drive you have to make adjustments to the mirrors, even if the seat can use a custom user setting (with the buttons on the door).

You also have to know how to save the seat settings, usually a long press on the 'set' button. It’s surprising how many people I know who don’t realize there is a saved seat option.

With the Legacy, they don’t have to know. Once you add yourself as a user (or for the non-technical folks, have someone do that for you), then there’s nothing else to do. The car recognizes you and you start driving. I ended up not even noticing the adjustments.

What's next...

My thoughts drifted to how this will work in the future. Once the car recognizes you and, as mentioned earlier, calls up your calendar and email, it might know you need to go to a meeting across town and even find the route you like best. It could send off a quick email about when you will arrive.

And, eventually, it might even know that you typically like to do a quick video chat with your kids at the same time every day -- in short, once cars recognize us, they can customize every setting, where we drive, and what we do.

This will involve musical taste, routing, and even how the car drives. Today the Legacy doesn’t adjust itself in terms of a sports mode or extra traction control, but I know that would come in handy. Cars might even know us better than we know ourselves.

On The Road is TechRadar's regular look at the futuristic tech in today's hottest cars. John Brandon, a journalist who's been writing about cars for 12 years, puts a new car and its cutting-edge tech through the paces every week. One goal: To find out which new technologies will lead us to fully self-driving cars.


What is Amazon RDS?

John Brandon explains how businesses can use AWS RDS for managing relational cloud databases.


One of the most mature and well-known cloud computing products is called Amazon RDS (Relational Database Service). Launched in 2009, the service is designed to store relational database instances in the cloud for access by applications and as a way to provide relational databases for analytics, reporting, and business dashboards within those apps.

To understand RDS, it’s important to take a step back and define a relational database. The concept was invented way back in 1970, and it’s essentially a way to store data in a format that is more useful and streamlined. A relational database consists of tables that can be interrelated -- the data is structured in a tabular form and each piece of data can use an identifier. Typically this involves a unique key, although as you can imagine, a database with multiple tables and thousands of rows with unique identifiers quickly becomes complex.

Because of the complexity of a large relational database, there are performance considerations to think about -- and other factors including scaling the relational database, accessing it from anywhere using any app, the security needed to protect the data, and the IT infrastructure required to support the database. Companies tend to use multiple relational databases as well.

That’s where Amazon RDS comes into play. Because RDS runs in the cloud as a service (and is part of Amazon Web Services or AWS), it provides both flexibility and scale. This helps companies as they grow, expand to other areas and provide additional services to customers. You can scale your business apps and the data without having to scale your infrastructure.

Benefits of RDS

Many of the benefits of using RDS are similar to the benefits of using the cloud. This includes the flexibility of where the relational database is stored, first and foremost. A company might offer a customer-facing application used on a smartphone or other mobile device, or an internal application for a large company that runs on an internal website. Users might need to access the relational database from a variety of devices, from many different locations, but the cloud makes the data accessible in a way that makes it seem like the data center is sitting right next to you.

That’s because Amazon RDS has one key advantage -- your IT staff do not need to manage it, or even have to get to grips with database design software, and the RDS can scale as your needs change. One example of how this works is when you compare a typical server in a data center to one in the RDS cloud. You might purchase a server with a set amount of memory, storage, and performance. Then, when you build an application that accesses the database on that server, you are stuck with the allocations you selected (or you have to manage them and adjust them). With RDS, the entire infrastructure is “serverless” in that it can adjust to your needs -- whether that is the performance needed, the memory, or the storage. And, this is something that can be automated for your database.

This is an important benefit for companies that may start out with only a few apps and a smaller relational database. Most companies expect to grow but some don’t know how to scale for the growth from a technical standpoint. Suddenly, a customer-facing app for iPhone or an Android app becomes incredibly popular or an internally designed and developed mobile app changes to meet the needs of quick expansion.

Important things to know

RDS is flexible in terms of scaling and how your users can access the databases from apps, but also in how you can use it. For example, you can use database software you already know and use to manage relational databases, including MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server. As your needs change, you can even scale up to a different service such as Amazon Aurora, which provides even faster performance. Amazon RDS is also part of a host of complementary cloud computing services from AWS, such as Amazon EC2 for compute and Amazon S3 for storage.

As mentioned earlier, your staff doesn’t need to do all of the management chores, which can include the configurations needed, the adjustments for more performance or storage, the backups, and the security required to make sure your data is safe.

One last thing to know is about availability. In addition to the other benefits of Amazon RDS, including the scaling and configuration, your apps can reliably access the data -- it is always available because there is also a secondary instance of the relational database that is hosted in RDS in the event that the first instance should fail for any reason.
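
A minimal boto3 sketch that ties these pieces together -- a managed MySQL instance with a standby in a second Availability Zone, scaled later with a modification call. The identifiers, sizes, and credentials are placeholders.

    import boto3

    rds = boto3.client("rds")

    # A managed MySQL instance; MultiAZ keeps a synchronized standby copy
    # that RDS fails over to automatically if the primary has a problem.
    rds.create_db_instance(
        DBInstanceIdentifier="orders-db",           # hypothetical identifier
        Engine="mysql",
        DBInstanceClass="db.t3.medium",             # example size
        AllocatedStorage=100,                       # GB; can be increased later
        MasterUsername="admin",
        MasterUserPassword="example-password-123",  # placeholder; store real credentials securely
        MultiAZ=True,
    )

    # Scaling later is a modification, not a migration.
    rds.modify_db_instance(
        DBInstanceIdentifier="orders-db",
        AllocatedStorage=500,
        ApplyImmediately=True,
    )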

In the end, Amazon RDS mitigates the main concern companies have when they develop apps in the first place, whether for external or internal use. A relational database helps with performance and availability, but companies don’t have to constantly adjust their IT operations or even hire more staff to keep up with unexpected growth.


How the cameras in the 2020 Chevy Equinox create some surprising external views

Camera technology in cars is starting to evolve. What started as a simple back-up camera (now required in all new cars) has evolved into an all-seeing, roving eye that stitches together images from multiple angles.

In a recent test of the 2020 Chevy Equinox, I was pleasantly surprised at how the camera automatically showed a front-facing view of the car, stitched together side views to help me pull into parking stalls, and even created a view of the car from behind.


First, about that forward-facing camera. That’s not common or something I’ve noticed in any recent car test. As you pull away from your driveway in the morning or from a parking spot, the camera view changes from rear-view to front-facing, so you can keep an eye out for anything that might be in the road.

In my case, I managed to spot some small but potentially hazardous snow mounds at the end of my driveway (since winter has arrived here already). Glancing at the front-facing cam, I pulled slightly away from the mounds.

Not that they can cause too much damage in a crossover, which is raised up off the ground compared to a typical sedan. However, you never know if crunching over one can manage to hit the front fender. It’s happened to me before and caused damage with cars I own. Back-up cams are helpful, but a front camera was unique and helpful in different ways. I could see someone using the front camera as they pull into a tight parking spot.

The best angle, any time

In terms of the side angle views, that’s also helpful for parking and a bit of a conversation starter. In the Equinox, you can press the camera app icon and then switch views easily as you drive or as you sit idle. With these multiple views, and the ability to easily switch between them, I was able to park a little closer to a curb when I parked by a pizza place. It meant the crossover didn’t stick out into traffic as much, and my wife could exit to the sidewalk and avoid the slush.


I was most impressed with the rear view of the car. It’s not really a miracle of engineering, but it is pretty astounding to those who don’t know what is happening. Of course, it is simply a 3D render of the car -- there isn’t a drone tracking you from behind and showing live video. However, the stitched camera views are live and help you see all around the vehicle.

It’s also helpful because you can access the camera at any time, not just at low speeds. The front camera only shows up for a few seconds automatically, but you can press the camera icon and access the live views at any time. An interesting side note here is that, if you do spend too much time clicking on camera icons and changing the view angle, a warning message appears telling you to pay attention to the road. (I asked a passenger to fiddle with the settings.)

Looking to the future

In the future, cameras will be everywhere – pointing around the car as they do now on the Equinox but also available to stream in the car from a road-side camera, perhaps one that is just up ahead and shows traffic congestion levels. And, we really will deploy a follow-me drone at some point to show an overhead view of the car, although the biggest challenge there is not technical but regulatory. (I’ve tested follow-me drones in cars on closed roads already, including one that took pictures and video. It was a blast but potentially not too safe for other drivers.)

One thing that will definitely improve is automation. The car will automatically show congested traffic if it is relevant to your route and your typical preferences and patterns. Cameras will work like the front-facing camera that helped me avoid a snowbank.


On The Road is TechRadar's regular look at the futuristic tech in today's hottest cars. John Brandon, a journalist who's been writing about cars for 12 years, puts a new car and its cutting-edge tech through the paces every week. One goal: To find out which new technologies will lead us to fully self-driving cars.


What is Amazon SES?

As companies grow, they sometimes rely on methods that don’t make sense anymore. One example of that is how businesses process outgoing emails from email clients. Distinct from normal web-based email, these transactional emails include the monthly newsletter you might send customers or the reminders an app sends to inform customers about a support issue.

Small companies certainly can still use Google Gmail to queue up their email communication and even schedule emails using third-party apps. However, this doesn’t work when you are sending thousands of emails per month because of the complexity of the infrastructure.

Often, this complexity is due to how a business needs to scale up or down as the business changes. It’s related to the business apps you use that send out emails on a routine basis and the marketing campaigns you conduct -- all of which involve more than the simple act of sending emails and includes analysis and tracking, reporting, and a way to respond to messages.

Enter Amazon SES, which stands for Simple Email Service. The service is designed for marketing campaigns, company communication, web application transactional emails, and any other activity that involves sending emails to customers, partners, or internally.

For those who already use Amazon EC2 (Amazon Elastic Compute Cloud) for hosting applications in the cloud, you can send up to 62,000 emails per month for free. After that, the service works on a pay-as-you-go model with a low fee per thousand emails. The basic idea with Amazon SES is to provide a complement to your existing IT infrastructure that allows companies to focus on the content of the emails and not on the infrastructure that’s required to process and analyze them.

And, due to the scalability of the cloud, there are no concerns over storing email content, performance related to how the transactional emails are sent, or issues with the back-end analytics and reporting you might need to do after sending the emails.
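
Once a sending address or domain has been verified, sending a transactional message is a single SDK call. A hedged boto3 sketch, with placeholder addresses and content:

    import boto3

    ses = boto3.client("ses")

    # Send a transactional email; SES handles queuing, delivery, and the
    # bounce/complaint feedback loops behind the scenes.
    ses.send_email(
        Source="support@example.com",               # a verified sender (placeholder)
        Destination={"ToAddresses": ["customer@example.com"]},
        Message={
            "Subject": {"Data": "Your support ticket has been updated"},
            "Body": {"Text": {"Data": "Hi! An agent replied to your ticket."}},
        },
    )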

Benefits of Amazon SES

Email is a fact of life for many businesses, even as some have moved into the age of Slack and social media. It’s expected that any customer-facing application will communicate with a user by sending official emails that explain new features, notify them about security concerns, or provide a way for customers to provide feedback and obtain support. Email is the well-known, official channel for communication for apps, and it’s often used for marketing services and campaigns because of how companies can track the success of the campaigns.

The problem is that email is also a complex endeavor for companies that need to send thousands and thousands of messages per month from multiple apps, for official company business, and as part of marketing efforts. It’s complex in part because of the massive number of messages being transmitted but also due to the strain email can place on platforms.

There are concerns over reliably sending the messages, compliance with email marketing regulations, and dealing with the incoming email deluge.

Fortunately, Amazon SES has the back-end infrastructure to keep up with the flow. It uses cutting-edge content filtering techniques, reputation management features to guard against any issues with regulatory compliance (avoiding being labeled as spam, since customers opt in to receive the messages), and a vast array of analytics and reporting functions. Amazon SES is managed through a console that admins can configure for the business’s needs.

Dealing with incoming email is also not a Herculean undertaking. Amazon SES can store incoming messages in an Amazon S3 bucket, and companies can then use AWS Lambda to process the email using custom code. For example, if your company routinely uses an app to send out an email about new features, and customers reply with questions, you can have that code look for keywords, run a report, and create a new email that answers the questions -- or trigger notifications through another service called Amazon SNS (Simple Notification Service).
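
The incoming side can be sketched the same way: a receipt rule that drops inbound mail into an S3 bucket, where a Lambda function or reporting job can pick it up. The rule set, rule, recipient address, and bucket names are hypothetical.

    import boto3

    ses = boto3.client("ses")

    # Store inbound mail for support@example.com in an S3 bucket so other
    # services (for example, AWS Lambda) can process the messages with custom code.
    ses.create_receipt_rule(
        RuleSetName="inbound-rules",                # hypothetical rule set
        Rule={
            "Name": "store-support-mail",
            "Enabled": True,
            "Recipients": ["support@example.com"],
            "Actions": [
                {"S3Action": {"BucketName": "example-inbound-mail"}},  # placeholder bucket
            ],
        },
    )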

The most important benefit here is related to cost. As mentioned earlier, Amazon SES uses a pay-as-you-go model, so even after companies that use EC2 exhaust the free tier of email sends, the costs are extremely low for a business that processes thousands more emails per month, or even tens of thousands. The cost is about 10 cents per thousand emails.

This scale is where cloud computing is a major asset for companies that might experience quick growth as they add new services or offer more customer-facing apps. As you host each app in the cloud, you can then rely on Amazon SES to handle all of the email processing for you. There’s no “gotcha” as you grow and expand services and offerings.

In the end, every company will continue to process email for customers as an accepted and reliable form of communication and as a method for dealing with transactional emails sent from an app (both mobile and on the web). It’s a technology that is here to stay.


What is AWS Fargate?

For any company planning to use cloud services to develop a consumer app (or a larger company rolling out an internal app), it can be hard to predict how many people will start using it. Will your social media app suddenly become the next Instagram or Snapchat? Or will it catch on with a niche audience that uses it every day, all day long? In both scenarios, the tech challenge is to make sure it is always available, the data is clean and organized, and there are no security concerns.

While it sounds like a sci-fi television show, AWS Fargate is a powerful and serverless compute engine for running containers used for applications. In cloud computing parlance, a container is an isolated instance that provides the flexible computing power you need. It runs in the cloud, which means there is no infrastructure management to be concerned about. Most importantly, if your application is widely used internally or externally, AWS Fargate can keep up with the demand.

Basics of AWS Fargate

To understand what AWS Fargate is and what it does (and the benefits), it’s best to start at the beginning. Amazon introduced the cloud compute engine back in 2017. However, the compute infrastructure was already mature by then, since Amazon EC2 (Amazon Elastic Compute Cloud) had already existed for over ten years. IT service management, administrators, and web developers were asking for more flexibility in how they use that “elastic” compute capacity, how they manage containers, and how they configure the serverless environments for their apps. Their needs were ever-changing.

Think about the typical consumer app. Once it catches on with users, there is a great need for improving performance and managing endpoint security. Data breaches can occur in a heartbeat, and when they do and consumer data is compromised, it can be incredibly costly. It only takes one data breach with an app that exposes customer data like credit card numbers and date of birth before the reputation of a firm is forever altered. And, it only takes a break in the performance of a popular app or a few hours of down-time before customers start finding the exit.

It’s become clear that in the modern landscape of consumer apps, business apps, or internal apps used by employees that the expectation is for extremely robust performance, security, and availability. If an app depends on an IT staff to scale the performance, or maintain security, or keep up with cloud storage demands -- even in a semi-automated fashion -- the customer suffers.

It’s a curious fact of modern cloud computing initiatives that the landscape keeps evolving -- user demands keep changing, new regulations in certain industries keep evolving. It’s hard to keep up, and even harder if you have to manage the computing environment itself. That’s why AWS Fargate exists -- it makes it easier to keep up with application demands for app development.

Benefits and purpose

Back in 2017 when Fargate was first introduced, the process of cloud hosting an application (in a serverless environment) was a bit more complicated. It typically involved managing the clusters, optimizing them, choosing the instances and tweaking the settings. Fargate does all of this such that the person setting up the container only has to choose the performance and memory requirements, set the task definitions, and configure the networking policies.
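
In boto3 terms, those choices map onto a task definition plus a run request; the task family, container image, cluster, subnet, and security group below are placeholders.

    import boto3

    ecs = boto3.client("ecs")

    # Describe the container and its CPU/memory requirements once...
    ecs.register_task_definition(
        family="checkout-api",                      # hypothetical task family
        requiresCompatibilities=["FARGATE"],
        networkMode="awsvpc",
        cpu="256",                                  # 0.25 vCPU
        memory="512",                               # MiB
        containerDefinitions=[
            {"name": "app", "image": "example/checkout-api:latest", "essential": True},
        ],
    )

    # ...then run it on Fargate; there are no cluster instances to provision or patch.
    ecs.run_task(
        cluster="default",
        launchType="FARGATE",
        taskDefinition="checkout-api",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],       # placeholder
                "securityGroups": ["sg-0123456789abcdef0"],    # placeholder
                "assignPublicIp": "ENABLED",
            }
        },
    )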

The idea with AWS Fargate is to simplify the use of the Amazon Elastic Container Service, which is used to manage containers and is the heart of Amazon AWS for application delivery (and arguably the heart of a cloud computing initiative). Each container can isolate an application, making sure it is secure and runs at the highest level of performance.

The reason companies use AWS Fargate is partly related to performance needs and partly related to cost. The key feature with AWS Fargate to know is that it isolates the application. This means if the compute needs increase dramatically, the application can scale to meet demand. Also, companies pay only for the actual compute power they use.

The “pay as you go” model was not always available, and that’s a key advantage to AWS Fargate, especially for companies that run multiple apps and services. It’s almost impossible to keep up with the changing demands when you have apps that serve only minor purposes for your industry or customer segment, such as making a connection to another app.

In the end, the purpose of AWS Fargate is to keep up with the demand. Admins can see the analytics and reporting necessary to manage containers effectively. Apps are isolated from one another in containers, so admins can adjust compute needs as needed for each app. This also helps with security, because there is no data-sharing between containers. 

The flexibility, affordability, security, and performance of AWS Fargate is what makes it ideal for companies that need to keep up with consumer demand. 


What is AWS Glue?


Managing data is a full-time job for some (quite literally). Especially at a larger company, there may be requests to run an analytics report, move data from one repository to another, or even create “clean data” for an important new web application. In terms of data management, cloud computing services provide extreme flexibility in what you can do with data reporting, and there are quite a few tools available to help, especially for Amazon Web Services (or AWS).

AWS Glue is one of those data and cloud storage management tools. It’s known as a managed ETL, which means it is used to Extract, Transform, and Load data in preparation for reporting and analytics. AWS Glue is a data catalog for storing metadata in a central repository. It’s a way to automate ETL so that you point AWS Glue to the data that’s stored within AWS. The data becomes searchable and queryable for any of the reporting and cloud analytics you need to use.

It’s helpful to understand ETL before diving into AWS Glue and the benefits of using it. ETL is how data management employees at a company blend data so that it can be used for a query. There might be multiple data stores and multiple cloud databases, but the ETL readies the data without having to move any data stores. ETL essentially preps the data so that it is ready for analytics and reporting, as opposed to the alternative which is to actually move the data, isolate it, and then run queries in preparation for any analytics or reporting.

AWS Glue is the tool that generates ETL code for programming languages Scala or Python. Essentially, once you generate the catalog data, you can then perform searches and queries on the data using cloud computing tools such as Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum, all designed to help companies store and use data in applications. AWS Glue also works with Virtual Private Cloud (Amazon VPC) on Amazon EC2.

To understand what AWS Glue is, it’s helpful to understand how it works. For starters, data management employees, developers, and data scientists use the AWS Management Console to register their data sources. A crawler then scans the data and populates the catalog, using classifiers for formats like JSON, CSV, and Parquet. Employees then select a source for the ETL job and generate the code needed for the reporting and analytics. Finally, AWS Glue can schedule recurring jobs and prep the data for tools like AWS Lambda.
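
A loose boto3 sketch of those steps -- register a crawler over an S3 path to populate the Data Catalog, then kick off an ETL job that was authored or generated earlier. The crawler, role, database, path, and job names are placeholders.

    import boto3

    glue = boto3.client("glue")

    # Point a crawler at the raw data; it infers the schema with classifiers
    # (JSON, CSV, Parquet, ...) and writes table definitions into the catalog.
    glue.create_crawler(
        Name="sales-data-crawler",                  # hypothetical crawler
        Role="GlueServiceRole",                     # placeholder IAM role
        DatabaseName="sales_catalog",
        Targets={"S3Targets": [{"Path": "s3://example-raw-sales/"}]},
    )
    glue.start_crawler(Name="sales-data-crawler")

    # Once the catalog exists, run an ETL job against it.
    glue.start_job_run(JobName="sales-to-parquet")  # hypothetical job name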

AWS Glue benefits

The main advantage of AWS Glue is flexibility. Many companies now use a data lake that contains a wealth of structured and unstructured data. In the past, companies were forced to move the data into a new repository, to endlessly manage the data, and to worry about the servers and infrastructure needed for their apps. Speaking of a full-time job! That was a complicated time period in the history of Information Technology, all prior to the cloud.

With AWS Glue, there’s no need for a server on-premise (since it is all serverless and runs as a managed ETL) or even your own data center, your own local data management stores, or a dedicated employee who manages the data. Instead, AWS Glue is the glue that ties together disparate data and makes it ready and available for queries.

AWS Glue is also highly automated. It can crawl disparate data sources, identify the formats, and suggest how to use the data. Once AWS Glue does all of this, it can then generate the code you need for any data queries, transformations, or processes.

An important distinction to make here is that AWS Glue does all of its ETL processing in the cloud. That means employees don’t have to do any of the data management and prep that is often required to run ETL, such as managing endpoint security, configuring the data beforehand, moving the data to the right repository, or any of the more complicated steps such as configuring the data stores, managing storage, and configuring servers.

AWS Glue removes much of the headache involved with preparing data for analysis. Known as “heavy lifting” in the industry, it is the chore of making structured or unstructured data ready for queries. With AWS Glue, that is not needed. All of the discovery, cleansing, enriching, and moving of the data occurs behind the scenes as part of the ETL, making everything much easier for IT service management.

Because the cloud is so flexible, and there are so many different data stores, web applications, and business needs for reporting and analytics, AWS Glue helps bring some sanity to the data exploration process — without having to do any of the back-end work first. It’s powerful in that it saves time and effort, and yet the queries can be repeatable and automated.

It's only fair to share...Share on RedditShare on FacebookShare on Google+Tweet about this on TwitterPin on PinterestShare on Tumblr
Posted in Uncategorised

What is AWS Glue?

It's only fair to share...Share on RedditShare on FacebookShare on Google+Tweet about this on TwitterPin on PinterestShare on Tumblr

Managing data is a full-time job for some (quite literally). Especially at a larger company, there may be requests to run an analytics report, move data from one repository to another, or even create “clean data” for an important new web application. In terms of data management, cloud computing services provide extreme flexibility in what you can do with data reporting, and there are quite a few tools available to help, especially for Amazon Web Services (or AWS).

AWS Glue is one of those data and cloud storage management tools. It’s known as a managed ETL, which means it is used to Extract, Transform, and Load data in preparation for reporting and analytics. AWS Glue is a data catalog for storing metadata in a central repository. It’s a way to automate ETL so that you point AWS Glue to the data that’s stored within AWS. The data becomes searchable and queryable for any of the reporting and cloud analytics you need to use.

It’s helpful to understand ETL before diving into AWS Glue and the benefits of using it. ETL is how data management employees at a company blend data so that it can be used for a query. There might be multiple data stores and multiple cloud databases, but the ETL readies the data without having to move any data stores. ETL essentially preps the data so that it is ready for analytics and reporting, as opposed to the alternative which is to actually move the data, isolate it, and then run queries in preparation for any analytics or reporting.

AWS Glue is the tool that generates ETL code for programming languages Scala or Python. Essentially, once you generate the catalog data, you can then perform searches and queries on the data using cloud computing tools such as Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum, all designed to help companies store and use data in applications. AWS Glue also works with Virtual Private Cloud (Amazon VPC) on Amazon EC2.

To understand what AWS Glue is, it’s helpful to understand how it works. For starters, data management employees, developers, and data scientists can use AWS Management Console to register the data sources. After crawling the data the ETL will then create catalogs using classifiers like JSON, CSV, and Parquet. Employees will then select a source for the ETL and generate the code needed for the reporting and analytics. Finally, the ETL can schedule recurring jobs and to prep the data for tools like AWS Lambda.

AWS Glue benefits

The main advantage of AWS Glue is flexibility. Many companies now use a data lake that contains a wealth of structured and unstructured data. In the past, companies were forced to move the data into a new repository, to endlessly manage the data, and to worry about the servers and infrastructure needed for their apps. Speaking of a fulltime job! That was a complicated time period in the history of Information Technology, all prior to the cloud.

With AWS Glue, there’s no need for a server on-premise (since it is all serverless and runs as a managed ETL) or even your own data center, your own local data management stores, or a dedicated employee who manages the data. Instead, AWS Glue is the glue that ties together disparate data and makes it ready and available for queries.

AWS Glue is also highly automated. It can crawl disparate data sources, identify the formats, and suggest how to use the data. Once AWS Glue does all of this, it can then generate the code you need for any data queries, transformations, or processes.

An important distinction to make here is that AWS Glue does all of its ETL processing in the cloud. That means employees don’t have to do any of the data management and prep that is often required to run ETL, such as managing endpoint security, configuring the data beforehand, moving the data to the right repository, or any of the more complicated steps such as configuring the data stores, managing storage, and configuring servers.

AWS Glue removes much of the headache involved with preparing data for analysis. Known as “heavy lifting” in the industry, this is the chore of making structured or unstructured data ready for queries. With AWS Glue, that chore largely disappears. All of the discovery, cleansing, enriching, and moving of the data occurs behind the scenes as part of the ETL, making everything much easier for IT service management.

Because the cloud is so flexible, and there are so many different data stores, web applications, and business needs for reporting and analytics, AWS Glue helps bring some sanity to the data exploration process — without having to do any of the back-end work first. It’s powerful in that it saves time and effort, and yet the queries can be repeatable and automated.


What is AWS CLI?

Here’s a curious fact of modern computing. For those who have grown up with modern interfaces such as Windows 10 and Apple iOS 11, the idea of a “command line interface” is a bit peculiar. We’re so accustomed to seeing, clicking, and executing that any kind of textual interface (other than actually texting on a phone) is lost on large masses of people.

That’s not true when it comes to cloud computing services, such as Amazon Web Services and the Command Line Interface (or AWS CLI). In fact, for anyone who has worked in a data center, is comfortable with Linux, creates web applications, or remembers the early days of MS-DOS, a command line is not a limited, old-school system but rather a powerful way to control and execute commands.

Technically speaking, the CLI is a downloadable app you use to control AWS functions. Windows users will need to install and run the 32-bit or 64-bit version of the CLI. On a Mac or a Linux distro, you will need Python 2.6.5 or higher and can install the CLI using pip.

After that, the CLI works like a terminal: it looks much like the MS-DOS command line and supports scripts you can execute, which saves time and effort. You can start with the basics by typing the help command, which walks you through what you can type and what each command does.
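
Since the CLI is simply a program named aws, you can also call it from a script. Here is a minimal sketch in Python; it assumes the CLI is already installed (for example with pip install awscli) and that credentials have been set up with aws configure.

```python
import subprocess

# Show the top-level help, which lists the available services and commands.
subprocess.run(["aws", "help"], check=True)

# Each service has its own help as well, for example Amazon S3.
subprocess.run(["aws", "s3", "help"], check=True)
```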

One key to understanding how CLI works and why you can benefit from using it as an organization is to understand that AWS is a comprehensive cloud computing environment. There are many components, and it’s not always easy to understand how they all work (and how they work together). A CLI helps technical staff execute commands without having to review every feature and function or use a graphical interface.

This means if you have a team that deals only with cloud storage, and you spell out the commands available and what they do, those team members can execute those commands with the required script variables, without having to worry about any other features within AWS.

Benefits of AWS CLI

The true power of the CLI is in its simplicity. As with any command line interface, you can issue commands that are powerful, reduce complexity (ironically because it is not all spelled out in a potentially confusing GUI), and save time with your cloud management and working with your cloud computing infrastructure. It is intended for IT employees, programmers, and other technical folks at your company to be able to perform cloud computing tasks without taking as many steps.

The technical team at most companies is almost always busy with multiple tasks that all seem urgent. They are managing client devices, installing security patches, reconfiguring networking, and dealing with printers and copiers that are not working for a specific department. In short, they are running around like the proverbial headless chickens. AWS CLI provides a way to execute scripts for tasks such as inspecting Amazon S3 storage, triggering backups, performing recursive uploads and downloads, viewing buckets, and inspecting services. At the same time, the CLI also lets the technical staff configure AWS itself.

Some of these tasks can be performed within the AWS Management Console (the GUI), although not every command is available there, and some might not be as easy to find, execute, or use repeatedly in an automated way. Your technical staff will learn quickly that the CLI is meant to make their jobs easier.

Examples of usage

As with any CLI, there are scripts you can use to trigger certain activities. One of the best ways to understand how the CLI works and why it is so beneficial is to consider how it helps administer the AWS S3 cloud storage environment. (S3 stands for Simple Storage Service and is really the backbone of AWS for all file storage, archiving, and management.)

The usage guide spells out the exact command lines, but the basic functions include the ability to copy a local file store out to the cloud storage repository. The basic usage is the sync command, naming the local file store and the target location on Amazon S3.
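
As a hedged example, the snippet below wraps that sync command in a short Python script. The local folder and bucket name are placeholders.

```python
import subprocess

# Copy (sync) a local folder up to an S3 bucket. Only files that have
# changed are transferred on subsequent runs.
subprocess.run(
    ["aws", "s3", "sync", "./local-reports", "s3://example-bucket/reports/"],
    check=True,
)
```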

For AWS itself, you can view the contents of a bucket (similar to a file folder) with a single list command. Because this is a CLI, you can add script variables such as which bucket or path you want to inspect. You can also start instances, stop them, describe them, and otherwise manage them. (An instance, in AWS parlance, is a virtual server with associated compute, memory, storage, and networking resources, intended to help you run web applications for your company and manage the related resources.)
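
A few more illustrative commands, again scripted from Python, with placeholder names and a hypothetical instance ID:

```python
import subprocess

# List all buckets, then the contents of one bucket.
subprocess.run(["aws", "s3", "ls"], check=True)
subprocess.run(["aws", "s3", "ls", "s3://example-bucket/"], check=True)

# Inspect EC2 instances, then start a stopped one.
subprocess.run(["aws", "ec2", "describe-instances"], check=True)
subprocess.run(
    ["aws", "ec2", "start-instances", "--instance-ids", "i-0123456789abcdef0"],
    check=True,
)
```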


What is Platform-as-a-Service? Everything you need to know

Modern software applications can be incredibly complex. There are code libraries to maintain, graphics to build, and regulations to think about. Once an application is up and running -- whether it’s for internal use at your company or a customer-facing iPhone or Android app -- the real work begins. Companies have to continually update the app to ward off security issues, improve features to meet customer demand (internal or external), and keep up with the digital transformation occurring in their market. When companies are also tasked with maintaining and upgrading the surrounding platform for the app -- the operating systems, servers, networks, and computers involved -- it becomes even more of a Herculean task.

That’s where Platform-as-a-Service (or PaaS) comes in. As one of several cloud computing models in use today (joining the older Software-as-a-Service (SaaS) model, the newer Infrastructure-as-a-Service (IaaS) model, and many others), the concept of PaaS helps companies focus on software development or providing other services to customers without having to manage, update, and maintain the actual platform that hosts the application.

To understand what Platform-as-a-Service is and how it benefits a company, it’s important to understand how the idea developed in the first place. For starters, the original concept of Software-as-a-Service (SaaS) was essentially a proof of concept for many companies. It meant an application -- such as web-based email, a business app, or even a word processor -- could run entirely in the cloud. Companies started using SaaS instead of on-premises, locally installed applications. In some ways, this gave birth to cloud computing services, because the first time many of us used the cloud was when we checked our web-based email.

Platform-as-a-Service extends this model much further. Over the last decade and more, companies have not only relied on business apps run from the cloud, but they have also started relying on cloud computing to help run the platform for software. PaaS is best understood in the context of how companies used to run applications before the cloud, and also how specific industries benefit from Platform-as-a-Service today.

(By the way, Infrastructure-as-a-Service takes this one step further. It covers more than the platform for an application: it is the entire cloud computing infrastructure, including apps, storage, servers, networking, and everything required to run an IT department.)

History of PaaS

Building an application -- for internal use or for customers -- is often a two-pronged endeavor. Not to oversimplify the development process, which is often quite complex, but every application involves both software and hardware. The application software includes the user interface, development framework, graphics libraries, databases, and many other components the user needs to run the application. However, there is always a hardware component as well -- the software has to be installed, managed, maintained, and updated on a hardware platform, whether that is in a local data center on servers and a local network or an external cloud computing platform that provides access for remote users.

It wasn’t until 2005 that PaaS became a viable option, mostly because cloud computing advanced, networking speeds increased, and devices became more readily available. IT service management teams tasked with building apps, along with app development departments and companies, quickly gravitated to PaaS because it alleviates most of the tedious chores related to the application platform. For example, companies don’t have to procure new hardware and storage resources, they don’t have to plan out capacity needs, and they can outsource most of the hardware- and technology-related patching and maintenance.

In short, PaaS providers ushered in a new age where companies can focus on what they do best -- building the actual application, and not worrying about how it is hosted.

An example of how PaaS works

It might seem like the dark ages of technology now, but it used to be that every application, whether developed for internal or external use, had to be housed within a local data center. Before the advent of the cloud, this meant that hospitals and clinics, accounting firms, media companies, and every other type of industry had to become experts in IT in addition to experts in their chosen field. As an example, if a hospital wanted to develop an internal app to track patient records, the internal staff would have to develop the app (including the user interface, database, and every other aspect of the software) and also manage the servers and networks that hosted it.

It was a challenge because the app development alone was complex enough -- and if you are familiar with HIPAA (the Health Insurance Portability and Accountability Act) regulations, you know these apps have only become more complex. Hospitals and clinics then had to maintain the platform as well, including all of the related security patches, storage, and networks.

Platform-as-a-Service removes that layer of complexity, providing more flexibility and relieving the hardware management duties so that a business can focus on what they do best.


What is a data lake? Everything you need to know

When it comes to cloud computing, the terms we use are almost as important as the data we store and analyze. Companies that communicate about how cloud computing data is stored, retrieved, accessed and archived tend to maximize the use of that data. This leads to better products, higher revenue for the company, and more growth. More than anything, it leads to better communication between business units, the Information Technology department, and even the front office, sales, marketing, customers and business partners.

One of the terms that came into wide use over the last few years is the data lake. Before the rise of cloud computing, and even before the Internet was widely used as a means of transmitting data, data experts used the term data warehouse, but it wasn’t quite sufficient. A data warehouse, as the name implies, is highly organized: it consists of data that a company processes, analyzes, and reuses as part of its storage management. For a retailer, a data warehouse might contain all of the product information, SKUs (stock keeping units), and prices. A data warehouse is typically optimized for fast, reliable access.

A data lake is not so highly organized. Cloud computing experts started using the term data lake to differentiate the storage of both structured and unstructured data compared to a data warehouse. With a data lake, there is no assumption about the data being optimized.

Yet, there are clear advantages. A data lake can contain a wide assortment of data, but companies can still run cloud analytics on it, they can still operate a business dashboard, and they can still use the data in an app or for other processing duties. While it is a catch-all term that can cover massive, highly scalable data stores useful for multiple purposes, a data lake is also simply a generic way of describing both organized and unorganized data.

Key components

In order to understand a data lake and how it helps companies access cloud computing information in a way that does not require optimization or re-structuring of the data, it’s also important to understand the key components. A data lake often involves machine learning, which is a way to understand and process data using automated methods.

In the case of a retailer who needs to access product information, machine learning can determine which SKUs are stored in a data lake and pull that data into an app. Information Technology service management personnel do not need to organize the data first.

Another key component is analytics. With most structured business data, it’s important to have a database whereby IT professionals can generate reports, run SQL queries, or make use of the data in a logical, predictable way. Think of the typical health-care company that needs to have structured data available to medical staff in order to run analytics and reporting -- it typically has to be in a centralized cloud database and optimized for use (e.g., stored in a data warehouse). However, companies can still run analytics on a data lake without having to first optimize the data, and that is one of the key advantages. In fact, as machine learning and data optimization improve, a data lake of structured and unstructured data becomes even more valuable.
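
As a rough illustration of querying a data lake in place, the sketch below uses Amazon Athena (mentioned earlier alongside AWS Glue) through the Python boto3 SDK to run SQL directly against files sitting in S3. The database, table, and result bucket are hypothetical; the table could have been created by a Glue crawler without ever reorganizing the underlying files.

```python
import boto3

athena = boto3.client("athena")

# Run a SQL query directly against raw files in the data lake; results
# are written to an S3 location of your choosing.
athena.start_query_execution(
    QueryString="SELECT sku, COUNT(*) AS orders FROM raw_orders GROUP BY sku",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```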

One last component of a data lake: It is not always assumed that the data will be used in the cloud. While a data warehouse might be optimized for on-premise use or in the cloud, a data lake can involve moving data for on-premise use in an internal app (one that pulls data from your own servers) or can be used externally (using online cloud storage and computing data stores).

How the company benefits

One of the keys to understanding the term data lake is to think about how companies access data in the first place. It is not quite as “clean” as you would think. Sometimes data arrives in a haphazard fashion (so-called unstructured data) and is dumped into a repository; companies don’t always know the original source of the data. Sometimes it’s stored in a relational database used for a business app, and sometimes it’s a collection of social media data or something that feeds a mobile app used by external customers. The main point is that a data lake provides increased flexibility in how a company can use the data.

So, while a data warehouse is a more structured and optimized way of hosting data in the cloud, meant for a specific purpose, a data lake is flexible enough for multiple purposes. There’s no need to first create a clear and obvious usage model for the data or to house it in a specific way in a database. It is always available, can be used for multiple purposes and disparate apps, and is suited to on-premises processing on your own servers or access from the cloud. It’s ready for anything.
